Neon robots racing. Can Claude 3 win the AI race with its brand new set of models?

Claude 3 – the New AI Models Putting Anthropic Back in the Game

You’d be forgiven for not knowing Claude. This chirpily-named AI assistant from Anthropic has been around for a while, like its celebrity cousin ChatGPT. But while ChatGPT hit the big time, Claude hasn’t quite progressed beyond the Other Platforms heading in most AI presentations – until now.

What changed everything this month was Anthropic’s release of all-new Claude 3 models – models that not only caught up with ChatGPT-4 benchmarks, but surpassed them. It’s wise to take benchmarks with a pinch of salt, not least because they’re often internal, proprietary measures. But the buzz around this latest release echoed through the newsletters, podcasts and socials, suggesting that this really was big news.

Tiers of a Claude

Claude 3 comes in three flavours. The most powerful, Opus, is the feistiest ChatGPT-beater by far. It’s also, understandably, the most processor-intensive, so available only as a premium option. That cost is on a level with competitors’ premium offerings, at just under £20 a month.

But just a notch beneath Opus, we have Sonnet. That’s Claude 3’s mid-range model, and the one you’ll chat with for free at https://claude.ai/chats. Anthropic reports that Sonnet still pips ChatGPT-4 on several reasoning benchmarks, with users praising how naturally conversational it seems.

Finally, we have a third tier, Haiku. This is the most streamlined of the three in terms of computing power. But it still manages to trounce ChatGPT-3.5 while coming impressively close to most of those ChatGPT-4 benchmarks. And the real clincher?

It’s cheap.

For developers, Haiku costs a fraction of the per-token price of competing models. That makes it a lot cheaper to build into language learning apps, opening up a route for many to incorporate AI into their software. Its lower power usage is also a huge win against a backdrop of serious concerns around AI energy demands.

Claude and Content Creation

So how does it measure up in terms of language learning content? I set Claude’s Sonnet model loose on the sample prompt from my recent Gemini Advanced vs. ChatGPT-4 battle. And the verdict?

It more than holds its own.

Here’s the prompt (feel free to adapt and use this for your own worksheets – it creates some lovely materials!):

Create an original, self-contained French worksheet for students of the language who are around level A2 on the CEFR scale. The topic of the worksheet is “Reality TV in France”.

The worksheet format is as follows:

– An engaging introductory text (400 words) using clear and idiomatic language
– Glossary of 10 key words / phrases from the text (ignore obvious cognates with English) in table format
– Reading comprehension quiz on the text (5 questions)
– Gap-fill exercise recycling the same vocabulary and phrases in a different order (10 questions)
– ‘Talking about it’ section with useful phrases for expressing opinions on the topic
– A model dialogue (10-12 lines) between two people discussing the topic
– A set of thoughtful questions to spark further dialogue on the topic
– An answer key covering all the questions

Ensure the language is native-speaker quality and error-free.
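If you reuse this prompt a lot, it can help to turn it into a small template. What follows is only a sketch in Python: the function name build_worksheet_prompt and its parameters (language, level, topic, text_words) are illustrative assumptions, not part of any tool mentioned here. It simply rebuilds the prompt above with the language, level and topic swapped in.

```python
def build_worksheet_prompt(language: str, level: str, topic: str,
                           text_words: int = 400) -> str:
    """Rebuild the worksheet prompt for a given language, CEFR level and topic.

    Hypothetical helper for illustration only; the wording mirrors the
    prompt quoted in the post.
    """
    sections = [
        f"An engaging introductory text ({text_words} words) using clear and idiomatic language",
        "Glossary of 10 key words / phrases from the text "
        "(ignore obvious cognates with English) in table format",
        "Reading comprehension quiz on the text (5 questions)",
        "Gap-fill exercise recycling the same vocabulary and phrases "
        "in a different order (10 questions)",
        "'Talking about it' section with useful phrases for expressing opinions on the topic",
        "A model dialogue (10-12 lines) between two people discussing the topic",
        "A set of thoughtful questions to spark further dialogue on the topic",
        "An answer key covering all the questions",
    ]
    bullets = "\n".join(f"- {section}" for section in sections)
    return (
        f"Create an original, self-contained {language} worksheet for students "
        f"of the language who are around level {level} on the CEFR scale. "
        f"The topic of the worksheet is \"{topic}\".\n\n"
        "The worksheet format is as follows:\n\n"
        f"{bullets}\n\n"
        "Ensure the language is native-speaker quality and error-free."
    )


if __name__ == "__main__":
    # Example: the same worksheet, but for German learners at B1.
    print(build_worksheet_prompt("German", "B1", "Reality TV in Germany"))
```

Paste the printed text into whichever chat assistant you prefer; nothing here calls a model directly.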

Sonnet does an admirable job. If I’m nitpicking, the text is perhaps slightly less fun and engaging than Gemini Advanced’s. But then, that’s the sort of thing you could fix by tweaking the prompt.

Otherwise, it’s factual and relevant, with some nice authentic cultural links. The questions make sense and the activities are useful. Claude also followed instructions closely, particularly with the inclusion of an answer key (so often missing in lesser models).

There’s little to quibble over here.

A language learning worksheet created with Claude 3 Sonnet.

A Claude 3 French worksheet. Click here to download the PDF!

Another Tool For the Toolbox

The claims around Claude 3 are certainly exciting. And they have substance – even the free Sonnet model available at https://claude.ai/chats produces content on a par with the big hitters. Although our focus here is worksheet creation, its conversational slant makes it a great option for experimenting with live AI language games, too.

So if you haven’t had a chance yet, go and get acquainted with Claude. Its all-new model set, including a fabulous free option, makes it one more essential tool in the teacher’s AI toolbox.

Two AI robots squaring up to each other

AI Worksheet Wars: Google Gemini Advanced vs. ChatGPT-4

With this week’s release of Gemini Advanced, Google’s latest, premium AI model, we have another platform for language learning content creation.

Google fanfares Gemini as the “most capable AI model” yet, releasing benchmark results that position it as a potential ChatGPT-4 beater. Significantly, Google claims that its new top model even outperforms humans on some language-based benchmarks.

So what do those improvements hold for language learners? I decided to put Gemini Advanced head-to-head with the leader to date, ChatGPT-4, to find out. I used the following prompt on both ChatGPT-4 and Gemini Advanced to create a topic prep style worksheet like those I use before lessons. A target language text, vocab support, and practice questions – perfect topic prep:

Create an original, self-contained French worksheet for students of the language who are around level A2 on the CEFR scale. The topic of the worksheet is “Reality TV in France”.

The worksheet format is as follows:

– An engaging introductory text (400 words) using clear and idiomatic language
– Glossary of 10 key words / phrases from the text (ignore obvious cognates with English) in table format
– Reading comprehension quiz on the text (5 questions)
– Gap-fill exercise recycling the same vocabulary and phrases in a different order (10 questions)
– ‘Talking about it’ section with useful phrases for expressing opinions on the topic
– A model dialogue (10-12 lines) between two people discussing the topic
– A set of thoughtful questions to spark further dialogue on the topic
– An answer key covering all the questions

Ensure the language is native-speaker quality and error-free.

I then laid out the results, with minimal extra formatting, in PDF files (much as I’d use them for my own learning).

Here are the results.

ChatGPT-4

ChatGPT-4 gives solid results, much as expected. I’d been using that platform for my own custom learning content for a while, and it’s both accurate and dependable.

The introductory text made good use of real-world topic links, albeit in a slightly dry tone. The glossary was reasonable, although ChatGPT-4 had, as usual, problems leaving out “obvious cognates” as per the prompt instructions. It’s a problem I’ve noticed often, with other LLMs too – workarounds are often necessary to fix these biases.

Likewise, the gap-fill was not “in a different order”, as prompted (again exposing a weakness of most LLMs). The questions are in the same order as the glossary entries they refer to!
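For what it’s worth, the ordering problem is easy to patch after the fact. Here is a minimal sketch in Python, assuming you have already copied the gap-fill questions into a list. The function name shuffle_gap_fill and the placeholder questions are hypothetical, made up for illustration rather than taken from the actual worksheet. It shuffles the questions and returns a map from new to old question numbers, so the answer key can be renumbered to match.

```python
import random


def shuffle_gap_fill(questions, seed=None):
    """Return the questions in random order, plus a new-to-old number map.

    Hypothetical helper: `questions` is a list of strings in the order the
    model produced them. Passing a seed makes the shuffle repeatable.
    """
    rng = random.Random(seed)
    order = list(range(len(questions)))
    rng.shuffle(order)
    shuffled = [questions[i] for i in order]
    # Map new question number -> original question number (both 1-based),
    # so answers can be looked up in the original answer key.
    key_map = {new + 1: old + 1 for new, old in enumerate(order)}
    return shuffled, key_map


if __name__ == "__main__":
    # Placeholder items standing in for the real gap-fill questions.
    items = [f"Placeholder question {n}" for n in range(1, 11)]
    shuffled, key_map = shuffle_gap_fill(items)
    for number, question in enumerate(shuffled, start=1):
        print(f"{number}. {question} (was question {key_map[number]})")
```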

Looking past those issues – which we could easily correct manually, in any case – the questions were engaging and sensible. Let’s give ChatGPT-4 a solid B!

A French worksheet on Reality TV, created by AI platform ChatGPT-4.

You can download the ChatGPT-4 version of the worksheet from this link.

Gemini Advanced

And onto the challenger! I must admit, I wasn’t expecting to see huge improvements here.

But instantly, I prefer the introductory text. It’s stylistically more interesting; it has really grasped the fact that I wanted it to be “engaging”. It’s hard to judge reliably, but I also think it’s closer to a true CEFR A2 language level. Compare it with the encyclopaedia-style ChatGPT-4 version, and it’s more conversational, and certainly more idiomatic.

That attention to idiom is apparent in the glossary, too. There’s far less of that cognate problem here, making for a much more practical vocab list. We have some satisfyingly colloquial phrasal verbs that make me feel that I’m learning something new.

And here’s the clincher: Gemini Advanced aced the randomness test. While the question quality matched ChatGPT-4, the random delivery means the output is usable off the bat. I’m truly impressed by that.

A French worksheet on Reality TV, created by Google's premium AI platform, Gemini Advanced.

You can download the Gemini Advanced version of the worksheet from this link.

Which AI?

After that storming performance by Gemini Advanced, you might expect my answer to be unqualified support for that platform. And, content-wise, I think it did win, hands down. The attention to the nuance of my prompt was something special, and the texts are just more interesting to work with. Big up for creativity.

That said, repeated testing of the prompt did throw up the occasional glitch. Sometimes, it would fail to output the answers, instead showing a cryptic “Answers will follow.” or similar, requiring further prompting. Once or twice, the service went down, too, perhaps a consequence of huge traffic during release week. They’re minor things for the most part, and I expect Google will be busy ironing them out over the coming months.
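If you are generating several worksheets in one sitting, a quick automated check can flag the ones that came back without answers. This is just a rough sketch in Python: the function name answer_key_missing and the phrase list are assumptions for illustration, and the only phrase included is the one quoted above, so you would need to add any other variants you come across.

```python
# Placeholder phrases that suggest the model skipped the answer key.
# Only the phrase seen in testing is listed; extend as needed.
PLACEHOLDER_PHRASES = ("answers will follow",)


def answer_key_missing(worksheet_text: str) -> bool:
    """Return True if the output contains a known 'answers later' placeholder."""
    lowered = worksheet_text.lower()
    return any(phrase in lowered for phrase in PLACEHOLDER_PHRASES)


if __name__ == "__main__":
    sample = "Gap-fill exercise ... Answers will follow."
    if answer_key_missing(sample):
        print("No answer key found: prompt the model again for the answers.")
```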

Nonetheless, the signs are hugely promising, and it’s up to ChatGPT-4 now to come back with an even stronger next release. I’ll be playing around with Gemini Advanced a lot in the next few weeks – I really recommend that other language learners and teachers give it a look, too!

If you want to try Google’s Gemini Advanced, there’s a very welcome two-month free trial. Simply head to Gemini to find out more!