ChatGPT takes conversation to the next level with Advanced Voice Mode

ChatGPT Advanced Voice Mode is Finally Here (For Most of Us!)

Finally – and it has taken SO much longer to get it this side of the Pond – Advanced Voice Mode has popped up in my ChatGPT. And it’s a bit of a mind-blower to say the least.

Multilingually speaking, it’s a huge step up for the platform. For a start, its non-English accents are hugely improved – no longer French or German with an American twang. User language detection seems more reliable, too. Open it up, initiate a conversation in your target language, and it’s ready to go without further fiddling.

But it’s the flexibility and emotiveness of those voices that is the real game-changer. There’s real humanity in those voices now, reminiscent of Hume’s emotionally aware AI voices. As well as emotion, there’s variation in timbre and speed. What that means for learners is that it’s now possible to get it to mimic slow, deliberate speech when you ask that language learning staple “can you repeat that more slowly, please?”. It makes for a much more adaptive digital conversation partner.

Likewise – and rather incredibly – it’s possible to simulate a whole range of regional accents. I asked for Austrian German, and believe me, it is UNCANNILY good. Granted, it did occasionally verge on parody, but as a general impression, it’s shocking how close it gets. It’s a great way to prepare for speaking your target language with real people, who use real, regionally marked speech.

Advanced Voice Mode, together with its recently added ability to remember details from past conversations (previously achievable only via a hack), is turning ChatGPT into a much cannier language learning assistant. It was certainly worth the wait. And for linguaphiles, it’ll be fascinating to see how it continues to develop as an intelligent conversationalist from here.

Mapping out conversational probabilities - it's much easier with flowcharts.

Vocabulary Flowcharts: Preparing for Probabilities with ChatGPT

The challenge in preparing for a speaking task in the wild is that you’re dealing with multiple permutations. You ask your carefully prepared question, and you get any one of a number of likely responses back. That, in turn, informs your next question or reply, and another one-of-many comebacks follows.

It’s probability roulette.

What if you could map all of these conversational pathways out, though? Flowcharts have long been the logician’s tool of choice for visualising processes that involve forking choices. Combined with generative AI’s penchant for assembling real-world language, we have a recipe for much more dynamic language prep resources than a traditional vocab list.

And, thanks to a ready-made flowchart plugin for ChatGPT – courtesy of the charting folks at Whimsical.com – it’s really easy to knock one together.

Vocabulary Flowcharts in Minutes

In your ChatGPT account, you’ll need to locate the Whimsical GPT. Then, it’s just a case of detailing the conversational scenario you want to map out. Here’s an example for ‘opening a bank account in Germany’:

Create a flowchart detailing different conversational choices and paths in German for the scenario “Opening a bank account as a non-resident of Germany planning to work there for six months.” Include pathways for any problems that might occur in the process. Ensure all the text reflects formal, conversational German.

The result should be a fairly detailed ‘probability map’ of conversational turns:
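If you find yourself generating these maps for lots of scenarios, the prompt above templates easily. Here’s a minimal Python sketch – the helper function and its defaults are my own, not part of the Whimsical GPT:

```python
def flowchart_prompt(scenario: str, language: str = "German",
                     register: str = "formal, conversational") -> str:
    """Build a conversational-flowchart prompt to paste into the Whimsical GPT."""
    return (
        f"Create a flowchart detailing different conversational choices and paths "
        f"in {language} for the scenario \"{scenario}\". "
        f"Include pathways for any problems that might occur in the process. "
        f"Ensure all the text reflects {register} {language}."
    )

# Example: the bank-account scenario from above
prompt = flowchart_prompt(
    "Opening a bank account as a non-resident of Germany "
    "planning to work there for six months."
)
```

Swap in a different scenario, language, or register to spin up a new probability map in seconds.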

A ‘vocabulary flowchart’ in German, created by the Whimsical.com GPT on ChatGPT.

Vocabulary flowcharts are another tool in your AI arsenal for speaking prep. Have you given them a whirl yet? Tell us about your own prep in the comments!

A robot making clones of its voice - now quick and easy with tools like ElevenLabs.

You, But Fluent – Voice Cloning for Language Learners

I could barely contain my excitement in last week’s post on ElevenLabs’ brilliant text-to-speech voice collection. I’ve had a week of playing around with it now, and if anything, I’m only more enthusiastic about it.

After a bit of deep-delving, it’s the voice clone features that have me hooked right now. ElevenLabs can make a digital version of your voice from just 30 seconds of training speech. And it’s fast. I expected a bit of a wait for audio processing the first time I used it. But no – after reading in a couple of passages of sample text, my digital TTS voice was ready to use within seconds.

For a quick ‘n’ easy tool, it does a brilliant job of picking up general accent. It identified mine as British English, captured most of my Midlands features (it struggled with my really low u in bus, though – maybe more training would help), and it got my tone bang on. Scarily so… I can understand why cybersecurity pundits are slightly nervous about tech like this.

Your Voice, Another Language

The most marvellous thing, though, was using my voice to read foreign language texts. Although not 100% native-sounding – the voice was trained on me reading English, of course – it’s uncannily accurate. Listening to digital me reading German text, I’d say it sounds like a native-ish speaker. Perhaps someone who’s lived in Germany for a decade, and retains a hint of non-native accent in their speech.

But as far as models go, that’s a pretty high standard for any language learner.

ElevenLabs’ TTS interface with the custom voice ‘Richard’ selected, ready to read some German.

The crux of it is that you can have your voice reading practice passages for memory training (think: island technique). There’s an amazing sense of personal connection that comes from that – that’s what you will sound like, when you’ve mastered this.

It also opens up the idea of tailoring digital resources with sound files read by ‘you’. Imagine a set of interactive language games for students, where the voice is their teacher’s. Incredible stuff.

In short, it’s well worth the fiver-a-month starter subscription to play around with it.

A robot reading a script. The text-to-speech voices at ElevenLabs certainly sound intelligent as well as natural!

ElevenLabs Voices for Free, Custom Language-Learning Material

There’s been a lot on the grapevine of late about AI-powered leaps forward in text-to-speech voices. From accent models to in-depth speaking games, next-gen TTS is poised to have a huge impact on language learning.

The catch? Much of the brand new tech isn’t available to the average user-on-the-street yet.

That’s why I was thrilled to happen across TTS service ElevenLabs recently. ElevenLabs’ stunning selection of voices powers a number of eLearning and audiobook sites already, and it’s no hype to say that they sound as close to human as you can get right now.

Even better, you can sign up for a free account that gives you 10,000 characters of text-to-speech conversion each month. For $5 a month you can up that to 30,000 characters, as well as access voice-cloning features. Just imagine the hours of fun if you want to hear ‘yourself’ speak any number of languages!

Using ElevenLabs in Your Own Learning

There’s plenty to do for free, though. For instance, if you enjoy the island technique in your learning, you can get ElevenLabs to record your passages for audio practice / rote memorising. I make this an AI double-whammy, using ChatGPT to help prepare my topical ‘islands’ before pasting them into ElevenLabs.

The ChatGPT > ElevenLabs workflow is also brilliant for dialogue modelling. On my recent Sweden trip, I knew that a big conversational contact point would be ordering at coffee shops. This is the prompt I used to get a cover-all-bases model coffee-shop convo:

Create a comprehensive model dialogue in Swedish to help me learn and practise for the situation “ordering coffee in a Malmö coffee shop”.

Try to include the language for every eventuality / question I might be asked by the coffee shop employee. Ensure that the language is colloquial and informal, and not stilted.

The output will be pasted into a text-to-speech generator, so don’t add speaker names to the dialogue lines – just a dash will suffice to indicate a change of speaker.

I then ran off the audio file with ElevenLabs, and hey presto! Custom real-world social prep. You can’t specify different voices in the same file, of course. But you could run off the MP3 twice, in different voices, then splice it up manually in an audio editor like Audacity for the full dialogue effect. Needless to say, it’s also a great way for teachers to make custom listening activities.
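The “run off the audio file” step can even be scripted, since ElevenLabs exposes a REST API. Here’s a minimal standard-library sketch, not a definitive implementation: the endpoint and `model_id` reflect ElevenLabs’ public text-to-speech API at the time of writing (check their current docs), and the key and voice ID are placeholders you’d fill in from your account:

```python
import json
import urllib.request

API_KEY = "YOUR_ELEVENLABS_KEY"   # placeholder - your account's API key
VOICE_ID = "YOUR_VOICE_ID"        # placeholder - a voice ID from your account

def build_tts_request(text: str, voice_id: str = VOICE_ID):
    """Assemble the URL, headers and JSON payload for a text-to-speech call."""
    url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
    headers = {"xi-api-key": API_KEY, "Content-Type": "application/json"}
    payload = {"text": text, "model_id": "eleven_multilingual_v2"}
    return url, headers, payload

def synthesise(text: str, outfile: str = "dialogue.mp3") -> None:
    """POST the dialogue text and save the returned MP3 bytes to disk."""
    url, headers, payload = build_tts_request(text)
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"),
        headers=headers, method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(outfile, "wb") as f:
        f.write(resp.read())
```

Run `synthesise()` once per voice on the same dialogue text, and you have the two MP3s ready for splicing in Audacity.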

The ElevenLabs voices are truly impressive – it’s worth setting up a free account just to play with the options and come up with your own creative use cases. TTS is set to only get better in the coming months – we’re excited to see where it leads!

A robot interviewing another robot - a great speaking game on ChatGPT!

So Interview Me! Structured Speaking with ChatGPT

The addition of voice chat mode to ChatGPT – soon available even to free users in an impressive, all-new format – opens up tons of possibilities for AI speaking practice. When faced with it for the first time, however, learners can find that it’s all a bit undirected and woolly. To make the most of it for targeted speaking practice, it needs some nudging with prompts.

Since AI crashed into the language learning world, the prompt bank has filled with ways to prime your chatbot for more effective speaking practice and prep. But there’s one activity I’ve been using lately that offers both structure, tailored to your level and topic, and a lot of fun. I call it So Interview Me!, and it involves you playing an esteemed expert on a topic of your choice, with ChatGPT as the prime-time TV interviewer.

So Interview Me!

Here’s an example you can paste into ChatGPT Plus straight away (as text first, then switching to voice mode after the initial response):

Let’s role-play so I can practise my Swedish with you. You play the role of a TV interviewer on a news programme. I play an esteemed expert on the topic of ‘the history of Eurovision’. Conversational turn by turn, interview me in the target language all about the topic. Don’t add any translations or other directions – you play the interviewer and no other role. Wind up the interview after about 15 turns. Keep the language quite simple, around level B1 on the CEFR scale. Are you ready? Start off by introducing me and asking the first question!
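Since only the language, topic, level, and length change between runs, the prompt above templates nicely. A small Python sketch – the helper is my own convenience wrapper, not a ChatGPT feature:

```python
def interview_prompt(language: str, topic: str,
                     level: str = "B1", turns: int = 15) -> str:
    """Build the 'So Interview Me!' role-play prompt to paste into ChatGPT."""
    return (
        f"Let's role-play so I can practise my {language} with you. "
        "You play the role of a TV interviewer on a news programme. "
        f"I play an esteemed expert on the topic of '{topic}'. "
        "Conversational turn by turn, interview me in the target language "
        "all about the topic. Don't add any translations or other directions "
        "- you play the interviewer and no other role. "
        f"Wind up the interview after about {turns} turns. "
        f"Keep the language quite simple, around level {level} "
        "on the CEFR scale. Are you ready? Start off by introducing me "
        "and asking the first question!"
    )

# The Eurovision example from above
prompt = interview_prompt("Swedish", "the history of Eurovision")
```

Dial `level` up to B2 or C1, or shorten `turns`, as your stamina allows.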

The fun of it is that you are the star of the show. You can completely throw yourself into it, interacting with your interviewer with all the gusto and gumption of a true expert. Or you can have some fun with it, throwing it off with silly answers and bending the scenario to your will (maybe you turn out not to be the expert!).

Either way, it’s a brilliant one to wind up and set going before you start the washing up!

A musical, emotive robot. OpenAI's new model GPT-4o will make digital conversations even more natural.

GPT-4o – OpenAI Creates A Perfect Fit For Language Learners

Just a couple of weeks after the excitement around Hume.ai, OpenAI has joined the emotive conversation bandwagon with the stunning release of its new GPT-4o model.

GPT-4o is a big deal for language learners because it is multimodal in much more powerful ways than previous models. It interacts with the world more naturally across text, audio and vision in ways that mimic our own interactions with language speakers. Demos have included the model reacting to the speaker’s appearance and expression, opening a path to more realistic digital conversation practice than ever.

As with Hume, its voice capabilities have been updated with natural-sounding emotion and intonation, along with a deeper understanding of the speaker’s tone. It even does a better job at sarcasm and irony, long the exclusive domain of human speakers. Heck, it can even sing now. Vocal, emotional nuance – at least simulated – does seem to be the latest big leap forward in AI, transforming the often rather staid conversations into something uncannily humanlike. And as with many of these developments, it almost feels like it was made with us linguists in mind.

Perhaps surprisingly, there’s no wait to try the new model this time, at least in text mode. OpenAI have rolled it out almost immediately, including to free users. That suggests real confidence in how impressed users will be with it.

As for the multimodal capabilities, we’ll have to wait a little longer, unfortunately – chat updates are being propagated more gradually, although the next time you open chat mode, you may already get the message that big changes are coming. Definitely a case of watch this space – and I don’t know about you, but I’m already impatiently refreshing my ChatGPT app with increasing frequency!

A picture of a robot heart - conversation with emotion with Hume.ai

Conversation practice with emotion: Meet Hume.ai

If the socials are anything to go by, so many of us language learners are already using AI platforms for conversation practice – whether text-typed, or spoken with speech-enabled platforms like ChatGPT.

Conversational interaction is something that LLMs – large language models – were created for. In fact, language learning and teaching seem like an uncannily good fit for AI. It’s almost like it was made for us.

But there’s one thing that’s been missing up to now – emotional awareness. In everyday conversation with other humans, we use a range of cues to gauge our speaking partner’s attitude, intentions and general mood. AI – even when using speech recognition and text-to-speech – is flat by comparison. It can only simulate true conversational interplay.

A new LLM is set to change all that. Hume.ai has empathy built-in. It uses vocal cues to determine the probable mindset of the speaker for each utterance. For each input, it selects a set of human emotions, and weights them. For instance, it might decide that what you said was 60% curious, 40% anxious and 20% proud. Then, mirroring that, it replies with an appropriate intonation and flex.
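That weight-and-mirror idea is easy to picture in code. Here’s a toy sketch – the emotion names and scores are just the illustrative ones above, not Hume’s actual API, and the helper is purely my own:

```python
def dominant_emotions(scores: dict[str, float], n: int = 2) -> list[str]:
    """Return the n highest-weighted emotions detected for an utterance."""
    return [name for name, _ in
            sorted(scores.items(), key=lambda kv: -kv[1])[:n]]

# An utterance scored as in the example above: mostly curious, a little anxious
utterance_scores = {"curious": 0.6, "anxious": 0.4, "proud": 0.2}
```

A reply engine could then pick intonation to mirror whatever `dominant_emotions` returns – curiosity answered with curiosity, anxiety with reassurance.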

The platform already supports over 50 languages. You can try out a demo in English here, and prepare to be impressed – its guesses can be mind-bogglingly spot-on. Although it’s chiefly for developer access right now, the potential usefulness to language learning is so clear that we should hopefully see the engine popping up in language platforms in the near future!

Foreign alphabet soup (image generated by AI)

AI Chat Support for Foreign Language Alphabets

I turn to AI first and foremost for content creation, as it’s so good at creating model foreign language texts. But it’s also a pretty good conversational tool for language learners.

That said, one of the biggest obstacles to using LLMs like ChatGPT for conversational practice can be an unfamiliar script. Ask it to speak Arabic, and you’ll get lots of Arabic script. It’s usually smart enough to work out if you’re typing back using Latin characters, but it’ll likely continue to speak in script.

Now, it’s easy enough to ask your AI platform of choice to transliterate everything into Latin characters, and expect the same from you – simply instruct it to do so in your prompts. But blanket transliteration won’t help your development of native reading and writing skills. There’s a better, best-of-both-worlds approach that does.

Best of Both Worlds AI Chat Prompt

This prompt sets up a basic conversation environment. The clincher is that it gives you the option to write in script or not. And if not, you’ll get what the script should look like modelled right back at you. It’s a great way to jump into conversation practice even before you’re comfortable switching keyboard layouts.

You are a Modern Greek language teacher, and you are helping me to develop my conversational skills in the language at level A2 (CEFR). Always keep the language short and simple at the given level, and always keep the conversation going with follow-up questions.

I will often type in transliterated Latin script, as I am still learning the target language alphabet. Rewrite all of my responses correctly in the target language script with any necessary grammatical corrections.

Similarly, write all of your own responses both in the target language script and also a transliteration in Latin characters. For instance,

Καλημέρα σου!
Kaliméra sou!

Do NOT give any English translations – the only support for me will be transliterations of the target language.

Let’s start off the conversation by talking about the weather.

This prompt worked pretty reliably in ChatGPT-4, Claude, Copilot, and Gemini. The first two were very strong; the latter two occasionally forgot the don’t translate! instruction, but otherwise, script support – the name of the game here – was good throughout.

Try changing the language (top) and topic (bottom) to see what it comes up with!
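For the curious, the Latin-character transliteration the prompt asks for can also be roughed out in ordinary code. This is a naive, letter-for-letter Python sketch of my own – proper Greek romanisation needs digraph rules for ου, μπ, αι and the like, so treat it as an illustration only:

```python
# Naive letter-for-letter romanisation table (no digraph handling)
GREEK_TO_LATIN = {
    "α": "a", "ά": "á", "β": "v", "γ": "g", "δ": "d", "ε": "e", "έ": "é",
    "ζ": "z", "η": "i", "ή": "í", "θ": "th", "ι": "i", "ί": "í", "κ": "k",
    "λ": "l", "μ": "m", "ν": "n", "ξ": "x", "ο": "o", "ό": "ó", "π": "p",
    "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u", "ύ": "ú", "φ": "f",
    "χ": "ch", "ψ": "ps", "ω": "o", "ώ": "ó",
}

def transliterate(text: str) -> str:
    """Romanise Greek text one letter at a time, preserving case."""
    out = []
    for ch in text:
        latin = GREEK_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)                   # spaces, punctuation, Latin text
        elif ch.isupper():
            out.append(latin.capitalize())   # e.g. Κ -> K, Θ -> Th
        else:
            out.append(latin)
    return "".join(out)
```

Feed it the greeting from the prompt and you get back exactly the Latin line the chatbot models for you.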


A tray of medals for the IBSA Games 2023 Tennis. Volunteering at international events is a great way to practise your languages!

Volunteering for Team Languages

I almost didn’t make my deadline (albeit self-imposed) for today’s post. I’ve spent a week volunteering with V.I. tennis at the IBSA Games in Birmingham, and I’ve only just packed up my uniform for the last time as the sun is setting on Edgbaston Priory.

It’s been six days of sweaty, hard and sometimes challenging work, but six unforgettable days of incredible experiences too. Not least of those is the great opportunity to use foreign languages – my stronger, my weaker and my almost non-existent ones (my three words of Lithuanian, I’m looking at you). The IBSA Games bring together athletes from over 70 countries, so it’s not hard to find someone, somewhere, who speaks something you know.

International events are such a perfect match for linguistically minded volunteers. And not just for the social butterflies amongst us. Meeting, speaking and helping is golden experience for anyone fighting (as I do) with a natural shyness. It offers a good level of self-challenge, but with the safety net of structured interaction in short, manageable bursts. I call it people practice, and it’s worked wonders for my own particular flavour of social awkwardness!

It’s also an opportunity to enjoy the serendipity of polyglot opportunities. Nothing ‘in the wild’ is ever predictable, and that can throw language learners off when we throw ourselves deliberately, and often over-expectantly, into a single target language setting. On an international volunteering gig, you simply don’t know what will come your way. It might be your favourite language; it could be one you haven’t touched for years, and never thought you’d use again. It’s a case of let the opportunity come to you – and you’ll be nimbler of conversation for it. Personally, I never expected to speak as much Polish as I did this week.

If you’re at all curious to try it out, check out the NCVO or an equivalent in your country. Also, keep an ear to the ground for big events happening locally. The best leads often come by simple word of mouth.

Volunteering is massively rewarding, in so many ways. It really is the ultimate in giving something of yourself in order to grow, as a linguist – and otherwise.

The French flag flying in front of a town hall. Parlez-vous français ou anglais?

Désolé, je suis anglais…

Désolé, je ne comprends pas, je suis anglais…

Words of shame from any self-identifying polyglot. Nonetheless, I found myself stuttering them out in a crammed Paris branch of fnac on a Saturday afternoon, befuddled and bewildered by a particularly opaque queuing system. A harassed and exhausted assistant had muttered some question that went totally over my head in the mêlée, and flustered, I admitted defeat.

Luckily, a very kind fellow shopper overheard the confusion, and stepped in with a simplified and friendly “carte bancaire?”. The gesture was especially kind since my saviour didn’t immediately switch to English – the ultimate polyglot shame. What a considerate way to help, I thought – to support my use of the language, rather than my failure in it.

Un coup anglais

In any case, the breach of flow did bruise my ego a little. That’s despite an insistence that French is my low-stakes language, my weak ‘extra’ that I’m happy to just get by in. I shouldn’t really care. But still, why didn’t I reach for support phrases instead, a polite “pardon?” or “répétez, s’il vous plaît“? And most of all, why blurt out my nationality, as if it were some excuse for not understanding French properly? It’s like the biggest faux pas in the book.

The fact is, when there are multiple distractions in the heat of the moment, brains do struggle. It’s completely normal. We reach for whatever is easiest, whatever bridges the gap most quickly. But, as I’ve said many times, beating yourself up about it is an equally poor language learning strategy. What is a good strategy is spotting when you do err towards self-flagellation, and employing a bit of self-kindness and consideration out ‘in the field’.

Regroup, recharge

So what did I do after this particular stumble?

I found a branch of Paul – an eatery where I know my French will work more than decently – and treated myself, en français, to a coffee and pastry. Basic stuff, but it topped my confidence levels back up, and made me appreciate that situational conditions are as responsible for our missteps as any lack of knowledge – if not more so.

And, by the time I took my seat at Matt Pokora’s fabulous 20 years concert, I was gallicising with the best of them again. You should have seen me mouthing along to Tombé like a native (or perhaps rather like the reluctant churchgoer struggling to remember the hymns).

It’s appropriate that Matt took his last name from the Polish for humility, and practising that – at least acknowledging that we are all fallible – is no bad thing for a polyglot.