A robot making clones of its voice - now quick and easy with tools like ElevenLabs.

You, But Fluent – Voice Cloning for Language Learners

I could barely contain my excitement in last week’s post on ElevenLabs’ brilliant text-to-speech voice collection. I’ve had a week of playing around with it now, and if anything, I’m only more enthusiastic about it.

After a bit of deep-delving, it’s the voice clone features that have me hooked right now. ElevenLabs can make a digital version of your voice from just 30 seconds of training speech. And it’s fast. I expected a bit of a wait for audio processing the first time I used it. But no – after reading in a couple of passages of sample text, my digital TTS voice was ready to use within seconds.

For a quick ‘n’ easy tool, it does a brilliant job of picking up general accent. It identified mine as British English, captured most of my Midlands features (it struggled with my really low u in bus, though – maybe more training would help), and it got my tone bang on. Scarily so… I can understand why cybersecurity pundits are slightly nervous about tech like this.

Your Voice, Another Language

The most marvellous thing, though, was using my voice to read foreign language texts. Although not 100% native-sounding – the voice was trained on me reading English, of course – it’s uncannily accurate. Listening to digital me reading German text, I’d say it sounds like a native-ish speaker. Perhaps someone who’s lived in Germany for a decade, and retains a bit of non-native in their speech.

But as far as models go, that’s a pretty high standard for any language learner.


ElevenLabs’ TTS interface with the custom voice ‘Richard’ selected, ready to read some German.

The crux of it is that you can have your voice reading practice passages for memory training (think: island technique). There’s an amazing sense of personal connection that comes from that – that’s what you will sound like, when you’ve mastered this.

It also opens up the idea for tailoring digital resources with sound files read by ‘you’. Imagine a set of interactive language games for students, where the voice is their teacher’s. Incredible stuff.

In short, it’s well worth the fiver-a-month starter subscription to play around with it.

A robot reading a script. The text-to-speech voices at ElevenLabs certainly sound intelligent as well as natural!

ElevenLabs Voices for Free, Custom Language-Learning Material

There’s been a lot on the grapevine of late about AI-powered leaps forward in text-to-speech voices. From providing accent models to in-depth speaking games, next-gen TTS is poised to have a huge impact on language learning.

The catch? Much of the brand new tech isn’t available to the average user-on-the-street yet.

That’s why I was thrilled to happen across TTS service ElevenLabs recently. ElevenLabs’ stunning selection of voices powers a number of eLearning and audiobook sites already, and it’s no hype to say that they sound as close to human as you can get right now.

Even better, you can sign up for a free account that gives you 10,000 characters of text-to-speech conversion each month. For $5 a month you can up that to 30,000 characters too, as well as access voice-cloning features. Just imagine the hours of fun if you want to hear ‘yourself’ speak any number of languages!
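
To get a feel for how far those allowances stretch, a quick character count helps. Here is a minimal Python sketch, assuming the monthly figures quoted above (they may well change) and assuming that every character counts towards the limit, spaces and punctuation included. The constant and function names are my own, not anything from ElevenLabs.

```python
# Monthly character allowances as quoted in this post (assumed; check current pricing).
FREE_CHARS = 10_000
STARTER_CHARS = 30_000


def characters_needed(passages):
    """Total characters across all passages, counting spaces and punctuation."""
    return sum(len(p) for p in passages)


def fits(passages, allowance=FREE_CHARS):
    """True if all passages together fit inside the monthly allowance."""
    return characters_needed(passages) <= allowance


# Two short practice passages, purely as sample input.
islands = [
    "Ich kann das Essen kochen.",
    "I can cook the food.",
]
print(characters_needed(islands), fits(islands))
```

Short practice passages barely dent the free tier; it is longer dialogues and repeated re-runs that eat into it.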

Using ElevenLabs in Your Own Learning

There’s plenty to do for free, though. For instance, if you enjoy the island technique in your learning, you can get ElevenLabs to record your passages for audio practice / rote memorising. I make this an AI double-whammy, using ChatGPT to help prepare my topical ‘islands’ before pasting them into ElevenLabs.

The ChatGPT > ElevenLabs workflow is also brilliant for dialogue modelling. On my recent Sweden trip, I knew that a big conversational contact point would be ordering at coffee shops. This is the prompt I used to get a cover-all-bases model coffee-shop convo:

Create a comprehensive model dialogue in Swedish to help me learn and practise for the situation “ordering coffee in a Malmö coffee shop”.

Try to include the language for every eventuality / question I might be asked by the coffee shop employee. Ensure that the language is colloquial and informal, and not stilted.

The output will be pasted into a text-to-speech generator, so don’t add speaker names to the dialogue lines – just a dash will suffice to indicate a change of speaker.

I then ran off the audio file with ElevenLabs, and hey presto! Custom real-world social prep. You can’t specify different voices in the same file, of course. But you could run off the MP3 twice, in different voices, then splice it up manually in an audio editor like Audacity for the full dialogue effect. Needless to say, it’s also a great way for teachers to make custom listening activities.
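
If you go down the two-voice route, splitting the script by speaker first can save some faff. Here is a rough Python sketch, assuming the dialogue comes back as the prompt requests: one turn per line, each starting with a dash, with the two speakers strictly alternating. The function name and the sample lines are mine, purely for illustration.

```python
def split_by_speaker(dialogue):
    """Split a dash-marked dialogue into two lists of turns, one per speaker.

    Assumes each turn sits on its own line starting with '-' or '–',
    and that the two speakers strictly alternate.
    """
    turns = []
    for line in dialogue.splitlines():
        line = line.strip()
        if line.startswith(("-", "–")):
            # Drop the leading dash marker and any padding around the turn.
            turns.append(line.lstrip("-– ").strip())
    # Turns 1, 3, 5... belong to the first speaker; 2, 4, 6... to the second.
    return turns[0::2], turns[1::2]


# Made-up sample dialogue for illustration only.
sample = """
- Hej! Vad vill du ha?
- En kaffe, tack.
- Något annat?
- Nej tack.
"""
barista, customer = split_by_speaker(sample)
print(barista)
print(customer)
```

Each list could then be pasted into ElevenLabs with a different voice selected, as an alternative to generating the whole file twice. The interleaving in Audacity still has to be done by hand.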

The ElevenLabs voices are truly impressive – it’s worth setting up a free account just to play with the options and come up with your own creative use cases. TTS is set to only get better in the coming months – we’re excited to see where it leads!

A robot interviewing another robot - a great speaking game on ChatGPT!

So Interview Me! Structured Speaking with ChatGPT

The addition of voice chat mode to ChatGPT – soon available even to free users in an impressive, all-new format – opens up tons of possibilities for AI speaking practice. When faced with it for the first time, however, learners can find that it’s all a bit undirected and woolly. To make the most of it for targeted speaking practice, it needs some nudging with prompts.

Since AI crashed into the language learning world, the prompt bank has filled with ways to prime your chatbot for more effective speaking practice and prep. But there’s one activity I’ve been using lately that offers both structure, tailored to your level and topic, and a lot of fun. I call it So Interview Me!, and it involves you playing an esteemed expert on a topic of your choice, with ChatGPT as the prime-time TV interviewer.

So Interview Me!

Here’s an example you can paste into ChatGPT Plus straight away (as text first, then switching to voice mode after the initial response):

Let’s role-play so I can practise my Swedish with you. You play the role of a TV interviewer on a news programme. I play an esteemed expert on the topic of ‘the history of Eurovision’. Conversational turn by turn, interview me in the target language all about the topic. Don’t add any translations or other directions – you play the interviewer and no other role. Wind up the interview after about 15 turns. Keep the language quite simple, around level B1 on the CEFR scale. Are you ready? Start off by introducing me and asking the first question!

The fun of it is that you are the star of the show. You can completely throw yourself into it, interacting with your interviewer with all the gusto and gumption of a true expert. Or you can have some fun with it, throwing it off with silly answers and bending the scenario to your will (maybe you turn out not to be the expert!).

Either way, it’s a brilliant one to wind up and set going before you start the washing up!

A musical, emotive robot. OpenAI's new model GPT-4o will make digital conversations even more natural.

GPT-4o – OpenAI Creates A Perfect Fit For Language Learners

Just a couple of weeks after the excitement around Hume.ai, OpenAI has joined the emotive conversational bandwagon with a stunning new release of its GPT-4o model.

GPT-4o is a big deal for language learners because it is multimodal in much more powerful ways than previous models. It interacts with the world more naturally across text, audio and vision in ways that mimic our own interactions with language speakers. Demos have included the model reacting to the speaker’s appearance and expression, opening a path to more realistic digital conversation practice than ever.

As with Hume, its voice capabilities have been updated with natural-sounding emotion and intonation, along with a deeper understanding of the speaker’s tone. It even does a better job at sarcasm and irony, long the exclusive domain of human speakers. Heck, it can even sing now. Vocal, emotional nuance – at least simulated – does seem to be the latest big leap forward in AI, transforming the often rather staid conversations into something uncannily humanlike. And as with many of these developments, it almost feels like it was made with us linguists in mind.

Perhaps surprisingly, there’s no wait to try the new model this time, at least in text mode. OpenAI have rolled it out almost immediately, including to free users. That suggests a quiet confidence in how impressed users will be with it.

As for the multimodal capabilities, we’ll have to wait a little longer, unfortunately – chat updates are being propagated more gradually, although the next time you open chat mode, you may already get the message that big changes are coming. Definitely a case of watch this space – and I don’t know about you, but I’m already impatiently refreshing my ChatGPT app with increasing frequency!

A picture of a robot heart - conversation with emotion with Hume.ai

Conversation practice with emotion: Meet Hume.ai

If the socials are anything to go by, so many of us language learners are already using AI platforms for conversation practice – whether text-typed, or spoken with speech-enabled platforms like ChatGPT.

Conversational interaction is something that LLMs – large language models – were created for. In fact, language learning and teaching seem like an uncannily good fit for AI. It’s almost like it was made for us.

But there’s one thing that’s been missing up to now – emotional awareness. In everyday conversation with other humans, we use a range of cues to gauge our speaking partner’s attitude, intentions and general mood. AI – even when using speech recognition and text-to-speech – is flat by comparison. It can only simulate true conversational interplay.

A new LLM is set to change all that. Hume.ai has empathy built-in. It uses vocal cues to determine the probable mindset of the speaker for each utterance. For each input, it selects a set of human emotions, and weights them. For instance, it might decide that what you said was 60% curious, 40% anxious and 20% proud. Then, mirroring that, it replies with an appropriate intonation and flex.

The platform already supports over 50 languages. You can try out a demo in English here, and prepare to be impressed – its guesses can be mind-bogglingly spot-on. Although it’s chiefly for developer access right now, the potential usefulness to language learning is so clear that we should hopefully see the engine popping up in language platforms in the near future!

Foreign alphabet soup (image generated by AI)

AI Chat Support for Foreign Language Alphabets

I turn to AI first and foremost for content creation, as it’s so good at creating model foreign language texts. But it’s also a pretty good conversational tool for language learners.

That said, one of the biggest obstacles to using LLMs like ChatGPT for conversational practice can be an unfamiliar script. Ask it to speak Arabic, and you’ll get lots of Arabic script. It’s usually smart enough to work out if you’re typing back using Latin characters, but it’ll likely continue to speak in script.

Now, it’s easy enough to ask your AI platform of choice to transliterate everything into Latin characters, and expect the same from you – simply instruct it to do so in your prompts. But blanket transliteration won’t help your development of native reading and writing skills. There’s a much better, best-of-both-worlds way that does.

Best of Both Worlds AI Chat Prompt

This prompt sets up a basic conversation environment. The clincher is that it gives you the option to write in script or not. And if not, you’ll get what the script should look like modelled right back at you. It’s a great way to jump into conversation practice even before you’re comfortable switching keyboard layouts.

You are a Modern Greek language teacher, and you are helping me to develop my conversational skills in the language at level A2 (CEFR). Always keep the language short and simple at the given level, and always keep the conversation going with follow-up questions.

I will often type in transliterated Latin script, as I am still learning the target language alphabet. Rewrite all of my responses correctly in the target language script with any necessary grammatical corrections.

Similarly, write all of your own responses both in the target language script and also a transliteration in Latin characters. For instance,

Καλημέρα σου!
Kaliméra sou!

Do NOT give any English translations – the only support for me will be transliterations of the target language.

Let’s start off the conversation by talking about the weather.

This prompt worked pretty reliably in ChatGPT-4, Claude, Copilot, and Gemini. The first two were very strong; the latter two occasionally forgot the don’t translate! instruction, but otherwise, script support – the name of the game here – was good throughout.

Try changing the language (top) and topic (bottom) to see what it comes up with!
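
If you find yourself swapping languages and topics a lot, the prompt can be templated. Here is a small Python sketch that needs nothing more than string formatting. The function and parameter names are hypothetical, and the wording is lightly adapted from the prompt above (the opening is rephrased so it reads correctly with any language name). The worked Greek example line is left out, since it would need changing for each language.

```python
PROMPT_TEMPLATE = (
    "You are a teacher of {language}, and you are helping me to develop "
    "my conversational skills in the language at level {level} (CEFR). "
    "Always keep the language short and simple at the given level, and always "
    "keep the conversation going with follow-up questions.\n\n"
    "I will often type in transliterated Latin script, as I am still learning "
    "the target language alphabet. Rewrite all of my responses correctly in the "
    "target language script with any necessary grammatical corrections.\n\n"
    "Similarly, write all of your own responses both in the target language "
    "script and also a transliteration in Latin characters.\n\n"
    "Do NOT give any English translations.\n\n"
    "Let's start off the conversation by talking about {topic}."
)


def build_prompt(language, topic, level="A2"):
    """Fill the language, level and topic slots of the chat prompt."""
    return PROMPT_TEMPLATE.format(language=language, level=level, topic=topic)


print(build_prompt("Modern Greek", "the weather"))
```

The level defaults to A2 to match the original prompt, but it is just another slot to play with.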

 

A tray of medals for the IBSA Games 2023 Tennis. Volunteering at international events is a great way to practise your languages!

Volunteering for Team Languages

I almost didn’t make my deadline (albeit self-imposed) for today’s post. I’ve spent a week volunteering with V.I. tennis at the IBSA Games in Birmingham, and I’ve only just packed up my uniform for the last time as the sun is setting on Edgbaston Priory.

It’s been six days of sweaty, hard and sometimes challenging work, but six unforgettable days of incredible experiences too. Not least of those is the great opportunity to use foreign languages – my stronger, weaker and almost non-existent ones alike (my three words of Lithuanian, I’m looking at you). The IBSA Games bring together athletes from over 70 countries, so it’s not hard to find someone, somewhere, who speaks something you know.

International events are such a perfect match for linguistically-minded volunteers. And that’s not just the social butterflies amongst us. Meeting, speaking and helping is golden experience for anyone fighting (as I do) with a natural shyness. It offers a good level of self-challenge, but with the safety net of structured interaction in short, manageable bursts. I call it people practice, and it’s worked wonders for my own particular flavour of social awkwardness!

It’s also an opportunity to enjoy the serendipity of polyglot opportunities. Nothing ‘in the wild’ is ever predictable, and that can throw language learners off when we throw ourselves deliberately, and often over-expectantly, into a single target language setting. On an international volunteering gig, you simply don’t know what will come your way. It might be your favourite language; it could be one you haven’t touched for years, and never thought you’d use again. It’s a case of let the opportunity come to you – and you’ll be nimbler of conversation for it. Personally, I never expected to speak as much Polish as I did this week.

If you are at all curious to try it out, check out the NCVO or equivalent in your country. Also, keep an ear to the ground for big events happening locally. The best leads often come by simple word of mouth.

Volunteering is massively rewarding, in so many ways. It really is the ultimate in giving something of yourself in order to grow, as a linguist – and otherwise.

The French flag flying in front of a town hall. Parlez-vous français ou anglais?

Désolé, je suis anglais…

Désolé, je ne comprends pas, je suis anglais…

Words of shame from any self-identifying polyglot. Nonetheless, I found myself stuttering them out in a crammed Paris branch of fnac on a Saturday afternoon, befuddled and bewildered by a particularly opaque queuing system. A harassed and exhausted assistant had muttered some question that went totally over my head in the mêlée, and flustered, I admitted defeat.

Luckily, a very kind fellow shopper overheard the confusion, and stepped in with a simplified and friendly “carte bancaire?”. The kindness was especially benevolent since my saviour didn’t immediately switch to English – the ultimate polyglot shame. What a considerate way to help, I thought – to support my use of the language, rather than my failure in it.

Un coup anglais

In any case, the breach of flow did bruise my ego a little. That’s despite an insistence that French is my low stakes language, my weak ‘extra’ that I’m happy to just get by in. I shouldn’t really care. But still, why didn’t I reach for support phrases instead, a polite “pardon?” or “répétez, s’il vous plaît”? And most of all, why blurt out my nationality, as if it were some excuse for not understanding French properly? It’s like the biggest faux pas in the book.

The fact is, when there are multiple distractions in the heat of the moment, brains do struggle. It’s completely normal. We reach for whatever is easiest, whatever bridges the gap most quickly. But, as I’ve said many times, beating yourself up about it is an equally poor language learning strategy. What is a good strategy is spotting when you do err towards self-flagellation, and employing a bit of self-kindness and consideration out ‘in the field’.

Regroup, recharge

So what did I do after this particular stumble?

I found a branch of Paul – an eatery where I know my French will work more than decently – and treated myself, en français, to a coffee and pastry. Basic stuff, but it topped my confidence levels back up, and made me appreciate how situational conditions are as much, if not more, responsible for our missteps as any lack of knowledge.

And, by the time I took my seat at Matt Pokora’s fabulous 20 years concert, I was gallicising with the best of them again. You should have seen me mouthing along to Tombé like a native (or perhaps rather like the reluctant churchgoer struggling to remember the hymns).

It’s appropriate that Matt took his last name from the Polish for humility, and practising that – at least acknowledging that we are all fallible – is no bad thing for a polyglot.

You can teach an old dog new tricks! Image from freeimages.com

Old Dog, New Tricks

Have you ever learnt a new trick in your target language, and promptly gone to town with it, trying to crowbar it into every conversation?

It’s the excitable puppy incarnation of the old use it or lose it adage. You might call it use it… and use it… and use it. The trait isn’t uncommon amongst students of languages – or otherwise – when there’s a particularly passionate connection to the subject.

For instance, I know one wee chap who will excitedly regurgitate new dinosaur facts ad infinitum to his very patient parents. My own not-so-little brother will hold me hostage to myriad beekeeping facts (his latest fad) when I visit of late. And I, myself, will bore my own friends rigid with newfound oddities of grammar and etymology. (No, the behaviour doesn’t wane with age!)

It really is one of the joys of learning to (over)share your new skills.

How’s Tricks?

It’s in my mind recently thanks to a bit of Gaelic new tricks magic I’ve learnt. Some months ago, I came across a really interesting quirk of Gaelic word order that bears a striking resemblance to German syntax. Namely, verb phrases place their head to the right of the noun phrase in certain conditions:

Gaelic: ‘S urrainn dhomh am biadh a chòcaireachd. (is ability to-me the food to cook)
German: Ich kann das Essen kochen. (I can the food cook)
English: I can cook the food.

We’d not covered it in class at that point, so I filed it away mentally as an interesting fact to revisit later.

I didn’t have to wait long. One of this term’s big ideas for our group was that very phenomenon. We’ve spent lesson after lesson having fun with it (in fact, some of the most fun lessons we’ve had, making up humorous sentences based on whacky scenarios!).

The thing is, I’m now using inversion everywhere – not just in class, but in casual chat too. I’m also spotting it everywhere in my reading, as if a spotlight has been shone on it. It’s as if inversion has taken possession of the new tricks cortex in my brain, neurons glowing at the slightest excitation.

It reminds me of that explosion of expressivity when you first learn to form the past tense in a language. Suddenly you want to use it everywhere to talk about what you did, what you’ve been doing, what you used to do… And it’s one of the greatest signs that you really love the subject, or language, you’ve chosen to dedicate your time to.

Do you recognise new tricks syndrome in your own language learning? What new linguistic toys are you currently playing with? Let us know in the comments!

As scaffold builds a building, sentence frames help build your foreign language competency. Image from freeimages.com

Sentence Frames – A Home to Hang Your Words

Idly keying out some Duolingo practice phrases this weekend, an interesting sentence popped up in Polish. Kiedy śpię, to nie mówię. When I sleep, I do not speak. Hmm, I thought. That looks like a good addition to my Polish sentence frames.

Sentence frames are short, recyclable chunks of language with repurposable slots you can swap items in and out of. The idea comes from primary literacy teaching, namely the writing frame. Early schoolers support their writing skills by memorising reusable chunks with customisable blanks.

To get started on your own, all you need is a beady eye to spot sentences you can strip down for potential reusable frames. Take my Polish sentence, for example. Removing the content stuff, we’re left with:

Kiedy X, to Y. When X, (then) Y.

At this point, it helps me to read the stripped-down sentence aloud, replacing X and Y with a meaningful mmmm…. Kiedy mmmm, to mmmm. It sounds daft, but it prepares the brain for step two.

Doing Your Lines

The next thing to do is go to town with it. Like Bart Simpson (semi-)dutifully doing his lines on the board, scribble out a whole bunch of sentences using the same pattern. Slot in whatever comes to mind to start cementing it into memory. When I go to town, I visit my friend. When I get home, I turn on the TV. And so on, and so on. Soon that pattern will be tripping off the tongue as easily as a native phrase.
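
For the programmers among us, a frame is really just a template with named slots. Here is a toy Python sketch of the drill, using only the standard library. The function name is my own, and the fillers are simply the English examples above, mixed and matched.

```python
from itertools import product


def drill(frame, **slots):
    """Return every sentence produced by filling the frame's slots.

    `frame` uses str.format placeholders such as {X} and {Y};
    each keyword argument is a list of candidate fillers for that slot.
    """
    names = list(slots)
    return [
        frame.format(**dict(zip(names, combo)))
        for combo in product(*(slots[n] for n in names))
    ]


for sentence in drill(
    "When {X}, {Y}.",
    X=["I go to town", "I get home"],
    Y=["I visit my friend", "I turn on the TV"],
):
    print(sentence)
```

Not every combination will make perfect sense, which is half the fun. Swap in "Kiedy {X}, to {Y}." and your own Polish fillers to drill the real thing.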

The reason these sentence frames are so valuable is that they supply that native phrase structure, rather than unordered, abstract dictionary knowledge. Instead of fumbling to piece sentences together from scratch, you have something to hang words onto before you start speaking.

They’re also easy to mine in your day-to-day language contact. You can spot potential speaking frame fodder anywhere and everywhere. Duolingo throws plenty of short, snappy examples at you, for instance. But billboards, TV ads and social media posts are excellent sources too.

Short ‘n’ Simple(ish)

Just like writing frames, sentence frames work best when they are simple. Some might only have a single slot, but represent a really frequent but language-particular pattern, like the Gaelic:

‘S e X a th’ ann. It is an X.

Others can be equally short but a little more complex, fitting in a third slot, like the German:

Wenn ich X hätte, würde ich Y Z. If I had X, I would Z Y.

Note the word order there. By memorising that frame, you’re drilling that very particular verb-final order of German subordinate clauses, too. That’s a lot of useful material packed into a nice cosy space.

Wherever you find them, however you drill them, sentence frames are a great tool to have in your language learning toolbox. For sure, it’s a case where doing your lines can be very good for you.