A robot making clones of its voice - now quick and easy with tools like ElevenLabs.

You, But Fluent – Voice Cloning for Language Learners

I could barely contain my excitement in last week’s post on ElevenLabs’ brilliant text-to-speech voice collection. I’ve had a week of playing around with it now, and if anything, I’m only more enthusiastic about it.

After a bit of deep-delving, it’s the voice clone features that have me hooked right now. ElevenLabs can make a digital version of your voice from just 30 seconds of training speech. And it’s fast. I expected a bit of a wait for audio processing the first time I used it. But no – after reading in a couple of passages of sample text, my digital TTS voice was ready to use within seconds.

For a quick ‘n’ easy tool, it does a brilliant job of picking up your general accent. It identified mine as British English, captured most of my Midlands features (it struggled with my really low u in bus, though – maybe more training would help), and it got my tone bang on. Scarily so… I can understand why cybersecurity pundits are slightly nervous about tech like this.

Your Voice, Another Language

The most marvellous thing, though, was using my voice to read foreign language texts. Although not 100% native-sounding – the voice was trained on me reading English, of course – it’s uncannily accurate. Listening to digital me reading German text, I’d say it sounds like a native-ish speaker. Perhaps someone who’s lived in Germany for a decade, and retains a bit of non-native in their speech.

But as far as models go, that’s a pretty high standard for any language learner.

ElevenLabs’ TTS interface with the custom voice ‘Richard’ selected, ready to read some German.

The crux of it is that you can have your voice reading practice passages for memory training (think: island technique). There’s an amazing sense of personal connection that comes from that – that’s what you will sound like when you’ve mastered this.

It also opens up the idea of tailoring digital resources with sound files read by ‘you’. Imagine a set of interactive language games for students, where the voice is their teacher’s. Incredible stuff.

In short, it’s well worth the fiver-a-month starter subscription to play around with it.

A robot reading a script. The text-to-speech voices at ElevenLabs certainly sound intelligent as well as natural!

ElevenLabs Voices for Free, Custom Language-Learning Material

There’s been a lot on the grapevine of late about AI-powered leaps forward in text-to-speech voices. From providing accent models to in-depth speaking games, next-gen TTS is poised to have a huge impact on language learning.

The catch? Much of the brand new tech isn’t available to the average user-on-the-street yet.

That’s why I was thrilled to happen across TTS service ElevenLabs recently. ElevenLabs’ stunning selection of voices powers a number of eLearning and audiobook sites already, and it’s no hype to say that they sound as close to human as you can get right now.

Even better, you can sign up for a free account that gives you 10,000 characters of text-to-speech conversion each month. For $5 a month you can up that to 30,000 characters too, as well as access voice-cloning features. Just imagine the hours of fun if you want to hear ‘yourself’ speak any number of languages!
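
To get a feel for how far those monthly allowances stretch, here is a minimal Python sketch. It assumes usage is counted simply as the number of characters in the text you submit, spaces and punctuation included, which may not match exactly how ElevenLabs meters it. The function name and the sample passage are made up for illustration.

```python
FREE_QUOTA = 10_000     # characters per month on the free account (figure from the post)
STARTER_QUOTA = 30_000  # characters per month on the $5 plan (figure from the post)


def passages_within_quota(passages, quota):
    """Return the passages that fit within the quota, in order, plus characters left.

    Assumption: usage is counted as len(text), spaces and punctuation included.
    """
    remaining = quota
    fitting = []
    for text in passages:
        if len(text) > remaining:
            break  # stop at the first passage that no longer fits
        fitting.append(text)
        remaining -= len(text)
    return fitting, remaining


if __name__ == "__main__":
    # Hypothetical practice passages; swap in your own 'islands'.
    islands = ["Hej! Jag skulle vilja ha en kaffe, tack."] * 3
    fitting, left = passages_within_quota(islands, FREE_QUOTA)
    print(f"{len(fitting)} passages fit, {left} characters left this month")
```

Swapping `FREE_QUOTA` for `STARTER_QUOTA` shows how much more room the paid tier would give the same set of passages.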

Using ElevenLabs in Your Own Learning

There’s plenty to do for free, though. For instance, if you enjoy the island technique in your learning, you can get ElevenLabs to record your passages for audio practice / rote memorising. I make this an AI double-whammy, using ChatGPT to help prepare my topical ‘islands’ before pasting them into ElevenLabs.

The ChatGPT > ElevenLabs workflow is also brilliant for dialogue modelling. On my recent Sweden trip, I knew that a big conversational contact point would be ordering at coffee shops. This is the prompt I used to get a cover-all-bases model coffee-shop convo:

Create a comprehensive model dialogue in Swedish to help me learn and practise for the situation “ordering coffee in a Malmö coffee shop”.

Try to include the language for every eventuality / question I might be asked by the coffee shop employee. Ensure that the language is colloquial and informal, and not stilted.

The output will be pasted into a text-to-speech generator, so don’t add speaker names to the dialogue lines – just a dash will suffice to indicate a change of speaker.

I then ran off the audio file with ElevenLabs, and hey presto! Custom real-world social prep. You can’t specify different voices in the same file, of course. But you could run off the MP3 twice, in different voices, then splice it up manually in an audio editor like Audacity for the full dialogue effect. Needless to say, it’s also a great way for teachers to make custom listening activities.
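
If you go down the two-voice route, separating the turns by hand gets tedious. As a rough sketch, the dash-marked dialogue can be split into one script per speaker in Python. It assumes every turn starts with a dash and that the two speakers strictly alternate; the function name and the short Swedish sample are purely illustrative, not actual ChatGPT output.

```python
# Hyphen, en dash and em dash, in case the generated text uses any of them.
DASHES = ("-", "\u2013", "\u2014")


def split_dialogue(text):
    """Split a dash-marked dialogue into two lists of lines, one per speaker.

    Assumption: each turn begins with a dash and speakers strictly alternate,
    starting with speaker A. Lines without a dash continue the previous turn.
    """
    speakers = ([], [])
    turn = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(DASHES):
            speakers[turn % 2].append(line[1:].strip())
            turn += 1
        elif turn > 0:
            # No leading dash: treat as a continuation of the previous turn.
            speakers[(turn - 1) % 2][-1] += " " + line
    return speakers


if __name__ == "__main__":
    sample = """
    - Hej! Vad får det lov att vara?
    - En latte, tack.
    - Något mer?
    - Nej tack, det är bra.
    """
    barista, customer = split_dialogue(sample)
    print("\n".join(barista))
    print("---")
    print("\n".join(customer))
```

Each list can then be pasted into ElevenLabs separately with a different voice selected, leaving only the interleaving of the clips to do in the audio editor.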

The ElevenLabs voices are truly impressive – it’s worth setting up a free account just to play with the options and come up with your own creative use cases. TTS is set to only get better in the coming months – I’m excited to see where it leads!

TTS can lend your learning some robotic voice magic. Image by Oliver Brandt on FreeImages.com

Disembodied Voices: Using TTS as a Native Speaker Boost

Native speaker modelling is a prerequisite for learning to speak Modern Foreign Languages. But when listening materials are scarce, or you struggle to find exactly the material you want to learn, then text-to-speech (TTS) can lend a helping hand.

Text-to-speech: native speakers out of thin air

TTS, or speech synthesis, has come on in leaps and bounds since the early days of tinny Speak ‘n’ Spell voices. At first mimicking chiefly (American) English, many projects have since diversified to conjure native speaker voices in many languages out of thin air. Polyglot TTS technologies such as Google Cloud’s offering are at the cutting edge of machine learning developments, and sound more and more human all the time.

Using these disembodied tongues for language learning is nothing new, of course. Switching the language of your digital voice assistant has become a pretty well-known polyglot trick for some robotic speaking practice. Siri has been speaking fluent Bokmål on my phone for a while now. As a result, I’ve become a dab hand at asking what the weather will be like in Norwegian!

TTS Toolkit

But as handy as voice assistants are, you can leverage the power of TTS much more directly than tapping into Siri or Alexa. At its most basic level, plain old TTS is brilliantly useful for hearing a spoken representation of a word or phrase you are unsure of.

For example, if you are tired of guessing how to reel off “das wäre sehr schön” in German (that would be very nice), never fear. Simply paste it into Google Translate and hit the speaker icon. The platform already offers an impressive number of languages with speech support.

Google Translate offers TTS features – simply type / paste in your target language and click the speaker icon.

But it gets even better. Several other resources allow you to do more than just listen; they offer a download function too. This way, you can keep your most useful speech synthesis files and incorporate them into your own materials. Combined with vocabulary mining tools such as mass-sentence site Tatoeba, you can begin to curate large, offline collections of target language material with text and audio.

One notable and powerful multilingual voice synthesis site is TTSMP3.com. For a start, it offers plenty of language options. On top of that, several languages include a choice of voices, too. More than enough to sate your curiosity when you wonder “How do you say that?”.

With a little Google digging, you may also find specialist TTS projects devoted to your particular language of study. For instance, Irish learners are spoilt by the Abair website (Abair means ‘say’ in Irish). Not only does it provide downloadable Irish narration, but it does this in a choice of three different regional accents. If you learn Irish too, you will well know what a godsend this is!

Irish TTS in a range of regional dialects on the Abair website.

Incorporating TTS MP3s

With your MP3s downloaded, you can now incorporate them into your own resources. Easiest of all is simply to insert them as media in PowerPoint presentations or Word documents. I personally like to add them manually to Anki cards to add audio support when I revise vocabulary.

The quick and easy way to reach Anki’s media folder on the desktop program is to open up the Preferences panel, then the Backups tab, and finally to tap the “Open backup folder” link that appears. In the same location as that backup directory, you should see another folder with the name collection.media. Anything you put in here will be synced along with the rest of your Anki data.

Note: always back up your Anki decks before tinkering in the program folders!

Anki's media folder

Drop your saved TTS files into this folder. Note that subfolders still don’t seem to work reliably, so keep everything in that single folder. Logical file naming will definitely help!

Finally, when you create / edit a vocabulary note, use the following format to add your sound as a playable item, replacing filename as appropriate:

[sound:filename.mp3]

When viewed on your device, you should see a play button on flashcards with embedded sounds. Magic!

An Anki card with embedded sound
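
The manual steps above (drop the file into collection.media, then reference it with a sound tag) can also be sketched in a few lines of Python. This is only an illustration of the idea: the function name is made up, and the path to your media folder is something you would need to supply yourself, since its location varies by system and profile.

```python
import shutil
from pathlib import Path


def add_sound_to_media(mp3_path, media_folder):
    """Copy an MP3 into Anki's collection.media folder and return its sound tag.

    Assumption: media_folder is the collection.media path you located yourself;
    nothing here tries to detect it automatically.
    """
    source = Path(mp3_path)
    media = Path(media_folder)
    if not media.is_dir():
        raise FileNotFoundError(f"Media folder not found: {media}")
    # Keep everything in the single top-level folder (no subfolders).
    destination = media / source.name
    shutil.copy2(source, destination)
    # This is the text to paste into a field on your note.
    return f"[sound:{destination.name}]"
```

Calling it with, say, a downloaded file named `das_waere_sehr_schoen.mp3` and your own media folder path would return the matching `[sound:...]` tag, ready to paste into a card. As with the manual route, back up your decks first.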

If you prefer an even easier route, then there is an Anki plugin specially created for automatically including TTS in notes. I prefer the manual method, as it satisfies the tech control freak in me.

A human touch

Of course, TTS is not a perfect substitute for a native speaker recording. For those times when only a human voice will do, Forvo is a goldmine. The site is replete with native sounds across a dizzying array of languages, all recorded by native speaker volunteers. Just as with the TTS examples above, the sounds can be downloaded for use in your own resources, too.

To round off our trek through native speaker sites – real and synthesised – just a final note on copyright. If you intend to share the resources you create, always check the usage notes of the website of origin. Sites commonly have fair use policies for non-commercial projects, so making resources for personal use is usually not a problem at all. If you plan to sell your resources, though, you may well need to opt for a commercial plan with the respective platform.

Robotic resources can plug a real gap in native speaker support, especially for niche languages, or niche subjects in more mainstream ones. Do you use TTS in innovative ways in your own learning? Have you come across other specialist or language-specific TTS projects? If so, please share in the comments!