I’ve raved before about the fantastic text-to-speech facility at ElevenLabs. Voices so realistic they pretty much pass for human, that work across languages, and can even clone your own.
Well, the ElevenLabs team are behind a little-feted app that is a similar show-stopper in its category. It’s simply called Reader, and it’s a free text-to-speech narration app for Android and iOS.
It has the same range of emotive, expressive, ultra-realistic voices as the company’s web-based TTS service. It also copes well with narrating many languages, which are autodetected. Best of all for me is that it links with your device’s documents, so I can quickly import the papers I’m reading. Listening AND reading has made a massive difference to my comprehension and recall – multimodal is definitely the way forward for me!
You do get the odd strange artefact in the readout, but the product is still in Beta (likely the reason it’s currently completely free), and glitches are rare. What I’m also missing in it is the ability to tweak imported texts in-app, as you can do with Speechify. This would allow some cleaning of the file pre-narration (I lost count of the number of times I had to skip the page footer DOI link, which it hilariously mis-narrated many times). You can, of course, simply export and clean the files as text before importing, which gets round that.
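If you go down the export-and-clean route, a few lines of Python can do the tidying for you. This is only a minimal sketch: the pattern and the function name strip_doi_lines are my own inventions, and it assumes the DOI always sits on a line of its own, as a page footer would.

```python
import re

# Assumed pattern: a doi.org link, or a bare "doi:10.xxxx/..." reference.
DOI_LINE = re.compile(
    r"(https?://(dx\.)?doi\.org/\S+|\bdoi:\s*10\.\d{4,9}/\S+)",
    re.IGNORECASE,
)

def strip_doi_lines(text: str) -> str:
    """Drop every line that contains a DOI reference; keep all other lines."""
    kept = [line for line in text.splitlines() if not DOI_LINE.search(line)]
    return "\n".join(kept)

sample = "Results were significant.\nhttps://doi.org/10.1000/xyz123\nDiscussion follows."
cleaned = strip_doi_lines(sample)
print(cleaned)
```

Bear in mind that this removes the whole line, so a DOI quoted mid-sentence would take its sentence with it. For running footers that is exactly what you want.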
In any case, these niggles are worth putting up with while the app is still free and in Beta. Even as it is, you have features here that you’d pay a small fortune for in other apps (ahem, Speechify). Definitely worth a punt if you’re looking for TTS support in your reading! Find download links for Android and iOS here.
You, But Fluent – Voice Cloning for Language Learners
I could barely contain my excitement in last week’s post on ElevenLabs’ brilliant text-to-speech voice collection. I’ve had a week of playing around with it now, and if anything, I’m only more enthusiastic about it.
After a bit of deep-delving, it’s the voice clone features that have me hooked right now. ElevenLabs can make a digital version of your voice from just 30 seconds of training speech. And it’s fast. I expected a bit of a wait for audio processing the first time I used it. But no – after reading in a couple of passages of sample text, my digital TTS voice was ready to use within seconds.
For a quick ‘n’ easy tool, it does a brilliant job of picking up general accent. It identified mine as British English, captured most of my Midlands features (it struggled with my really low u in bus, though – maybe more training would help), and it got my tone bang on. Scarily so… I can understand why cybersecurity pundits are slightly nervous about tech like this.
Your Voice, Another Language
The most marvellous thing, though, was using my voice to read foreign language texts. Although not 100% native-sounding – the voice was trained on me reading English, of course – it’s uncannily accurate. Listening to digital me reading German text, I’d say it sounds like a native-ish speaker. Perhaps someone who’s lived in Germany for a decade, and retains a bit of non-native in their speech.
But as far as models go, that’s a pretty high standard for any language learner.
The crux of it is that you can have your voice reading practice passages for memory training (think: island technique). There’s an amazing sense of personal connection that comes from that – that’s what you will sound like, when you’ve mastered this.
It also opens up the idea for tailoring digital resources with sound files read by ‘you’. Imagine a set of interactive language games for students, where the voice is their teacher’s. Incredible stuff.
In short, it’s well worth the fiver-a-month starter subscription to play around with it.
ElevenLabs Voices for Free, Custom Language-Learning Material
There’s been a lot on the grapevine of late about AI-powered leaps forward in text-to-speech voices. From providing accent models to in-depth speaking games, next-gen TTS is poised to have a huge impact on language learning.
The catch? Much of the brand new tech isn’t available to the average user-on-the-street yet.
That’s why I was thrilled to happen across TTS service ElevenLabs recently. ElevenLabs’ stunning selection of voices powers a number of eLearning and audiobook sites already, and it’s no hype to say that they sound as close to human as you can get right now.
Even better, you can sign up for a free account that gives you 10,000 characters of text-to-speech conversion each month. For $5 a month you can up that to 30,000 characters too, as well as access voice-cloning features. Just imagine the hours of fun if you want to hear ‘yourself’ speak any number of languages!
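To get a feel for how far those allowances stretch, you can tot up your passages before pasting them in. The sketch below uses the two monthly figures quoted above. How the service actually counts characters is an assumption on my part (here, every character including spaces and punctuation), and the function names are purely illustrative.

```python
FREE_MONTHLY_CHARS = 10_000     # free tier figure quoted above
STARTER_MONTHLY_CHARS = 30_000  # $5-a-month figure quoted above

def chars_needed(passages: list[str]) -> int:
    """Total characters across all passages (assumed to include spaces)."""
    return sum(len(p) for p in passages)

def fits_allowance(passages: list[str], allowance: int = FREE_MONTHLY_CHARS) -> bool:
    """True if the passages fit within the given monthly allowance."""
    return chars_needed(passages) <= allowance

islands = ["Jag skulle vilja ha en kaffe, tack.", "Har ni havremjölk?"]
print(chars_needed(islands), fits_allowance(islands))
```

As a rough guide, a passage of 500 characters would fit 20 times into the free allowance, so there is room for a fair few practice texts each month.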
Using ElevenLabs in Your Own Learning
There’s plenty to do for free, though. For instance, if you enjoy the island technique in your learning, you can get ElevenLabs to record your passages for audio practice / rote memorising. I make this an AI double-whammy, using ChatGPT to help prepare my topical ‘islands’ before pasting them into ElevenLabs.
The ChatGPT > ElevenLabs workflow is also brilliant for dialogue modelling. On my recent Sweden trip, I knew that a big conversational contact point would be ordering at coffee shops. This is the prompt I used to get a cover-all-bases model coffee-shop convo:
Create a comprehensive model dialogue in Swedish to help me learn and practise for the situation “ordering coffee in a Malmö coffee shop”.
Try to include the language for every eventuality / question I might be asked by the coffee shop employee. Ensure that the language is colloquial and informal, and not stilted.
The output will be pasted into a text-to-speech generator, so don’t add speaker names to the dialogue lines – just a dash will suffice to indicate a change of speaker.
I then ran off the audio file with ElevenLabs, and hey presto! Custom real-world social prep. You can’t specify different voices in the same file, of course. But you could run off the MP3 twice, in different voices, then splice it up manually in an audio editor like Audacity for the full dialogue effect. Needless to say, it’s also a great way for teachers to make custom listening activities.
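If you fancy the two-voice approach, you can save some splicing by separating the speakers before generating any audio. Here is a rough sketch that takes the dash-marked dialogue from the prompt above and sorts the lines into two lists, one per voice. It assumes the speakers strictly alternate with each dash, and split_by_speaker is simply a name I made up.

```python
DASHES = "-\u2013\u2014"  # hyphen, en dash, em dash

def split_by_speaker(dialogue: str) -> tuple[list[str], list[str]]:
    """Assign dash-prefixed lines alternately to speaker A and speaker B."""
    speakers: tuple[list[str], list[str]] = ([], [])
    turn = 0
    for raw in dialogue.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line[0] in DASHES:
            # A leading dash marks a change of speaker.
            speakers[turn % 2].append(line[1:].strip())
            turn += 1
        elif turn:
            # Continuation line: attach it to whoever spoke last.
            speakers[(turn - 1) % 2][-1] += " " + line
    return speakers

sample = "- Hej! Vad vill du ha?\n- En kaffe, tack.\n- Något mer?"
barista, customer = split_by_speaker(sample)
print(barista)
print(customer)
```

Each list can then be pasted into the TTS generator with a different voice selected, leaving you to interleave the resulting clips in your audio editor.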
The ElevenLabs voices are truly impressive – it’s worth setting up a free account just to play with the options and come up with your own creative use cases. TTS is set to only get better in the coming months – we’re excited to see where it leads!
Disembodied Voices: Using TTS as a Native Speaker Boost
Native speaker modelling is a prerequisite for learning to speak Modern Foreign Languages. But when listening materials are scarce, or you struggle to find exactly the material you want to learn, then text-to-speech (TTS) can lend a helping hand.
Text-to-speech: native speakers out of thin air
TTS, or speech synthesis, has come on in leaps and bounds since the early days of tinny Speak ‘n’ Spell voices. At first mimicking chiefly (American) English, many projects have since diversified to conjure native speaker voices in many languages out of thin air. Polyglot TTS technologies such as Google Cloud’s offering are at the edge of machine learning developments, and sounding more and more human all the time.
Using these disembodied tongues for language learning is nothing new, of course. Switching the language of your digital voice assistant has become a pretty well-known polyglot trick for some robotic speaking practice. Siri has been speaking fluent Bokmål on my phone for a while now. As a result, I’ve become a dab hand at asking what the weather will be like in Norwegian!
TTS Toolkit
But as handy as voice assistants are, you can leverage the power of TTS much more directly than tapping into Siri or Alexa. At its most basic level, plain old TTS is brilliantly useful for hearing a spoken representation of a word or phrase you are unsure of.
For example, if you are tired of guessing how to reel off “das wäre sehr schön” in German (that would be very nice), never fear. Simply paste it into Google Translate and hit the speaker icon. The platform already offers an impressive number of languages with speech support.
But it gets even better. Several other resources allow you to do more than just listen; they offer a download function too. This way, you can keep your most useful speech synthesis files and incorporate them into your own materials. Combined with vocabulary mining tools such as mass-sentence site Tatoeba, you can begin to curate large, offline collections of target language material with text and audio.
One notable and powerful multilingual voice synthesis site is TTSMP3.com. For a start, it offers plenty of language options. On top of that, several languages include a choice of voices, too. More than enough to sate your curiosity when you wonder “How do you say that?”.
With a little Google digging, you may also find specialist TTS projects devoted to your particular language of study. For instance, Irish learners are spoilt by the Abair website (Abair means ‘say’ in Irish). Not only does it provide downloadable Irish narration, but it does this in a choice of three different regional accents. If you learn Irish too, you will well know what a godsend this is!
Incorporating TTS MP3s
With your MP3s downloaded, you can now incorporate them into your own resources. Easiest of all is simply to insert them as media in PowerPoint presentations or Word documents. I personally like to add them manually to Anki cards to add audio support when I revise vocabulary.
The quick and easy way to reach Anki’s media folder on the desktop program is to open up the Preferences panel, then the Backups tab, and finally to tap the “Open backup folder” link that appears. In the same location as that backup directory, you should see another folder with the name collection.media. Anything you put in here will be synced along with the rest of your Anki data.
Note: always back up your Anki decks before tinkering in the program folders!
Drop your saved TTS files into this folder. Note that subfolders still don’t seem to work reliably, so keep everything in that single folder. Logical file naming will definitely help!
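As one possible naming scheme, you could prefix each file with a language code and turn the phrase itself into a plain ASCII slug. The helper below is just a sketch of that idea; the scheme and the name tts_filename are my own suggestion, not anything Anki requires.

```python
import re
import unicodedata

def tts_filename(lang: str, phrase: str, ext: str = "mp3") -> str:
    """Build an ASCII-only file name from a language code and a phrase."""
    # Decompose accented letters, then drop anything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", phrase)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
    return f"{lang}_{slug}.{ext}"

name = tts_filename("de", "das wäre sehr schön")
print(name)  # de_das-ware-sehr-schon.mp3
```

This is deliberately lossy: accents are stripped, and letters with no ASCII equivalent (or whole non-Latin scripts) disappear altogether, so for those languages a numbering scheme or a transliteration would serve better.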
Finally, when you create / edit a vocabulary note, use the following format to add your sound as a playable item, replacing filename as appropriate:
[sound:filename.mp3]
When viewed on your device, you should see a play button on flashcards with embedded sounds. Magic!
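If you have a stack of files to add, the copy-and-tag steps above can be scripted. The following is a minimal sketch rather than an official Anki tool: add_sound is a hypothetical helper, and you need to supply the path to your own collection.media folder, since its location varies by system and profile.

```python
import shutil
from pathlib import Path

def add_sound(mp3: Path, media_dir: Path) -> str:
    """Copy an MP3 into the media folder and return its Anki sound tag."""
    if not media_dir.is_dir():
        raise FileNotFoundError(f"media folder not found: {media_dir}")
    # Flat copy straight into the folder, with no subfolders.
    shutil.copy2(mp3, media_dir / mp3.name)
    return f"[sound:{mp3.name}]"

# Example call (both paths are placeholders for your own):
# tag = add_sound(Path("de_hallo.mp3"), Path("/path/to/collection.media"))
# Paste the returned tag into the relevant field of your note.
```

It is best to run something like this with Anki closed, and only after backing up, as per the note above.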
If you prefer an even easier route, then there is an Anki plugin specially created for automatically including TTS into notes. I prefer the manual method, as it satisfies the tech control freak in me.
A human touch
Of course, TTS is not a perfect substitute for a native speaker recording. For those times when only a human voice will do, Forvo is a goldmine. The site is replete with native sounds across a dizzying array of languages, all recorded by native speaker volunteers. Just as with the TTS examples above, the sounds can be downloaded for use in your own resources, too.
To round off our trek through native speaker sites – real and synthesised – just a final note on copyright. If you intend to share the resources you create, always check the usage notes of the website of origin. Sites commonly have fair use policies for non-commercial projects, so making resources for personal use is usually not a problem at all. If you plan to sell your resources, though, you may well need to opt for a commercial plan with the respective platform.
Robotic resources can plug a real gap in native speaker support, especially for niche languages, or niche subjects in more mainstream ones. Do you use TTS in innovative ways in your own learning? Have you come across other specialist or language-specific TTS projects? If so, please share in the comments!