I could barely contain my excitement in last week’s post on ElevenLabs’ brilliant text-to-speech voice collection. I’ve had a week of playing around with it now, and if anything, I’m only more enthusiastic about it.
After a bit of deeper delving, it’s the voice clone features that have me hooked right now. ElevenLabs can make a digital version of your voice from just 30 seconds of training speech. And it’s fast. I expected a bit of a wait for audio processing the first time I used it. But no – after reading in a couple of passages of sample text, my digital TTS voice was ready to use within seconds.
For a quick ‘n’ easy tool, it does a brilliant job of picking up your general accent. It identified mine as British English, captured most of my Midlands features (it struggled with my really low ‘u’ in ‘bus’, though – maybe more training would help), and it got my tone bang on. Scarily so… I can understand why cybersecurity pundits are slightly nervous about tech like this.
Your Voice, Another Language
The most marvellous thing, though, was using my voice to read foreign language texts. Although not 100% native-sounding – the voice was trained on me reading English, of course – it’s uncannily accurate. Listening to digital me reading German text, I’d say it sounds like a native-ish speaker: perhaps someone who’s lived in Germany for a decade but still retains a trace of a non-native accent.
But as pronunciation models go, that’s a pretty high standard for any language learner to aim for.
The crux of it is that you can have your own voice reading practice passages for memory training (think: island technique). There’s an amazing sense of personal connection that comes from that – that’s what you will sound like when you’ve mastered this.
It also opens up the idea of tailoring digital resources with sound files read by ‘you’. Imagine a set of interactive language games for students, where the voice is their teacher’s. Incredible stuff.
In short, it’s well worth the fiver-a-month starter subscription to play around with it.