SingaKids: A Glimpse of Where Multimodal AI Tutoring May Be Headed

November 23, 2025Leave a comment

A recent pre-print on SingaKids, a multilingual multimodal tutoring system for young learners, offers an interesting look at how AI-supported language learning is evolving. You can read the paper here: SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning.

Designed for early primary classrooms, SingaKids is an AI-based system that uses picture-description tasks as the basis for spoken interaction. It combines dense image captioning, multilingual speech recognition, a dialogue model tuned with pedagogical scaffolding, and child-friendly text-to-speech. The system works in English, Mandarin, Malay, and Tamil, with extra attention paid to the lower-resource languages to improve recognition and generation quality.

Flexible Scaffolding

Something that stood out to me in particular was the system’s focus on scaffolding rather than straightforward correction. That approach is flexible; depending on a child’s response, the system shifts between prompts, hints, explanations, and more structured guidance. Higher-performing learners are pushed towards fuller reasoning; less confident learners get clearer cues and more supportive turns. It’s a step away from the rigid “question–answer–score” pattern and closer to the texture of real classroom dialogue.

Although the work is aimed at children, several ideas have wider implications for the rest of us. Picture-guided dialogue isn’t new in ‘grown-up’ resources – think Rosetta Stone, for instance. But it could easily support adult learners practising free production in AI tools, too. Improved multilingual ASR – especially for hesitant, accented, or code-switched speech – would benefit almost every speaking-practice tool. And the flexible scaffolding approach hints at future e-tutors that adapt to the learner’s behaviour dynamically, rather than funnelling everyone down the same path.

The project sits firmly in the research space, but it points towards what the next generation of tools may look like: multimodal, context-aware systems that don’t just respond to learners but actively guide, prompt, and adjust. For anyone keeping an eye on developments in educational AI, it’s a nice indication of the direction of travel (and I’m probably a wee bit envious of those kids getting a chance to try it first!).

Perplexity Tasks for Language Learners

October 26, 2025October 26, 2025Leave a comment

AI techniques to support language learning are pretty well-known now. From structured conversation partners to resource creators, LLM platforms have been embraced by the polyglot community.

Like many of us, I dip in and out of them almost unthinkingly now. Often, I’ll snap in a page from a chapter I’m working on with my Greek teacher, and it’ll help me prepare ahead of a lesson. Sometimes, I’ll get it to reel off a list of useful phrases on a topic I’m studying. LLMs can make great worksheet creators, too. In many ways, it’s simply a very interactive reference tool, giving (mostly) reliable answers but with a big nod to context.

I’d been pretty dogged in my choice of platform, sticking for the most part with ChatGPT Plus. Claude and Gemini were also in the mix, alongside some fun running local models. But for the most part, I thought my tool choices were pretty settled.

But then I gave Perplexity a whirl.

Perplexity – Task Master

Perplexity isn’t an LLM in the sense that ChatGPT, Gemini and Claude are. It uses LLM technology. But it’s actually more of an intelligent, context-sensitive search tool, that uses natural language APIs to turbo-boost its web-hunting activities.

I’d clearly not found that prospect very exciting, as I’d not gone near it until now. But thanks to a bundled free upgrade, I got to try the premium tier of late. And one particular feature stands out as potentially transformative for my learning habits: Perplexity Tasks.

Tasks are scheduled searches you set up with natural language instructions. And those instructions can be as rich as your usual LLM prompts in terms of requested formatting and such like, so in essence, you can build regular bulletins with up-to-date information in any language you like. Take one of mine, that runs daily:

Search the global news for the biggest world news story of the day. Summarise it in French, German, Modern Greek, Polish, Scottish Gaelic and Swahili at a level appropriate for an intermediate learner, ensuring that the translation is of the highest, native speaker standard quality, idiomatic and natural-sounding. Summaries should be 3-4 sentences long. Highlight key words in bold.

Accompany each summary text with a glossary / vocabulary list detailing all the key / difficult words from it in dictionary format (listing word class, irregular parts if applicable etc.). Hyperlink glossary items to Wiktionary entries where available with further information on them (use the English version en.wiktionary.com).

Lay it all out neatly to make it easy on the eye. Use plenty of emojis for impact too. Make this a fabulous resource for polyglot language learning! 🌍

Now, every morning, I get a wee news digest emailed straight to my inbox in multiple languages. It’s learner-friendly, includes vocab support, and gives me something to talk about in my language meets and lessons. I’ve done the same for academic paper searches in linguistics, and stories on dialect appearing in news outlets.

It feels like a proper game changer!

Tasking on Other Platforms

Now, you don’t need Perplexity to do this – it’s just one of the most user-friendly ways I’ve found to do it. If you have ChatGPT, check out Scheduled Tasks. In Gemini, Scheduled Actions will do the trick for Pro members. Copilot is in on the game too. Others will no doubt follow suit shortly – clearly, task scheduling is becoming one of those features AI platforms are expected to have.

What I like about Perplexity, though, is that its whole raison d’être is the search – it feels particularly suited to web-based tasks like news digests. It’s also quite nice to keep the separation between my everyday LLM ramblings, and my more structured, scheduled items (use it for a few weeks and you’ll have clogged your timeline up with dozens of chats!).

If you’ve been looking for a way to make AI genuinely work for your learning rather than distract from it, try setting up a task or two – you might just find it becomes part of your morning ritual as well.

ElevenLabs Hits the Right Note: A.I. Songwriting for Language Learners

September 21, 2025September 21, 2025Leave a comment

In case you missed it, A.I. text-to-speech leader ElevenLabs is the latest platform to join the generative music scene – so language learners and teachers have another choice for creating original learning songs.

ElevenLabs’ Creative Platform ElevenMusic takes a much more structured approach to music creation that other platforms I’ve tried. Enter your prompt (or full lyrics), and it will build a song from block components – verse, chorus, bridge – just as you might construct one as a human writer. It makes for a much more natural-sounding track.

ElevenLabs music creation

As you’d expect from voice experts ElevenLabs, the service copes with a wide range of languages and the diction is very convincing. A tad more so, I think, than the current iteration of the first big name on the block, Suno AI. No doubt the latter will have some tricks up its sleeve to keep up the pace – but for now, ElevenLabs is the place to go for quick and catchy learning song.

Anyway, here’s one I made earlier – a rather natty French rock and roll song about the Moon landings. Get those blue suede Moon boots on!

It’s definitely worth having a play on the site to see what you can come up with for you or your classes. ElevenLabs has a free tier, of course, so you can try it out straight away. [Note: that’s my wee affiliate link, so if you do sign up and hop on a higher tier later, you’re helping keep Polyglossic going!]

A swirl of IPA symbols in the ether. Do LLMs 'understand' phonology? And are they any good at translation?

Tencent’s Hunyuan-MT-7B, the Translation Whizz You Can Run Locally

September 7, 2025September 8, 2025Leave a comment

There’s been a lot of talk this week about a brand new translation model, Tencent’s Hunyuan-MT-7B. It’s a Large Language Model (LLM) trained to perform machine translation. And it’s caused a big stir by beating heftier (and heavier) models by Google and OpenAI in a recent event.

This is all the more remarkable given that it’s really quite a small model by LLM standards. Hunyuan actually manages its translation-beating feat packed into just 7 billion parameters (the information nodes that models learn from). Now that might sound a lot. But fewer usually means weaker, and the behemoths are nearing post-trillion param levels already.

So Hunyuan is small. But in spite of that, it can translate accurately and reliably – market-leader beatingly so – between over 30 languages, including some low-resource ones like Tibetan and Kazakh. And its footprint is truly tiny in LLM terms – it’s lightweight enough to run locally on a computer or even tablet, using inference software like LMStudio or PocketPal.

The model is available in various GGUF formats at Hugging Face. The 4-bit quantised version comes in at just over 4 GB, making it iPad-runnable. If you want greater fidelity, then 8-bit quantised is still only around 8 GB, easily handleable in LMStudio with a decent laptop spec.

So is it any good?

Well, I ran a few deliberately tricky English to German tasks through it, trying to find a weak spot. And honestly, it’s excellent – it produces idiomatic, native-quality translations that don’t sound clunky. What I found particularly impressive was its ability to paraphrase where a literal translation wouldn’t work.

There are plenty of use cases, even if you’re not looking for a translation engine for a full-blown app. Pocketising it means you have a top-notch multi-language translator to use offline, anywhere. For language learners – particularly those struggling with the lower-resource languages the model can handle with ease – it’s another source of native-quality text to learn from.

Find out more about the model at Hugging Face, and check out last week’s post for details on loading it onto your device!

Ultra-Mobile LLMs : Getting the Most from PocketPal

August 31, 2025August 31, 2025Leave a comment

If you were following along last week, I was deep into the territory of running open, small-scale Large Language Models (LLMs) locally on a laptop in the free LMStudio environment. There are lots of reasons you’d want to run these mini chatbots, including the educational, environmental, and security aspects.

I finished off with a very cursory mention of an even more mobile vehicle for these, PocketPal. This free, open source app (available on Google and iOS) allows for easy (no computer science degree required) searching, downloading and running LLMs on smartphones and tablets. And, despite the resource limitations of mobile devices compared with full computer hardware, they run surprisingly well.

PocketPal is such a powerful and unique tool, and definitely worth a spotlight of its own. So, this week, I thought I’d share some tips and tricks I’ve found for smooth running of these language models in your pocket.

Full-Fat LLMs?

First off, even small, compact models can be (as you’d expect) unwieldy and resource-heavy files. Compressed, self-contained LLM models are available as .gguf files from sources like Hugging Face, and they can be colossal. There’s a process you’ll hear mentioned a lot in the AI world called quantisation, which compresses models to varying degrees. Generally speaking, the more compression, the more poorly the model performs. But even the most highly compressed small models can weigh in at 2gb and above. After downloading them, these mammoth blobs then load into memory, ready to be prompted. That’s a lot of data for your system to be hanging onto!

That said, with disk space, a good internet connection, and decent RAM, it’s quite doable. On a newish MacBook, I was comfortably downloading and running .gguf files 8gb large and above in LMStudio. And you don’t need to downgrade your expectations too much to run models in PocketPal, either.

For reference, I’m using a 2023 iPad Pro with the M2 chip – quite a modest spec now – and a 2024 iPhone 16. On both of them, the sweet spot seems to be a .gguf size of around 4gb – you can go larger, but there’s a noticeable slowdown and sluggishness beyond that. A couple of the models I’ve been getting good, sensible and usable results from on mobile recently are:

Qwen3-4b-Instruct (8-bit quantised version) – 4.28gb
Llama-3.2-3B-Instruct (6-bit quantised version) – 3.26gb

The ‘instruct’ in those model names refers to the fact that they’ve been trained to follow instructions particularly keenly – one of the reasons they give such decent practical prompt responses with a small footprint.

Optimising PocketPal

Once you have them downloaded, there are a couple of things you can tweak in PocketPal to eke out even more performance.

The first is to head to the settings and switch on Metal, Apple’s hardware-accelerated API. Then, increase the “Layers on GPU” setting to around 80 or so – you can experiment with this to see what your system is happy with. But the performance improvement should be instantaneous, the LLM spitting out tokens at multiple times the default speed.

What’s happening with this change is that iOS is shifting some of the processing from the device’s CPU to the GPU, or graphical processing unit. That may seem odd, but modern graphics chips are capable of intense mathematical operations, and this small switch recruits them into doing some of the heavy work.

Additionally, on some recent devices, switching on “Flash Attention” can bring extra performance enhancements. This interacts with the way LLMs track how much weight to give certain tokens, and how that matrix is stored in memory during generation. It’s pot luck whether it will make a difference, depending on device spec, but I see a little boost.

Tweaking PocketPal’s settings to run LLMs more efficiently

Making Pals – Your Own Custom Bots

When you’re all up and running with your PocketPal LLMs, there’s another great feature you can play with to get very domain-specific results – “Pal” creation. Pals are just system prompts – instructions that set the boundaries and parameters for the conversation – in a nice wrapper. And you can be as specific as you want with them, instructing the LLM to behave as a language learning assistant, a nutrition expert, a habits coach, and such like – with as many rules and output notes as you see fit. It’s an easy way to turn a very generalised tool into something focused and with real-world application.

So that’s my PocketPal in-a-nutshell power guide. I hope you can see why it’s worth much more than just a cursory mention at the end of last week’s post! Tools like PocketPal and LMStudio put you right at the centre of LLM development, and I must admit it’s turned me into a models geek – I’m already looking forward to what new open LLMs will be unleashed next.

So what have you set your mobile models doing? Please share your tips and experiences in the comments!

LLMs on Your Laptop

August 24, 2025Leave a comment

I mentioned last week that I’m spending a lot of time with LLMs recently. I’m poking and prodding them to test their ‘understanding’ (inverted commas necessary there!) of phonology, in particular with non-standard speech and dialects.

And you’d be forgiven for thinking I’m just tapping my prompts into ChatGPT, Claude, Gemini or the other big commercial concerns. Mention AI, and those are the names people come up with. They’re the all-bells-and-whistles web-facing services that get all the public fanfare and newspaper column inches.

The thing is, that’s not all there is to Large Language Models. There’s a whole world of open source (or the slightly less open ‘open weights’) models out there. Some of them offshoots of those big names, while others less well-known. But you can download all of them to run offline on any reasonably-specced laptop.

LMStudio – LLMs on your laptop

Meet LMStudio – the multi-platform desktop app that allows you to install and interrogate LLMs locally. It all sounds terribly technical, but at its most basic use – a custom chatbot – you don’t need any special tech skills. Browsing, installing and chatting with models is all done via the tab-based interface. You can do much more with it – the option to run it as a local server is super useful for development and testing – but you don’t have to touch any of that.

Many of the models downloadable within LMStudio are small models – just a few gigabytes, rather than the behemoths behind GPT-5 and other headline-grabbing releases. They feature the same architecture as those big-hitters, though. And in many cases, they are trained to approach, even match, their performance on specific tasks like problem-solving or programming. You’ll even find reasoning models, that produce a ‘stepwise-thinking’ output, similar to platforms like Gemini.

A few recent models for download include:

Qwen3 4B Thinking – a really compact model (just over 2gb) which supports reasoning by default
OpenAI’s gpt-oss-20b – the AI giant’s open weights offering, released this August
Gemma 3 – Google’s multimodal model optimised for use on everyday devices
Mistral Small 3.2 – the French AI company’s open model, with vision capabilities

So why would you bother, when you can just fire up ChatGPT / Google / Claude in a few browser clicks?

LLMs locally – but why?

Well, from an academic standpoint, you have complete control over these models if you’re exploring their use cases in a particular field, like linguistics or language learning. You can set parameters like temperature, for instance – the degree of ‘creativity wobble’ the LLM has (0 being a very rigid none, and 1 being, well, basically insane). And if you can set parameters, you can report these in your findings, which allows others to replicate your experiments and build on your knowledge.

Small models also run on smaller hardware – so you can develop solutions that people don’t need a huge data centre for. If you do hit upon a use case or process that supports researchers, then it’s super easy for colleagues to access the technology, whatever their recourse to funding support.

Secondly, there’s the environmental impact. If the resource greed of colossal data centres is something that worries you (and there’s every indication that it should be a conversation we’re all having ), then running LLMs locally allows you to take advantage of them without heating up a server farm somewhere deep inside the US. The only thing running hot will be your laptop fan (it does growl a bit with the larger models – I take that as a sign to give it a rest for a bit!).

And talk of those US server farms leads on to the next point: data privacy. OpenAI recently caused waves with their suggestion that user conversations are not the confidential chats many assume them to be. If you’re not happy with your prompts and queries passing out of your control and into the data banks of a foreign state, then local LLMs offer not a little peace of mind too.

Give it a go!

The best thing? LMStudio is completely free. So download it, give it a spin, and see whether these much smaller-footprint models can give you what you need without entering the ecosystem of the online giants.

Lastly, don’t have a laptop? Well, you can also run LLMs locally on phones and tablets too. Free app PocketPal (on iOS and Android) runs like a cut-down version of LMStudio. Great for tinkering on the go!

A robot reading a script. The text-to-speech voices at ElevenLabs certainly sound intelligent as well as natural!

ElevenLabs : 5-Star Tool for Language Work and Study

February 2, 2025

If you’re a regular reader, you’ll know how impressed I’ve been at ElevenLabs, the text-to-speech creator that stunned the industry when its super-realistic voices were unleashed on the world. Since then, it’s made itself irreplaceable in both my work and study, and it bears spreading the word again: ElevenLabs is a blow-your-socks-off kind of tool for creating spoken audio content.

Professional Projects

In my work developing language learning materials for schools, arranging quality narration used to involve coordinating with agencies and studios — a process that was both time-consuming and costly. We’ve had issues with errors, too, which cost a project time with re-recordings. And that’s not to mention the hassle keeping sections up-to-date. Removing ‘stereo’ from an old vocab section (who has those now?) would usually trigger a complete re-record.

With ElevenLabs, I can now produce new sections promptly, utilising its impressive array of voices across multiple languages. The authenticity and clarity of these voices are fantastic – I really can’t understate it – and it’s made maintaining the biggest language learning site for schools so much easier.

Supporting Individual Learning

As a language learner, ElevenLabs is more than worth its salt, too. It’s particularly good for assembling short listening passages – about a minute long – to practise ‘conversation islands’—a well-regarded polyglot technique for achieving conversational fluency.

Beyond language learning, the tool can be a great support to other academic projects. I’ve created concise narrations of complex topics, converting excerpts from scholarly papers into audio format. Listening to these clips in spare moments (or even in the background while washing up) has helped cement some key concepts, and prime my mind for conventional close study.

Flexible and Affordable Plans

ElevenLabs offers a range of pricing options to suit different needs:

• Free Plan: Ideal for those starting out, this plan provides 10,000 characters per month, roughly equating to 10 minutes of audio.

• Starter Plan: At £5 per month, you receive 30,000 characters (about 30 minutes of audio), along with features like voice cloning and commercial use rights.

• Creator Plan: For £22 per month, this plan offers 100,000 characters (around 100 minutes of audio), plus professional voice cloning and higher-quality outputs.

For messing around, that free plan is not too stingy at all – you can really get a feel for the tool from it. Personally, I’ve not needed to move beyond the starter plan yet, which is pretty much a bargain at around a fiver a month.

Introducing ElevenReader

And there’s more! Complementing the TTS service, ElevenLabs has introduced ElevenReader, a free tool that narrates PDFs, ePubs, articles, and newsletters in realistic AI voices. Available on both iOS and Android platforms, the app doesn’t even consume credits from your ElevenLabs subscription plan.

Seriously, I can’t even believe this is still free – go and try it!

Final Thoughts

ElevenLabs has truly transformed the way I create and consume spoken content. It truly is my star tool from the current crop of AI-powered utilities.

The ElevenLabs free tier is enough for most casual users to have a dabble – go and try it today!

A robot playwright - now even more up-to-date with SearchGPT.

Topical Dialogues with SearchGPT

November 3, 2024

As if recent voice improvements weren’t enough of a treat, OpenAI has just introduced another killer feature to ChatGPT, one that can likewise beef up your custom language learning resources. SearchGPT enhances the LLM’s ability to access and incorporate bang up-to-date information from the web.

It’s a development that is particularly beneficial for language learners seeking to create study materials that reflect current events and colloquial language use. With few exceptions until now, LLMs like ChatGPT have had a ‘data cutoff’, thanks to mass text training having an end-point (albeit a relatively recent one). Some LLMs, like Microsoft’s Copilot, have introduced search capabilities, but their ability to retrieve truly current data could be hit and miss.

With SearchGPT, OpenAI appear to have cracked search accuracy a level to rival AI search tool Perplexity – right in the ChatGPT app. And it’s as simple as highlighting the little world icon that you might already have noticed under the prompt field.

The new SearchGPT icon in the ChatGPT prompt bar.

Infusing Prompts with SearchGPT

Switching this on alongside tried-and-tested language learning prompt techniques yields some fun – and pedagogically useful – results. For instance, you can prompt ChatGPT to generate dialogues or reading passages based on the latest news from your target language country/ies. Take this example:

A language learning dialogue on current affairs in German, beefed up by OpenAI’s SearchGPT

SearchGPT enables content that mirrors real-life discussion with contemporary vocabulary and expressions (already something it was great at). But it also incorporates accurate, up-to-the-minute, and even cross-referenced information. That’s a big up for transparency.

Unsure where that info came from? Just click the in-text links!

Enhancing Speaking Practice with Authentic Contexts

Beyond reading, these AI-generated dialogues serve as excellent scripts for speaking practice. Learners can role-play conversations, solo or group-wise, to improve pronunciation, intonation, and conversational flow. This method bridges the gap between passive understanding and active usage, a crucial step in achieving fluency.

Incorporating SearchGPT into your language learning content creation toolbox reconnects your fluency journey with the real, evolving world. Have you used it yet?

Apples and oranges, generated by Google's new image algorithm Imagén 3

Google’s Imagén 3 : More Reliable Text for Visual Resources

October 13, 2024October 13, 2024Leave a comment

If you use AI imaging for visual teaching resources, but decry its poor text handling, then Google might have cracked it. Their new algorithm for image generation, Imagén 3, is much more reliable at including short texts without errors.

What’s more, the algorithm is included in the free tier of Google’s LLM, Gemini. Ideal for flashcards and classroom posters, you now get quite reliable results when prompting for Latin-alphabet texts on the platform. Image quality seems to have improved too, with a near-photographic finish possible:

A flashcard produced with Google Gemini and Imagén 3.

The new setup seems marginally better at consistency of style, too. Here’s a second flashcard, prompting for the same style. Not quite the same font, but close (although in a different colour).

A flashcard created with Google Gemini and Imagén 3.

It’s also better at real-world details like flags. Prompting in another engine for ‘Greek flag’, for example, usually results in some terrible approximation. Not in Imagén 3 – here are our apples and oranges on a convincing Greek flag background:

Apples and oranges on a square Greek flag, generated by Google’s Imagén 3

It’s not perfect, yet. For one thing, it performed terribly with non-Latin alphabets, producing nonsense each time I tested it. And while it’s great with shorter texts, it does tend to break down and produce the tell-tall typos with anything longer than a single, short sentence. Also, if you’re on the free tier, it won’t allow you to create images of human beings just yet.

That said, it’s a big improvement on the free competition like Bing’s Image Creator. Well worth checking out if you have a bunch of flashcards to prepare for a lesson or learning resource!

Language Lessons from Packaging (And A Little Help from ChatGPT)

October 6, 2024Leave a comment

If you love scouring the multilingual packaging of household products from discounter stores (a niche hobby, I must admit, even for us linguists), then there’s a fun way to automate it with LLMs like ChatGPT.

Take the back of this packet of crisps. To many, a useless piece of rubbish. To me (and some of you, I hope!), a treasure of language in use.

Greek text on a packet of crisps

Normally, I’d idly read through these, looking up any unfamiliar words in a dictionary. But, using an LLM app with an image facility like ChatGPT, you can automate that process. What’s more, you can request all sorts of additional info like dictionary forms, related words, and so on.

From Packaging to Vocab List

Take a snap of your packaging, and try this prompt for starters:

Create a vocabulary list from the key content words on the packaging label. For each word, list:

– its dictionary form

– a new, original sentence illustrating the word in use

– common related words

The results should be an instantly useful vocab list with added content for learning:

Vocabulary list from food packaging by ChatGPT

Vocabulary list compiled by ChatGPT from a food packaging label

I added a note-taking stage to round it off. It always helps me to write down what I’m learning, adding a kinaesthetic element to the visual (and aural, if you’ve had ChatGPT speak its notes out loud). Excuse the scrawl… (As long as your notes are readable by you, they’re just fine!)

Handwritten vocabulary notes derives from crisp packet packaging

Notes on a crisp packet…

It’s a fun workflow that really underscores the fact that there are free language lessons all around us.

Especially in the humblest, and often least glamorous, of places.

Polyglossic

Love Learning Languages

Category: Artificial Intelligence

SingaKids: A Glimpse of Where Multimodal AI Tutoring May Be Headed

Flexible Scaffolding