You Don’t Need to Be a Developer to Start Playing with AI Models in Python

I’ve been singing the praises of local models of late, for so many reasons. From intelligent OCR to data crunching with enhanced privacy, there are gains to be had and they’re easy to access with free inferencing software like LM Studio and Ollama.

That said, there’s a moment that happens to a lot of people who work adjacent to tech – linguists, teachers, researchers – where they think: I’d love to tinker with these AI models properly – and maybe even build them directly into my own tech projects.

This post addresses that tinkering itch. The good news: it’s genuinely easier than you think, and you can get something running in an afternoon.

Why Python?

I ask this a lot, myself, coming from a totally different development background (full-stack and native web app coding). Going back into academia, Python seems to be everywhere.

Python has become the de facto language of AI and data science for a reason. Its syntax is readable almost like pseudocode, its libraries are extraordinarily well-developed and vast, and – linked to that last point – calling an API takes a handful of lines, not pages of custom routines. If you’re coming from a research or humanities background, Python also has the advantage of being widely taught in academic contexts, which means the community, tutorials, and Stack Overflow threads are abundant.

Compare calling an LLM in Python to doing the same in JavaScript or Swift, and you’ll understand immediately why the ‘AI for academia’ world standardised on Python.

And a big plus – it’s probably already installed on your machine. Open your terminal / command prompt interface, and type python --version or python3 --version. If you see a version number come back, you’re good to go. If not, head to python.org/downloads and grab the latest stable release – it’s a straightforward installer on every platform.

Two Ways In: Cloud or Local

Option 1: Hugging Face’s Free Inference API (great for experimenting, zero cost)

Hugging Face is essentially the GitHub of AI models – tens of thousands of open-source models, all in one place. The Serverless Inference API lets you call many of them without setting up any infrastructure, and the free tier is perfectly generous for tinkering and learning. You’ll hit rate limits if you go overboard, but for exploration it’s hard to beat.

Here’s what you need to get started:

  1. Create a free account at huggingface.co
  2. Go to Settings → Access Tokens and generate a token with Read permissions
  3. Install the library: pip install huggingface_hub

Then you can call a model like this:

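from huggingface_hub import InferenceClient

# Create a client for one hosted model, authenticated with your access token
client = InferenceClient(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",
    token="hf_your_token_here"  # placeholder: paste your own token for a first test
)

# Send a prompt and print the generated text
response = client.text_generation("Explain enregisterment in simple terms.")
print(response)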

That’s genuinely it for a first experiment. A few lines. No GPU. No cloud bill.

One gotcha: some popular models require you to accept their licence terms on the Hugging Face website before you can access them via the API. If you get a 403 error, that’s almost certainly why — head to the model page, accept the terms, and try again.

Option 2: LM Studio (run models locally, completely private)

If you’d rather not send your data to any external service – which matters for research involving sensitive text – LM Studio is still a brilliant solution. It gives you a clean interface to download and run open-source models on your own machine, with no internet connection required once the model is downloaded.

The local model landscape has improved dramatically. Models like Qwen3 (the 4B and 14B variants especially) are genuinely impressive on a modern laptop or desktop with a decent amount of RAM. You wouldn’t have believed this was possible two years ago.

LM Studio exposes a local API that mimics the OpenAI format, so you can call it from Python the same way:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="not-needed"  # LM Studio doesn't require auth locally
)

response = client.chat.completions.create(
    model="qwen3-14b",  # whatever model you've loaded in LM Studio
    messages=[{"role": "user", "content": "Hello, what can you do?"}]
)

print(response.choices[0].message.content)

The openai library here is just a convenient HTTP client — you’re not actually talking to OpenAI. You’re talking to a model running on your own machine.

Common stumbling block: LM Studio’s server needs to be running, with a model loaded, before your script will work. The error when it isn’t running is a bit cryptic (ConnectionRefusedError or similar) – if you see that, it just means you haven’t started the server yet.
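If you'd like a friendlier message than a stack trace, one option is to check the port before calling the model. This is a minimal sketch, assuming the server is on localhost:1234 as in the snippet above; server_is_up is a hypothetical helper name of my own, not part of LM Studio or the openai library.

```python
import socket

def server_is_up(host: str = "localhost", port: int = 1234, timeout: float = 1.0) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts and unresolved host names.
        return False

if not server_is_up():
    print("Nothing is listening on localhost:1234 - start the LM Studio server first.")
```

This only tells you that the port is open. It can't tell you whether a model is actually loaded, so a failed request can still happen after a successful check.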

Making the Output Actually Readable

Once you’re getting responses back, the next temptation is to do something with them in your terminal – loop through results, display analysis, format comparisons. The default print() approach quickly gets messy.

My namesake, the rich library, is a revelation here (how nice to have a Python library named after me). It adds colour, formatting, tables, and syntax highlighting to terminal output with almost no effort:

pip install rich

from rich.console import Console
from rich.markdown import Markdown

console = Console()

# Reuses the Hugging Face client created in Option 1
response_text = client.text_generation("Write a haiku about Python.")
console.print(Markdown(response_text))

If the model returns markdown (which most do), rich will render it beautifully right in your terminal. Headers, code blocks, bold text — all of it. This is genuinely transformative for readability when you’re doing exploratory work.

Don’t Stop at Chat: Sentence Transformers Are Worth Knowing About

Here’s where it gets interesting for researchers and linguists in particular. Large language models are great for generation — producing text, summarising, answering questions. But there’s a whole other class of model designed for understanding text semantically: sentence transformers.

The Sentence Transformers library (also known as SBERT) lets you turn text into numerical vectors that capture meaning. Two sentences that mean the same thing will have vectors that are close together; two unrelated sentences won’t. These vectors are called semantic embeddings.

Why does this matter? A few examples:

  • Corpus linguistics for semantics: Automatically cluster dialect examples by semantic similarity rather than just keyword matching
  • Research assistants: Find the most relevant papers or passages from a large collection based on meaning, not just exact words
  • Teaching tools: Build a quiz that detects when a learner’s answer is semantically equivalent to the model answer, even if the wording is different

pip install sentence-transformers

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The dialect features of the Black Country are highly distinctive.",
    "Black Country speech has unique phonological characteristics.",
    "The weather in Edinburgh is famously miserable."
]

embeddings = model.encode(sentences)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.3f}")  # will be high

This runs entirely locally (the model downloads once and caches), is fast even on a modest laptop, and opens up a whole world of computational approaches to language that go well beyond chatting with an LLM.
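If you're curious what a cosine similarity score actually measures, here is a small sketch in plain Python. The three-number vectors are made up for illustration (real embeddings from all-MiniLM-L6-v2 have 384 dimensions), and cosine_similarity and rank_by_similarity are my own helper names, not library functions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Sort (label, vector) pairs from most to least similar to the query vector."""
    return sorted(candidates, key=lambda item: cosine_similarity(query, item[1]), reverse=True)

# Toy "embeddings": the first two point in roughly the same direction.
query = [0.9, 0.1, 0.0]
candidates = [
    ("weather", [0.0, 0.1, 0.9]),
    ("phonology", [0.8, 0.2, 0.1]),
]

ranked = rank_by_similarity(query, candidates)
print([label for label, _ in ranked])  # ['phonology', 'weather']
```

The same ranking idea is what sits behind "find the most relevant passage" searches: embed the query, embed the passages, and sort by score.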

Getting Set Up: The Boring-but-Important Bit

Beyond that, there are just a few things I’ve learnt from my initial tinkerings that will save you headaches.

Use a virtual environment. Every time. Before you install anything for a new project, do:

python -m venv venv
source venv/bin/activate  # on Mac/Linux
venv\Scripts\activate     # on Windows

This keeps your project’s dependencies isolated and prevents the infuriating “but it worked yesterday” problem where one project’s libraries silently break another’s.
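If you're ever unsure whether the environment is actually active, you can ask Python itself. A minimal sketch using only the standard library; in_virtualenv is a hypothetical helper name, and the check is aimed at environments created with python -m venv as above.

```python
import sys

def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the underlying Python install.
    return sys.prefix != sys.base_prefix

print("Virtual environment active:", in_virtualenv())
print("Interpreter in use:", sys.executable)
```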

Keep API secrets out of your code. Don’t paste your Hugging Face token directly into a script you might share or commit to GitHub. Use a .env file and the python-dotenv library:

pip install python-dotenv

# .env file (this file stays off GitHub – add it to .gitignore)
HF_TOKEN=hf_your_token_here

# your script
from dotenv import load_dotenv
import os

load_dotenv()
token = os.getenv("HF_TOKEN")
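One thing to watch: os.getenv quietly returns None if the variable is missing, and the failure then shows up later as a confusing authentication error. A small sketch of failing early instead; require_env is a hypothetical helper of my own, not part of python-dotenv.

```python
import os

def require_env(name: str) -> str:
    """Return an environment variable's value, or stop with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set - check your .env file and that load_dotenv() ran first."
        )
    return value

# In your script, after load_dotenv():
# token = require_env("HF_TOKEN")
```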

Read error messages. This sounds obvious, but: most Python errors from LLM libraries tell you exactly what went wrong. A 401 means authentication failed (wrong or missing token). A 503 means the model is loading on the server side – wait a moment and retry. A ConnectionRefusedError from a local API almost always means LM Studio’s server isn’t running.
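To make that concrete, here is a rough sketch of turning those status codes into hints and retrying while a model loads. The hint wording follows this post; ModelLoading is a hypothetical stand-in for whatever exception your client library raises on a 503, so you'd adapt the except clause to the real one.

```python
import time

# Hints for the status codes mentioned in this post.
STATUS_HINTS = {
    401: "Authentication failed: check that your token is present and correct.",
    403: "Access denied: accept the model's licence terms on its model page.",
    503: "The model is still loading on the server: wait a moment and retry.",
}

def explain_status(code: int) -> str:
    """Translate an HTTP status code into a plain-English hint."""
    return STATUS_HINTS.get(code, f"Unexpected HTTP status {code}: read the full error message.")

class ModelLoading(Exception):
    """Hypothetical stand-in for the error a client raises on a 503."""

def retry_while_loading(call, attempts: int = 3, delay: float = 2.0):
    """Run call(); if the model is still loading, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ModelLoading:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay)

print(explain_status(503))
```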

What Next?

Once you’ve got a basic script running, the natural next steps are:

  • Build a simple chat loop that keeps track of conversation history and lets you have a back-and-forth with a model
  • Experiment with system prompts to give the model a persona or set of instructions
  • Try different models on the same prompts and compare the results – it’s illuminating
  • Start combining LLMs with sentence transformers for retrieval-augmented approaches where you search a corpus semantically before feeding results to a generative model
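The first two ideas fit in a few lines. This is a minimal sketch, not a finished chat app: ChatSession is a hypothetical class of my own, and send is whatever function actually calls your model. The echo function below is a stand-in so the example runs without any model at all.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]

class ChatSession:
    """Keeps the running conversation in the OpenAI-style messages format."""

    def __init__(self, send: Callable[[List[Message]], str], system_prompt: str = ""):
        self.send = send
        self.messages: List[Message] = []
        if system_prompt:
            # The system prompt sets the persona or instructions once, up front.
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text: str) -> str:
        """Add the user's turn, get a reply, and remember both."""
        self.messages.append({"role": "user", "content": user_text})
        reply = self.send(list(self.messages))  # the model sees the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply

def echo_model(messages: List[Message]) -> str:
    """Stand-in for a real model call: repeats the latest user message."""
    return "You said: " + messages[-1]["content"]

session = ChatSession(echo_model, system_prompt="You are a patient German tutor.")
print(session.ask("Hallo!"))
```

To use it with the LM Studio client from earlier, send could be a small function that passes the messages to client.chat.completions.create and returns response.choices[0].message.content.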

The Python AI ecosystem is genuinely exciting right now, and the barrier to entry has never been lower. You don’t need a GPU, you don’t need a cloud account, and you don’t need to be a professional developer. You just need an afternoon and a bit of curiosity.

Have questions or want to share what you built? Drop a comment below.

OCR for Historical Newsprint: Four Models Worth Running Locally in LM Studio

If you work with scanned, typeset documents from archives like the British Newspaper Archive, you will know the frustration of running standard OCR tools on material they were never really designed for: degraded print, Victorian column layouts, eccentric typography, and occasionally deliberate non-standard spelling.

You can leverage the power of local AI models, however, to automate this process, and with free inferencing software like LM Studio, the learning curve isn’t at all steep. Below, I take a look at four specialist OCR models you can run entirely locally using the package – and why you might prefer doing so over handing your documents to a web service.

Why Run OCR Locally?

There are some truly excellent web-based OCR services. There’s Transkribus, for instance, which is widely used in the academic community. Tools like this are powerful and convenient, but they come with some real trade-offs:

  • Privacy: Your document images leave your machine and are processed on someone else’s server. For sensitive archival material or unpublished research corpora, that matters.
  • Cost at scale: Processing hundreds or thousands of newspaper pages through a paid API adds up quickly.
  • No customisation: Cloud OCR engines don’t always offer many pipeline options. You cannot instruct them to preserve dialect spellings, flag ambiguous characters, or respect the orthographic conventions of a specific historical variety of English.
  • Reproducibility: Web services update their models silently. A corpus processed in 2024 may produce different output if you re-run it in 2026. A local model stays consistent – important for methodological reproducibility.

Running OCR-trained models in inferencing software like LM Studio removes most of this friction. The program handles downloading and managing multiple models through a clean interface, and lets you customise model settings, including system prompts that persist across sessions. For historical document work, that means you can instruct the model once about the linguistic conventions of your material and have it apply those rules to every page you send it.

The Four Models

1. olmOCR 2 (7B) – Best Overall for Documents

Developed by the Allen Institute for AI (Ai2), olmOCR 2 is built on Qwen2.5-VL-7B-Instruct and fine-tuned using reinforcement learning with unit-test rewards specifically targeting document OCR tasks. It is one of the few models designed from the ground up for this use case rather than adapted from a general vision assistant.

Size: 7 billion parameters. Available as a ~4.7 GB GGUF (Q4 quantisation) or ~8.85 GB at Q8. Needs around 5–10 GB RAM depending on quantisation.

Why it works for newspaper archives: Handles multi-column layouts, mixed content (tables, headings, body text), and degraded print reliably. Scores 82.4 on olmOCR-Bench. It responds well to system prompt instructions, making it a strong candidate for dialect-preservation workflows.

LM Studio: There’s a GGUF in the native catalogue – search and download directly in the app.
🔗 lmstudio.ai/models/allenai/olmocr-2-7b-1025

✅ Pros: Best-in-class document OCR accuracy; strong layout understanding; instruction-following is reliable; native LM Studio support.
❌ Cons: 7B means slower inference on modest hardware; not ideal for rapid bulk processing.

2. Nanonets-OCR-s – Clean Catalogue Option

Developed by Nanonets, a document AI company, this model is also based on the Qwen2.5-VL architecture but fine-tuned specifically on structured document extraction tasks including forms, invoices, and archival print.

Size: Approximately 7B parameters, similar footprint to olmOCR 2. Available directly via the LM Studio model catalogue as a GGUF.

Why it works for newspaper archives: Strong on structured layout extraction and clean Markdown output. Useful when you want transcription that preserves document structure (headings, columns, captions) as well as raw text.

LM Studio: Native catalogue – findable by searching “Nanonets” in the model browser.
🔗 lmstudio.ai/models (search: Nanonets-OCR-s)

✅ Pros: Easy one-click setup; good structural output; reliable on clean and moderately degraded scans.
❌ Cons: Less tested on heavily damaged historical material than olmOCR 2; similar hardware demands.

3. dots.ocr (1.7B) – Best for Complex Column Layouts

Released by Rednote (小红书) in late 2025, dots.ocr is a compact 1.7B vision-language model that combines layout detection and text recognition in a single pass. Unusually for its size, it explicitly predicts reading order — the sequence in which text blocks should be read — which is critical for Victorian newspaper pages where columns can be irregular and text wraps around illustrations.

Size: 1.7 billion parameters; approximately 2 GB as a GGUF. Runs comfortably on 3 GB VRAM.

Why it works for newspaper archives: Reading order prediction alone makes it worth considering for multi-column broadsheet layouts. Supports over 100 languages, outputs JSON, Markdown, or HTML, and benchmarks show Table TEDS accuracy of 88.6% — ahead of Gemini 2.5 Pro on that metric.

LM Studio: Load via HuggingFace GGUF import (paste the HuggingFace URL into LM Studio’s search bar).
🔗 huggingface.co/dotsdocx/dots.ocr-1.7B-GGUF

✅ Pros: Tiny footprint; reading order detection; fast; strong on multi-column layouts; multilingual.
❌ Cons: Smaller context window means system prompts may drift on very long sessions; can hallucinate on heavily degraded scans; not in the native LM Studio catalogue.

4. GLM-OCR (0.9B) — Best for Bulk Processing on Modest Hardware

Released by Z.ai (Zhipu AI) in early 2026, GLM-OCR is built on the GLM-V encoder–decoder architecture and fine-tuned exclusively for OCR. At under 1 billion parameters it is the smallest model here, yet it scores 94.0 on OCRBench and 93.96% Table TEDS accuracy – results that comfortably outperform much larger general-purpose models.

Size: 0.9 billion parameters; approximately 1 GB quantised (Q8). Needs under 1.5 GB VRAM – it will run on almost any laptop made in the last five years.

Why it works for newspaper archives: Speed and low resource use make it ideal for processing large batches of pages. It is not a chat model — it takes an image and outputs text, triggered by the phrase Text Recognition: — so it is best suited to pure transcription pipelines rather than interactive use.

LM Studio: Load via HuggingFace GGUF import using the ggml-org GGUF repository.
🔗 huggingface.co/ggml-org/GLM-OCR-GGUF

✅ Pros: Tiny; fast; runs on minimal hardware; excellent accuracy for its size; good for bulk workflows.
❌ Cons: Not a chat/instruction model — no system prompt support for dialect customisation; requires a separate layout detection step for complex multi-column pages; not in the native LM Studio catalogue.

Quick Comparison

Model | Size (GGUF) | VRAM | LM Studio Route | Best For
olmOCR 2 (7B) | ~4.7 GB | 5 GB+ | Native catalogue | Best accuracy, complex layouts, dialect workflows
Nanonets-OCR-s | ~4.7 GB | 5 GB+ | Native catalogue | Structured document extraction, clean output
dots.ocr (1.7B) | ~2 GB | 3 GB | HuggingFace GGUF import | Multi-column layouts, reading order, low VRAM
GLM-OCR (0.9B) | ~1 GB | <1.5 GB | HuggingFace GGUF import | Bulk processing, minimal hardware

A Practical Workflow for Newspaper Archives

For a large corpus like material from the British Newspaper Archive, a two-tier approach works well. Use GLM-OCR for the bulk of clean, well-preserved pages – it is fast and accurate enough for standard 20th-century newsprint. Then escalate difficult pages (damaged, illegible columns, unusual typefaces, pre-1880 material) to olmOCR 2 for a more careful second pass. If column order is scrambling your output, switch to dots.ocr for those pages specifically.
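If you keep even basic notes on each page, that triage can be written down as a rule. A rough sketch under my own assumptions: the Page fields, the order of the checks, and the model labels are all hypothetical, and you'd swap in whatever identifiers your loaded models actually use.

```python
from dataclasses import dataclass

@dataclass
class Page:
    """Hypothetical per-page notes you might record while scanning."""
    year: int
    damaged: bool = False
    unusual_typeface: bool = False
    scrambled_columns: bool = False

def choose_model(page: Page) -> str:
    """Pick a model label for a page, following the two-tier approach above."""
    if page.scrambled_columns:
        return "dots.ocr"   # reading-order problems go to dots.ocr
    if page.damaged or page.unusual_typeface or page.year < 1880:
        return "olmocr-2"   # difficult pages get the careful second pass
    return "glm-ocr"        # everything else takes the fast route

for page in [Page(year=1955), Page(year=1862), Page(year=1901, scrambled_columns=True)]:
    print(page.year, "->", choose_model(page))
```

Putting the column check first is a judgement call on my part. It means a damaged page with scrambled columns goes to dots.ocr rather than olmOCR 2, which you may want to reverse for very degraded material.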

For dialect writing research – where you need the transcription to preserve non-standard spellings rather than silently normalise them – load olmOCR 2 or Nanonets-OCR-s and write a system prompt that explicitly instructs the model to treat all orthographic choices as intentional. That single step does something no traditional OCR engine is capable of: it makes the tool linguistically aware of your material.
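If you'd rather script this than use the chat window, the request might be put together as below. It's a sketch that assumes LM Studio's OpenAI-compatible endpoint accepts OpenAI-style image_url content parts for the vision model you have loaded, which is worth checking against LM Studio's own documentation. build_ocr_messages and the prompt wording are my own illustrations.

```python
import base64

# Example wording only: adapt it to the conventions of your own material.
DIALECT_PROMPT = (
    "You are transcribing historical newsprint. Treat every spelling as intentional: "
    "reproduce non-standard and dialect spellings exactly and do not normalise them."
)

def build_ocr_messages(image_bytes: bytes, system_prompt: str,
                       instruction: str = "Transcribe this page.",
                       mime_type: str = "image/png") -> list:
    """Build a chat-style message list with a system prompt and one page image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": [
            {"type": "text", "text": instruction},
            # The image travels inline as a base64 data URL.
            {"type": "image_url",
             "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ]},
    ]

messages = build_ocr_messages(b"not a real image", DIALECT_PROMPT)
print(messages[1]["content"][1]["image_url"]["url"][:40])
```

In a real script you'd read the bytes from a scan with open(path, "rb").read() and hand the resulting list to a chat completions call as its messages argument.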

All four models run fully offline once downloaded. No subscription, no API key, no usage limits — just your hardware and your documents.

The GLM-OCR model running in LM Studio, transcribing a 19th-century newspaper article


Escaping the Ecosystem : AI Edition

We live in such unexpected, shifting, fracturing geopolitical times just now. A stability taken for granted for decades no longer seems a given. So much so, that many have begun to question the global tech ecosystem we are embedded in, considering the safety of our data and workflows, and seeking less exposed, closer-to-home alternatives.

It’s something we can explore without straying into conspiracy territory, and it goes beyond data security. Tech writer Cory Doctorow has written at length on the downsides of walled-garden platforms that make the cost of leaving high while degrading (or enshittifying – Macquarie Dictionary’s 2024 word of the year) their services. Linguaphiles should know – our own beloved Duo is one of them. It’s a compelling argument, and one that national consumer protection agencies are starting to incorporate into policy. The notion that we can take meaningful steps to decouple from tech monopolies is beginning to take hold.

Ecosystem creep : AI

This leads us to AI firms – arguably the fastest growing of tech behemoths, whose services nonetheless are working their way into many of our workflows. It’s not all doom and gloom here, though; Anthropic in particular has emerged as one US company willing to stand up for an ethical stance in the field.

That said, most European LLM traffic still goes down that American route, ending up on servers that users’ own states have no jurisdiction over. Users come to rely more and more on these services for key elements of their day-to-day, although they have little control over their place in that ecosystem.

So what to do? LLMs are incredibly useful tools for a number of creative applications. For language teachers, they are particularly good at creating authentic-sounding materials for worksheets. In fact, I’ve often argued that LLMs are a tech almost tailor-made for language learning and teaching – in few other fields is the language structure more important than the actual content! They’re genuinely brilliant at creating copy, often highly nuanced, for learning.

AI Swaps

Well, one quick and easy swap is Le Chat by French AI company Mistral. It’s a ‘full fat’ LLM on a par with the big US names, running your prompts remotely on a multi-billion parameter model. Not so remote, though – their server activity remains within EU jurisdiction.

Then, of course, there is the ‘peak privacy’ option – running your own LLM. That’s a lot easier than it sounds, thanks to easy-setup software like LM Studio or Ollama (both US-based projects, but run locally on your own machine). Install, download a model, and prompt away. While few people, if any, will have the hardware to run full-sized LLMs, small models are getting better and better, rivalling the biggies for everyday use.

Google’s Gemma 4 is a case in point, a new small model (you can get a sub-20 GB version) achieving some really impressive benchmark scores. Multi-language support is one of its strengths, and believe me, it does more than a good enough job of worksheet authoring and lesson planning. And it comes with an extra ‘externalities’ bonus, too – the only energy it’s using is your laptop battery, rather than spinning up some red-hot servers on a remote farm somewhere.

That has to be a win-win – using open source releases from the industry leaders, without getting trapped inside the matrix.

We may have little control over geopolitics. But there are always choices when it comes to our exposure to it in the tech we use. I’m working on a list of these swaps as part of my own digital hygiene plan, and hope to share much more of this in coming weeks!


AI and Dialect: Can You Nudge a Language Model Off the Standard Path?

One of my quiet worries about the AI revolution in language is standardisation. When a model trained on billions of documents produces text, it tends to sand everything down to some kind of weighted average – grammatically impeccable, register-appropriate, but utterly devoid of local colour. And where’s the fun in that? You know my thoughts on dialect and language learning.

But here’s the thing: despite the odds, the standard grey slop isn’t a mathematical inevitability. Large Language Models are trained on billions of words. And that material will include plenty of non-standard language, too, simply by dint of the extraordinarily broad sweep of the training brush. Dialect literature, forum posts, subtitles, transcripts – there’s room for non-standard, more colloquial language in every context of the model’s background.

Which raises an interesting question: can we nudge AI output away from the standard, and towards something more authentically regional?

I’ve been playing around with exactly this, and the short answer is: yes – and it’s surprisingly decent at it.

Stepping off the well-trodden path

I’ve said from the start that AI is built for language learning content creation. There aren’t many fields where content accuracy doesn’t matter too much – as long as the language is correct and authentic-sounding. One of my go-to moves is pushing the LLM for a more colloquial register. Standard written language is fine for reading comprehension, but if you’re teaching people how to actually speak, you want contractions, ellipsis, informal vocabulary – the stuff that sounds like a real person. AI handles this pretty well once you prompt for it explicitly.

Create a model German dialogue on the topic “Reality TV”. The language should be contemporary, colloquial and natural-sounding German.

Claude did a pretty good job of this:

  • Lena: Ey, hast du letzte Woche „Love Island” geschaut? Ich konnte gar nicht aufhören – ich war bis halb zwei wach.
  • Kai: Natürlich hab ich das geschaut – wär ich ja bescheuert, das zu verpassen. Diese Tiara und Marco… ich meine, komm schon. Die spielen da nur rum.
  • Lena: Genau! Marco tut total so, als wäre er an ihr interessiert, aber das ist doch alles für die Kamera. Ich find das so zum Fremdschämen.
  • Kai: Ja, aber genau deswegen schaust du’s doch! Gib’s zu. Du liebst den Drama.
  • Lena: Okay, okay, ich geb’s zu. Aber ich find’s trotzdem fragwürdig, wie die Leute da so bloßgestellt werden. Manche wirken echt überfordert.
  • Kai: Absolut. Ich glaub, vielen ist nicht bewusst, wie sehr die Produktion das alles lenkt. Die schneiden das so zusammen, dass du denkst, die Person ist voll die Zicke – aber vielleicht war das gar nicht so gemeint.
  • Lena: Stimmt schon. Und dann wird die Person im Netz fertiggemacht. Ich hab mal gelesen, dass einige danach richtig psychische Probleme hatten.
  • Kai: Das ist so traurig. Und trotzdem melden sich jedes Jahr tausende Leute an. Ich versteh’s irgendwie – du willst berühmt werden, schnelles Geld, Follower… Der Lockruf ist halt stark.
  • Lena: Stimmt. Und seien wir ehrlich – wir beide gucken’s ja auch. Also sind wir Teil des Problems, oder?
  • Kai: Pff… ich würd sagen: wir sind Teil des Unterhaltsystems. Klingt besser. Jetzt mach die nächste Folge an.

Some great non-standard, everyday features in there. Colloquial contractions like -e dropping on verbs, decent use of particles like halt – not a bad job at all. Claude even threw in a bunch of usage notes without asking, too.

But colloquial is still, broadly, floating just beneath a standard umbrella. What if we go further?

Adding Local Flavour

When I’m probing the edges of what AI can do, I like to start with varieties I can actually judge. Austria is my German-speaking region of choice in this regard, being my year abroad destination of old – I still have a huge soft spot for Kärntnerisch. So here’s the tweak:

Create a model German dialogue on the topic “Reality TV” between two young Austrian speakers from Kärnten. The language should be contemporary, colloquial and natural-sounding Austrian German.

Here’s what came back:

  • Magdi: Oida, hast du gestern „Love Island” g’schaut? I kann nimma! Die Lea hat den Stefan komplett ausg’spielt, echt a Wahnsinn.
  • Tobi: Na freilich, i schau des jo seit Wochn! Aber geh, die Lea, die tuat doch nur so. Die weiß ganz genau, wos sie macht — des is alles g’spield.
  • Magdi: Ja eh, ober trotzdem — der Stefan is so a Trottel! Wie kann man so blöd sein, gell? Der glaubt ihr jo alles.
  • Tobi: Najo, i mein, de san jo auch so a bissi animiert zum Streiten, oder? Die Produzenten woin das jo so. Ohne Zoff läuft des ned.
  • Magdi: Des stimmt scho, ober i find des trotzdem scheißn. Die tean so, ois ob’s echt wär, und dann geht’s nur ums Follower kriegen. Danoch san alle auf Instagram mit Werbung zuagekleistert.
  • Tobi: Haha, stimmt! Drei Wochen nach der Sendung verkaufen’s alle irgendwelche Protein-Shakes. Gott sei Dank ned mia. [lacht]
  • Magdi: Ober weißt wos? I schau’s trotzdem weiter. I bin schon voll drin, i muss wissen, ob der Kevin die Kathi kriegt oder ned.
  • Tobi: Na sag i dir: die kriegen sich ned. Die Kathi is viel zu schlau für ihn. Die geht da raus, macht a paar Storys, und dann war’s des.
  • Magdi: Oida, du bist so a Zyniker! Vielleicht is es jo echt?
  • Tobi: Magdi. Des is Reality TV. Des is so echt wie mei Oma beim Wrestling. [grinst]
  • Magdi: Haha, okay okay. Ober schauen tust’s trotzdem, gell?
  • Tobi: … Ja. Natürli. Um neun bin i dabei. Sag’s ned weiter.

If you’ve spent any time in Austria, this is probably just as joyful to you as it is to me. It’s added in so much southern colour, from the tag particle ‘gell’ to the reduced i from ich and past participles in g’. There’s even some fun wordplay in there (as real as my gran at wrestling!), so it’s not only dialect, but character. It’s laid it on thick!

There’s an element of pastiche to this, for sure. Nobody will claim this is indistinguishable from an authentic speaker from Klagenfurt. But it’s genuinely not bad, especially if you’re after a solid introduction to local speech features.

And then there’s Trøndersk…

That’s all very well for ‘mainstream’ foreign languages like German. To push the envelope a little more, it’s worth playing around with some lesser-learnt languages – Norwegian is always a good choice.

For my Norwegian test, I went with Trøndersk, the dialect cluster associated with the Trondheim region and much of central Norway. It’s one of the most immediately recognisable Norwegian dialects, with distinctive pitch accent patterns, characteristic -a verb endings, and a few phonological quirks that mark it out clearly from the Oslo Bokmål you’ll meet in textbooks.

How good a job would Claude do of this challenge?

  • Marte: Oi, har du sett at det e ny sesong av Farmen nå? Æ begynte å se på den i går kveld og klarte itj å slutt.
  • Jonas: Åja? Æ har itte sett mye reality-greier på ei stund, men Farmen e jo noe anna. Det e liksom ordentlig underholdning, da.
  • Marte: Ja, det e noko med den serien! Og den nye dama fra Steinkjer – ho e jo helt vill. Ho klaga på absolutt alt og alle og gjor ingenting sjæl.
  • Jonas: Haha, klassisk. Det e alltid nåkka en sånn type på realitygreier. Produsenten e nødt til å ha med dem, skjønner du? Ellers e det jo kjedelig.
  • Marte: Jaja, men det e jo litt ekkelt å innrømme at man liker det, liksom. Æ satt der og ropte på skjermen som et idiot.
  • Jonas: Haha! Ka sa du da?
  • Marte: Noe sånt som «koffor gjør’u det, din tulling!» Men mora mi satt rett ved sia og hun så bare på mæ som om æ var gal.
  • Jonas: Uff da, haha. Ka syns ho om programmet da?
  • Marte: Ho e heilt avhengig sjæl! Det e liksom heime hos oss at hele familien samles rundt skjermen på tirsdager. Det har vorre sånn siden æ va lita.
  • Jonas: Okei men ka med all den andre reality-søpla da? Eks på stranda og sånt? Det e jo bare tull, syns æ.
  • Marte: Ja nei, det gidder itj æ. Det e jo bare folk som vil bli kjendis og som itte har nåkka å by på. Farmen e annleis fordi det faktisk krev noe.
  • Jonas: Sant nok. Kameraten min i Ålesund e helt hekta på Paradise Hotel derimot. Han ser på det heile tida.
  • Marte: Seriøst? Koss klare han det? Æ holdt ut to episoder og ga opp. Alle folkan der oppe e jo helt… æ veit itte. Tomme, liksom.
  • Jonas: Haha, han sier det e «sosialt eksperiment». Æ tror bare han har lyst på drama heime fra sofaen sjæl.
  • Marte: Hahaha! Ja det e nok det. Uansett — skal du se Farmen med mæ på fredag? Vi kan bestille pizza og le av alle.
  • Jonas: Ja, det høres bra ut! Men du bestemme pizza-topping. Sist gang valde du masse squash og det va itte greit.
  • Marte: Oi, squash e jo godt da! Men okei, du får velge. Bare itte ananas.
  • Jonas: Deal. Vi ses fredag!

As far as I can tell – and I’m not so well placed here, I must admit – it’s done a solid job. The k- question words (rather than hv-), the reduced æ for jeg… Even the cultural references, selecting two actual Norwegian shows for the exchange. I’d trust this as a bit of pre-Trondheim trip prep!

That said, if you speak a Trøndersk dialect, what’s the verdict? Does this read as a reasonable attempt, or is it giving you second-hand embarrassment? I’d genuinely love to know. (Drop a comment below, or find me wherever I am on the internet this week.)

AI and Dialect for Language Learning

AI platforms clearly have some representation of dialect. Awareness is the wrong word for it – anthropomorphising LLMs and all that – but you know what I mean. The patterns are in the training data; the question is whether a prompt can reliably surface them, or whether the model’s default pull towards the standard is too strong.

The two languages I explored here seem to work well. The pull to parody is always something to watch, particularly when written material in a dialect tends to be humorous, rather than in-community. And rarer or more exclusively spoken varieties will no doubt be much more troublesome. But the more linguistic scaffolding you give the model – the more you treat prompting as a kind of sociolinguistic briefing – the better the output is likely to get.
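One way to make that briefing repeatable is to build the prompt from parts. This sketch follows the wording of the prompts used earlier in this post; the features argument is my own guess at what extra scaffolding might look like, and build_dialogue_prompt is a hypothetical helper.

```python
def build_dialogue_prompt(language, topic, variety=None, speakers=None, features=()):
    """Assemble a dialogue prompt, optionally naming a variety and its features."""
    who = f" between {speakers}" if speakers else ""
    target = variety or language
    parts = [
        f'Create a model {language} dialogue on the topic "{topic}"{who}.',
        f"The language should be contemporary, colloquial and natural-sounding {target}.",
    ]
    if features:
        # Extra scaffolding: name the features you expect to see.
        parts.append("Use these features where they fit naturally: " + "; ".join(features) + ".")
    return " ".join(parts)

prompt = build_dialogue_prompt(
    "German",
    "Reality TV",
    variety="Austrian German",
    speakers="two young Austrian speakers from Kärnten",
    features=("reduced i for ich", "past participles in g'", "the tag particle gell"),
)
print(prompt)
```

Leaving out variety, speakers and features gives you back the plain colloquial prompt, so the same function covers both experiments.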

That standard grey slop, it seems, has more texture than it first appears – you just have to know where to poke.

SingaKids: A Glimpse of Where Multimodal AI Tutoring May Be Headed

A recent pre-print on SingaKids, a multilingual multimodal tutoring system for young learners, offers an interesting look at how AI-supported language learning is evolving. You can read the paper here: SingaKids: A Multilingual Multimodal Dialogic Tutor for Language Learning.

Designed for early primary classrooms, SingaKids is an AI-based system that uses picture-description tasks as the basis for spoken interaction. It combines dense image captioning, multilingual speech recognition, a dialogue model tuned with pedagogical scaffolding, and child-friendly text-to-speech. The system works in English, Mandarin, Malay, and Tamil, with extra attention paid to the lower-resource languages to improve recognition and generation quality.

Flexible Scaffolding

Something that stood out to me in particular was the system’s focus on scaffolding rather than straightforward correction. That approach is flexible; depending on a child’s response, the system shifts between prompts, hints, explanations, and more structured guidance. Higher-performing learners are pushed towards fuller reasoning; less confident learners get clearer cues and more supportive turns. It’s a step away from the rigid “question–answer–score” pattern and closer to the texture of real classroom dialogue.

Although the work is aimed at children, several ideas have wider implications for the rest of us. Picture-guided dialogue isn’t new in ‘grown-up’ resources – think Rosetta Stone, for instance. But it could easily support adult learners practising free production in AI tools, too. Improved multilingual ASR – especially for hesitant, accented, or code-switched speech – would benefit almost every speaking-practice tool. And the flexible scaffolding approach hints at future e-tutors that adapt to the learner’s behaviour dynamically, rather than funnelling everyone down the same path.

The project sits firmly in the research space, but it points towards what the next generation of tools may look like: multimodal, context-aware systems that don’t just respond to learners but actively guide, prompt, and adjust. For anyone keeping an eye on developments in educational AI, it’s a nice indication of the direction of travel (and I’m probably a wee bit envious of those kids getting a chance to try it first!).

Perplexity Tasks for Language Learners

AI techniques to support language learning are pretty well-known now. From structured conversation partners to resource creators, LLM platforms have been embraced by the polyglot community.

Like many of us, I dip in and out of them almost unthinkingly now. Often, I’ll snap in a page from a chapter I’m working on with my Greek teacher, and it’ll help me prepare ahead of a lesson. Sometimes, I’ll get it to reel off a list of useful phrases on a topic I’m studying. LLMs can make great worksheet creators, too. In many ways, they’re simply very interactive reference tools, giving (mostly) reliable answers but with a big nod to context.

I’d been pretty dogged in my choice of platform, sticking for the most part with ChatGPT Plus. Claude and Gemini were also in the mix, alongside some fun running local models. But for the most part, I thought my tool choices were pretty settled.

But then I gave Perplexity a whirl.

Perplexity – Task Master

Perplexity isn’t an LLM in the sense that ChatGPT, Gemini and Claude are. It uses LLM technology, but it’s actually more of an intelligent, context-sensitive search tool that uses natural language APIs to turbo-boost its web-hunting activities.

I’d clearly not found that prospect very exciting, as I’d not gone near it until now. But thanks to a bundled free upgrade, I got to try the premium tier of late. And one particular feature stands out as potentially transformative for my learning habits: Perplexity Tasks.

Tasks are scheduled searches you set up with natural language instructions. And those instructions can be as rich as your usual LLM prompts in terms of requested formatting and such like, so in essence, you can build regular bulletins with up-to-date information in any language you like. Take one of mine, that runs daily:

Search the global news for the biggest world news story of the day. Summarise it in French, German, Modern Greek, Polish, Scottish Gaelic and Swahili at a level appropriate for an intermediate learner, ensuring that the translation is of the highest, native speaker standard quality, idiomatic and natural-sounding. Summaries should be 3-4 sentences long. Highlight key words in bold.

Accompany each summary text with a glossary / vocabulary list detailing all the key / difficult words from it in dictionary format (listing word class, irregular parts if applicable etc.). Hyperlink glossary items to Wiktionary entries where available with further information on them (use the English version en.wiktionary.org).

Lay it all out neatly to make it easy on the eye. Use plenty of emojis for impact too. Make this a fabulous resource for polyglot language learning! 🌍

Now, every morning, I get a wee news digest emailed straight to my inbox in multiple languages. It’s learner-friendly, includes vocab support, and gives me something to talk about in my language meets and lessons. I’ve done the same for academic paper searches in linguistics, and stories on dialect appearing in news outlets.

It feels like a proper game changer!

Tasking on Other Platforms

Now, you don’t need Perplexity to do this – it’s just one of the most user-friendly ways I’ve found to do it. If you have ChatGPT, check out Scheduled Tasks. In Gemini, Scheduled Actions will do the trick for Pro members. Copilot is in on the game too. Others will no doubt follow suit shortly – clearly, task scheduling is becoming one of those features AI platforms are expected to have.

What I like about Perplexity, though, is that its whole raison d’être is the search – it feels particularly suited to web-based tasks like news digests. It’s also quite nice to keep the separation between my everyday LLM ramblings, and my more structured, scheduled items (use it for a few weeks and you’ll have clogged your timeline up with dozens of chats!).

If you’ve been looking for a way to make AI genuinely work for your learning rather than distract from it, try setting up a task or two – you might just find it becomes part of your morning ritual as well.

ElevenLabs Hits the Right Note: A.I. Songwriting for Language Learners

In case you missed it, A.I. text-to-speech leader ElevenLabs is the latest platform to join the generative music scene – so language learners and teachers have another choice for creating original learning songs.

ElevenLabs’ Creative Platform ElevenMusic takes a much more structured approach to music creation than other platforms I’ve tried. Enter your prompt (or full lyrics), and it will build a song from block components – verse, chorus, bridge – just as you might construct one as a human writer. It makes for a much more natural-sounding track.

ElevenLabs music creation

As you’d expect from voice experts ElevenLabs, the service copes with a wide range of languages and the diction is very convincing. A tad more so, I think, than the current iteration of the first big name on the block, Suno AI. No doubt the latter will have some tricks up its sleeve to keep up the pace – but for now, ElevenLabs is the place to go for quick and catchy learning songs.

Anyway, here’s one I made earlier – a rather natty French rock and roll song about the Moon landings. Get those blue suede Moon boots on!

It’s definitely worth having a play on the site to see what you can come up with for you or your classes. ElevenLabs has a free tier, of course, so you can try it out straight away. [Note: that’s my wee affiliate link, so if you do sign up and hop on a higher tier later, you’re helping keep Polyglossic going!]

Tencent’s Hunyuan-MT-7B, the Translation Whizz You Can Run Locally

There’s been a lot of talk this week about a brand new translation model, Tencent’s Hunyuan-MT-7B. It’s a Large Language Model (LLM) trained to perform machine translation. And it’s caused a big stir by beating heftier (and heavier) models by Google and OpenAI in a recent event.

This is all the more remarkable given that it’s really quite a small model by LLM standards. Hunyuan manages its giant-beating feat with just 7 billion parameters (the numerical weights a model adjusts as it learns). Now that might sound a lot. But fewer usually means weaker, and the behemoths are already nearing the trillion-parameter mark and beyond.

So Hunyuan is small. But in spite of that, it can translate accurately and reliably – market-leader beatingly so – between over 30 languages, including some low-resource ones like Tibetan and Kazakh. And its footprint is truly tiny in LLM terms – it’s lightweight enough to run locally on a computer or even tablet, using inference software like LMStudio or PocketPal.

The model is available in various GGUF formats at Hugging Face. The 4-bit quantised version comes in at just over 4 GB, making it iPad-runnable. If you want greater fidelity, then 8-bit quantised is still only around 8 GB, easily handleable in LMStudio with a decent laptop spec.
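
If you fancy sanity-checking those file sizes, the back-of-envelope sum is simply parameters × bits per weight ÷ 8. Here’s a minimal sketch, assuming the “7B” means exactly 7 billion parameters and that every weight is stored at the same precision. Neither is quite true in practice (real GGUF files mix precisions and carry some extra data), which is why actual downloads tend to run a little larger than these figures. The function name is my own, just for illustration.

```python
def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough model file size in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumption: "7B" taken as exactly 7 billion parameters

# Illustrative precisions only; real quantised files mix several.
for label, bits in [("4-bit", 4), ("8-bit", 8), ("16-bit (unquantised)", 16)]:
    print(f"{label}: roughly {estimate_size_gb(N_PARAMS, bits):.1f} GB")
```

Treat the results as a floor rather than a forecast, but it’s a handy way to guess whether a model will fit on your device before you commit to the download.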

So is it any good?

Well, I ran a few deliberately tricky English to German tasks through it, trying to find a weak spot. And honestly, it’s excellent – it produces idiomatic, native-quality translations that don’t sound clunky. What I found particularly impressive was its ability to paraphrase where a literal translation wouldn’t work.

There are plenty of use cases, even if you’re not looking for a translation engine for a full-blown app. Pocketising it means you have a top-notch multi-language translator to use offline, anywhere. For language learners – particularly those struggling with the lower-resource languages the model can handle with ease – it’s another source of native-quality text to learn from.

Find out more about the model at Hugging Face, and check out last week’s post for details on loading it onto your device!

Ultra-Mobile LLMs: Getting the Most from PocketPal

If you were following along last week, I was deep into the territory of running open, small-scale Large Language Models (LLMs) locally on a laptop in the free LMStudio environment. There are lots of reasons you’d want to run these mini chatbots, including the educational, environmental, and security aspects.

I finished off with a very cursory mention of an even more mobile vehicle for these, PocketPal. This free, open source app (available on Android and iOS) makes it easy (no computer science degree required) to search for, download and run LLMs on smartphones and tablets. And, despite the resource limitations of mobile devices compared with full computer hardware, they run surprisingly well.

PocketPal is such a powerful and unique tool, and definitely worth a spotlight of its own. So, this week, I thought I’d share some tips and tricks I’ve found for smooth running of these language models in your pocket.

Full-Fat LLMs?

First off, even small, compact models can be (as you’d expect) unwieldy and resource-heavy files. Compressed, self-contained models are available as .gguf files from sources like Hugging Face, and they can be colossal. There’s a process you’ll hear mentioned a lot in the AI world called quantisation, which compresses models to varying degrees. Generally speaking, the more compression, the more poorly the model performs. But even the most highly compressed small models can weigh in at 2 GB and above. After downloading them, these mammoth blobs then load into memory, ready to be prompted. That’s a lot of data for your system to be hanging onto!
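
To get a feel for that trade-off between compression and quality, here’s a toy sketch of the idea behind quantisation: squash a set of numbers onto a small grid of allowed values, then measure how far they drift from the originals. It’s a deliberately naive scheme of my own for illustration. The real quantisation methods used for .gguf files work block by block and are a good deal cleverer, and the random numbers here are only a stand-in for actual model weights.

```python
import random


def quantise_roundtrip(weights, bits):
    """Snap each weight to one of 2**bits evenly spaced levels, then map it back."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1)  # gap between neighbouring levels
    return [lo + round((w - lo) / step) * step for w in weights]


def mean_abs_error(original, restored):
    """Average distance between each original value and its restored version."""
    return sum(abs(a - b) for a, b in zip(original, restored)) / len(original)


random.seed(0)  # fixed seed so the run is repeatable
weights = [random.gauss(0, 1) for _ in range(10_000)]  # stand-in for real weights

for bits in (8, 4, 2):
    error = mean_abs_error(weights, quantise_roundtrip(weights, bits))
    print(f"{bits}-bit: mean error {error:.4f}")
```

The fewer bits you allow, the coarser the grid and the bigger the error, which is the toy version of why heavily compressed models tend to give shakier answers.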

That said, with disk space, a good internet connection, and decent RAM, it’s quite doable. On a newish MacBook, I was comfortably downloading and running .gguf files of 8 GB and above in LMStudio. And you don’t need to downgrade your expectations too much to run models in PocketPal, either.

For reference, I’m using a 2023 iPad Pro with the M2 chip – quite a modest spec now – and a 2024 iPhone 16. On both of them, the sweet spot seems to be a .gguf size of around 4 GB – you can go larger, but there’s a noticeable slowdown beyond that. A couple of the models I’ve been getting good, sensible and usable results from on mobile recently are:

  • Qwen3-4b-Instruct (8-bit quantised version) – 4.28 GB
  • Llama-3.2-3B-Instruct (6-bit quantised version) – 3.26 GB

The ‘instruct’ in those model names refers to the fact that they’ve been trained to follow instructions particularly keenly – one of the reasons they give such decent practical prompt responses with a small footprint.

Optimising PocketPal

Once you have them downloaded, there are a couple of things you can tweak in PocketPal to eke out even more performance.

The first is to head to the settings and switch on Metal, Apple’s API for hardware-accelerated graphics and computation. Then, increase the “Layers on GPU” setting to around 80 or so – you can experiment with this to see what your system is happy with. But the performance improvement should be instantaneous, the LLM spitting out tokens at multiple times the default speed.

What’s happening with this change is that iOS is shifting some of the processing from the device’s CPU to the GPU, or graphical processing unit. That may seem odd, but modern graphics chips are capable of intense mathematical operations, and this small switch recruits them into doing some of the heavy work.

Additionally, on some recent devices, switching on “Flash Attention” can bring extra performance enhancements. This interacts with the way LLMs track how much weight to give certain tokens, and how that matrix is stored in memory during generation. It’s pot luck whether it will make a difference, depending on device spec, but I see a little boost.

Tweaking PocketPal’s settings to run LLMs more efficiently

Making Pals – Your Own Custom Bots

When you’re all up and running with your PocketPal LLMs, there’s another great feature you can play with to get very domain-specific results – “Pal” creation. Pals are just system prompts – instructions that set the boundaries and parameters for the conversation – in a nice wrapper. And you can be as specific as you want with them, instructing the LLM to behave as a language learning assistant, a nutrition expert, a habits coach, and such like – with as many rules and output notes as you see fit. It’s an easy way to turn a very generalised tool into something focused and with real-world application.
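
If you’re curious what a system prompt looks like outside a friendly app wrapper, it’s usually just the first message in the conversation, tagged with a “system” role. Here’s a minimal sketch in the chat-message format most LLM tools share. The function name and the example Pal are made up for illustration, and this isn’t PocketPal’s own code.

```python
def build_pal_conversation(pal_instructions: str, user_message: str) -> list[dict]:
    """Pair a Pal's standing instructions with the user's first message."""
    return [
        {"role": "system", "content": pal_instructions},  # the 'Pal' itself
        {"role": "user", "content": user_message},        # what you type in
    ]


# Hypothetical Pal: a Greek tutor with a few house rules.
greek_tutor = (
    "You are a patient Modern Greek tutor for an intermediate learner. "
    "Reply in simple Greek first, then give an English gloss. "
    "Gently correct any mistakes in the learner's Greek."
)

messages = build_pal_conversation(greek_tutor, "How do I order a coffee?")
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

Swap in a different set of instructions and the same model becomes a nutrition expert or a habits coach. The rest of the conversation carries on exactly as before.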

So that’s my PocketPal in-a-nutshell power guide. I hope you can see why it’s worth much more than just a cursory mention at the end of last week’s post! Tools like PocketPal and LMStudio put you right at the centre of LLM development, and I must admit it’s turned me into a models geek – I’m already looking forward to what new open LLMs will be unleashed next.

So what have you set your mobile models doing? Please share your tips and experiences in the comments!

LLMs on Your Laptop

I mentioned last week that I’m spending a lot of time with LLMs recently. I’m poking and prodding them to test their ‘understanding’ (inverted commas necessary there!) of phonology, in particular with non-standard speech and dialects.

And you’d be forgiven for thinking I’m just tapping my prompts into ChatGPT, Claude, Gemini or the other big commercial concerns. Mention AI, and those are the names people come up with. They’re the all-bells-and-whistles web-facing services that get all the public fanfare and newspaper column inches.

The thing is, that’s not all there is to Large Language Models. There’s a whole world of open source (or the slightly less open ‘open weights’) models out there. Some of them are offshoots of those big names, while others are less well-known. But you can download all of them to run offline on any reasonably-specced laptop.

LMStudio – LLMs on your laptop

Meet LMStudio – the multi-platform desktop app that allows you to install and interrogate LLMs locally. It all sounds terribly technical, but at its most basic use – a custom chatbot – you don’t need any special tech skills. Browsing, installing and chatting with models is all done via the tab-based interface. You can do much more with it – the option to run it as a local server is super useful for development and testing – but you don’t have to touch any of that.
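
For the curious, here’s roughly what talking to that local server looks like from Python. It’s a minimal sketch, assuming you’ve started the server in LMStudio on its default address (localhost, port 1234) and that it’s speaking the OpenAI-style chat format. The model name “local-model” is a placeholder of mine, so swap in whichever model you’ve actually loaded. The snippet only builds and prints the request, which means you can run it safely without the server switched on.

```python
import json
import urllib.request

# Assumption: LMStudio's local server is running on its default address.
BASE_URL = "http://localhost:1234/v1"


def build_request(prompt, model="local-model", temperature=0.7):
    """Package a prompt as an OpenAI-style chat completion request."""
    payload = {
        "model": model,  # placeholder name; use the model you have loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt, **kwargs):
    """Send the prompt and return the reply text (needs the server running)."""
    with urllib.request.urlopen(build_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Build (but don't send) a request, just to see what goes over the wire.
request = build_request("Say hello in Norwegian.")
print(request.full_url)
print(request.data.decode("utf-8"))
```

With the server running, calling ask_local_model("Say hello in Norwegian.") should hand back the model’s reply as plain text.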

Many of the models downloadable within LMStudio are small models – just a few gigabytes, rather than the behemoths behind GPT-5 and other headline-grabbing releases. They feature the same architecture as those big-hitters, though. And in many cases, they are trained to approach, even match, their performance on specific tasks like problem-solving or programming. You’ll even find reasoning models that produce a ‘stepwise-thinking’ output, similar to platforms like Gemini.

A few recent models for download include:

  • Qwen3 4B Thinking – a really compact model (just over 2 GB) which supports reasoning by default
  • OpenAI’s gpt-oss-20b – the AI giant’s open weights offering, released this August
  • Gemma 3 – Google’s multimodal model optimised for use on everyday devices
  • Mistral Small 3.2 – the French AI company’s open model, with vision capabilities

So why would you bother, when you can just fire up ChatGPT / Google / Claude in a few browser clicks?

LLMs locally – but why?

Well, from an academic standpoint, you have complete control over these models if you’re exploring their use cases in a particular field, like linguistics or language learning. You can set parameters like temperature, for instance – the degree of ‘creativity wobble’ the LLM has (0 being rigidly predictable, with the output getting wilder the higher you go). And if you can set parameters, you can report these in your findings, which allows others to replicate your experiments and build on your knowledge.
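
To make temperature a bit less mysterious, here’s a toy sketch of what it does to a model’s next-word choices. The candidate words and their scores are entirely made up. A real model juggles scores for tens of thousands of tokens at once, but the sum is the same: divide the scores by the temperature, then turn them into probabilities. Note that a temperature of exactly 0 would mean dividing by zero here. As far as I’m aware, inference tools handle that as a special case and simply pick the top-scoring token every time.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


candidates = ["hei", "hallo", "heisann"]  # made-up next-word options
scores = [2.0, 1.0, 0.1]                  # made-up model scores for each

for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    summary = ", ".join(f"{word} {p:.0%}" for word, p in zip(candidates, probs))
    print(f"temperature {temperature}: {summary}")
```

At low temperatures the favourite hogs nearly all the probability. Turn it up, and the also-rans get more of a look-in, which is where the ‘wobble’ comes from.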

Small models also run on smaller hardware – so you can develop solutions that people don’t need a huge data centre for. If you do hit upon a use case or process that supports researchers, then it’s super easy for colleagues to access the technology, whatever their recourse to funding support.

Secondly, there’s the environmental impact. If the resource greed of colossal data centres is something that worries you (and there’s every indication that it should be a conversation we’re all having), then running LLMs locally allows you to take advantage of them without heating up a server farm somewhere deep inside the US. The only thing running hot will be your laptop fan (it does growl a bit with the larger models – I take that as a sign to give it a rest for a bit!).

And talk of those US server farms leads on to the next point: data privacy. OpenAI recently caused waves with their suggestion that user conversations are not the confidential chats many assume them to be. If you’re not happy with your prompts and queries passing out of your control and into the data banks of a foreign state, then local LLMs offer not a little peace of mind too.

Give it a go!

The best thing? LMStudio is completely free. So download it, give it a spin, and see whether these much smaller-footprint models can give you what you need without entering the ecosystem of the online giants.

Lastly, don’t have a laptop? Well, you can run LLMs locally on phones and tablets too. Free app PocketPal (on iOS and Android) runs like a cut-down version of LMStudio. Great for tinkering on the go!