Speaking dialect in the mountains

AI and Dialect: Can You Nudge a Language Model Off the Standard Path?

One of my quiet worries about the AI revolution in language is standardisation. When a model trained on billions of documents produces text, it tends to sand everything down to some kind of weighted average – grammatically impeccable, register-appropriate, but utterly devoid of local colour. And where’s the fun in that? You know my thoughts on dialect and language learning.

But here’s the thing: despite the odds, that standard grey slop isn’t a mathematical inevitability. Large Language Models are trained on billions of words, and simply by dint of the extraordinarily broad sweep of the training brush, that material will include plenty of non-standard language too. Dialect literature, forum posts, subtitles, transcripts: non-standard, colloquial language is scattered right across the model’s training data.

Which raises an interesting question: can we nudge AI output away from the standard, and towards something more authentically regional?

I’ve been playing around with exactly this, and the short answer is: yes – and it’s surprisingly decent at it.

Stepping off the well-trodden path

I’ve said from the start that AI is tailor-made for creating language learning content. It’s one of the few fields where the accuracy of the content itself doesn’t matter too much, as long as the language is correct and authentic-sounding. One of my go-to moves is pushing the LLM towards a more colloquial register. Standard written language is fine for reading comprehension, but if you’re teaching people how to actually speak, you want contractions, ellipsis, informal vocabulary – the stuff that sounds like a real person. AI handles this pretty well once you prompt for it explicitly.

Create a model German dialogue on the topic “Reality TV”. The language should be contemporary, colloquial and natural-sounding German.

Claude did a pretty good job of this:

  • Lena: Ey, hast du letzte Woche „Love Island” geschaut? Ich konnte gar nicht aufhören – ich war bis halb zwei wach.
  • Kai: Natürlich hab ich das geschaut – wär ich ja bescheuert, das zu verpassen. Diese Tiara und Marco… ich meine, komm schon. Die spielen da nur rum.
  • Lena: Genau! Marco tut total so, als wäre er an ihr interessiert, aber das ist doch alles für die Kamera. Ich find das so zum Fremdschämen.
  • Kai: Ja, aber genau deswegen schaust du’s doch! Gib’s zu. Du liebst den Drama.
  • Lena: Okay, okay, ich geb’s zu. Aber ich find’s trotzdem fragwürdig, wie die Leute da so bloßgestellt werden. Manche wirken echt überfordert.
  • Kai: Absolut. Ich glaub, vielen ist nicht bewusst, wie sehr die Produktion das alles lenkt. Die schneiden das so zusammen, dass du denkst, die Person ist voll die Zicke – aber vielleicht war das gar nicht so gemeint.
  • Lena: Stimmt schon. Und dann wird die Person im Netz fertiggemacht. Ich hab mal gelesen, dass einige danach richtig psychische Probleme hatten.
  • Kai: Das ist so traurig. Und trotzdem melden sich jedes Jahr tausende Leute an. Ich versteh’s irgendwie – du willst berühmt werden, schnelles Geld, Follower… Der Lockruf ist halt stark.
  • Lena: Stimmt. Und seien wir ehrlich – wir beide gucken’s ja auch. Also sind wir Teil des Problems, oder?
  • Kai: Pff… ich würd sagen: wir sind Teil des Unterhaltsystems. Klingt besser. Jetzt mach die nächste Folge an.

Some great non-standard, everyday features in there: colloquial contractions such as the dropped -e on verbs (hab, find, glaub), and decent use of particles like halt. Not a bad job at all. Claude even threw in a bunch of usage notes without being asked.

But colloquial German still sits, broadly, under the standard umbrella. What if we go further?

Adding Local Flavour

When I’m probing the edges of what AI can do, I like to start with varieties I can actually judge. Austria is my German-speaking region of choice here, having been my year-abroad destination back in the day, and I still have a huge soft spot for Kärntnerisch. So here’s the tweak:

Create a model German dialogue on the topic “Reality TV” between two young Austrian speakers from Kärnten. The language should be contemporary, colloquial and natural-sounding Austrian German.

Here’s what came back:

  • Magdi: Oida, hast du gestern „Love Island” g’schaut? I kann nimma! Die Lea hat den Stefan komplett ausg’spielt, echt a Wahnsinn.
  • Tobi: Na freilich, i schau des jo seit Wochn! Aber geh, die Lea, die tuat doch nur so. Die weiß ganz genau, wos sie macht — des is alles g’spield.
  • Magdi: Ja eh, ober trotzdem — der Stefan is so a Trottel! Wie kann man so blöd sein, gell? Der glaubt ihr jo alles.
  • Tobi: Najo, i mein, de san jo auch so a bissi animiert zum Streiten, oder? Die Produzenten woin das jo so. Ohne Zoff läuft des ned.
  • Magdi: Des stimmt scho, ober i find des trotzdem scheißn. Die tean so, ois ob’s echt wär, und dann geht’s nur ums Follower kriegen. Danoch san alle auf Instagram mit Werbung zuagekleistert.
  • Tobi: Haha, stimmt! Drei Wochen nach der Sendung verkaufen’s alle irgendwelche Protein-Shakes. Gott sei Dank ned mia. [lacht]
  • Magdi: Ober weißt wos? I schau’s trotzdem weiter. I bin schon voll drin, i muss wissen, ob der Kevin die Kathi kriegt oder ned.
  • Tobi: Na sag i dir: die kriegen sich ned. Die Kathi is viel zu schlau für ihn. Die geht da raus, macht a paar Storys, und dann war’s des.
  • Magdi: Oida, du bist so a Zyniker! Vielleicht is es jo echt?
  • Tobi: Magdi. Des is Reality TV. Des is so echt wie mei Oma beim Wrestling. [grinst]
  • Magdi: Haha, okay okay. Ober schauen tust’s trotzdem, gell?
  • Tobi: … Ja. Natürli. Um neun bin i dabei. Sag’s ned weiter.

If you’ve spent any time in Austria, this is probably just as joyful for you as it is for me. It’s added so much southern colour, from the tag particle ‘gell’ to the reduced i for ich and the past participles in g’ (g’schaut, ausg’spielt). There’s even a decent joke in there (as real as my gran at wrestling!), so it’s not only dialect, but character. It’s laid it on thick!

There’s an element of pastiche to this, for sure. Nobody would claim it’s indistinguishable from a real Klagenfurt local. But it’s genuinely not bad, especially if you’re after a solid introduction to local speech features.

And then there’s Trøndersk…

That’s all very well for ‘mainstream’ foreign languages like German. To push the envelope a little further, it’s worth trying some less widely learnt languages – Norwegian is always a good choice.

For my Norwegian test, I went with Trøndersk, the dialect cluster associated with the Trondheim region and much of central Norway. It’s one of the most immediately recognisable Norwegian dialects, with distinctive pitch accent patterns, characteristic -a verb endings, and a few phonological quirks that mark it out clearly from the Oslo Bokmål you’ll meet in textbooks.

How good a job would Claude do of this challenge?

  • Marte: Oi, har du sett at det e ny sesong av Farmen nå? Æ begynte å se på den i går kveld og klarte itj å slutt.
  • Jonas: Åja? Æ har itte sett mye reality-greier på ei stund, men Farmen e jo noe anna. Det e liksom ordentlig underholdning, da.
  • Marte: Ja, det e noko med den serien! Og den nye dama fra Steinkjer – ho e jo helt vill. Ho klaga på absolutt alt og alle og gjor ingenting sjæl.
  • Jonas: Haha, klassisk. Det e alltid nåkka en sånn type på realitygreier. Produsenten e nødt til å ha med dem, skjønner du? Ellers e det jo kjedelig.
  • Marte: Jaja, men det e jo litt ekkelt å innrømme at man liker det, liksom. Æ satt der og ropte på skjermen som et idiot.
  • Jonas: Haha! Ka sa du da?
  • Marte: Noe sånt som «koffor gjør’u det, din tulling!» Men mora mi satt rett ved sia og hun så bare på mæ som om æ var gal.
  • Jonas: Uff da, haha. Ka syns ho om programmet da?
  • Marte: Ho e heilt avhengig sjæl! Det e liksom heime hos oss at hele familien samles rundt skjermen på tirsdager. Det har vorre sånn siden æ va lita.
  • Jonas: Okei men ka med all den andre reality-søpla da? Eks på stranda og sånt? Det e jo bare tull, syns æ.
  • Marte: Ja nei, det gidder itj æ. Det e jo bare folk som vil bli kjendis og som itte har nåkka å by på. Farmen e annleis fordi det faktisk krev noe.
  • Jonas: Sant nok. Kameraten min i Ålesund e helt hekta på Paradise Hotel derimot. Han ser på det heile tida.
  • Marte: Seriøst? Koss klare han det? Æ holdt ut to episoder og ga opp. Alle folkan der oppe e jo helt… æ veit itte. Tomme, liksom.
  • Jonas: Haha, han sier det e «sosialt eksperiment». Æ tror bare han har lyst på drama heime fra sofaen sjæl.
  • Marte: Hahaha! Ja det e nok det. Uansett — skal du se Farmen med mæ på fredag? Vi kan bestille pizza og le av alle.
  • Jonas: Ja, det høres bra ut! Men du bestemme pizza-topping. Sist gang valde du masse squash og det va itte greit.
  • Marte: Oi, squash e jo godt da! Men okei, du får velge. Bare itte ananas.
  • Jonas: Deal. Vi ses fredag!

As far as I can tell – and I must admit I’m not so well placed to judge here – it’s done a solid job. There are the k- question words (ka, koss, koffor rather than hva, hvordan, hvorfor), æ for jeg, itj for ikke… It even got the cultural references right, picking real Norwegian shows for the exchange. I’d trust this as a bit of pre-Trondheim trip prep!

That said, if you speak a Trøndersk dialect, what’s the verdict? Does this read as a reasonable attempt, or is it giving you second-hand embarrassment? I’d genuinely love to know. (Drop a comment below, or find me wherever I am on the internet this week.)

AI and Dialect for Language Learning

AI platforms clearly have some representation of dialect. Awareness is the wrong word for it – anthropomorphising LLMs and all that – but you know what I mean. The patterns are in the training data; the question is whether a prompt can reliably surface them, or whether the model’s default pull towards the standard is too strong.

Both languages I tried here seem to work well. The pull towards parody is always something to watch, particularly where much of the written material in a dialect is humorous rather than written from within the community. And rarer or more exclusively spoken varieties will no doubt be much more troublesome. But the more linguistic scaffolding you give the model – the more you treat prompting as a kind of sociolinguistic briefing – the better the output is likely to be.

That standard grey slop, it seems, has more texture than first appears – you just have to know where to poke.
