Speaking dialect in the mountains

AI and Dialect: Can You Nudge a Language Model Off the Standard Path?

One of my quiet worries about the AI revolution in language is standardisation. When a model trained on billions of documents produces text, it tends to sand everything down to some kind of weighted average – grammatically impeccable, register-appropriate, but utterly devoid of local colour. And where’s the fun in that? You know my thoughts on dialect and language learning.

But here’s the thing: despite the odds, the standard grey slop isn’t a mathematical inevitability. Large Language Models are trained on billions of words. And that material will include plenty of non-standard language, too, simply by dint of the extraordinarily broad sweep of the training brush. Dialect literature, forum posts, subtitles, transcripts – there’s room for non-standard, more colloquial language in every context of the model’s background.

Which raises an interesting question: can we nudge AI output away from the standard, and towards something more authentically regional?

I’ve been playing around with exactly this, and the short answer is: yes – and it’s surprisingly decent at it.

Stepping off the well-trodden path

I’ve said from the start that AI is built for language learning content creation. It’s one of the few fields where the factual accuracy of the content doesn’t matter too much, as long as the language is correct and authentic-sounding. One of my go-to moves is pushing the LLM for a more colloquial register. Standard written language is fine for reading comprehension, but if you’re teaching people how to actually speak, you want contractions, ellipsis, informal vocabulary – the stuff that sounds like a real person. AI handles this pretty well once you prompt for it explicitly.

Create a model German dialogue on the topic “Reality TV”. The language should be contemporary, colloquial and natural-sounding German.

Claude did a pretty good job of this:

  • Lena: Ey, hast du letzte Woche „Love Island” geschaut? Ich konnte gar nicht aufhören – ich war bis halb zwei wach.
  • Kai: Natürlich hab ich das geschaut – wär ich ja bescheuert, das zu verpassen. Diese Tiara und Marco… ich meine, komm schon. Die spielen da nur rum.
  • Lena: Genau! Marco tut total so, als wäre er an ihr interessiert, aber das ist doch alles für die Kamera. Ich find das so zum Fremdschämen.
  • Kai: Ja, aber genau deswegen schaust du’s doch! Gib’s zu. Du liebst den Drama.
  • Lena: Okay, okay, ich geb’s zu. Aber ich find’s trotzdem fragwürdig, wie die Leute da so bloßgestellt werden. Manche wirken echt überfordert.
  • Kai: Absolut. Ich glaub, vielen ist nicht bewusst, wie sehr die Produktion das alles lenkt. Die schneiden das so zusammen, dass du denkst, die Person ist voll die Zicke – aber vielleicht war das gar nicht so gemeint.
  • Lena: Stimmt schon. Und dann wird die Person im Netz fertiggemacht. Ich hab mal gelesen, dass einige danach richtig psychische Probleme hatten.
  • Kai: Das ist so traurig. Und trotzdem melden sich jedes Jahr tausende Leute an. Ich versteh’s irgendwie – du willst berühmt werden, schnelles Geld, Follower… Der Lockruf ist halt stark.
  • Lena: Stimmt. Und seien wir ehrlich – wir beide gucken’s ja auch. Also sind wir Teil des Problems, oder?
  • Kai: Pff… ich würd sagen: wir sind Teil des Unterhaltsystems. Klingt besser. Jetzt mach die nächste Folge an.

Some great non-standard, everyday features in there. Colloquial contractions like -e dropping on verbs, decent use of particles like halt – not a bad job at all. Claude even threw in a bunch of usage notes without asking, too.

But colloquial is still, broadly, floating just beneath a standard umbrella. What if we go further?

Adding Local Flavour

When I’m probing the edges of what AI can do, I like to start with varieties I can actually judge. Austria is my German-speaking region of choice in this regard, being my year abroad destination of old – I still have a huge soft spot for Kärntnerisch. So here’s the tweak:

Create a model German dialogue on the topic “Reality TV” between two young Austrian speakers from Kärnten. The language should be contemporary, colloquial and natural-sounding Austrian German.

Here’s what came back:

  • Magdi: Oida, hast du gestern „Love Island” g’schaut? I kann nimma! Die Lea hat den Stefan komplett ausg’spielt, echt a Wahnsinn.
  • Tobi: Na freilich, i schau des jo seit Wochn! Aber geh, die Lea, die tuat doch nur so. Die weiß ganz genau, wos sie macht — des is alles g’spield.
  • Magdi: Ja eh, ober trotzdem — der Stefan is so a Trottel! Wie kann man so blöd sein, gell? Der glaubt ihr jo alles.
  • Tobi: Najo, i mein, de san jo auch so a bissi animiert zum Streiten, oder? Die Produzenten woin das jo so. Ohne Zoff läuft des ned.
  • Magdi: Des stimmt scho, ober i find des trotzdem scheißn. Die tean so, ois ob’s echt wär, und dann geht’s nur ums Follower kriegen. Danoch san alle auf Instagram mit Werbung zuagekleistert.
  • Tobi: Haha, stimmt! Drei Wochen nach der Sendung verkaufen’s alle irgendwelche Protein-Shakes. Gott sei Dank ned mia. [lacht]
  • Magdi: Ober weißt wos? I schau’s trotzdem weiter. I bin schon voll drin, i muss wissen, ob der Kevin die Kathi kriegt oder ned.
  • Tobi: Na sag i dir: die kriegen sich ned. Die Kathi is viel zu schlau für ihn. Die geht da raus, macht a paar Storys, und dann war’s des.
  • Magdi: Oida, du bist so a Zyniker! Vielleicht is es jo echt?
  • Tobi: Magdi. Des is Reality TV. Des is so echt wie mei Oma beim Wrestling. [grinst]
  • Magdi: Haha, okay okay. Ober schauen tust’s trotzdem, gell?
  • Tobi: … Ja. Natürli. Um neun bin i dabei. Sag’s ned weiter.

If you’ve spent any time in Austria, this is probably just as joyful to you as it is to me. It’s added in so much southern colour, from the tag particle ‘gell’ to the reduced i from ich and past participles in g’. There’s even some fun wordplay in there (as real as my gran at wrestling!), so it’s not only dialect, but character. It’s laid it on thick!

There’s an element of pastiche to this, for sure. Nobody will claim this is indistinguishable from an authentic speaker from Klagenfurt. But it’s genuinely not bad, especially if you’re after a solid introduction to local speech features.

And then there’s Trøndersk…

That’s all very well for ‘mainstream’ foreign languages like German. To push the envelope a little more, it’s worth playing around with some lesser-learnt languages – Norwegian is always a good choice.

For my Norwegian test, I went with Trøndersk, the dialect cluster associated with the Trondheim region and much of central Norway. It’s one of the most immediately recognisable Norwegian dialects, with distinctive pitch accent patterns, characteristic -a verb endings, and a few phonological quirks that mark it out clearly from the Oslo Bokmål you’ll meet in textbooks.

How good a job would Claude do of this challenge?

  • Marte: Oi, har du sett at det e ny sesong av Farmen nå? Æ begynte å se på den i går kveld og klarte itj å slutt.
  • Jonas: Åja? Æ har itte sett mye reality-greier på ei stund, men Farmen e jo noe anna. Det e liksom ordentlig underholdning, da.
  • Marte: Ja, det e noko med den serien! Og den nye dama fra Steinkjer – ho e jo helt vill. Ho klaga på absolutt alt og alle og gjor ingenting sjæl.
  • Jonas: Haha, klassisk. Det e alltid nåkka en sånn type på realitygreier. Produsenten e nødt til å ha med dem, skjønner du? Ellers e det jo kjedelig.
  • Marte: Jaja, men det e jo litt ekkelt å innrømme at man liker det, liksom. Æ satt der og ropte på skjermen som et idiot.
  • Jonas: Haha! Ka sa du da?
  • Marte: Noe sånt som «koffor gjør’u det, din tulling!» Men mora mi satt rett ved sia og hun så bare på mæ som om æ var gal.
  • Jonas: Uff da, haha. Ka syns ho om programmet da?
  • Marte: Ho e heilt avhengig sjæl! Det e liksom heime hos oss at hele familien samles rundt skjermen på tirsdager. Det har vorre sånn siden æ va lita.
  • Jonas: Okei men ka med all den andre reality-søpla da? Eks på stranda og sånt? Det e jo bare tull, syns æ.
  • Marte: Ja nei, det gidder itj æ. Det e jo bare folk som vil bli kjendis og som itte har nåkka å by på. Farmen e annleis fordi det faktisk krev noe.
  • Jonas: Sant nok. Kameraten min i Ålesund e helt hekta på Paradise Hotel derimot. Han ser på det heile tida.
  • Marte: Seriøst? Koss klare han det? Æ holdt ut to episoder og ga opp. Alle folkan der oppe e jo helt… æ veit itte. Tomme, liksom.
  • Jonas: Haha, han sier det e «sosialt eksperiment». Æ tror bare han har lyst på drama heime fra sofaen sjæl.
  • Marte: Hahaha! Ja det e nok det. Uansett — skal du se Farmen med mæ på fredag? Vi kan bestille pizza og le av alle.
  • Jonas: Ja, det høres bra ut! Men du bestemme pizza-topping. Sist gang valde du masse squash og det va itte greit.
  • Marte: Oi, squash e jo godt da! Men okei, du får velge. Bare itte ananas.
  • Jonas: Deal. Vi ses fredag!

As far as I can tell – and I’m not so well placed here, I must admit – it’s done a solid job. The k- question words (rather than hv-), the reduced æ for jeg… Even the cultural references, selecting two actual Norwegian shows for the exchange. I’d trust this as a bit of pre-Trondheim trip prep!

That said, if you speak a Trøndersk dialect, what’s the verdict? Does this read as a reasonable attempt, or is it giving you second-hand embarrassment? I’d genuinely love to know. (Drop a comment below, or find me wherever I am on the internet this week.)

AI and Dialect for Language Learning

AI platforms clearly have some representation of dialect. Awareness is the wrong word for it – anthropomorphising LLMs and all that – but you know what I mean. The patterns are in the training data; the question is whether a prompt can reliably surface them, or whether the model’s default pull towards the standard is too strong.

The two languages I explored here seem to work well. The pull to parody is always something to watch, particularly when written material in a dialect tends to be humorous, rather than in-community. And rarer or more exclusively spoken varieties will no doubt be much more troublesome. But the more linguistic scaffolding you give the model – the more you treat prompting as a kind of sociolinguistic briefing – the better the output is likely to get.
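To make that ‘sociolinguistic briefing’ idea a bit more concrete, here’s a minimal sketch of a prompt builder in Python. It simply wraps the kind of prompts I used above and optionally bolts on a list of features you’d like the model to aim for. The function name, its parameters and the feature hints are all my own illustrative assumptions, not part of any AI platform’s API, and it doesn’t call a model at all: you’d paste the result into your chat tool of choice.

```python
def build_dialect_prompt(language, topic, region=None, features=()):
    """Assemble a dialogue prompt, optionally adding a region and feature hints.

    Hypothetical helper for illustration only: it builds a string and
    does not talk to any AI service.
    """
    if region:
        speakers = f" between two young speakers from {region}"
        variety = f"{language} as spoken in {region}"
    else:
        speakers = ""
        variety = language

    lines = [
        f'Create a model {language} dialogue on the topic "{topic}"{speakers}.',
        f"The language should be contemporary, colloquial and natural-sounding {variety}.",
    ]
    # The extra "scaffolding": explicit features the output should include.
    if features:
        lines.append(
            "Where natural, include features such as: " + "; ".join(features) + "."
        )
    return "\n".join(lines)


# Example briefing, using features mentioned in the Austrian dialogue above.
prompt = build_dialect_prompt(
    "German",
    "Reality TV",
    region="Kärnten",
    features=[
        "the tag particle 'gell'",
        "reduced 'i' for 'ich'",
        "past participles in g'",
    ],
)
print(prompt)
```

Leaving out the region and the features gives you the plain colloquial prompt again, so you can compare the two outputs side by side and see how much the extra briefing really changes.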

That standard grey slop, it seems, has more texture than it first appears – you just have to know where to poke.

Intonation adds a thousand different colours to speech. Coloured glass. Image by Simon Jackson on FreeImages.com

Intonation Training: From Yam-Yam to Yia Sou

When you meet me, one of the first things you notice is probably my accent. Despite being embedded in Scottish life for over a decade, there’s still an unmistakeable Midlands lilt that persists. The vowels have flattened out to something a little more neutral over the years, it’s true. But it’s in my intonation that you can still hear the imprint of my roots.

Midlands accents get a bad rap. Full-on Brummie, for instance, still battles to be taken seriously after years of parodies and comedy sketches. And the baggage that people attach to your variety of speech can weigh you down. That pressure is one reason many of us subconsciously begin to change our distinctive sounds when we move away from our home regions.

One thing has proven extremely resistant, though – that characteristic rise and fall, up-and-down, sing-song intonation of my West Midlands English. In the Black Country, where I grew up, that particularly strong swinging tone has given us some national fame as yam-yams (most probably from the local form “ya’m” for “you are”). The almost musical nature of it is something it has in common with certain varieties of Welsh English.

But as endearing as it can be to us locals, it can play havoc with your foreign language learning.

Intonation and Learning Foreign Languages

The reason is the same phonological interplay that anchors our foreign language speech to our native phonology. Just as much as our vowel shapes and consonant articulation, intonation is highly ingrained in our oral muscle memory.

The unwelcome interference stuck out like a sore thumb in my recent learning on the mass sentence training platform Glossika, which I’ve been using to improve my fluency in a couple of language projects. The great thing about this platform is the chance to compare your own pronunciation with native speakers’ renditions. But be prepared: it can be very revealing. I realised that my intonation in Greek – especially in questions – was completely off.

What was going on?

Well, it all comes down to my deeply rooted Midlands twang. The tendency I carry over from my own native accent is to go up at the end of a sentence. That’s not just in questions, either. If you listen to Midlands English, you might well notice that our intonation rises at the end of nearly every sentence!

Not so with Greek. Often, the intonation will fall after rising towards the end of a yes-no question. It’s a bit more complex than that, of course, and there is much more detail in studies like this one if you need the nitty gritty. But generally, it is quite a bit different from English (especially mine).

Training It Out

The solution, of course, is more of the tool that shed light on the problem. Plenty of reps later on Glossika, and my question intonation is starting to improve considerably.

Repetition is the key, here. And if you don’t have access to Glossika, it’s not difficult to make your own DIY solution using the mass sentence technique. First of all, you need to source neatly chunked model sentences in audio format. This can be surprisingly easy to come across. Many phrase books, for example, come with an accompanying CD or with MP3 download links. Often, this material is available for download without even buying the book. Audio support for German publisher PONS’ mini courses, like this Croatian introductory text, is one such freely available resource. Multilingual sentence repository Tatoeba also includes many native recordings for its entries.

Once located, you can organise the material as a playlist in the player app of your choice. Having them loop round on a reel isn’t far off doing audio-only reps with a rep tool like Glossika. While it won’t quite follow the very effective, high-frequency high-representation corpus method of that site, it isn’t a bad substitute to give the technique a try in working on your intonation. There’s a plus side to phrase books, too; they tend to include lots of questions, which is ideal if you also struggle with that particular aspect.
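If you’d rather not drag files around by hand, the playlist step can be scripted. Below is a rough sketch in Python that collects the MP3 clips in a folder and writes them out as a plain M3U playlist, repeating the whole set a few times in a fresh shuffled order each round. The function name, the folder layout and the number of rounds are assumptions of mine for illustration; it also assumes your clips are already one sentence per file, and that your player app opens M3U playlists.

```python
import random
from pathlib import Path


def build_rep_playlist(audio_dir, playlist_path, reps=3, seed=None):
    """Write an M3U playlist that plays every clip `reps` times.

    Hypothetical helper: each round contains every clip once, in a
    freshly shuffled order. Returns the number of playlist entries.
    """
    clips = sorted(Path(audio_dir).glob("*.mp3"))
    rng = random.Random(seed)

    entries = []
    for _ in range(reps):
        round_clips = clips[:]   # copy, so the sorted list stays intact
        rng.shuffle(round_clips)
        entries.extend(round_clips)

    # A plain M3U playlist is just a text file with one file path per line.
    Path(playlist_path).write_text(
        "".join(f"{clip.resolve()}\n" for clip in entries),
        encoding="utf-8",
    )
    return len(entries)
```

Calling something like `build_rep_playlist("greek_phrases", "greek_reps.m3u", reps=5)` (both names made up here) would give you five shuffled passes through the folder in a single playlist. Shuffling each round is just one design choice; keeping the original order works equally well if you prefer the phrase book’s own sequence.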

Bit by bit, my up-and-downy Midlands intonation is disappearing from my Greek. It’s a lot less yam-yam, and a lot more yia sou. As for my English? I’m old and wise enough now to stand up for my accent. I’ll carry that intonation with pride – as long as it leaves my other languages alone!

Shipwrecks in Scotland (from freeimages.com). Perhaps Doric was spoken aboard these vessels?

Doric Scots: Treasure Trove of Nordic Gems

As language learners, we often focus on cultures that are far-flung. With our eyes and ears fixed on the far away, any richness around us can end up playing second fiddle. But occasionally, when you take a moment to pause, you realise the beautiful relevance of the local to your learning. So it is with Doric Scots and my journey with learning Icelandic and Norwegian.

Doric Scots

Doric is the dialect of Scots that is typical of Northeast Scotland, particularly Aberdeen and the surrounding fishing towns and villages. It boasts a very particular vocabulary of its own, which differs a fair bit from the Scots heard elsewhere in the country.

Although based in Edinburgh when I’m here, I’m lucky to be surrounded by friends and family who speak this colourful, unique and linguistically intriguing variety as their home tongue.

Scandi-Scots

The most curious thing is its substantial overlap in vocabulary with North Germanic languages. As a student of Norwegian and Icelandic, I find it constantly throws up nice surprises. Now and again friends will use a word that is unfamiliar in English. However, there is often more than a slight chance that it has a cognate somewhere in Scandinavia.

It’s certainly true that some of this North Germanic vocabulary is well attested throughout Scotland. Bairn (child) and kirk (church) are two that even south-of-the-border anglophones will recognise.

That said, Doric adds a whole raft of other northern terms like thole (bear, stand) and muckle (much, lots) that give the dialect a special Nordic twist.

Routes and roots

How they ended up in Doric, but lost to the rest of English (and even Scots), is unclear. Perhaps they were brought here by Viking invaders who assimilated into the local culture many hundreds of years ago. Maybe they travelled here by more peaceful routes via visiting sailors, fishermen and traders. Then again, maybe they were more widespread, longer ago – perhaps standard English used to have these terms, and has since lost them.

Not knowing for certain lends these special words a delicious mystery. Words are stories, histories, and trying to fathom their beginnings is a unique delight of etymology.

It’s also worth pointing out, along the way, that there once lived a full-blooded, bona fide North Germanic language on Scottish soil: Norn, a language close to Faroese and Icelandic, which flourished until relatively recent times on the northern isles. Little surprise, then, that the language group still has such a presence in some modern-day varieties of Scots.

Memory tricks

But beyond the delightful surprises, could these similarities have a more practical purpose?

Spotting links between the local and the far away object of study can be a huge support when it comes to memorising vocabulary. It assists in creating memory hooks – multiple points of reference that pin a new word into the neural net of your brain. Rather than a single pair of points – English and Icelandic – you can now create a memory that is fixed by a third point, the Doric translation. Noting that gráta (to weep) corresponds to Doric / Scots greet holds that entry much faster in memory.

Examples

Now, I am a backseat etymologist. The list below is not based on extensive research of mine, but on frequent questioning of ever-patient friends and extensive excursions on Wiktionary. As such, here is a list of some touchpoints I’ve spotted between Doric, general Scots and North Germanic languages. It is far from complete or exhaustive, but shows some nice crossovers between Doric, Icelandic and Norwegian.

I have checked these entries with handy Doric-speaking friends, as well as the brief but brilliant Doric word list here. My conclusions proceed from superficial observations (and lots of fun trying to spot patterns), so please let me know in the comments if you know a different etymology, or reason for the overlap.

Doric / Scots terms with Nordic analogues

  • bairn : child
    🇮🇸🇳🇴barn
  • bide : wait / stay
    🇮🇸bíða (‘stay’ in Doric Scots – archaic English sense of ‘wait’ matches Icelandic bíða)
  • breeks : trousers
    🇮🇸buxur 🇳🇴bukse – a word the rest of English has all but lost (although you can still hear britches / breeches in old cowboy films!)
  • claes : clothes
    🇮🇸klæði (cloth – the more usual Icelandic term for clothes is föt) 🇳🇴klær
  • ee / een : eye / eyes
    The plural in -n is remarkably similar to the Norwegian øyne (eyes)
  • fit / far : what / where
    The interesting thing here is not that the words have cognates in Doric – after all, the Standard English what / where come from the same root. What is interesting is that the Doric retains an initial fricative sound, just like the Nordic counterparts 🇮🇸hvað / hvar 🇳🇴hva / hvor
  • ging : go
    🇮🇸ganga (walk) – the Doric retains the Germanic -ng- that the shortened Standard English root has lost
  • greet : cry, weep
    🇮🇸gráta 🇳🇴gråte
  • het : hot
    Still close phonetically to the Standard English hot, although the different vowel echoes the Icelandic heitt
  • hoast : cough
    🇮🇸hósta 🇳🇴hoste (and also, husten in German!)
  • mate : food
    🇮🇸matur 🇳🇴mat
  • muckle : much
    🇮🇸mikill
  • oxter : armpit
    🇮🇸öxl (although this means ‘shoulder’ in Icelandic!)
  • quine : woman
    🇮🇸kona  🇳🇴kvinne – also note that Standard English has a cognate in the word queen
  • smit : infect
    🇮🇸smita 🇳🇴smitte (and of course, the Standard English word smitten in a more figurative sense)
  • thole : bear, stand
    🇮🇸þola
  • tint : lost
    🇮🇸týnt (it is not clear whether Doric only retains the past participle, or also an equivalent to the infinitive týna – to lose – too)
  • tow : rope
    🇳🇴tau
  • teem : empty
    🇮🇸tómur 🇳🇴tom

Much as we can do this with Doric Scots and Nordic languages, you can scout English for other traces of history that can help your learning adventure. Greek, Latin and more have made their mark in similar ways. As well as memory aids, the payoff is a deeper, richer understanding of the language you call your own mother tongue.

Often, learning a foreign language can teach you much about the lesser-spotted intricacies of your own – particularly the twists and turns of its pathways through social geography and history.

Dialektboka (The Dialect Book) from Norway

Dialect deviants? Celebrating linguistic diversity

Spoiler alert: the language you’re learning probably isn’t the language people are speaking. Thanks to dialect, you might be surprised when you chat with your first native speaker.

If you’re not prepared for it, the surprise can be disconcerting at best, and demoralising at worst. I remember the first time I tried out my fresh, pristine, textbook Norwegian in Bergen. I marched up to the tourist information desk, and enunciated my request for a map with all the precision I could muster. And the answer? Gobbledegook. Nothing like my Norwegian learning CDs back home. Was that really Norwegian? Or was I really that bad at learning languages?

OK, I was naïve back then! But dialect can still pose an issue for anyone hoping to get a functional, everyday knowledge of a foreign language.

Golden standard

When you learn a foreign language from a textbook, you’ll be learning a standardised form. This will be some general, accepted form of the language, often prescribed by an official language body in the country of origin. Some of these organisations have remarkable pedigrees; the Académie Française has been looking after the French language since 1635, for example. Spain’s Real Academia Española has been around since 1713. Sometimes, publishers or private companies will become semi-official language keepers, like Germany’s Duden, or the UK’s Oxford English Dictionary.

These lofty institutes (a full list can be found here) are custodians of the ‘dictionary’ forms of language. Consequently, it’s these forms that we’ll find in textbooks as foreign learners, and for good reason; native speakers use language in such varied ways, it would be impractical to learn every manner of speaking from every region. But out in the field, it’s everyday, spoken, dialectal forms that can add a lot of colour to your language experience.

Norwegian dialects: Extreme sport

If you know Norway, you might well consider people like me slightly masochistic. Norway is a pretty extreme example of dialect diversity. In fact, there is so much linguistic diversity in Norway that there are two official standard forms: bokmål and nynorsk. The interplay between the two gives rise to the great language controversy that continues to play out across the country today.

However, accessing this diversity is gaining an insight into something very close to Norwegian hearts. I recently happened upon a book in Oslo that I just had to buy. In fact, it’s not just a book. It has a big, whopping MP3 player attached to it. Dialektboka (The Dialect Book) is a compendium of Norwegian dialects to read about and listen to! It’s pretty amazing:

Dialektboka (The Dialect Book) from Norway

What grabbed me particularly was this line from the introduction:

Vi nordmenn er stolte av dialekten vår.
We Norwegians are proud of our dialect.

Look at that: proud. Dialect isn’t just something that makes learning Norwegian a bit tricky. It’s actually something that makes Norway Norway. A source of national pride. So you might not understand everything straight away. But you can enjoy something that is as much a part of Norway as reindeer and hurtigruten: marvelling at how rich the country’s linguistic landscape is.

Celebrate diversity

One of the greatest things about this book is its celebration of all dialects. This is something Norway does very well, where other countries can sometimes stigmatise dialect as ‘substandard’. When I compare this to the situation of my native language, British English, I’m a little ashamed; recent studies suggest a continued prejudice towards certain dialects and regional accents. Even qualifying accents with the seemingly innocuous term ‘non-standard’ hides a snootiness that places them outside some prestige ‘norm’. Can’t we all be more like Norway, please?

Dialect for the learner

So, dialect is a key to richness and diversity in your chosen language’s culture. You needn’t view it as an obstacle, but rather an amazing opportunity. The first engagement as a learner should be to acknowledge that dialects exist, and to expect diversity from your very first interactions. There are a couple of things you can do to maximise your enjoyment, though.

Prepare yourself

Research the linguistic topography through Internet searches. Simply starting with ‘German dialects’ in Google, for example, leads to a wealth of material.

Interrogate your textbooks

Check the intro – does it say which variety of the language you are learning? Does it give information about alternative forms that aren’t included? Welsh, for example, comes in two standards, like Norwegian. Which one are you learning? Be aware.

Expose yourself!

Aim to soak up as much contemporary language as possible. You don’t need to be in the target language country for this. Mine online TV channels and podcasts for examples of real speech. National broadcasters are good places to start; the Norwegian state broadcaster NRK has a wealth of podcasts available, for example.

Reap the rewards

If you can cope with a relatively obscure rural dialect that differs a great deal from the standard you are learning, then you have something to celebrate! Dialect comprehension shows that you’re starting to gain a very deep, active understanding of the language. Like native speakers, you’re able to hear unfamiliar words and make educated guesses at meaning.

Being able to pick out dialects can give you so much more cultural access to your target language country, too. There’s a delicious satisfaction when you hear a dialect and can place where the person is (probably) from.

Look beyond your standardised textbooks, and be prepared for colour, richness and diversity in your language learning experience. Most of all: enjoy it.