Up the etymology garden path with ChatGPT

This week’s story starts with an instinct. I’ve been learning Swedish, which, as a Norwegian speaker, has advantages and disadvantages. One downside is the need to fight the assumption that the vocabulary of each matches up exactly with an identical etymology, when this is so often patently untrue.

In fact, Norwegian and Swedish have walked separate paths long enough for all sorts of things to happen to their individual vocabularies. For instance, take trist and ledsen, meaning sad in Norwegian and Swedish respectively. Adding ledsen to my list of Swedish differences (I’m using my Swedish Anki deck just for the differing words), I started wondering about the etymology of both. Norwegian trist, clearly, I thought, is a French borrowing, probably via Danish. On the other hand, ledsen looks like it was inherited from the North Germanic parent language.

ChatGPT Etymology

Since I’m exploring the use of AI for language learning both personally and professionally at the moment, it seemed like a good test case for a chat. I went straight in with it: is the Norwegian word trist a borrowing from French?

But shockingly, ChatGPT was resolute in its rejection of that hypothesis. The AI assistant insisted that it’s from a Nordic root þrjóstr, the same that gives us þrjóstur (stubborn) in Modern Icelandic, with the variant þristr which seems to have evolved into Modern Norwegian trist.

Now, the thing with ChatGPT is that it can be so convincing. That’s entirely thanks to the very adept use of natural language in a conversational format. The bot simply speaks with an authoritative voice like it knows what it’s talking about.

So it must be true, right?

Manual Etymology

At this point, it all felt a bit off. I just had to do some manual digging to check. In Bokmål cases like these, my first port of call is Det Norske Akademis ordbok (NAOB). As authorities on Norwegian words go, there’s little that comes close.

So I key in trist, and – lo and behold – it is a French borrowing.

The entry for 'trist' in the Norwegian Academy's Dictionary, showing its etymology.


There’s no mention of Danish, just the French and the Latin it comes from. I suspect, with a bit of digging, it might turn out to have been borrowed into Danish first, but NAOB is definitive. Not a hint of Norse etymology.

Now there’s a chance ChatGPT knows something that NAOB doesn’t, although I doubt it. More likely, it’s just the innate talent the emergent AI has for winging it, and making best guesses. That’s what makes it so powerful, but, like human guesses, it’s also what makes it fallible just now. It’s a timely reminder to double-check AI-generated facts for the time being.

And maybe, to just trust your own instinct.

Celtic designs on a stone sphere, evoking Old Irish culture. Image from FreeImages.com

Sengoidelc : Old Irish (and More Besides)

I stumbled across a rather special book this week. It’s David Stifter’s very thorough introduction to Old Irish, Sengoidelc, pleasingly still in print, and approaching its 20th birthday.

I sought it out first and foremost as a language-learning gap-filling exercise. I’ve spent some time with Scottish Gaelic, and a bit (well, a lot) less with Irish. Exploring Old Irish seemed like a good way to get to know their common history, especially given how helpful etymological pathfinding can be with multiple language projects. I’ve also come across satisfying snippets of Old Irish writing, like the brilliantly feline Pangur Bán, and hoped it might open the door to similar treats.

Old Irish – and the Rest

What I didn’t expect from an Old Irish primer was the wealth of detail about Proto-Indo-European. It makes sense, of course; for linguists studying PIE, Old Irish is an important source of evidence from a relatively less well-known ancient descendant – at least compared to, say, Greek and Latin. But it’s positively packed with background info on PIE parts of speech, and their development into the Celtic branch. All in all, it’s a fantastically erudite book written in a disarmingly friendly tone, helped along by some very cute cartoons of sheep.

The author even provides plenty of comparative examples in German. That’s perhaps unsurprising, given his connection to the University of Vienna. But the additional language gives a further handle on potentially difficult concepts for those who know a little German. It’s the ultimate in triangulation (and you know I love that).

If your language interests intersect in the same way, Sengoidelc is heartily recommended. I’m just annoyed I didn’t find it sooner!

Exploring language family tree connections can be one of the most useful polyglot learning tools

Wiktionary Trails : Tracing Cognates

One of the greatest things about Wiktionary, the crowd-sourced, multilingual lexicon, is the wealth of etymological information included in its entries. If you’ve ever wondered ‘where does that word come from?’, then Wiktionary is a good place to start.

I’m a fiend for digging into my vocab’s provenance. It’s a natural curiosity and desire to join the dots up. Once I start pondering on a word, I have to follow it right down the rabbit hole.

Let’s play Wiktionary

This week, it was the Greek παίζω (paízo – I play) that I randomly chanced to cogitate upon. If you have a bit of Greek yourself, you might well recognise the connection with παιδί (paidí – child). That’s a self-explanatory etymology, since playing is something children are especially fond of. And from παιδί, you can see a host of other connections thanks to Greek’s generous donations of words to science and medicine: paediatrics, pedagogy and so on.

What I didn’t know was how much deeper the interlanguage connections of παιδί go. At first glance, παιδί (paidí) doesn’t look much like other Indo-European words for child, save perhaps the Irish páiste, which may itself be a borrowing from Greek via Latin. I’d assumed it might be a loanword from a neighbouring, non-Indo-European language. But the truth lies closer to home; the Wiktionary entry throws light on some hidden family resemblances.

Setting off on a Wiktionary track and trace, it turns out that παιδί goes back to a diminutive form of Ancient Greek παῖς (pais – child). That, in turn, has been traced back to a reconstructed Indo-European form *peh₂w-, denoting smallness or a small number. The Greek, then, seems ultimately to have arisen from the notion of a small person.

The relevance of that might not ring any bells. That is, until you check out the Wiktionary page and peruse the raft of guises this root has taken on across other languages. These are just a few:

  • Latin: puer (boy), puella (girl); paucus (few)
  • Spanish: poco (few)
  • Italian: poco (few)
  • Norwegian: få (few)
  • English: few
  • Russian: пти́ца (ptíca – bird)
  • Polish: ptak (bird)

It’s the idea of smallness that links all these. Suddenly, παιδί (paidí) doesn’t seem such an outlier after all.

Wherever the trail may lead…

You might wonder what the point of all this meandering is, of course. Well, I find it helps to create a bird’s eye view of the related languages you study, especially if you’re a regular dabbler. If you know the wider terrain, and make connections between linguistic territories, there are more hooks for your brain to secure those words and phrases in memory.

And that can only be a good thing!

Search those etymology dictionaries for evidence of semantic change! Picture from freeimages.com

Semantic Change : The Double Lives of Cognates

When I was young, I was a very silly girl. Not in the familiar, modern sense, of course, but in the long-lost meaning of those two words in English: a happy child.

Semantic change – words taking on new meanings over time – is a fascinating garden path of surprising twists and turns. And it’s not only a fitting focus for linguistics nerds, either. Travelling back down the path of language change can lead language learners to the crossover points that tie foreign languages to our own. It can be thoroughly eye-opening to learn of the double lives that words in related languages have come to lead. Ultimately, comparing cognates also bolsters our armoury in the quest to gain a deeper understanding of vocabulary.

A Worsening Situation…

Take silly, for example. Its rather dramatic journey from happy to daft can be traced back to the Old English form sælig. Back then, it covered a range of happy nuances from happy, to blessed, to fortunate. Germanists might recognise its family heritage here: it is cognate with the German word selig, which today means blissfully happy. So what happened to poor old silly in English?

Historical linguists studying semantic change identify several broad flavours of meaning drift. What silly shows is pejoration, the application of a more and more negative meaning over time. It happens frequently in the history of words: knave is a rather old-fashioned word for a rogue, but originally just meant a boy, or servant. Of course, the opposite – amelioration – can happen too; just think of how bad, wicked and sick have been used in recent decades. And knight went the opposite way of knave, starting out as a mere boy but coming to take on quite haughty responsibilities. German cognate Knecht, of course, knew its station – it remains quite a lowly word.

On a slightly dispiriting side note, from a social standpoint, pejoration does seem to affect words for women disproportionately in English. For example, hussy, mistress, tart and wench all started out as quite neutral, uninsulting words. Language is the mirror of human culture, whether that be its pleasant or ugly side.

Straight and Narrow

Then, of course, we have girl. Back in Middle English times, the word referred to a child of any gender. Now, it is quite exclusively a female designation. And that is a classic case of narrowing of meaning from a general category to a more specific one.

Although cognates of girl don’t pop up in any high-frequency vocab in related languages, there is an amusing, more obscure German analogue in Göre – a cheeky young child. Göre has retained the gender-vagueness English lost, but has gained the slightly negative connotation of naughty (pejoration again!).

Learning More About Semantic Change

Taking a deep dive into semantic change is a fascinating way to work backwards in a language, revealing the maze of cross-language touch points. When the changes are as dramatic as the handful of words above, it can be fun tracing out this secret life of cognates.

Of course, pejoration, amelioration and narrowing are just a few of the recognised processes of semantic change. To fall down this very addictive rabbit hole, check out Lyle Campbell’s chapter on the subject in his Historical Linguistics primer. The Wikipedia article on semantic change also gives a really helpful overview.

And if you find yourself hunting etymologies but lack access to behemoth resources like the OED, then Wiktionary is, as ever, on hand. That site is fast becoming a second home…

Exploring language family tree connections can be one of the most useful polyglot learning tools

Polyglot perfect recall: connecting your languages with Wiktionary

One of the nicest things about the polyglot journey is the interconnectedness you see along the way. And finding connections is a brilliant way to make words stick. Sometimes, those connections are staring you right in the face, like the German Flasche (bottle), a relative of the English word flask. But more often than not, it’s the less obvious connections that can be the most rewarding (and memorable).

Polyglot pants

Sometimes, you see connections in the most unlikely of places. Take Lowland Scots and Romanian, for example. Both Indo-European, but pretty far removed from each other. I happen to hear a lot of Scottish English, being based in Edinburgh. So when I came across the Romanian verb îmbrăca (to dress), I thought I spotted something familiar.

That -brăc- part of the word is, in fact, from an old Latin word for ‘trousers’, braca. So in Romanian, you literally ‘trouser yourself’ when you get dressed. Now, with these clues, some will instantly spot the connection. The Scots for ‘trousers’ is breeks, also related to the slightly more archaic breeches in Standard English or britches in Yosemite Sam country. That’s a handy hook between two unlikely language pairs to help remember a word!

Mining for connections

Unless you are a walking etymology dictionary, it can be hard to spot these connections. To this end, it’s much handier to look up new words on the open source dictionary site, Wiktionary. For a community-driven site, it’s absolutely packed with detail, including word origin.

Take the German word Zaun (fence), for example. At first glance, it looks pretty removed from anything familiar in English. However, check out the Wiktionary listing; it turns out that the word is a relative of the English town. With a bit of historical imagination, you can think up reasons why the meanings have slightly diverged. The town, or settlement, is an enclosed living space; the fence is a means for enclosing a space.

Word hangover

Languages descended from the same proto-language, like the Indo-European ones, are bound to display these similarities. But often, you can find them in neighbouring languages from totally different trees, too.

If you’re learning Finnish and Russian, for example, you’ll find a few crossover words to help you. One of my favourites is the word kohmelo, meaning ‘hangover’. Check Wiktionary, and you’ll see that it’s a borrowing from the Russian похме́лье, meaning the very same. However, as a bonus, Wiktionary informs you that the Finnish word was further changed by ‘contamination’ with the word kohme, meaning numbness. So that’s three words you’ve learnt for the price of one, thanks to some canny connection-spotting!

Cultivate a bird’s eye view of language

If you travel back far enough, you’ll find all sorts of links between your languages. It’s one more reason why studying several languages at once can be a help, and not a burden. The polyglot approach is a fantastic way to get a bird’s eye view of language relationships and development; in my experience that has provided a great scaffold for making those words stick.

Which are your favourite word connections between languages?
Share them in the comments!