Whale song shows language-like statistical structure
Open access link to paper in Science.
Co-lead authors: Inbal Arnon, Ellen Garland, and Simon Kirby
Other co-authors: Jenny Allen, Claire Garrigue, and Emma Carroll.
We found key statistical properties that characterise all human languages in another species for the first time. We have more in common with whales than we previously thought! Every language is made up of statistically coherent parts: words. These show a characteristic pattern: the most frequent word is roughly twice as common as the second most frequent, three times as common as the third, and so on. Despite this being true of all languages, it had not been found in any other species until now.
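To make that pattern concrete, here is a minimal Python sketch that ranks items by frequency and compares the counts with the idealised "top count divided by rank" prediction. The token list is a toy example built so that the pattern holds exactly; it is not data from the paper, and the function names are just illustrative.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, token, count) triples, most frequent first."""
    counts = Counter(tokens).most_common()
    return [(rank, tok, n) for rank, (tok, n) in enumerate(counts, start=1)]

def ideal_counts(top_count, n_ranks):
    """Counts predicted if the item at rank r occurs top_count / r times."""
    return [top_count / r for r in range(1, n_ranks + 1)]

# Toy tokens chosen so the pattern holds exactly (an assumption, not real data).
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
table = rank_frequency(tokens)
predicted = ideal_counts(table[0][2], len(table))

for (rank, tok, n), p in zip(table, predicted):
    print(rank, tok, n, p)
```

Real language data only follows this pattern approximately, so in practice you would look at how closely the observed counts track the prediction rather than expect an exact match.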
So, why does this pattern show up in language? We think that having words, and having them pattern in this way, makes language more learnable. And because language is passed down over generations of learners, it evolves culturally to have features that aid learning. If we're right, then we should find this pattern in other species that also use culture for their communication! Turns out that's what humpback whales do... their long, complex songs are learned, and evolve culturally too. But how do we figure out if there are word-like units in whale song? To answer this, we ask: how do babies figure out what the words are in human language? Luckily we know a lot about this! Babies are exposed to continuous speech and discover where the word boundaries are by looking for sounds that are surprising in context. (Sounds within words are more predictable.)
We reasoned we could apply the same technique to whale song. To start with we did something we assumed wouldn't work and just poured 8 years of whale recordings into a programme that modelled this simple segmentation process inspired by human infants. To our amazement, this approach worked first time, with no tweaks to the code being needed to make it work with the whales! We discovered these word-like units, and they followed the same statistical distribution that is found in all human languages.
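For readers who want a feel for the idea, here is a minimal Python sketch of segmentation by predictability. It is not the programme used in the study: it assumes a simple rule (put a boundary wherever a transition between two elements is less predictable than the transitions on either side of it), and the toy stream and function names are made up for illustration.

```python
from collections import Counter

def transition_probs(stream):
    """Estimate P(next | current) for each adjacent pair in the stream."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(stream):
    """Split where a transition is less predictable than both of its neighbours."""
    probs = transition_probs(stream)
    tps = [probs[pair] for pair in zip(stream, stream[1:])]
    units, start = [], 0
    for i, tp in enumerate(tps):
        left = tps[i - 1] if i > 0 else float("inf")
        right = tps[i + 1] if i + 1 < len(tps) else float("inf")
        if tp < left and tp < right:
            # Transition i sits between element i and element i + 1.
            units.append("".join(stream[start:i + 1]))
            start = i + 1
    units.append("".join(stream[start:]))
    return units

# Toy "song": three made-up units repeated in varying order, with no gaps.
order = ["abc", "de", "fg", "abc", "fg", "de",
         "abc", "de", "fg", "de", "abc", "fg"]
stream = "".join(order)
print(segment(stream))
```

In the toy stream, elements inside a unit always follow each other, while the element after a unit varies, so the dips in predictability fall exactly at the unit boundaries.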
We worried that this might be an artefact of our method. Would we always find this pattern wherever we used the infant-inspired segmentation process? So we created thousands of "fake" whale song datasets and ran them through the procedure. None showed the pattern we saw in the real whale song data.
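The logic of this kind of control can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fake datasets here are made by shuffling the order of elements, and the "structure" statistic is simply the number of distinct adjacent pairs (fewer means more predictable). Neither choice is taken from the paper, which does not spell out its procedure in this post.

```python
import random

def distinct_pairs(stream):
    """Count distinct adjacent pairs; a lower count means more predictable order."""
    return len(set(zip(stream, stream[1:])))

def shuffled_copies(stream, n_copies, seed=0):
    """Make fake datasets with the same elements in random order."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        items = list(stream)
        rng.shuffle(items)
        copies.append(items)
    return copies

def fraction_as_structured(stream, n_copies=200, seed=0):
    """Fraction of fake datasets at least as structured as the real one."""
    observed = distinct_pairs(stream)
    fakes = shuffled_copies(stream, n_copies, seed)
    return sum(distinct_pairs(f) <= observed for f in fakes) / n_copies

# Toy "real" sequence with strong sequential structure (made up for illustration).
real = "abc" * 20
fraction = fraction_as_structured(real)
print(distinct_pairs(real), fraction)
```

If the pattern were an artefact of the method, many of the fake datasets would look as structured as the real one; a fraction near zero suggests the structure is in the data.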
Across lots of species (including humans), communication systems tend to look as though they are designed to be efficient. This means more frequent elements should be shorter. We find this for our segmented units, which makes them plausible candidates for "real" units for the whales.
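One simple way to check for this kind of efficiency is to correlate how often each unit occurs with how long it is; a negative correlation means frequent units tend to be shorter. The Python sketch below does this with a plain Pearson correlation on a made-up set of units. Both the statistic and the data are assumptions for illustration, not the analysis from the paper.

```python
from collections import Counter
from math import sqrt

def frequency_length_correlation(units):
    """Pearson correlation between each unit type's count and its length."""
    counts = Counter(units)
    freqs = list(counts.values())
    lengths = [len(u) for u in counts]  # same order as counts.values()
    n = len(freqs)
    mean_f = sum(freqs) / n
    mean_l = sum(lengths) / n
    cov = sum((f - mean_f) * (l - mean_l) for f, l in zip(freqs, lengths))
    var_f = sum((f - mean_f) ** 2 for f in freqs)
    var_l = sum((l - mean_l) ** 2 for l in lengths)
    return cov / sqrt(var_f * var_l)

# Toy units (made up): the more frequent a unit is, the shorter it is.
units = ["ab"] * 8 + ["cde"] * 4 + ["fghi"] * 2 + ["jklmn"] * 1
r = frequency_length_correlation(units)
print(r)
```

Here length is counted in elements per unit; with real recordings you might measure duration instead.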
Although we have discovered "word-like" sequences in whale song, we don't think these have meanings in the way words in language do. Just like music, whale song may be complex and expressive without carrying semantic content. Nevertheless, we have uncovered a deep commonality between the two. Humans and whales are evolutionarily distant, but united in having culture. When we use what we know about language learning to look at whale song, its hidden structure is revealed. These two systems have evolved culturally to maximise their own learnability, and statistical structure is the result. This work makes a bold prediction: we should find these language-like properties wherever complex communication is transmitted culturally. Species that do vocal production learning (like whales and humans) are obvious candidates for us to look at next... watch this space!
By working across disciplinary boundaries we've learned more about whales, and more about humans too. Thank you to my amazing co-leads Ellen Garland, Inbal Arnon, and to Jenny Allen, Claire Garrigue, and Emma Carroll. You are all brilliant humans!