I am surprised we don't see more "Dr. Dolittle" projects like this. I assumed rats or corvids would be good candidates for animal language translation projects, since you can keep them in a confined space and record video of them. I am sure that body language plays a huge role in animal communication.
I recently read a paper[0] that claimed to have decoded the basic building blocks of sperm whale language. I went and took a look at the GitHub repo for Project CETI and found that most of the code was for the whale trackers and hydrophones. It seems like there are a lot of prerequisite problems you have to solve to even get good whale recordings.
On the other hand, whales probably don't rely on body language, since they often communicate well out of line of sight. So it may be easier in that regard.
Anyways, I am convinced that we will figure out how to teach some basic human concepts (like self) to animals and the intelligence of even "stupid" animals like chickens will make people more reluctant to eat meat.
A major reason was that there was a dogma that language is a uniquely human phenomenon. It was already part of Descartes' hugely influential philosophy of mind and got a strong revival with Chomskian linguistics and the "cognitive revolution" of the late 1950s.
There was sporadic research into animal linguistics, e.g. the Koko study, but those were dismissed on the grounds that animals can't have language by definition.
The ethics of enslaving and torturing animals is definitely part of the motivation for the dogma.
To be clear - as of today, many researchers would agree that language is still a uniquely human phenomenon. The linked article discusses this pretty explicitly: it's important to draw a distinction between language and communication. No non-human species has been found to use language under the Chomskian definition (using a finite set of symbols to represent an infinite number of communicable meanings).
However, this "dogma" as you call it is beginning to be weakened as researchers document more nuance and complexity in non-human communication than ever before, and so some researchers begin to say, "maybe we shouldn't have this all-or-nothing view of language". But it is simply not true that researchers are suppressing evidence of language in animals out of a desire to enslave and torture them.
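A toy sketch of what that "finite symbols, infinite meanings" definition means in practice: a handful of rewrite rules that can generate arbitrarily many distinct sentences, thanks to recursion. The grammar below is invented for illustration, not taken from any linguistics source.

```python
import random

# A toy recursive grammar: finite rules, unbounded output.
# The recursion in "S -> S and S" is what makes the set of
# generable sentences infinite.
GRAMMAR = {
    "S":  [["NP", "VP"], ["S", "and", "S"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["cat"], ["dog"], ["crow"]],
    "V":  [["saw"], ["chased"]],
}

def generate(symbol="S", depth=0, max_depth=4):
    if symbol not in GRAMMAR:           # terminal word
        return [symbol]
    rules = GRAMMAR[symbol]
    if depth >= max_depth:              # cap recursion for the demo only
        rules = [r for r in rules if "S" not in r] or rules
    words = []
    for sym in random.choice(rules):
        words += generate(sym, depth + 1, max_depth)
    return words

print(" ".join(generate()))
```

Nothing deep here, but it shows why "finite symbols" and "infinite meanings" aren't in tension: the unboundedness lives in the rules, not the vocabulary.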
> There are no non-human species that have been found to use language for the Chomskian definition of language (using a finite set of symbols to represent an infinite number of communicable meanings)
It's far from clear whether humans themselves meet the Chomskian criteria for language. And Chomskian linguistics has more or less collapsed with the huge success of statistical methods.
Chomsky's poverty of stimulus argument is, if anything, strengthened by LLMs. You need to read the entire internet to make statistical methods work at producing grammatical texts. Children don't read the entire internet but do produce grammatical texts. Therefore &c. QED.
I think this is greatly complicated by the fact that the human brain has been "pre-trained" (in the deep learning sense) by hundreds of millions of years of evolution.
A pre-trained LLM can also learn new concepts from extremely few examples. Humans may still be much smarter, but I think there's a lot of reason to believe the mechanics are similar.
The poverty of the stimulus (POS) argument is that "evolutionary pre-training" in the form of a (recursive) grammar is fundamentally required and cannot be inferred from the stimulus.
The argument is based on multiple questionable assumptions of Chomskian linguistics:
- Humans actually learn grammar in the Chomskian way
- Syntax is separate from semantics, so language can be learned only from utterances, and not e.g. from what is seen in the environment
- At least in Gold's formalization of the argument, language is learned only from "positive examples", so e.g. the learner can't observe that someone fails to understand some utterance
One could argue for a (very) weak form of POS that there has to be some kind of "inductive bias" in the learning system, but this applies to all learning as shown by Kant. The inductive bias can be very generic.
It seems to be a persistent myth (possibly revived more recently due to Norvig?) that Chomsky's POS argument has some interesting connection to Gold's theorem. The two things have only a very loose logical connection (Gold's theorem is in no sense a formalization of any claim of Chomsky's), and Chomsky himself never based any of his arguments for innateness on Gold's theorem. Here is a secondary source making the same point (search for 'Gold'): https://stevenpinker.com/files/pinker/files/jcl_macwhinney_c...
The assumption that syntax is 'separate from semantics' also does not figure in any of Chomsky's POS arguments. Chomsky argued that syntax was separate from semantics only in the fairly uncontroversial sense that there are properly syntactic primitives (e.g. 'noun', 'chain', 'c-command') that do not reduce entirely to semantic or phonological notions. But even if that were untrue, it would not undermine POS arguments, which for the most part can be run without any specific assumptions about the syntax/semantics boundary. Indeed, semantic and conceptual knowledge provides an equally fertile source of POS problems.
Yeah, I don't necessarily buy the whole Chomskian program. I'm willing to be persuaded that the reason kids learn to speak despite their individual poverty of stimulus is that there was sufficient empirically experienced stimulus over evolutionary time. The Chomskian grammar stuff seems way too Platonic to be a description of human neuroanatomy. But be that as it may, it's clear the stimulus it takes to train an LLM is orders of magnitude greater than the stimulus necessary to train an individual child, so children must have a different process for language acquisition.
Children do get ~6000 hours a year of stimulus. Spoken, unspoken, written, and body language. Even then they aren't able to form language proficiently until 5 or 6 years old. Does the internet contain 30,000 hours of stimulus?
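For what it's worth, the rough arithmetic behind those two figures, assuming something like 16 waking hours a day:

```python
# Back-of-the-envelope check on the ~6,000 hours/year and 30,000 hours claims.
waking_hours_per_day = 16
hours_per_year = waking_hours_per_day * 365   # close to the ~6,000/year figure
hours_by_age_five = hours_per_year * 5        # close to the 30,000 figure

print(hours_per_year, hours_by_age_five)
```

So the numbers hang together if you count essentially all waking experience as stimulus.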
That's astonishing. If you watched all of them, how much new information would you learn? I suspect a large portion of them are the same information presented differently; for example a news story duplicated by hundreds of different channels.
Yeah, I imagine every moment of communication a child receives is new information not just baby talk about getting the spoon in their mouth and asking them if they have pooped.
I'm sure someone else could calculate the information density of all the text on the internet vs. 30,000 hours of sight, smell, touch, sound, etc. My intuition tells me it's not even close.
Does the information contained in smell and touch contribute to the acquisition of language? Keep in mind you'd be arguing that people born without a sense of smell take longer to develop language, or are otherwise deficient in it in some way. I'm doubtful. It's certainly tricky to measure full sight / sound vs. text, but luckily we don't have to, because we also have video online, which, surprise surprise, utterly dwarfs 30,000 hours of sight and sound in terms of total information.
One qualitative difference is that the child's 30,000 hours is realtime, interactive, and often bespoke to the individual and context. All the videos on youtube are static and impersonal.
I think what he's saying is that "real world" interaction is so high-bandwidth it dwarfs internet (screen-based) stimulation. Not saying I agree, just that he's not comparing hours being alive to hours of YouTube.
> And Chomskian linguistics have more or less collapsed with the huge success of statistical methods.
People have been saying this for decades. But the hype around large language models is finally starting to wane and I wouldn't be surprised if in another 10 years we hear again that we "finally disproved generative linguistics" (again?)
Counterpoint: What progress has generative linguistics made in the same amount of time that deep learning has been around? It sure doesn't seem to be working well.
Also, the racecar example is because of tokenization in LLMs - they don't actually see the raw letters of the text they read. It would be like me asking you to read this sentence in your head and then tell me which syllable would have the lowest pitch when spoken aloud. Maybe you could do it, but it would take effort because it doesn't align with the way you're interpreting the input.
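A toy illustration of that tokenization point; the vocabulary and the greedy segmentation below are invented for the example, not how any particular LLM's tokenizer actually works:

```python
# Once text is chopped into multi-character tokens, letter-level
# questions stop being directly visible to the model.
# The vocab includes single letters so segmentation always succeeds.
vocab = ["race", "car", "r", "a", "c", "e"]

def tokenize(word):
    """Greedy longest-match segmentation over a toy vocabulary."""
    tokens, i = [], 0
    while i < len(word):
        for size in range(len(word) - i, 0, -1):
            piece = word[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens

print(tokenize("racecar"))  # ['race', 'car'] -- 2 symbols, not 7 letters
```

Asking such a model to count the letters in "racecar" is asking it to reason about structure it never directly sees.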
>What progress has generative linguistics made in the same amount of time that deep learning has been around? It sure doesn't seem to be working well.
Working well for what? Generative linguistics has certainly made progress in the past couple of decades, but it's not trying to solve engineering problems. If you think that generative linguistics and deep learning models are somehow competitors, you've probably misunderstood the former.
We can. Scientific notation with 1 significant figure can be meaningful because we can use it to figure out order relations. It’s an infinite language.
Recite 99 bottles of beer on the wall, but start from 1 and change so the number increases? Stop when there are no remaining numbers or when you reach infinity, whichever comes first.
They are talking as if language was some platonic construct like a Turing machine with an infinite tape and you are talking about the concrete reality where there are no such things as an infinite tape.
Both viewpoints are useful, they can prove general properties that hold for arbitrary long sequence of words and you put a practical bound on that length.
Whatever "finite set of symbols" humans use to communicate is not the finite set of symbols that form letters or words. Communication isn't discrete in practical sense, it's continuous - any symbol can take not just different meanings, but different shades and superposition of meanings, based on the differences in way it's articulated (tone, style of writing - including colors), context in which it shows, and context of the whole situation.
The only way you can represent this symbolically is in the trivial sense like you can represent everything, because you can use few symbols to build up natural numbers, and then you can use those numbers to approximate everything else. But I doubt it's what Chomsky had in mind.
This is not really the point, though. Why is "uses Chomskian language" the criterion for whether or not it's okay to cage and slaughter a living being?
There is and remains a desire to explain exactly how it is that humans are different than other animals. Language or the language faculty has been touted by some as this thing.
Your usage of "language" here is akin to the lay usage of "hypothesis" and "theory" being applied in an academic context. Same sequence of letters, but a different meaning. In linguistics, "language" has a specific definition that only humans have been shown to meet. Some trained individuals like Koko do seem to demonstrate a very limited ability to use "language" in the linguistics sense.
You might argue that the definition itself is arbitrary and coming from the same place that geocentrism, creationism and flat-Earth views come from. I can't argue for or against that.
I suspect things are more nuanced than the current definition we have, though, especially after the recent study from Scientific American that heated up Hacker News in a way that only "Is CS a science?" articles can.
There's no consensus on the definition of what language is.
Chomskian linguistics does posit that human language is based on (innate) recursive grammars (narrow language faculty hypothesis), but this has always been a contentious question. And per that definition humans too have demonstrated only very limited ability in e.g. infinite embedding.
"Language" in the sense of "the thing only humans have been shown to do" requires a bit more than just one to one correlations between signifiers and objects (or a "sentence" of signifiers with the same meaning as all of the words added together independently). For a system of symbols to be "language" there must be a difference between "what the cat ate" and "what ate the cat". No animal communication has been shown to have a grammar to it, and thus the ability to express exponentially many unique ideas with each additional word.
I feel like there are human languages where the symbolic distinction between "what the cat ate" and "what ate the cat" are nil and the understanding is achieved contextually.
2. You don't need to understand the words to push the buttons. You could replace the English words with gibberish and it would still work as long as you always give the same thing when the same button is pressed. Many animals can do this. It is called positive reinforcement. Nothing to do with language
I think most dog owners would tell you that their adult dogs can communicate things like this, but that the "language" is unfortunately siloed within a very personal relationship that is difficult for even the human half of the pair to demonstrate, making it hard to do science about.
Sometimes at bedtime my cat will go to the door and scream nonstop. I don't know why he does it. Maybe it is for food or attention. But the only way I have found to get him to stop is to pick him up, put him on his special pillow, squish him, and have my partner join me in telling him "we are going to bed, it's bedtime".
I'd say about 80% of the time he listens. So he is capable of understanding what we want him to do, and capable of suppressing his own desires in order to maintain harmony in our group. Funnily enough, he won't go to bed unless both my partner and I tell him it's bedtime, so maybe he is only obeying because there is some majority consensus?
Because of this, I find it easy to believe that a cat or dog could be taught something as abstract as "self" if they can understand commands and intent and group dynamics. It's just difficult to tell what is "understood" and what is just conditioned behavior. Hell, I can't even answer that question for myself as a human.
- use arbitrary links between signifier and signified
- generate new linguistic tokens (new signifier and signifieds as well as links between them)
- refer to events and times beyond the current ones
- talk about the system of communication itself using the system itself
Then you would have grounds to think that your cat uses a language in the linguistic sense. But until then, it is just a communication system, no matter how sophisticated.
Your cat being an extremely well-oiled operant-conditioning system does not mean it is able to think the way you can, even if cats are likely more intelligent than we give them credit for. Because, as much as we would like to believe otherwise, we don't know what they think, and any patterns we see may just be good old human pareidolia: like hearing voices in the wind, or faces in rocks. Your feeling is real, but the belief that feeling induces has no grounds in reality as far as we know so far.
It is possible to over-select on skepticism about other species. Imho, the simplest explanation is that there is nothing inherently special about us, just a quirk of, say, the combination of tool use, foresight, and social cohesion that makes humanity special.
Or is it special? We are just a well oiled operant conditioning system, it does not mean that we are able to think the way cats can.
Laymen think of language the same way they do about theory and then try to apply that to an academic context. Different meaning. A system of communication is not necessarily a language even if all languages are systems of communication. If your dog could use arbitrary linguistic tokens, generate new ones, describe things that it has not seen before, talk about the past, the future or places other than the current one, then I would be more willing to entertain the idea that your dog has a language
There still is, as far as I can tell. Whenever my curiosity drives me to take a psychology or philosophy class I end up with the feeling that they think part of their job is to reassure the rest of the humans that we are in fact special. It feels like some kind of leftover from when that kind of work was done by monks.
We are objectively special in creating technological civilization with all sorts of cultural artifacts like philosophy that we have no evidence for in other species that have existed on this planet, other than possibly a few of our close hominid relatives. Hominids are a very special evolutionary branch in that sense.
When we think about ETs, we're wondering about technological civilizations on other planets with space craft and radio telescopes, not the equivalent of birds or dolphins.
Really? I distinctly remember a lot of pissed off kids in my college philosophy and psychology classes trying to defend their religious beliefs and that we are more than just monkeys with fancier tools. Most of the religious folks (at least vocally) dropped out of Philosophy 101 after 2 weeks. It was incredibly entertaining. I guess this was 20 years ago, but assuming we are a more secular society I guess I thought that would still be the case.
It's a bit different at the intro level. I'm talking about the professors and the grad students. It's not that they're directly religious, but I get a status-quo-preserving kind of feeling from them. Like maybe they're influenced by a tradition of not calling your patron an ape--or somesuch.
However, another major reason is that people have repeatedly gone seeking for language-like or human-language-level behaviors in animals, and repeatedly and consistently failed.
It is also worth pointing out that detecting language is a great deal easier than understanding language. Something like https://www.youtube.com/watch?v=vvr9AMWEU-c is reasonably recognizable as clearly some sort of language even if we have no (unassisted) human idea what it is saying. We can tell with quite high confidence that most animal sounds are not hiding some deeper layer of information content.
Such exceptions as there are, like whalesong, take you back to my first paragraph, though.
The idea that language is a uniquely human phenomenon may be "dogma", but it is also fairly well-founded in fact. It should also not be that surprising; had another species developed language first, they'd be the ones looking around at their surroundings being surprised they are the only ones with proper language, because they'd probably be the dominant species on the planet. It isn't a "humanist" bias, in some sense that humans are super special because they're humans, it's a "first species to high language" bias, which happens on this planet to be humans.
1. Train crows to push a touchscreen for reward of food.
2. Next set up two touchscreens back to back. Make it so touching one screen only dispenses food on the other side.
3. Next make it so food is dispensed on the other side only when one crow is perched at each terminal.
4. Next make it so food is only dispensed after a crow says something to the other crow on the other side.
5. Next display a picture on one terminal and give the other crow the choice of four quadrants. The food is dispensed if the picture on the far side matches the displayed picture.
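If anyone wants to tinker with this, the staged protocol above is basically one dispenser predicate. The names and exact conditions here are my guesses at the setup, not anything published:

```python
def should_dispense(stage, touched, both_perched, vocalized, pictures_match):
    """Reward rule for each stage of the hypothetical crow experiment."""
    if stage == 1:  # any touch on the screen is rewarded
        return touched
    if stage == 2:  # same trigger, but food comes out on the *other* side
        return touched
    if stage == 3:  # additionally requires a crow perched at each terminal
        return touched and both_perched
    if stage == 4:  # additionally requires a vocalization across the divide
        return touched and both_perched and vocalized
    if stage == 5:  # reward only when the chosen quadrant matches the far picture
        return both_perched and vocalized and pictures_match
    return False
```

Each stage just tightens the predicate, which is the whole point of the shaping.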
Hey, that makes a lot of sense. There's a lot of crows who come to the bird feeder in my backyard. I would just have to figure out how to easily make a food dispenser, and what sort of touch screen a crow can activate....
I would do it myself but where I am there are almost no corvids, they (and city pigeons) are basically displaced by the highly aggressive local bird. Which seems to swarm around from time to time but doesn't stay anywhere permanent locally except for grocery store parking lots. That's probably enough to leak location info on this anon account so I'll shut up now
I think the thing about animal language is there is no objective truth to it. Even with well defined words and educated and experienced communicators, miscommunication and ambiguity is everywhere in human language.
For animals I think there is less language as we know it. Communication is an instantaneous and unfiltered expression of mental state, and there isn’t much guarantee that it’s received as intended. My dog is the friendliest little dude. He will play bow other dogs immediately, and then jump around with excitement. But it’s extremely common that other dogs misread this and respond with aggression.
I suppose I think animal language is less something we could ever learn academically and more something you just feel.
I think one problem with the idea of learning the language of captive animals is that it runs on the assumption that animals have an inherent language. There was some research centuries ago that involved raising kids without exposure to the outside world of languages in hopes of learning what our "natural" language was. Turns out there isn't one.
And I think the same applies to animals. Bird songs have already been noted to have accents by region. We'll probably end up finding out that if animals do have complex language, it's something they develop through community exposure just like humans do, and it'll vary by area. Crows in Texas will talk very different from crows in England, I'm sure.
With pets, most of them are kept isolated from the world at large. It wouldn't surprise me if dogs raised by people have very stilted or nonexistent language compared to feral dogs. Maybe 200 years from now, we'll see the modern concept of raising dogs completely alone as inhumane and sending them off to doggy daycare a couple times a week to learn the local dog language and "culture" will be normalized.
Will there be a place I can upload bird recordings? I have half a dozen wild grouse that think they are my chickens and they have dozens of different sounds they babble at me and I have no idea what they are trying to convey. I try to mimic the sounds they make. Sometimes they chat back and forth with me until I get bored, sometimes they follow me whereas one particular sound makes them wander off.
Thank you for that. I will check it out! Maybe one day the decoding project can ingest all the sound content.
Reading through the rules I like these people already. They prefer high quality .wav files as do I. Not sure if I have the skills to edit to their standard but I will try.
Have a look at xeno-canto as well, a large repo of animal sounds. It's more of a general archive than specifically for “understanding”, for example it’s often used to train audio recognition models.
I feed all the wild birds birdseed, and since the grouse started following me I started giving them chicken crumble, though I don't currently have chickens.
No need to apologize. Many of the animals here are fun to interact with. Maybe this upcoming winter I will try to record the deer when it's feeding time. There's usually 2 or 3 fawn that are right on my heels testing each food pile to see which one is the right one not realizing they should just start with the first one to get more time to eat.
Great app for playing bird songs and annoying them once you’ve identified them, too. Sometimes you can get a few chirping really loudly at you and confused why their new friend looks like an iPhone.
+1, this app is an eye-opener to the nature around oneself. So much so that I have linked it to my iPhone's action button to make it easier to open on a whim.
I installed it a year or two ago but was disappointed by its identification abilities. Then it changed to require providing an email address so I deleted it.
You might have used the wrong model. They tend to be location specific, so if you live in eg Australia make sure you get the appropriate pack. It does skew to more common species - there is a very long tail in species recognition.
I didn't immediately see a way around the "Please enter your email" prompt, but long pressing the icon (on android) gives context menu with options like "Choose photo" and "Start new recording" that open into the main app without any login.
> “Social birds . . . are constantly chatting to each other,” Mike Webster, an animal-communication expert at Cornell, says. “What in the hell are they saying?”
Whenever I hear this question I always remember the Eddie Izzard skit about birdsong being territorial, so the nightingale in "A Nightingale Sang in Berkeley Square" was essentially shouting "Get out of Berkeley Square! It's my Square!"
Does anyone have a clue how far we are from having "LLMs for animals"? Even if we don't understand what the LLM is saying to a dolphin or a monkey, does it change much from feeding millions of texts to a model without ever explaining language to it as a prerequisite?
A predictive/generative model of animal "vocalizations" would be almost trivial to do with current speech or music generation models. And those could be conditioned with contextual information easily.
Wouldn't we need several hundred gigabytes of ingestible/structured contextual info for animal vocalizations in order to train a model with any accuracy? Even if we had it, seems to me the model would be able to tell us what sounds probably “should” follow those of a given recording, but not what they mean.
We could train a transformer that could predict the next token, whether it's the next sound from one animal or a sound from another animal replying to it. However, we wouldn't understand the majority of what it means, except for the most obvious sounds that we could derive from context and observation of behavior. This wouldn't result in a ChatGPT-like interface, as it is impossible for us to translate most of these sounds into a meaningful conversation with animals.
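To make "predict the next sound" concrete, here's a toy stand-in for the transformer: a bigram model over made-up "vocalization token" IDs (the kind of discrete codes a quantizer such as a VQ-VAE might emit). A real model conditions on vastly more context, but the mechanics are the same:

```python
from collections import Counter, defaultdict

# Toy "vocalization token" sequences: invented IDs standing in for
# quantized audio frames from recordings of one or more animals.
sequences = [
    [3, 3, 7, 1, 3, 7, 1, 9],
    [3, 7, 1, 3, 7, 1, 9, 9],
]

# Count bigram transitions; a transformer learns a far richer
# version of this same next-token distribution.
transitions = defaultdict(Counter)
for seq in sequences:
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1

def predict_next(token):
    """Most likely next token given the current one."""
    counts = transitions[token]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next(7))  # in this toy data, token 7 is always followed by 1
```

Note what this buys you and what it doesn't: the model can continue a sequence plausibly, but nothing here attaches meaning to any token, which is exactly the gap the parent comment describes.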
Why not label a fine-tuning dataset with human descriptions based on video recordings? We explain in human language what they do, and then tune the model. It doesn't need to be a very large dataset, but it would allow models to translate directly from bird calls to human language.
What if they just sit and talk? What is the description of that? What if only part of the communication is relevant? What if it's not relevant at all because they reacted to atmospheric changes? Or electromagnetic signals that can't be observed on video? Or smell? Or sound outside of human hearing range? What if the decision based on the communication is deferred? Etc., etc.
As I mentioned before, only the most obvious examples of behaviors and context can be translated into anything meaningful.
Generative models yes, since there are terabytes of audio available. High quality contextual info is much harder to obtain. It’s like saying that we could easily build a model for X if we had training data available.
With LLMs we can leverage human insight to e.g. caption or describe images (which was what made CLIP and successors possible). With animals we often have no idea beyond a location. There is work to include kinematic data with audio to try and associate movement with vocalisation but it’s early days.
To clarify: I didn't mean a model that would "translate" animal sounds to some representation of language or meaning. I meant a model that would capture statistical regularities in animal sounds and perhaps be able to link these to contextual information (e.g. time of day, other animals around, season etc).
By almost trivial I mean it wouldn't require much new technology. Something like WaveNet or VQ-VAE could be applied almost out of the box.
Someone already mentioned Aza Raskin, but the organisation you should look up is Earth Species Project. It’s a fairly open question and fairly philosophical - do the semantics of language transcend species? Certainly there is evidence that “concepts” are somewhat language agnostic in LLM embedding spaces.
I had the pleasure of hanging out with him at Stochastic Labs in 2018 while he was working on this, and I was working on 3D fractal stuff there. Pretty fun place, and was my first time living in the US.
At the time it seemed a bit wild / long shot, but now he just looks like a pioneer.
Presumably anyone with a multimodal transformer already pretrained on Human data could be further pretrained on animal vocalizations. I don't know whether any of the large model owners are doing this.
Quite a few geese are flying over me each day now. I've convinced myself they are saying to each other "left.. left.. OK straight... right a bit... OK". I'm amazed at how precise they can be (and sometimes not), like the way they all stop flapping at once and glide, then flap again. There were anywhere from 24 to 40 geese all acting in perfect harmony.
I remember seeing a video from the 80s about how the behavior is emergent - they made a computer program that replicated how birds fly by stating just a few axioms like don't fall behind and don't be in front.
The idea being that the V takes shape because they want to have a bird in front of them the entire time while one poor bird gets stuck out in front.
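That kind of emergent formation is easy to sketch. Here's a toy one-dimensional version in the spirit of Reynolds-style "boids"; the specific rule is my own simplification, not the one from the video. Each bird only tries to sit a fixed gap behind the nearest bird ahead of it, and from scattered starting positions the flock settles into an evenly spaced echelon with no global plan:

```python
import random

random.seed(1)
N, STEPS, GAP = 8, 200, 1.0

# Start the flock at scattered (distinct) positions along one axis.
positions = [float(p) for p in random.sample(range(40), N)]

# Local rule only: each bird drifts toward a point GAP behind the
# nearest bird ahead of it; the frontmost bird just keeps cruising.
for _ in range(STEPS):
    updated = []
    for xi in positions:
        ahead = [xj for xj in positions if xj > xi]
        if ahead:
            target = min(ahead) - GAP
            updated.append(xi + 0.2 * (target - xi))
        else:
            updated.append(xi)
    positions = updated

ordered = sorted(positions)
gaps = [b - a for a, b in zip(ordered, ordered[1:])]
print(gaps)  # all gaps settle near GAP: order emerges from a local rule
```

No bird knows about the formation; the even spacing falls out of everyone following the same local rule, which is the point of the 80s demo.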
They take turns! Easily explainable in mechanical terms: the lead bird gets tired and slows down, another bird overtakes, and the V-formation algorithm kicks in again.
But it could be more than that. Maybe the lead bird vocalizes or signals that they want to swap out, and some type of social status determines the order, with disputes resolved by wing flap aggression. Or maybe not.
If, as Jane’s Addiction and many sf authors have imagined, we’re being kept as pets by aliens, much of our behavior would look rules-based, unthinking and reactive.
These are happening this time of the year where I live. I like to go out at sunset to watch them dance. It's amazing how they coordinate so well at such close quarters, looks like a single organism from afar.
You think it takes a lot of coordination to fly in sync? I only have a slow clap for you. Actually, everybody else join in the clapping, and without any coordination whatsoever you'll notice that we clap in sync after just 50 seconds: https://www.youtube.com/watch?v=Au5tGPPcPus
This slow-clap thing is a tradition for asking for an encore/bis/repeat at concerts, so I wouldn't be so quick to state that it's an emergent phenomenon.
But maybe this has become the tradition because when you clap for a long time it would slowly synchronize.
In the video it is quite clear a few people are seeding the synchronisation.
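Seeders or not, the synchronization itself is easy to reproduce in a toy model. Below is a quick Kuramoto-style sketch (all constants are arbitrary): each "clapper" has its own natural tempo plus a pull toward the crowd's average phase, with no conductor anywhere, and coherence still climbs sharply:

```python
import math, random

random.seed(0)
N, STEPS, DT, K = 50, 2000, 0.01, 2.0

# Each clapper gets a natural tempo near 1 clap/sec (in rad/s),
# with some spread, and a random starting phase.
freqs  = [2 * math.pi * (1.0 + 0.1 * random.gauss(0, 1)) for _ in range(N)]
phases = [random.uniform(0, 2 * math.pi) for _ in range(N)]

def coherence(ph):
    """0 = random claps, 1 = perfectly synchronized."""
    re = sum(math.cos(p) for p in ph) / len(ph)
    im = sum(math.sin(p) for p in ph) / len(ph)
    return math.hypot(re, im)

start = coherence(phases)
for _ in range(STEPS):
    mean_re = sum(math.cos(p) for p in phases) / N
    mean_im = sum(math.sin(p) for p in phases) / N
    psi = math.atan2(mean_im, mean_re)   # crowd's average phase
    r = math.hypot(mean_re, mean_im)     # current coherence
    # Each clapper nudges toward the crowd; coupling scales with coherence.
    phases = [p + DT * (w + K * r * math.sin(psi - p))
              for p, w in zip(phases, freqs)]

print(round(start, 2), "->", round(coherence(phases), 2))
```

Whether a real concert crowd needs seeders is an empirical question, but coupled oscillators don't strictly need them; sufficient coupling alone pulls the phases together.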
Just don’t clap on the 1 and 3, please. Lots of videos of musicians attempting to get the crowd to clap on the 2 and 4…but the audience has a mind of its own.
Didn’t Douglas Adams have a bit about this? Once you figure it out you’d do anything to return to blissful ignorance. It’s all inane chatter about what’s for dinner, who’s looking hot today, and more than anyone would ever want to know about wind speed and weather conditions.
I always fancied that they might be debating philosophical points or maybe even offering up "tweets" of wisdom. Owl: "In order to understand the very nature of the mind itself, one must earnestly seek to find the answer to this riddle: WHOoooooooo?!"
"He learned to communicate with birds and discovered their conversation was fantastically boring. It was all to do with windspeed, wingspans, power-to-weight ratios and a fair bit about berries."
So this is purely anecdotal, but it seems to me that bird songs work kind of like drum circles. A bird can sing a pattern, and see if anyone else can replay the pattern. If you can, then the initiating bird will slightly modify the pattern, and see if you are able to pick up on the nuance. With drum circles, people typically play off of patterns set by others. And both the leader and follower can tell that they are in sync with each other. I suspect that this dynamic is at the core of a lot of bird song interactions. And to try to translate that into a human language would not work well.
It would be pretty useful if we could somehow convince birds to relay our messages using birdsongs -- just use a speaker to transmit a message, encoded in birdsong with some special preamble header, and it will get broadcast or unicast to the desired destination bird that happens to be located near a microphone that receives this message. Could this scheme beat IPoAC? Maybe if we manage to reverse engineer birdsongs well enough, BGP could be ported to birds!
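Purely for the fun of it, here's what the physical layer of that scheme might look like. The chirp alphabet and preamble below are entirely made up:

```python
# Map 2-bit groups onto an invented four-chirp alphabet, with a
# preamble so receivers can tell a "data" song from ordinary song.
CHIRPS = ["tweet", "trill", "chirp", "warble"]
PREAMBLE = ["caw", "caw", "caw"]

def encode(data: bytes):
    song = list(PREAMBLE)
    for byte in data:
        for shift in (6, 4, 2, 0):          # big-endian 2-bit symbols
            song.append(CHIRPS[(byte >> shift) & 0b11])
    return song

def decode(song):
    assert song[:len(PREAMBLE)] == PREAMBLE, "not a data song"
    symbols = [CHIRPS.index(s) for s in song[len(PREAMBLE):]]
    out = bytearray()
    for i in range(0, len(symbols), 4):
        byte = 0
        for sym in symbols[i:i + 4]:
            byte = (byte << 2) | sym
        out.append(byte)
    return bytes(out)

msg = b"hi"
assert decode(encode(msg)) == msg
print(encode(msg))
```

Four chirps per byte, so throughput would be dreadful, but then so is IPoAC's latency. The hard part, as with IPoAC, is the avian retransmission layer.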
If anyone has a spare Raspberry Pi and is looking for a fun project, consider BirdNET-PI[0]. It turns your Raspberry Pi into a 24/7 bird monitoring device. You need a microphone and then it will automatically detect birds by their songs and report them to the BirdWeather service that helps monitor bird populations.
That's something I was thinking about, but don't have the brain juice to do.
Every bird species has its own language.
I'd say tempo, frequency, pause duration, it all plays a role.
Do dogs have a language? Woof woof woof sounds a lot like the crows craaaw craaaw, and while they don't speak they do communicate.
Maybe there are nuances we just can't hear or recognize.
It's certainly fascinating.
How do you map sound to action? You can't analyze it independently; you also need to take the environment into account.
What if we don't spend the effort, time, and money to 'decode' birdsong? What if we don't feel the need to uncover such, to reinforce the human exceptionalism? What if it really wasn't meant for us to know? What if we simply just relaxed and immersed ourselves in the beauty of birdsong? Would we be somehow deprived?
So there was this German scientist who went about decoding how bees communicate where pollen sources are. I believe he won a Nobel Prize for it. He had to individually hand-paint bees. I can't remember the details and I'm too lazy to look it up.
The point here is that if you want to know what birds are saying, you'd probably have to record the flapping of wings (especially with the more colorful birds) and then the bird song. Their eyesight is particularly acute due to needing to eat fresh berries, so body posture is most likely important in communication. A high-def camera, a microphone, and an LLM should be able to do the job if the data on a particular species is good enough. From there you should be able to extrapolate to multiple species.
The language would probably be along the lines of a few set phrases
[0] https://www.nature.com/articles/s41467-024-47221-8#:~:text=S....