Google DeepMind has demonstrated a remarkable breakthrough in AI-powered bioacoustics with Perch 2.0, a foundation model originally trained on birdsong that can now identify whale calls through transfer learning.
The discovery challenges conventional assumptions about cross-species audio recognition. Birds' chirps, trills, and warbles travel through air, while whales' boings, "biotwangs," and whistles propagate underwater; the two differ in both sound characteristics and transmission medium. Yet Perch 2.0 bridges this gap with surprising effectiveness.
How Transfer Learning Works
Perch 2.0 was trained on millions of recordings from birds and land-based animals, including amphibians, insects, and mammals. Researchers at Google DeepMind and Google Research then applied transfer learning to test whether this bird-focused model could recognize whale vocalizations.
"If [Perch 2.0] performs well for our whale use cases, then that means we don't need to build an entirely separate new whale model—we can just build on top of that," explains Lauren Harrell, a data scientist at Google Research.
The technique works by "recycling all of the training that's been done and just do a small model at the end for your use cases," Harrell adds. This approach dramatically reduces computation time and experimentation effort.
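The pattern Harrell describes can be sketched in a few lines: keep the foundation model frozen and train only a tiny classifier on its outputs. In this illustration the "frozen model" is just a fixed random projection standing in for Perch 2.0 (whose actual architecture and API are not shown here), and the "small model at the end" is a hand-rolled logistic-regression head; everything below is an assumption-laden toy, not Google's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen foundation model (e.g. Perch 2.0): a fixed random
# projection into an embedding space. Its weights are never updated.
EMBED_DIM = 16
_frozen_weights = rng.normal(size=(64, EMBED_DIM))

def embed(x: np.ndarray) -> np.ndarray:
    """Map raw inputs (64-dim here, hypothetical) to fixed embeddings."""
    return np.tanh(x @ _frozen_weights)

class LinearHead:
    """The 'small model at the end': one logistic-regression layer,
    which is all the task-specific training that happens."""

    def __init__(self, dim: int):
        self.w = np.zeros(dim)
        self.b = 0.0

    def fit(self, X, y, lr=0.5, steps=500):
        for _ in range(steps):
            p = 1.0 / (1.0 + np.exp(-(X @ self.w + self.b)))
            grad = p - y                      # gradient of log-loss
            self.w -= lr * X.T @ grad / len(y)
            self.b -= lr * grad.mean()

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)

# Two synthetic "sound classes" standing in for new whale use cases.
X_raw = np.vstack([rng.normal(-1, 0.5, (20, 64)),
                   rng.normal(1, 0.5, (20, 64))])
y = np.array([0] * 20 + [1] * 20)

head = LinearHead(EMBED_DIM)
head.fit(embed(X_raw), y)          # only the head is trained
acc = (head.predict(embed(X_raw)) == y).mean()
```

Because the expensive backbone is reused as-is, adapting to a new use case costs only the few seconds needed to fit the head, which is the computation-time saving Harrell points to.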
Evaluation and Results
The team tested Perch 2.0 on three marine audio datasets containing whale sounds and other aquatic noises. They converted each five-second audio window into a spectrogram, a visual representation of sound intensity across frequencies over time. These images were fed to the model to produce embeddings: compact numerical summaries that preserve each sound's most salient attributes.
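The spectrogram step can be reproduced with standard tools. The sketch below builds one from a synthetic five-second clip; the 16 kHz sample rate, FFT window size, and the pure tone standing in for a field recording are all assumptions for illustration, not the parameters Perch 2.0 uses.

```python
import numpy as np
from scipy.signal import spectrogram

# A synthetic 5-second "audio window": a 400 Hz tone at an assumed
# 16 kHz sample rate, standing in for a marine recording.
fs = 16_000
t = np.arange(0, 5.0, 1.0 / fs)
audio = np.sin(2 * np.pi * 400 * t)

# Sound intensity across frequency (rows) and time (columns) — the
# image-like array a model would consume.
freqs, times, Sxx = spectrogram(audio, fs=fs, nperseg=1024, noverlap=512)

# Log-scaling compresses the dynamic range, a common final step before
# feeding the spectrogram to a vision-style network.
log_spec = 10 * np.log10(Sxx + 1e-12)

# Sanity check: the energy peak sits in the frequency bin nearest 400 Hz.
peak_hz = freqs[np.argmax(Sxx.mean(axis=1))]
```

In the real pipeline the resulting array goes through the frozen model to yield an embedding vector per window, rather than being inspected directly.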
Researchers then trained a logistic regression classifier on small, randomly selected sets of labeled embeddings (a minimum of four and a maximum of 32 per dataset). Results showed strong performance even with just a handful of embeddings, improving as the number increased.
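This few-shot evaluation is easy to mimic with scikit-learn. The code below trains a logistic regression on 4 and then 32 labeled examples per class and scores it on a held-out set; the synthetic 32-dimensional "embeddings" and their class separation are assumptions standing in for real Perch 2.0 outputs, so the numbers illustrate the setup rather than reproduce the paper's results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
DIM = 32  # assumed embedding size, not Perch 2.0's actual dimension

def sample(label: int, n: int) -> np.ndarray:
    """Synthetic embeddings for one class (e.g. 'whale call' vs.
    'background'); real ones would come from the frozen model."""
    center = np.full(DIM, 1.0 if label else -1.0)
    return center + rng.normal(0, 0.7, (n, DIM))

# Fixed held-out evaluation set.
X_test = np.vstack([sample(0, 100), sample(1, 100)])
y_test = np.array([0] * 100 + [1] * 100)

# Sweep the training-set size, mirroring the minimum-4 / maximum-32 range.
accs = []
for k in (4, 32):
    X_train = np.vstack([sample(0, k), sample(1, k)])
    y_train = np.array([0] * k + [1] * k)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accs.append(clf.score(X_test, y_test))
    print(f"{k:2d} embeddings per class -> accuracy {accs[-1]:.2f}")
```

When the frozen model's embeddings already separate the classes well, even the four-example setting performs strongly, which matches the pattern the researchers report.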
Implications for Marine Research
Google has been working on whale bioacoustics for nearly a decade, including algorithms that detect humpback whale calls and a more recent multispecies whale model capable of identifying eight distinct species and multiple call types for two of those species.
The Perch 2.0 approach offers new flexibility. "We're always making new discoveries about call types. We're always learning new things about underwater sounds. There's so many mysterious ocean noises that you can't just have one fixed model," Harrell notes.
The findings were detailed in a paper presented at the NeurIPS conference workshop on AI for Non-Human Animal Communication last December.