
DeepMind Perch 2.0 Learns to Identify Whales From Birdsong

Key Points

  • Perch 2.0 trained on millions of bird and land animal recordings
  • Transfer learning enables whale call identification from bird-trained model
  • Evaluated on three marine datasets, training a classifier on 4 to 32 embeddings each
  • Paper presented at NeurIPS workshop on AI for Non-Human Animal Communication
  • Google has worked on whale bioacoustics for nearly a decade
  • Model converts 5-second audio to spectrograms for classification
References
  1. "Google DeepMind Model Transforms Bird Calls to Whales," IEEE Spectrum

Google DeepMind has demonstrated a remarkable breakthrough in AI-powered bioacoustics with Perch 2.0, a foundation model originally trained on birdsong that can now identify whale calls through transfer learning.

The discovery challenges conventional assumptions about cross-species audio recognition. Birds' chirps, trills, and warbles travel through air, while whales' boings, "biotwangs," and whistles propagate underwater; the two differ both in acoustic character and in transmission medium. Yet Perch 2.0 bridges this gap with surprising effectiveness.

How Transfer Learning Works

Perch 2.0 was trained on millions of recordings from birds and land-based animals, including amphibians, insects, and mammals. Researchers at Google DeepMind and Google Research then applied transfer learning to test whether this bird-focused model could recognize whale vocalizations.

"If [Perch 2.0] performs well for our whale use cases, then that means we don't need to build an entirely separate new whale model—we can just build on top of that," explains Lauren Harrell, a data scientist at Google Research.

The technique works by "recycling all of the training that's been done and just do a small model at the end for your use cases," Harrell adds. This approach dramatically reduces computation time and experimentation effort.
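The pattern Harrell describes can be sketched in a few lines. The snippet below is a minimal, self-contained illustration, not Perch 2.0's actual code: a fixed random projection stands in for the frozen pretrained backbone, and the only thing trained is a small logistic-regression head on top of its embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen Perch 2.0 backbone (its real architecture is not
# described here): a fixed random projection from a flattened 1024-sample
# input to a 256-dim embedding. Its weights are never updated.
W_frozen = rng.normal(size=(1024, 256))

def embed(x):
    # Scaled so tanh stays out of saturation for inputs of this size.
    return np.tanh(x @ W_frozen / 32.0)

# Tiny synthetic two-class task ("whale call" vs. "other sound").
X = rng.normal(size=(64, 1024)) + np.repeat([[0.5], [-0.5]], 32, axis=0)
y = np.repeat([1.0, 0.0], 32)
E = embed(X)  # embeddings from the frozen backbone

# The only trained component: one logistic-regression head.
w, b = np.zeros(256), 0.0
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(E @ w + b)))  # predicted probability of class 1
    w -= (E.T @ (p - y)) / len(y)           # gradient step on the log loss
    b -= (p - y).mean()

train_acc = ((1.0 / (1.0 + np.exp(-(E @ w + b))) > 0.5) == y).mean()
```

Because the backbone is never touched, swapping in a new downstream task means refitting only the 257 parameters of the head, which is why the approach is so cheap to experiment with.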

Evaluation and Results

The team tested Perch 2.0 on three marine audio datasets containing whale sounds and other aquatic noises. They converted each five-second audio window into a spectrogram, a visual representation of sound intensity across frequencies over time. These images were fed to the model to produce embeddings: compact vector representations that preserve the audio's most salient attributes.
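The article does not give the exact spectrogram parameters the team used; as a rough sketch, here is how a five-second window becomes a frequency-by-time intensity grid using only NumPy's FFT, with an assumed 16 kHz sample rate and assumed frame sizes.

```python
import numpy as np

def spectrogram(audio, frame_len=512, hop=256):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    win = np.hanning(frame_len)  # taper each frame to reduce spectral leakage
    frames = np.array([audio[i:i + frame_len] * win
                       for i in range(0, len(audio) - frame_len + 1, hop)])
    # rfft of each windowed frame gives the intensity in each frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1)).T  # shape: (freq, time)

sample_rate = 16_000  # assumed; not stated in the article
window = np.random.default_rng(0).normal(size=5 * sample_rate)  # 5-second clip
spec = spectrogram(window)
```

With these assumed parameters the result has 257 frequency bins (frame_len // 2 + 1), and it is this image-like array that an audio model embeds.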

Researchers then trained a logistic regression classifier using randomly selected embeddings (minimum four, maximum 32 per dataset). Results showed strong performance even with just a handful of embeddings, improving as the number increased.
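The evaluation protocol described above can be sketched with scikit-learn. Everything below is illustrative: synthetic Gaussian vectors stand in for Perch 2.0 embeddings, and the loop fits a logistic-regression classifier on k randomly drawn examples per class as k grows from 4 to 32.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

def fake_embeddings(n, label, dim=128):
    # Synthetic stand-ins for Perch 2.0 embeddings: two loosely separated classes.
    center = 0.2 if label == 1 else -0.2
    return rng.normal(loc=center, scale=1.0, size=(n, dim))

# A fixed held-out test set; few-shot training sets are drawn separately.
X_test = np.vstack([fake_embeddings(200, 1), fake_embeddings(200, 0)])
y_test = np.repeat([1, 0], 200)

accuracies = {}
for k in (4, 8, 16, 32):  # embeddings per class, spanning the 4-32 range
    X_train = np.vstack([fake_embeddings(k, 1), fake_embeddings(k, 0)])
    y_train = np.repeat([1, 0], k)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracies[k] = clf.score(X_test, y_test)
```

On data like this the classifier is already usable at k=4 and improves as more labeled embeddings are added, mirroring the trend the team reported.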

Implications for Marine Research

Google has been working on whale bioacoustics for nearly a decade, including algorithms that detect humpback whale calls and a more recent multispecies whale model capable of identifying eight distinct species and multiple call types for two of those species.

The Perch 2.0 approach offers new flexibility. "We're always making new discoveries about call types. We're always learning new things about underwater sounds. There's so many mysterious ocean noises that you can't just have one fixed model," Harrell notes.

The findings were detailed in a paper presented at the NeurIPS conference workshop on AI for Non-Human Animal Communication last December.
