Separation of Overlapping Zebra Finch Vocalizations Using Syllable-Based Optimization
Abstract
Overlapping vocalizations are common in social animal recordings and make it difficult
to study the behavior of individual animals. In zebra finches, this problem
is especially challenging in single-microphone recordings, where several birds may
sing at the same time but only one mixed signal is observed. This thesis presents
an interpretable syllable-based framework for separating overlapping zebra finch
vocalizations by exploiting a key property of the signal: each bird produces songs
composed of a limited set of repeated and acoustically stereotyped syllables.
The proposed framework first segments recordings into syllable-level events and
constructs per-bird syllable dictionaries from isolated recordings. For each mixture
syllable, candidate exemplars are retrieved using acoustic similarity measures, aligned
in time, and combined through waveform-domain reconstruction. In the refined
version of the method, the final decision between one-bird and two-bird explanations is
made by a lightweight learned decision layer trained on synthetic labeled segments.
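The retrieve, align, and combine steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names are hypothetical, the similarity measure is assumed to be a normalized cross-correlation peak (the abstract does not name the measures used), and the two-bird combination is assumed to be a least-squares gain fit in the waveform domain.

```python
import numpy as np

def align_exemplar(mixture, exemplar):
    """Place `exemplar` inside a zero array of the mixture's length, at the
    lag that maximizes its cross-correlation with the mixture."""
    mixture = np.asarray(mixture, dtype=float)
    exemplar = np.asarray(exemplar, dtype=float)
    corr = np.correlate(mixture, exemplar, mode="full")
    # Index i of the "full" output corresponds to lag i - (len(exemplar) - 1).
    lag = int(np.argmax(corr)) - (len(exemplar) - 1)
    aligned = np.zeros_like(mixture)
    src = max(0, -lag)   # first exemplar sample that falls inside the mixture
    dst = max(0, lag)    # where that sample lands in the mixture
    n = min(len(exemplar) - src, len(mixture) - dst)
    if n > 0:
        aligned[dst:dst + n] = exemplar[src:src + n]
    return aligned, lag

def retrieve_best(mixture, dictionary):
    """Return the index of the dictionary exemplar with the highest
    norm-scaled cross-correlation peak (an assumed similarity measure)."""
    mixture = np.asarray(mixture, dtype=float)
    scores = []
    for ex in dictionary:
        ex = np.asarray(ex, dtype=float)
        peak = np.max(np.correlate(mixture, ex, mode="full"))
        scores.append(peak / (np.linalg.norm(ex) + 1e-12))
    return int(np.argmax(scores))

def reconstruct_two_bird(mixture, exemplar_a, exemplar_b):
    """Align one exemplar per bird, then fit one gain per bird by least
    squares so that the weighted sum approximates the mixture waveform."""
    mixture = np.asarray(mixture, dtype=float)
    a, _ = align_exemplar(mixture, exemplar_a)
    b, _ = align_exemplar(mixture, exemplar_b)
    basis = np.stack([a, b], axis=1)
    gains, *_ = np.linalg.lstsq(basis, mixture, rcond=None)
    return gains[0] * a, gains[1] * b, gains
```

In this sketch the per-bird estimates are simply the gain-scaled aligned exemplars, which keeps each output traceable to a specific dictionary entry. The learned one-bird versus two-bird decision layer is not modeled here.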
The framework was evaluated on both synthetic and real mixtures of zebra finch
song. Synthetic experiments provided the main validation setting because bird-wise
ground truth was known. In a dictionary-matched setting, the refined method
achieved strong active-set performance, while in a stricter disjoint file-level setting
it remained effective but showed the expected reduction in generalization to unseen
syllable renditions. Comparisons with NMF and with a compact deep learning
baseline further clarified the position of the proposed approach: the method does
not aim to maximize reconstruction alone, but to provide a selective, interpretable,
and data-efficient bird-wise decomposition.
Real-data evaluation was carried out on six mixture recordings. Because source
ground truth is not available for these recordings, assessment focused on reconstruction
consistency and segment-level decision behavior rather than direct attribution
accuracy. In the final real-recording implementation, a hybrid reconstruction rule
was used: one-bird segments were assigned directly from the observed mixture syllable,
whereas two-bird segments were reconstructed from aligned bird-specific exemplars.
Under this configuration, the framework achieved a mean explained variance
of 0.4881 across the six real mixtures.
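For readers unfamiliar with the metric, the sketch below shows one common definition of explained variance, namely one minus the ratio of residual variance to mixture variance. The abstract does not state which definition the thesis uses, so this formula and the function name are assumptions for illustration only.

```python
import numpy as np

def explained_variance(mixture, reconstruction):
    """Assumed definition: 1 - Var(mixture - reconstruction) / Var(mixture).
    Returns NaN when the mixture has zero variance."""
    mixture = np.asarray(mixture, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    denom = np.var(mixture)
    if denom == 0.0:
        return float("nan")
    return 1.0 - np.var(mixture - reconstruction) / denom
```

Under this definition a perfect reconstruction scores 1, an all-zero reconstruction scores 0, and a mean over recordings is the plain average of the per-recording scores.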
Overall, the results show that syllable-level structure can serve as a practical basis
for separating overlapping zebra finch vocalizations in a way that remains transparent
and biologically meaningful. More broadly, the thesis demonstrates that interpretable
and data-efficient source separation can be achieved by combining explicit
vocal priors with local reconstruction and decision mechanisms.
Description
Release date: 2027-05-14.