Separation of Overlapping Zebra Finch Vocalizations Using Syllable-Based Optimization

Abstract

Overlapping vocalizations are common in social animal recordings and make it difficult to study the behavior of individual animals. In zebra finches, this problem is especially challenging in single-microphone recordings, where several birds may sing at the same time but only one mixed signal is observed. This thesis presents an interpretable syllable-based framework for separating overlapping zebra finch vocalizations by exploiting a key property of the signal: each bird produces songs composed of a limited set of repeated and acoustically stereotyped syllables. The proposed framework first segments recordings into syllable-level events and constructs per-bird syllable dictionaries from isolated recordings. For each mixture syllable, candidate exemplars are retrieved using acoustic similarity measures, aligned in time, and combined through waveform-domain reconstruction. In the refined version of the method, the final decision between one-bird and two-bird explanations is made by a lightweight learned decision layer trained on synthetic labeled segments.

The framework was evaluated on both synthetic and real mixtures of zebra finch song. Synthetic experiments provided the main validation setting because bird-wise ground truth was known. In a dictionary-matched setting, the refined method achieved strong active-set performance, while in a stricter disjoint file-level setting it remained effective but showed the expected reduction in generalization to unseen syllable renditions. Comparisons with NMF and with a compact deep learning baseline further clarified the position of the proposed approach: the method does not aim to maximize reconstruction alone, but to provide a selective, interpretable, and data-efficient bird-wise decomposition.

Real-data evaluation was carried out on six mixture recordings. Because source ground truth is not available for these recordings, assessment focused on reconstruction consistency and segment-level decision behavior rather than direct attribution accuracy. In the final real-recording implementation, a hybrid reconstruction rule was used: one-bird segments were assigned directly from the observed mixture syllable, whereas two-bird segments were reconstructed from aligned bird-specific exemplars. Under this configuration, the framework achieved a mean explained variance of 0.4881 across the six real mixtures.

Overall, the results show that syllable-level structure can serve as a practical basis for separating overlapping zebra finch vocalizations in a way that remains transparent and biologically meaningful. More broadly, the thesis demonstrates that interpretable and data-efficient source separation can be achieved by combining explicit vocal priors with local reconstruction and decision mechanisms.
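
The two-bird reconstruction step and the explained-variance score mentioned in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: alignment is done by peak cross-correlation, the aligned exemplars are combined by least-squares gains, and explained variance is taken as 1 - Var(residual) / Var(mixture). The function names, the synthetic "syllables", and the gain values are hypothetical, and the learned one-bird versus two-bird decision layer is not modeled.

```python
import numpy as np

def align_to_mixture(mixture, exemplar):
    """Place the exemplar at the lag with peak cross-correlation (assumed alignment rule)."""
    corr = np.correlate(mixture, exemplar, mode="full")
    # Index 0 of the 'full' output corresponds to lag -(len(exemplar) - 1).
    lag = int(np.argmax(corr)) - (len(exemplar) - 1)
    aligned = np.zeros(len(mixture))
    src_start = max(0, -lag)   # first exemplar sample that falls inside the mixture
    dst_start = max(0, lag)    # where that sample lands in the mixture
    n = min(len(exemplar) - src_start, len(mixture) - dst_start)
    if n > 0:
        aligned[dst_start:dst_start + n] = exemplar[src_start:src_start + n]
    return aligned

def fit_gains(mixture, aligned_sources):
    """Least-squares gain per aligned exemplar; returns gains and per-bird estimates."""
    basis = np.stack(aligned_sources, axis=1)
    gains, *_ = np.linalg.lstsq(basis, mixture, rcond=None)
    return gains, basis * gains  # column j is the estimate for bird j

def explained_variance(mixture, reconstruction):
    """1 - Var(residual) / Var(mixture), one common definition (assumed here)."""
    return 1.0 - np.var(mixture - reconstruction) / np.var(mixture)

# Hypothetical exemplars: two windowed tones standing in for bird-specific syllables.
t = np.arange(400)
exemplar_a = np.sin(2 * np.pi * 0.05 * t) * np.hanning(400)
exemplar_b = np.sin(2 * np.pi * 0.13 * t) * np.hanning(400)

# Synthetic noise-free mixture with overlapping, scaled, time-shifted copies.
mixture = np.zeros(1000)
mixture[100:500] += 0.8 * exemplar_a
mixture[350:750] += 0.5 * exemplar_b

aligned = [align_to_mixture(mixture, e) for e in (exemplar_a, exemplar_b)]
gains, per_bird = fit_gains(mixture, aligned)
score = explained_variance(mixture, per_bird.sum(axis=1))
print(gains, score)
```

On real recordings the dictionary exemplars never match the mixture exactly, so the score falls below 1; the abstract reports a mean of 0.4881 across the six real mixtures for the thesis's own pipeline, which this sketch does not reproduce.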

Description

Release date: 2027-05-14
