An error occured Please, provide your email address This address is invalid Your email is already subscribed. Do you wish to unsuscribe ? newsletter nolist newsletter refreshed You've been unsuscribed Your email has been registered

Subscribe

Your email adress

By signing up, you agree to receive emails every time the Deezer Newsroom is updated. You can unsubscribe at any time by clicking the link at the bottom of each email.

Zero-Note Samba: Self-Supervised Beat Tracking

Deezer Aug 16, 2023 2 min read

Supervised machine learning for music information retrieval requires a large annotated training set, and is thus an expensive and time-consuming process. To circumvent this problem, we propose to train deep neural networks to perceive beats in musical recordings despite having little or no access to human annotations. The key idea is to train two fully convolutional networks in parallel, which we name “Zero-Note Samba’’ (ZeroNS): the first analyzes the percussive part of a musical piece whilst the second analyzes its non-percussive part. These networks learn a self-supervised pretext task of synchrony prediction (sync-pred), which simulates the ability of musicians to groove together when playing in the same band. Sync-pred encourages the two networks to return similar outputs if the underlying musical parts are synchronized, yet dissimilar outputs if the parts are out of sync. In practice, we obtain the instrumental parts from commercial recordings via an off-the-shelf source separation system: Spleeter.

After self-supervised learning with sync-pred, ZeroNS produces a sparse output that resembles a beat detection function. When used in conjunction with a dynamic Bayesian network, ZeroNS surpasses the state of the art in unsupervised beat tracking. Furthermore, fine-tuning ZeroNS to a small set of labeled data (of the order of one to ten songs) matches the performance of a fully supervised network on 96 songs. Lastly, we show that pre-training a supervised model with sync-pred mitigates dataset bias and thus improves cross-dataset generalization, at no extra annotation cost.

This paper has been published in the IEEE/ACM Transactions on Audio, Speech, and Language Processing journal in 2023.

Written by Deezer

Up Next

Deezer Deezer launches music solution for businesses with high profile brands on-board

Jun 26, 2025 3 min read

Deezer PeakNetFP: Peak-based Neural Audio Fingerprinting Robust to Extreme Time Stretching

Jun 25, 2025 2 min read

Deezer Deezer launches world’s first AI tagging system for music streaming

Jun 20, 2025 4 min read