Deezer Introduces Search by Lyric
Two years ago, Deezer introduced the ability to display lyrics while listening to a song. Today, almost 6 million tracks are available with lyrics. In the Deezer tech team, we thought it would be really convenient to allow tracks to be searched by lyric query. Who has never been stuck with lyrics in their head and unable to find the song they belong to? So we introduced the lyric search feature…
How does it work?
When we talk about search, we’re talking about both the search query itself and the index of search terms. First, we build an index based on the open source search engine Elasticsearch (indeed the entire Deezer search engine relies on it). We index JSON documents representing track lyrics like this:
So we have four fields:
- id: ID of the track
- text: lyric text of the track
- global_rank: popularity of the track
- product_track_id: a field that allows us to regroup the same audio track behind a unique ID
Our index contains around 5,700,000 tracks, for an index size of under 3 GB.
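For illustration, a document with these four fields could be built and serialized as in the sketch below. Only the field names come from the description above; every value is a made-up placeholder, and the helper `to_index_payload` is a hypothetical name, not part of Deezer's code.

```python
import json

# Hypothetical example values; only the four field names come from the article.
track_lyrics_doc = {
    "id": 123456,                      # ID of the track (placeholder)
    "text": "Placeholder lyric text for an imaginary song",
    "global_rank": 750000,             # popularity of the track (placeholder)
    "product_track_id": 123000,        # shared ID grouping identical audio tracks
}


def to_index_payload(doc: dict) -> str:
    """Serialize a lyrics document to the JSON body that would be indexed."""
    return json.dumps(doc, ensure_ascii=False)


payload = to_index_payload(track_lyrics_doc)
print(payload)
```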
Let’s talk about the search query now. Searching for music usually means querying short text metadata. There, the frequency of a search term in the metadata, typically measured by a tf-idf score, is not particularly important; we only care whether the query terms are present or not. Querying lyrics, however, is closer to a traditional search over text documents, where term frequency must be taken into account. To compute a score that reflects how well the user query matches the track lyrics, we use the Elasticsearch Function Score Query.
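To give an idea of what a Function Score Query looks like, here is a minimal sketch that wraps a text query and adjusts its score with a numeric field. The choice of `field_value_factor` on `global_rank`, the `log1p` modifier and the `multiply` boost mode are assumptions made for the example, not the scoring function actually used in production; `build_function_score` is a hypothetical helper name.

```python
def build_function_score(inner_query: dict, rank_field: str = "global_rank") -> dict:
    """Wrap a query in a function_score that scales the text score by a field.

    Assumption: popularity is applied through field_value_factor; the article
    does not say which scoring function is really used.
    """
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "functions": [
                    {
                        "field_value_factor": {
                            "field": rank_field,
                            "modifier": "log1p",  # dampen large popularity values
                            "missing": 1,         # fallback when the field is absent
                        }
                    }
                ],
                # Multiply the text relevance score by the function result.
                "boost_mode": "multiply",
            }
        }
    }


# A plain match query on the lyric text, used here as the inner query.
body = build_function_score({"match": {"text": "look at your children"}})
```

The resulting dictionary is the JSON body one would send to the search endpoint of the lyrics index.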
Another issue when searching text documents is how to deal with stop words. The basic approach is to remove them during the indexing phase. The problem is that, while stop words normally have little impact on relevance, they matter in the specific case of music lyrics: if we remove them, we cannot distinguish between “happy” and “not happy”. Moreover, stop words are language-dependent. An alternative to stop-word removal is provided by the Elasticsearch Common Terms Query. Briefly, the common terms query divides the query terms into two groups: more important (i.e. low-frequency terms) and less important (i.e. high-frequency terms).
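The grouping can be sketched in a few lines of Python. This is a simplified model of the idea, not Elasticsearch's internal code: `split_terms`, the document frequencies and the cutoff value are all hypothetical. The second function builds a `common` query body on the `text` field; note that this query type exists in the Elasticsearch versions of the time and was deprecated in later releases.

```python
def split_terms(terms, doc_freq, total_docs, cutoff_frequency=0.001):
    """Split query terms into (low_frequency, high_frequency) groups.

    A term whose share of documents exceeds cutoff_frequency is treated as
    high frequency (stop-word-like); the others are low frequency.
    """
    low, high = [], []
    for term in terms:
        ratio = doc_freq.get(term, 0) / total_docs
        (high if ratio > cutoff_frequency else low).append(term)
    return low, high


def build_common_terms_query(user_query, field="text", cutoff_frequency=0.001):
    """Build a common terms query body for the given lyric field."""
    return {
        "query": {
            "common": {
                field: {
                    "query": user_query,
                    "cutoff_frequency": cutoff_frequency,
                }
            }
        }
    }


# Made-up document frequencies: "not" appears almost everywhere, "happy" rarely.
low, high = split_terms(
    ["not", "happy"], {"not": 900, "happy": 5}, total_docs=1000, cutoff_frequency=0.1
)
common_body = build_common_terms_query("not happy")
```

With this kind of split, “not” still contributes to the score instead of being dropped at indexing time, which is what keeps “happy” and “not happy” distinguishable.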
Finally, our complete query pattern is the following:
This query pattern works well in most cases. However, it sometimes fails to retrieve the expected track in the top position. This is because we treat the query as a group of words instead of a phrase: searching for “Look at your children” is equivalent to searching for “Children look at your”.
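The limitation is easy to reproduce outside of Elasticsearch. The sketch below compares the two queries as bags of words and as ordered phrases; the helper names and the lyric line are hypothetical, and naive whitespace tokenization is assumed.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Represent a query as an unordered multiset of lowercase words."""
    return Counter(query.lower().split())


def contains_phrase(lyrics: str, query: str) -> bool:
    """Return True if the query words appear consecutively, in order."""
    words = lyrics.lower().split()
    q = query.lower().split()
    return any(words[i:i + len(q)] == q for i in range(len(words) - len(q) + 1))


lyric_line = "please look at your children tonight"  # made-up lyric line

# As bags of words, both queries are indistinguishable.
same_bag = bag_of_words("Look at your children") == bag_of_words("Children look at your")

# As phrases, only the first one matches the lyric line.
first_matches = contains_phrase(lyric_line, "Look at your children")
second_matches = contains_phrase(lyric_line, "Children look at your")
```

Taking word order into account is what separates the two queries; the bag-of-words view alone cannot.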
There are two ways to try the feature. The first option is to search for tracks by lyrics directly on Deezer. To do that, you need to use the hidden advanced search feature: go to Deezer and enter a query with the following syntax: lyrics:“oh yeah yeah oh yeah yeah yeah yeah”.
The second, more user-friendly option is to go to the dedicated Deezer search by lyrics website: www.deezbylyrics.fr. It provides you with a light search interface and a Deezer player that allows you to listen to the tracks you were looking for.