Deezer, as a streaming service, needs to pay royalties to music providers (record companies that make their music available on Deezer) as well as publishers and copyright collective management organisations, so that the music you listen to is paid for at the right price according to the number of times it has been streamed. We also need to generate and send them financial and statistical reports so that they can pay their own rights owners (performers, songwriters) accordingly. Reports are also sent to various other partners (chart companies, for example).
This is a more complex problem than it seems, in particular because we need to compute the data from billions of listens coming from millions of users in almost 200 countries. A whole engineering team is therefore dedicated to this work, making sure all computations are correct and completed on time.
Collecting Data
Logging Listens from our Customers
In order to pay everybody fairly, we need to collect data from all devices that can stream music from Deezer. This includes the website and the Android and iOS apps, but also all the other devices such as connected TVs, connected Hi-Fi systems, car stereos and many more. One of our jobs is to make sure that all these devices, developed by different teams at Deezer, send the right information to our system. Once we have all the listens from the users, we need to extend the log with additional data: information on the users’ offers to know how much we should pay, the name of the provider to know who we should pay, etc.
Gather Information from Many Teams in the Company
In order to extend the log with more data, we need to collect details from various teams at Deezer:
- Payment team: they provide information about offers and currencies for each country, which is needed because each offer is paid differently and we need to convert all invoices into local currency;
- Catalog team: they provide us with the metadata that tells us whom we need to pay (labels, publishers or collective management organisations);
- Financial team: they have, among other things, the turnover information, which is a key input in the payment of royalties since we pay the rights owners according to how much money we have earned in the month for a specific country and service;
- Acquisition team: this is the most important team for us because they provide information about the contracts we have with rights holders (the applicable terms and conditions) and all the details that help us send the invoices and reports to the right person (report format, FTP credentials, email for notifications, etc.). We consequently work with this team on a daily basis and provide them with a user interface to ingest all this data efficiently.
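At small scale, extending the listen log with data from other teams amounts to a join between each listen and a few lookup tables. The sketch below is only an illustration: the `Listen` and `EnrichedListen` records, the field names and the two lookup dictionaries are hypothetical and do not reflect Deezer's actual schema or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listen:
    # Hypothetical minimal listen record sent by a device.
    user_id: int
    track_id: int
    country: str


@dataclass(frozen=True)
class EnrichedListen:
    # The same listen, extended with data owned by other teams.
    user_id: int
    track_id: int
    country: str
    offer: str      # assumed to come from the Payment team's data
    provider: str   # assumed to come from the Catalog team's data


def enrich(listens, offer_by_user, provider_by_track):
    """Attach the user's offer and the track's provider to each listen.

    Listens whose user or track is unknown are skipped here; a real
    pipeline would more likely route them to a reject file for review.
    """
    enriched = []
    for listen in listens:
        offer = offer_by_user.get(listen.user_id)
        provider = provider_by_track.get(listen.track_id)
        if offer is None or provider is None:
            continue
        enriched.append(
            EnrichedListen(listen.user_id, listen.track_id,
                           listen.country, offer, provider)
        )
    return enriched
```

On billions of listens the same join would run on a distributed engine rather than in a Python loop, but the logic stays the same.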
Once we have gathered all the data we need, a lot of computation is required to analyze it and extract something meaningful. That’s why we need to use BigData technologies.
Computing Royalties
The Royalties and Reporting team was the first team at Deezer to use BigData technologies, back in 2012. A Hadoop cluster was created to support the ever-growing quantity of data collected to compute royalties and generate reports. The cluster is now used by many teams within the company and is at the center of the whole business.
At the beginning, the cluster only did the pre-computation for another system (in use since 2009) that computed the royalties. In 2014 we understood that this system had reached its limits, in terms of both performance and features, and that we needed at least a new version or, better, a fresh start. That is what we began in September 2014, when the Zephir project was born. In March 2015, after a few months of running in parallel with the previous system, the software was ready to be put into production, and the royalties for February 2015 were the first to be computed by Zephir using only BigData technologies. We decreased the computation time from 24 hours to 40 minutes!
Daily Statistics Workflow
The following workflow shows the operations that we perform daily. We collect all the listens (stream logs), then add information about the users (such as their offer and country) and the songs they listened to. We clean these logs (removing duplicates, malformed entries, etc.), then gather information about partners so that we know what to send and to whom, and finally create all the daily reports requested by our partners.
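The cleaning and aggregation steps of the daily workflow can be sketched as follows. This is a minimal illustration under stated assumptions: logs are plain dictionaries, `listen_id` is a hypothetical unique identifier used to detect duplicates, and `REQUIRED_FIELDS` is an invented notion of what makes a log well formed.

```python
from collections import Counter

# Hypothetical set of fields a log must carry to be considered well formed.
REQUIRED_FIELDS = ("listen_id", "user_id", "track_id", "country")


def clean(logs):
    """Drop malformed logs and duplicates, keeping the first occurrence."""
    seen = set()
    cleaned = []
    for log in logs:
        # A log is malformed if a required field is missing or empty.
        if any(log.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # A log is a duplicate if its identifier was already seen.
        if log["listen_id"] in seen:
            continue
        seen.add(log["listen_id"])
        cleaned.append(log)
    return cleaned


def daily_counts(logs, provider_by_track):
    """Count valid listens per (provider, country) for a daily report."""
    counts = Counter()
    for log in clean(logs):
        provider = provider_by_track.get(log["track_id"])
        if provider is not None:
            counts[(provider, log["country"])] += 1
    return counts
```

The resulting counts per provider and country are the kind of aggregate a daily report could be built from; the actual report contents depend on what each partner requests.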
Monthly Financial Workflow
At the end of each month, a new workflow is launched (see following diagram). We take all the daily data generated during the month and add the data on offers, subscribers and turnover collected that month for each country and offer. The royalties computation is then executed and various reports are created from its results.
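To give an idea of what a monthly computation can look like, here is a sketch of a pro-rata split, a model commonly used in music streaming. It is an assumption made for illustration, not the formula used by Zephir: the real terms depend on each contract. The function name, the `royalty_rate` parameter and the figures are all hypothetical.

```python
from decimal import Decimal


def pro_rata_royalties(streams_by_provider, turnover, royalty_rate):
    """Split a royalty pool between providers in proportion to their streams.

    The pool is turnover * royalty_rate for one country and offer.
    Amounts are rounded to two decimals, so their sum can differ from
    the pool by a few cents.
    """
    total_streams = sum(streams_by_provider.values())
    if total_streams == 0:
        return {provider: Decimal("0.00") for provider in streams_by_provider}
    pool = turnover * royalty_rate
    return {
        provider: (pool * streams / total_streams).quantize(Decimal("0.01"))
        for provider, streams in streams_by_provider.items()
    }


# Hypothetical figures for one country and one offer.
shares = pro_rata_royalties(
    {"LabelA": 600, "LabelB": 300, "LabelC": 100},
    turnover=Decimal("1000.00"),
    royalty_rate=Decimal("0.70"),
)
```

`Decimal` is used instead of floats so that monetary amounts are not affected by binary floating-point rounding.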
What’s Next?
Since the launch of Zephir in 2015, the team has grown from 2 data engineers to 3 data engineers, 2 web engineers and a data scientist. The project has grown very fast and allows Deezer to be more and more precise in its royalties computations, especially in what we report to our partners. BigData technologies even allow us to imagine new ways of making payments in the music industry, such as the user-centric payment system, a much better and fairer way of distributing revenues to artists. But that’s another story!
In the meantime you can discover our latest career opportunities on Deezer Jobs. In particular, the Royalties and Reporting team is currently looking for a Data Architect.