‘Hey Siri, play my Flow on Deezer’ (part 2): integrating SiriKit with Deezer

co-authored by Martin Lukacs

Following the WWDC19 announcements, Deezer quickly decided to support Media Intents with SiriKit. To learn more about the benefits of integrating a voice assistant in your app, you can read this post written by Anna Louis, Product Owner of Voice at Deezer.

Adding support for SiriKit in your app is as simple as… Wait, nothing is that simple! In this post, you won’t actually find a “How to do it” tutorial. It would be too long and would only copy what Apple already provides with this video and this sample code. Instead, you will discover what it took to make this feature work in Deezer’s iOS app.

Throughout this article, we will take a closer look at several interesting aspects of implementing Siri in an iOS application. You will discover how to work with the Apple extension dedicated to supporting Siri intents, along with the relative complexity of app-extension communication. At that point, you’ll be ready to dive into the specifics of working with and supporting Media Intents. Finally, we’ll finish off with a few tips on Siri’s vocabulary comprehension.

How to work with an extension?

A good introduction to how an app extension works can be found on Apple’s developer documentation.

To interact with Siri, iOS uses an app extension to launch a lighter process of our app whose sole purpose is to support these interactions. Once again, Apple’s documentation on creating an Intents App Extension is well written. So in this section, we will focus on how, at Deezer, we created this extension while trying to reuse and share as much code as possible.

What are the tools our app can provide to make the extension work flawlessly?

  • A tool to make network requests to our API
  • The current user session — to update their library and check their rights to access some content (to fail early and return the appropriate .subscriptionRequired response code)
  • The user’s playlist metadata — to improve the local search of content and avoid a network request

Fortunately, Deezer had recently created an Auth.framework module to manage authentication across our different apps. This framework supports SSO (sign in on one app and share the credentials with another). This made it easy to share the user’s session credentials with the extension and authenticate the user from it.

To support network calls to our API in the Siri extension without duplicating code, we shared some files between our main app and the extension. Doing so only required checking a box:

File Inspector Menu to set multiple targets

This allowed us to share all the logic of a request to our API: setting default HTTP headers and query parameters, and configuring the default processing of API responses.

Lastly, to share some data between our app and our extension, we created a SharedStorage interface that uses an instance of UserDefaults(suiteName:) to store simple data. For example, metadata composed of only the name and ID of each of the user’s playlists.
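A minimal sketch of such a shared storage (the suite name, key and metadata shape here are illustrative, not Deezer’s actual names):

```swift
import Foundation

// Illustrative metadata stored for each of the user's playlists.
struct PlaylistMetadata: Codable {
    let id: String
    let name: String
}

// Shared storage readable from both the main app and the Siri extension.
final class SharedStorage {
    // The suite name must match an App Group enabled on both targets.
    private let defaults = UserDefaults(suiteName: "group.com.example.deezer")

    var playlists: [PlaylistMetadata] {
        get {
            guard let data = defaults?.data(forKey: "playlists"),
                  let decoded = try? JSONDecoder().decode([PlaylistMetadata].self, from: data)
            else { return [] }
            return decoded
        }
        set {
            defaults?.set(try? JSONEncoder().encode(newValue), forKey: "playlists")
        }
    }
}
```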

One downside of working with Apple Extension is that the code we write is difficult to unit test:

To test an app extension using the Xcode testing framework (that is, the XCTest APIs), write tests that exercise the extension code using your containing app as the host environment (source).

From INIntent to Deezer content

There are four types of Media Intents available since iOS 13. Deezer chose to support all of them from day one:

  • 🎶 An intent to support playing media: INPlayMediaIntent
  • ➕ An intent to add a media to a playlist or the user library: INAddMediaIntent
  • 👍 An intent to like or dislike a media: INUpdateMediaAffinityIntent
  • 🔍 An intent to search a media and open it in the app: INSearchForMediaIntent

Each INIntent goes through the same three phases:

1. Resolve the parameters of the intent

In this first phase, we will check if the received parameters are valid and if we have enough information to handle the intent. For media intents, it usually means finding content in the Deezer catalog using information from the INMediaSearch object.

Our extension won’t resolve the parameters of INPlayMediaIntent and INAddMediaIntent in the same way. Let’s see why.

To understand the difference, we should look at what behavior the user expects from a request and what we want to support.

What are you more likely to ask Siri?

“Hey Siri, tell Deezer to add Makeba to my Chill playlist”
“Hey Siri, add this song to my Chill playlist”

In the case of a play intent, if we don’t play exactly what the user asked for, they can just try again and expect better results. In the case of adding media to the user’s library or a playlist, this is more of a problem: the operation will be performed, and if we matched the wrong content, the user will end up with unwanted media in their collection. By supporting only .currentlyPlaying for INAddMediaIntent, we know that we will have more information at hand.

The INMediaSearch object is indeed prefilled with information from MPNowPlayingInfoCenter. So all its properties (mediaName, artistName, albumName…) will exactly match what our main app’s player has set in the nowPlayingInfo dictionary. As INMediaSearch doesn’t contain our internal track identifier, we have to retrieve it with the help of our search engine.

Depending on the type of intent and the information in the INMediaReference property, we alter our search behavior:

  • For the Play and Search intents — we use our default search, just as you would inside the application, querying it with the terms available in the INMediaSearch object
  • For the AddMedia and UpdateAffinity intents — we only resolve .currentlyPlaying content, then use this information to search for an exact match on the song title, artist name and album name
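The two strategies above can be sketched as follows. This is a hedged sketch: searchCatalog and findExactMatch are assumed helpers wrapping Deezer’s search API, not actual Deezer code.

```swift
import Intents

// Assumed helpers wrapping the app's search API (signatures only):
func searchCatalog(query: String, type: INMediaItemType,
                   completion: @escaping ([INMediaItem]) -> Void) { /* default search */ }
func findExactMatch(title: String?, artist: String?, album: String?,
                    completion: @escaping (INMediaItem?) -> Void) { /* exact-match search */ }

// Play intent: run the regular catalog search with the terms Siri extracted.
func resolveMediaItems(for intent: INPlayMediaIntent,
                       with completion: @escaping ([INPlayMediaMediaItemResolutionResult]) -> Void) {
    guard let search = intent.mediaSearch, let name = search.mediaName else {
        return completion([.unsupported()])
    }
    searchCatalog(query: name, type: search.mediaType) { items in
        completion(items.isEmpty ? [.unsupported()]
                                 : INPlayMediaMediaItemResolutionResult.successes(with: items))
    }
}

// Add intent: only accept the currently playing item, then use the
// now-playing metadata to find an exact catalog match.
func resolveMediaItems(for intent: INAddMediaIntent,
                       with completion: @escaping ([INAddMediaMediaItemResolutionResult]) -> Void) {
    guard let search = intent.mediaSearch, search.reference == .currentlyPlaying else {
        return completion([.unsupported()])
    }
    findExactMatch(title: search.mediaName, artist: search.artistName, album: search.albumName) { item in
        if let item = item {
            completion([INAddMediaMediaItemResolutionResult.success(with: item)])
        } else {
            completion([.unsupported()])
        }
    }
}
```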

2. Confirm the details of the intent

This phase is usually used to confirm with the user what the app understood before proceeding (like making sure an order is correct before making the payment). In the context of a media intent, Apple recommends skipping this phase to keep the Siri interaction smooth.

3. Handle the intent

To handle the intent, you can either do the action from the extension or delegate it to the main app.

For INPlayMediaIntent, we have to launch the app to play the audio. This is because our extension is short-lived and will be killed after handling the intent. We launch the app in the background so that the audio can start without interrupting what the user is doing. To request a background launch, just add this line in your handle phase: completion(.init(code: .handleInApp, userActivity: nil)).
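A minimal sketch of this hand-off, assuming the standard SiriKit flow (the Player API on the app side is hypothetical):

```swift
import Intents
import UIKit

// Extension side: ask iOS to launch the main app in the background and
// hand it the intent, instead of playing audio from the extension.
final class PlayMediaIntentHandler: NSObject, INPlayMediaIntentHandling {
    func handle(intent: INPlayMediaIntent,
                completion: @escaping (INPlayMediaIntentResponse) -> Void) {
        completion(INPlayMediaIntentResponse(code: .handleInApp, userActivity: nil))
    }
}

// App side: iOS delivers the intent through the app delegate, where a
// player (Player.shared here is a hypothetical API) starts playback.
// func application(_ application: UIApplication,
//                  handle intent: INIntent,
//                  completionHandler: @escaping (INIntentResponse) -> Void) {
//     guard let playIntent = intent as? INPlayMediaIntent else { return }
//     Player.shared.play(playIntent.mediaItems ?? [])
//     completionHandler(INPlayMediaIntentResponse(code: .success, userActivity: nil))
// }
```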

INSearchForMediaIntent expects you to open the app and show the search results to the user. This can be done using the .continueInApp response code.
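For illustration, a hedged sketch of such a handler (the activity type and userInfo key are assumptions):

```swift
import Intents

// Hand the search over to the app: .continueInApp launches the app in
// the foreground with the user activity attached to the response.
final class SearchMediaIntentHandler: NSObject, INSearchForMediaIntentHandling {
    func handle(intent: INSearchForMediaIntent,
                completion: @escaping (INSearchForMediaIntentResponse) -> Void) {
        // The app matches this activity type and key when restoring the
        // activity, in order to show the results screen.
        let activity = NSUserActivity(activityType: "com.example.deezer.showSearch")
        activity.userInfo = ["query": intent.mediaSearch?.mediaName ?? ""]
        completion(INSearchForMediaIntentResponse(code: .continueInApp, userActivity: activity))
    }
}
```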

But when looking at INUpdateMediaAffinityIntent and INAddMediaIntent, things become more challenging.

Both intents could be handled directly by the extension. This would prevent the app from being launched in the background, which takes more resources and time to execute the action.

But is this task easily doable inside the extension? To do so, we would need access to the classes that manage the user’s library state, which, in turn, also access some persisted information (the extension is not granted access to the app’s storage, so we would need to store our data in a shared container).

From here, it seems we don’t have much of a choice. To avoid a huge refactoring of an important piece of our code, we will have to hand over the control to the app to execute the action.

Before making this decision, we have to look at one last thing: the documentation. There, you will discover that Apple doesn’t allow .handleInApp as a response code for INUpdateMediaAffinityIntent.

Fortunately, handling media affinity updates in the extension is quite simple:

  1. Make the network call to update the affinity state on the server’s side
  2. Ask the app to refresh the cached affinity state (this is usually done at app launch to pick up changes made on another device, such as a desktop, but here we must take into account that the app may already be running)

To handle the latter, after making the network call, we set a flag in shared storage (using UserDefaults(suiteName:)). In the app, we listen to UIApplication.didBecomeActiveNotification to check this flag and trigger the necessary refresh.
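A sketch of this flag mechanism, assuming an app group suite (the suite name, flag key and refresh routine are illustrative):

```swift
import UIKit

// Illustrative names for the shared suite and the flag key.
let sharedDefaults = UserDefaults(suiteName: "group.com.example.deezer")
let refreshFlagKey = "affinityStateNeedsRefresh"

// Extension side: after the affinity network call succeeds, raise the flag.
func markAffinityStateDirty() {
    sharedDefaults?.set(true, forKey: refreshFlagKey)
}

// App side: whenever the app becomes active, consume the flag.
let observer = NotificationCenter.default.addObserver(
    forName: UIApplication.didBecomeActiveNotification,
    object: nil, queue: .main
) { _ in
    guard sharedDefaults?.bool(forKey: refreshFlagKey) == true else { return }
    sharedDefaults?.set(false, forKey: refreshFlagKey)
    // refreshCachedAffinityState() // assumed app-side refresh routine
}
```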

Improve Siri for Deezer usage

Siri already does a lot of work to automatically recognize and match artists, albums and even song titles out of the box. This process seems to rest on Siri’s knowledge of the Apple Music catalog. But there are still some cases where Siri falls short and requires some help to process the intent.

User vocabulary

“Hey Siri, play my lockdown playlist”

Some entities, like user playlist names, are fully customizable and therefore unpredictable.

Siri provides an API, INVocabulary, to feed it new content. It lets you associate certain words with a specific content type. In our case, we mostly used the .mediaPlaylistTitle vocabulary type with the user’s playlist names, enabling users to request their playlists by name through Siri.

In the Deezer app, we configure a PlaylistVocabularyUpdater object that listens to any playlist update in the user’s library and keeps the user’s Siri vocabulary always up to date.
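The vocabulary update itself might look like this (a sketch: the class shape is illustrative, only the INVocabulary calls are the real API):

```swift
import Intents

// Illustrative updater for the user's playlist names.
final class PlaylistVocabularyUpdater {
    // Called whenever the user's library changes.
    func update(with playlistNames: [String]) {
        // Siri gives earlier strings in the ordered set higher priority.
        let ordered = NSOrderedSet(array: playlistNames)
        INVocabulary.shared().setVocabularyStrings(ordered, of: .mediaPlaylistTitle)
    }

    // Called on logout, so another account's names don't linger.
    func reset() {
        INVocabulary.shared().removeAllVocabularyStrings()
    }
}
```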

App’s custom terminology

Another interesting kind of vocabulary is company-specific brand names.

“Hey Siri, play my Flow in Deezer”

In this particular case, Siri will not recognize what “Flow” is, and putting this term into the Deezer search engine will not return any results. Flow is our app’s custom term for a personalized playlist that contains all your favorite music, mixed with fresh recommendations and songs you forgot you loved.

One solution to this problem would have been a Global Vocabulary Reference; unfortunately, it doesn’t apply to Media Intents.

Instead, during the resolve phase, we look for the word “flow” and resolve it against our custom Flow INMediaItem.

This has some caveats. If a song, album or playlist title contains the word “flow”, it will never be found through Siri, because we bypass the search process and always prefer matching “flow” to our custom experience. (Fortunately, this term is rare enough not to conflict in most cases while still offering a great Deezer experience.)
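For illustration, the interception during the resolve phase could look like this (the "deezer:flow" identifier is an assumption):

```swift
import Intents

// Intercept "flow" before running the regular catalog search and
// return a custom media item for our Flow experience.
func resolveFlowIfNeeded(from search: INMediaSearch) -> INMediaItem? {
    guard search.mediaName?.lowercased() == "flow" else { return nil }
    return INMediaItem(identifier: "deezer:flow",
                       title: "Flow",
                       type: .playlist,
                       artwork: nil)
}
```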

A third kind of vocabulary, undocumented…

“Hey Siri, play my favorites on Deezer”
“Hey Siri, play new releases on Deezer”

As you can imagine, there are many ways to ask Siri to play such things. For example, it could be any phrase containing words like “play favorite music” or “fetch new releases.” Consequently, it would be very difficult to use the Flow technique discussed above to match this kind of intent in all languages.

The task doesn’t seem easy at first. But while conducting tests, we discovered that Siri has built-in helpers for these cases. When interpreting these generic media intents, Siri will always return an exact string in INMediaSearch.mediaName, be it "new releases" or "my favorite mix".

This makes it much easier to parse the user’s intention and respond to it. There may be more of these prebuilt catchphrases that we haven’t discovered, and we hope Apple will document them in the near future.
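A sketch of how such prebuilt phrases could be mapped to in-app experiences (the strings shown are the ones observed during testing; the enum and mapping are illustrative):

```swift
import Intents

// Map Siri's prebuilt phrases (observed, not documented) to features.
enum GenericMediaRequest {
    case favorites
    case newReleases
}

func genericRequest(from search: INMediaSearch) -> GenericMediaRequest? {
    switch search.mediaName?.lowercased() {
    case "my favorite mix": return .favorites
    case "new releases":    return .newReleases
    default:                return nil
    }
}
```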

Overall, our journey implementing Siri was very interesting and rewarding. Seeing first-hand the progress in language processing and what can be done with it is impressive.

We can’t wait to see what the next big steps will build on top of it.