This month, Jimena and Romain from the Deezer R&D team were in Barcelona to attend NIPS (Neural Information Processing Systems), one of the main conferences on artificial intelligence.
Many scientists and companies working on machine learning were there: Google, Facebook, DeepMind, Microsoft Research, Amazon and Criteo, as well as universities from all over the world.
This year set a new attendance record, with more than 6000 participants, confirming the importance of this field in both industry and academia.
Many aspects of artificial intelligence were covered, but this year the hottest topics were the following.
Reinforcement Learning
Reinforcement Learning (RL) systems can learn to solve a complex problem without needing to be explicitly taught how to do it.
Recently, notable problems have been tackled using RL architectures: playing ATARI games automatically (from raw game pixels), defeating humans at the game of Go, simulated animals learning to walk and run, and robot arms learning to manipulate objects.
In an RL framework, an agent (for example, a player) lives in an environment (a game) and must take decisions (e.g., which direction to go) in order to maximize a reward (winning points).
To solve the task, the RL system learns a correspondence (a mapping) from pairs (agent state, environment observation) to the action to take.
We saw the biggest machine learning companies releasing artificial intelligence platforms where it is possible to train RL systems: Universe from OpenAI, DeepMind Lab from Google's DeepMind, and Project Malmo (based on Minecraft) from Microsoft.
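To make the agent/environment/reward loop above a bit more concrete, here is a minimal, self-contained sketch of tabular Q-learning, one classic RL algorithm, on a toy chain environment. The environment, the hyperparameters and all names are made up for illustration only; they are not tied to any of the platforms or papers mentioned here.

```python
# Minimal sketch of the RL loop: an agent observes a state, takes an action,
# receives a reward, and learns a mapping from states to actions
# (tabular Q-learning on a toy 5-state chain; everything here is illustrative).
import random

N_STATES, ACTIONS = 5, [0, 1]          # states 0..4; actions: 0 = left, 1 = right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def step(state, action):
    """Toy environment: reaching the rightmost state gives a reward of 1."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current Q estimates, sometimes explore
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: move Q(s, a) towards reward + discounted best future value
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# learned policy: which action the agent prefers in each state
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

After training, the learned policy should pick "right" in every state, since that is the shortest path to the rewarding state, which is exactly the kind of mapping from states to actions described above.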
Generative networks
Generative networks are systems that are able to generate data, such as images that look like real images.
Recently, a new way of generating images, based on two neural networks, was proposed: one network is used to generate images, while the other is used to discriminate real images from images generated by the first one.
The two parts thus act against each other, the discriminator trying to detect fake images and the generator trying to fool the discriminator: that's why they are called adversarial.
This kind of architecture was proposed two years ago, but it lacked stability and the ability to generate realistic images: generated images could look quite real from far away, but looked very weird when examined closely.
A lot of papers were thus addressing these issues.
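To illustrate the adversarial idea, here is a toy sketch of a generator/discriminator training loop on one-dimensional data, where samples from a Gaussian stand in for "real images". The network sizes, optimizer settings and the use of PyTorch are our own illustrative choices, not taken from any of the papers presented.

```python
# Toy generative adversarial training on 1-D data (illustrative sketch only).
import torch
import torch.nn as nn

# Generator: noise -> fake sample; Discriminator: sample -> probability of being real
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # "real" data: samples from a Gaussian centered at 3 (stands in for real images)
    real = 3.0 + torch.randn(64, 1)
    fake = G(torch.randn(64, 8))

    # discriminator step: label real samples 1, generated samples 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # generator step: try to make the discriminator output 1 on generated samples
    g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

print(G(torch.randn(5, 8)).detach().squeeze())  # generated samples, ideally near 3
```

The two optimizers pull in opposite directions, which is exactly the source of the instability that many of this year's papers tried to tame.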
Besides these main topics, other subjects drew our attention:
- The workshop on extreme classification (that is, classifying items when the number of possible labels is extremely large) was quite interesting, showing in particular how multi-modal approaches (for example, using both text and images) can result in more accurate classifications than mono-modal ones.
- A fun and interesting poster was presented by people from Boston University and Microsoft Research, who found that a word embedding space (a popular framework to represent text data as vectors), learned from Google News, contained a direction that encoded gender bias. For example, the embedding revealed implicit sexism in the text, exhibiting geometric analogies such as man is to computer programmer as woman is to homemaker. The authors also found a way to remove this bias from the embedding space (a minimal sketch of the idea is shown after this list).
- An interesting demo was performed by YouTube people. They showed how to learn a content-based video similarity on the YouTube-8M dataset. The system was trained to predict ground-truth video relationships (identified by a co-watch-based system) from visual content only.
- In another interesting work using videos (from the Flickr website), researchers from MIT trained a system to solve an acoustic scene/object classification task, using a large dataset of unlabeled videos. They managed to transfer discriminative visual knowledge from image classification networks into the sound domain, learning acoustic representations of natural scene sounds. They used raw audio as input to a deep network and were able to predict the objects appearing in videos from audio only.
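As promised above, here is a minimal sketch of the debiasing idea from the Boston University / Microsoft Research poster: estimate a "gender direction" from a word-vector difference and project it out of other embeddings. The tiny 4-dimensional embedding dictionary below is made up for illustration; the actual work used word2vec vectors trained on Google News, and its full method is more involved than this sketch.

```python
# Illustrative sketch: remove a bias direction from word embeddings.
import numpy as np

# hypothetical 4-dimensional word vectors (real ones have ~300 dimensions)
emb = {
    "man":        np.array([ 0.9,  0.1,  0.3,  0.2]),
    "woman":      np.array([-0.8,  0.1,  0.3,  0.2]),
    "programmer": np.array([ 0.5,  0.7, -0.2,  0.1]),
    "homemaker":  np.array([-0.6,  0.6, -0.2,  0.1]),
}

# 1. estimate the bias direction from a definitional pair
gender = emb["man"] - emb["woman"]
gender /= np.linalg.norm(gender)

def project_out(v, direction):
    """Remove the component of v lying along the (unit-norm) bias direction."""
    return v - np.dot(v, direction) * direction

# 2. gender-neutral words should carry no component along that direction
for word in ("programmer", "homemaker"):
    before = np.dot(emb[word], gender)
    after = np.dot(project_out(emb[word], gender), gender)
    print(f"{word}: bias component {before:+.2f} -> {after:+.2f}")
```

After the projection, words that should be gender-neutral no longer lean towards either side of the estimated gender direction.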
So we learned a lot going to NIPS this year, and we had the chance to exchange ideas with researchers from other tech companies!
Jimena Royo-Letelier & Romain Hennequin
Research Scientists at Deezer