Announcing the 3rd annual Machine Learning in the Real World Workshop

By: Criteo Research / 02 Oct 2017

Criteo Research will be organising its 3rd annual Machine Learning in the Real World workshop. This one-day workshop aims to bring together people from both industry and academia to better understand which machine learning algorithms are used in practice and what can be done to improve them. It is also an opportunity to learn about fundamental and advanced aspects of machine learning from leaders in the field. To that end, we are assembling a group of brilliant and engaging international speakers and attendees from some of the leading faculties and institutions in the world.

Date & venue

The event will take place in our Paris headquarters, 32 rue Blanche, 75009 Paris, on November 8th, 2017.

Want to contribute?

For the first time this year, we are inviting attendees to submit papers for short presentations or posters. Presentations will last between 5 and 10 minutes during the Spotlight session (14:45-15:45), and the poster sessions will take place during coffee breaks and networking time. To submit a paper, follow this link.

Tentative schedule

9:15-9:45     Welcome Coffee

9:45-10:30   Thorsten Joachims – Cornell University

10:30-11:15    Alexandros Karatzoglou – Telefonica Research

11:15-11:45    Coffee Break

11:45-12:30    Sathiya Keerthi – Microsoft

12:30-14:00    Lunch break

14:00-14:45    Chih-Jen Lin – National Taiwan University

14:45-15:45    Spotlight session on contributions

15:45-16:15    Coffee break

16:15-17:00    Csaba Szepesvari – Google DeepMind

17:00-19:00    Cocktail (with poster sessions)


This year’s event will be open to 80 selected attendees who apply machine learning to real-world data in their everyday activities. Selection of applicants will be based on proficiency in machine learning, but we will also ensure a good balance between academia and industry, as well as a diversity of backgrounds and interests.

To secure your place, please register here before Nov 3rd.

Videos from our previous edition can be found here

Speaker bio & abstracts

Thorsten Joachims

Cornell University

Title: Deep Learning from Logged Interventions


Every time a system places an ad, presents a search ranking, or makes a recommendation, we can think about this as an intervention for which we can observe the user’s response (e.g. click, dwell time, purchase). Such logged intervention data is actually one of the most plentiful types of data available, as it can be recorded from a variety of systems (e.g., search engines, recommender systems, ad placement) at little cost. However, this data provides only partial-information feedback (aka “bandit feedback”), limited to the particular intervention chosen by the system. We don’t get to see how the user would have responded if we had chosen a different intervention. This makes learning from logged bandit feedback substantially different from conventional supervised learning, where “correct” predictions together with a loss function provide full-information feedback. It is also different from online learning in the bandit setting, since the algorithm does not assume interactive control of the interventions.

In this talk, I will explore deep learning methods for batch learning from logged bandit feedback (BLBF). Following the inductive principle of Counterfactual Risk Minimization for BLBF, this talk presents an approach to training deep networks from propensity-scored bandit feedback, demonstrating its effectiveness for applications ranging from visual object detection to ad placement.
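To make "propensity-scored bandit feedback" concrete, here is a minimal sketch of the plain inverse-propensity-scoring (IPS) estimate of a new policy's average reward from logged data. It is not the speaker's method: it omits the deep network and any variance control, and the function names, log format, and toy data are all assumptions made for illustration.

```python
def ips_estimate(logs, new_policy_prob):
    """Inverse-propensity-scored estimate of a new policy's average reward.

    logs: list of (context, action, reward, propensity) tuples, where
          propensity is the probability with which the logging policy
          chose the logged action (hypothetical log format).
    new_policy_prob: function (context, action) -> probability that the
          new policy would choose that action in that context.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action.
        total += reward * new_policy_prob(context, action) / propensity
    return total / len(logs)


# Toy logs from a logging policy that picks "a" or "b" uniformly at random.
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u1", "b", 0.0, 0.5),
    ("u2", "a", 0.0, 0.5),
    ("u2", "b", 1.0, 0.5),
]

def always_a(context, action):
    """A deterministic candidate policy that always chooses action "a"."""
    return 1.0 if action == "a" else 0.0

estimate = ips_estimate(logs, always_a)
print(estimate)
```

The estimate uses only the logged interventions, which is what lets a new policy be evaluated without deploying it.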


Thorsten Joachims is a Professor in the Department of Computer Science and Chair of the Department of Information Science at Cornell University. His research interests center on a synthesis of theory and system building in machine learning, with applications in information access, language technology, and recommendation. His past research focused on counterfactual and causal inference, support vector machines, text classification, structured output prediction, convex optimization, learning to rank, learning with preferences, and learning from implicit feedback. In 2001, he finished his dissertation advised by Prof. Katharina Morik at the University of Dortmund. He is an ACM Fellow, AAAI Fellow, and Humboldt Fellow.

Alexandros Karatzoglou

Telefonica Research

Title: Recurrent Neural Networks for Session-based Recommendations


Recurrent Neural Networks have made a dynamic entry into the world of recommender systems, particularly in modelling sequences of user interactions with items. Our research in the area started with our work on RNNs for session-based recommendations that was published in 2015 at ICLR. Since then we have extended the session-based RNN model with features, improved loss functions, personalization with hierarchical RNNs, and compressed representations with Bloom embeddings.

I will give an overview of our work in the area over the past two years.


Alexandros is the Scientific Director of Telefonica Research in Barcelona, Spain, working on Machine Learning, Deep Learning, Recommender Systems, and Information Retrieval. He earned his PhD from the Vienna University of Technology and has worked as a visiting fellow at the Statistical Machine Learning group at NICTA/ANU in Canberra, Australia. His team includes researchers in the areas of HCI, Networks and Systems. They create Machine Learning algorithms for customer data and data generated by Network operations. Alexandros frequently teaches Machine Learning courses at the UPF/BGSE in Python and R with an emphasis on Deep Learning and Recommender Systems.

S. Sathiya Keerthi

Microsoft


Title: Interplay between Optimization and Generalization in Deep Neural Networks


Deep Neural Networks (DNNs) have seen great successes in many large scale machine learning applications. Given the complex nature of the solution, the field has predominantly taken the approach of experimentally arriving at good solutions first and then explaining these via possible theories later. Recently there has been a lot of discussion around why DNNs generalize well in spite of heavy over-parametrization. One track of this discussion concerns how the optimization method used for training affects generalization. For example, the stochastic gradient method has demonstrated very good generalization properties. This talk will review various studies on this interplay between optimization and generalization.


Keerthi is a principal scientist at Microsoft in Mountain View, California. He has been with Microsoft since 2012 and has held roles in, among others, the Cloud and Information Services Lab (CISL, pronounced "sizzle"), an applied science group in Microsoft. Currently, he leads a machine learning group in Microsoft with a research focus on the design of distributed training algorithms for developing various types of linear and nonlinear models on Big Data, and the application of machine learning to textual problems.

Prior to Microsoft, Keerthi worked at Yahoo as a Senior Research Scientist, at the Indian Institute of Science, Bangalore, and for 5 years at the National University of Singapore.

Chih-Jen Lin

National Taiwan University

Title: Training large-scale linear classifiers: status and challenges

Many classification techniques such as neural networks or decision trees are nonlinear approaches. However, linear methods that use a simple weight vector as the model remain very useful for some applications. With careful feature engineering and data in a rich dimensional space, their performance may be competitive with that of a highly nonlinear classifier. In the past decade we have developed efficient optimization methods for large-scale linear classification and made them available in the package LIBLINEAR for public use. In this talk we discuss optimization methods in three environments: single-thread, shared-memory, and distributed. Because of system issues, suitable algorithms for these environments may be significantly different. We broadly review issues that have been solved and discuss challenges that we are still trying to address.
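As a rough illustration of what "a simple weight vector as the model" means, the sketch below trains a linear classifier with the classic perceptron update. This is a swapped-in textbook technique chosen for brevity, not one of the optimization methods in LIBLINEAR, and the function names and toy data are assumptions.

```python
def dot(w, x):
    """Inner product of the weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_perceptron(examples, epochs=10):
    """Learn a weight vector from (features, label) pairs, label in {-1, +1}."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            # Update only when the current weights misclassify the example.
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """The whole model is the weight vector: predict by the sign of w.x."""
    return 1 if dot(w, x) > 0 else -1


# Toy linearly separable data; the last feature is a constant bias term.
examples = [
    ([2.0, 1.0, 1.0], 1),
    ([1.0, 3.0, 1.0], 1),
    ([-1.0, -2.0, 1.0], -1),
    ([-2.0, -1.0, 1.0], -1),
]

w = train_perceptron(examples)
print(w, [predict(w, x) for x, _ in examples])
```

Prediction costs a single inner product per example, which is part of why linear models stay attractive at large scale.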


Chih-Jen Lin is Distinguished Professor of Computer Science at National Taiwan University and a leading researcher in machine learning, optimization, and data mining. He is best known for the open source library LIBSVM, an implementation of support vector machines.

Chih-Jen Lin received his B.Sc. in Mathematics from National Taiwan University, and his M.S.E. and Ph.D. in Operations from the University of Michigan.

Csaba Szepesvari

University of Alberta

Title: Messy Bandit Problems


The exploration-exploitation dilemma is a key aspect of online interactive learning. Classical finite-armed bandit problems provide an elegant formalism to capture the essence of this dilemma. Many great results have been obtained in this elegant framework, but how many of these can survive the challenges posed by the real world? In this talk, I will sketch a few of these challenges and some solution attempts. The challenges include the effective use of structure and prior information for scaling up bandits to large scale problems, dealing with delayed and missing feedback, and “taming” exploration.
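For readers new to the classical finite-armed setting, here is a minimal sketch of the standard UCB1 strategy on simulated Bernoulli arms. It shows only the clean textbook problem, not the "messy" variants the talk addresses; the arm means, horizon, and function name are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return pull counts per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so that each has an empirical mean.
            arm = t - 1
        else:
            # Exploit the empirical mean, plus an exploration bonus that
            # shrinks as an arm is pulled more often.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.8], horizon=2000)
print(counts)
```

With a large gap between the arms, most pulls should go to the better arm while the worse arm is still tried occasionally, which is the exploration-exploitation trade-off in miniature.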


Csaba Szepesvari (PhD’99) is currently a Professor at the Department of Computing Science of the University of Alberta and a Principal Investigator of the Alberta Innovates Center for Machine Learning. Before that, he was a senior researcher at the Computer and Automation Research Institute of the Hungarian Academy of Sciences and held various industrial positions.

The co-author of a book on nonlinear approximate adaptive controllers and the author of a short book on Reinforcement Learning, he has published about 150 journal and conference papers. He is best known for the UCT algorithm, which led to a leap in the performance of planning and search algorithms in many domains, in particular in computer Go.

He is an Action Editor of the Journal of Machine Learning Research and the Machine Learning Journal. His research interests include reinforcement learning, statistical learning theory and online learning.