EAISI Data Science lecture by Matthijs van Leeuwen

Date
Thursday March 5, 2020 from 12:30 PM to 1:30 PM
Location
TU/e
Address
Het Kranenveld 12
Organizer
Data Science
Building
Gaslab, building 12

Matthijs van Leeuwen

Title
Machine learning meets data mining: interpretable prediction and subgroup discovery

Abstract
Interpretable machine learning has recently witnessed a strong increase in attention, both within and outside the scientific community. This is especially true for application domains where decision making is crucial and requires transparency, such as health care. While it is of interest to investigate how existing 'black-box' machine learning models can be made transparent, the trend towards interpretability also offers opportunities for data mining, as this field traditionally has a strong emphasis on intelligibility.

In the machine learning literature, examples of interpretable predictive models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In the data mining literature, on the other hand, the discovery of compact, descriptive rule lists has long been studied under the name of subgroup discovery. Although the problems are similar, they are typically defined and solved in rather different ways.

In this talk, I will present recent results in which we use probabilistic rule lists and the minimum description length (MDL) principle to unify (multi-class) classification, regression, and subgroup discovery. Our formal framework allows for virtually parameter-free model selection that naturally trades off model complexity against goodness of fit, effectively avoiding the need for hyperparameter tuning. We empirically demonstrate that our heuristics select small probabilistic rule lists that serve as either accurate predictors or descriptive subgroups.
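
As a rough illustration of the trade-off between model complexity and goodness of fit mentioned above, the following minimal sketch scores a rule list with a crude two-part MDL code: bits for the model plus bits for the class labels given the model. It is not the method presented in the talk. The function names, the flat cost of 8 bits per rule, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def data_bits(labels):
    """Bits needed to encode labels using their own empirical class distribution."""
    n = len(labels)
    counts = Counter(labels)
    return -sum(c * math.log2(c / n) for c in counts.values())


def two_part_score(segments, bits_per_rule=8.0):
    """Crude two-part MDL score L(M) + L(D | M); lower is better.

    segments: one list of class labels per rule, in rule-list order,
              with the last list holding the rows left to the default rule.
    bits_per_rule: hypothetical flat model cost per rule (an assumption).
    """
    model_bits = bits_per_rule * len(segments)
    return model_bits + sum(data_bits(s) for s in segments if s)


# Toy data: 40 rows of class "a" and 40 rows of class "b".
labels = ["a"] * 40 + ["b"] * 40

# Default rule only: cheap model, expensive data.
no_rules = two_part_score([labels])

# One rule that separates the classes perfectly, plus the default rule.
good_rule = two_part_score([labels[:40], labels[40:]])

# One rule that splits the rows without separating the classes at all.
mixed = ["a"] * 20 + ["b"] * 20
useless_rule = two_part_score([mixed, mixed])

print(no_rules, good_rule, useless_rule)
```

In this toy setting the informative rule lowers the total code length, while the uninformative rule only adds model cost, so a selection procedure that minimizes the score keeps the former and rejects the latter without a separate complexity hyperparameter.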

Biography
Dr. Matthijs van Leeuwen is assistant professor and group leader of the Explanatory Data Analysis group at LIACS, Leiden University, the Netherlands. His primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and, ultimately, novel knowledge? Van Leeuwen has been awarded several grants, including NWO Rubicon, TOP2 and TTW Perspectief grants, as well as best paper awards. He has co-organised international conferences and workshops, and is on the editorial board of DAMI and the guest editorial board of the ECML PKDD Journal Track. He was guest editor of a TKDD special issue on ‘Interactive Data Exploration and Analytics’.

Organizer

Data Science

Data Science is an interdisciplinary field that uses a variety of techniques to create value by extracting knowledge and insights from available data. The successful and responsible application of these methods depends heavily on a good understanding of the application domain, taking into account ethics, business models, and human behavior.