Date
Thursday March 5, 2020 from 12:30 PM to 1:30 PM
Location
TU/e
Address
Het Kranenveld 12
Organizer
Data Science
Building
Gaslab, building 12
Matthijs van Leeuwen
Title
Machine learning meets data mining: interpretable prediction and subgroup discovery
Abstract
Interpretable machine learning has recently witnessed a strong increase in attention, both within and outside the scientific community. This is especially true for application domains where decision making is crucial and requires transparency, such as in health care. While it is of interest to investigate how existing 'black-box' machine learning models can be made transparent, the trend towards interpretability also offers opportunities for data mining, as this field traditionally has a strong emphasis on intelligibility.
In the machine learning literature, examples of interpretable predictive models include decision trees, rule sets, and rule lists. Learning such models often involves optimizing hyperparameters, which typically requires substantial amounts of data and may result in relatively large models. In the data mining literature, on the other hand, the discovery of compact, descriptive rule lists has long been studied under the name of subgroup discovery. Although the problems are similar, they are typically defined and solved in rather different ways.
In this talk, I will present recent results in which we use probabilistic rule lists and the minimum description length (MDL) principle to unify (multi-class) classification, regression, and subgroup discovery. Our formal framework allows for virtually parameter-free model selection that naturally trades off model complexity against goodness of fit, effectively avoiding the need for hyperparameter tuning. We empirically demonstrate that our heuristics select small probabilistic rule lists that serve as either accurate predictors or descriptive subgroups.
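To make the trade-off between model complexity and goodness of fit concrete, the following is a minimal sketch of a two-part MDL score for a rule list. It is not the encoding used in the work presented in the talk: the rule format (one boolean feature per rule), the uniform model cost, the prequential Laplace code for the labels, and the names `data_bits` and `description_length` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def data_bits(labels, n_classes):
    """Bits needed to encode labels one by one with a prequential code.

    Each label is encoded with the Laplace-smoothed frequency of the
    labels seen so far (an illustrative choice, not the talk's encoding).
    """
    counts = Counter()
    bits = 0.0
    for seen, label in enumerate(labels):
        bits -= math.log2((counts[label] + 1) / (seen + n_classes))
        counts[label] += 1
    return bits


def description_length(rules, X, y, n_classes):
    """Total code length L(model) + L(data | model) of an ordered rule list.

    rules: list of feature indices; a rule covers a row when that feature
    is truthy. Rows covered by no rule fall to an implicit default rule.
    """
    n_features = len(X[0])
    # Model cost (assumed): log2(n_features) bits to name each rule's feature.
    model_bits = len(rules) * math.log2(n_features)
    remaining = list(range(len(X)))
    total_data_bits = 0.0
    for feat in rules:
        covered = [i for i in remaining if X[i][feat]]
        remaining = [i for i in remaining if not X[i][feat]]
        total_data_bits += data_bits([y[i] for i in covered], n_classes)
    # Default rule encodes whatever no earlier rule covered.
    total_data_bits += data_bits([y[i] for i in remaining], n_classes)
    return model_bits + total_data_bits


# Toy data: feature 0 determines the class, feature 1 is unrelated noise.
X = [(1, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1)] * 4
y = [1, 1, 1, 1, 0, 0, 0, 0] * 4

for rules in ([], [0], [1]):
    print(rules, round(description_length(rules, X, y, n_classes=2), 2))
```

On this toy data the list with the informative rule has the shortest total length, while the list with the noise rule is longer than the default-only list: the extra rule costs model bits without making the labels cheaper to encode. Selecting the list with the smallest total length therefore needs no separate complexity hyperparameter.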
Biography
Dr. Matthijs van Leeuwen is assistant professor and group leader of the Explanatory Data Analysis group at LIACS, Leiden University, the Netherlands. His primary research interest is exploratory data mining: how can we enable domain experts to explore and analyse their data, to discover structure and, ultimately, novel knowledge? Van Leeuwen has been awarded several grants, including NWO Rubicon, TOP2 and TTW Perspectief grants, as well as best paper awards. He has co-organised international conferences and workshops, and is on the editorial board of DAMI and the guest editorial board of the ECML PKDD Journal Track. He was guest editor of a TKDD special issue on 'Interactive Data Exploration and Analytics'.