DSC/e Lecture: Kai Puolamäki, Human-guided data exploration
- 23 May
- 12:30 - 13:30
- Doors open at 12:00h
- TU/e Zwarte Doos - Filmzaal
- From 17 April
- Research program CJ & DSC/e
We have recently proposed the paradigm of iterative data mining, in which the user's current knowledge is modelled by a "background distribution" over data sets. The user controls the exploration by feeding in knowledge, through intuitive interactions, in the form of patterns observed along the way. These user-defined patterns can be formally expressed as constraints on the background distribution, and new data sets can be sampled from the updated background distribution with a computationally efficient constrained randomization scheme. The system then shows the user the views of the data that are maximally informative given the user's current knowledge, i.e., the views where the actual data and the background distribution differ most. Although this scheme is good at showing surprising views of the data, it has a clear shortcoming: the user cannot easily steer the process. In many real cases we want to focus on investigating specific questions about the data. The Human Guided Data Exploration framework generalizes previous research by allowing the user to interactively compare different complex user-defined hypotheses about the relations in the data; these hypotheses can also be expressed as constraints on the background distribution. To showcase the framework, we developed a free open-source tool and used it to carry out an empirical evaluation on real-world datasets. Our evaluation shows that the ability to focus on particular subsets and the ability to compare hypotheses are important additions to the interactive iterative data mining process.
Reference: https://arxiv.org/abs/1804.03194 (and references therein)
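For readers who want a concrete picture of the loop described in the abstract, the following is a minimal Python sketch, not the speaker's tool or its API. It assumes, for illustration only, that the user's knowledge is a list of row groups, that the constrained randomization permutes each column independently within each group, and that a "view" is a pair of columns scored by how far its observed correlation lies from the average under the background samples. The names `sample_background` and `most_informative_pair` are hypothetical.

```python
import numpy as np


def sample_background(data, row_groups, rng):
    """Draw one data set from the (assumed) background distribution.

    Each column is permuted independently within each row group, so the
    per-group column marginals are kept while relations between columns
    inside a group are broken.
    """
    sample = data.copy()
    for rows in row_groups:
        rows = np.asarray(rows)
        for col in range(data.shape[1]):
            sample[rows, col] = data[rng.permutation(rows), col]
    return sample


def most_informative_pair(data, row_groups, n_samples=50, seed=0):
    """Return the column pair where data and background differ most.

    Informativeness is approximated here by the absolute gap between the
    observed correlation and the mean correlation over background samples.
    """
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(data, rowvar=False)
    expected = np.mean(
        [
            np.corrcoef(sample_background(data, row_groups, rng), rowvar=False)
            for _ in range(n_samples)
        ],
        axis=0,
    )
    gap = np.abs(observed - expected)
    np.fill_diagonal(gap, 0.0)
    i, j = np.unravel_index(np.argmax(gap), gap.shape)
    return (int(min(i, j)), int(max(i, j))), float(gap[i, j])


# Toy data: columns 0 and 1 are strongly related, column 2 is independent noise.
demo_rng = np.random.default_rng(1)
x = demo_rng.normal(size=200)
data = np.column_stack(
    [x, x + 0.1 * demo_rng.normal(size=200), demo_rng.normal(size=200)]
)

# With no prior knowledge, all rows form a single group.
pair, gap = most_informative_pair(data, [np.arange(200)])
print(pair, gap)
```

In this sketch, adding knowledge would mean splitting the rows into finer groups (for example, one group per cluster the user has already seen) before sampling again, so that structure the user already knows no longer counts as surprising.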
Dr. Kai Puolamäki is a Professor of Practice in the Department of Computer Science at Aalto University and Vice Director of the Helsinki Institute for Information Technology HIIT. He completed his PhD in theoretical physics at the University of Helsinki in 2001. His primary interests lie in the areas of data mining, machine learning, artificial intelligence, and related algorithms.