Event abstraction for process mining using supervised learning techniques
Conference contribution. Tax, N., Sidorova, N., Haakma, R. & van der Aalst, W.M.P. (2017). Event abstraction for process mining using supervised learning techniques. In Y. Bi, S. Kapoor & R. Bhatia (Eds.), Proceedings of the SAI Intelligent Systems Conference (IntelliSys 2016), 21-22 September 2016, London, United Kingdom (pp. 251-269). (LNNS, No. 15). Dordrecht: Springer Netherlands.
Process mining techniques focus on extracting insight into processes from event logs. In many cases, the events recorded in an event log are too fine-grained, causing process discovery algorithms to discover process models that are incomprehensible or not representative of the event log. We show that when process discovery algorithms can only discover an unrepresentative process model from a low-level event log, structure in the process can in some cases still be discovered by first abstracting the event log to a higher level of granularity. This raises the challenge of bridging the gap between the original low-level event log and a desired high-level perspective on this log, such that a more structured or more comprehensible process model can be discovered. We show that supervised learning can be leveraged for the event abstraction task when annotations with high-level interpretations of the low-level events are available for a subset of the sequences (i.e., traces). We present a method to generate feature vector representations of events based on XES extensions, and describe an approach to abstract events in an event log with Conditional Random Fields using these event features. Furthermore, we propose a sequence-focused metric for evaluating supervised event abstraction results that fits closely with the tasks of process discovery and conformance checking. We conclude by demonstrating the usefulness of supervised event abstraction for obtaining more structured and/or more comprehensible process models, using both real-life and synthetic event data.
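To make the idea of feature vectors based on XES extensions more concrete, the following is a minimal sketch and not the method from the paper. It assumes each event is a Python dict keyed by standard XES attribute names (`concept:name`, `lifecycle:transition`, `org:resource`, `time:timestamp`). The derived time features (`hour_of_day`, `day_of_week`, `seconds_since_previous`), the function names, and the example events are hypothetical choices made for illustration only.

```python
from datetime import datetime


def event_features(trace, i):
    """Build a feature dict for the i-th event of a trace (a list of event dicts).

    Assumption: events carry standard XES attribute keys; the derived
    time features below are illustrative, not taken from the paper.
    """
    event = trace[i]
    ts = event["time:timestamp"]
    features = {
        # Concept, lifecycle and organizational extensions: copied as categorical values.
        "concept:name": event.get("concept:name", ""),
        "lifecycle:transition": event.get("lifecycle:transition", ""),
        "org:resource": event.get("org:resource", ""),
        # Time extension: simple derived features (hypothetical choice).
        "hour_of_day": ts.hour,
        "day_of_week": ts.weekday(),
    }
    if i > 0:
        previous_ts = trace[i - 1]["time:timestamp"]
        features["seconds_since_previous"] = (ts - previous_ts).total_seconds()
    else:
        # The first event of a trace has no predecessor.
        features["seconds_since_previous"] = 0.0
    return features


def trace_features(trace):
    """Return one feature dict per event, preserving the event order."""
    return [event_features(trace, i) for i in range(len(trace))]


# Hypothetical low-level trace, invented for this example.
example_trace = [
    {"concept:name": "motion_kitchen",
     "time:timestamp": datetime(2016, 9, 21, 7, 30, 0)},
    {"concept:name": "fridge_open",
     "time:timestamp": datetime(2016, 9, 21, 7, 30, 45)},
]

feature_sequence = trace_features(example_trace)
```

A sequence of such per-event feature dicts, paired with one high-level label per event for the annotated traces, is the kind of input that a sequence labeling model such as a Conditional Random Field is typically trained on.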