Assessing process discovery scalability in data intensive environments


Hernández, S., Ezpeleta, J., Van Zelst, S.J. & Van Der Aalst, W.M.P. (2016). Assessing process discovery scalability in data intensive environments. Proceedings - 2015 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015, 7-10 December 2015, Limassol, Cyprus (pp. 99-104). Piscataway: Institute of Electrical and Electronics Engineers (IEEE). Cited 5 times in Scopus.




Tremendous developments in Information Technology (IT) have enabled us to store and process huge amounts of data at unprecedented rates. This phenomenon has a large impact on business processes. Process discovery, a subfield of process mining, is concerned with automatically discovering process models from event data recorded during the execution of business processes. In this paper, we assess the scalability of applying process discovery techniques in data-intensive environments. We propose ways to compute, within the MapReduce framework, the internal data abstractions used by discovery techniques. Combining MapReduce with process discovery enables us to handle much larger event logs in less time. Our generic approach scales linearly in both the data size and the number of computational resources used, and thus shows great potential for the adoption of process discovery in a Big Data context.
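The abstract does not state which data abstractions are computed or how the MapReduce jobs are structured. As an illustration only, the sketch below assumes the abstraction is the directly-follows relation (activity `a` is immediately followed by activity `b` within a case), which is used by discovery algorithms such as the Alpha algorithm and heuristics-based miners. It simulates MapReduce in a single Python process. The names `map_reduce`, `map_event`, `reduce_case`, `map_identity`, `reduce_sum` and `directly_follows`, and the `(case_id, timestamp, activity)` event layout, are hypothetical; this is not the authors' implementation.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process simulation of one MapReduce job (illustrative only)."""
    groups = defaultdict(list)
    # Map phase: each record emits zero or more (key, value) pairs.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    # Reduce phase: each key and its grouped values emit output pairs.
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


# Job 1 (hypothetical): group events by case and emit directly-follows pairs.
def map_event(event):
    case_id, timestamp, activity = event
    yield case_id, (timestamp, activity)


def reduce_case(case_id, events):
    # Rebuild the trace of one case by ordering its events on timestamp.
    trace = [activity for _, activity in sorted(events)]
    for a, b in zip(trace, trace[1:]):
        yield (a, b), 1


# Job 2 (hypothetical): sum the counts of each directly-follows pair.
def map_identity(pair):
    yield pair


def reduce_sum(pair, counts):
    yield pair, sum(counts)


def directly_follows(events):
    pairs = map_reduce(events, map_event, reduce_case)
    return dict(map_reduce(pairs, map_identity, reduce_sum))


# Toy event log: (case_id, timestamp, activity); events need not arrive in order.
log = [
    ("c1", 1, "a"), ("c1", 2, "b"), ("c1", 3, "c"),
    ("c2", 1, "a"), ("c2", 3, "c"), ("c2", 2, "b"),
    ("c3", 1, "a"), ("c3", 2, "c"),
]

print(directly_follows(log))
# {('a', 'b'): 2, ('b', 'c'): 2, ('a', 'c'): 1}
```

Splitting the computation into two jobs keeps each step parallel: the first job reconstructs traces independently per case, and the second is a word-count-style aggregation whose reducers only need the counts for their own pairs. On a real cluster these jobs would run on distributed workers rather than in one loop.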