Responsible data science : using event data in a “people friendly” manner

Conference Contribution

van der Aalst, W.M.P. (2017). Responsible data science : using event data in a “people friendly” manner. In O. Camp, J. Cordeiro, M.M. Missikoff, L.A. Maciaszek & S. Hammoudi (Eds.), Enterprise Information Systems (pp. 3-28). (Lecture Notes in Business Information Processing, No. 291). Dordrecht: Springer.

Abstract

The omnipresence of event data and powerful process mining techniques make it possible to quickly learn process models describing what people and organizations really do. Recent breakthroughs in process mining resulted in powerful techniques to discover the real processes, to detect deviations from normative process models, and to analyze bottlenecks and waste. Process mining and other data science techniques can be used to improve processes within any organization. However, there are also great concerns about the use of data for such purposes. Increasingly, customers, patients, and other stakeholders worry about “irresponsible” forms of data science. Automated data decisions may be unfair or non-transparent. Confidential data may be shared unintentionally or abused by third parties. Each step in the “data science pipeline” (from raw data to decisions) may create inaccuracies, e.g., if the data used to learn a model reflects existing social biases, the algorithm is likely to incorporate these biases. These concerns could lead to resistance against the large-scale use of data and make it impossible to reap the benefits of process mining and other data science approaches. This paper discusses Responsible Process Mining (RPM) as a new challenge in the broader field of Responsible Data Science (RDS). Rather than avoiding the use of (event) data altogether, we strongly believe that techniques, infrastructures and approaches can be made responsible by design. Not addressing the challenges related to RPM/RDS may lead to a society where (event) data are misused or analysis results are deeply mistrusted.