Stefan Kramer


The best privacy defense is a good privacy offense: obfuscating a search user's profile


User privacy on the internet is an important and unsolved problem. So far, no sufficient and comprehensive solution has been proposed that helps users protect their privacy while using the internet. Data are collected and assembled by numerous service providers. Solutions so far have focused on the side of the service providers, which store encrypted or transformed data that can still be used for analysis. This has a major flaw: it relies on the service providers to do so, and the user has no way of actively protecting his or her privacy. In this work, we suggest a new approach that empowers the user to take advantage of the same tool the other side has, namely data mining, to produce data that obfuscates the user’s identity. We apply this approach to search engine queries and use the feedback of the search engines, in the form of personalized advertisements, in a reinforcement learning algorithm that generates new queries that may confuse the search engine. We evaluated the approach on a real-world data set. While evaluation is hard, our results indicate that it is possible to influence the profile that the search engine generates for the user. This shows that it is feasible to defend a user’s privacy from a new and more practical perspective.


Stefan Kramer is a full professor of data mining, head of department of the Institute of Computer Science of Johannes Gutenberg University Mainz, and an honorary professor of the University of Waikato in New Zealand, the home of the Weka workbench and other important machine learning libraries. From 2003 to 2011 he was an associate professor at the Computer Science Department of Technische Universität München. Stefan is the author of award-winning papers (ACM SIGKDD, IEEE ICDM, ILP, IEEE ICBK), was IEEE ICDM 2013 vice chair and ACM SAC Data Mining Track Chair 2014-17, and regularly serves as area chair of all major data mining conferences. His research interests center on representations of data and knowledge for machine learning and data mining, speeding up algorithms for the data stream setting, temporal data, and applications.