New England Database Society (NEDS)
sponsored by Sun Microsystems
Constraints and Privacy in Data Mining
Johannes Gehrke
Cornell University
Friday, Nov. 15, 2002, 4:00 pm
Volen 101, Brandeis University
(preceded by a wine and cheese reception at 3:00 pm)
Abstract:
In this talk, I will discuss some recent results on one of the oldest data mining
problems: finding large itemsets. First, I will describe an algorithm that efficiently
mines with both monotone and anti-monotone constraints, and I will describe how
this work opens the door to research on more complex constraints that were
previously thought infeasible. Second, I will talk about a framework for
mining large itemsets where the input data has been randomized to preserve
the privacy of individual transactions. While it is feasible to recover association
rules and preserve privacy using a straightforward "uniform"
randomization, the discovered rules can unfortunately be exploited to find
privacy breaches. We analyze privacy breaches, propose solutions, and show
experimental results on real data.
Speaker Bio:
Johannes Gehrke is an assistant professor in the Department of Computer Science at Cornell University. He obtained his Ph.D. in computer science from the University of Wisconsin-Madison in 1999; his graduate studies were supported by a Fulbright fellowship and an IBM fellowship.
Johannes' research interests are in the areas of data mining, data stream processing, and novel distributed database technology such as data management for sensor networks and peer-to-peer networks. Johannes has received a National Science Foundation Career Award, an IBM Faculty Award, and the Cornell College of Engineering James and Mary Tien Excellence in Teaching Award. He is the author of numerous publications on data mining and database systems, and he co-authored the textbook "Database Management Systems" (McGraw-Hill, currently in its third edition).
Maintained by Dina Goldin dqg AT cse.uconn.edu
Last updated on 11/04/02