New England Database Society

Friday, April 28, 2006

sponsored by Sun Microsystems

NEDS

Issues in Data Cleaning

Surajit Chaudhuri
Data Management and Exploration Group
Microsoft Research

Friday, April 28, 2006, 4:00 PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 pm)

Abstract:

I will present an overview of the directions we are pursuing in our data cleaning research project, focusing on two key issues: (i) efficient data profiling and (ii) primitive data cleaning operators. Data profiling serves as the starting point for investigating data quality issues in a database, and I will describe our work on making it efficient. Most of my talk will explore the idea of isolating primitive data cleaning operators that can serve as building blocks for developing a data cleaning solution. This approach contrasts with identifying a few pre-defined heavy-weight operators. Specifically, I will describe one such useful primitive (SSJoin) in detail. I will also take this opportunity to briefly mention the other two research projects in our group (AutoAdmin and DBXplorer).

Speaker Bio:

Surajit Chaudhuri (http://research.microsoft.com/users/surajitc) oversees research in the database systems area at Microsoft Research in Redmond, which comprises the Database group and the Data Management and Exploration (DMX) group. In 1996, Surajit started the AutoAdmin project on self-tuning database systems at Microsoft Research and developed novel automated physical design tuning technology for SQL Server 7.0, SQL Server 2000, and SQL Server 2005. More recently, Surajit has begun work on data cleaning and integration techniques, part of which has been incorporated into Microsoft SQL Server 2005. Surajit received his Ph.D. from Stanford University in 1991 and worked at Hewlett-Packard Laboratories in Palo Alto from 1991 to 1995. He received the 2004 ACM SIGMOD Contributions Award and was named an ACM Fellow in 2005.


Maintained by Dina Goldin dqg AT cse.uconn.edu