New England Database Society (NEDS)
sponsored by Sun Microsystems
Issues in Data Cleaning
Surajit Chaudhuri
Data Management and Exploration Group
Microsoft Research
Friday, April 28, 2006, 4:00 PM
Volen 101, Brandeis University
(preceded by a wine and cheese reception at 3:00 pm)
Abstract:
I will present an overview of the directions we are pursuing
in our data cleaning research project. I will focus on two key issues: (i)
efficient data profiling and (ii) primitive data cleaning operators. Data profiling serves
as the starting point for investigating data quality issues in a
database, and I will describe our work on performing data profiling
efficiently. Most of my talk will explore the idea of isolating primitive
data cleaning operators that can serve as building blocks for developing a data
cleaning solution. This approach is in contrast to identifying a few pre-defined
heavy-weight operators. Specifically, I will describe one such useful primitive
(SSJoin) in detail. I will also take this opportunity to briefly mention the
other two research projects (AutoAdmin and DBXplorer) in our group.
Speaker Bio:
Surajit Chaudhuri (http://research.microsoft.com/users/surajitc)
oversees research in the database systems area at Microsoft Research in Redmond,
which consists of the Database group and the Data Management and Exploration (DMX) group.
In 1996, Surajit started the AutoAdmin project on self-tuning database systems
at Microsoft Research and developed novel automated physical design tuning
technology for SQL Server 7.0, SQL Server 2000 and SQL Server 2005. More
recently, Surajit has begun work in the area of data cleaning and
integration techniques. Part of this research has been incorporated in Microsoft
SQL Server 2005. Surajit received his Ph.D. from Stanford University in 1991 and
worked at Hewlett-Packard Laboratories in Palo Alto from 1991 to 1995. He was
awarded the 2004 ACM SIGMOD Contributions Award and was named an ACM Fellow
in 2005.
Maintained by Dina Goldin dqg AT cse.uconn.edu