New England Database Society

Friday, April 23

sponsored by Sun Microsystems

NEDS

Learning to Reconcile Semantic Heterogeneity

Alon Y. Halevy  
University of Washington

Friday, April 23, 2004, 4:00 PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 pm)

Abstract:

The advent of modern networking technology has enabled numerous opportunities for sharing data among multiple parties, such as in the areas of scientific research, government agency collaboration and enterprise information integration. A key ingredient in all these data sharing scenarios is the ability to map the semantics of one data source to another. However, providing such semantic mappings is a labor-intensive, error-prone task, and it is therefore crucial to develop tools that assist a human in creating them.

I will describe several results in a line of work whose key idea is to build schema matching systems that improve over time by learning from experience. In the first project, our goal was to improve the matching of data sources to a common mediated schema. In the second, we leverage a corpus of schemas and matches to match a pair of previously unseen schemas. Finally, the third project attempts to match descriptions of web services to help a user find similar sets of web service operations. In all of these projects, we build on techniques from machine learning to learn models of concepts in the domain of the schemas and use these concepts to propose possible schema matches. I will show several experimental results validating our approach and outline the current challenges.

Joint work with Anhai Doan, Jayant Madhavan, Luna Dong, Jun Zhang, Phil Bernstein and Pedro Domingos.

Speaker Bio:

Dr. Alon Halevy received his Bachelor's degree in Computer Science and Mathematics from the Hebrew University in Jerusalem in 1988, and his Ph.D. in Computer Science from Stanford University in 1993. From 1993 to 1997, Dr. Halevy was a principal member of technical staff at AT&T Bell Laboratories, and then at AT&T Laboratories. He joined the Department of Computer Science and Engineering at the University of Washington in 1998. Dr. Halevy's research interests are in data integration, management of XML data, web-site management, peer-data management systems, query optimization, database theory, knowledge representation, and, more generally, the intersection between database and AI technologies. His research has produced several systems, such as the Information Manifold data integration system, the Strudel web-site management system, the Tukwila XML data integration system, and the Piazza Peer-data Management System. He was also a co-developer of XML-QL, which later contributed to the development of the XQuery standard for querying XML data. In 1999, Dr. Halevy co-founded Nimble Technology (www.nimble.com), whose product is a data integration system based on XML. Dr. Halevy was a Sloan Fellow (1999-2000) and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000. He serves on the editorial boards of the VLDB Journal, the Journal of Artificial Intelligence Research and ACM Transactions on Internet Technology, and served as the program chair for the ACM SIGMOD 2003 Conference.


Maintained by Dina Goldin dqg AT cse.uconn.edu
Last updated on 04/13/04