neds.gif (1190 bytes)

New England Database Society

Friday, April 11, 2003

sponsored by Sun Microsystems

sunlogo.gif (4979 bytes)

NEDS

Selectivity Estimation using Probabilistic Models

Lise Getoor  
University of Maryland

Friday, April 11, 2003, 4:00 PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 pm)

Abstract:

Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this work, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Statistical Relational Models (SRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing an SRM from a database, and show how an SRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time. 

Joint work with Daphne Koller and Benjamin Taskar.

Speaker Bio:

Lise Getoor recently joined the University of Maryland, College Park as an assistant professor in the computer science department. She received her PhD from Stanford University in December, 2001. Her research interests include probabilistic models, machine learning and data management. She has published papers on a variety of topics including learning probabilistic models, utility elicitation, on-line scheduling, constraint-based planning and machine learning. Her recent data management work includes publications on probabilistic models of semistructured data and selectivity estimation. Before pursuing her PhD at Stanford, she worked at NASA-Ames Research Center as a research associate. She received her M.S. in Computer Science from UC Berkeley and her B.S. in Computer Science from UC Santa Barbara. She is co-organizing an upcoming IJCAI 2003 workshop on 'Learning Statistical Models from Relational Data'.


Maintained by Dina Goldin dqg AT cse.uconn.edu
Last updated on 04/03/03