New England Database Society

Friday, October 15, 2004

sponsored by Sun Microsystems

NEDS

Towards Declarative and Efficient Querying on Biological Data Sets

Jignesh Patel  
University of Michigan

Friday, September 24, 2004, 4:00 PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:00 PM)

Abstract:

The ongoing revolution in the life sciences is producing new and exciting discoveries at a remarkable pace. The driving factors behind these advances are the emergence of new high-throughput methods and the use of computational tools to analyze the data these methods produce. Unfortunately, researchers currently query this data with awkward procedural methods (such as Perl, Python, or Java scripts), and these methods often rely on query evaluation algorithms that do not scale as data sets grow. Many biological data sets are growing rapidly, and query complexity is increasing just as quickly, so these existing approaches will become even more limiting in the future. Efficient, declarative methods for querying these data sets are urgently needed. In this talk, I will describe our research on building a database management system, called Periscope, to meet these challenges. Our current focus in this project is on supporting declarative and efficient querying of protein structures.

I will touch upon various aspects of Periscope, including an algebra we have developed for querying protein structures. I will spend most of the talk describing a new sequence matching algorithm that is often more accurate and faster than the popular sequence search tool BLAST. I will conclude by pointing to some real life-sciences problems that are being investigated using Periscope, and by highlighting the benefits that declarative and efficient querying can bring to the life sciences community.

Speaker Bio:

Jignesh M. Patel is an Assistant Professor at the University of Michigan. He received his PhD from the University of Wisconsin in 1998. As a graduate student, he led the effort to develop the Paradise database system, a parallel object-relational database system that is currently being commercialized at NCR Corp. Since 1999, he has been a faculty member in the EECS department at the University of Michigan, where his research has focused on bioinformatics, spatial query processing, XML query processing, and interactions between DBMSs and processor architectures. He is the recipient of a 2001 NSF CAREER Award and of IBM Faculty Awards in 2001 and 2003. He has served on a number of program committees, including those of ACM SIGMOD and VLDB. He is currently the Associate Editor for the Systems and Prototype section of ACM SIGMOD Record, and a Vice-Chair for the 2005 IEEE International Conference on Data Engineering.


Maintained by Dina Goldin dqg AT cse.uconn.edu