neds.gif (1190 bytes)

New England Database Society

Friday, December 13, 2002

sponsored by Sun Microsystems

sunlogo.gif (4979 bytes)

NEDS

Distance Based Indexing and Similarity Search for Sequences

Z. Meral Ozsoyoglu
Case Western Reserve University

Friday, January 17, 2003, 4:00PM
Volen 101, Brandeis University

(preceded by a wine and cheese reception at 3:30 pm)

Abstract:

Finding sequences similar to a given query sequence in a large collection of sequences is a fundamental problem in many database applications including, computational genomics, computational finance, image and text processing. The similarity between sequences is defined in terms of a distance function determined by the application domain. In this work, we consider sequence proximity search in computational genomics, where sequence similarity is usually an indication of an evolutionary relationship between DNA and protein sequences, and usually indicates functional similarity. The most popular distance measures are based on (a weighted) count of character edit or block edit operations to transform one string to another. The main goal is to develop efficient near neighbor search tools that work for both character edit and block edit distances. Our premise is that the Distance Based Indexing techniques, which are originally developed for metric distances can be modified for sequence distance measures provided that they are almost metrics. We first show that sequence distance functions of interest (compression distance and weighted character edit distance) are almost metric. We then show how to modify distance based index structures vantage point trees to accommodate almost metric distances. We test our theoretical results on synthetic data sets and protein sequences.

Speaker Bio:

Z. Meral Ozsoyoglu is a professor of Computer Science at EECS department, Case Western Reserve University, Cleveland, Ohio. Prof. Ozsoyoglu's primary work and research interests are in the areas of query languages and query processing, data models, and index structures in object oriented databases, multimedia databases, and temporal databases. More recently, she has been working in Bioinformatics, more specifically on managing, visualizing and querying genomic pathways, and efficient access methods for genomic sequences. She is a founding member of CWRU Center for Computational Genomics. Dr. Ozsoyoglu has published over 60 papers in computer science journals including ACM TODS, IEEE TKDE, IEEE TSE, JCSS, and conferences including ACM SIGMOD, ACM PODS, VLDB, and IEEE ICDE. She is currently an associate editor of ACM TODS, and WWW journal.


Maintained by Dina Goldin dqg AT cse.uconn.edu
Last updated on 1/16/03