
20th International Conference on
Data Engineering
March 30 - April 2, 2004


SEMINARS


 

All seminars are held in the Kennedy room. Morning seminars are one session, 11:00 AM-12:30 PM; afternoon seminars are two sessions, 2:00-3:30 PM and 4:00-5:30 PM.


Seminar 1: Data Management in Location-Dependent Information Services

(Tuesday March 30, morning)

Location-dependent information services (LDISs) answer queries in accordance with the locations the queries are associated with (e.g., the locations from which the queries are issued). The emergence of LDISs results from the convergence of high-speed wireless networks, personal portable devices, and locating techniques. LDISs have a variety of promising applications, such as local information access (e.g., traffic reports, news, and navigation maps) and nearest neighbor queries (e.g., finding the nearest restaurant), and are expected to become an integral part of our daily life.

This seminar will provide background and an overview of research on location-dependent information access in mobile and pervasive environments. In particular, it will discuss the following topic areas:

1. Positioning technologies;
2. Moving objects tracking;
3. Location-dependent query processing;
4. Location-dependent cache management;
5. System integration;
6. Privacy and security.
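
As a rough illustration of the nearest neighbor queries mentioned above, the following minimal sketch answers "find the nearest restaurant" with a linear scan over points of interest. The function names (`haversine_km`, `nearest`), the sample coordinates, and the use of great-circle distance are assumptions made for illustration; they are not taken from the seminar material, and a real LDIS would typically use a spatial index rather than a scan.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest(query_location, places):
    """Return the (name, location) pair closest to query_location (linear scan)."""
    return min(places, key=lambda p: haversine_km(query_location, p[1]))

# Hypothetical points of interest: (name, (latitude, longitude)).
restaurants = [
    ("A", (42.3588, -71.0578)),
    ("B", (42.3736, -71.1097)),
    ("C", (42.3467, -71.0972)),
]

# The same query can have a different answer depending on where it is issued.
print(nearest((42.3601, -71.0589), restaurants)[0])
print(nearest((42.3770, -71.1167), restaurants)[0])
```

The answer depends on the query location as well as the data, which is the property that motivates location-dependent query processing and cache management.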

Biographies:

Wang-Chien Lee is an associate professor of computer science and engineering at Penn State University. His primary research interests lie in the areas of mobile and pervasive computing, data management, sensor networks, and peer-to-peer computing.

Baihua Zheng is an assistant professor in the School of Information Systems, Singapore Management University. Her research interests include data management for mobile/pervasive computing environments and spatial databases.

Jianliang Xu is an assistant professor in the Department of Computer Science at Hong Kong Baptist University. His research interests include mobile and pervasive computing, location-aware computing, and Web content delivery.


Seminar 2: 'My Personal Web': A Tutorial on Personalization and Privacy for Web and Converged Services
(Tuesday March 30, afternoon)

The web services paradigm holds the promise of tremendous flexibility in how services are combined to meet the needs of individual end-users. The "convergence" of networks (wireline telephony, wireless, data) further enhances the web services paradigm, by enabling the incorporation of real time contextual information (e.g., presence and location) along with opportunities for web services to impact the physical world more immediately (e.g., a vending machine delivering a soda based on a purchase via a cell phone). But it will not be possible for most end-users to enjoy the rich and intricate possibilities, unless a broad variety of personalization technologies are available and respect the end user's legitimate need for privacy.

This tutorial begins with examples illustrating why personalization will be so important for the emerging web and converged services.  The main body of the tutorial focuses on three inter-related technologies:

1. profile data management, the ability for services to share and access end-user profile data (including address, credit card, "simple" preferences, current location, current presence, ...) as appropriate for the services to be provided.

2. preference and policy management, the ability to store and execute on intricate, interrelated preferences that end-users may have (e.g., "during working hours, calls from strangers should be routed to voice-mail"; "I usually work from 9 to 6, but on Thursdays it is from 8 to 4"; ...).

3. personalized and privacy-conscious data sharing of profile data and preferences, the notion that an end-user should have complete control over what profile and preference data is shared with whom and under what circumstances and how it is interpreted.

In addition to describing emerging approaches for providing these capabilities, the tutorial will describe how to add value to applications by using personalization, from both the end-user and the application provider perspectives.
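
To make the preference examples above concrete, here is a minimal sketch of how the two quoted rules might be evaluated together. The representation (a default working-hours pair plus a per-weekday override) and the names `is_working_time` and `route_call` are hypothetical choices for illustration only; the tutorial does not prescribe any particular encoding.

```python
from datetime import datetime

# "I usually work from 9 to 6, but on Thursdays it is from 8 to 4."
# Hours are (start_hour, end_hour) on a 24-hour clock; weekday 3 is Thursday.
DEFAULT_HOURS = (9, 18)
OVERRIDES = {3: (8, 16)}

def is_working_time(when):
    """True if `when` falls inside the user's working hours for that weekday."""
    start, end = OVERRIDES.get(when.weekday(), DEFAULT_HOURS)
    return start <= when.hour < end

def route_call(caller, contacts, when):
    """'During working hours, calls from strangers should be routed to voice-mail.'"""
    if is_working_time(when) and caller not in contacts:
        return "voice-mail"
    return "ring"

contacts = {"alice"}
print(route_call("stranger", contacts, datetime(2004, 3, 30, 17, 0)))  # a Tuesday
print(route_call("stranger", contacts, datetime(2004, 4, 1, 17, 0)))   # a Thursday
```

The routing rule depends on the working-hours rule, which is the kind of interrelation between preferences that the second topic addresses.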

Biographies:

Arnaud Sahuguet is a Member of the Technical Staff in the Network Data and Services Department at Bell Laboratories, Lucent Technologies. He received his Ph.D. in computer science from the University of Pennsylvania in December 2001. His research interests include cryptography and electronic commerce, information retrieval/extraction, and databases. He is the author of an HTML-to-XML screen-scraper and the main architect of Kweelt, an open-source Java implementation of the Quilt query language (now XQuery). He is a leading figure in the 3GPP Generic User Profile (GUP) standards work.

Irini Fundulaki holds a Post Doc position at the Network Data and Services Department at Bell Laboratories, Lucent Technologies. She received her Ph.D. in computer science from the Conservatoire National des Arts et Métiers in Paris in January 2003. During her Ph.D. she was a member of the Verso group at INRIA-Rocquencourt. Her research interests include semantic data integration, database technology, XML and related technologies, and more recently user profile data management.


Seminar 3: Similarity Search in Multimedia Databases

(Wednesday March 31, morning)

There are many practical applications that benefit from multimedia databases, e.g., molecular biology, medicine, CAD/CAM, and geography. An important research issue in the field of multimedia databases is the content-based retrieval of similar objects. Given a multimedia query object, the search for an exact match in a database is not meaningful in most applications, because the probability that two multimedia objects are identical is negligible (unless they are digital copies from the same source). For this reason, the development of efficient and effective similarity search techniques has become an important topic in the multimedia database research community.

The goal of this advanced technology seminar is to provide an overview of the similarity search problem and to present the state-of-the-art techniques for performing efficient and effective similarity queries in multimedia databases. The seminar begins with an introduction to and motivation for multimedia databases. The two main approaches for describing multimedia objects (as elements in a metric space or in a vector space) are introduced, as well as a description of the "Multimedia Content Description Interface" (MPEG-7) standard. The efficiency issue is addressed for both metric and vector space approaches, describing the data structures and algorithms used to answer similarity queries. For the effectiveness issue, the seminar introduces some widely used retrieval performance measures. Several examples of techniques for particular multimedia applications (text, image, CAD, 3D objects, audio and video) are presented.
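
The vector space approach and the retrieval performance measures mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: objects are described by small hand-made feature vectors, similarity is Euclidean distance, the query is answered by a linear scan rather than an index structure, and effectiveness is measured with precision and recall. The names `knn` and `precision_recall` and the sample data are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn(query, database, k):
    """Return the ids of the k objects whose feature vectors are closest to query."""
    ranked = sorted(database, key=lambda oid: euclidean(query, database[oid]))
    return ranked[:k]

def precision_recall(retrieved, relevant):
    """Precision and recall of a result list; both inputs are assumed non-empty."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

# Hypothetical 2-dimensional feature vectors for four images.
features = {
    "img1": (0.1, 0.9),
    "img2": (0.2, 0.8),
    "img3": (0.9, 0.1),
    "img4": (0.8, 0.3),
}

result = knn((0.1, 0.85), features, k=2)
precision, recall = precision_recall(result, relevant={"img1", "img2", "img4"})
print(result, precision, recall)
```

A linear scan compares the query against every object, which is the cost that the index structures covered under the efficiency issue aim to reduce.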

Biographies:

Daniel A. Keim works in the area of multimedia databases, image similarity search, and high-dimensional indexing structures. He has published extensively on multimedia databases and data mining, and he has given tutorials on related issues at several large conferences including SIGMOD, VLDB, ICDE and KDD. He was program co-chair of the KDD conference in 2002 and of the IEEE Information Visualization Symposia in 1999 and 2000, and he is an editor of IEEE Trans. on Knowledge and Data Engineering, IEEE Trans. on Visualization and Computer Graphics, and Palgrave's Information Visualization Journal. He received his PhD in Computer Science from the University of Munich in 1994. He has been an assistant professor in the CS department of the University of Munich and an associate professor in the CS department of the Martin-Luther-University Halle, and he is currently a full professor and head of the database and visualization group in the CS department of the University of Konstanz, Germany.

Benjamin Bustos works in the area of multimedia databases and similarity search in high-dimensional spaces. He received an MSc degree in Computer Science in 2002 from the University of Chile. Currently, he is a PhD student at the Department of Computer and Information Science of the University of Konstanz, and his advisor is Prof. Daniel A. Keim.


Seminar 4: XML Query Processing

(Wednesday March 31, afternoon)

XQuery is starting to gain significant traction as a language for querying and transforming XML data. It is used in a variety of different products. Examples to date include XML database systems, XML document repositories, XML data integration, workflow systems, and publish and subscribe systems. In addition, XPath, of which XQuery is a superset, is used in various products such as Web browsers. Although the W3C XQuery specification has not yet attained recommendation status, and the definition of the language has not entirely stabilized, a number of alternative proposals to implement and optimize XQuery have appeared both in industry and in the research community. Given the wide range of applications for which XQuery is applicable, a wide spectrum of alternative techniques has been proposed for XQuery processing. Some of these techniques are only useful for certain applications; others are general-purpose.

The goal of this tutorial is to give an overview of the existing approaches to process XQuery expressions and to give details of the most important techniques.  The presenters have experience from designing and building an industrial-strength XQuery engine [1].  The tutorial will give details of that XQuery engine, but the tutorial will also give extensive coverage of other XQuery engines and of the state of the art in the research community.

Agenda

 

1.  Introduction to XQuery

     - Motivation

     - XQuery data model

     - XQuery type system

     - Basic query language concepts

 

2.  Internal Representation of XML Data

     - DOM

     - SAX Events

     - TokenStream

     - Skeleton

     - Vertical Partitioning

 

3.  XQuery Algebras

     - XQuery Core vs. Relational Algebra

     - XQuery Algebras from Research Projects

 

4.  XPath Query Processing

     - Transducers, Automata, etc.

 

5.  XQuery Optimization

     - XML query equivalence

     - Rewrite Rules

     - Cost Models

 

6.  XQuery Runtime Systems

     - Iterator Models

     - Algorithms for XQuery Operators

 

7.  XML Indexes

     - Value and path indexes, other

 

8.  XQuery Products and Prototypes

     - XQRL/BEA, Galax, Saxon, etc. (as available)

 

9.  Advanced Query Processing Techniques, Related Topics

     - Querying compressed XML data

     - Multi-Query Optimization

     - Publish&Subscribe and XML Information Filter

     - XML Data Integration

     - XML Updates

     - XML integrity constraints


10.  Summary

Biographies:

Daniela Florescu is a Senior Software Engineer at BEA Systems. She received her MS in Mathematics in 1990 from the University of Bucharest, and her PhD in Computer Science in 1996 from the University of Paris VI. After two years as a researcher at the AT&T Research Center (New Jersey) and another two years at INRIA, France, she decided in 2000 to try the industry path. She was a lead architect at Crossgain (Washington) and lead scientist at Propel (California), before starting her own company, XQRL, recently acquired by BEA Systems. Daniela has extensive experience in query languages and query processing. She is one of the editors of the standard XML query language, XQuery. Together with Donald Kossmann (also a founder of XQRL, Inc.), she designed and implemented the streaming XQuery engine that is currently being shipped by BEA Systems as part of WebLogic Integration 8.1 [1].

Donald Kossmann is a Full Professor of Computer Science at the University of Heidelberg in Germany. He received his MS in 1991 from the University of Karlsruhe and completed his PhD in 1995 at the Technical University of Aachen (RWTH). After that, he spent 18 months at the University of Maryland (College Park) and at the IBM Almaden Research Center, California, USA. From mid-1996 until 2000, he worked as a research associate at the University of Passau, where he received his habilitation in 1999. From 2000 until 2003, he was an Associate Professor at the Technical University of Munich. He is a co-founder of two start-ups: i-TV-T AG and XQRL, Inc. His current research is focused on the performance of database and information systems and on platforms for Web services.


Seminar 5: Meta Data Management
(Thursday April 1, morning)

By meta data management, we mean techniques for manipulating schemas and schema-like objects (such as interface definitions and web site maps) and mappings between them. Many popular research problems in the past five years are primarily meta data problems, such as data warehouse tools (e.g., ETL -- to extract, transform and load), data integration, the semantic web, generation of XML or object-oriented wrappers for SQL databases, and generation of wrappers for web sites. Other classical meta data problems are information resource management, design tool support and integration, and schema evolution and data migration.

Despite its longevity and continued importance, there is no widely accepted conceptual framework for the meta data field, as there is for many other database topics, such as access methods, query processing, and transaction management.  In this tutorial, we propose such a conceptual framework. It consists of three layers: applications, design patterns, and basic operators. Applications are the end-user problems to be solved, like those listed in the previous paragraph. Design patterns are generic problems that need to be solved in support of many different applications, such as meta modeling (for all meta data problems), answering queries using views (for data integration and the semantic web), and change propagation (for data translation, schema evolution, and round-trip engineering). Basic operators are procedures that are needed to support multiple design patterns and applications, such as matching schemas to produce a mapping, merging schemas based on a mapping, and composing mappings.
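
The three basic operators named above (matching, merging, and composing) can be illustrated with a deliberately simplified sketch. It assumes schemas are flat lists of element names and mappings are dictionaries from one schema's names to another's; the name-normalization heuristic in `match` and all sample schemas are hypothetical, and the operators discussed in the tutorial work on far richer schema and mapping representations.

```python
def normalize(name):
    """Crude name normalization used by the toy matcher."""
    return name.lower().replace("_", "")

def match(schema_a, schema_b):
    """Toy Match: pair elements whose names agree after normalization."""
    index = {normalize(b): b for b in schema_b}
    return {a: index[normalize(a)] for a in schema_a if normalize(a) in index}

def compose(map_ab, map_bc):
    """Toy Compose: chain A->B with B->C, keeping elements that map all the way."""
    return {a: map_bc[b] for a, b in map_ab.items() if b in map_bc}

def merge(schema_a, schema_b, mapping):
    """Toy Merge: union of both schemas, collapsing mapped elements of B into A's."""
    mapped_b = set(mapping.values())
    return list(schema_a) + [b for b in schema_b if b not in mapped_b]

# Hypothetical schemas for illustration.
orders_v1 = ["cust_id", "order_date", "total"]
orders_v2 = ["CustId", "OrderDate", "amount"]
v2_to_warehouse = {"CustId": "customer", "amount": "price"}

v1_to_v2 = match(orders_v1, orders_v2)
print(v1_to_v2)
print(compose(v1_to_v2, v2_to_warehouse))
print(merge(orders_v1, orders_v2, v1_to_v2))
```

Each function takes and returns plain schemas or mappings, so the output of one operator can feed the next. This composability is what lets a small set of operators support several design patterns.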

We will describe several meta data management problems. For each problem, we will explain which design patterns and operators are needed to solve it. We will summarize the main approaches to each design pattern and operator -- the main choices of language, data structures, and algorithms -- and will highlight the relevant papers that address it.

This tutorial is targeted at both practicing engineers and researchers. The former will learn about the latest solutions to important meta data problems and the many difficult unsolved problems that are best to avoid. Database researchers, especially professors, will benefit from considering the conceptual framework that we propose, since no database textbooks treat meta data management as a separate topic as far as we know.

Biographies:

Philip A. Bernstein is a researcher at Microsoft Corporation. Over the past 25 years, he has been a product architect and industrial researcher at Microsoft and at Digital Equipment Corp., a professor at Harvard University and Wang Institute of Graduate Studies, and a VP Software at Sequoia Systems. He was architect of Meta Data Services (formerly Microsoft Repository), which is the meta data manager in Microsoft SQL Server. Over the past three years, he has been developing a new approach to meta data management, called Model Management. He is an ACM Fellow, a recipient of the ACM SIGMOD Innovations Award, and a member of the National Academy of Engineering. He has published over 100 articles on the theory and implementation of database systems, and coauthored three books.

Sergey Melnik is a Ph.D. candidate in Computer Science at Leipzig University. He serves as an invited expert in the RDF Core Working Group at the World-Wide Web Consortium. He spent three years as a visiting researcher in the Stanford Database Group where he worked on a variety of topics including meta data management, database optimization, information retrieval, and Semantic Web. He has published papers in EDBT, SIGMOD, TODS, TOIS, and WWW, and received the best student paper award at ICDE 2002 for his paper on schema matching. His Ph.D. thesis is on Generic Model Management, which includes an implementation of many of the scenarios and operators to be presented in this tutorial. He will be joining Microsoft Research in late 2003.


Seminar 6: Implementation and Research Issues in Query Processing for Wireless Sensor Networks
(Thursday April 1, afternoon)

This is a three-hour tutorial discussing the design and implementation of software systems as well as open research problems related to data processing and collection in wireless sensor networks. During the first hour-and-a-half, we focus on the design of the TinyDB data collection system for networks of Berkeley motes running the TinyOS operating system.  Then, during the remainder of the tutorial, we survey relevant literature from the database, networking, and OS communities and identify a number of unsolved and inadequately addressed research problems.  This tutorial is intended for anyone interested in wireless sensor networks with a general background in computer science, be they users of sensor networks looking for an easy way to collect data, developers interested in the design of TinyOS and TinyDB, or researchers in search of challenging new problems.

Biographies:

Wei Hong is a senior researcher at Intel Research, Berkeley. His current research focuses on data management in sensor networks. He leads the Tiny Application Sensor Kit (TASK) project at Intel Research and, with Samuel Madden, co-designed and co-developed TinyDB, an open-source, in-network sensor database system. Prior to joining Intel Research, Wei co-founded and architected the products of two startup companies: Illustra Information Technology Inc. and Cohera Corp. Illustra developed the first successful commercial Object-Relational database system. It was acquired by Informix, now part of IBM. Cohera provided electronic catalog management solutions based on a novel federated database system that it developed. Its technology was acquired by PeopleSoft. Wei earned a Ph.D. in computer science from UC Berkeley and holds a master's and two bachelor's degrees from Tsinghua University in Beijing, China.

Samuel Madden is an Assistant Professor in the Department of Electrical Engineering and Computer Science and a member of the Computer Science and Artificial Intelligence Laboratory at MIT. He received his Ph.D. in Computer Science from the University of California at Berkeley in 2003. His research interests are in the area of distributed and adaptive data management and related networking and systems issues, particularly as they pertain to sensor networks and streaming data.


Seminar 7: Data Mining for Intrusion Detection: Techniques, Applications and Systems
(Friday April 2, morning)

An intrusion is defined as any set of actions that compromise the integrity, confidentiality or availability of a resource. Intrusion detection is an important task for information infrastructure security. One major challenge in intrusion detection is identifying camouflaged intrusions among a huge volume of normal communication activity. Data mining aims to identify valid, novel, potentially useful, and ultimately understandable patterns in massive data. There is strong demand for applying data mining techniques to detect various intrusions.

In the last several years, some exciting and important advances have been made in intrusion detection using data mining techniques. Research results have been published and some prototype systems have been established. Inspired by the huge demands from applications, the interactions and collaborations between the communities of security and data mining have been boosted substantially.

This seminar will present an interdisciplinary survey of data mining techniques for intrusion detection so that researchers from the computer security and data mining communities can share experiences and learn from each other. Some data mining based intrusion detection systems will also be reviewed briefly. Moreover, research challenges and problems will be discussed so that future collaborations may be stimulated. For data mining/database researchers and practitioners, the seminar will provide background knowledge and opportunities for applying data mining techniques to intrusion detection and computer security. For computer security researchers and practitioners, it will provide knowledge on how data mining can benefit and enhance computer security. We will try to understand and appreciate the following technical issues:

1. What is intrusion detection? Why is it challenging, and why can data mining techniques really help?

2. What are the major data mining techniques available for intrusion detection?

3. What are the successful applications of data mining techniques in intrusion detection, and what experience has been gained from them?

 

Biographies:

Jian Pei is an Assistant Professor of Computer Science and Engineering at the State University of New York at Buffalo. He received his Ph.D. degree in Computing Science from Simon Fraser University, Canada. His research interests include data mining, data warehousing, OLAP, database systems, bioinformatics and their applications. His research is supported in part by the National Science Foundation.

Shambhu J. Upadhyaya is an Associate Professor of Computer Science and Engineering at the State University of New York at Buffalo. His research interests are information assurance, computer security, fault diagnosis, fault tolerant computing, and VLSI Testing. He is the director of the Center of Academic Excellence in Information Assurance Education at Buffalo, accredited by the National Security Agency. His research on computer security has been funded by AFOSR, AFRL, DARPA, NSA and Telcordia Technologies. He is an Associate Editor of IEEE Transactions on Computers and is a senior member of IEEE.

Faisal Farooq received the B.Eng. in Computer Science from the National Institute of Technology, Bhopal, India in 2001. He is currently working toward his M.S. degree in Computer Science at the State University of New York at Buffalo. His research interests include information retrieval, databases, data mining and computer security.

Venugopal Govindaraju is a professor of Computer Science and Engineering at the State University of New York at Buffalo, and Associate Director of CEDAR, the Center of Excellence for Document Analysis and Recognition at his university. He received his Ph.D. degree in Computer Science from the University at Buffalo in 1992. His research is focused on Human Computer Interaction, Pattern Recognition, and Biometrics.



Photo by Jim Steinhart, courtesy of PlanetWare™ Inc., all rights reserved.
Maintained by Dina Goldin <dqg AT cse.uconn.edu>; last updated on 10/03/03