Title: Effective Algorithms for the Closest Pair and Related Problems
PhD Candidate: Xingyu Cai
Major Advisor: Sanguthevar Rajasekaran
Associate Advisors: Song Han and Yufeng Wu
Date/Time: Thursday, March 14, 2019 4:00PM
Location: HBL 1947 Conference Room
Abstract: The Closest Pair Problem (CPP) aims to identify the closest pair (under some similarity measure, e.g., Euclidean distance, Minkowski distance, Mahalanobis distance, Dynamic Time Warping, etc.) of points in a metric space. CPP is one of the fundamental problems that has a wide range of applications in data mining, since most of the data can be represented in vector format residing in a high dimensional space, and we would like to identify the relationship between those data points.
Typical applications include but not limited to, social data analysis, user pattern identification, motif mining in biological data, etc. This is a very classical problem and has been studied well. In this proposal, we study the CPP problem and its variants, and also bring the machine learning perspective to solve some closely related problems.
In the first part, we propose an algorithm called JUMP, to solve the Closest Pair of Subsequences problem (CPS). CPS is a variant of CPP in sequence mining area, where the input data are numerical sequences rather than vectors. The goal of CPS is to locate the closest pair of subsequences of a certain length in the original input sequences. Euclidean distance is used in this problem. By exploiting the dependence in the data sequences, we show that the proposed JUMP algorithm could achieve the state-of-the-art run time performance.
In the second part, we offer two approximate algorithms that address the CPP problem.
The first approximate algorithm uses the concept of divide-and-conquer, while the second applies the random projection idea. We give theoretical analyses on both of the approximate algorithms. Empirical study shows that they outperform the state-of-the-art algorithms.
In the last part, we adopt the Dynamic Time Warping (DTW) into a learning problem that tries to learn a pattern, under DTW measure. The DTW is a non-linear transform and often obtained via dynamic programming. We propose an efficient solution, and analyze the DTW loss theoretically. The experiments also prove that the proposed approach could learn the proper pattern or feature under DTW measure, and deliver a very good performance in a number of applications.