This event has passed.

Ph.D. Defense: Abdullah Baihan

Name: Ph.D. Defense: Abdullah Baihan
Start: 2019-08-28T12:00:00-05:00
End: 2019-08-28T13:00:00-05:00
Location: HBL Class of 1947 Conference Room

August 28, 2019 @ 12:00 pm - 1:00 pm UTC-5

Title: Efficient Sequential and Parallel Algorithms for Incremental Record Linkage.

Ph.D. Candidate: Abdullah Baihan

Major Advisors: Prof. Sanguthevar Rajasekaran and Prof. Reda Ammar.

Associate Advisors: Prof. Song Han and Prof. Sheida Nabavi.

Date/Time: Wednesday, August 28nd, 2019 at 12:00 PM.

Location: HBL 1947 Conference Room

Abstract:

In the biomedical domain, the record linkage is considered as a crucial problem. When the number of records is very large, existing algorithms for record linkage take too much time. Often, we have to link a small set of new records with a large set of old records. This can be done by putting together the old and new records and performing a linkage on all the records. Clearly, this will call for an enormous amount of time. An alternative is to develop algorithms that perform linkage in an incremental manner. We refer to any such algorithm as an Incremental Record Linkage (IRL) algorithm.
In this thesis, we present an efficient IRL algorithm. In addition to taking large amounts of time, existing algorithms might also suffer from a chaining problem and hence introduce some errors in linking. As has been observed in the literature, this chaining problem can be solved by performing clustering under complete linkage.
This thesis makes two main contributions. Firstly, we have offer novel sequential and parallel algorithms for the critical incremental record linkage problem using single linkage. Secondly, we have come up with novel sequential and parallel algorithms for incremental record linkage using complete linkage to overcome the chaining problems.
Our algorithms can handle any number of datasets. In contrast, many of the existing algorithms can only link two datasets at a time. Our algorithms outperform previous algorithms and offer state-of-the-art solutions to the IRL problem. We have tested our algorithms on millions of records on synthetic and real datasets and shown that our algorithms outperform the best-known RLA algorithms when the number of new records is up to around 20% of the total number of old records. Our algorithms achieve a very nearly linear speedup in parallel.

Details

Date:: August 28, 2019
Time:: 12:00 pm - 1:00 pm UTC-5
Event Category:: Colloquia

Venue

: HBL Class of 1947 Conference Room
: UConn Library, 369 Fairfield Way, Unit 1005
Storrs, CT 06269 United States + Google Map
Phone: (860) 486-2518
: View Venue Website

College of Engineering

School of Computing

Ph.D. Defense: Abdullah Baihan

August 28, 2019 @ 12:00 pm - 1:00 pm UTC-5

Details

Venue

Connect With Us

UConn Navigation

CSE Navigation

Office Contact