Title: Efficient Techniques to Process Big Data
Ph.D. Candidate: Zigeng Wang
Major Advisor: Dr. Sanguthevar Rajasekaran
Associate Advisors: Dr. Caiwen Ding, Dr. Qian Yang
Committee Members: Dr. Nalini Ravishanker, Dr. Wei Zhang
Date/Time: Monday, October 3rd, 2022, 2:00 pm
Meeting link: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=mfd0de41191eb316e588b6f1871d74431
Meeting number: 2623 649 5684
In the current era, we are witnessing an explosion in data volume, diversity, and dimensionality. Processing big data demands enormous computational resources and storage space, and it brings both challenges and opportunities in computing. In this dissertation, we offer several techniques for efficiently processing and learning from big data.
The first problem we investigate is feature selection. Feature selection is crucial for efficient machine learning, and it also greatly contributes to the explainability of machine-driven decisions. Methods such as decision trees and LASSO can select features during training. However, these embedded approaches apply to only a small subset of machine learning models. Wrapper-based methods can select features independently of the underlying machine learning model, but they often suffer from a high computational cost. To enhance their efficiency, we have designed a wrapper-based randomized feature selection algorithm.
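To illustrate the wrapper-based idea (this is a generic sketch, not the dissertation's algorithm): a wrapper treats the model-evaluation routine as a black box and searches over feature subsets, here by simple random sampling. The scoring function `corr_score` is a hypothetical stand-in for training and validating a real model.

```python
import numpy as np

def wrapper_random_search(X, y, score_fn, n_trials=50, subset_size=3, seed=0):
    """Illustrative randomized wrapper: sample random feature subsets,
    score each with the supplied black-box evaluator, keep the best."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    best_subset, best_score = None, -np.inf
    for _ in range(n_trials):
        subset = rng.choice(n_features, size=subset_size, replace=False)
        s = score_fn(X[:, subset], y)
        if s > best_score:
            best_subset, best_score = np.sort(subset), s
    return best_subset, best_score

# Toy stand-in evaluator: absolute correlation of the subset mean with y.
def corr_score(Xs, y):
    return abs(np.corrcoef(Xs.mean(axis=1), y)[0, 1])

rng_demo = np.random.default_rng(1)
X = rng_demo.normal(size=(200, 10))
y = X[:, 2] + 0.1 * rng_demo.normal(size=200)  # only feature 2 drives y
subset, score = wrapper_random_search(X, y, corr_score, subset_size=1)
```

Because the evaluator is a black box, the same loop works with any model; the randomization is what keeps the number of model evaluations far below an exhaustive subset search.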
The second problem we explore is efficient time series spectrum analysis. Higher-order spectra (HOS) are a powerful tool in nonlinear time series analysis, and they have been used extensively as feature representations in data mining, communications, and cosmology. However, HOS estimation suffers from high computational cost and memory consumption, restricting its use in resource-limited and time-sensitive applications. We present a set of generic sequential and parallel algorithms for computation- and memory-efficient HOS estimation that can be employed on any parallel machine or platform.
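For context, the lowest-order HOS beyond the power spectrum is the bispectrum, B(f1, f2) = E[X(f1) X(f2) X*(f1 + f2)]. The sketch below is a textbook direct (FFT-based) segment-averaged estimator, shown only to make the O(N^2) memory footprint per segment concrete; it is not the dissertation's optimized algorithm.

```python
import numpy as np

def bispectrum_direct(x, nfft=64):
    """Illustrative direct bispectrum estimate, averaged over
    non-overlapping segments: B(f1, f2) = mean of X(f1) X(f2) conj(X(f1+f2)).
    Note the nfft x nfft accumulator -- the memory cost that motivates
    more efficient HOS estimation schemes."""
    segments = len(x) // nfft
    B = np.zeros((nfft, nfft), dtype=complex)
    # Index table for f1 + f2, wrapped modulo nfft.
    idx = (np.arange(nfft)[:, None] + np.arange(nfft)[None, :]) % nfft
    for k in range(segments):
        seg = x[k * nfft:(k + 1) * nfft]
        X = np.fft.fft(seg - seg.mean())
        B += np.outer(X, X) * np.conj(X[idx])
    return B / segments

rng = np.random.default_rng(0)
x = rng.normal(size=1024)
B = bispectrum_direct(x, nfft=64)
```

The estimate inherits the bispectrum's symmetry B(f1, f2) = B(f2, f1), which efficient implementations exploit to avoid redundant computation and storage.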
The third problem we consider is the efficient learning of neural network substructures. Given the growing number of resource-limited devices, the problem of efficiently compressing Deep Neural Networks (DNNs) has become vital. Compression can be achieved with methods such as pruning, quantization, and distillation. In neuron pruning, the focal point of DNN subnetwork search is locating and dropping redundant neurons, a task that traditionally relies heavily on expert knowledge. In our work, we propose an automatic, non-expert-friendly differentiable subnetwork search algorithm that dynamically adjusts the layer-wise neuron-pruning penalty based on the sensitivity of the Lagrangian multipliers.
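The flavor of such dynamic penalty adjustment can be sketched with a toy dual-ascent loop (a hypothetical setup, not the proposed algorithm): each layer has neuron "gates" in [0, 1], an L1-style penalty pushes gates toward zero, and each layer's multiplier is raised or lowered depending on how far the layer's surviving-neuron fraction is from a target. The per-neuron `importance` array stands in for the true task loss.

```python
import numpy as np

def adaptive_pruning_sketch(n_layers=3, n_neurons=8, target=0.5,
                            steps=200, lr=0.1, dual_lr=0.05, seed=0):
    """Toy sketch of layer-wise penalty adaptation via dual ascent:
    gates drift toward per-neuron 'importance' (a stand-in for the task
    loss), while each layer's multiplier lam grows when the layer keeps
    more neurons than the target fraction and shrinks otherwise."""
    rng = np.random.default_rng(seed)
    importance = rng.uniform(0.0, 1.0, size=(n_layers, n_neurons))
    gates = rng.uniform(0.5, 1.0, size=(n_layers, n_neurons))
    lam = np.full(n_layers, 0.1)
    for _ in range(steps):
        # Gradient of the stand-in task term plus the L1 penalty
        # (gates are nonnegative, so d|g|/dg = 1).
        grad = (gates - importance) + lam[:, None]
        gates = np.clip(gates - lr * grad, 0.0, 1.0)
        # Dual ascent: per-layer multiplier tracks the constraint violation.
        active = (gates > 0.05).mean(axis=1)
        lam = np.clip(lam + dual_lr * (active - target), 0.0, None)
    return gates, lam

gates, lam = adaptive_pruning_sketch()
```

The point of the sketch is the feedback loop: instead of a human hand-tuning one pruning strength per layer, the multipliers themselves move in response to each layer's measured sparsity, which is the kind of automation the proposed differentiable search provides in a principled form.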
In addition to the above fundamental results, we also consider several machine learning applications spanning materials science, medical science, and metagenomics.