Ph.D. Proposal: Guannan Liang
December 17, 2020 @ 10:00 am - 11:30 am EST
Doctoral Dissertation Proposal
Title: Efficient and Privacy-preserving Algorithms for Nonconvex Sparse Learning
Ph.D. Candidate: Guannan Liang
Major Advisor: Dr. Jinbo Bi
Associate Advisors: Dr. Sanguthevar Rajasekaran, Dr. Alexander Russell
Review Committee Members: Dr. Yufeng Wu and Dr. Laurent Michel
Date/Time: Thursday, December 17th, 2020, 10:00 am
Join by phone: +1-415-655-0002
Access code: 242 915 21
Sparse learning plays an important role in statistical learning, machine learning, and signal processing. Many learning problems with high-dimensional data rely on sparsity-driven regularization. Solving a sparsity-regularized empirical risk minimization (ERM) problem yields a machine learning (ML) model with sparse parameters, meaning that the model parameter vector has many zero entries; it thus helps select the features relevant to an ML model. For example, via sparsity regularization, genomic analysis can identify a small number of genes contributing to the risk of a disease, and smartphone-based healthcare systems can detect the most important mobile health indicators. In this dissertation, sparse learning is formulated as a nonconvex or nonsmooth optimization problem, depending on the specific regularizer. We first develop efficient stochastic gradient descent (SGD) based algorithms to solve these problems. Then, to apply sparse learning in privacy-preserving data mining, we develop another set of algorithms that satisfy a mathematical condition known as differential privacy (DP).
Firstly, we propose a hard thresholding (HT) method based on the stochastically controlled stochastic gradient (SCSG) algorithm, which we call SCSG-HT, to solve constrained ERM problems with a constraint on the $l_0$-norm of the model parameter vector. SCSG belongs to the family of stochastic variance reduced gradient (SVRG) methods, which attempt to reduce the variance of the stochastic gradients used in the optimization process. We prove that SCSG-HT has a stronger guarantee than SGD-based HT methods for recovering the optimal sparse estimator. The computational complexity of SCSG-HT is independent of the sample size n when n is large, which enhances scalability to massive-scale problems.
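The core building block of any HT method is the hard-thresholding projection onto the $l_0$ ball. The sketch below is a generic illustration of one stochastic-gradient-plus-HT iteration for least-squares ERM, not the SCSG-HT algorithm itself; the function names and the least-squares loss are illustrative assumptions.

```python
import numpy as np

def hard_threshold(w, k):
    """Project w onto the l0 ball: keep the k largest-magnitude
    entries and zero out the rest."""
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-k:]
    out[top] = w[top]
    return out

def sgd_ht_step(w, X_batch, y_batch, lr, k):
    """One illustrative SGD-HT iteration for least squares:
    a mini-batch gradient step followed by hard thresholding."""
    grad = X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)
    return hard_threshold(w - lr * grad, k)
```

Variance-reduced variants such as SCSG replace the plain mini-batch gradient above with a control-variate estimator anchored at a periodically computed reference gradient, which is what yields the stronger recovery guarantee.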
Secondly, to solve the unconstrained ERM problem in which the sparsity constraint becomes a nonsmooth penalty term in the objective, we design a family of stochastic proximal gradient methods. Existing stochastic proximal methods draw mini-batches uniformly at random to calculate stochastic gradients; in contrast, our methods draw samples according to an arbitrary probability distribution. A family of methods can thus be derived from different sampling schemes based on SGD and its variance-reduction variants. We develop a unified approach to examine the convergence and computational complexity of these methods across the different sampling schemes, which allows us to compare them. We show that the independent sampling scheme tends to improve performance over the commonly used uniform sampling scheme. Our new analysis also yields a tighter bound on the convergence speed of uniform sampling than the best bound known so far.
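A minimal sketch of the two ingredients just described, assuming an $l_1$ penalty (whose proximal operator is soft thresholding) and a least-squares loss: non-uniform sampling is kept unbiased by reweighting each drawn example's gradient by $1/(n p_i)$. Names and the choice of loss are illustrative, not the dissertation's exact methods.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1 (soft thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def prox_sgd_step(w, X, y, probs, batch, lr, lam, rng):
    """One proximal-SGD step with an arbitrary sampling distribution.
    Example i is drawn with probability probs[i]; dividing its
    per-example gradient by n * probs[i] keeps the gradient
    estimator unbiased for the full least-squares gradient."""
    n = len(y)
    idx = rng.choice(n, size=batch, p=probs)
    resid = X[idx] @ w - y[idx]
    weights = resid / (n * probs[idx])
    grad = (X[idx] * weights[:, None]).sum(axis=0) / batch
    return soft_threshold(w - lr * grad, lr * lam)
```

With `probs` uniform this reduces to the standard method; an "independent" or importance-based choice of `probs` (e.g. proportional to per-example gradient norms) is the kind of scheme the unified analysis compares.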
Thirdly, we further extend these methods to solve sparse learning problems with privacy-preserving guarantees. We develop algorithms that satisfy DP, which we name DP-SGD-HT and DP-SCSG-HT. A DP algorithm based on the non-stochastic (full) gradient descent method has previously been developed for the sparsity-regularized ERM problem, but it is computationally prohibitive due to the cost of calculating full gradients. Our DP-SGD-HT method uses stochastic gradients and is thus much more efficient and scalable. To further reduce the estimation error, the DP-SCSG-HT algorithm is designed to reduce the stochasticity caused by the sampling of data points in the SGD steps, obtaining more accurate solutions than DP-SGD-HT under the same DP guarantee. For both DP-SGD-HT and DP-SCSG-HT, we provide theoretical analysis of convergence, utility bounds, and computational complexity.
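The standard recipe for making a gradient-and-threshold step differentially private is to bound each gradient's sensitivity by clipping and then add Gaussian noise calibrated to the privacy budget. The sketch below illustrates that pattern in the HT setting; it is a generic illustration under these assumptions, not the dissertation's DP-SGD-HT, and the noise scale `sigma` must be chosen from the target $(\epsilon, \delta)$ budget via a separate accounting step.

```python
import numpy as np

def dp_ht_step(w, grad, lr, k, clip, sigma, rng):
    """One illustrative differentially private HT iteration:
    1) clip the stochastic gradient so its l2 norm is at most `clip`
       (bounding sensitivity), 2) add Gaussian noise with standard
       deviation sigma * clip (Gaussian mechanism), 3) take a
       gradient step and hard-threshold to the k largest entries."""
    norm = np.linalg.norm(grad)
    grad = grad / max(1.0, norm / clip)                  # sensitivity bound
    grad = grad + rng.normal(0.0, sigma * clip, grad.shape)  # privacy noise
    w = w - lr * grad
    out = np.zeros_like(w)
    top = np.argsort(np.abs(w))[-k:]
    out[top] = w[top]
    return out
```

Reducing the variance of `grad` before noise is added (as a variance-reduced estimator does) is what lets a DP-SCSG-HT-style method reach a smaller estimation error under the same privacy budget.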