Colloquia, Seminars and Conference News
Title : Active Learning with Hidden Factor Models in Collaborative Prediction and Systems Management
Date : March 15, 2007 (2:00 pm). Tea starts half an hour before each seminar.
Location: ITEB 336
Speaker : Dr. Irina Rish
Abstract:
Various tasks arising in the management of complex distributed computer systems and networks, such as problem diagnosis and resource allocation, require fast real-time inferences based on available system measurements, and a smart choice of such measurements can greatly improve both the quality and speed of inference and decision-making. For example, accurately estimating end-to-end transaction performance is essential both for monitoring compliance with service-level agreements (SLAs) and for performance optimization (e.g., choosing the highest-bandwidth server for a download request in a content-distribution system). However, exhaustive pairwise measurements of end-to-end performance are infeasible in large systems and cannot be kept up to date in highly dynamic environments. A natural alternative is to predict unobserved end-to-end performance from available historic data, with a minimal number of additional "active" measurements.
In this talk, I will present our recent work on active sampling in collaborative prediction, with applications to end-to-end performance prediction and best-server selection in content-distribution systems. Collaborative prediction is the problem of predicting unobserved entries in sparsely observed matrices, e.g., product ratings by different users in online recommender systems, or historic data on the connection quality (e.g., bandwidth) between nodes in a network. However, the quality of prediction may be quite sensitive to the choice of available samples, which motivates active sampling approaches. In this work, we suggest an active sampling method based on the recently proposed Maximum-Margin Matrix Factorization (MMMF), a linear factor model that was shown to outperform state-of-the-art collaborative prediction techniques. MMMF is formulated as a semi-definite program (SDP) that finds a low-norm (rather than the traditional low-rank) matrix factorization, and is closely related to learning max-margin linear discriminants (SVMs). This relation to SVMs inspires several margin-based active sampling heuristics that allow for an exploration-exploitation trade-off on top of MMMF factor models. These heuristics demonstrate excellent empirical results, saving hundreds of samples needed to reach a desired predictive accuracy, in a variety of practical domains, including both traditional recommender systems and systems-management applications.
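To make the idea of margin-based active sampling concrete, here is a minimal Python sketch under stated assumptions; it is not the speaker's implementation. It assumes binary (+1/-1) matrix entries, replaces the SDP formulation with plain subgradient descent on a hinge loss with a Frobenius-norm penalty on the factors, and uses a minimum-margin rule (query the unobserved entry whose prediction is closest to zero) as one plausible heuristic. The names factorize and min_margin_query and all parameter values are hypothetical.

```python
import numpy as np

def factorize(Y, observed, rank=2, lam=0.1, lr=0.05, steps=500, seed=0):
    """Fit X = U @ V.T to the observed +/-1 entries of Y.

    Assumption: hinge loss plus a Frobenius-norm penalty on U and V,
    minimized by subgradient descent (a stand-in for the SDP).
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    for _ in range(steps):
        X = U @ V.T
        # Hinge subgradient w.r.t. X: -y on observed entries with margin y*x < 1.
        G = np.where(observed & (Y * X < 1), -Y, 0.0)
        U_new = U - lr * (G @ V + lam * U)
        V = V - lr * (G.T @ U + lam * V)
        U = U_new
    return U, V

def min_margin_query(X, observed):
    """Return (row, col) of the unobserved entry with the smallest |prediction|."""
    if observed.all():
        raise ValueError("no unobserved entries left to query")
    margins = np.where(observed, np.inf, np.abs(X))
    i, j = np.unravel_index(np.argmin(margins), X.shape)
    return int(i), int(j)

# Toy usage: a 3x3 sign matrix with two hidden entries.
Y = np.array([[1.0, -1.0, 1.0],
              [1.0, -1.0, 1.0],
              [-1.0, 1.0, -1.0]])
observed = np.ones(Y.shape, dtype=bool)
observed[0, 2] = observed[2, 0] = False
U, V = factorize(Y, observed)
print("next entry to measure:", min_margin_query(U @ V.T, observed))
```

Always querying the smallest-margin entry is pure exploration of the most uncertain prediction; a rule that sometimes prefers confidently predicted entries would move toward exploitation, which is the trade-off the abstract refers to.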
If time permits, I will also discuss our prior work on network fault diagnosis, i.e., recovering the most likely states of unobserved system components given the outcomes of end-to-end test transactions, called probes. Our focus here is on achieving good trade-offs between diagnostic accuracy on the one hand and the cost of testing and the computational complexity of diagnosis on the other. This includes (1) results characterizing these trade-offs, such as a lower bound on the number of probes necessary to [...], (2) an active online approach to selecting the most informative tests, and (3) approximation techniques using "loopy" belief propagation for handling the intractable inference problems involved in both diagnosis and [...]. Results on real applications demonstrate the advantages of the active approach, which greatly reduces the number of probes (by up to 75%) and the time needed to diagnose problems.
Bio: Research [...] Gubkin [...] of probabilistic inference, machine learning,
and information [...]. In particular, she has been working on
approximate inference in graphical models,
information-theoretic experiment [...], and their applications to the
area of management of complex distributed [...] prediction and
online [...] include dimensionality reduction and feature selection
in bioinformatics and neuroscience (analysis of MRI data).
She is also an adjunct professor at Columbia University where
she taught machine-learning courses at EE and CS Departments.