Title: Enhancing Attention-Based Models by Learning Dynamics with Static Context
Ph.D. Candidate: Qinqing Liu
Major Advisor: Dr. Jinbo Bi
Associate Advisors: Dr. Dongjin Song, Dr. Derek Aguiar
Date/Time: Wednesday, July 24, 2024, 10:15 AM
Location: Virtual
Meeting link:
https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=mda5fd90c9bf3d1afbea07485004c30e5
Meeting number (access code): 2637 836 2320
Meeting password: gCyq4T
Abstract:
In the rapidly evolving field of data-driven machine learning, learning the underlying patterns within temporal and sequential data is essential for a wide range of applications, from time series analysis to natural language processing. Yet a key challenge lies in effectively distinguishing the dynamic patterns from the static features that accompany them, and in leveraging the static features without letting them be confounded by the dynamics. The static features can serve as a foundation that modulates the behavior of the dynamic features, and the dependence of dynamic properties on their static counterparts is far from trivial, suggesting an interdependence that warrants deeper examination. This dissertation explores the paradigm of “Learning Dynamics with Static Context,” which seeks to enhance the learning capabilities of attention-based models by anchoring the dynamic features with static context.
Building upon this paradigm, our research investigates the fusion of static features with modern sequence-learning architectures to improve performance and generalization. Our first study introduces attention and shortcut connections to the Long Short-Term Memory (LSTM) model: the LSTM backbone captures the dynamic time series patterns, while the static variables provide attention weights that aggregate the embeddings of the individual time steps and also build a shortcut connection between a single time-step embedding and the final representation. Different stages of the dynamics can be distinguished according to the static features, and the shortcut connection ensures that information from early stages is preserved in the final representation, thereby refining predictions.

The second study scales this concept to the Transformer architecture, rethinking the treatment of positional encodings. Instead of the conventional approach of sharing positional encodings across all sequences, we propose to use the static variables to inform a Customized Positional Encoding that interacts with self-attention through a Partially Linearized Attention Module. This design conditions positional awareness in attention-based models on the static variables that underlie the temporal features and proves to be robust.

Our forthcoming third study continues this thread, targeting the elements that remain consistent between input and output sequences in text-related tasks. Focusing on the seq2seq denoising task, we introduce a Follow-On Tokenizer for the output string, which includes a mapping mechanism that identifies and preserves tokens consistent between input and output, thus minimizing the ‘noise’, or dynamic aspects, of the data and enabling the model to focus on rectifying errors such as typographical and phonetic anomalies. This facilitates more precise, context-aware spell correction and suggests broader implications for error remediation in textual data.
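
The brief Python sketches below are hypothetical illustrations of the mechanisms described in the abstract, not the dissertation's implementations; all module names, dimensions, and design details are assumptions.

First, a minimal sketch of the first study's idea: an LSTM encodes the dynamic sequence, the static features produce attention weights that aggregate the per-step embeddings, and a shortcut connection feeds an early embedding directly into the final representation (the additive-style scoring and the choice of the first step for the shortcut are assumptions).

import torch
import torch.nn as nn

class StaticContextLSTM(nn.Module):
    """Hypothetical sketch: LSTM backbone + static-feature attention + shortcut."""
    def __init__(self, dyn_dim, static_dim, hidden_dim, out_dim):
        super().__init__()
        self.lstm = nn.LSTM(dyn_dim, hidden_dim, batch_first=True)
        # Project static features into the hidden space and score each time step
        # (additive-style attention; the dissertation's exact scoring may differ).
        self.static_proj = nn.Linear(static_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)
        self.head = nn.Linear(2 * hidden_dim, out_dim)

    def forward(self, x_dyn, x_static):
        # x_dyn: (batch, time, dyn_dim); x_static: (batch, static_dim)
        h, _ = self.lstm(x_dyn)                        # (batch, time, hidden)
        ctx = self.static_proj(x_static).unsqueeze(1)  # (batch, 1, hidden)
        attn = torch.softmax(self.score(torch.tanh(h + ctx)), dim=1)
        pooled = (attn * h).sum(dim=1)                 # static-weighted aggregation
        shortcut = h[:, 0, :]                          # early-stage embedding (assumed choice)
        return self.head(torch.cat([pooled, shortcut], dim=-1))

model = StaticContextLSTM(dyn_dim=8, static_dim=4, hidden_dim=32, out_dim=1)
y = model(torch.randn(2, 20, 8), torch.randn(2, 4))    # y has shape (2, 1)

Next, a minimal sketch of a Customized Positional Encoding in the spirit of the second study, assuming a small MLP maps each sequence's static vector to its own positional encoding, which is added to the token embeddings before self-attention; the Partially Linearized Attention Module is not reproduced here.

class CustomizedPositionalEncoding(nn.Module):
    """Hypothetical sketch: per-sequence positional encoding derived from static features."""
    def __init__(self, static_dim, d_model, max_len):
        super().__init__()
        self.max_len, self.d_model = max_len, d_model
        self.mlp = nn.Sequential(
            nn.Linear(static_dim, d_model),
            nn.ReLU(),
            nn.Linear(d_model, max_len * d_model),
        )

    def forward(self, tokens, x_static):
        # tokens: (batch, time <= max_len, d_model); x_static: (batch, static_dim)
        pe = self.mlp(x_static).view(-1, self.max_len, self.d_model)
        return tokens + pe[:, : tokens.size(1), :]     # added before self-attention

Finally, a minimal sketch of a follow-on-style output tokenization for seq2seq denoising, assuming a copy marker stands in for output tokens that are unchanged from the aligned input token; the dissertation's Follow-On Tokenizer and its mapping mechanism may differ.

KEEP = "<keep>"  # assumed copy marker, not from the source

def follow_on_encode(src_tokens, tgt_tokens):
    # Replace target tokens that repeat the aligned source token with the marker,
    # so the model only has to generate the parts that actually change.
    return [KEEP if i < len(src_tokens) and src_tokens[i] == t else t
            for i, t in enumerate(tgt_tokens)]

def follow_on_decode(src_tokens, enc_tokens):
    # Invert the encoding by copying the aligned source token at marker positions.
    return [src_tokens[i] if t == KEEP else t for i, t in enumerate(enc_tokens)]

# Example: correcting "I hvae a dgo" -> "I have a dog"
print(follow_on_encode("I hvae a dgo".split(), "I have a dog".split()))
# ['<keep>', 'have', '<keep>', 'dog']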