3rd International Workshop on

Machine Learning for Irregular Time Series (ML4ITS2024): Advances in Generative Models, Global Models and Self-Supervised Learning

Co-located with ECML PKDD 2024

About The Event

Theoretical Foundations of Deep Selective State-Space Models

Antonio Orvieto

Structured state-space models (SSMs) have emerged as powerful foundational architectures for sequential data, offering excellent performance across diverse domains and scaling efficiently to billions of parameters. Recent advances show that incorporating selectivity mechanisms, such as multiplicative interactions between inputs and hidden states, significantly improves the performance of SSMs, even surpassing attention-based foundation models on text tasks. Antonio's work provides theoretical grounding for these selectivity mechanisms, using Rough Path Theory to fully characterize the expressive power of selective SSMs. His analysis identifies gating mechanisms as the crucial architectural ingredient and gives a closed-form account of why they outperform earlier models such as S4. This theoretical framework opens avenues for future SSM variants, particularly regarding the role of cross-channel interactions.
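To make the selectivity idea concrete, here is a minimal NumPy sketch (an illustration, not the formulation analyzed in the talk): it contrasts a fixed-transition recurrence in the style of S4 with an input-gated, selective recurrence in the spirit of Mamba-like models. The sigmoid gate, the parameter `w`, and all shapes are illustrative assumptions.

```python
import numpy as np

def lti_ssm(x, A, B):
    # S4-style linear time-invariant recurrence: h_t = A h_{t-1} + B x_t,
    # with A and B fixed for every step of the sequence.
    h = np.zeros(A.shape[0])
    states = []
    for x_t in x:
        h = A @ h + B * x_t
        states.append(h.copy())
    return np.stack(states)

def selective_ssm(x, A, B, w):
    # Selective recurrence: a multiplicative, input-dependent gate modulates
    # the transition, h_t = sigmoid(w * x_t) * (A h_{t-1}) + B x_t, so the
    # model decides at each step how much of its history to retain.
    h = np.zeros(A.shape[0])
    states = []
    for x_t in x:
        gate = 1.0 / (1.0 + np.exp(-w * x_t))  # scalar gate in (0, 1)
        h = gate * (A @ h) + B * x_t
        states.append(h.copy())
    return np.stack(states)

rng = np.random.default_rng(0)
d = 4                      # hidden-state dimension
A = 0.9 * np.eye(d)        # stable, fixed transition matrix
B = np.ones(d)             # input projection
x = rng.normal(size=16)    # scalar input sequence of length 16

print(lti_ssm(x, A, B).shape)               # (16, 4)
print(selective_ssm(x, A, B, w=2.0).shape)  # (16, 4)
```

The only difference between the two loops is the gate: because it depends on the current input, the effective transition changes at every step, which is the kind of multiplicative interaction between inputs and hidden states that the abstract refers to.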

About

Antonio Orvieto is an independent group leader at the Max Planck Institute for Intelligent Systems and a principal investigator at the ELLIS Institute Tübingen, Germany. He holds a Ph.D. from ETH Zürich and has gained valuable experience at leading institutions including Google DeepMind, Meta, MILA, INRIA Paris, and HILTI.

Antonio's primary expertise lies in optimization for deep learning. His work focuses on improving the efficiency of deep learning technologies through the development of new architectures and training techniques grounded in theoretical knowledge. His research spans a wide range of applications, including natural language processing, biology, neuroscience, and music generation. Notably, his LRU architecture serves as the foundation for several variants of Google's Gemma language model.

He has published in prestigious conferences such as NeurIPS, ICML, ICLR, AISTATS, and CVPR. Antonio has also organized sessions and workshops, including the "Optimization for Data Science and Machine Learning" session at the International Conference on Continuous Optimization (ICCOPT) 2022 and the ICML 2024 Workshop on Next Generation of Sequence Modeling Architectures.