Description
Keynote: Navigating the Architectural Timeline of LLMs - Sebastian Raschka, Staff Research Engineer, Lightning AI
The evolution of large language models (LLMs) from the original Generative Pre-trained Transformer (GPT) series to the recent advancements seen in models like Llama 3 has been accompanied by several architectural and methodological innovations. This talk aims to catch attendees up on the latest AI and LLM development trends, highlighting the key changes and motivations that led to the development of recent state-of-the-art LLMs, such as Llama 3.1.
Specifically, this presentation explores key developments in attention mechanisms, such as sliding window attention, grouped-query attention, multi-query attention, and FlashAttention, and explains their key motivations and advantages. In addition to exploring these structural changes, this presentation also reviews the recent "tricks of the trade" that have improved the training processes and performance of the latest LLMs. This includes the two-step pretraining approach in Llama 3.1 and the application of knowledge distillation techniques with real data, as in Gemma 2, and with synthetic data, as in Llama 3.1.
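To make the attention discussion concrete, the following is a minimal sketch of grouped-query attention, where a small number of key/value heads is shared across groups of query heads; with one key/value head it reduces to multi-query attention, and with equal head counts it recovers standard multi-head attention. All sizes and weight shapes here are illustrative assumptions, not the configuration of any particular model.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """Minimal grouped-query attention sketch.

    n_kv_heads < n_q_heads: each key/value head is shared by a group
    of query heads. n_kv_heads == 1 gives multi-query attention;
    n_kv_heads == n_q_heads gives standard multi-head attention.
    """
    b, t, d = x.shape
    head_dim = d // n_q_heads

    q = (x @ wq).view(b, t, n_q_heads, head_dim).transpose(1, 2)    # (b, hq, t, hd)
    k = (x @ wk).view(b, t, n_kv_heads, head_dim).transpose(1, 2)   # (b, hkv, t, hd)
    v = (x @ wv).view(b, t, n_kv_heads, head_dim).transpose(1, 2)

    # Repeat k/v so every group of query heads attends over one shared kv head.
    group_size = n_q_heads // n_kv_heads
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    attn = F.softmax(scores, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, t, d)

# Toy usage: 8 query heads sharing 2 key/value heads (hypothetical sizes).
d_model, n_q, n_kv = 64, 8, 2
x = torch.randn(1, 10, d_model)
wq = torch.randn(d_model, d_model)
wk = torch.randn(d_model, d_model // n_q * n_kv)
wv = torch.randn(d_model, d_model // n_q * n_kv)
print(grouped_query_attention(x, wq, wk, wv, n_q, n_kv).shape)  # torch.Size([1, 10, 64])
```

The practical payoff, which the talk covers, is a much smaller key/value cache at inference time with little loss in quality compared to full multi-head attention.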
Moreover, we will examine the integration of system-level optimizations, such as the Mixture-of-Experts (MoE) method and the hybrid model Samba, which combines Mamba techniques with attention mechanisms and illustrates a broader trend toward more specialized and efficient architectures.
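As a rough illustration of the Mixture-of-Experts idea, the sketch below shows a router that sends each token to its top-k expert feed-forward networks and combines their outputs with softmax weights. The layer sizes, expert count, and top-k value are hypothetical choices for the example, not the settings of any specific MoE model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal Mixture-of-Experts feed-forward layer: a learned router
    selects the top-k experts per token and mixes their outputs."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=4, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                      # x: (batch, seq, d_model)
        logits = self.router(x)                # (batch, seq, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # mixing weights over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., slot] == e     # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5, 64])
```

Only the selected experts run for each token, which is why MoE layers can grow total parameter count while keeping per-token compute roughly constant.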
This talk will provide attendees with an understanding of the most notable transformations that have defined the architectural timeline of LLMs.