Stanford CS25: V4 I Hyung Won Chung of OpenAI
April 11, 2024
Speaker: Hyung Won Chung, OpenAI
Shaping the Future of AI from the History of Transformer
AI is developing at such an overwhelming pace that it is hard to keep up. Instead of spending all our energy catching up with the latest development, I argue that we should study the change itself. The first step is to identify and understand the driving force behind the change. For AI, it is exponentially cheaper compute and the associated scaling. I will provide a highly opinionated view of the early history of Transformer architectures, focusing on what motivated each development and how each became less relevant with more compute. This analysis will help us connect the past and present in a unified perspective, which in turn makes it more manageable to project where the field is heading. Slides here: https://docs.google.com/presentation/d/1u05yQQaw4QXLVYGLI6o3YoFHv6eC3YN8GvWD8JMumpE/edit#slide=id.g2885e521b53_0_0
0:00 Introduction
2:05 Identifying and understanding the dominant driving force behind AI.
15:18 Overview of Transformer architectures: encoder-decoder, encoder-only and decoder-only
23:29 Differences between encoder-decoder and decoder-only, and rationale for encoder-decoder’s additional structures from the perspective of scaling.
About the speaker:
Hyung Won Chung is a research scientist on the ChatGPT team at OpenAI. He has worked on various aspects of Large Language Models: pre-training, instruction fine-tuning, reinforcement learning from human feedback, reasoning, multilinguality, parallelism strategies, etc. Notable work includes the scaling Flan papers (Flan-T5, Flan-PaLM) and T5X, the training framework used to train the PaLM language model. Before OpenAI, he was at Google Brain, and before that he received a PhD from MIT.
More about the course can be found here: https://web.stanford.edu/class/cs25/
View the entire CS25 Transformers United playlist: https://www.youtube.com/playlist?list=PLoROMvodv4rNiJRchCzutFw5ItR_Z27CM