Transformers: A Guided Tour
The transformer is the mechanical brain that powers language models, chatbots, and agents. These artificial intelligence systems are currently at the forefront of the public consciousness.
One can find support for almost any point of view about them: this technology will doom us to extinction, or usher in a new utopia, or it might be nothing but hype. They don’t understand anything and just parrot back their training data, or they’re beginning to make genuine scientific and mathematical breakthroughs that will change how we understand the world. They’ll make you 10x better at your job, or they’re a crutch and you will lose the ability to think for yourself if you overuse them. I don’t claim to know the right position on any of these issues. These differences of opinion are rooted in very deep questions that don’t have obvious answers today. Trying to resolve them will be a massive, collective human endeavor over the coming years.
I hope to demystify the transformer for a general audience, so that we are all better equipped to distinguish the hype from the reality and use these systems well. I will explain what the transformer is, what it is doing, how to think about it, and how to use it. We’ll begin with a high-level, math-free overview of the transformer.
This is a very early draft. I keep changing exactly what I’m trying to say and to whom.
My goal is to present a mental model of the transformer that is simple enough to be useful to a lay audience, while still accurate enough to be useful to practitioners. We’re not there yet, but I’d love to hear from people at both ends of the spectrum so we can get there.
Feedback is very welcome! Please send to [email protected]