And so, as an old RNN fanboy, I grudgingly gave in. The loop, the unrolled representation, the gates, the heroic struggle of the hidden state, the beautiful madness of Backpropagation Through Time — all lost out to what felt like simple, long chains of a forward pass. I have taught the Transformer architecture to thousands of students all over the world, and every time I said, “QKV and GPU go brrr…that’s all there is to it,” I died a little inside. Eventually, I learned to stop worrying and love the bomb.
