
Showing posts from May, 2025

The Rise of the Transformers – a simplified version Part 2

This post continues from my previous article; click here to read it first. In this article, I’ll begin unpacking some of the key steps that need to happen before we get to the attention mechanism. Let’s first understand how a large language model (LLM) interacts with a human user. Humans communicate in natural language: sentences, paragraphs, and conversation. But LLMs, like ChatGPT, operate purely with numbers. So the first step in any interaction is to convert the text we type into a format the model can understand. This transformation happens in a few key stages (sketched in code below):

1. Text Input – the raw text typed by the user
2. Tokenization – breaking the text into smaller units the model can process
3. Numerical Conversion – each token is mapped to a unique ID, and each ID is then mapped to a high-dimensional vector (a list of numbers) that captures the token’s meaning in a format the model can work with
4. Positional Encoding – adding information about th...
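To make steps 2 to 4 concrete, here is a minimal sketch in Python. It uses a toy whitespace tokenizer, a random embedding table, and the sinusoidal positional encoding from the original Transformer paper; real LLMs use learned subword tokenizers and learned embeddings, so every name and number below (toy_vocab, embed_dim, and so on) is illustrative rather than taken from any actual model.

```python
# A self-contained sketch of the text -> tokens -> IDs -> vectors pipeline.
# Everything here is a toy: real models use learned subword tokenizers and
# learned embedding tables with hundreds or thousands of dimensions.
import numpy as np

text = "the cat sat on the mat"

# Step 2: Tokenization - split the text into smaller units (here, plain words).
tokens = text.split()

# Step 3a: Map each token to a unique ID.
toy_vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
token_ids = [toy_vocab[t] for t in tokens]

# Step 3b: Map each ID to a high-dimensional vector via an embedding table.
embed_dim = 8                                    # toy size; real models use far more
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(toy_vocab), embed_dim))
embeddings = embedding_table[token_ids]          # shape: (num_tokens, embed_dim)

# Step 4: Positional encoding - add position information using the sinusoidal
# scheme from "Attention Is All You Need".
positions = np.arange(len(tokens))[:, None]      # (num_tokens, 1)
dims = np.arange(embed_dim)[None, :]             # (1, embed_dim)
angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / embed_dim)
angles = positions * angle_rates
pos_encoding = np.where(dims % 2 == 0, np.sin(angles), np.cos(angles))

model_input = embeddings + pos_encoding          # what the attention layers receive
print(tokens)
print(token_ids)
print(model_input.shape)                         # (6, 8)
```

Running it prints the tokens, their IDs, and the shape of the matrix that would be handed on to the attention layers covered later in this series.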

The Rise of the Transformers – a simplified version.

Long before the wonderful world of ChatGPT, language models struggled to understand meaning across sentences, paragraphs, and context. Sequence mattered, but older architectures such as RNNs and LSTMs could only peer through narrow windows. Comprehension was fleeting, memory limited, parallelism elusive. Then came the breakthrough: the Transformer. In this series, I’ll be simplifying how we got here, what Transformers are, why they matter, and where we’re headed in this rapidly evolving landscape of machine understanding.

How we got here

What was the problem that the Transformer was trying to solve?

Before Transformers, neural networks processed language one step at a time. Earlier models like RNNs (Recurrent Neural Networks) and LSTMs read sentences sequentially, word by word, with limited memory of what came before, like someone with a short memory trying to follow a long story. Consider this paragraph: “Sarah packed he...