A complete sentence forms a matrix of shape (number of tokens) × (feature dimension). The highlighted row is the token you selected.
Vectors, Matrices & Linear Transformations
Many "mysterious steps" in LLMs are essentially matrix multiplications. Embeddings are vectors, a whole sentence becomes a matrix, and linear layers recombine old features into new ones. Once you build this intuition, Q, K, V, FFN, and output layers will make much more sense.
A Token Is First a Row Vector
In the sentence below, each token corresponds to a row of numbers. You can switch between tokens to build the basic intuition that "a token is a row vector."
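As a rough illustration of "a token is a row vector," here is a minimal NumPy sketch. The sentence, the feature dimension of 3, and every number in `X` are made up for the example; real embeddings are learned and have far more dimensions.

```python
import numpy as np

# Hypothetical 4-token sentence; each token gets 3 made-up feature values.
tokens = ["the", "cat", "sat", "down"]
X = np.array([
    [0.1, 0.3, -0.2],   # "the"
    [0.7, -0.1, 0.5],   # "cat"
    [-0.4, 0.2, 0.9],   # "sat"
    [0.0, 0.6, 0.1],    # "down"
])

# The whole sentence is a matrix: (number of tokens) x (feature dimension).
print(X.shape)

# Selecting a token means selecting one row of that matrix.
selected = tokens.index("cat")
row = X[selected]
print(tokens[selected], row)
```

Switching to another token changes only which row is read; the matrix itself stays the same.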
Matrix Multiplication Recombines Old Features into New Ones
Now multiply the input matrix X by a weight matrix W. The page highlights the row, the column, and the result cell being computed, so you can see how each output value is derived: it is the dot product of one row of X with one column of W.
Highlight the row of the current token.
Highlight the column corresponding to the current output dimension.
Highlight the result cell being computed.
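The three highlights above can be mirrored in a small sketch: pick a token row, pick an output column, and compute one result cell by hand. The matrices and the indices `i` and `j` are illustrative assumptions, not values from the page.

```python
import numpy as np

# Illustrative values only.
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])   # 2 tokens x 3 input features
W = np.array([[1.0, 0.0],
              [0.5, 1.0],
              [0.0, 2.0]])        # 3 input features x 2 output features

i, j = 1, 1                       # current token, current output dimension

row = X[i, :]                     # the highlighted row of X
col = W[:, j]                     # the highlighted column of W

# One result cell is the dot product of that row and that column.
cell = sum(x * w for x, w in zip(row, col))

# The full product computes every such cell at once.
Y = X @ W
print(cell, Y[i, j])
```

Each output feature is a weighted sum of all the input features, which is why the product recombines features rather than merely reordering them.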
Why This Step Directly Relates to Q, K, V
X: Original Representation
Embeddings or outputs from the previous layer provide "what the current token looks like now." This representation is not yet in the form used for matching.
W: Learned Transformation Rules
Weight matrices aren't written by hand; they're learned during training. Each column of the matrix says "which new feature I want to extract": its entries are the weights used to combine the input features into that one output feature.
XW: New Feature Space
After the linear transformation, the same token is projected into a new coordinate system. Q, K, and V are three such projections of the same input, each with its own learned weight matrix.
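A minimal sketch of the three projections, assuming a single attention head with no bias terms. The names `W_q`, `W_k`, `W_v`, the sizes, and the random weights are placeholders for illustration; in a trained model these matrices are learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for the example.
n_tokens, d_model, d_head = 4, 8, 4

# X: the current representation of each token (one row per token).
X = rng.normal(size=(n_tokens, d_model))

# Three separate weight matrices (random stand-ins for learned weights).
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))
W_v = rng.normal(size=(d_model, d_head))

# The same input is projected into three different feature spaces.
Q = X @ W_q
K = X @ W_k
V = X @ W_v

print(Q.shape, K.shape, V.shape)
```

All three results keep one row per token; only the meaning of the columns changes.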
Matching Can Happen Next
Once Q and K are in a comparable space, the dot product of a query with each key gives a match score; V carries the actual information that is then aggregated into the output according to those scores.
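The matching step might look like the sketch below. The text above mentions only dot products and aggregation; the scaling by the square root of the head size and the softmax normalization are assumptions taken from standard scaled dot-product attention, and masking is omitted. `Q`, `K`, and `V` are random stand-ins for the projected matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_head = 4, 4

# Random stand-ins for the projected Q, K, V matrices.
Q = rng.normal(size=(n_tokens, d_head))
K = rng.normal(size=(n_tokens, d_head))
V = rng.normal(size=(n_tokens, d_head))

# Dot product of every query row with every key row: (n_tokens, n_tokens).
# Scaling by sqrt(d_head) is an assumption from the standard formulation.
scores = Q @ K.T / np.sqrt(d_head)

# Softmax over each row (assumed), shifted by the row max for stability.
scores = scores - scores.max(axis=-1, keepdims=True)
weights = np.exp(scores)
weights = weights / weights.sum(axis=-1, keepdims=True)

# Each output row is a weighted average of the rows of V.
output = weights @ V

print(weights.sum(axis=-1))
print(output.shape)
```

Row `i` of `weights` says how strongly token `i` attends to every token, and the final product is again a matrix multiplication.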