Page 03 · SimLabs LLM Visual

Vectors, Matrices & Linear Transformations

Many "mysterious steps" in LLMs are essentially matrix multiplications. Embeddings are vectors, a whole sentence becomes a matrix, and linear layers recombine old features into new ones. Once you build this intuition, Q, K, V, FFN, and output layers will make much more sense.

Start with a row vector → then see the sentence matrix → connect to QKV

A Token Is First a Row Vector

In the sentence below, each token corresponds to a row of numbers. You can switch between tokens to build the basic intuition that "a token is a row vector."

Input Matrix X

A complete sentence forms a matrix of shape (number of tokens) × (feature dimension): one row per token, one column per feature. The highlighted row is the token you selected.

Remember: vectors are not natural-language explanations of words. They are numerical coordinates that the model computes with. Later, Attention will compare the directions and magnitudes of these vectors.
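To make the "sentence as a matrix" idea concrete, here is a minimal sketch in plain Python. The three tokens, the feature dimension of 4, and every number in X are made up for illustration; real embeddings are learned and much wider.

```python
# Hypothetical 3-token sentence; all feature values are invented.
tokens = ["the", "cat", "sat"]

# Input matrix X: one row per token, one column per feature.
X = [
    [0.2, -0.1, 0.5, 0.0],   # row vector for "the"
    [0.9, 0.3, -0.4, 0.7],   # row vector for "cat"
    [-0.3, 0.8, 0.1, 0.2],   # row vector for "sat"
]

num_tokens = len(X)
feature_dim = len(X[0])
shape = (num_tokens, feature_dim)


def row_for(token):
    """Return the row of X that belongs to the given token."""
    return X[tokens.index(token)]


selected = row_for("cat")
print(shape)     # (3, 4)
print(selected)  # [0.9, 0.3, -0.4, 0.7]
```

Selecting a token on the page corresponds to picking one row of X, as `row_for` does here.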

Matrix Multiplication Rearranges Old Features into New Ones

Now multiply the input matrix X by a weight matrix W. The page will highlight "the row, column, and result cell being computed" so you can see exactly how each output value is derived.

Current Cell: X[0] × W[:,0]

Input X

Highlight the row of the current token.

Weight Matrix W

Highlight the column corresponding to the current output dimension.

Output Y = XW

Highlight the result cell being computed.

Legend: Current Token Row · Current Output Column · Current Result Cell
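The highlighted computation can be sketched in a few lines of plain Python: each output cell Y[i][j] is the dot product of row i of X with column j of W. The matrices below are small made-up examples, and the helper names `cell` and `matmul` are chosen for this sketch rather than taken from the page.

```python
# Made-up input: 2 tokens, 3 features each.
X = [
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 3.0],
]

# Made-up weights: 3 input features -> 2 output features.
W = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, -1.0],
]


def cell(X, W, i, j):
    """Compute Y[i][j] as the dot product of row i of X and column j of W."""
    return sum(X[i][k] * W[k][j] for k in range(len(W)))


def matmul(X, W):
    """Compute Y = XW one cell at a time."""
    return [[cell(X, W, i, j) for j in range(len(W[0]))] for i in range(len(X))]


y00 = cell(X, W, 0, 0)   # the "Current Cell: X[0] × W[:,0]" case
Y = matmul(X, W)
print(y00)  # 1.0
print(Y)    # [[1.0, 2.0], [6.0, -2.0]]
```

Here Y has one row per token and one column per output feature, so a 2 × 3 input times a 3 × 2 weight matrix gives a 2 × 2 output.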

Why This Step Directly Relates to Q, K, V

X: Original Representation

Embeddings or outputs from the previous layer describe what the current token looks like now. This is not yet the form used for matching.

W: Learned Transformation Rules

Weight matrices aren't written by hand; they're learned during training. Each column of the matrix defines one new feature to extract, as a weighted combination of the input features.

XW: New Feature Space

After the linear transformation, the same token is projected into a new coordinate system. Q, K, and V are essentially the same input projected by three different weight matrices into three different coordinate systems.

Matching Can Happen Next

Once Q and K are in a comparable space, their dot products can be computed to score how well tokens match; V carries the actual information that gets aggregated into the output.
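The four ideas above can be sketched together in plain Python. This is an illustrative sketch, not a full attention implementation: the names `W_q`, `W_k`, and `W_v` and all values are assumptions made here (real weights are learned), and the scaling and normalization that attention applies to the scores are left out.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


# Made-up input: 2 tokens, 2 features each.
X = [
    [1.0, 2.0],
    [0.0, 1.0],
]

# Three hypothetical weight matrices (hand-picked here, learned in practice).
W_q = [[1.0, 2.0], [0.0, 1.0]]
W_k = [[1.0, 0.0], [1.0, 1.0]]
W_v = [[0.0, 1.0], [2.0, 0.0]]

# The same input projected into three different coordinate systems.
Q = matmul(X, W_q)
K = matmul(X, W_k)
V = matmul(X, W_v)

# Matching: scores[i][j] is the dot product of query row i with key row j.
scores = [[sum(q * k for q, k in zip(q_row, k_row)) for k_row in K] for q_row in Q]

print(Q)       # [[1.0, 4.0], [0.0, 1.0]]
print(K)       # [[3.0, 2.0], [1.0, 1.0]]
print(V)       # [[4.0, 1.0], [2.0, 0.0]]
print(scores)  # [[11.0, 5.0], [2.0, 1.0]]
```

Each row of `scores` says how strongly one token's query matches every token's key; the rows of V are what would then be combined according to those matches.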

In short: matrix multiplication isn't abstract mathematical decoration. It's the core operation LLMs use to re-express features. The step you're seeing now appears repeatedly in linear layers, QKV projections, and the FFN.