Page 14 · SimLabs LLM Visual

Positional Encoding: Letting the Transformer See Order

Self-attention by itself cannot tell "the first word" from "the third word": reordering the input tokens only reorders the outputs in the same way. If tokens are treated as an unordered set of vectors, "cat bites dog" and "dog bites cat" contain exactly the same vectors, so the model has no way to tell who bit whom. The task of positional encoding is to explicitly inject the concept of order into the model.
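The order-blindness described above can be checked with a minimal sketch. It assumes a single attention head with no learned projections (queries, keys, and values are all the raw input vectors), and the 2-dimensional word vectors are made-up values chosen only for illustration. The same reordering behaviour holds when learned projections are applied to each token independently.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    """Single-head self-attention with Q = K = V = X (no learned weights)."""
    d = len(X[0])
    out = []
    for q in X:
        # Scaled dot-product score of this query against every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Hypothetical word vectors, not taken from any trained model.
vocab = {"cat": [1.0, 0.0], "bites": [0.0, 1.0], "dog": [1.0, 1.0]}

a = self_attention([vocab[w] for w in ["cat", "bites", "dog"]])
b = self_attention([vocab[w] for w in ["dog", "bites", "cat"]])

# "cat" is at index 0 in the first sentence and index 2 in the second.
same = all(math.isclose(x, y) for x, y in zip(a[0], b[2]))
print(same)
```

Without position information, the output for "cat" matches in both sentences up to floating-point rounding, so nothing downstream can recover which word came first.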

See how order info is added · Compare word and position vectors · Understand sine positional encoding patterns

Same Words, Different Order = Different Input Representation

First switch the sentence order, then click on a position. The left side shows word vectors, position vectors, and final input vectors. The right side explains what changes occur at that position.

Word Embedding

Word Vector Matrix

The word vector for a token depends only on the token itself, not on where it appears in the sequence.

Position Encoding

Position Vector Matrix

Each position receives its own position vector. The same token placed at a different position therefore gets a different position vector.

Input = Word + Position

Final Input Matrix

The model actually reads the sum of both vectors, not just the word vector alone.
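The three panels above can be sketched in a few lines. The word vectors, the position vectors, and the helper name `build_input` are all hypothetical, with values picked by hand so the sums are easy to read; a real model would use learned or computed vectors of much higher dimension.

```python
# Hypothetical 2-d word vectors and position vectors (hand-picked values).
word_vec = {"cat": [1.0, 0.0], "bites": [0.0, 1.0], "dog": [1.0, 1.0]}
pos_vec = [[0.0, 0.1], [0.2, 0.0], [0.0, 0.3]]  # one row per position

def build_input(tokens):
    """Add the position vector of each slot to the word vector in that slot."""
    return [
        [w + p for w, p in zip(word_vec[tok], pos_vec[i])]
        for i, tok in enumerate(tokens)
    ]

x1 = build_input(["cat", "bites", "dog"])
x2 = build_input(["dog", "bites", "cat"])

print(x1[0])  # "cat" at position 0 -> [1.0, 0.1]
print(x2[2])  # "cat" at position 2 -> [1.0, 0.3]
```

The word vector for "cat" is the same in both sentences, but the final input vectors differ because a different position vector was added.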

How Sine Positional Encoding Changes with Position

This experiment uses a simplified 4-dimensional positional encoding to show the numerical pattern at each position. Drag the position slider and observe that some dimensions change rapidly while others change slowly. The fast dimensions separate nearby positions and the slow ones separate positions that are far apart, so the model can perceive both local and long-range positional relationships.
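As a rough sketch of what such a 4-dimensional encoding could look like, the code below assumes the common sinusoidal formulation PE(pos, 2i) = sin(pos / base^(2i/d)) and PE(pos, 2i+1) = cos(pos / base^(2i/d)) with base = 10000. The constants used by the experiment on this page may differ, and the function name `sinusoidal_encoding` is only illustrative.

```python
import math

def sinusoidal_encoding(pos, d_model=4, base=10000.0):
    """Return the d_model-dimensional sinusoidal encoding of one position."""
    vec = []
    for i in range(0, d_model, 2):
        # Each sin/cos pair shares one frequency; later pairs are slower.
        angle = pos / (base ** (i / d_model))
        vec.append(math.sin(angle))  # even dimension
        vec.append(math.cos(angle))  # odd dimension
    return vec

for pos in range(4):
    print(pos, [round(v, 3) for v in sinusoidal_encoding(pos)])
```

With d_model = 4 under these assumptions, the first pair of dimensions advances by 1 radian per position while the second pair advances by only 0.01 radian, which is the fast versus slow behaviour the slider makes visible.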

Compare Current and Next Position

A positional encoding is not an arbitrary position number but a vector that varies smoothly with position. Adjacent positions get different vectors, yet the change from one position to the next follows a regular pattern.
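The comparison between the current and the next position can be sketched numerically. This assumes the same common sinusoidal formulation with base 10000 and 4 dimensions, which may not match the page's demo exactly; `step_delta` and `step_distance` are hypothetical helper names.

```python
import math

def sinusoidal_encoding(pos, d_model=4, base=10000.0):
    """Sinusoidal encoding of one position (assumed formulation)."""
    vec = []
    for i in range(0, d_model, 2):
        angle = pos / (base ** (i / d_model))
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec

def step_delta(pos):
    """Per-dimension change when moving from pos to pos + 1."""
    cur, nxt = sinusoidal_encoding(pos), sinusoidal_encoding(pos + 1)
    return [n - c for c, n in zip(cur, nxt)]

def step_distance(pos):
    """Euclidean distance between the encodings of pos and pos + 1."""
    return math.sqrt(sum(d * d for d in step_delta(pos)))

for pos in (0, 5, 20):
    print(pos, [round(d, 3) for d in step_delta(pos)], round(step_distance(pos), 3))
```

For this formulation the distance between neighbouring positions is the same wherever the step is taken, because it depends only on the offset between the two positions. The individual dimensions still change by different amounts, with the fast pair moving far more than the slow pair.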

Key Point: Positional encoding does not have to use sine and cosine functions. Whatever its form, it serves the same purpose: giving the model explicit order information inside vector operations that are otherwise blind to order.
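To illustrate that the form of the encoding is interchangeable, the sketch below plugs two different position functions into the same addition step. Everything here is an assumption made for illustration: `sinusoidal` uses the common base-10000 formulation, and `make_lookup_table` is a hypothetical stand-in for a learned position table, filled with random values rather than trained ones.

```python
import math
import random

def sinusoidal(pos, d_model=4, base=10000.0):
    """Fixed sine/cosine encoding of one position (assumed formulation)."""
    vec = []
    for i in range(0, d_model, 2):
        angle = pos / (base ** (i / d_model))
        vec.extend([math.sin(angle), math.cos(angle)])
    return vec

def make_lookup_table(max_len, d_model=4, seed=0):
    """Stand-in for a learned table: one random row per position, untrained."""
    rng = random.Random(seed)
    table = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(max_len)]
    return lambda pos: table[pos]

def add_positions(word_vectors, encode):
    """Sum each word vector with the encoding of its index."""
    return [
        [w + p for w, p in zip(vec, encode(i))]
        for i, vec in enumerate(word_vectors)
    ]

# The same made-up token vector repeated three times.
words = [[0.5, 0.5, 0.5, 0.5]] * 3

with_sin = add_positions(words, sinusoidal)
with_table = add_positions(words, make_lookup_table(max_len=3))

print(len({tuple(row) for row in with_sin}))    # number of distinct rows
print(len({tuple(row) for row in with_table}))  # number of distinct rows
```

In both cases three identical word vectors become three distinct input rows, which is all the rest of the model needs in order to tell the positions apart.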

Three Key Takeaways

Attention Doesn't Understand Order

Without positional information, the model sees "a bag of tokens" rather than an ordered sequence.

Position Vectors Are Added to Word Vectors

The model input is not a choice between word vectors and position vectors: the two are summed, and the sum is passed on to the attention calculations.

Positional Encoding Is Fundamental

It is not a nice-to-have trick but one of the basic conditions that allow Transformers to handle sequential information.