Page 05 · SimLabs LLM Visual

Embedding: How Semantic Space Forms

After tokens enter the model, they are no longer handled as discrete indices; each one is mapped to a vector. The power of embeddings lies in this: concepts with similar meanings, functions, or contexts tend to cluster together in high-dimensional space. Think of this page as a "word vector map."
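As a minimal sketch of what "mapped into vectors" means, assuming a hypothetical six-word vocabulary and hand-made 4-dimensional vectors in place of learned ones: an embedding layer behaves like a table indexed by token id, and "clustering" can be measured with cosine similarity.

```python
import numpy as np

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
vocab = ["king", "queen", "man", "woman", "apple", "banana"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

# Hand-made 4-d vectors standing in for learned embeddings (illustrative only).
# Row i is the vector for token id i, just like an embedding table.
embedding_table = np.array([
    [0.9, 0.1, 0.8, 0.0],  # king
    [0.9, 0.9, 0.8, 0.0],  # queen
    [0.1, 0.1, 0.9, 0.0],  # man
    [0.1, 0.9, 0.9, 0.0],  # woman
    [0.0, 0.5, 0.0, 0.9],  # apple
    [0.0, 0.5, 0.0, 0.8],  # banana
])

def embed(token_ids):
    """Look up one vector per token id (a plain row lookup)."""
    return embedding_table[np.asarray(token_ids)]

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(word, k=2):
    """Return the k other words whose vectors are most similar to `word`."""
    query = embedding_table[token_to_id[word]]
    scored = [(cosine(query, embedding_table[i]), w)
              for i, w in enumerate(vocab) if w != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]

ids = [token_to_id[t] for t in ["king", "apple"]]
print(embed(ids).shape)       # (2, 4): one 4-d vector per token
print(nearest("apple", k=1))  # ['banana']
```

In a real model the table is learned during training rather than written by hand; only the lookup step works the same way.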

On this page you can observe word clustering, compare nearest neighbors, and experiment with vector analogies.

Placing Words in Semantic Space

Real embeddings typically have hundreds to thousands of dimensions. Here we project onto just two, so what you see is not the complete space but a projection of the high-dimensional semantic space onto a plane. Switching between projections reveals how the relative positions of words change.
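A minimal sketch of "selecting two dimensions," assuming hypothetical 4-d toy vectors: keeping one pair of coordinates gives one 2D view, and keeping a different pair gives another. Two words can coincide in one view and separate in the other, which is why switching projections changes the picture.

```python
import numpy as np

words = ["king", "queen", "man", "woman"]
# Hypothetical 4-d toy vectors (real embeddings have far more dimensions).
vectors = np.array([
    [0.9, 0.1, 0.8, 0.0],  # king
    [0.9, 0.9, 0.8, 0.0],  # queen
    [0.1, 0.1, 0.9, 0.0],  # man
    [0.1, 0.9, 0.9, 0.0],  # woman
])

def project(vectors, dims):
    """Axis-aligned 2D view: keep only the two chosen coordinates."""
    return vectors[:, list(dims)]

view_a = project(vectors, (0, 2))
view_b = project(vectors, (0, 1))

for name, view in [("dims (0, 2)", view_a), ("dims (0, 1)", view_b)]:
    print(name, dict(zip(words, view.tolist())))
```

In the first view king and queen land on the same point; in the second they separate. The vectors themselves did not change, only the view. Projecting onto arbitrary directions works the same way, with a (d x 2) matrix multiplication in place of column selection.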

Current projection: Status hierarchy × Gender

The further right a word sits, the higher its status; the higher up it sits, the more strongly it carries feminine features.

Key Point: The individual dimensions of an embedding are not labeled by hand and usually have no single human-readable meaning. For teaching purposes, we've compressed several common relationships into interpretable "projection axes" to help build spatial intuition.
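One way such interpretable axes could be built (a sketch under assumptions, not necessarily how this page computes its axes) is to take the difference between the averages of two groups of word vectors, normalize it, and project every word onto it. With hypothetical toy vectors, "status" below is royal words minus non-royal words, and "gender" is female words minus male words; `make_axis` is an illustrative helper, not a library function.

```python
import numpy as np

# Hypothetical 4-d toy vectors (illustrative, not from a real model).
emb = {
    "king":  np.array([0.9, 0.1, 0.8, 0.0]),
    "queen": np.array([0.9, 0.9, 0.8, 0.0]),
    "man":   np.array([0.1, 0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.9, 0.9, 0.0]),
}

def make_axis(positive, negative):
    """Unit vector from the mean of the `negative` words to the mean of the `positive` words."""
    direction = (np.mean([emb[w] for w in positive], axis=0)
                 - np.mean([emb[w] for w in negative], axis=0))
    return direction / np.linalg.norm(direction)

status_axis = make_axis(["king", "queen"], ["man", "woman"])  # x: further right = higher status
gender_axis = make_axis(["queen", "woman"], ["king", "man"])  # y: further up = more feminine

# 2D coordinates: dot product of each word vector with each unit axis.
coords = {w: (float(v @ status_axis), float(v @ gender_axis)) for w, v in emb.items()}
for w, (x, y) in coords.items():
    print(f"{w:>5}: x={x:+.2f}, y={y:+.2f}")
```

Because the axes here are defined from the same four words being plotted, their layout is guaranteed by construction; with a real model, the more informative question is where other words fall on these axes.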

Try Vector Analogy

Vector addition and subtraction aren't mysterious. They typically mean: keep some relationships fixed while swapping out certain attributes. If some direction in the space happens to encode "change of gender" or "change of status," then moving a word along that direction may land near a new, corresponding word.
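A quick way to check whether a single direction encodes "change of gender" is to compare offset vectors: if queen - king and woman - man point the same way (cosine similarity near 1), the space contains something like a shared gender direction. A sketch with hypothetical toy vectors, where the offsets agree exactly by construction:

```python
import numpy as np

# Hypothetical toy vectors; in a trained model these offsets would only be roughly parallel.
emb = {
    "king":  np.array([0.9, 0.1, 0.8, 0.0]),
    "queen": np.array([0.9, 0.9, 0.8, 0.0]),
    "man":   np.array([0.1, 0.1, 0.9, 0.0]),
    "woman": np.array([0.1, 0.9, 0.9, 0.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

royal_offset = emb["queen"] - emb["king"]   # gender change among royal words
common_offset = emb["woman"] - emb["man"]   # gender change among non-royal words
status_offset = emb["king"] - emb["man"]    # status change, for contrast

print(round(cosine(royal_offset, common_offset), 3))  # parallel offsets
print(round(cosine(royal_offset, status_offset), 3))  # unrelated offsets
```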

Analogy in one sentence: "King - Man + Woman = Queen" isn't magic. It means: if the vector space truly learns the "gender change" direction, then applying that direction from one word to another may yield a structurally corresponding new word.
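The analogy itself can be sketched as vector arithmetic followed by a nearest-neighbor search. Two assumptions worth flagging: the toy vectors below are hand-made, and, as is common in analogy evaluations, the three input words are excluded from the candidates (otherwise the nearest neighbor is often one of the inputs).

```python
import numpy as np

# Hypothetical hand-made vectors, not taken from any real model.
emb = {
    "king":   np.array([0.9, 0.1, 0.8, 0.0]),
    "queen":  np.array([0.9, 0.9, 0.8, 0.0]),
    "man":    np.array([0.1, 0.1, 0.9, 0.0]),
    "woman":  np.array([0.1, 0.9, 0.9, 0.0]),
    "apple":  np.array([0.0, 0.5, 0.0, 0.9]),
    "banana": np.array([0.0, 0.5, 0.0, 0.8]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c):
    """Return the word nearest to vec(a) - vec(b) + vec(c), skipping the three inputs."""
    target = emb[a] - emb[b] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, emb[w]))

print(analogy("king", "man", "woman"))   # queen
print(analogy("queen", "woman", "man"))  # king
```

In these toy vectors the arithmetic lands exactly on the answer; in a trained model the result is only close to the target, which is why the nearest-neighbor step is needed.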

Why Embeddings Take This Form

Similar Co-occurrence = Closer Position

If two words frequently appear in similar contexts, the training process gradually moves them closer together.
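A stripped-down sketch of why shared contexts pull words together, assuming a word2vec-style objective that raises sigmoid(word · context) for observed (word, context) pairs. Real skip-gram training also uses negative samples to push unrelated pairs apart; those are omitted here, and the context vector is held fixed for simplicity. The words "cat", "dog", and "pet" are hypothetical examples.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two hypothetical words that start out orthogonal (unrelated).
w_cat = np.array([1.0, 0.0, 0.0])
w_dog = np.array([0.0, 1.0, 0.0])
# A shared context vector (e.g. for "pet"), held fixed in this sketch.
c_pet = np.array([0.0, 0.0, 1.0])

cos_before = cosine(w_cat, w_dog)

lr = 0.5
for _ in range(100):
    for w in (w_cat, w_dog):
        # Gradient ascent on log sigmoid(w . c): the gradient w.r.t. w is (1 - sigmoid(w . c)) * c.
        # `w +=` updates the array in place, so w_cat and w_dog themselves change.
        w += lr * (1.0 - sigmoid(w @ c_pet)) * c_pet

cos_after = cosine(w_cat, w_dog)
print(f"before: {cos_before:.2f}, after: {cos_after:.2f}")
```

Neither word vector is ever pushed toward the other directly; both simply move toward the context they share, and that is enough to bring them closer together.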

Space Shaped by Task

Embeddings aren't learned to "look nice" but to better accomplish prediction tasks. Task requirements shape the spatial structure.

A Projection Shows Only Part of the Space

2D visualizations work well for teaching, but they show only a small part of the high-dimensional space. Real semantic relationships typically span many more dimensions.
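To get a feel for how much a 2D view can leave out, the sketch below projects hypothetical random 50-dimensional vectors onto their top two principal components (PCA computed with numpy's SVD) and reports the fraction of variance those two axes retain. Random data is close to a worst case; real embeddings usually have more structure, but the same calculation shows what any particular 2D plot keeps and what it discards.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for 200 word vectors of dimension 50.
X = rng.normal(size=(200, 50))

# PCA via SVD of the mean-centered data; rows of `components` are principal directions.
X_centered = X - X.mean(axis=0)
_, singular_values, components = np.linalg.svd(X_centered, full_matrices=False)

coords_2d = X_centered @ components[:2].T    # (200, 2) points for a 2D plot
variance = singular_values ** 2              # variance along each principal direction (up to a constant)
kept = variance[:2].sum() / variance.sum()   # share of total variance visible in the plot
print(coords_2d.shape, f"{kept:.1%} of variance kept")
```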