Page 06 · SimLabs LLM Visual

Language Model Objective: Predicting the Next Token

The core task of training a large language model is not to "memorize entire answers," but to maximize the probability of the correct next token at each position. Think of it this way: given a prefix, the model repeatedly practices "what should come next" and uses loss to measure how far it is from the correct answer.

Observe next-token distribution · Watch loss decrease · Understand how sentences become training samples

Training Objective: Increase Probability of Correct Answers

First select a prefix scenario, then adjust the "Training Progress" slider. As training progress increases, the model assigns a higher probability to the correct token, and the loss falls accordingly. This is the core optimization direction of language model training.
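To make the slider concrete, here is a minimal sketch of what "more training progress" could look like numerically. It assumes a hypothetical four-token vocabulary, plain gradient descent applied directly to the logits, and six steps to mirror the 0 / 6 slider; the function names and the learning rate are illustrative choices, not anything taken from this page's implementation.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_steps(logits, target, steps=6, lr=1.0):
    """Record (P(correct), loss) before training and after each gradient step.

    Assumption: we update the logits directly instead of real model weights.
    """
    logits = list(logits)
    history = []
    for _ in range(steps + 1):
        probs = softmax(logits)
        p_correct = probs[target]
        history.append((p_correct, -math.log(p_correct)))
        # Gradient of -log softmax(logits)[target] w.r.t. each logit
        # is probs[i] - 1 for the target and probs[i] otherwise.
        logits = [
            z - lr * (q - (1.0 if i == target else 0.0))
            for i, (z, q) in enumerate(zip(logits, probs))
        ]
    return history

# Hypothetical start: four candidate tokens, all equally likely.
history = train_steps([0.0, 0.0, 0.0, 0.0], target=2)
for step, (p_correct, loss) in enumerate(history):
    print(f"progress {step} / 6: P(correct) = {p_correct:.3f}, loss = {loss:.3f}")
```

Each step moves probability mass toward the correct token, so the printed probability rises while the loss falls, which is the same relationship the slider shows.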

Current Prefix

The model can only see these tokens right now, and its task is to predict "what is most likely to come next."

Training Progress: 0 / 6
Loss

What Loss Measures

If the model assigns most of its probability to the correct token, the loss is low; if it spreads probability over wrong candidates, the loss stays high.

The Most Important Formula: Loss = -log P(correct token | current prefix). The loss depends on one thing only: how much probability the model assigns to the correct token. It is 0 when that probability is 1 and grows without bound as the probability approaches 0.
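As a small worked example of the formula, the sketch below computes the loss for two made-up distributions over the same candidates. The tokens, the probabilities, and the helper name next_token_loss are all hypothetical, and the natural logarithm is assumed.

```python
import math

def next_token_loss(probs, correct_token):
    """Loss = -log P(correct token | current prefix), using the natural log."""
    return -math.log(probs[correct_token])

# Hypothetical distributions for one prefix, early and late in training.
early = {"mat": 0.10, "dog": 0.40, "moon": 0.30, "table": 0.20}
later = {"mat": 0.80, "dog": 0.05, "moon": 0.05, "table": 0.10}

for name, probs in [("early", early), ("later", later)]:
    loss = next_token_loss(probs, "mat")
    print(f"{name}: P(mat) = {probs['mat']:.2f}, loss = {loss:.3f}")
```

Only the probability of the correct token enters the calculation; how the remaining probability is split among the wrong candidates does not change the loss at this position.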

One Complete Sentence Becomes Multiple Training Samples

During training, scoring doesn't happen only at the end of a sentence. Instead, a "prefix → next token" supervision signal is generated at every position. Click any position below to see what input and target the model actually faced.

Current Training Position

After seeing the prefix, the model needs to assign more probability to the actual next token.
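The way one sentence turns into several training positions can be sketched in a few lines. This assumes whitespace-separated words stand in for tokens and uses a hypothetical helper, make_training_pairs; real tokenizers usually work on subword units and often prepend a start-of-sequence token so that the first token is predicted as well.

```python
def make_training_pairs(tokens):
    """Return one (prefix, target) pair for every position after the first."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Hypothetical example sentence, split on whitespace for simplicity.
tokens = "the cat sat on the mat".split()

for prefix, target in make_training_pairs(tokens):
    print(f"input: {' '.join(prefix)!r:<22} target: {target!r}")
```

A sentence of six tokens yields five such pairs here, and each pair is scored with its own loss.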

Average Loss for This Sentence

A training sample typically contributes a loss at multiple positions, and these per-position losses are then aggregated (typically averaged) into the overall objective. Think of it this way: the model tries to make fewer mistakes at every step of the sentence.
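A minimal sketch of that aggregation, assuming the per-position losses are combined with a plain mean. The probabilities below are invented for illustration, and sentence_loss is a hypothetical helper name.

```python
import math

def sentence_loss(correct_token_probs):
    """Return the mean of -log p over positions, plus the per-position losses."""
    losses = [-math.log(p) for p in correct_token_probs]
    return sum(losses) / len(losses), losses

# Hypothetical probabilities the model gave to the correct token at each position.
probs_per_position = [0.20, 0.55, 0.70, 0.35, 0.90]

mean_loss, losses = sentence_loss(probs_per_position)
for position, (p, loss) in enumerate(zip(probs_per_position, losses), start=1):
    print(f"position {position}: P(correct) = {p:.2f}, loss = {loss:.3f}")
print(f"average loss for this sentence: {mean_loss:.3f}")
```

Positions where the correct token received little probability dominate the average, so those are the positions where further training has the most to gain.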

Summary of this section: Language model training is not about "memorizing entire sentences," but about breaking them into many next-token prediction problems and making the correct token more dominant at each step.

Key Takeaways After This Page

Training Objective is Simple

Essentially, training keeps increasing the probability of the correct next token at each position; it does not try to learn the "entire answer" at once.

Loss Reflects Probability

High loss means the correct token's probability is still low; low loss means the model assigns most of its probability to the correct token at that position.

Sentence Training Involves Multiple Positions

Each position in a sequence generates supervision signals, so the model learns at every step of the sequence, not just at the end.