Page 18 · SimLabs LLM Visual

Temperature, Top-k, Top-p & Decoding Strategies

The same model can produce noticeably different response styles under different inference parameters. The reason is not that "the model suddenly changes its mind," but that the strategy for selecting the next token at the final step differs. Temperature changes the sharpness of the distribution, Top-k and Top-p trim the candidate set, and Greedy always picks the single highest-probability token.

See how probability distributions change · Sample once in real time · Then understand "stability" and "divergence"

Next Token Sampling Lab

First pick a scenario, then switch decoding strategies, and finally drag the parameters. You will directly see how the set of retained candidate tokens, their probability distribution, and the sampled result change.

Temperature

The lower the temperature, the sharper the distribution; the higher the temperature, the flatter the distribution, giving low-probability items a better chance of being sampled.
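
As a rough illustration of the sharpening and flattening described above, the sketch below divides raw scores (logits) by the temperature before applying softmax. The function name softmax_with_temperature and the four logit values are made up for this example and are not taken from the lab itself.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens (illustrative values only)
logits = [2.0, 1.0, 0.5, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At T=0.5 the top token takes a larger share than at T=1.0, and at T=2.0 the shares move closer together, which is why low-probability tokens get sampled more often at high temperature.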

Current Temperature 1.00

Top-k

Keep only the k candidates with the highest probability; all other tokens are cut off. This is useful for bounding the size of the candidate set.
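
A minimal sketch of this cut-off, assuming the input is already a probability list and that the survivors are renormalized so they sum to 1 (renormalization is an assumption here; the lab only states that cut-off items become 0). The name top_k_filter and the example probabilities are hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k largest probabilities, zero out the rest, then renormalize."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the k most probable tokens (ties resolved by original position)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

# Hypothetical distribution over six candidate tokens
probs = [0.40, 0.25, 0.15, 0.10, 0.06, 0.04]
print([round(p, 3) for p in top_k_filter(probs, 4)])
```

With k=4, the last two tokens drop to 0 regardless of how the probability mass is spread, which is the main difference from Top-p.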

Current Top-k 4

Top-p

Keep the smallest set of top-ranked tokens whose cumulative probability reaches p. When the distribution is very sharp, fewer items are kept; when it is flat, more items are kept.
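
The adaptive size of the retained set can be sketched as follows. This is an illustrative version only: the name top_p_filter, the two example distributions, and the renormalization step are assumptions, not details taken from the lab.

```python
def top_p_filter(probs, p):
    """Keep the smallest top-ranked set whose cumulative probability reaches p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set()
    cumulative = 0.0
    for i in ranked:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:  # threshold reached: stop adding candidates
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

# Two hypothetical distributions: one sharp, one flat
sharp = [0.90, 0.05, 0.03, 0.02]
flat = [0.20, 0.20, 0.20, 0.20, 0.20]
for name, dist in (("sharp", sharp), ("flat", flat)):
    kept = top_p_filter(dist, 0.85)
    print(name, "keeps", sum(1 for q in kept if q > 0), "tokens")
```

With the same p, the sharp distribution keeps a single token while the flat one keeps all five, so the cut-off adapts to how confident the model is at that step.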

Current Top-p 0.85

Current Final Distribution

The bar chart below shows the distribution that actually participates in sampling under the current strategy. Cut-off items become 0.

Four Strategies at a Glance

This set of cards places the current scenario under Greedy, Temperature, Top-k, and Top-p respectively, helping you quickly establish a sense of contrast between "stability" and "diversity."
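
To make the contrast concrete, here is a small self-contained sketch that builds the final distribution for one scenario under each of the four strategies and then samples from it repeatedly. The tokens, logits, function names, and default parameters are all hypothetical; the lab's own scenarios and implementation may differ.

```python
import math
import random

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def keep_and_renormalize(probs, keep):
    """Zero out every index not in `keep`, then rescale to sum to 1."""
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def final_distribution(logits, strategy, temperature=1.0, k=4, p=0.85):
    """Return the distribution that actually takes part in sampling."""
    if strategy == "temperature":
        return softmax(logits, temperature)  # reshape only, no trimming
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if strategy == "greedy":
        return keep_and_renormalize(probs, {ranked[0]})
    if strategy == "top_k":
        return keep_and_renormalize(probs, set(ranked[:k]))
    if strategy == "top_p":
        keep, cumulative = set(), 0.0
        for i in ranked:
            keep.add(i)
            cumulative += probs[i]
            if cumulative >= p:
                break
        return keep_and_renormalize(probs, keep)
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical scenario: candidate next tokens and their logits
tokens = ["blue", "clear", "cloudy", "green", "loud"]
logits = [3.0, 2.2, 1.5, 0.3, -1.0]

rng = random.Random(0)  # fixed seed so repeated runs are comparable
settings = {
    "greedy": {},
    "temperature": {"temperature": 1.5},
    "top_k": {"k": 2},
    "top_p": {"p": 0.85},
}
for strategy, params in settings.items():
    dist = final_distribution(logits, strategy, **params)
    draws = rng.choices(tokens, weights=dist, k=200)
    print(strategy, [round(q, 3) for q in dist], "distinct:", len(set(draws)))
```

Greedy always yields the same token, while the other three differ in how many candidates stay eligible and how evenly the probability is spread among them.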

Common misconception: High temperature does not mean "smarter," only that sampling is more exploratory; Top-k / Top-p are not "the larger the better," but rather a balance between stability and diversity.
In a nutshell: Decoding strategies do not change the knowledge the model has learned, but they change "which candidates to choose from, how conservatively to choose, and whether to retain randomness," so the same model can exhibit different styles.