Page 20 · SimLabs LLM Visual

Pretraining, SFT & Alignment

Large language models do not behave like chat assistants straight out of pretraining. They typically first learn broad language patterns and knowledge through pretraining, then learn to follow instructions through supervised fine-tuning (SFT), and finally undergo preference alignment so that their outputs better match human expectations of helpfulness, safety, and style.

Learn language patterns first. Then learn to answer as instructed. Finally, learn preferences and boundaries.

Observe how the model changes across training stages with the same prompt

Given the same prompt, a pretrained model often behaves more like a text-completion system, while SFT and preference alignment progressively push it toward a more assistant-like interaction style.

[Interactive panel: for each training stage, it shows an illustrative output for the same prompt, what the model learns, the primary data format, and the optimization objective.]

Key insight: These stages do not replace one another; each one builds on the capabilities left by the previous stage. Pretraining determines "what's inside the mind," SFT determines "whether it can answer as instructed," and preference alignment determines "whether the style and boundaries of the response are closer to human expectations."

Why "knowing a lot" is not the same as "being a good assistant"

Pretraining excels in coverage

A pretrained model has seen massive amounts of text, so it learns broad knowledge and language patterns. That does not mean it inherently recognizes that you are assigning a task, asking a question, or requesting a summary.
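To make the pretraining objective concrete, here is a minimal sketch of next-token prediction loss in plain Python. It assumes the model has already produced one score vector (logits) per position; the names `softmax` and `next_token_loss` and the three-token toy vocabulary are illustrative assumptions, not taken from any specific training framework.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Average cross-entropy for predicting token t+1 from position t.

    logits_per_position[t] is the (hypothetical) model's score vector
    after reading token_ids[: t + 1].
    """
    losses = []
    for t in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[t])
        target = token_ids[t + 1]
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Toy sequence over a vocabulary of 3 token ids.
token_ids = [0, 2, 1]

# A model with no preference assigns equal scores to every token.
uniform = [[0.0, 0.0, 0.0]] * 3
loss_uniform = next_token_loss(uniform, token_ids)

# A model that scores the correct next token highly gets a lower loss.
confident = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]
loss_confident = next_token_loss(confident, token_ids)
```

Nothing in this loss says whether the text is a question, a task, or a story. The model is rewarded only for continuing the text plausibly, which is why a pretrained model tends to complete text rather than answer.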

SFT excels in interaction format

Through numerous "instruction-response" samples, the model learns to treat user input as a task to execute rather than just continuing the text.
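One common way to implement this, shown here as a sketch rather than the only approach, is to concatenate the instruction and the response and compute the loss only on the response tokens. The helper names `build_sft_example` and `masked_mean`, and the token ids, are hypothetical; some training setups also train on the prompt tokens.

```python
def build_sft_example(prompt_ids, response_ids):
    """Concatenate prompt and response, and mark which tokens are trained on."""
    input_ids = prompt_ids + response_ids
    # 0 = ignore in the loss (prompt), 1 = include in the loss (response).
    loss_mask = [0] * len(prompt_ids) + [1] * len(response_ids)
    return input_ids, loss_mask

def masked_mean(per_token_losses, loss_mask):
    """Average only the losses whose mask entry is 1."""
    kept = [loss for loss, m in zip(per_token_losses, loss_mask) if m == 1]
    return sum(kept) / len(kept)

# Made-up token ids for an instruction and its reference answer.
prompt_ids = [11, 12, 13]
response_ids = [21, 22]
input_ids, loss_mask = build_sft_example(prompt_ids, response_ids)

# Made-up per-token losses; entry i is the loss for predicting input_ids[i].
per_token_losses = [9.0, 9.0, 9.0, 1.0, 3.0]
sft_loss = masked_mean(per_token_losses, loss_mask)
```

With the mask in place, the large losses on the prompt tokens do not contribute. The model is trained to produce the response given the instruction, not to reproduce the instruction itself.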

Preference alignment excels at choosing among plausible answers

When multiple answers are "all somewhat plausible," alignment further nudges the model to prefer a certain style, for example more concise, more polite, or safer.
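The page does not say which alignment method is used. As one illustration, a widely used ingredient is a pairwise (Bradley-Terry style) loss over scores for a preferred and a rejected answer to the same prompt. The sketch below assumes those scalar scores already exist; `pairwise_preference_loss` is an illustrative name.

```python
import math

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def pairwise_preference_loss(score_chosen, score_rejected):
    """Loss is small when the preferred answer scores higher than the rejected one."""
    return -log_sigmoid(score_chosen - score_rejected)

# Both answers scored equally: the loss sits at log(2).
loss_tie = pairwise_preference_loss(0.0, 0.0)

# The preferred answer scores higher: low loss.
loss_agree = pairwise_preference_loss(2.0, 0.0)

# The rejected answer scores higher: high loss, so training pushes the scores apart.
loss_disagree = pairwise_preference_loss(0.0, 2.0)
```

The loss depends only on the difference between the two scores. That matches the text above: alignment does not teach new facts so much as shift which of several plausible answers the model favors.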

Boundaries are still not absolute

Alignment is a significant improvement, not a magic switch. In real-world systems, it usually works together with policy layers, safety layers, and tool layers.

A concise sequence of the training route

1. Pretraining: Teach the model "what the next token should be" on massive text corpora.

2. SFT: Use high-quality examples to teach the model how to answer in task format.

3. Preference Alignment: Adjust response style and boundaries based on human preferences or reward signals.

4. System Enhancement: Combine retrieval, tools, constraints, and evaluation to turn the model into a product capability.
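The fourth step happens outside the model weights. As a rough sketch of what "combining retrieval and constraints" can look like, the toy pipeline below wraps a model call with a keyword-overlap retriever and a simple blocklist check. Every name here (`retrieve`, `answer`, `echo_model`, `blocked_terms`) is hypothetical, and real systems use far more capable retrievers and policy checks.

```python
def retrieve(query, documents, k=1):
    """Toy retriever: rank documents by how many query words they share."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def answer(query, documents, model, blocked_terms=()):
    """Apply a constraint check, add retrieved context, then call the model."""
    # Constraint layer: refuse before the model is ever called.
    if any(term in query.lower() for term in blocked_terms):
        return "Sorry, I can't help with that."
    # Retrieval layer: prepend the best-matching document as context.
    context = retrieve(query, documents)
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query
    return model(prompt)

def echo_model(prompt):
    # Stand-in for a real model call; it only reports the prompt length.
    return "[model received %d characters]" % len(prompt)

documents = [
    "SFT is supervised fine-tuning",
    "Pretraining uses next-token prediction",
]
reply = answer("what is sft", documents, echo_model)
refusal = answer("tell me the secret", documents, echo_model, blocked_terms=("secret",))
```

The model function is passed in as a parameter, so the same wrapper works whichever training stage produced the model. This mirrors the point that system-level layers complement alignment rather than replace it.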

In a nutshell: Pretraining addresses "whether it knows language and knowledge," SFT addresses "whether it can answer by task," and alignment addresses "whether the answering style is more like a usable assistant."