Step by Step
Start with the roadmap to build a global picture, then dive into individual page interactions. This way, when you learn about Attention, RAG, and Agent, they won't seem like isolated concepts.
This is a series of interactive courses for understanding large model technology from scratch. The goal is not to pile up jargon, but to turn Token, vectors, Tokenizer, Attention, QKV, Transformer, and decoding strategies into HTML pages that you can observe, operate, and learn from step by step.
Each page focuses on one core question, avoiding cramming multiple new concepts into a single page.
Use heatmaps, step-by-step playback, parameter sliders, matrix highlighting, and comparison cards to let users observe changes themselves, rather than just reading text.
If this is your first time exploring this series, we recommend starting with these key pages. After the page script loads, all available courses will be automatically expanded here.
Build a panoramic view first to understand where AI, machine learning, deep learning, and large models fit in the technology landscape.
Understand why language needs to be tokenized and why tokens are mapped to vectors.
Build a visual mathematical foundation for Embedding, linear layers, and the QK dot product.
Use rank, shape, slicing and common operations to turn tensors from abstract concepts into observable and manipulable objects.
Understand why models see token sequences instead of words, and why tokens are a more stable unit than whole words.
Understand why a token needs to look at other tokens, not just itself.
Directly experience Q, K, V, the scoring matrix, and the final output. Quickly turn the attention mechanism into an operable process.
Compare single-head mixed views vs multi-head specialization, then see how Concat + W^O fuses results from different heads.
Break down "preserve backbone, normalize scale, process features" into actionable experiments to complete your intuition for the parts of the Transformer Block beyond attention.
Understand from a system perspective why models often need to search before answering, and how retrieval enhances generation quality.
Evolve from "can answer" to "can continuously complete tasks". Understand why an Agent is a cyclic system.
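As a minimal sketch of the Q, K, V walkthrough described in the cards above, the following pure-Python example computes a scaled dot-product scoring matrix, applies softmax to each row, and mixes the rows of V into the final output. The function names (`softmax`, `attention`) and the toy 3-token, 2-dimensional values are illustrative assumptions, not taken from the course pages.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention on plain lists of lists.

    Q, K, V each hold one row per token. Returns (weights, output).
    """
    d_k = len(K[0])
    # Scoring matrix: one row per query token, one column per key token.
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        for q in Q
    ]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of the rows of V.
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return weights, output


# Hypothetical toy inputs: 3 tokens, 2 dimensions each.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

weights, output = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence, which is the same information an attention heatmap displays.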
Clickable cards represent currently accessible pages; gray cards indicate planned content still under development. After the page script loads, this will automatically switch to the complete phase-based course list.
Build the overall map first, then understand how text is transformed into model-computable representations step by step.
Where do AI and large models fit in the technology spectrum? Why does text need to become Tokens, vectors, matrices and tensors?
This phase bridges "basic representation" and "how models learn semantic structure", avoiding gaps when jumping into the Transformer later.
How do neural networks recombine raw features into hidden representations? Why can Embedding bring similar concepts closer? What exactly are language models optimizing during training?
This is the technical heart of the entire series, focusing on Tokenizer, Attention, QKV, causal Mask, positional encoding, multi-head attention, Residual/LayerNorm/FFN and Transformer Block.
How does the model view relationships between tokens? Why can't the decoder peek into the future? How is sequential information explicitly injected? Why do multiple heads give a more complete view? How does the other half beyond attention preserve the backbone and continue processing representations?
Understand how models are trained and why the same model can behave very differently with different decoding parameters.
How do model capabilities gradually emerge from pre-training to alignment? How do temperature, Top-k and Top-p affect output style?
Place individual model capabilities into real product systems. Understand RAG, Tool Use, Agent, safety, evaluation and deployment.
A "usable LLM system" is more than just a conversation: it requires retrieval, tools, evaluation, safety boundaries, and engineering trade-offs.
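To make the decoding question from the phases above concrete (how temperature, Top-k and Top-p affect output), here is a minimal sketch in pure Python. It assumes one common ordering: scale the logits by temperature, apply softmax, keep the Top-k tokens, then keep the smallest prefix whose cumulative probability reaches Top-p. Real inference libraries may order or implement these filters differently. The names `filtered_distribution` and `sample`, and the toy logits, are hypothetical.

```python
import math
import random


def filtered_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return {token_id: probability} after temperature, Top-k and Top-p.

    Assumes temperature > 0.
    """
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept

    # Renormalize so the surviving probabilities sum to 1.
    mass = sum(probs[i] for i in order)
    return {i: probs[i] / mass for i in order}


def sample(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    dist = filtered_distribution(logits, **kwargs)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


# Hypothetical logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
token_id = sample(logits, rng, temperature=0.7, top_k=3, top_p=0.9)
```

Lowering the temperature or tightening Top-k and Top-p concentrates probability on the most likely tokens, which is why the same model can sound cautious under one setting and more varied under another.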
Understand the layers you need to learn before diving into specific mechanisms.
Understand how text becomes tokens, vectors, matrices and tensors.
Build intuition for neural networks, Embedding, and LM objectives before entering the Transformer.
Dive into Attention, QKV, causal Mask, positional encoding, multi-head attention, Residual/LayerNorm/FFN and Block structure.
Finally explore sampling, RAG, tool use, safety and deployment.