Character View
First, view text as a stream of fine-grained characters. Characters are the smallest unit in human intuition, but not necessarily the best unit for the model to process.
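As a minimal sketch of the character view, the snippet below splits a sentence into individual characters. The function name `to_characters` is hypothetical and is not taken from this page's implementation; it only illustrates what "a stream of fine-grained characters" looks like in practice.

```python
def to_characters(text: str) -> list[str]:
    """Split text into its individual characters (Unicode code points)."""
    return list(text)


chars = to_characters("Hi, AI!")
print(chars)       # ['H', 'i', ',', ' ', 'A', 'I', '!']
print(len(chars))  # 7
```

Even this short sentence becomes seven separate units, which hints at why characters are intuitive for people but can be an inefficient unit for a model.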
Large language models don't "read" text directly. Text is first split into tokens, and each token is then mapped to a numerical vector. On this page, you can enter a sentence, switch between examples, and click a token to see, step by step, how it becomes a representation the model can compute with.
Because this is an introductory demonstration, the page uses a simplified "educational tokenization" rather than a full industrial-grade tokenizer. The goal is to help you understand one idea: text doesn't enter the model directly; it is first split into more stable units.
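One way such an "educational tokenization" could work is sketched below. The rule used here is an assumption made for illustration, not the rule this page actually applies: each run of letters or digits becomes one token, each punctuation mark becomes its own token, and whitespace is dropped. Industrial tokenizers typically use learned subword vocabularies instead.

```python
import re

# Hypothetical educational rule (an assumption, not this page's actual logic):
# a run of word characters is one token; each punctuation mark is its own token.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens, discarding whitespace."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("Models don't read text directly."))
# ['Models', 'don', "'", 't', 'read', 'text', 'directly', '.']
```

Note how "don't" is broken into three pieces. A simple rule like this is easy to follow, but it also shows why real tokenizers need more careful handling of contractions and rare words.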
To process language more reliably, models typically split text into tokens first. Click any token below to see its corresponding "conceptual vector".
The "conceptual vector" shown below is meant to build the intuition that tokens are projected into numerical vectors; it does not reproduce actual embedding values from any specific model.
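To make the idea of a "conceptual vector" concrete, here is a rough sketch that turns any token into a small, repeatable list of numbers. The function `conceptual_vector` and the hashing approach are assumptions chosen purely for illustration. A real model looks up learned embeddings in a trained table, and the values produced here carry no meaning.

```python
import hashlib


def conceptual_vector(token: str, dim: int = 8) -> list[float]:
    """Map a token to a deterministic toy vector with values in [-1.0, 1.0].

    Illustrative only: these numbers come from a hash, not from a trained model.
    """
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # Scale each byte (0..255) into the range [-1.0, 1.0].
    return [round(b / 127.5 - 1.0, 3) for b in digest[:dim]]


for tok in ["text", "token", "text"]:
    print(tok, conceptual_vector(tok))
```

The same token always yields the same vector, and different tokens yield different vectors. Unlike real embeddings, though, similar words do not end up near each other in this toy version.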