Tensor Lab · SimLabs Visual

Tensor Visualization Lab

The term "tensor" sounds like advanced mathematics, but in deep learning, its most common meaning is actually quite simple: a numerical container with multiple axes. A scalar is a rank-0 tensor, a vector is rank-1, a matrix is rank-2, and at rank-3 and above, tensors can naturally represent more complex structures like batches, time steps, channels, attention heads, positions, and hidden dimensions.

Start with rank & shape, then indexing & slicing, and finally operations.

Think of Tensors as Data Boxes of Different Dimensions

Click the rank buttons below and you'll see that tensors aren't mysterious objects; they're simply the natural extension of arrays to more dimensions. Once you understand rank and shape, you've grasped the core concepts.

Tensor Appearance
A useful mnemonic: rank describes "how many axes there are", while shape describes "how long each axis is". For example, [batch, seq, hidden] = [2, 4, 8] is a rank-3 tensor.

Build Your Own 3D Tensor, Click to See Element Position

This experiment breaks down a 3D tensor into "multiple slice matrices". You can modify the shape, switch semantic presets, select slices, and click elements to see their coordinates, values, and flattened position.

Axis 0 (default length 2): the outermost axis. Often represents batch, head, or channel.

Axis 1 (default length 3): the middle axis. Often represents the token sequence, rows, or height.

Axis 2 (default length 4): the innermost axis. Often represents the hidden dimension, columns, or width.
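The "flattened position" the lab reports can be computed directly from the coordinates. A minimal sketch, assuming row-major (C) order, the default layout for NumPy arrays and contiguous PyTorch tensors; whether the lab itself uses this convention is an assumption. The helper names `flat_index` and `unflatten` are hypothetical.

```python
def flat_index(index, shape):
    """Row-major (C-order) position of `index` in a tensor of `shape`."""
    if len(index) != len(shape):
        raise ValueError("need exactly one subscript per axis")
    pos = 0
    for i, d in zip(index, shape):
        if not 0 <= i < d:
            raise IndexError(f"subscript {i} out of range for an axis of length {d}")
        # Each step multiplies by the current axis length, then adds the subscript.
        pos = pos * d + i
    return pos


def unflatten(pos, shape):
    """Inverse of flat_index: recover coordinates from a flat position."""
    index = []
    for d in reversed(shape):
        # Peel off the innermost subscript first.
        pos, i = divmod(pos, d)
        index.append(i)
    return tuple(reversed(index))


shape = (2, 3, 4)
print(flat_index((1, 2, 3), shape))  # 1*12 + 2*4 + 3 = 23
print(unflatten(23, shape))          # (1, 2, 3)
```

The innermost axis changes fastest: neighbouring elements along axis 2 sit next to each other in the flattened order, while a step along axis 0 jumps over 3 × 4 = 12 elements.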

4 Most Common Tensor Operations, See How Data Changes

This section goes beyond formulas: it breaks down the most common tensor operations in deep learning into observable processes, and each experiment keeps a side-by-side "before" and "after" view.

1. Reshape: Change Shape, Not Element Count

You can rearrange the same data into different shapes, but the total number of elements and their flattened (row-major) order remain unchanged.

The demo starts from a matrix of shape [3, 4], so any target shape must also contain 3 × 4 = 12 elements.
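To see that reshape only regroups elements, here is a minimal pure-Python sketch on nested lists: read everything out in row-major order, check that the element count matches, then rebuild the requested shape. The functions `flatten` and `reshape` are hypothetical illustrations, not a library API.

```python
from math import prod


def flatten(nested):
    """Read a nested-list tensor in row-major order into a flat list."""
    if not isinstance(nested, list):
        return [nested]
    out = []
    for item in nested:
        out.extend(flatten(item))
    return out


def reshape(nested, new_shape):
    """Regroup the same elements, in the same order, into `new_shape`."""
    flat = flatten(nested)
    new_shape = tuple(new_shape)
    if prod(new_shape) != len(flat):
        raise ValueError(f"cannot reshape {len(flat)} elements into {new_shape}")

    def build(offset, dims):
        if not dims:
            return flat[offset]
        inner = prod(dims[1:])  # elements in each sub-block of this axis
        return [build(offset + k * inner, dims[1:]) for k in range(dims[0])]

    return build(0, new_shape)


X = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]           # shape [3, 4]
print(reshape(X, (2, 6)))      # [[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11]]
print(reshape(X, (2, 2, 3)))   # [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]]
```

Asking for a shape such as (5, 2) raises `ValueError`, because 10 ≠ 12.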

2. Transpose: Swap Axes

In 2D matrices, transposition most intuitively swaps rows with columns. Click any element on the left to see its new coordinates after transposition.

Panels: original matrix X (left) and its transpose Xᵀ (right).
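The coordinate rule behind the transpose demo is "element [i][j] moves to [j][i]". A minimal sketch on nested lists; `transpose` is a hypothetical helper written for illustration.

```python
def transpose(matrix):
    """Return the transpose: element X[i][j] moves to position [j][i]."""
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[i][j] for i in range(rows)] for j in range(cols)]


X = [[1, 2, 3],
     [4, 5, 6]]            # shape [2, 3]
XT = transpose(X)          # shape [3, 2]
print(XT)                  # [[1, 4], [2, 5], [3, 6]]
print(X[0][2], XT[2][0])   # 3 3
```

The idiomatic one-liner `[list(row) for row in zip(*X)]` gives the same result; the explicit loops are kept here to make the index swap visible.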

3. Reduce: Aggregate Along an Axis

In deep learning, we often perform sum or mean along an axis. Click any cell in the result matrix on the right, and the left side will highlight which original elements contributed to it.

Panels: the original 3D tensor (left) and the result after aggregation (right). Highlighted original elements are the ones contributing to the currently selected result cell.
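Reducing along an axis removes that axis from the shape. In the sketch below, the result cell Y[i][k] after summing over axis 1 receives X[i][0][k], X[i][1][k], and so on, which are exactly the cells the lab highlights. It is a minimal pure-Python illustration on nested lists; all function names are hypothetical.

```python
def add(a, b):
    """Elementwise sum of two nested lists with identical shapes."""
    if isinstance(a, list):
        return [add(x, y) for x, y in zip(a, b)]
    return a + b


def divide(a, n):
    """Divide every element of a nested list by n."""
    if isinstance(a, list):
        return [divide(x, n) for x in a]
    return a / n


def reduce_sum(t, axis):
    """Sum a nested-list tensor along `axis`; that axis disappears from the shape."""
    if axis == 0:
        # Add the sub-tensors t[0], t[1], ... elementwise.
        result = t[0]
        for sub in t[1:]:
            result = add(result, sub)
        return result
    # Otherwise reduce one level deeper inside each sub-tensor.
    return [reduce_sum(sub, axis - 1) for sub in t]


def reduce_mean(t, axis):
    """Mean along `axis`: the sum divided by n, the length of that axis."""
    sub = t
    for _ in range(axis):
        sub = sub[0]
    return divide(reduce_sum(t, axis), len(sub))


T = [[[1, 2, 3], [4, 5, 6]],
     [[7, 8, 9], [10, 11, 12]]]   # shape [2, 2, 3]
print(reduce_sum(T, 0))    # [[8, 10, 12], [14, 16, 18]]   shape [2, 3]
print(reduce_sum(T, 1))    # [[5, 7, 9], [17, 19, 21]]     shape [2, 3]
print(reduce_mean(T, 2))   # [[2.0, 5.0], [8.0, 11.0]]     shape [2, 2]
```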

4. Broadcast: Auto-expand Small Shapes for Operations

The most common use of broadcasting is adding a bias vector to every row of a matrix. You can adjust each entry of the bias to see how the results change.

Bias values shown: b = [2, 0, -2, 3] (bias[0] through bias[3]).

Panels: matrix A, bias vector b, and the result A + b.
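Broadcasting a bias over rows means the same vector b is reused for every row: Z[i][j] = A[i][j] + b[j]. A minimal sketch using the bias values shown in the lab; the matrix A and the helper name `add_row_bias` are hypothetical.

```python
def add_row_bias(A, b):
    """Compute Z[i][j] = A[i][j] + b[j]; b is reused ("broadcast") for every row."""
    if any(len(row) != len(b) for row in A):
        raise ValueError("bias length must equal the number of columns")
    return [[a + bj for a, bj in zip(row, b)] for row in A]


A = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]    # shape [3, 4], illustrative values
b = [2, 0, -2, 3]        # shape [4], the bias values shown in the lab
print(add_row_bias(A, b))
# [[3, 2, 1, 7], [7, 6, 5, 11], [11, 10, 9, 15]]
```

Each column shifts by the same amount in every row: column 1 is unchanged (bias 0), and column 2 drops by 2.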

Common Tensor Shapes in Large Models

Input Representation

[batch, seq, hidden]

A batch of sentences, where each sentence has tokens, and each token has a hidden dimension vector. This is one of the most common backbone shapes in Transformers.

Attention Scores

[batch, heads, query_len, key_len]

Each attention head produces a query_len × key_len table of scores, one score for each (query position, key position) pair. Adding the batch and head axes makes it a rank-4 tensor.
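The score shape falls out of a batched matrix product of queries with transposed keys. A shape-only sketch, assuming queries and keys are laid out as [batch, heads, length, head_dim] (a common convention, not stated in the original); `attention_score_shape` is a hypothetical helper.

```python
def attention_score_shape(q_shape, k_shape):
    """Shape of Q @ K^T, assuming Q: [batch, heads, query_len, head_dim]
    and K: [batch, heads, key_len, head_dim]."""
    b, h, q_len, d_q = q_shape
    b_k, h_k, k_len, d_k = k_shape
    # Batch and head axes must match; head_dim is contracted away.
    if (b, h, d_q) != (b_k, h_k, d_k):
        raise ValueError(f"incompatible shapes {q_shape} and {k_shape}")
    return (b, h, q_len, k_len)


print(attention_score_shape((2, 8, 16, 64), (2, 8, 16, 64)))  # (2, 8, 16, 16)
print(attention_score_shape((2, 8, 10, 64), (2, 8, 20, 64)))  # (2, 8, 10, 20)
```

The second call shows that query_len and key_len need not be equal, which is why the table is query_len × key_len rather than square.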

Classification or Next Token Probability

[batch, vocab] or [batch, seq, vocab]

The output layer often needs to score the entire vocabulary, making the last dimension very large.

Multimodal Input

[batch, channel, height, width]

Input tensors for images, video, and audio typically have additional spatial or temporal dimensions, but are essentially "multi-axis arrays".

Core intuition: If you can first explain "what each axis represents", reading model shapes becomes much easier. Many errors that seem complex are actually just shape mismatches.

Concepts and Related Formulas

Concept 1: Tensor Rank and Shape

A tensor can be denoted as X. If it has r axes, we call it a rank-r tensor; its shape is denoted as (d1, d2, ..., dr).

$$\operatorname{rank}(X) = r, \qquad \operatorname{shape}(X) = (d_1, d_2, \ldots, d_r), \qquad \text{total elements} = d_1 \times d_2 \times \cdots \times d_r$$
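Rank, shape, and element count can be read off a nested list. A minimal sketch, assuming the nesting is regular (every sub-list at the same depth has the same length), so only the first element at each level is inspected; `shape_of` is a hypothetical helper.

```python
from math import prod


def shape_of(t):
    """Shape of a regular nested-list tensor; a bare number has shape ()."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        if not t:
            break
        t = t[0]
    return tuple(shape)


X = [[[0] * 4 for _ in range(3)] for _ in range(2)]
s = shape_of(X)
print(s, len(s), prod(s))   # (2, 3, 4) 3 24
print(shape_of(5))          # ()
```

A scalar has rank 0 and, as the empty product, exactly one element, consistent with the formula above.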

Concept 2: Indexing an Element

Locating a single element of a tensor takes one subscript per axis. For example, an element of a rank-3 tensor is written X[i, j, k].

$$X[i_1, i_2, \ldots, i_r] \text{ is a single element}, \qquad 0 \le i_k < d_k \text{ for } k = 1, \ldots, r$$

Concept 3: Reshape

Reshape only changes "how we view these numbers", not the element values or total count. Therefore, the total number of elements must be exactly equal before and after reshape.

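$$d_1 \times d_2 \times \cdots \times d_r = d'_1 \times d'_2 \times \cdots \times d'_s, \qquad \operatorname{reshape}: X \in \mathbb{R}^{d_1 \times \cdots \times d_r} \to Y \in \mathbb{R}^{d'_1 \times \cdots \times d'_s}$$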

Concept 4: Transpose / Permute

Transpose essentially swaps axis order. In 2D, it's most intuitive: rows become columns and vice versa. In higher dimensions, it's a more general axis permutation.

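$$\text{2D: } Y[j, i] = X[i, j] \qquad \text{General, for a permutation } \pi \text{ of } \{1, \ldots, r\}: \; Y[i_{\pi(1)}, i_{\pi(2)}, \ldots, i_{\pi(r)}] = X[i_1, i_2, \ldots, i_r]$$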
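A general permute can be sketched on nested lists: axis k of the result is axis perm[k] of the input, the same convention as NumPy's `transpose(axes)` and PyTorch's `permute`. The code below is a hypothetical pure-Python illustration, not either library's implementation.

```python
def shape_of(t):
    """Shape of a regular nested-list tensor."""
    shape = []
    while isinstance(t, list):
        shape.append(len(t))
        t = t[0]
    return tuple(shape)


def get(t, index):
    """Read the element at a full index."""
    for i in index:
        t = t[i]
    return t


def permute(t, perm):
    """Rearrange axes: axis k of the result is axis perm[k] of t."""
    new_shape = tuple(shape_of(t)[p] for p in perm)

    def build(prefix):
        if len(prefix) == len(new_shape):
            # Position k of the output index addresses input axis perm[k].
            index = [0] * len(perm)
            for k, p in enumerate(perm):
                index[p] = prefix[k]
            return get(t, index)
        return [build(prefix + (j,)) for j in range(new_shape[len(prefix)])]

    return build(())


# Encode each element's coordinates in its value: X[i][j][k] = 100i + 10j + k.
X = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]
Y = permute(X, (2, 0, 1))
print(shape_of(Y))   # (4, 2, 3)
print(Y[3][1][2])    # 123, i.e. X[1][2][3]
```

With `perm = (1, 0)` on a matrix, this reduces to the 2D transpose Y[j][i] = X[i][j].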

Concept 5: Reduce

Summing or averaging along an axis "collapses" that axis. This is common in pooling, loss aggregation, and batch averaging.

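$$\text{sum: } Y[\ldots] = \sum_{j=0}^{n-1} X[\ldots, j, \ldots] \qquad \text{mean: } Y[\ldots] = \frac{1}{n} \sum_{j=0}^{n-1} X[\ldots, j, \ldots]$$

where n is the length of the reduced axis.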

Concept 6: Broadcast

Broadcasting lets a tensor with a smaller shape be expanded automatically to match a larger one, provided the two shapes are compatible. One common case is adding a vector to each row of a matrix.

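Rule (as in NumPy and PyTorch): align the two shapes at their last axis and compare dimensions from right to left; each pair must be equal, or one of them must be 1. A missing leading axis counts as 1.

$$Z[i, j] = X[i, j] + b[j], \qquad X \in \mathbb{R}^{n \times d}, \; b \in \mathbb{R}^{d}$$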
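The compatibility rule can be checked without touching any data, just by comparing shapes. A minimal sketch of the NumPy-style rule; `broadcast_shape` is a hypothetical helper (NumPy ships its own `numpy.broadcast_shapes`, which this does not call).

```python
def broadcast_shape(a, b):
    """Result shape of broadcasting shapes a and b, or ValueError if incompatible."""
    result = []
    # Compare dimensions right to left; a missing leading dimension counts as 1.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


print(broadcast_shape((3, 4), (4,)))               # (3, 4): row bias
print(broadcast_shape((2, 3, 4), (3, 1)))          # (2, 3, 4)
print(broadcast_shape((8, 1, 6, 1), (7, 1, 5)))    # (8, 7, 6, 5)
```

A bias of shape (3,) added to a (3, 4) matrix fails, because the trailing dimensions 4 and 3 differ and neither is 1; this is a frequent source of the "shape mismatch" errors mentioned earlier.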
In short: tensors aren't scarier than matrices; they're just the natural extension of axis-based data structures to higher dimensions. Once you grasp rank, shape, indexing, and axis operations, much deep learning code becomes far easier to read.

Formal and Intuitive Explanations of Tensors

Formal Definition: Tensors in Mathematics

More rigorously, tensors aren't "just numbers piled together" - they're mathematical objects that satisfy specific transformation rules under coordinate changes. In linear algebra and differential geometry, they're often viewed as multilinear maps, and can be written as component objects with multiple superscripts and subscripts.

In the engineering notation most common in deep learning, we write a tensor as X ∈ R^(d1 × d2 × ... × dr), meaning a real-valued tensor with shape (d1, d2, ..., dr).

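Common engineering notation:

$$X \in \mathbb{R}^{d_1 \times d_2 \times \cdots \times d_r}$$

A more rigorous mathematical notation, as a multilinear map:

$$T : V_1 \times V_2 \times \cdots \times V_r \to \mathbb{R}$$

In component form, commonly written as:

$$T[i_1, i_2, \ldots, i_r]$$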
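The link between the map T and its components can be made concrete for r = 2 with V1 = V2 = R^2. In this minimal sketch, a bilinear map is defined from a hypothetical matrix A, and its components are recovered as T[i][j] = T(e_i, e_j) on standard basis vectors. All names are illustrative.

```python
def bilinear(A):
    """Return the bilinear map T(u, v) = sum over i, j of u[i] * A[i][j] * v[j]."""
    def T(u, v):
        return sum(u[i] * A[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))
    return T


def components(T, n, m):
    """Recover components T[i][j] = T(e_i, e_j) from standard basis vectors."""
    def e(k, size):
        return [1 if t == k else 0 for t in range(size)]
    return [[T(e(i, n), e(j, m)) for j in range(m)] for i in range(n)]


A = [[1, 2],
     [3, 4]]
T = bilinear(A)
print(components(T, 2, 2))   # [[1, 2], [3, 4]]: the component array is A

# Linearity in the first argument: T(2u + w, v) == 2 T(u, v) + T(w, v)
u, w, v = [1, -1], [0, 2], [3, 5]
print(T([2 * a + b for a, b in zip(u, w)], v), 2 * T(u, v) + T(w, v))   # 26 26
```

Here the component array happens to equal A because the standard basis was used; in another basis, the same map has different components.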

Formal Note: Why "High-Dimensional Array" is Just an Engineering Approximation

In machine learning code, treating tensors as "high-dimensional arrays" is almost always sufficient, so people often say tensors are just multidimensional arrays. But from a stricter mathematical perspective, what really matters isn't just the number of dimensions, but how components transform together when changing between different bases or coordinate systems.

In other words, high-dimensional arrays are the most common and easiest-to-manipulate storage form of tensors, while the tensor concept itself emphasizes structure and transformation rules.
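To see "components transform, the tensor does not", take a bilinear form with component matrix A in the standard basis and switch to a new basis whose vectors are the columns of an invertible matrix P. Under these assumptions (a rank-2 tensor with two vector arguments, and hypothetical A, P, and coordinates chosen for illustration), the components become Pᵀ A P, vector coordinates transform as u = P u', and the value of the form is unchanged.

```python
def matmul(X, Y):
    """Matrix product of nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def matvec(X, v):
    return [sum(x * y for x, y in zip(row, v)) for row in X]


def form(A, u, v):
    """Evaluate u^T A v."""
    return sum(a * b for a, b in zip(u, matvec(A, v)))


A = [[1, 2],
     [3, 4]]                                   # components in the standard basis
P = [[1, 1],
     [0, 1]]                                   # columns are the new basis vectors
A_new = matmul(matmul(transpose(P), A), P)     # components in the new basis

u_new, v_new = [1, 2], [-1, 1]                 # coordinates in the new basis
u, v = matvec(P, u_new), matvec(P, v_new)      # same vectors, standard coordinates
print(A_new)                                   # [[1, 3], [4, 10]]
print(form(A, u, v), form(A_new, u_new, v_new))   # 14 14
```

The arrays [[1, 2], [3, 4]] and [[1, 3], [4, 10]] differ, yet both describe the same tensor, because they change in step with the coordinates of its arguments.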

2D matrices are just a special case of tensors.
Vectors, matrices, and higher-dimensional arrays can all represent tensor components.
In deep learning, we often simplify: tensor ≈ shape-aware multidimensional numeric container.

Intuitive Explanation: Think of Tensors as "Labeled Number Warehouses"

Without abstract mathematics, the easiest way to understand tensors is as "number warehouses". The warehouse contains many numbers, and each label answers a question: which sample? which token? which channel? which hidden dimension?

Once these axis labels are clear, tensors aren't mysterious. You're not looking at a bunch of numbers, but "data organized across multiple dimensions".

Example: X[batch, seq, hidden] can be read as sample index × position index × feature index.

Intuitive Explanation: From Points, Lines, Tables to Data Cubes

A single number is a scalar; a row of numbers is a vector; a table is a matrix; stacking many tables along a new direction creates a higher-rank tensor. Think of an Excel spreadsheet gaining extra dimensions: start with rows and columns, then add batch, time, heads, or channels, and you get a tensor.

So tensors aren't fundamentally different from vectors and matrices - they're their natural generalization.

Rank 0: x
Rank 1: [x1, x2, ..., xn]
Rank 2: X[i, j]
Rank 3: X[i, j, k]
Rank n: X[i1, i2, ..., in]
Reconciling the two views: Mathematically, tensors emphasize "multilinear structure and transformation rules"; in deep learning engineering, tensors are typically "shape-aware multidimensional arrays". When coding, you'll mostly use the latter, but understanding the former provides conceptual rigor.
Intuitively: A tensor is "a collection of numbers organized along multiple axes". Formally: A tensor is a mathematical object that can be represented with multidimensional components and satisfies specific transformation rules. Deep learning connects these two: we compute with tensors as multidimensional arrays while using them to represent higher-dimensional data structures.