Page 04 · SimLabs LLM Visual

Neural Network Basics: From Linear to Nonlinear

Before diving into embeddings, attention, and the Transformer, let's first understand what neural networks really do. They're not "mysterious black boxes": they take input features, compute weighted sums, add biases, apply activation functions, and send the results to the next layer for further recombination. The key to their expressive power lies in hidden layers and nonlinearity.
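To make "weighted summation, bias, activation" concrete, here is a minimal sketch of a single neuron. The feature values, weights, and bias are hypothetical numbers chosen only for illustration, and tanh is just one possible activation function.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through tanh."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

# Hypothetical values for illustration; a trained network learns its weights and bias.
features = [0.9, 0.1, 0.3]
weights = [1.5, -0.5, 0.3]
bias = -0.2

activation = neuron(features, weights, bias)
print(activation)  # tanh keeps the result strictly between -1 and 1
```

A layer is simply many such neurons reading the same inputs, each with its own weights and bias.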

Understand Weighted Summation

Understand What Hidden Layers Do

Grasp the Necessity of Nonlinearity

Run a Minimal Feedforward Neural Network

First select a task scenario, then adjust the input features. This educational network compresses 3 input features into a hidden layer and outputs 3 kinds of tendencies. The goal isn't to simulate the parameters of a real large model, but to illustrate the concept of "feature recombination."

Information Density

How many factual clues, and how much background information and reasoning material, this input contains.

Current Value: 0.82

Emotional Intensity

Whether this input carries strong emotions, or requires the model to be more comforting, explanatory, and tone-aware.

Current Value: 0.18

Action Orientation

Whether this input is more about requesting actions, tool calls, output lists, or task completion.

Current Value: 0.28

Input Layer

The input layer doesn't "understand"—it simply passes the feature values of the current sample to the next layer.

Hidden Layer

The hidden layer recombines raw features into more useful intermediate patterns, such as "fact detectors" or "task detectors."

Output Layer

The output layer reads the hidden layer activations and converts them into a probability distribution over the final task judgments.
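The three layers described above can be sketched as a tiny forward pass. This is a minimal sketch, not the page's actual network: the hidden layer size (2 units), every weight and bias, and the choice of ReLU and softmax are assumptions made for illustration. Only the three input values come from the page.

```python
import math

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Input layer: information density, emotional intensity, action orientation.
features = [0.82, 0.18, 0.28]

# Hypothetical weights; a real network would learn these from data.
W_hidden = [[2.0, -1.0, -0.5],   # loosely, a "fact detector"
            [-0.5, -0.5, 2.0]]   # loosely, a "task detector"
b_hidden = [0.0, 0.0]
W_out = [[2.0, -1.0],
         [-1.0, -1.0],
         [-1.0, 2.0]]
b_out = [0.0, 0.5, 0.0]

# Hidden layer: recombine the raw features, then apply the activation.
hidden = [relu(z) for z in dense(features, W_hidden, b_hidden)]

# Output layer: read the hidden activations and normalize to probabilities.
probs = softmax(dense(hidden, W_out, b_out))
print(hidden)
print(probs)
```

With these made-up weights, a high information density drives the first hidden unit, which in turn pushes most of the probability onto the first output. Changing `features` shifts the distribution without touching any weights.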

First, remember this: Neural networks aren't "thinking out of thin air"—they're recombining input features layer by layer. What hidden layers learn isn't the final answer, but intermediate representations more suitable for the next step.

Why Nonlinearity is the Watershed

If every layer only performs linear transformations, no matter how many layers you stack, the whole system is still equivalent to one larger linear transformation. What truly enables networks to express complex patterns is the "bending capability" brought by activation functions like ReLU, tanh, and sigmoid.
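The collapse of stacked linear layers can be checked numerically. The sketch below uses two small hypothetical weight matrices and omits biases for brevity (with biases the stack collapses to a single affine map by the same argument). It then inserts a ReLU between the layers to show that the shortcut no longer applies.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu_vec(vector):
    return [max(0.0, v) for v in vector]

# Hypothetical weights for a 2 -> 2 -> 1 network.
W1 = [[1.0, -1.0],
      [-1.0, 1.0]]
W2 = [[1.0, 1.0]]
x = [2.0, 0.5]

two_linear = matvec(W2, matvec(W1, x))           # two linear layers in sequence
one_linear = matvec(matmul(W2, W1), x)           # one merged linear layer
with_relu = matvec(W2, relu_vec(matvec(W1, x)))  # ReLU between the layers

print(two_linear, one_linear, with_relu)
```

Here the two linear layers cancel each other out completely, while the version with ReLU computes the absolute difference of the two inputs, a function that no single linear layer can represent.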

How Linear vs. Nonlinear See the Same Input

The left card below treats the hidden layer as purely linear (no activation function), while the right card retains nonlinearity. Comparing the output distributions on both sides helps visualize why hidden layers can't be reduced to just doing more matrix multiplications.

XOR Experiment

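"Activate if exactly one condition is met" is a classic nonlinear pattern. A linear model cannot score (1,0) and (0,1) high while also scoring (0,0) and (1,1) low: for any linear score w1·x1 + w2·x2 + b, the scores of (1,0) and (0,1) add up to exactly the same total as the scores of (0,0) and (1,1), so one pair cannot sit entirely above a threshold while the other sits entirely below it.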
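By contrast, one hidden layer with a nonlinearity is enough. The sketch below wires up XOR by hand with two ReLU hidden units. The weights are hand-picked for illustration, not learned, and this is only one of many possible solutions.

```python
def relu(z):
    return max(0.0, z)

def xor_net(x1, x2):
    """A hand-wired 2 -> 2 -> 1 network that computes XOR on 0/1 inputs."""
    h1 = relu(x1 + x2)        # positive when at least one input is on
    h2 = relu(x1 + x2 - 1.0)  # positive only when both inputs are on
    return h1 - 2.0 * h2      # cancel the "both on" case

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), xor_net(a, b))
```

The second hidden unit stays at zero until both inputs are on. That kink from ReLU is what lets the output rise for a single active input and fall back to zero when both are active.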

Nonlinearity in one sentence: Linear layers are responsible for "remixing information," while activation functions decide "when to trigger certain patterns." Without the second step, networks struggle to learn truly curved, piecewise, and conditional rules.

Three Key Takeaways from This Page

Input Layer Isn't the Understanding Layer

The input layer simply feeds in sample features. True "pattern formation" happens in hidden layers, where weighted combinations and activations first emerge.

Hidden Layers Learn Intermediate Representations

Hidden layers don't directly output "final answers"—they're more like distilling information layer by layer: which information is more important, which combinations should be emphasized.

Activation Functions Determine Expressiveness

Without nonlinearity, even deep networks are just larger linear mappings; with nonlinearity, networks gain the ability to fit complex patterns.