Security & Guardrails
The more capable a model is, the more boundaries it needs. Guardrails are not a single "refusal prompt" but a systematic defense built around input, tool permissions, output content, and audit logs. Their goal is not to stop the model from doing anything, but to hold a stable balance between usability and risk.
High-risk scenarios often require not just blocking, but also recording "why it was blocked, what was blocked, and whether human review is needed."
Four Common Guardrail Positions
Pre-Input Filtering
First check if the request itself has obvious risks, such as unauthorized operations, inappropriate content, prompt injection, or suspicious data exfiltration attempts.
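As a rough illustration of pre-input filtering, the sketch below flags a request against a few regular expressions. The pattern names and the patterns themselves are assumptions made up for this example; a production system would more likely use a maintained classifier or rule set.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns for illustration only; three regexes are not a
# realistic defense against prompt injection or data exfiltration.
RISK_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore (all )?(previous|prior) instructions", re.I
    ),
    "data_exfiltration": re.compile(
        r"\b(dump|export|send)\b.*\b(passwords?|api keys?|customer records?)\b", re.I
    ),
    "unauthorized_operation": re.compile(
        r"\b(disable|bypass)\b.*\b(auth|authentication|audit|logging)\b", re.I
    ),
}


@dataclass
class InputCheck:
    allowed: bool
    flags: list = field(default_factory=list)


def check_input(text: str) -> InputCheck:
    """Return which risk categories the request text matches, if any."""
    flags = [name for name, pattern in RISK_PATTERNS.items() if pattern.search(text)]
    return InputCheck(allowed=not flags, flags=flags)
```

Returning the matched categories rather than a bare boolean lets later stages decide what to do with a flagged request instead of rejecting it on the spot.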
Tool Permission Control
Even if the model decides to call tools, not all tools should be available by default. Different users, tasks, and risk levels often correspond to different permissions.
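One way to express this is an allowlist keyed by user role, narrowed further when the request is considered high risk. The roles, tool names, and risk levels below are hypothetical and exist only to make the sketch concrete.

```python
# Hypothetical roles and tool names, chosen for illustration.
TOOL_POLICY = {
    "guest": {"search_docs"},
    "employee": {"search_docs", "read_ticket"},
    "admin": {"search_docs", "read_ticket", "issue_refund"},
}

# Tools that are withheld whenever the request is classified as high risk.
HIGH_RISK_TOOLS = {"issue_refund"}


def allowed_tools(role: str, risk_level: str) -> set:
    """Tools available to this role at this risk level; unknown roles get none."""
    tools = set(TOOL_POLICY.get(role, set()))
    if risk_level == "high":
        tools -= HIGH_RISK_TOOLS
    return tools


def authorize_tool_call(role: str, risk_level: str, tool_name: str) -> bool:
    """Check a tool call the model has proposed before executing it."""
    return tool_name in allowed_tools(role, risk_level)
```

The default here is deny: a tool is callable only if the policy names it for that role, so a newly added tool stays unavailable until someone grants it explicitly.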
Post-Output Processing
Model output may need a second pass before it reaches the user: masking sensitive data, adding risk warnings, rewriting the answer as a refusal, enriching it with citations, or holding it for human approval.
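Data masking is the easiest of these steps to sketch. The example below redacts email addresses and card-like digit runs from generated text. Both patterns are deliberately simple assumptions for this illustration and would miss many real-world formats.

```python
import re

# Simplified patterns, assumed for this sketch only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_output(text: str) -> str:
    """Replace email addresses and card-like numbers with placeholders."""
    masked = EMAIL.sub("[email redacted]", text)
    masked = CARD.sub("[card number redacted]", masked)
    return masked


print(mask_output("Contact alice@example.com about card 4111 1111 1111 1111."))
# prints: Contact [email redacted] about card [card number redacted].
```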
Logging & Human Review
A real system should not only care about "whether it was blocked this time," but also leave reviewable records and introduce human approval workflows when necessary.
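A reviewable record might capture why a decision was made, what content it applied to, and whether a human needs to look at it. The field names below are assumptions for this sketch, not a standard schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    request_id: str
    stage: str                # e.g. "input", "tool", "output"
    action: str               # e.g. "allow", "block", "mask"
    reason: str               # why the guardrail acted
    blocked_content: str      # what was blocked, if anything
    needs_human_review: bool
    timestamp: str


def record_decision(request_id, stage, action, reason,
                    blocked_content="", needs_human_review=False) -> str:
    """Serialize one guardrail decision as a JSON line for an append-only log."""
    record = AuditRecord(
        request_id=request_id,
        stage=stage,
        action=action,
        reason=reason,
        blocked_content=blocked_content,
        needs_human_review=needs_human_review,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(record), sort_keys=True)
```

Writing one JSON object per line keeps the log easy to filter later, for example to pull every record where `needs_human_review` is true into a review queue.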
Why Security Isn't Just "Total Rejection"
Rejection is Just One Action
Some requests should be rejected outright, but others are better suited for downgrading, rewriting, warning, masking, or routing to humans, rather than applying a blanket policy.
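The idea that rejection is only one of several responses can be sketched as a small decision function. The action names mirror the options listed above; the risk score, the two boolean signals, and every threshold are hypothetical values chosen for illustration.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    MASK = "mask"
    DOWNGRADE = "downgrade"
    ESCALATE = "escalate"   # route to a human
    REJECT = "reject"


def choose_action(risk_score: float, contains_pii: bool = False,
                  irreversible: bool = False) -> Action:
    """Map risk signals to a graded response. Thresholds are illustrative."""
    if risk_score >= 0.9:
        return Action.REJECT
    if irreversible and risk_score >= 0.5:
        return Action.ESCALATE
    if contains_pii:
        return Action.MASK
    if risk_score >= 0.5:
        return Action.DOWNGRADE
    if risk_score >= 0.2:
        return Action.WARN
    return Action.ALLOW
```

Only the highest scores are rejected outright; a risky but recoverable request is downgraded, and a risky irreversible one goes to a person.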
Security Should Be Designed with Task Goals
Customer service bots, office assistants, code helpers, and medical support systems face different risks, so their guardrail strategies should differ accordingly.
False Positives Matter Too
If the system blocks many legitimate requests, usability drops rapidly. Security systems therefore need to be evaluated and tuned like any other component, with the false positive rate measured alongside the share of risky requests caught.
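A minimal sketch of that evaluation, assuming the guardrail emits a risk score per request and that a labeled set of harmful and legitimate requests is available: measure false positive rate and recall at a threshold, then pick the lowest candidate threshold that stays within a false positive budget. The function names and the sample data are made up for this example.

```python
def confusion_rates(scores, labels, threshold):
    """Return (false_positive_rate, recall) when blocking scores >= threshold.

    labels: True means the request really was harmful.
    """
    blocked = [s >= threshold for s in scores]
    fp = sum(b and not y for b, y in zip(blocked, labels))
    tn = sum(not b and not y for b, y in zip(blocked, labels))
    tp = sum(b and y for b, y in zip(blocked, labels))
    fn = sum(not b and y for b, y in zip(blocked, labels))
    fpr = fp / (fp + tn) if fp + tn else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return fpr, recall


def pick_threshold(scores, labels, candidates, max_fpr):
    """Lowest candidate threshold whose false positive rate is within max_fpr.

    Returns (threshold, fpr, recall), or None if no candidate qualifies.
    """
    for t in sorted(candidates):
        fpr, recall = confusion_rates(scores, labels, t)
        if fpr <= max_fpr:
            return t, fpr, recall
    return None


# Made-up evaluation data: five requests with risk scores and true labels.
scores = [0.95, 0.80, 0.60, 0.40, 0.10]
labels = [True, True, False, True, False]
print(pick_threshold(scores, labels, [0.3, 0.5, 0.7], max_fpr=0.1))
```

Choosing the lowest qualifying threshold keeps recall as high as the false positive budget allows, since raising the threshold can only block fewer requests.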
Security is Layered Engineering
Prompts, model alignment, tool permissions, rule engines, log auditing, and human review often work together. A single layer can rarely cover all risks.
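Layering can be sketched as a chain of independent checks, where each layer either passes the request on or returns a reason for stopping it. The two layers, the request fields, and the rules inside them are hypothetical stand-ins for real components.

```python
from typing import Callable, Optional

# A layer returns a block reason, or None to let the request continue.
Layer = Callable[[dict], Optional[str]]


def rule_engine(request: dict) -> Optional[str]:
    # Illustrative rule only.
    if "drop table" in request["text"].lower():
        return "rule_engine: destructive SQL"
    return None


def tool_permissions(request: dict) -> Optional[str]:
    # Hypothetical policy: only admins may issue refunds.
    if request.get("tool") == "issue_refund" and request.get("role") != "admin":
        return "tool_permissions: refund requires admin"
    return None


def run_layers(request: dict, layers) -> dict:
    """Run each layer in order and stop at the first one that objects."""
    for layer in layers:
        reason = layer(request)
        if reason is not None:
            return {"allowed": False, "reason": reason}
    return {"allowed": True, "reason": None}


LAYERS = [rule_engine, tool_permissions]
```

A request that slips past one layer can still be caught by the next, which is the point of not relying on any single check.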