This is a beta feature according to Algolia’s Terms of Service (“Beta Services”).
How guardrails work
When a user sends a message to an agent with guardrails enabled:
- Input check: the user’s message is classified against your input-scoped categories. If it matches a category, the agent returns the fallback response without calling the LLM.
- Agent response: if the input passes, the agent generates a response normally.
- Output check: the agent’s response is classified against your output-scoped categories. If it matches, the response is replaced with the fallback.
Guardrails use a fail-open design.
If the classification LLM is unavailable (timeout, API error, rate limit),
content is allowed through rather than blocked.
This prevents guardrail outages from disrupting your agent.
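
The three steps and the fail-open rule can be sketched as follows. This is a minimal illustration, not Algolia's implementation: `Classifier`, `Generator`, `checkFailOpen`, and `respond` are hypothetical names, and the classifier is assumed to return the matched category's fallback text, or `null` when nothing matches.

```typescript
// Hypothetical types and names: a sketch of the flow, not Algolia's implementation.
type Direction = "input" | "output";

// Returns the fallback text of the matched category, or null when nothing matches.
// May throw on timeout, API error, or rate limit.
type Classifier = (text: string, direction: Direction) => Promise<string | null>;
type Generator = (message: string) => Promise<string>;

// Fail-open wrapper: a classifier failure is treated as "no violation".
async function checkFailOpen(
  classify: Classifier,
  text: string,
  direction: Direction,
): Promise<string | null> {
  try {
    return await classify(text, direction);
  } catch {
    return null; // classifier unavailable: let the content through
  }
}

async function respond(
  message: string,
  classify: Classifier,
  generate: Generator,
): Promise<string> {
  // 1. Input check: on a match, return the fallback without calling the agent LLM.
  const inputFallback = await checkFailOpen(classify, message, "input");
  if (inputFallback !== null) return inputFallback;

  // 2. Agent response: generated normally when the input passes.
  const answer = await generate(message);

  // 3. Output check: on a match, replace the response with the fallback.
  const outputFallback = await checkFailOpen(classify, answer, "output");
  return outputFallback ?? answer;
}
```

In this sketch, a classifier that always throws makes `respond` behave as if guardrails were off, which is the trade-off a fail-open design accepts.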
Set up guardrails
Configure a provider
Guardrails need an LLM provider for classification.
You can use the same provider as your agent or a different one.
- Go to your agent in the Agent Studio dashboard.
- Open the Safety controls tab.
- Enable Guardrails.
- Select a provider and model.
Define your agent's scope
Describe your agent’s domain to give the classifier context.
For example: “Customer support agent for an electronics store.” This helps the classifier distinguish legitimate queries from off-topic content.
Add violation categories
Each category defines a type of content to block:
- Name: identifier for the category (for example, `competitor_mentions`)
- Scope: `input` (user messages), `output` (agent responses), or `both`
- Description: what content this category catches
- Fallback response: message returned when this category triggers
Category scope
Each category is scoped to control when it’s checked:
| Scope | Checks user input | Checks agent output |
|---|---|---|
| `input` | Yes | No |
| `output` | No | Yes |
| `both` | Yes | Yes |
Use `input` scope for categories that filter what users can ask (off-topic questions, prompt injection attempts).
Use `output` scope for categories that filter what the agent can say (confidential information, competitor mentions).
Use `both` when the same policy applies in both directions.
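
The scope rules reduce to a one-line predicate. The sketch below is illustrative only: `appliesTo` and the sample category names are hypothetical, not part of the product API.

```typescript
type Scope = "input" | "output" | "both";
type Direction = "input" | "output";

// True when a category with the given scope is checked for the given direction.
function appliesTo(scope: Scope, direction: Direction): boolean {
  return scope === "both" || scope === direction;
}

// Hypothetical categories for illustration.
const categories: { name: string; scope: Scope }[] = [
  { name: "prompt_injection", scope: "input" },
  { name: "competitor_mentions", scope: "output" },
  { name: "profanity", scope: "both" },
];

const checkedOnInput = categories
  .filter((c) => appliesTo(c.scope, "input"))
  .map((c) => c.name);
// checkedOnInput is ["prompt_injection", "profanity"]
```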
Fallback responses
When content is blocked, the user sees the category’s fallback response. If no fallback is configured, a default message is used:
- Input violations: “I cannot process this request.”
- Output violations: “I cannot provide this response.”
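
A custom client that needs to reproduce this behavior locally could resolve the message as sketched below. `resolveFallback` is a hypothetical helper, and treating an empty or whitespace-only string as "not configured" is an assumption rather than documented behavior.

```typescript
type Direction = "input" | "output";

// Default messages, quoted from the list above.
const DEFAULT_FALLBACK: Record<Direction, string> = {
  input: "I cannot process this request.",
  output: "I cannot provide this response.",
};

// Returns the category's fallback if one is configured, else the default for the direction.
// Assumption: an empty or whitespace-only string counts as "not configured".
function resolveFallback(direction: Direction, fallbackResponse?: string): string {
  if (fallbackResponse !== undefined && fallbackResponse.trim() !== "") {
    return fallbackResponse;
  }
  return DEFAULT_FALLBACK[direction];
}
```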
Streaming behavior
For streaming responses, guardrails work differently for input and output:
- Input guardrails run concurrently with the LLM stream. If a violation is detected mid-stream, a violation event is emitted and the client discards any already-streamed content.
- Output guardrails classify the full response after streaming completes. If a violation is detected, a violation event is emitted as the final chunk. The client replaces the streamed content with the fallback response.

In both cases, the Vercel AI SDK handles the replacement automatically. If you’re building a custom integration, handle the `guardrailViolation` event in your streaming parser.
API configuration
Configure guardrails in the agent’s `config.guardrail` object, using the fields described in the reference tables that follow.
Configuration reference
| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | boolean | `false` | Turn guardrails on or off |
| `required` | boolean | `false` | Return 503 if the guardrail provider can’t be initialized |
| `providerId` | string | none | UUID of the provider authentication for classification |
| `model` | string | none | Model name for the classification LLM |
| `scope` | string | none | Description of the agent’s domain (max 1,024 characters) |
| `categories` | array | `[]` | List of violation categories |
Category reference
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Category identifier (1-64 characters) |
| `scope` | string | `both` | `input`, `output`, or `both` |
| `description` | string | none | What content this category catches (max 1,024 characters) |
| `fallbackResponse` | string | none | Message returned when this category triggers |
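
The two reference tables can be summarized as a typed object with a small length check. This is a sketch under assumptions: the field names and limits come from the tables, but which fields are optional, the `validateGuardrail` helper, the placeholder `providerId` and `model` values, and the sample category text are all hypothetical. Serializing `example` with `JSON.stringify` gives the JSON shape that would sit under `config.guardrail`.

```typescript
type Scope = "input" | "output" | "both";

interface GuardrailCategory {
  name: string;              // 1-64 characters
  scope?: Scope;             // defaults to "both"
  description?: string;      // max 1,024 characters
  fallbackResponse?: string; // message returned when the category triggers
}

interface GuardrailConfig {
  enabled: boolean;
  required?: boolean;        // return 503 if the provider can't be initialized
  providerId?: string;       // UUID of the provider authentication
  model?: string;            // classification model name
  scope?: string;            // agent domain, max 1,024 characters
  categories?: GuardrailCategory[];
}

// Hypothetical client-side check of the documented length limits.
function validateGuardrail(config: GuardrailConfig): string[] {
  const errors: string[] = [];
  if ((config.scope ?? "").length > 1024) {
    errors.push("scope exceeds 1,024 characters");
  }
  for (const c of config.categories ?? []) {
    if (c.name.length < 1 || c.name.length > 64) {
      errors.push(`category "${c.name}": name must be 1-64 characters`);
    }
    if ((c.description ?? "").length > 1024) {
      errors.push(`category "${c.name}": description exceeds 1,024 characters`);
    }
  }
  return errors;
}

const example: GuardrailConfig = {
  enabled: true,
  required: false,
  providerId: "00000000-0000-0000-0000-000000000000", // placeholder UUID
  model: "your-classification-model",                  // placeholder model name
  scope: "Customer support agent for an electronics store.",
  categories: [
    {
      name: "competitor_mentions",
      scope: "output",
      description: "Responses that recommend or compare competing stores.", // illustrative
      fallbackResponse: "I can only help with questions about our products.", // illustrative
    },
  ],
};
```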
Handling guardrail events
If you’re using the Vercel AI SDK, guardrail violations are handled automatically. The SDK replaces blocked content with the fallback response. For custom integrations, handle these streaming events:
| Event (AI SDK v5) | Event (AI SDK v4) | Description |
|---|---|---|
| `data-guardrail-violation` | `guardrailViolation` data chunk | Content was blocked. Contains `category`, `guardrailType`, and `fallbackResponse`. |
When you receive a violation event:
- Discard any content already streamed for this message.
- Display the `fallbackResponse` from the event data.
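
For a custom integration, the discard-and-replace behavior might look like the sketch below. It assumes the stream has already been parsed into chunk objects. The `Chunk` shape, with a `text-delta` chunk carrying `delta` and a `data-guardrail-violation` chunk carrying `data`, is an assumption modeled on the v5 event name in the table, and `renderMessage` is a hypothetical helper. Check the actual wire format of your stream before relying on it.

```typescript
// Fields listed in the event table above. The type of guardrailType is assumed to be a string.
interface GuardrailViolation {
  category: string;
  guardrailType: string;
  fallbackResponse: string;
}

// Assumed shape of already-parsed stream chunks for one message.
type Chunk =
  | { type: "text-delta"; delta: string }
  | { type: "data-guardrail-violation"; data: GuardrailViolation };

// Reduces the chunks of one message to the text that should be displayed.
function renderMessage(chunks: Chunk[]): string {
  let text = "";
  for (const chunk of chunks) {
    if (chunk.type === "data-guardrail-violation") {
      // Discard anything already streamed and show the fallback instead.
      return chunk.data.fallbackResponse;
    }
    text += chunk.delta;
  }
  return text;
}
```

Returning as soon as the violation arrives covers both cases: an input violation emitted mid-stream and an output violation emitted as the final chunk.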
Troubleshooting
Guardrail not blocking content
- Verify the category’s `scope` matches the direction you’re testing (input vs output).
- Check the category `description` is specific enough for the classifier to detect violations.
- Test with clear violations first, then refine descriptions for edge cases.
“Review your guardrail model configuration” error
This error appears in the playground when the guardrail provider can’t classify content. Common causes:
- Invalid or expired API key on the provider
- Wrong model name
- Provider rate limit exceeded
Slow response times
Guardrail classification adds latency to each request. To minimize this:
- Use a fast, low-latency model for classification
- Keep the number of categories under eight
- Keep category descriptions concise