Guardrails - Algolia

This is a beta feature according to Algolia’s Terms of Service (“Beta Services”).

Guardrails classify user messages and agent responses against categories you define, blocking content that violates your policies. When content is blocked, a fallback response is returned instead.

How guardrails work

When a user sends a message to an agent with guardrails enabled:

Input check: the user’s message is classified against your input-scoped categories. If it matches a category, the agent returns the fallback response without calling the LLM.
Agent response: if the input passes, the agent generates a response normally.
Output check: the agent’s response is classified against your output-scoped categories. If it matches, the response is replaced with the fallback.

Classification uses a separate LLM call with a dedicated model and provider. This keeps guardrail logic independent from the agent’s main model.

Guardrails use a fail-open design. If the classification LLM is unavailable (timeout, API error, rate limit), content is allowed through rather than blocked. This prevents guardrail outages from disrupting your agent.
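
The three steps and the fail-open behavior can be sketched roughly as follows. This is an illustration under stated assumptions, not Algolia's implementation: `classify`, `generate`, `check`, and `handle_message` are hypothetical names, and the two fallback strings are the default messages documented on this page.

```python
from typing import Callable, Optional

# Assumed shape: classify(text, direction) returns the name of the matched
# category, or None if nothing matched. It raises when the provider fails.
Classifier = Callable[[str, str], Optional[str]]


def check(classify: Classifier, text: str, direction: str) -> Optional[str]:
    """Return the violated category name, or None. Fails open on errors."""
    try:
        return classify(text, direction)
    except Exception:
        # Fail-open: a timeout, API error, or rate limit lets content through.
        return None


def handle_message(
    message: str,
    classify: Classifier,
    generate: Callable[[str], str],
    input_fallback: str = "I cannot process this request.",
    output_fallback: str = "I cannot provide this response.",
) -> str:
    # 1. Input check: a match returns the fallback without calling the LLM.
    if check(classify, message, "input") is not None:
        return input_fallback
    # 2. Agent response: generated normally when the input passes.
    response = generate(message)
    # 3. Output check: a match replaces the response with the fallback.
    if check(classify, response, "output") is not None:
        return output_fallback
    return response
```

Catching every exception in `check` is what makes the sketch fail open: a classifier outage degrades to "no guardrail" rather than "no agent".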

Set up guardrails

Configure a provider

Guardrails need an LLM provider for classification. You can use the same provider as your agent or a different one.

Go to your agent in the Agent Studio dashboard.
Open the Safety controls tab.
Enable Guardrails.
Select a provider and model.

Use a fast, low-latency model for classification. Larger models may improve accuracy but add latency to every request.

Define your agent's scope

Describe your agent’s domain to give the classifier context. For example: “Customer support agent for an electronics store.” This helps the classifier distinguish legitimate queries from off-topic content.

Add violation categories

Each category defines a type of content to block:

Name: identifier for the category (for example, competitor_mentions)
Scope: input (user messages), output (agent responses), or both
Description: what content this category catches
Fallback response: message returned when this category triggers

Add categories that match your use case. For example, an e-commerce agent might block competitor mentions, off-topic questions, and inappropriate content.

Consider consolidating categories if you have more than eight. Too many categories can reduce classification accuracy.

Test in the playground

Send messages that should be blocked and messages that should pass through. The playground shows guardrail violations and provider errors so you can verify your configuration before deploying.

Category scope

Each category is scoped to control when it’s checked:

Scope	Checks user input	Checks agent output
`input`	Yes	No
`output`	No	Yes
`both`	Yes	Yes

Use input scope for categories that filter what users can ask (off-topic questions, prompt injection attempts). Use output scope for categories that filter what the agent can say (confidential information, competitor mentions). Use both when the same policy applies to both directions.
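
The scope table can be read as a simple filter over the configured categories. The sketch below is illustrative only: `categories_for` is a hypothetical helper, and it assumes a category without a `scope` field is treated as `both`, matching the default in the category reference.

```python
from typing import Dict, List


def categories_for(categories: List[Dict], direction: str) -> List[Dict]:
    """Select the categories checked for a direction ("input" or "output")."""
    if direction not in ("input", "output"):
        raise ValueError(f"unknown direction: {direction!r}")
    # A category applies when its scope names this direction or is "both".
    return [
        c for c in categories
        if c.get("scope", "both") in (direction, "both")
    ]
```

With the example configuration on this page, `off_topic` would be checked only on user input, while `inappropriate` would be checked in both directions.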

Fallback responses

When content is blocked, the user sees the category’s fallback response. If no fallback is configured, a default message is used:

Input violations: “I cannot process this request.”
Output violations: “I cannot provide this response.”

Configure specific fallback responses to guide users toward acceptable queries. For example: “I can only help with questions about our products and services.”
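
The fallback lookup described above might look like this in client-side or test code. `resolve_fallback` and `DEFAULT_FALLBACKS` are hypothetical names; the default strings are the ones listed above. Treating an empty `fallbackResponse` the same as a missing one is an assumption of this sketch.

```python
from typing import Dict

# Default messages documented for each direction.
DEFAULT_FALLBACKS = {
    "input": "I cannot process this request.",
    "output": "I cannot provide this response.",
}


def resolve_fallback(category: Dict, direction: str) -> str:
    """Return the category's fallback, or the default for the direction."""
    # Assumption: an empty string counts as "no fallback configured".
    return category.get("fallbackResponse") or DEFAULT_FALLBACKS[direction]
```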

Streaming behavior

For streaming responses, guardrails work differently for input and output:

Input guardrails: run concurrently with the LLM stream. If a violation is detected mid-stream, a violation event is emitted and the client discards any already-streamed content.
Output guardrails: classify the full response after streaming completes. If a violation is detected, a violation event is emitted as the final chunk, and the client replaces the streamed content with the fallback response.

In both cases, the Vercel AI SDK handles the replacement automatically. If you’re building a custom integration, handle the guardrail violation event (`data-guardrail-violation` in AI SDK v5, `guardrailViolation` in AI SDK v4) in your streaming parser.

API configuration

Configure guardrails in the agent’s config.guardrail object:

JSON

{
  "config": {
    "guardrail": {
      "enabled": true,
      "providerId": "PROVIDER_UUID",
      "model": "gpt-4.1-mini",
      "scope": "Customer support agent for an electronics store.",
      "categories": [
        {
          "name": "off_topic",
          "scope": "input",
          "description": "Questions unrelated to electronics or the store.",
          "fallbackResponse": "I can only help with electronics questions."
        },
        {
          "name": "inappropriate",
          "scope": "both",
          "description": "Offensive, hateful, or sexually explicit content."
        }
      ]
    }
  }
}

Configuration reference

Field	Type	Default	Description
`enabled`	boolean	`false`	Turn guardrails on or off
`required`	boolean	`false`	Return 503 if the guardrail provider can’t be initialized
`providerId`	string	—	UUID of the provider authentication for classification
`model`	string	—	Model name for the classification LLM
`scope`	string	—	Description of the agent’s domain (max 1,024 characters)
`categories`	array	`[]`	List of violation categories

Category reference

Field	Type	Default	Description
`name`	string	required	Category identifier (1-64 characters)
`scope`	string	`both`	`input`, `output`, or `both`
`description`	string	—	What content this category catches (max 1,024 characters)
`fallbackResponse`	string	—	Message returned when this category triggers
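
Before sending a configuration to the API, it can help to check it locally against the limits in the two reference tables. The validator below is a minimal sketch: `validate_guardrail` is a hypothetical helper, and it covers only the documented limits (name length, scope values, 1,024-character maximums). How the API itself reports violations of these limits isn't covered here.

```python
from typing import Dict, List

VALID_SCOPES = {"input", "output", "both"}
MAX_TEXT_LENGTH = 1024  # documented limit for scope and description


def validate_guardrail(guardrail: Dict) -> List[str]:
    """Return a list of problems found in a config.guardrail object."""
    errors: List[str] = []
    if len(guardrail.get("scope", "")) > MAX_TEXT_LENGTH:
        errors.append("scope: longer than 1,024 characters")
    for i, category in enumerate(guardrail.get("categories", [])):
        name = category.get("name")
        if not isinstance(name, str) or not 1 <= len(name) <= 64:
            errors.append(f"categories[{i}].name: must be 1-64 characters")
        # A missing scope falls back to the documented default, "both".
        if category.get("scope", "both") not in VALID_SCOPES:
            errors.append(f"categories[{i}].scope: must be input, output, or both")
        if len(category.get("description", "")) > MAX_TEXT_LENGTH:
            errors.append(f"categories[{i}].description: longer than 1,024 characters")
    return errors
```

An empty list means the object passed every check in this sketch.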

Handling guardrail events

If you’re using the Vercel AI SDK, guardrail violations are handled automatically. The SDK replaces blocked content with the fallback response. For custom integrations, handle these streaming events:

Event (AI SDK v5)	Event (AI SDK v4)	Description
`data-guardrail-violation`	`guardrailViolation` data chunk	Content was blocked. Contains `category`, `guardrailType`, and `fallbackResponse`.

When you receive a violation event:

Discard any content already streamed for this message.
Display the fallbackResponse from the event data.
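
For a custom integration, the two steps above could be handled along these lines. This sketch assumes the stream has already been parsed into dictionaries and that events look like `{"type": "text-delta", "delta": ...}` and `{"type": "data-guardrail-violation", "data": {...}}`. That envelope is an assumption for illustration; only the `category`, `guardrailType`, and `fallbackResponse` fields come from the table above. `render_stream` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def render_stream(events: Iterable[Dict]) -> str:
    """Return the text to display once the stream has ended."""
    chunks: List[str] = []
    for event in events:
        kind = event.get("type")
        if kind == "text-delta":
            # Normal content: accumulate it for display.
            chunks.append(event.get("delta", ""))
        elif kind == "data-guardrail-violation":
            # Discard everything streamed so far and show the fallback.
            return event["data"]["fallbackResponse"]
    return "".join(chunks)
```

A real client would typically render chunks as they arrive and clear the message when the violation event shows up. Returning the final text keeps the sketch small.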

Troubleshooting

Guardrail not blocking content

Verify the category’s scope matches the direction you’re testing (input vs output).
Check that the category description is specific enough for the classifier to detect violations.
Test with clear violations first, then refine descriptions for edge cases.

“Review your guardrail model configuration” error

This error appears in the playground when the guardrail provider can’t classify content. Common causes:

Invalid or expired API key on the provider
Wrong model name
Provider rate limit exceeded

Check your provider settings and try again. This error doesn’t appear in production, where guardrails fail open silently.

Slow response times

Guardrail classification adds latency to each request. To minimize this:

Use a fast, low-latency model for classification
Keep the number of categories to eight or fewer
Keep category descriptions concise