Reduce token usage

Ask AI is part of Agent Studio and isn’t available as a standalone feature for new applications. Use these docs for existing Ask AI implementations. Migration guides will be added when available.

Ask AI’s token usage, and your costs, depend on how much content it sends to the large language model (LLM). To reduce tokens while maintaining response quality, apply these strategies.

| Strategy | How it helps | How to apply |
| --- | --- | --- |
| Reduce hits per LLM request | Ask AI includes 7 search results (“hits”) by default in each LLM request. This provides useful context but increases the number of tokens sent, which raises costs. Reducing the number of hits lowers token usage. | In your Ask AI assistant configuration, change the “Set a maximum number of search hits per LLM request” setting. |
| Split records into smaller chunks | Smaller record chunks help the LLM receive only relevant context. For example, split long documentation pages into smaller records based on headings. | See Markdown indexing. |
| Shorten large records | Return only the most relevant excerpts from records instead of the full content. | Use the `attributesToSnippet` parameter or configure it in the Algolia dashboard. |
| Limit record size | Prevent overly large records from inflating token usage. Set a maximum record size to truncate long records before indexing. | Use the `maxRecordBytes` parameter when indexing content. |
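
To make the chunking and size-limit strategies concrete, here is a minimal Python sketch that runs before records are sent for indexing. It is an illustration only, not how Ask AI or Markdown indexing is implemented. The function names `split_by_headings` and `truncate_utf8` are hypothetical, and the byte-based truncation is an assumption about what a limit such as `maxRecordBytes` is meant to achieve.

```python
import re


def split_by_headings(markdown: str) -> list[dict]:
    """Split a Markdown page into one record per heading section.

    Hypothetical helper: each record holds the heading text and the
    body that follows it, up to the next heading of any level.
    """
    records = []
    title, lines = "", []

    def flush():
        # Emit the section collected so far, skipping empty preambles.
        if title or any(line.strip() for line in lines):
            records.append({"title": title, "content": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return records


def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character.

    Assumption: this mimics the intent of a record size limit; the real
    parameter may measure or truncate records differently.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a trailing partial multi-byte sequence.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


page = "# Intro\nHello\n\n## Setup\nInstall it.\n"
records = [
    {**record, "content": truncate_utf8(record["content"], 2000)}
    for record in split_by_headings(page)
]
```

This sketch treats any line starting with `#` as a heading, so a `#` comment inside a fenced code block would also start a new record. A production splitter would need to track code fences.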

Last modified on May 29, 2026
