# LLM Providers
Reactive Agents supports multiple LLM providers through a unified LLMService interface. Switch providers with a single line — your agent code stays the same.
## Supported Providers

| Provider | Models | Tool Calling | Streaming | Embeddings | Prompt Caching |
|---|---|---|---|---|---|
| Anthropic | Claude 3.5 Haiku, Claude Sonnet 4, Claude Opus 4 | Yes | Yes | No (use OpenAI) | Yes (explicit) |
| OpenAI | GPT-4o, GPT-4o-mini | Yes | Yes | Yes | Yes (automatic) |
| Google Gemini | Gemini 2.0 Flash, Gemini 2.5 Flash, Gemini 2.5 Pro | Yes | Yes | No | Yes (automatic) |
| Ollama | Any locally hosted model | Yes | Yes | Yes | No |
| LiteLLM | 100+ models via LiteLLM proxy | Yes | Yes | No | Depends |
| Test | Mock provider for testing (`withTestScenario`) | Yes* | Yes* | No | No |
\*The test provider advertises native tool calling so kernels exercise the same function-calling path as real providers; responses remain fully deterministic, driven by your scenario.
## Configuration

Set your API key in `.env` and specify the provider:
```ts
import { ReactiveAgents } from "reactive-agents";

// Anthropic
const anthropicAgent = await ReactiveAgents.create()
  .withProvider("anthropic")
  .withModel("claude-sonnet-4-20250514")
  .build();

// OpenAI
const openaiAgent = await ReactiveAgents.create()
  .withProvider("openai")
  .withModel("gpt-4o")
  .build();

// Google Gemini
const geminiAgent = await ReactiveAgents.create()
  .withProvider("gemini")
  .withModel("gemini-2.5-flash")
  .build();

// Ollama (local)
const ollamaAgent = await ReactiveAgents.create()
  .withProvider("ollama")
  .withModel("cogito")
  .build();

// LiteLLM proxy (100+ models)
const litellmAgent = await ReactiveAgents.create()
  .withProvider("litellm")
  .withModel("gpt-4o")
  .build();
```
### Environment Variables

```bash
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
OLLAMA_ENDPOINT=http://localhost:11434   # defaults to this
LITELLM_BASE_URL=http://localhost:4000   # LiteLLM proxy endpoint

TAVILY_API_KEY=tvly-...            # web search: Tavily (primary)
BRAVE_SEARCH_API_KEY=BSA...        # web search: Brave (fallback)
SERPER_API_KEY=...                 # web search: Serper/Google (fallback)

LLM_DEFAULT_MODEL=claude-sonnet-4-20250514
LLM_DEFAULT_TEMPERATURE=0.7
LLM_MAX_RETRIES=3
LLM_TIMEOUT_MS=30000
```
## Web Search Providers

The built-in web-search tool supports four providers that are tried in priority order. The first provider that returns usable results wins; the rest are skipped. No configuration is required to use DuckDuckGo (the no-key fallback).
| Provider | Env var | API key required | Notes |
|---|---|---|---|
| Tavily | `TAVILY_API_KEY` | Yes | High-quality results; primary recommended provider |
| Brave | `BRAVE_SEARCH_API_KEY` or `BRAVE_API_KEY` | Yes | Full-web coverage; good Tavily fallback |
| Serper | `SERPER_API_KEY` | Yes | Google-backed results; 2,500 free queries/month, low-cost plans |
| DuckDuckGo | (none) | No | Instant answers only; limited coverage but always available |
### Provider chain

Tavily → Brave → Serper → DuckDuckGo

Each provider is skipped automatically if its API key is not set. If a provider returns an error or no usable rows, the chain continues to the next one.
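To make the fallback behavior concrete, here is a minimal sketch of this kind of chain. The `SearchProvider` interface and helper are illustrative assumptions, not the framework's actual internals:

```ts
// Illustrative sketch only; the framework's real internals may differ.
interface SearchResult {
  title: string;
  url: string;
  snippet: string;
}

interface SearchProvider {
  name: string;
  envVar?: string; // API key env var; undefined means no key needed (DuckDuckGo)
  search(query: string): Promise<SearchResult[]>;
}

async function searchWithFallback(
  providers: SearchProvider[],
  query: string,
): Promise<SearchResult[]> {
  for (const provider of providers) {
    // Skip providers whose API key is not configured
    if (provider.envVar && !process.env[provider.envVar]) continue;
    try {
      const results = await provider.search(query);
      if (results.length > 0) return results; // first usable results win
    } catch {
      // On error, fall through to the next provider in the chain
    }
  }
  return []; // even the no-key fallback returned nothing
}
```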
### Enabling Serper

Serper proxies Google Search results and is a good option when Tavily quota is exhausted or when you want low-cost, high-volume search. Sign up at serper.dev to get an API key.
```bash
SERPER_API_KEY=your-serper-api-key
```

```ts
// No code changes needed — set the env var and web-search uses Serper automatically
const agent = await ReactiveAgents.create()
  .withProvider("anthropic")
  .withTools(["web-search"])
  .build();

// The agent will now use Tavily → Brave → Serper → DuckDuckGo as its search chain
```
## Model Presets

| Preset | Provider | Cost / 1M Input Tokens | Context Window | Quality |
|---|---|---|---|---|
| `claude-haiku` | Anthropic | $1.00 | 200K | 0.60 |
| `claude-sonnet` | Anthropic | $3.00 | 200K | 0.85 |
| `claude-opus` | Anthropic | $15.00 | 1M | 1.00 |
| `gpt-4o-mini` | OpenAI | $0.15 | 128K | 0.55 |
| `gpt-4o` | OpenAI | $2.50 | 128K | 0.80 |
| `gemini-2.0-flash` | Gemini | $0.10 | 1M | 0.75 |
| `gemini-2.5-flash` | Gemini | $0.15 | 1M | 0.80 |
| `gemini-2.5-pro` | Gemini | $1.25 | 1M | 0.95 |
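Input cost scales linearly with tokens, so the table translates directly into estimates. A hypothetical helper (not part of the framework) with rates copied from the table:

```ts
// Hypothetical helper for illustration; rates copied from the preset table above.
const INPUT_COST_PER_MILLION: Record<string, number> = {
  "claude-haiku": 1.0,
  "claude-sonnet": 3.0,
  "claude-opus": 15.0,
  "gpt-4o-mini": 0.15,
  "gpt-4o": 2.5,
  "gemini-2.0-flash": 0.1,
  "gemini-2.5-flash": 0.15,
  "gemini-2.5-pro": 1.25,
};

function estimateInputCost(preset: string, inputTokens: number): number {
  const rate = INPUT_COST_PER_MILLION[preset];
  if (rate === undefined) throw new Error(`Unknown preset: ${preset}`);
  return (inputTokens / 1_000_000) * rate;
}

// e.g. a 50K-token prompt on gpt-4o: (50_000 / 1_000_000) * 2.50 = $0.125
console.log(estimateInputCost("gpt-4o", 50_000)); // 0.125
```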
## Tool Calling

When tools are enabled, each provider translates tool definitions to its native format automatically:
- Anthropic — `tools` parameter with Anthropic's `tool_use` format; the last tool is marked with `cache_control` to cache the full schema block
- OpenAI — `tools` array with function calling; automatic prompt caching applies to tool schemas
- Gemini — `functionDeclarations` in the `tools` array; function calling supported natively
- Ollama — OpenAI-compatible `tools` array via the Ollama SDK
- LiteLLM — OpenAI-compatible `tools` array forwarded to the proxy
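For a sense of what this translation looks like, here is a sketch of one tool definition rendered in the Anthropic and OpenAI wire formats. The generic `toolDef` shape is an assumption for illustration; the two target shapes follow the public Anthropic and OpenAI APIs:

```ts
// A generic tool definition (shape assumed for illustration)
const toolDef = {
  name: "web_search",
  description: "Search the web for current information",
  parameters: {
    type: "object",
    properties: { query: { type: "string" } },
    required: ["query"],
  },
};

// Anthropic native format: input_schema, with cache_control on the last tool
const anthropicTool = {
  name: toolDef.name,
  description: toolDef.description,
  input_schema: toolDef.parameters,
  cache_control: { type: "ephemeral" }, // only on the final tool in the array
};

// OpenAI native format: wrapped in { type: "function" }
const openaiTool = {
  type: "function",
  function: {
    name: toolDef.name,
    description: toolDef.description,
    parameters: toolDef.parameters,
  },
};
```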
## Prompt Caching

Each provider implements caching differently. The framework handles cost discounting automatically when the provider reports cached token usage.
### Anthropic — Explicit cache_control

Anthropic uses manual cache hints via `cache_control: { type: "ephemeral" }` blocks. The framework automatically applies these to system prompts ≥ 1,024 tokens and to the full tool schema block on every request:

- System prompt: cached when ≥ ~4,096 chars (≈ 1,024 tokens) — 90% discount on cache hits, 25% surcharge on cache writes
- Tool schemas: the last tool in the array is marked, caching the full schema block
Cache TTL is 5 minutes. The framework handles this transparently — no configuration required.
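As a reference, this is roughly what the resulting request body looks like. The shape follows Anthropic's public Messages API; the framework's internal plumbing is not shown:

```ts
// Sketch of a request body with cache hints applied (Anthropic Messages API shape)
const longSystemPrompt = "...a system prompt of >= ~4,096 chars (~1,024 tokens)...";

const request = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: longSystemPrompt,
      cache_control: { type: "ephemeral" }, // cached for ~5 minutes
    },
  ],
  tools: [
    // ...all tools except the last are plain...
    {
      name: "web_search",
      description: "Search the web",
      input_schema: {
        type: "object",
        properties: { query: { type: "string" } },
      },
      cache_control: { type: "ephemeral" }, // marking the last tool caches the whole schema block
    },
  ],
  messages: [{ role: "user", content: "..." }],
};
```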
### Gemini — Automatic Implicit Caching

Gemini 2.0 Flash and 2.5 models support automatic context caching — Google's servers cache repeated prefixes server-side with no client code required. When a cache hit occurs, `cachedContentTokenCount` is returned in the usage metadata and the framework applies a 75% cost discount automatically.
There is no minimum token requirement for implicit caching — Google manages it transparently for eligible models.
```ts
// No special config needed — Gemini caches automatically
const agent = await ReactiveAgents.create()
  .withProvider("gemini")
  .withModel("gemini-2.5-flash")
  .withTools()
  .build();

// Repeated system prompts and tool schemas are cached by Gemini automatically
```

### OpenAI — Automatic Caching

OpenAI applies automatic prompt caching server-side for inputs longer than 1,024 tokens. Cached tokens are returned as `cached_tokens` in the usage object and the framework applies a 50% cost discount automatically.
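Putting the three discount rules side by side, here is a hypothetical cost calculation. The discount rates come from this page; the helper itself is illustrative, not the framework's code:

```ts
// Illustrative only; discount rates as documented above.
// Anthropic: 90% off cache hits; Gemini: 75% off; OpenAI: 50% off.
const CACHE_DISCOUNT: Record<string, number> = {
  anthropic: 0.9,
  gemini: 0.75,
  openai: 0.5,
};

function effectiveInputTokens(
  provider: string,
  inputTokens: number,
  cachedTokens: number, // e.g. cached_tokens / cachedContentTokenCount from usage
): number {
  const discount = CACHE_DISCOUNT[provider] ?? 0;
  // Cached tokens are billed at (1 - discount); the rest at full price
  return (inputTokens - cachedTokens) + cachedTokens * (1 - discount);
}

// 10,000-token prompt with 8,000 cached tokens on OpenAI:
// 2,000 + 8,000 * 0.5 = 6,000 effective billable input tokens
console.log(effectiveInputTokens("openai", 10_000, 8_000)); // 6000
```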
## Provider Adapters

Provider adapters are lightweight hook objects the kernel calls at specific points to compensate for model-specific behavior differences — especially useful for local and mid-tier models that need more explicit guidance.
The framework ships three built-in adapters selected automatically by model tier:
| Tier | Adapter | Behavior |
|---|---|---|
| `local` | `localModelAdapter` | Explicit task framing, tool guidance, error recovery, quality check |
| `mid` | `midModelAdapter` | Lighter continuation hint + synthesis prompt |
| `large` / `frontier` | `defaultAdapter` | Structured decision framework only |
### Adapter Hooks (7 total)

| Hook | When it fires | What it does |
|---|---|---|
| `systemPromptPatch` | Once at system prompt build time | Append multi-step completion instructions (local tier) |
| `toolGuidance` | Once after the tool schema block in the system prompt | Append inline required-tool reminder |
| `taskFraming` | First iteration only (iteration 0) | Wrap task message with explicit numbered steps |
| `continuationHint` | Each iteration when required tools are still pending | Inject guidance as user message after tool results |
| `errorRecovery` | When a tool call returns a failed result | Append context-aware recovery hint to the observation |
| `synthesisPrompt` | Research→produce transition (all search tools satisfied) | Replace generic progress message with “write it now” |
| `qualityCheck` | Once before final answer (gated by `qualityCheckDone` flag) | Self-eval prompt; fires only once to prevent loops |
You can register a fully custom adapter, and the built-in adapters are exported for inspection or extension:

```ts
import { selectAdapter } from "@reactive-agents/llm-provider";

// The built-in adapters are selected automatically by tier.
// Access them directly for inspection or extension:
import {
  localModelAdapter,
  midModelAdapter,
  defaultAdapter,
} from "@reactive-agents/llm-provider";
```
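Based on the hook names in the table above, a custom adapter might look like the following sketch. The hook signatures here are assumptions for illustration; check the package's type definitions for the real interface:

```ts
// Sketch of a custom adapter; hook signatures are assumed, not taken from the real API.
const myCustomAdapter = {
  // Fires once at system prompt build time
  systemPromptPatch: (systemPrompt: string): string =>
    systemPrompt + "\n\nAlways cite sources for factual claims.",

  // Fires on the first iteration only
  taskFraming: (task: string): string =>
    `Complete this task step by step:\n` +
    `1. Understand the request\n2. Gather information\n3. Produce the answer\n\n` +
    `Task: ${task}`,

  // Fires when a tool call returns a failed result
  errorRecovery: (toolName: string, error: string): string =>
    `The ${toolName} call failed (${error}). Try different arguments or another tool.`,
};
```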
## Embeddings

Embeddings are routed through the configured embedding provider regardless of which chat provider you use:

```bash
EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
```

```ts
const vectors = await llm.embed(["text to embed", "another text"]);
// Returns: number[][] (one vector per input text)
```
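A common follow-up is comparing the returned vectors, for example with cosine similarity. This is a generic helper, not part of the framework:

```ts
// Generic cosine similarity over the number[][] returned by llm.embed()
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const [first, second] = await llm.embed(["text to embed", "another text"]);
console.log(cosineSimilarity(first, second)); // 1.0 = identical direction, ~0 = unrelated
```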
## Structured Output

Parse LLM responses into typed objects with automatic retry on parse failure:

```ts
import { Schema } from "effect";

const WeatherSchema = Schema.Struct({
  city: Schema.String,
  temperature: Schema.Number,
  conditions: Schema.String,
});

const weather = await llm.completeStructured({
  messages: [{ role: "user", content: "Weather in Tokyo" }],
  outputSchema: WeatherSchema,
  maxParseRetries: 2,
});
```
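If you want the static type alongside the schema, effect's `Schema.Struct` exposes it via the schema's `Type` member:

```ts
// Derive the TypeScript type from the schema (effect Schema API)
type Weather = typeof WeatherSchema.Type;
// => { readonly city: string; readonly temperature: number; readonly conditions: string }
```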
## Automatic Retry and Timeout

All providers include built-in retry logic with exponential backoff:

- Rate limit (429) — retried with backoff, tracked as `LLMRateLimitError`
- Timeout — configurable per request, defaults to 30 seconds
- Retries — configurable, defaults to 3 attempts
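The behavior described above follows the standard exponential-backoff pattern. A generic sketch, not the framework's implementation:

```ts
// Generic exponential backoff sketch; illustrative, not the framework's code.
async function withRetries<T>(
  fn: () => Promise<T>,
  maxRetries = 3,       // mirrors LLM_MAX_RETRIES
  baseDelayMs = 1_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // Delay doubles each attempt: 1s, 2s, 4s, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```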
## Testing

Use `withTestScenario()` for deterministic, offline testing:

```ts
const agent = await ReactiveAgents.create()
  .withTestScenario([
    { match: "capital of France", text: "Paris is the capital of France." },
  ])
  .build();
```
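Running the agent against the scenario then requires no network access. The `run` method name below is an assumption for illustration; use whatever entry point your agent exposes:

```ts
// Hypothetical usage; the agent entry point name is assumed, not confirmed by these docs.
const answer = await agent.run("What is the capital of France?");
// The test provider matches "capital of France" and returns the scripted text:
// "Paris is the capital of France." (fully deterministic, no API calls)
```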