
LLM Providers

Reactive Agents supports multiple LLM providers through a unified LLMService interface. Switch providers with a single line — your agent code stays the same.

| Provider | Models | Tool Calling | Streaming | Embeddings | Prompt Caching |
|---|---|---|---|---|---|
| Anthropic | Claude 3.5 Haiku, Claude Sonnet 4, Claude Opus 4 | Yes | Yes | No (use OpenAI) | Yes (explicit) |
| OpenAI | GPT-4o, GPT-4o-mini | Yes | Yes | Yes | Yes (automatic) |
| Google Gemini | Gemini 2.0 Flash, Gemini 2.5 Flash, Gemini 2.5 Pro | Yes | Yes | No | Yes (automatic) |
| Ollama | Any locally hosted model | Yes | Yes | Yes | No |
| LiteLLM | 100+ models via LiteLLM proxy | Yes | Yes | No | Depends |
| Test | Mock provider for testing (withTestScenario) | Yes* | Yes* | No | No |

*The test provider advertises native tool calling, so kernels exercise the same function-calling path as real providers; responses are still fully deterministic from your scenario.

Set your API key in .env and specify the provider:

import { ReactiveAgents } from "reactive-agents";

// Anthropic
const agent = await ReactiveAgents.create()
  .withProvider("anthropic")
  .withModel("claude-sonnet-4-20250514")
  .build();

// OpenAI
const agent = await ReactiveAgents.create()
  .withProvider("openai")
  .withModel("gpt-4o")
  .build();

// Google Gemini
const agent = await ReactiveAgents.create()
  .withProvider("gemini")
  .withModel("gemini-2.5-flash")
  .build();

// Ollama (local)
const agent = await ReactiveAgents.create()
  .withProvider("ollama")
  .withModel("cogito")
  .build();

// LiteLLM proxy (100+ models)
const agent = await ReactiveAgents.create()
  .withProvider("litellm")
  .withModel("gpt-4o")
  .build();
Environment variables (.env):
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
OLLAMA_ENDPOINT=http://localhost:11434 # defaults to this
LITELLM_BASE_URL=http://localhost:4000 # LiteLLM proxy endpoint
TAVILY_API_KEY=tvly-... # web search — Tavily (primary)
BRAVE_SEARCH_API_KEY=BSA... # web search — Brave (fallback)
SERPER_API_KEY=... # web search — Serper/Google (fallback)
LLM_DEFAULT_MODEL=claude-sonnet-4-20250514
LLM_DEFAULT_TEMPERATURE=0.7
LLM_MAX_RETRIES=3
LLM_TIMEOUT_MS=30000

The built-in web-search tool supports four providers that are tried in priority order. The first provider that returns usable results wins; the rest are skipped. No configuration is required to use DuckDuckGo (the no-key fallback).

| Provider | Env var | API key required | Notes |
|---|---|---|---|
| Tavily | TAVILY_API_KEY | Yes | High-quality results; primary recommended provider |
| Brave | BRAVE_SEARCH_API_KEY or BRAVE_API_KEY | Yes | Full-web coverage; good Tavily fallback |
| Serper | SERPER_API_KEY | Yes | Google-backed results; 2,500 free queries/month, low-cost plans |
| DuckDuckGo | (none) | No | Instant answers only; limited coverage but always available |
Priority order: Tavily → Brave → Serper → DuckDuckGo

Each provider is skipped automatically if its API key is not set. If a provider returns an error or no usable rows, the chain continues to the next one.
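
Conceptually, the chain behaves like the sketch below. The SearchProvider type and webSearchWithFallback function are illustrative only, not the framework's actual API:

type SearchRow = { title: string; url: string; snippet: string };

type SearchProvider = {
  name: string;
  enabled: () => boolean;                 // e.g. returns false when the API key is not set
  search: (query: string) => Promise<SearchRow[]>;
};

async function webSearchWithFallback(query: string, chain: SearchProvider[]): Promise<SearchRow[]> {
  for (const provider of chain) {
    if (!provider.enabled()) continue;    // provider skipped automatically without its API key
    try {
      const rows = await provider.search(query);
      if (rows.length > 0) return rows;   // first provider with usable rows wins
    } catch {
      // error from this provider: continue down the chain
    }
  }
  return [];                              // DuckDuckGo needs no key, so the chain rarely falls through
}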

Serper proxies Google Search results and is a good option when Tavily quota is exhausted or when you want low-cost, high-volume search. Sign up at serper.dev to get an API key.

SERPER_API_KEY=your-serper-api-key

// No code changes needed — set the env var and web-search uses Serper automatically
const agent = await ReactiveAgents.create()
  .withProvider("anthropic")
  .withTools(["web-search"])
  .build();
// The agent will now use Tavily → Brave → Serper → DuckDuckGo as its search chain
Model presets, with approximate input cost per million tokens, context window, and relative quality score:

| Preset | Provider | Cost / 1M input | Context window | Quality |
|---|---|---|---|---|
| claude-haiku | Anthropic | $1.00 | 200K | 0.60 |
| claude-sonnet | Anthropic | $3.00 | 200K | 0.85 |
| claude-opus | Anthropic | $15.00 | 1M | 1.00 |
| gpt-4o-mini | OpenAI | $0.15 | 128K | 0.55 |
| gpt-4o | OpenAI | $2.50 | 128K | 0.80 |
| gemini-2.0-flash | Gemini | $0.10 | 1M | 0.75 |
| gemini-2.5-flash | Gemini | $0.15 | 1M | 0.80 |
| gemini-2.5-pro | Gemini | $1.25 | 1M | 0.95 |

When tools are enabled, each provider translates tool definitions to its native format automatically:

  • Anthropic — tools parameter with Anthropic’s tool_use format; last tool marked with cache_control to cache the full schema block
  • OpenAI — tools array with function_calling; automatic prompt caching applies to tool schemas
  • Gemini — functionDeclarations in tools array; function calling supported natively
  • Ollama — OpenAI-compatible tools array via the Ollama SDK
  • LiteLLM — OpenAI-compatible tools array forwarded to proxy
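
For reference, the same tool definition rendered in each provider's native shape looks roughly like this (illustrative literals, not the framework's internal translation code):

// A provider-agnostic tool definition (hypothetical example)
const weatherTool = {
  name: "get_weather",
  description: "Get current weather for a city",
  parameters: {
    type: "object",
    properties: { city: { type: "string" } },
    required: ["city"],
  },
};

// Anthropic: input_schema, plus cache_control on the last tool in the array
const anthropicTool = {
  name: weatherTool.name,
  description: weatherTool.description,
  input_schema: weatherTool.parameters,
  cache_control: { type: "ephemeral" },
};

// OpenAI / Ollama / LiteLLM: an entry in the OpenAI-compatible tools array
const openaiTool = {
  type: "function",
  function: {
    name: weatherTool.name,
    description: weatherTool.description,
    parameters: weatherTool.parameters,
  },
};

// Gemini: functionDeclarations nested inside the tools array
const geminiTools = [
  {
    functionDeclarations: [
      {
        name: weatherTool.name,
        description: weatherTool.description,
        parameters: weatherTool.parameters,
      },
    ],
  },
];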

Each provider implements caching differently. The framework handles cost discounting automatically when the provider reports cached token usage.

Anthropic uses manual cache hints via cache_control: { type: "ephemeral" } blocks. The framework automatically applies these to system prompts ≥ 1,024 tokens and to the full tool schema block on every request:

  • System prompt: Cached when ≥ ~4,096 characters (≈ 1,024 tokens) — 90% discount on cache hits, 25% surcharge on writes
  • Tool schemas: Last tool in the array is marked, caching the full schema block

Cache TTL is 5 minutes. The framework handles this transparently — no configuration required.
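
As a sketch, the request sent down the Anthropic path looks roughly like this (placeholder values; the framework assembles the real request for you):

const longSystemPrompt = "...system prompt of at least ~1,024 tokens...";   // placeholder
const searchTool = {                                                        // placeholder tool
  name: "web_search",
  description: "Search the web",
  input_schema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
};

const request = {
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: longSystemPrompt,
      cache_control: { type: "ephemeral" },        // caches the system prompt (5-minute TTL)
    },
  ],
  tools: [
    // ...earlier tools...
    { ...searchTool, cache_control: { type: "ephemeral" } },  // marking the last tool caches the full schema block
  ],
  messages: [{ role: "user", content: "Summarize today's AI news" }],
};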

Gemini 2.0 Flash and 2.5 models support automatic context caching — Google’s servers cache repeated prefixes server-side with no client code required. When a cache hit occurs, cachedContentTokenCount is returned in the usage metadata and the framework applies a 75% cost discount automatically.

There is no minimum token requirement for implicit caching — Google manages it transparently for eligible models.

// No special config needed — Gemini caches automatically
const agent = await ReactiveAgents.create()
  .withProvider("gemini")
  .withModel("gemini-2.5-flash")
  .withTools()
  .build();
// Repeated system prompts and tool schemas are cached by Gemini automatically

OpenAI applies automatic prompt caching server-side for inputs of 1,024 tokens or more. Cached tokens are returned as cached_tokens in the usage object, and the framework applies a 50% cost discount automatically.
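
A sketch of how that discount works (illustrative accounting only; the framework does this internally when the provider reports cached usage):

type OpenAIUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
};

// pricePerMTok is the input price per million tokens (e.g. $2.50 for gpt-4o)
function inputCostUSD(usage: OpenAIUsage, pricePerMTok: number): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  const uncached = usage.prompt_tokens - cached;
  // Cached input tokens are billed at a 50% discount
  return (uncached * pricePerMTok + cached * pricePerMTok * 0.5) / 1_000_000;
}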

Provider adapters are lightweight hook objects the kernel calls at specific points to compensate for model-specific behavior differences — especially useful for local and mid-tier models that need more explicit guidance.

The framework ships three built-in adapters selected automatically by model tier:

| Tier | Adapter | Behavior |
|---|---|---|
| local | localModelAdapter | Explicit task framing, tool guidance, error recovery, quality check |
| mid | midModelAdapter | Lighter continuation hint + synthesis prompt |
| large / frontier | defaultAdapter | Structured decision framework only |
Adapters implement some or all of the following hooks:

| Hook | When it fires | What it does |
|---|---|---|
| systemPromptPatch | Once at system prompt build time | Append multi-step completion instructions (local tier) |
| toolGuidance | Once after the tool schema block in the system prompt | Append inline required-tool reminder |
| taskFraming | First iteration only (iteration 0) | Wrap task message with explicit numbered steps |
| continuationHint | Each iteration when required tools are still pending | Inject guidance as user message after tool results |
| errorRecovery | When a tool call returns a failed result | Append context-aware recovery hint to the observation |
| synthesisPrompt | Research → produce transition (all search tools satisfied) | Replace generic progress message with “write it now” |
| qualityCheck | Once before final answer (gated by qualityCheckDone flag) | Self-eval prompt; fires only once to prevent loops |

You can register a fully custom adapter, and the built-in adapters and selection helper are exported for inspection or extension:

import { selectAdapter } from "@reactive-agents/llm-provider";
// The built-in adapters are selected automatically by tier.
// Access them directly for inspection or extension:
import { localModelAdapter, midModelAdapter, defaultAdapter } from "@reactive-agents/llm-provider";
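
As a rough sketch, a custom adapter could override a subset of the hooks listed above. The hook signatures and any registration API are assumptions here, not documented on this page; check the adapter types exported by @reactive-agents/llm-provider for the real contract:

// Hypothetical adapter object; signatures assumed to take and return prompt text
const terseLocalAdapter = {
  systemPromptPatch: (prompt: string) =>
    prompt + "\nComplete every required step before giving a final answer.",
  continuationHint: () =>
    "Required tools are still pending. Call the next tool before answering.",
  errorRecovery: (observation: string) =>
    observation + "\nThe last tool call failed. Retry it with corrected arguments.",
};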

Embeddings are routed through the configured embedding provider regardless of which chat provider you use:

EMBEDDING_PROVIDER=openai
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536

// llm is an LLMService instance; embed() routes to the configured embedding provider
const vectors = await llm.embed(["text to embed", "another text"]);
// Returns: number[][] (one vector per input text)

Parse LLM responses into typed objects with automatic retry on parse failure:

import { Schema } from "effect";

const WeatherSchema = Schema.Struct({
  city: Schema.String,
  temperature: Schema.Number,
  conditions: Schema.String,
});

const weather = await llm.completeStructured({
  messages: [{ role: "user", content: "Weather in Tokyo" }],
  outputSchema: WeatherSchema,
  maxParseRetries: 2,
});

All providers include built-in retry logic with exponential backoff:

  • Rate limit (429) — Retried with backoff, tracked as LLMRateLimitError
  • Timeout — Configurable per-request, defaults to 30 seconds
  • Retries — Configurable, defaults to 3 attempts
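
Conceptually, each request is wrapped in a loop like the following (illustrative only; the actual backoff schedule and error types live inside the provider implementations):

async function withRetries<T>(call: () => Promise<T>, maxAttempts = 3, timeoutMs = 30_000): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      // Fail the attempt if the provider does not respond within the timeout
      return await Promise.race([
        call(),
        new Promise<never>((_, reject) =>
          setTimeout(() => reject(new Error("LLM request timed out")), timeoutMs),
        ),
      ]);
    } catch (err) {
      lastError = err;                                         // e.g. a 429 rate-limit error
      if (attempt < maxAttempts) {
        await new Promise((r) => setTimeout(r, 1_000 * 2 ** (attempt - 1)));   // 1s, 2s, 4s, ...
      }
    }
  }
  throw lastError;
}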

Use withTestScenario() for deterministic, offline testing:

const agent = await ReactiveAgents.create()
  .withTestScenario([
    { match: "capital of France", text: "Paris is the capital of France." },
  ])
  .build();