What's New

A quick-scan guide to what has landed in each major release. Start here when returning after time away — each bullet links to the relevant documentation.

v0.11.x — Production tooling + full observability (May 2026)

The focus: developer tooling that makes agents production-observable and repeatable, plus the first create-reactive-agent scaffolder, cross-runtime support, and three new capabilities (code-action strategy, skill persistence, interactive playground).

New packages

@reactive-agents/observe — Zero-config OpenTelemetry tracing. Set OTEL_EXPORTER_OTLP_ENDPOINT and every run emits a workflow → LLM → tool span hierarchy, OpenInference-compliant, to any OTLP backend (Jaeger, Grafana Tempo, Langfuse, Arize Phoenix). See OpenTelemetry Tracing.
@reactive-agents/replay — Deterministic trace replay. Record any run to a snapshot file and re-run it with a different model or prompt without calling the LLM again. Enables regression testing and prompt A/B comparisons. See Snapshot & Replay.
@reactive-agents/runtime-shim — Cross-runtime support. The framework now runs on Node.js 22.5+ in addition to Bun. The shim provides unified Database, spawn, serve, glob, writeFile, readFile, and hash primitives that delegate to whichever runtime is available. FTS5 is optional: when it is missing, search falls back to LIKE-based queries on Node’s built-in SQLite. This unblocks StackBlitz WebContainers (Node-only) and Vercel/Netlify deployments.
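
The delegation idea behind the shim can be sketched as a runtime check plus a capability-based fallback. This is only an illustration of the pattern: `detectRuntime`, `buildSearchQuery`, and the `docs` / `docs_fts` table names are hypothetical and are not taken from the package.

```typescript
// Hypothetical sketch of runtime detection and the FTS5 -> LIKE fallback.
// None of these names come from @reactive-agents/runtime-shim.
type Runtime = "bun" | "node" | "unknown";

function detectRuntime(): Runtime {
  const g = globalThis as any;
  if (typeof g.Bun !== "undefined") return "bun";
  if (typeof g.process?.versions?.node === "string") return "node";
  return "unknown";
}

interface SearchQuery {
  sql: string;
  params: string[];
}

function buildSearchQuery(term: string, hasFts5: boolean): SearchQuery {
  if (hasFts5) {
    // Full-text path: let the FTS5 virtual table do the matching.
    return { sql: "SELECT rowid FROM docs_fts WHERE docs_fts MATCH ?", params: [term] };
  }
  // Fallback path: escape LIKE wildcards so the term is matched literally.
  const escaped = term.replace(/[\\%_]/g, (c) => "\\" + c);
  return {
    sql: "SELECT rowid FROM docs WHERE body LIKE ? ESCAPE '\\'",
    params: [`%${escaped}%`],
  };
}
```

A caller would probe for FTS5 once at startup and pass the result as `hasFts5`, so the rest of the code never needs to know which runtime it is on.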

New tooling

create-reactive-agent CLI — bunx create-reactive-agent my-app scaffolds a runnable agent project in seconds. Supports --template minimal|standard|tool-use|multi-agent|gateway, --provider, --model, --pm bun|npm|yarn|pnpm. See create-reactive-agent.

Interactive Playground

Three live StackBlitz scenarios, zero install. Everything runs in-browser via WebContainers, so no local runtime is required. The default provider is Google Gemini (free tier).

| Scenario | What it shows |
| --- | --- |
| Hello Agent | Simple Q&A: minimal builder, one-step response |
| Tool Integration | Built-in `code-execute` + `scratchpad` tools working together |
| Strategy Demo | `reactive` vs `plan-execute-reflect` side-by-side on the same task |

See Playground.

`code-action` strategy (`@experimental`)

A 7th reasoning strategy in which the LLM generates a TypeScript IIFE that runs inside a Worker-thread sandbox. Tools are exposed as normal async functions and called via postMessage round-trips — no JSON schema juggling in the prompt. Best suited for multi-tool orchestration tasks where expressing control flow in code is cleaner than iterative tool calls.

Enable with defaultStrategy: "code-action". ToolService is optional; the strategy also handles pure computation tasks. See code-action.

Skill persistence

Learned SkillRecord objects now survive process restarts. The skill system uses a dual store: the existing in-memory session store for fast within-run access, plus a new SQLite-backed SkillStore that persists across runs. On cold start, skills are resolved from the persistent store before any LLM call. skillFragmentToSkillRecord() is exported from reactive-agents for manual skill construction.
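
The dual-store lookup described above can be sketched roughly as follows. The record shape, the store interface, and every name ending in `Sketch` are assumptions made for illustration, and an in-memory map stands in for the SQLite-backed store so the example runs anywhere.

```typescript
// Hypothetical shapes; the real SkillRecord and SkillStore types may differ.
interface SkillRecordSketch {
  id: string;
  instructions: string;
}

interface PersistentStoreSketch {
  get(id: string): SkillRecordSketch | undefined;
  put(record: SkillRecordSketch): void;
}

class DualSkillStoreSketch {
  private session = new Map<string, SkillRecordSketch>();
  private persistent: PersistentStoreSketch;

  constructor(persistent: PersistentStoreSketch) {
    this.persistent = persistent;
  }

  // Write-through: new skills land in both stores.
  learn(record: SkillRecordSketch): void {
    this.session.set(record.id, record);
    this.persistent.put(record);
  }

  // Session store first; on a miss, fall back to the persistent store
  // and warm the session cache for the rest of the run.
  resolve(id: string): SkillRecordSketch | undefined {
    const hot = this.session.get(id);
    if (hot) return hot;
    const cold = this.persistent.get(id);
    if (cold) this.session.set(id, cold);
    return cold;
  }
}

// In-memory stand-in for the persistent store (assumption for the sketch).
function makeFakePersistentStore(): PersistentStoreSketch {
  const rows = new Map<string, SkillRecordSketch>();
  return {
    get: (id) => rows.get(id),
    put: (record) => {
      rows.set(record.id, record);
    },
  };
}
```

Creating a second `DualSkillStoreSketch` over the same persistent store simulates a process restart: the session map is empty, yet `resolve()` still finds the skill.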

New runtime controls

RunHandle — runStream() now returns a RunHandle with four controls, a status property, and a result promise:
- .pause() — suspends the loop at the next safe checkpoint
- .resume() — resumes a paused run
- .stop() — graceful shutdown: finishes the current step, then runs output synthesis
- .terminate() — immediate abort, skips synthesis
- .status — "running" | "paused" | "stopped" | "terminated" | "completed"
- .result — Promise that resolves when the run reaches a terminal state
See Compose API.
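
One way to read the controls and status values above is as a small state machine. The status strings come from the list; the transition rules (for example, that controls are no-ops once a run is terminal) are assumptions for this sketch, not documented behavior.

```typescript
// Status values are from the release note; transition rules are assumptions.
type RunStatus = "running" | "paused" | "stopped" | "terminated" | "completed";
type RunControl = "pause" | "resume" | "stop" | "terminate";

const TERMINAL_STATES = new Set<RunStatus>(["stopped", "terminated", "completed"]);

function nextStatus(current: RunStatus, control: RunControl): RunStatus {
  // Assumption: once a run reaches a terminal state, further controls do nothing.
  if (TERMINAL_STATES.has(current)) return current;
  if (control === "pause") return current === "running" ? "paused" : current;
  if (control === "resume") return current === "paused" ? "running" : current;
  if (control === "stop") return "stopped";
  return "terminated";
}

// Replays a sequence of controls against a run that starts in "running".
function replayControls(controls: RunControl[]): RunStatus {
  return controls.reduce<RunStatus>(nextStatus, "running");
}
```

For instance, `replayControls(["pause", "stop"])` ends in `"stopped"`, while a `resume` issued after `terminate` leaves the run `"terminated"`.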

Killswitches — Factory functions from @reactive-agents/compose that wire stopping conditions into the agent loop (five are listed below). Pass them to .compose() or .withHarness():

import { maxIterations, budgetLimit, timeoutAfter, watchdog, requireApprovalFor } from "@reactive-agents/compose";

| Factory | Stops when… |
| --- | --- |
| `maxIterations(n)` | Loop count reaches `n` |
| `budgetLimit({ maxTokens?, maxCostUSD? })` | Token or cost ceiling hit |
| `timeoutAfter(duration)` | Wall-clock duration exceeded |
| `watchdog({ timeout })` | No progress within `timeout` |
| `requireApprovalFor(toolName, approver)` | Named tool needs human approval |

See Compose API.
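
To make the factory pattern concrete, here is a standalone sketch of how stopping conditions like these could compose. It does not use the real package: the `LoopState` shape, the `*Sketch` factories, the millisecond duration unit, and `firstTripped` are all hypothetical.

```typescript
// Hypothetical loop state; the real harness tracks its own fields.
interface LoopState {
  iteration: number;
  tokensUsed: number;
  elapsedMs: number;
}

// A killswitch inspects loop state and returns a stop reason, or null to continue.
type KillswitchSketch = (state: LoopState) => string | null;

const maxIterationsSketch = (n: number): KillswitchSketch =>
  (s) => (s.iteration >= n ? `maxIterations(${n})` : null);

const budgetLimitSketch = (limit: { maxTokens?: number }): KillswitchSketch =>
  (s) =>
    limit.maxTokens !== undefined && s.tokensUsed >= limit.maxTokens
      ? `budgetLimit(${limit.maxTokens} tokens)`
      : null;

// Assumption: duration is given in milliseconds.
const timeoutAfterSketch = (ms: number): KillswitchSketch =>
  (s) => (s.elapsedMs >= ms ? `timeoutAfter(${ms}ms)` : null);

// Checks every switch in order and reports the first one that trips.
function firstTripped(switches: KillswitchSketch[], state: LoopState): string | null {
  for (const check of switches) {
    const reason = check(state);
    if (reason !== null) return reason;
  }
  return null;
}
```

Because each factory returns a plain function, a loop can evaluate any mix of conditions once per iteration and surface a human-readable stop reason.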

Compose API (@stable) — .compose(fn) (alias: .withHarness(fn)) attaches a harness transform that intercepts tagged chokepoints (prompt.system, nudge.loop-detected, message.tool-result, etc.) via h.on(), h.tap(), h.before(), h.after(), and h.onError(). Existing builder methods .withSystemPrompt(), .withErrorHandler(), and .withHook() now desugar through the harness. See Compose API and Harness Tags.
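
The idea of intercepting tagged chokepoints can be illustrated with a tiny registry. This sketch is not the Compose API: `HarnessSketch`, `dispatch`, and the assumed semantics (that `on()` may rewrite a value while `tap()` only observes it) are guesses made for illustration; only the tag names come from the text above.

```typescript
type TagTransform = (value: string) => string;
type TagObserver = (value: string) => void;

class HarnessSketch {
  private transforms = new Map<string, TagTransform[]>();
  private observers = new Map<string, TagObserver[]>();

  // Assumption: on() registers a handler that may rewrite the value at a tag.
  on(tag: string, fn: TagTransform): this {
    this.transforms.set(tag, [...(this.transforms.get(tag) ?? []), fn]);
    return this;
  }

  // Assumption: tap() registers a read-only observer for a tag.
  tap(tag: string, fn: TagObserver): this {
    this.observers.set(tag, [...(this.observers.get(tag) ?? []), fn]);
    return this;
  }

  // Called by a (hypothetical) loop whenever a value passes through a chokepoint.
  dispatch(tag: string, value: string): string {
    let current = value;
    for (const fn of this.transforms.get(tag) ?? []) current = fn(current);
    for (const fn of this.observers.get(tag) ?? []) fn(current);
    return current;
  }
}
```

Under these assumptions, a builder method such as `.withSystemPrompt()` could be expressed as a single `on("prompt.system", ...)` registration, which is one plausible reading of "desugar through the harness".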

Strategy switching on by default

enableStrategySwitching now defaults to true. The reactive intelligence dispatcher will switch strategies automatically when entropy signals a stuck loop — no explicit opt-in required.

Decision tracing

Agents can capture the model’s stated reason (its "why") for every tool call. On the reactive/adaptive paths, tool-call rationale is opt-in because it is an audit feature rather than a performance feature, and its only effect on a run is extra token and latency cost:

auditRationale opt-in — .withReasoning({ auditRationale: true }) (or env RA_RATIONALE_AUDIT=1). When on, the kernel coaxes one <rationale call="N">{"why":"…","confidence":0-1}</rationale> block per tool call. Off by default.
Native function-calling capture — parseRationaleBlocks() reads side-channel blocks from thought + thinking content and attaches each rationale to the matching ToolCallSpec by position. The parser tolerates fenced/prose-wrapped JSON, over-length why, and repeated call="N" attributes, so capture is reliable on small local models.
plan-execute-reflect enforcement (always on) — LLMPlanStepSchema carries a rationale: { why, confidence? } field, MANDATORY for every tool_call step (independent of auditRationale). Failures after retry emit a plan_rationale_missing metric — no synthetic fallback invented.
AgentDebrief.rationale[] — Unified milestone-decision log: tool selections, curator decisions, strategy switches, reactive interventions, and terminations. All render in debrief.markdown under ## Decision Rationale.

See Decision Tracing for the full pipeline and Debrief & Chat for the result shape.
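
For readers who want a feel for what tolerant rationale parsing involves, here is a minimal standalone sketch. It is not the framework's `parseRationaleBlocks()`; the function name, the 280-character cap on `why`, and the confidence clamping are assumptions for illustration.

```typescript
interface RationaleSketch {
  call: number;
  why: string;
  confidence?: number;
}

function parseRationaleSketch(text: string): RationaleSketch[] {
  const found: RationaleSketch[] = [];
  const blockPattern = /<rationale\s+call="(\d+)"\s*>([\s\S]*?)<\/rationale>/g;
  let m: RegExpExecArray | null;
  while ((m = blockPattern.exec(text)) !== null) {
    const body = m[2] ?? "";
    // Tolerate code fences or prose around the JSON by slicing the outermost {...}.
    const start = body.indexOf("{");
    const end = body.lastIndexOf("}");
    if (start === -1 || end <= start) continue;
    try {
      const parsed = JSON.parse(body.slice(start, end + 1));
      if (typeof parsed.why !== "string") continue;
      // Assumption: cap over-length explanations at 280 characters.
      const entry: RationaleSketch = { call: Number(m[1]), why: parsed.why.slice(0, 280) };
      if (typeof parsed.confidence === "number") {
        // Clamp confidence into the documented 0-1 range.
        entry.confidence = Math.min(1, Math.max(0, parsed.confidence));
      }
      found.push(entry);
    } catch {
      // Malformed JSON: skip this block rather than invent a rationale.
    }
  }
  return found;
}
```

Each parsed entry keeps its `call` index so it can be attached to the tool call at the same position.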

Context-window override (`numCtx`)

Pin the exact context window the provider receives instead of relying on the model’s assumed maximum:

.withModel({ model, numCtx }) — numCtx maps to Ollama’s num_ctx; cloud providers without a context-window knob ignore it. Now a first-class AgentConfig field, so it round-trips through toConfig() / fromJSON() and the config-driven path. See Builder API and Local Models.
Cortex Studio — exposed as a Context length (numCtx) field in the Lab Builder’s Inference section, and used as the authoritative denominator for the context-usage gauge.

Cortex rich-trace debugger

The Cortex Run View’s Trace Panel adds a Timeline view: a fine-grained, filterable, chronological event stream (LLM exchanges with prompt-cache %, tool calls, strategy switches, verifier verdicts, guards) grouped by iteration, reusing the same TraceEvent model as rax diagnose. The classic per-iteration Frames view remains a click away. See Cortex.

v0.10.x — Local models match frontier (May 2026)

The biggest release since v0.9 — 0.10.0 through 0.10.6, shipped over four weeks. The headline: local Ollama models now hit 91–94% on the same task suite as paid frontier APIs, thanks to a closed-loop healing pipeline and adaptive tool-calling. Read the full v0.10.0 changelog for engineering detail.

What you gain

Local models that actually work

Healing Pipeline — 4-stage closed-loop recovery on every tool call (tool-name fuzzy match → parameter-name aliasing → path resolution → type coercion). 86.7% recovery rate, +80 percentage points of accuracy, and 90% cheaper than an LLM reprompt. Ships on by default; see LLM Providers and Resilience.
Adaptive tool calling — Each model gets fingerprinted on first run; models capable of native function calling (FC) route through the JSON path, weaker ones through a 3-tier text-parse cascade (XML → JSON → pseudo-code). The framework learns each model’s dialect after 5 runs and stops asking it to do things it can’t.
Calibration system — Per-model observations (parallel-call capability, classifier reliability, tool-call dialect) adapt empirically. Auto-enabled when .withReasoning() is on.
Frontier benchmark: 100% on ra-full verified across claude-sonnet-4-6, claude-haiku-4-5, gpt-4o-mini, gemini-2.5-pro. Bare LLM only reaches 85% on the same suite.
Local benchmark: 91–94% on ra-full for gemma4:e4b (4 GB) and cogito:14b (9 GB) — tied with gemini-2.5-flash and gpt-4o-mini on the same 35-task suite.
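
The four healing stages listed above can be sketched as a single repair function. This is an illustration of the technique under stated assumptions, not the framework's implementation: the `ToolSpec` shape, the edit-distance threshold of 2, the alias table, and the simple path joining are all hypothetical.

```typescript
type ParamType = "string" | "number" | "boolean" | "path";
interface ToolSpec {
  name: string;
  params: Record<string, ParamType>;
  aliases?: Record<string, string>; // wrong name -> canonical name
}
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Single-row dynamic-programming Levenshtein distance.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diagonal = row[0];
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j];
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diagonal + cost);
      diagonal = above;
    }
  }
  return row[b.length];
}

function heal(call: ToolCall, tools: ToolSpec[], cwd: string): ToolCall | null {
  // Stage 1: fuzzy tool-name match (assumed threshold: edit distance <= 2).
  let best: ToolSpec | null = null;
  let bestDistance = 3;
  for (const tool of tools) {
    const d = editDistance(call.name.toLowerCase(), tool.name.toLowerCase());
    if (d < bestDistance) {
      best = tool;
      bestDistance = d;
    }
  }
  if (!best) return null;

  const args: Record<string, unknown> = {};
  for (const [key, raw] of Object.entries(call.args)) {
    // Stage 2: parameter-name aliasing.
    const name = best.aliases?.[key] ?? key;
    const paramType = best.params[name];
    if (!paramType) continue; // drop parameters the tool does not accept
    let value = raw;
    // Stage 3: path resolution (relative paths are joined onto cwd).
    if (paramType === "path" && typeof value === "string" && !value.startsWith("/")) {
      value = `${cwd}/${value.replace(/^\.\//, "")}`;
    }
    // Stage 4: type coercion for stringified numbers and booleans.
    if (paramType === "number" && typeof value === "string" && value.trim() !== "" && !Number.isNaN(Number(value))) {
      value = Number(value);
    }
    if (paramType === "boolean" && (value === "true" || value === "false")) {
      value = value === "true";
    }
    args[name] = value;
  }
  return { name: best.name, args };
}
```

Because every stage is deterministic string and type work, a repair like this needs no extra model call, which is the intuition behind the cost comparison with reprompting.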

Long agent runs stay cheap

Three-stage context curation — Tool results get compressed and stashed → curator renders only what’s needed → optional reactive trim. 60.7% context reduction, 38.6% token savings, 0.16 ms overhead per step. See Intelligent Context Synthesis.
Reactive Intelligence dispatcher — 6 corrective interventions fire automatically when an agent shows entropy signs (early-stop, temperature adjust, strategy switch, context compress, tool inject, skill activate). Suppression gates prevent runaway dispatch. See Reactive Intelligence.
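
The three-stage curation flow described above (stash, curated render, optional trim) might look roughly like this in miniature. The class and function names, the preview length, and the character-based budget are all assumptions for the sketch; the real pipeline works on tokens and has its own heuristics.

```typescript
interface StashedResult {
  key: string;
  tool: string;
  full: string;
  preview: string;
}

class ResultStashSketch {
  private items: StashedResult[] = [];

  // Stage 1: store the full result, keep a short preview, hand back a key.
  add(tool: string, full: string, previewChars = 80): string {
    const key = `result-${this.items.length + 1}`;
    const preview = full.length > previewChars ? full.slice(0, previewChars) + "..." : full;
    this.items.push({ key, tool, full, preview });
    return key;
  }

  // Stage 2: render only the newest `keepFull` results in full;
  // older ones appear as previews that can be re-fetched by key.
  render(keepFull: number): string[] {
    const firstFullIndex = this.items.length - keepFull;
    return this.items.map((item, i) =>
      i >= firstFullIndex
        ? `[${item.key}] ${item.tool}: ${item.full}`
        : `[${item.key}] ${item.tool}: ${item.preview} (stashed)`
    );
  }
}

// Stage 3 (optional): drop the oldest lines until the context fits the budget.
function trimToBudget(lines: string[], maxChars: number): string[] {
  const kept = [...lines];
  let total = kept.reduce((sum, line) => sum + line.length, 0);
  while (kept.length > 1 && total > maxChars) {
    total -= (kept.shift() as string).length;
  }
  return kept;
}
```

Nothing is lost in this scheme: older results shrink to previews in the prompt, while the full text stays retrievable from the stash.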

Production safety hardened

@reactive-agents/diagnose — Standalone npm package that detects system-prompt, API-key, credential, and internal-instruction leaks in any output. 100% true positive, 0% false positive, 0.02 ms latency. 25 regex patterns + 4 FP filters.
Single-owner termination — All 12 phases route stop decisions through one arbitrator. CI lint guard prevents future bypass paths. Agents always finish cleanly, never get stuck.
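
The pattern-plus-filter approach mentioned for leak detection can be illustrated with a few lines of standalone code. The three patterns and two false-positive filters below are made up for this sketch and are not the package's actual rules; `scanForLeaks` is a hypothetical name.

```typescript
interface LeakFinding {
  kind: string;
  match: string;
}

// Illustrative patterns only; the real package ships its own set.
const LEAK_PATTERNS: { kind: string; re: RegExp }[] = [
  { kind: "api-key", re: /\bsk-[A-Za-z0-9_-]{20,}/g },
  { kind: "aws-access-key", re: /\bAKIA[0-9A-Z]{16}\b/g },
  { kind: "system-prompt", re: /\bmy system prompt (?:is|says)\b/gi },
];

// Illustrative false-positive filters: documentation placeholders are not leaks.
const FALSE_POSITIVE_FILTERS: ((match: string) => boolean)[] = [
  (m) => /EXAMPLE/i.test(m), // e.g. keys copied from vendor docs
  (m) => /^sk-(.)\1+$/.test(m), // e.g. sk-xxxxxxxxxxxxxxxxxxxxxxxx
];

function scanForLeaks(output: string): LeakFinding[] {
  const findings: LeakFinding[] = [];
  for (const { kind, re } of LEAK_PATTERNS) {
    for (const match of output.match(re) || []) {
      const isFalsePositive = FALSE_POSITIVE_FILTERS.some((filter) => filter(match));
      if (!isFalsePositive) findings.push({ kind, match });
    }
  }
  return findings;
}
```

Running filters after matching keeps the patterns simple and broad while letting a short allow-list absorb the known placeholder shapes.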

Better runtime + tooling

@reactive-agents/cortex — Cortex Studio is now installable from npm: bunx @reactive-agents/cortex or rax cortex launches the live agent canvas, debrief UI, and visual builder. See Cortex.
Gateway chat mode — Per-sender SQLite session history, episodic context injection, daily compaction. Set channels.mode: 'chat' for conversational webhooks; keep 'task' for one-shot triggers. See Gateway and Messaging Channels.
Composable kernel architecture — Internal kernel/ reorganized by capability (act/ · attend/ · comprehend/ · decide/ · reason/ · reflect/ · sense/ · verify/ + loop/ + state/). Doesn’t change the public API; makes contributing to the framework easier. See Composable Kernel.
5,294 tests across 741 files — verified by bun test on every PR.

Patch releases

| Version | Highlights |
| --- | --- |
| `0.10.0` | Phase 1 release: healing pipeline, calibration, diagnose, cortex npm |
| `0.10.1–0.10.2` | Documentation polish, version drift fixes across 28 packages |
| `0.10.3` | Coordinated package alignment, npm publish drift CI guard |
| `0.10.4` | Coordinated changeset release (single source of truth) |
| `0.10.5–0.10.6` | Static-asset serving in Cortex server, README + cookbook freshness |

Breaking changes

None. All existing ReactiveAgents.create().with*() builder chains keep working unchanged. New calibration fields are forward-compatible — existing ~/.reactive-agents/observations/ files decode cleanly.

v0.9.x — MCP Production Hardening + Pre-v0.10 Polish

MCP client rewritten on @modelcontextprotocol/sdk — smart auto-detection between stdio and HTTP-only containers, two-phase docker lifecycle — see Orchestration
Composable kernel architecture (initial) — react-kernel.ts reduced from ~1,700 to ~197 lines via makeKernel({ phases }) factory — see Composable Kernel
Permanently-failed required tools fix — tools that always error no longer cause loop-until-maxIterations — see Harness Control Flow
Cortex MCP CRUD + JSON import — import Cursor/Claude-style MCP configs directly into Cortex — see Cortex
StatusRenderer TUI — live terminal display with collapsible think panel (t key toggles), mode: 'stream' | 'status'
3 new terminal tools — git-cli, gh-cli, and gws-cli are now built-in
Web-search provider Serper.dev — third web-search backend alongside Tavily
crypto-price built-in tool — CoinGecko price lookup, no API key required
Observability on by default — minimal verbosity is now enabled out of the box
Sub-agent maxIterations fully honored — the silent cap of 3 has been removed

v0.9.0 — MCP Production Hardening

MCP client rewritten on @modelcontextprotocol/sdk — smart auto-detection between stdio and HTTP-only containers, two-phase docker lifecycle — see Orchestration
Composable kernel architecture — react-kernel.ts reduced from ~1,700 to ~197 lines via makeKernel({ phases }) factory; phases are now individually swappable — see Composable Kernel
Permanently-failed required tools fix — tools that always error no longer cause loop-until-maxIterations; framework detects and stops early — see Harness Control Flow
Cortex MCP CRUD + JSON import — import Cursor/Claude-style MCP configs directly into Cortex — see Cortex
effect moved to peerDependencies — add effect explicitly if you import from it directly — see Installation

v0.8.5 — Native FC Hardening + Web Framework Adapters

React, Vue, and Svelte adapters — useAgentStream() and useAgent() hooks/composables/stores for all three frameworks, consuming SSE endpoints — see Web Integration and Streaming
7-hook provider adapter system — taskFraming, toolGuidance, errorRecovery, synthesisPrompt, qualityCheck, continuationHint, systemPromptPatch fully wired — see Reactive Intelligence
Dynamic stopping (3-layer) — novelty signal (Jaccard overlap), budget exhaustion phase transition, and per-tool call cap (maxCallsPerTool) — see Harness Control Flow
Full prompt observability — logModelIO: true logs the complete FC conversation thread with no truncation — see Observability
Actionable failure messages — loop detection, required-tools, and stall detection all emit Fix: suggestions with specific builder options — see Troubleshooting
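
The novelty signal in the dynamic-stopping item above is based on Jaccard overlap, which is simple to compute. The sketch below shows the measure itself; the word-level tokenization and the 0.8 threshold are assumptions, not the framework's actual settings.

```typescript
// Lowercased word set; the real signal may tokenize differently.
function tokenSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  a.forEach((token) => {
    if (b.has(token)) intersection++;
  });
  return intersection / (a.size + b.size - intersection);
}

// A step counts as novel when it overlaps little with the previous one.
// Assumption: 0.8 is an arbitrary illustrative threshold.
function isNovel(previous: string, next: string, threshold = 0.8): boolean {
  return jaccard(tokenSet(previous), tokenSet(next)) < threshold;
}
```

A loop that keeps producing steps with overlap above the threshold is adding little new information, which is the cue a stopping layer can act on.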

v0.8.0 — Reactive Intelligence Layer

Entropy-aware intelligence pipeline — 5-source composite entropy sensor, trajectory classifier, and reactive controller that takes corrective action automatically — see Reactive Intelligence
Thompson Sampling strategy learner — SQLite-backed bandit learns which reasoning strategy wins per task category across runs — see Reactive Intelligence
Builder hardening — withStrictValidation(), withTimeout(), withRetryPolicy(), withFallbacks(), withHealthCheck(), and withErrorHandler() — see Builder API
Automatic strategy switching — when entropy analysis detects a stuck loop, the agent switches reasoning strategy without user intervention — see Choosing Strategies
Observability dashboard upgrade — chalk/boxen terminal UI with entropy grade (A–F), sparklines, and entropy-informed alerts — see Observability
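
For readers unfamiliar with Thompson Sampling, the bandit idea behind the strategy learner can be sketched in a few lines. This is a generic Beta-Bernoulli version with hypothetical names, not the framework's learner, and it keeps stats in a plain object rather than SQLite. It samples Beta(a, b) for integer parameters using order statistics, which is easy to read but costs O(n log n) per draw as counts grow.

```typescript
type Rng = () => number;

interface ArmStats {
  wins: number;
  losses: number;
}

// For integers a, b >= 1, the a-th smallest of (a + b - 1) uniform draws
// is distributed as Beta(a, b).
function sampleBeta(a: number, b: number, rng: Rng): number {
  const draws = Array.from({ length: a + b - 1 }, () => rng()).sort((x, y) => x - y);
  return draws[a - 1];
}

// Draw one plausible win rate per strategy and pick the highest draw.
function pickStrategy(stats: Record<string, ArmStats>, rng: Rng = Math.random): string {
  let best = "";
  let bestScore = -1;
  for (const [name, s] of Object.entries(stats)) {
    const score = sampleBeta(s.wins + 1, s.losses + 1, rng);
    if (score > bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}

// Update the posterior counts after a run finishes.
function recordOutcome(stats: Record<string, ArmStats>, name: string, won: boolean): void {
  const s = stats[name] ?? (stats[name] = { wins: 0, losses: 0 });
  if (won) s.wins++;
  else s.losses++;
}
```

Strategies with few observations produce wide, noisy draws and still get tried occasionally, while consistently winning strategies are picked most of the time; keeping one stats table per task category gives the per-category learning the release note describes.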

v0.5.0 — A2A Protocol + Observability Foundation

Full A2A (Agent-to-Agent) protocol — JSON-RPC 2.0 server, streaming SSE, client, discovery, and capability matching based on Google’s A2A spec — see A2A Protocol
Agent-as-tool pattern — wrap any local or remote A2A agent as a callable tool with createAgentTool() / createRemoteAgentTool() — see Sub-agents
Live observability streaming — withObservability({ live: true, verbosity }) writes structured phase logs to stdout as each step fires — see Observability
rax serve — expose any agent as an A2A-compliant HTTP server with a single CLI command — see CLI
EventBus reasoning events — all 5 strategies publish ReasoningStepCompleted; subscribe with agent.on() for custom monitoring — see Observability