What's New

A quick-scan guide to what has landed in each major release. Start here when returning after time away — each bullet links to the relevant documentation.


v0.11.x — Production tooling + full observability (May 2026)

The focus: developer tooling that makes agents production-observable and repeatable, plus the first create-reactive-agent scaffolder, cross-runtime support, and three new capabilities (code-action strategy, skill persistence, interactive playground).

  • @reactive-agents/observe — Zero-config OpenTelemetry tracing. Set OTEL_EXPORTER_OTLP_ENDPOINT and every run emits a workflow → LLM → tool span hierarchy, OpenInference-compliant, to any OTLP backend (Jaeger, Grafana Tempo, Langfuse, Arize Phoenix). See OpenTelemetry Tracing.
  • @reactive-agents/replay — Deterministic trace replay. Record any run to a snapshot file and re-run it with a different model or prompt without calling the LLM again. Enables regression testing and prompt A/B comparisons. See Snapshot & Replay.
  • @reactive-agents/runtime-shim — Cross-runtime support. The framework now runs on Node.js 22.5+ in addition to Bun. Provides unified Database, spawn, serve, glob, writeFile, readFile, and hash primitives that delegate to the available runtime. FTS5 is optional — falls back to LIKE-based search on Node’s built-in SQLite. Unblocks Stackblitz WebContainers (Node-only) and Vercel/Netlify deployments.
  • create-reactive-agent CLI — bunx create-reactive-agent my-app scaffolds a runnable agent project in seconds. Supports --template minimal|standard|tool-use|multi-agent|gateway, --provider, --model, --pm bun|npm|yarn|pnpm. See create-reactive-agent.
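
The record-then-replay idea behind @reactive-agents/replay can be illustrated with a minimal sketch. Everything below (RunSnapshot, recordCalls, replayCalls, the synchronous LlmCall type) is a hypothetical stand-in, not the package's API; a real client would be asynchronous and would persist the snapshot to a file.

```typescript
// Hypothetical types for illustration; not the @reactive-agents/replay API.
type LlmCall = (prompt: string) => string;

interface RunSnapshot {
  responses: string[]; // model outputs in call order
}

// Wrap a model function so every response is appended to the snapshot.
function recordCalls(model: LlmCall, snapshot: RunSnapshot): LlmCall {
  return (prompt) => {
    const response = model(prompt);
    snapshot.responses.push(response);
    return response;
  };
}

// Serve recorded responses in order; the real model is never invoked.
function replayCalls(snapshot: RunSnapshot): LlmCall {
  let cursor = 0;
  return () => {
    const response = snapshot.responses[cursor];
    if (response === undefined) {
      throw new Error(`snapshot exhausted at call ${cursor}`);
    }
    cursor += 1;
    return response;
  };
}
```

Because the replayed function returns the same outputs in the same order, a regression test built on it should be deterministic and cost no tokens.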

Interactive playground: three live Stackblitz scenarios with zero install. Each runs fully in-browser via WebContainers, so no local runtime is required. The default provider is Google Gemini (free tier).

| Scenario | What it shows |
| --- | --- |
| Hello Agent | Simple Q&A — minimal builder, one-step response |
| Tool Integration | Built-in code-execute + scratchpad tools working together |
| Strategy Demo | reactive vs plan-execute-reflect side-by-side on the same task |

See Playground.

code-action is a 7th reasoning strategy in which the LLM generates a TypeScript IIFE that runs inside a Worker-thread sandbox. Tools are exposed as normal async functions and called via postMessage round-trips, so the prompt needs no JSON-schema juggling. It is best suited to multi-tool orchestration tasks where expressing control flow in code is cleaner than iterative tool calls.

Enable with defaultStrategy: "code-action". ToolService is optional; the strategy also handles pure computation tasks. See code-action.

Skill persistence: learned SkillRecord objects now survive process restarts. The skill system uses a dual store: the existing in-memory session store for fast within-run access, plus a new SQLite-backed SkillStore that persists across runs. On cold start, skills are resolved from the persistent store before any LLM call. skillFragmentToSkillRecord() is exported from reactive-agents for manual skill construction.
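
The dual-store lookup order described above can be sketched roughly as follows. SkillSketch and DualSkillStore are hypothetical names, and a plain Map stands in for the SQLite-backed store; the real SkillRecord and SkillStore shapes are not shown in these notes.

```typescript
// Hypothetical shapes for illustration; the real SkillRecord is richer.
interface SkillSketch {
  id: string;
  instructions: string;
}

class DualSkillStore {
  // Fast per-process cache, lost on restart.
  private session = new Map<string, SkillSketch>();
  // Stand-in for the SQLite-backed store that survives restarts.
  private persistent: Map<string, SkillSketch>;

  constructor(persistent: Map<string, SkillSketch>) {
    this.persistent = persistent;
  }

  // Write through to both stores.
  save(skill: SkillSketch): void {
    this.session.set(skill.id, skill);
    this.persistent.set(skill.id, skill);
  }

  // Session store first, then the persistent store on a cold start.
  resolve(id: string): SkillSketch | undefined {
    const cached = this.session.get(id);
    if (cached !== undefined) return cached;
    const stored = this.persistent.get(id);
    if (stored !== undefined) this.session.set(id, stored); // warm the cache
    return stored;
  }
}
```

Creating a second DualSkillStore over the same persistent map approximates a process restart: the session cache starts empty, yet previously saved skills still resolve.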

  • RunHandle — runStream() now returns a RunHandle with four controls plus status and result properties:

    • .pause() — suspends the loop at the next safe checkpoint
    • .resume() — resumes a paused run
    • .stop() — graceful shutdown: finishes the current step, then runs output synthesis
    • .terminate() — immediate abort, skips synthesis
    • .status — "running" | "paused" | "stopped" | "terminated" | "completed"
    • .result — Promise that resolves when the run reaches a terminal state

    See Compose API.

  • Killswitches — Six factory functions from @reactive-agents/compose that wire stopping conditions into the agent loop. Pass them to .compose() or .withHarness():

    import { maxIterations, budgetLimit, timeoutAfter, watchdog, requireApprovalFor } from "@reactive-agents/compose";
    | Factory | Stops when… |
    | --- | --- |
    | maxIterations(n) | Loop count reaches n |
    | budgetLimit({ maxTokens?, maxCostUSD? }) | Token or cost ceiling hit |
    | timeoutAfter(duration) | Wall-clock duration exceeded |
    | watchdog({ timeout }) | No progress within timeout |
    | requireApprovalFor(toolName, approver) | Named tool needs human approval |

    See Compose API.

  • Compose API (@stable) — .compose(fn) (alias: .withHarness(fn)) attaches a harness transform that intercepts tagged chokepoints (prompt.system, nudge.loop-detected, message.tool-result, etc.) via h.on(), h.tap(), h.before(), h.after(), and h.onError(). Existing builder methods .withSystemPrompt(), .withErrorHandler(), and .withHook() now desugar through the harness. See Compose API and Harness Tags.
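
To make the RunHandle states and the killswitch idea above concrete, here is a minimal self-contained sketch. All names ending in Sketch are local stand-ins rather than imports from @reactive-agents/compose, and the choice to have a fired killswitch trigger a graceful stop() is an assumption made for illustration.

```typescript
// Local stand-ins only; not the @reactive-agents/compose implementation.
type RunStatus = "running" | "paused" | "stopped" | "terminated" | "completed";

class RunStateSketch {
  status: RunStatus = "running";
  synthesisRan = false;

  pause(): void {
    if (this.status === "running") this.status = "paused";
  }
  resume(): void {
    if (this.status === "paused") this.status = "running";
  }
  // Graceful: the current step finishes, then output synthesis runs.
  stop(): void {
    if (this.isTerminal()) return;
    this.synthesisRan = true;
    this.status = "stopped";
  }
  // Immediate: synthesis is skipped.
  terminate(): void {
    if (this.isTerminal()) return;
    this.status = "terminated";
  }
  isTerminal(): boolean {
    return this.status !== "running" && this.status !== "paused";
  }
}

interface LoopStateSketch {
  iteration: number;
  tokensUsed: number;
  elapsedMs: number;
}

// A killswitch returns a stop reason, or null to let the loop continue.
type KillswitchSketch = (state: LoopStateSketch) => string | null;

const maxIterationsSketch = (n: number): KillswitchSketch => (s) =>
  s.iteration >= n ? `maxIterations(${n})` : null;

const budgetLimitSketch = (maxTokens: number): KillswitchSketch => (s) =>
  s.tokensUsed >= maxTokens ? `budgetLimit(${maxTokens} tokens)` : null;

const timeoutAfterSketch = (ms: number): KillswitchSketch => (s) =>
  s.elapsedMs >= ms ? `timeoutAfter(${ms}ms)` : null;

// Check every killswitch after a step; the first one that fires stops the run.
function checkKillswitches(
  run: RunStateSketch,
  switches: KillswitchSketch[],
  state: LoopStateSketch,
): string | null {
  for (const check of switches) {
    const reason = check(state);
    if (reason !== null) {
      run.stop(); // assumed: killswitches stop gracefully rather than terminate
      return reason;
    }
  }
  return null;
}
```

Modelling each killswitch as a pure predicate over loop state keeps the stopping conditions composable: adding another condition means appending one more function to the array.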

enableStrategySwitching now defaults to true. The reactive intelligence dispatcher will switch strategies automatically when entropy signals a stuck loop — no explicit opt-in required.

Agents can capture the model’s stated why for every tool call. Tool-call rationale on the reactive/adaptive paths is opt-in (audit feature, not performance — pure token/latency cost):

  • auditRationale opt-in — .withReasoning({ auditRationale: true }) (or env RA_RATIONALE_AUDIT=1). When on, the kernel elicits one <rationale call="N">{"why":"…","confidence":0-1}</rationale> block per tool call. Off by default.
  • Native function-calling capture — parseRationaleBlocks() reads side-channel blocks from thought + thinking content and attaches each rationale to the matching ToolCallSpec by position. The parser tolerates fenced/prose-wrapped JSON, over-length why, and repeated call="N" attributes, so capture is reliable on small local models.
  • plan-execute-reflect enforcement (always on) — LLMPlanStepSchema carries a rationale: { why, confidence? } field, mandatory for every tool_call step (independent of auditRationale). Failures after retry emit a plan_rationale_missing metric; no synthetic fallback is invented.
  • AgentDebrief.rationale[] — Unified milestone-decision log: tool selections, curator decisions, strategy switches, reactive interventions, and terminations. All render in debrief.markdown under ## Decision Rationale.

See Decision Tracing for the full pipeline and Debrief & Chat for the result shape.
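
As an illustration of the rationale block format quoted above, the following sketch extracts well-formed blocks from model text. parseRationaleSketch and RationaleSketch are hypothetical names; this is a deliberately strict simplification and does not reproduce the tolerance (fenced JSON, over-length why, repeated attributes) that the real parseRationaleBlocks() is described as having.

```typescript
// Hypothetical result shape for illustration.
interface RationaleSketch {
  call: number;
  why: string;
  confidence?: number;
}

// Strict sketch: accepts only <rationale call="N">{...valid JSON...}</rationale>.
function parseRationaleSketch(text: string): RationaleSketch[] {
  const pattern = /<rationale call="(\d+)">([\s\S]*?)<\/rationale>/g;
  const found: RationaleSketch[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    const call = Number(m[1]);
    try {
      const parsed: unknown = JSON.parse(m[2] ?? "");
      if (typeof parsed !== "object" || parsed === null) continue;
      const { why, confidence } = parsed as { why?: unknown; confidence?: unknown };
      if (typeof why !== "string") continue;
      const entry: RationaleSketch = { call, why };
      if (typeof confidence === "number") entry.confidence = confidence;
      found.push(entry);
    } catch {
      // Malformed JSON: skip the block instead of inventing a rationale.
    }
  }
  return found;
}
```

Skipping malformed blocks, rather than synthesizing a placeholder, mirrors the "no synthetic fallback" stance stated for the plan-execute-reflect path.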

Pin the exact context window the provider receives instead of relying on the model’s assumed maximum:

  • .withModel({ model, numCtx }) — numCtx maps to Ollama’s num_ctx; cloud providers without a context-window knob ignore it. Now a first-class AgentConfig field, so it round-trips through toConfig() / fromJSON() and the config-driven path. See Builder API and Local Models.
  • Cortex Studio — exposed as a Context length (numCtx) field in the Lab Builder’s Inference section, and used as the authoritative denominator for the context-usage gauge.
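
For orientation, this is roughly how a pinned context length ends up in an Ollama chat request and in a usage gauge. The helper names (toOllamaChatBody, contextUsage, ModelSettingsSketch) are hypothetical and not part of the framework; the only external fact relied on is that Ollama's chat API accepts num_ctx under options.

```typescript
// Hypothetical settings shape mirroring { model, numCtx } from the text.
interface ModelSettingsSketch {
  model: string;
  numCtx?: number;
}

interface OllamaChatBody {
  model: string;
  messages: { role: "user"; content: string }[];
  stream: boolean;
  options?: { num_ctx: number };
}

// Build an Ollama /api/chat request body; num_ctx lives under `options`.
function toOllamaChatBody(settings: ModelSettingsSketch, userText: string): OllamaChatBody {
  const body: OllamaChatBody = {
    model: settings.model,
    messages: [{ role: "user", content: userText }],
    stream: false,
  };
  if (settings.numCtx !== undefined) {
    body.options = { num_ctx: settings.numCtx };
  }
  return body;
}

// Context-usage gauge with the pinned numCtx as the denominator, capped at 1.
function contextUsage(promptTokens: number, numCtx: number): number {
  return Math.min(1, promptTokens / numCtx);
}
```

When numCtx is omitted the options field is left out entirely, so the provider falls back to its own default window.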

The Cortex Run View’s Trace Panel adds a Timeline view: a fine-grained, filterable, chronological event stream (LLM exchanges with prompt-cache %, tool calls, strategy switches, verifier verdicts, guards) grouped by iteration, reusing the same TraceEvent model as rax diagnose. The classic per-iteration Frames view remains a click away. See Cortex.


v0.10.x — Local models match frontier (May 2026)

The biggest release since v0.9 — 0.10.0 through 0.10.6, shipped over four weeks. The headline: local Ollama models now hit 91–94% on the same task suite as paid frontier APIs, thanks to a closed-loop healing pipeline and adaptive tool-calling. Read the full v0.10.0 changelog for engineering detail.

  • Healing Pipeline — 4-stage closed-loop recovery on every tool call (tool-name fuzzy match → parameter-name aliasing → path resolution → type coercion). 86.7% recovery rate, +80pp accuracy, 90% cheaper than LLM reprompt. Ships on by default — see LLM Providers and Resilience.
  • Adaptive tool calling — Each model gets fingerprinted on first run; native-FC-capable models route through the JSON path, weaker ones through a 3-tier text-parse cascade (XML → JSON → pseudo-code). The framework learns each model’s dialect after 5 runs and stops asking it to do things it can’t.
  • Calibration system — Per-model observations (parallel-call capability, classifier reliability, tool-call dialect) adapt empirically. Auto-enabled when .withReasoning() is on.
  • Frontier benchmark: 100% on ra-full verified across claude-sonnet-4-6, claude-haiku-4-5, gpt-4o-mini, gemini-2.5-pro. Bare LLM only reaches 85% on the same suite.
  • Local benchmark: 91–94% on ra-full for gemma4:e4b (4 GB) and cogito:14b (9 GB) — tied with gemini-2.5-flash and gpt-4o-mini on the same 35-task suite.
  • Three-stage context curation — Tool results get compressed and stashed → curator renders only what’s needed → optional reactive trim. 60.7% context reduction, 38.6% token savings, 0.16 ms overhead per step. See Intelligent Context Synthesis.
  • Reactive Intelligence dispatcher — 6 corrective interventions fire automatically when an agent shows entropy signs (early-stop, temperature adjust, strategy switch, context compress, tool inject, skill activate). Suppression gates prevent runaway dispatch. See Reactive Intelligence.
  • @reactive-agents/diagnose — Standalone npm package detects system-prompt, API-key, credential, and internal-instruction leaks in any output. 100% true positive, 0% false positive, 0.02 ms latency. 25 regex patterns + 4 FP filters.
  • Single-owner termination — All 12 phases route stop decisions through one arbitrator. CI lint guard prevents future bypass paths. Agents always finish cleanly, never get stuck.
  • @reactive-agents/cortex — Cortex Studio is now installable from npm: bunx @reactive-agents/cortex or rax cortex launches the live agent canvas, debrief UI, and visual builder. See Cortex.
  • Gateway chat mode — Per-sender SQLite session history, episodic context injection, daily compaction. Set channels.mode: 'chat' for conversational webhooks; keep 'task' for one-shot triggers. See Gateway and Messaging Channels.
  • Composable kernel architecture — Internal kernel/ reorganized by capability (act/ · attend/ · comprehend/ · decide/ · reason/ · reflect/ · sense/ · verify/ + loop/ + state/). Doesn’t change the public API; makes contributing to the framework easier. See Composable Kernel.
  • 5,294 tests across 741 files — verified by bun test on every PR.

| Version | Highlights |
| --- | --- |
| 0.10.0 | Phase 1 release — healing pipeline, calibration, diagnose, cortex npm |
| 0.10.1–0.10.2 | Documentation polish, version drift fixes across 28 packages |
| 0.10.3 | Coordinated package alignment, npm publish drift CI guard |
| 0.10.4 | Coordinated changeset release (single source of truth) |
| 0.10.5–0.10.6 | Static-asset serving in Cortex server, README + cookbook freshness |
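
The four healing stages named in the Healing Pipeline bullet above (tool-name fuzzy match, parameter-name aliasing, path resolution, type coercion) can be sketched as a single pass over a tool call. Everything here is a hypothetical illustration: the type names, the edit-distance threshold of 2, and the string-based path join are assumptions, not the framework's implementation.

```typescript
// Hypothetical shapes for illustration only.
interface ToolSpecSketch {
  name: string;
  params: Record<string, "string" | "number" | "path">;
  aliases: Record<string, string>; // alias -> canonical parameter name
}

interface ToolCallSketch {
  tool: string;
  args: Record<string, unknown>;
}

// Classic single-row Levenshtein edit distance.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prevDiag = row[0] ?? 0;
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j] ?? 0;
      const left = row[j - 1] ?? 0;
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, left + 1, prevDiag + cost);
      prevDiag = above;
    }
  }
  return row[b.length] ?? 0;
}

function healToolCall(
  call: ToolCallSketch,
  tools: ToolSpecSketch[],
  baseDir: string,
): ToolCallSketch | null {
  // Stage 1: fuzzy-match the tool name (distance <= 2 is an assumed threshold).
  let best: ToolSpecSketch | null = null;
  let bestDistance = 3;
  for (const tool of tools) {
    const d = editDistance(call.tool.toLowerCase(), tool.name.toLowerCase());
    if (d < bestDistance) {
      best = tool;
      bestDistance = d;
    }
  }
  if (best === null) return null; // nothing close enough to heal
  const spec: ToolSpecSketch = best;

  const healedArgs: Record<string, unknown> = {};
  for (const [rawKey, rawValue] of Object.entries(call.args)) {
    // Stage 2: map aliased parameter names to canonical ones.
    const key = spec.aliases[rawKey] ?? rawKey;
    const kind = spec.params[key];
    let value = rawValue;
    // Stage 3: resolve relative paths against the working directory.
    if (kind === "path" && typeof value === "string" && !value.startsWith("/")) {
      value = `${baseDir}/${value}`;
    }
    // Stage 4: coerce numeric strings when the schema expects a number.
    if (kind === "number" && typeof value === "string" && value.trim() !== "") {
      const asNumber = Number(value);
      if (!Number.isNaN(asNumber)) value = asNumber;
    }
    healedArgs[key] = value;
  }
  return { tool: spec.name, args: healedArgs };
}
```

Each stage is local and deterministic, which is consistent with the claim that healing is much cheaper than reprompting the model.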

Breaking changes: none. All existing ReactiveAgents.create().with*() builder chains keep working unchanged. New calibration fields are forward-compatible; existing ~/.reactive-agents/observations/ files decode cleanly.


v0.9.x — MCP Production Hardening + Pre-v0.10 Polish

  • MCP client rewritten on @modelcontextprotocol/sdk — smart auto-detection between stdio and HTTP-only containers, two-phase docker lifecycle — see Orchestration
  • Composable kernel architecture (initial) — react-kernel.ts reduced from ~1,700 to ~197 lines via makeKernel({ phases }) factory; phases are now individually swappable — see Composable Kernel
  • Permanently-failed required tools fix — tools that always error no longer cause loop-until-maxIterations; framework detects and stops early — see Harness Control Flow
  • Cortex MCP CRUD + JSON import — import Cursor/Claude-style MCP configs directly into Cortex — see Cortex
  • StatusRenderer TUI — live terminal display with collapsible think panel (t key toggles), mode: 'stream' | 'status'
  • 3 new terminal tools — git-cli, gh-cli, and gws-cli are now built-in
  • Web-search provider Serper.dev — third web-search backend alongside Tavily
  • crypto-price built-in tool — CoinGecko price lookup, no API key required
  • Observability on by default — minimal verbosity is now enabled out of the box
  • Sub-agent maxIterations fully honored — the silent cap of 3 has been removed
  • effect moved to peerDependencies — add effect explicitly if you import from it directly — see Installation

v0.8.5 — Native FC Hardening + Web Framework Adapters

  • React, Vue, and Svelte adapters — useAgentStream() and useAgent() hooks/composables/stores for all three frameworks, consuming SSE endpoints — see Web Integration and Streaming
  • 7-hook provider adapter system — taskFraming, toolGuidance, errorRecovery, synthesisPrompt, qualityCheck, continuationHint, systemPromptPatch fully wired — see Reactive Intelligence
  • Dynamic stopping (3-layer) — novelty signal (Jaccard overlap), budget exhaustion phase transition, and per-tool call cap (maxCallsPerTool) — see Harness Control Flow
  • Full prompt observability — logModelIO: true logs the complete FC conversation thread with no truncation — see Observability
  • Actionable failure messages — loop detection, required-tools, and stall detection all emit Fix: suggestions with specific builder options — see Troubleshooting
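
The novelty signal in the dynamic-stopping bullet above is described as Jaccard overlap. A minimal sketch of that measure follows; the tokenizer and the noveltyOf helper are assumptions for illustration, not the framework's code.

```typescript
// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty sets are identical
  let shared = 0;
  a.forEach((item) => {
    if (b.has(item)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// Assumed tokenizer: lowercase words split on non-word characters.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((t) => t.length > 0));
}

// Novelty = 1 - overlap between a new step and everything seen so far.
function noveltyOf(step: string, seen: Set<string>): number {
  return 1 - jaccard(tokenize(step), seen);
}
```

A loop could treat a run of low-novelty steps as a sign that the agent is repeating itself and stop early; the threshold for "low" would be a tuning choice.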

  • Entropy-aware intelligence pipeline — 5-source composite entropy sensor, trajectory classifier, and reactive controller that takes corrective action automatically — see Reactive Intelligence
  • Thompson Sampling strategy learner — SQLite-backed bandit learns which reasoning strategy wins per task category across runs — see Reactive Intelligence
  • Builder hardening — withStrictValidation(), withTimeout(), withRetryPolicy(), withFallbacks(), withHealthCheck(), and withErrorHandler() — see Builder API
  • Automatic strategy switching — when entropy analysis detects a stuck loop, the agent switches reasoning strategy without user intervention — see Choosing Strategies
  • Observability dashboard upgrade — chalk/boxen terminal UI with entropy grade (A–F), sparklines, and entropy-informed alerts — see Observability
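
Thompson Sampling, mentioned in the strategy-learner bullet above, picks the option whose sampled success rate is highest, so well-performing strategies are favoured while uncertain ones still get tried. The sketch below is a generic illustration with hypothetical names (ArmStats, pickStrategy, recordOutcome); it keeps counts in memory rather than SQLite and samples Beta(a, b) for integer parameters as the a-th smallest of a + b - 1 uniform draws.

```typescript
// Hypothetical per-strategy statistics.
interface ArmStats {
  wins: number;
  losses: number;
}

// Beta(a, b) for positive integers: the a-th smallest of (a + b - 1) uniforms.
function sampleBeta(a: number, b: number, rng: () => number): number {
  const draws = Array.from({ length: a + b - 1 }, () => rng()).sort((x, y) => x - y);
  return draws[a - 1] ?? 0;
}

// Sample a plausible success rate per strategy and pick the best sample.
function pickStrategy(
  arms: Record<string, ArmStats>,
  rng: () => number = Math.random,
): string {
  let bestStrategy = "";
  let bestScore = -1;
  for (const [strategy, stats] of Object.entries(arms)) {
    // Uniform Beta(1, 1) prior, updated with observed wins and losses.
    const score = sampleBeta(stats.wins + 1, stats.losses + 1, rng);
    if (score > bestScore) {
      bestStrategy = strategy;
      bestScore = score;
    }
  }
  return bestStrategy;
}

// Update the counts after a run finishes.
function recordOutcome(arms: Record<string, ArmStats>, strategy: string, success: boolean): void {
  const stats = arms[strategy] ?? { wins: 0, losses: 0 };
  if (success) stats.wins += 1;
  else stats.losses += 1;
  arms[strategy] = stats;
}
```

Passing the random source as a parameter keeps the selection testable; in normal use the default Math.random applies.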

v0.5.0 — A2A Protocol + Observability Foundation

  • Full A2A (Agent-to-Agent) protocol — JSON-RPC 2.0 server, streaming SSE, client, discovery, and capability matching based on Google’s A2A spec — see A2A Protocol
  • Agent-as-tool pattern — wrap any local or remote A2A agent as a callable tool with createAgentTool() / createRemoteAgentTool() — see Sub-agents
  • Live observability streaming — withObservability({ live: true, verbosity }) writes structured phase logs to stdout as each step fires — see Observability
  • rax serve — expose any agent as an A2A-compliant HTTP server with a single CLI command — see CLI
  • EventBus reasoning events — all 5 strategies publish ReasoningStepCompleted; subscribe with agent.on() for custom monitoring — see Observability