Snapshot & Replay

Last updated 2 days ago · 2f8f0ef

Every Reactive Agents run produces a JSONL trace at ~/.reactive-agents/traces/<runId>.jsonl when tracing is enabled. The @reactive-agents/replay package lets you re-execute a recorded run against modified prompts, models, or temperatures while holding tool results constant — so you can audit decisions, test prompt changes without paying for tool calls, and A/B-test model swaps against real production traces.

Why this matters

No other agent framework lets you replay a recorded decision. The traceable-by-demo guarantee is one of the load-bearing claims in the Vision Pillar of Observability — “every decision an agent makes should be controllable, observable, and auditable.”

Three primary use cases:

Audit a production failure — replay the exact recording with no overrides; confirm the agent’s decision path is reproducible.
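Test a prompt change — replay with systemPrompt: "<new>"; the tool sequence may diverge, but recorded tool results are frozen, so you pay only for LLM tokens. Calls the recording doesn't cover fail in strict mode, so prompt iteration usually runs in lenient mode.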
A/B a model swap — replay with model: "gpt-4o-mini"; the diff reports token, cost, and output deltas.

API

import {
  loadRecordedRun,
  replay,
  makeReplayController,
  makeReplayToolLayer,
} from "@reactive-agents/replay"
import { ReactiveAgentBuilder } from "@reactive-agents/runtime"

const run = await loadRecordedRun("r-abc123")
//                                   ^^^^^^^^^^
//   resolves to ~/.reactive-agents/traces/r-abc123.jsonl
//   (also accepts an absolute path or a relative .jsonl)

const result = await replay(run, async (ctx) => {
  const ctrl = makeReplayController(ctx.recordedRun.toolTable)
  const layer = makeReplayToolLayer(ctrl, ctx.overrides.onMissingToolResult ?? "strict")

  return new ReactiveAgentBuilder()
    .withProvider("anthropic")
    .withModel(ctx.overrides.model ?? ctx.recordedRun.model)
    .withLayers(layer)        // ← replay layer wins ToolService.execute
    .build()
}, {
  // Overrides for this replay; here only the system prompt changes.
  systemPrompt: "You are extra concise.",
})

console.log(result.diff)
// {
//   identical: false,
//   iterationsDelta: -1,
//   toolSequenceDiff: [...],
//   outputDiff: { equal: false, original: "...", replay: "..." },
//   tokensDelta: -120,
//   costDelta: -0.0012,
//   durationDeltaMs: -340,
// }
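The comment in the example says loadRecordedRun accepts a run ID, an absolute path, or a relative .jsonl path. The sketch below shows one way that rule could be implemented. resolveTracePath is a hypothetical helper written for illustration, not an export of @reactive-agents/replay, and the package's actual resolution order is an assumption here.

```typescript
import * as os from "node:os"
import * as path from "node:path"

// Hypothetical helper mirroring the resolution rule described for loadRecordedRun.
function resolveTracePath(idOrPath: string, cwd: string = process.cwd()): string {
  // Absolute paths are used unchanged.
  if (path.isAbsolute(idOrPath)) return idOrPath
  // A relative .jsonl path is resolved against the working directory.
  if (idOrPath.endsWith(".jsonl")) return path.resolve(cwd, idOrPath)
  // Anything else is treated as a bare run ID in the default trace directory.
  return path.join(os.homedir(), ".reactive-agents", "traces", `${idOrPath}.jsonl`)
}
```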

Strict vs lenient mode

strict (default) — unrecorded tool calls during replay are a fatal error. Use for audits, where any prompt change that alters the tool sequence should fail loudly.
lenient — unrecorded calls return { success: false, error: "no recording" } so the agent can continue exploring. Use for prompt-iteration loops.

Truncated recordings (tool results larger than 8KB are clipped when recorded) also fail in strict mode: replay can’t guarantee determinism when a recorded tool result is lossy.
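To make the two modes concrete, here is a minimal sketch of a replay controller's lookup logic. SketchReplayController, RecordedCall, and the truncated flag are hypothetical names, not the package's types. The sketch assumes calls are matched on tool name plus argsHash and that repeated identical calls are served in recorded order.

```typescript
type Mode = "strict" | "lenient"

interface ToolResult {
  success: boolean
  output?: unknown
  error?: string
}

// Hypothetical shape of one recorded tool call in the trace.
interface RecordedCall {
  toolName: string
  argsHash: string
  result: ToolResult
  truncated: boolean // true if the payload was clipped at record time
}

class ReplayError extends Error {}

// Sketch only: dispenses recorded results instead of calling the live tool.
class SketchReplayController {
  private readonly mode: Mode
  // One FIFO queue per (toolName, argsHash), so repeated identical calls
  // receive their results in recorded order (an assumption).
  private readonly queues = new Map<string, RecordedCall[]>()

  constructor(calls: RecordedCall[], mode: Mode) {
    this.mode = mode
    for (const call of calls) {
      const key = `${call.toolName}:${call.argsHash}`
      const queue = this.queues.get(key) ?? []
      queue.push(call)
      this.queues.set(key, queue)
    }
  }

  execute(toolName: string, argsHash: string): ToolResult {
    const call = this.queues.get(`${toolName}:${argsHash}`)?.shift()
    if (call === undefined) {
      // No recording for this call: fatal in strict mode, soft failure in lenient mode.
      if (this.mode === "strict") {
        throw new ReplayError(`unrecorded tool call: ${toolName} (${argsHash})`)
      }
      return { success: false, error: "no recording" }
    }
    // A clipped payload cannot guarantee determinism, so strict mode rejects it.
    if (call.truncated && this.mode === "strict") {
      throw new ReplayError(`recorded result for ${toolName} was truncated`)
    }
    return call.result
  }
}
```

In lenient mode the sketch returns a truncated result as-is, which matches the advice to switch to lenient and accept divergence.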

Diff shape

interface ReplayDiff {
  identical: boolean                       // all signals match
  iterationsDelta: number                  // replay − original
  toolSequenceDiff: ToolSeqEdit[]          // added / removed / reordered
  outputDiff: { original?: string; replay?: string; equal: boolean }
  tokensDelta: number
  costDelta: number
  durationDeltaMs: number
}
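Every delta in ReplayDiff is computed as replay minus original, so a negative tokensDelta means the replay used fewer tokens. The sketch below computes the scalar fields from two hypothetical RunSummary records. Which signals count toward identical is an assumption: here output, iteration count, tokens, cost, and an empty tool edit script count, while wall-clock duration does not.

```typescript
// Hypothetical per-run summary; a real recording carries much more.
interface RunSummary {
  iterations: number
  output?: string
  tokens: number
  cost: number
  durationMs: number
}

interface ScalarDiff {
  identical: boolean
  iterationsDelta: number
  outputDiff: { original?: string; replay?: string; equal: boolean }
  tokensDelta: number
  costDelta: number
  durationDeltaMs: number
}

// Every delta is replay minus original: negative means the replay used less.
function diffRunSummaries(original: RunSummary, replay: RunSummary, toolEditCount: number): ScalarDiff {
  const equal = original.output === replay.output
  const iterationsDelta = replay.iterations - original.iterations
  const tokensDelta = replay.tokens - original.tokens
  const costDelta = replay.cost - original.cost
  return {
    // Assumption: duration varies run to run, so it is excluded from "identical".
    identical: equal && iterationsDelta === 0 && tokensDelta === 0 && costDelta === 0 && toolEditCount === 0,
    iterationsDelta,
    outputDiff: { original: original.output, replay: replay.output, equal },
    tokensDelta,
    costDelta,
    durationDeltaMs: replay.durationMs - original.durationMs,
  }
}
```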

toolSequenceDiff is an edit script whose positions follow iteration order. Each edit is one of:

{ kind: "added", toolName, argsHash, atIndex }
{ kind: "removed", toolName, argsHash, atIndex }
{ kind: "reordered", toolName, argsHash, from, to }
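One way to build such an edit script, sketched under assumptions: pair the n-th occurrence of each (toolName, argsHash) in the replay with its n-th occurrence in the original, report unpaired replay calls as added and unpaired original calls as removed, and treat the longest run of pairs that keeps its relative order as unchanged, reporting every other pair as reordered. diffToolSequence is a hypothetical function; the package's actual algorithm and edit ordering may differ, and this version groups edits by kind rather than sorting them.

```typescript
interface ToolCall {
  toolName: string
  argsHash: string
}

type ToolSeqEdit =
  | { kind: "added"; toolName: string; argsHash: string; atIndex: number }
  | { kind: "removed"; toolName: string; argsHash: string; atIndex: number }
  | { kind: "reordered"; toolName: string; argsHash: string; from: number; to: number }

// Hypothetical edit-script builder; not the package's implementation.
function diffToolSequence(original: ToolCall[], replay: ToolCall[]): ToolSeqEdit[] {
  const key = (c: ToolCall) => `${c.toolName}:${c.argsHash}`
  const edits: ToolSeqEdit[] = []

  // Queue the original indices of each call so the n-th replay occurrence
  // pairs with the n-th original occurrence.
  const pending = new Map<string, number[]>()
  original.forEach((c, i) => {
    const indices = pending.get(key(c)) ?? []
    indices.push(i)
    pending.set(key(c), indices)
  })

  // Replay calls with no original counterpart are "added" (atIndex is a replay index).
  const pairs: Array<{ from: number; to: number }> = []
  replay.forEach((c, j) => {
    const i = pending.get(key(c))?.shift()
    if (i === undefined) {
      edits.push({ kind: "added", toolName: c.toolName, argsHash: c.argsHash, atIndex: j })
    } else {
      pairs.push({ from: i, to: j })
    }
  })

  // Original calls never paired are "removed" (atIndex is an original index).
  for (const indices of pending.values()) {
    for (const i of indices) {
      const c = original[i]
      edits.push({ kind: "removed", toolName: c.toolName, argsHash: c.argsHash, atIndex: i })
    }
  }

  // Pairs are in replay order. The longest increasing subsequence of their
  // original indices is treated as unchanged; every other pair was reordered.
  const n = pairs.length
  const len = new Array<number>(n).fill(1)
  const prev = new Array<number>(n).fill(-1)
  for (let b = 0; b < n; b++) {
    for (let a = 0; a < b; a++) {
      if (pairs[a].from < pairs[b].from && len[a] + 1 > len[b]) {
        len[b] = len[a] + 1
        prev[b] = a
      }
    }
  }
  let best = -1
  for (let b = 0; b < n; b++) {
    if (best === -1 || len[b] > len[best]) best = b
  }
  const stable = new Set<number>()
  for (let b = best; b !== -1; b = prev[b]) stable.add(b)

  pairs.forEach((p, b) => {
    if (!stable.has(b)) {
      const c = original[p.from]
      edits.push({ kind: "reordered", toolName: c.toolName, argsHash: c.argsHash, from: p.from, to: p.to })
    }
  })
  return edits
}
```

Keeping the longest order-preserving run stable means a single inserted call does not mark every later call as reordered, which a naive index-by-index comparison would do.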

argsHash is a 16-char SHA-256 prefix over a stable JSON serialization of the arguments — the same key the replay controller uses to match calls.
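A minimal sketch of that key follows. It assumes JSON-compatible arguments, a canonical form that sorts object keys at every depth and drops undefined properties, and a lowercase hex digest. These canonicalization details are assumptions, so hashes from this sketch are not guaranteed to match the ones in a recorded trace.

```typescript
import { createHash } from "node:crypto"

// Serialize with object keys sorted at every depth, so {a, b} and {b, a}
// produce the same string. Assumes JSON-compatible values.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined) // drop undefined properties, like JSON.stringify
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
      .map(([k, v]) => `${JSON.stringify(k)}:${stableStringify(v)}`)
    return `{${entries.join(",")}}`
  }
  // JSON.stringify returns undefined for undefined, functions, and symbols; emit null instead.
  return JSON.stringify(value) ?? "null"
}

// First 16 hex characters of the SHA-256 digest of the canonical form.
function argsHash(args: unknown): string {
  return createHash("sha256").update(stableStringify(args)).digest("hex").slice(0, 16)
}
```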

CLI summary

rax diagnose replay-run r-abc123
# runId    r-abc123
# task     fetch HN top 10 then summarize
# model    qwen3:14b
# provider ollama
# events   84
# tools    7 calls across 3 unique tool(s): fetch, scrape, summarize

Full re-execution needs a builder factory, so in v0.11 it is available only through the API. Use rax diagnose replay-run --json to pipe the run metadata into a script.

The legacy standalone binary, rax-diagnose replay-run <runId>, still works for backward compatibility.

Determinism guarantee (in progress)

The intent: with no overrides AND temperature: 0 AND a deterministic provider (e.g. the test provider with a scripted scenario), a replay produces output identical to the recorded run.

What’s verified today:

Override mechanism — tests/layer-override.test.ts pins the behavior that Layer.merge(live, extraLayers) gives the replay layer priority for ToolService.execute. If Effect’s merge semantics ever stopped honoring this order, the override would silently call the live tool; the test exists so that this regression fails loudly instead.
Tool-result freezing — tests/replay-tool-layer.test.ts proves the replay layer dispenses recorded results without touching the live tool.

What’s not yet verified (v0.11.1 follow-up):

End-to-end determinism — full builder integration test asserting result.diff.outputDiff.equal === true after a no-override replay through TestLLMServiceLayer. Manual verification works today; an automated gate is on the v0.11.1 list.

When replay isn’t enough

The recorded trace lacks tool result payloads. Older traces (pre-v0.11) only recorded success: boolean and durationMs. Re-record under v0.11+ to capture full payloads.
The tool result was truncated (>8KB) — strict mode rejects; switch to lenient and accept divergence.
The tool is genuinely stateful (DB writes, queue ingestion). Replay holds the recorded response constant but the world has moved on; treat results as historical, not live.
You want a different decision path — strict mode is the wrong tool. Use lenient or build a new run.

Stability

The replay API is @stable as of v0.11. See API Stability.