Skip to content

Decision Tracing

Reactive Agents records not just what the agent did but why. Every tool selection, model-stated assumption, curator action, and termination can carry a structured Rationale alongside the existing event stream. The rax diagnose debrief command renders that rationale as a decision-centric timeline that post-hoc reviewers can audit without re-running.

import type { Rationale } from "@reactive-agents/core";
type Rationale = {
why: string; // ≤280 chars
refs?: readonly string[]; // observation/scratchpad keys, e.g. "obs:1", "scratch:goal"
alternatives?: readonly { option: string; rejectedBecause: string }[];
confidence?: number; // [0,1]
};

The type lives in @reactive-agents/core so the trace, tools, reasoning, and runtime packages can share it without cross-package coupling. Validators (validateRationale, isRationale) ship from @reactive-agents/trace.

SourceTraceEvent kindRationale field
Tool call (native FC)ToolCallStartedrationale (required)
Tool call (text-parse)ToolCallStartedrationale (required)
Tool call (plan-execute step)ToolCallStartedrationale (required)
Model-stated assumptionassumption-recordedrationale (required)
Curator decisioncurator-decisionrationale (required)
Alternatives weighedalternatives-considered— (uses inline shape)
Terminationkernel-state-snapshotterminationRationale
Strategy switchstrategy-switchedrationale
Reactive decisiondecision-evaluatedrationale

Tool-call rationale is coaxed from the model by a kernel-injected system prompt and (for plan-execute) a schema-enforced planner field. When the model complies, rationale is captured; when it doesn’t, the field is absent and a metric fires — never synthesized.

Rationale capture is mandatory and coaxed from the model on three paths:

1. Native function-calling (Ollama, Anthropic, OpenAI, Gemini)

Section titled “1. Native function-calling (Ollama, Anthropic, OpenAI, Gemini)”

The kernel injects a hard requirement into the system prompt — independent of toolSchemaDetail — instructing the model to emit one <rationale> block per tool call, in order:

## Decision Rationale (MANDATORY — every tool call)
Every tool call you issue MUST be preceded by a rationale block in your text content...
<rationale call="1">{"why":"one sentence, ≤280 chars","confidence":0.0-1.0}</rationale>

parseRationaleBlocks() reads them from the assistant’s text + thinking content and attaches each one to the matching ToolCallSpec by 1-indexed position. Provider FC events have no sibling rationale field, so this side-channel is what carries the model’s stated “why” into the trace.

2. Text-parse drivers (small local models)

Section titled “2. Text-parse drivers (small local models)”

When the driver falls back to text-parse mode, the tier-2/3 parsers accept rationale as a sibling JSON field on the tool-call object:

[
{
"name": "web_search",
"arguments": { "query": "AAPL stock" },
"rationale": { "why": "needs fresh price data", "refs": ["scratch:goal"] }
}
]

The tier-1 XML format reads external <rationale> blocks identically to native-FC.

The planner’s structured-output schema requires rationale: { why, confidence? } on every tool_call step:

{
"title": "Fetch recent commits",
"type": "tool_call",
"toolName": "github/list_commits",
"toolArgs": { "owner": "acme", "repo": "app", "perPage": 10 },
"rationale": {
"why": "Need the raw commit list before any summarization can begin",
"confidence": 0.95
}
}

plan-execute.ts publishes ToolCallStarted with the step’s rationale before dispatching the tool. If the model omits rationale on any tool_call step, the strategy issues a [STRICT RETRY] plan regeneration with a stronger reminder. Non-compliance after retry emits a plan_rationale_missing metric — no synthetic fallback is invented, the field stays empty so observability surfaces the gap.

The think phase scans thought text for I assume X (because Y). patterns and emits an assumption-recorded event per detected assumption (capped at 3 per iteration). No model prompting required — the pattern is conventional enough that frontier and local models hit it naturally.

think.ts output: "I assume the user wants USD because no currency given. ..."
AssumptionRecordedEvent {
assumption: "the user wants USD",
rationale: { why: "no currency given" }
}

The terminate() helper accepts an optional rationale that surfaces on KernelStateSnapshotEvent.terminationRationale:

terminate(state, {
reason: "quality_threshold",
output: synthesized,
rationale: { why: "quality 0.92 ≥ threshold 0.90" },
});

Use this when reason is opaque (e.g. "quality_threshold") and the threshold/score context makes the choice auditable.

The debrief command folds every rationale-bearing event into a single timeline:

Terminal window
rax diagnose debrief <runId>
rax diagnose debrief latest
rax diagnose debrief <runId> --json

The legacy standalone bin rax-diagnose debrief … continues to work as well.

Example output:

Debrief: run abc-123
├─ Goal: find current price of AAPL stock
├─ Path: web_search → calculator
├─ Why this path
│ • iter 1 chose tool:web_search: "needs fresh price data" (refs: scratch:goal)
│ • iter 2 chose tool:calculator: "verify cited number"
├─ Assumptions
│ • "user means USD" (conf: 0.60) — no currency specified
├─ Curator
│ • iter 2 marked-untrusted obs:scrape-1 — "no audit trail"
├─ Termination: quality_threshold — "quality 0.92 ≥ threshold 0.90"
└─ Verdict: success | 1500 tok | 2500ms

Unlike rax-diagnose replay, which is event-centric and shows every event in the trace, debrief is decision-centric: it drops events that carry no rationale signal so reviewers see the audit trail, not the raw firehose.

For custom dashboards or LLM-as-judge debriefing, build the structured shape directly:

import { buildDebrief } from "@reactive-agents/diagnose";
const debrief = await buildDebrief("/path/to/trace.jsonl");
console.log(debrief.path); // [{ iter, action, rationale? }, ...]
console.log(debrief.termination); // { by, rationale? }
console.log(debrief.assumptions); // [{ iter, assumption, rationale }, ...]

Reading rationale from AgentResult.debrief

Section titled “Reading rationale from AgentResult.debrief”

result.debrief.rationale[] is a unified log of every task-advancing decision the agent made. Each entry carries an iteration, a decision tag, an optional toolName, and the structured rationale. The decision tag identifies the source:

decision valueSource
tool-selectionModel emitted <rationale> block for a tool call
curator-{kept|dropped|compressed|marked-untrusted}CuratorDecisionEmitted event from context curator
strategy-switch:{from}→{to}StrategySwitched event from the strategy evaluator
reactive-{early-stop|branch|compress|switch-strategy|attribute}ReactiveDecision event from RI dispatcher
termination:{reason}KernelStateSnapshotEmitted event with terminationRationale

Example:

const result = await agent.run("Fetch and summarize the last 10 commits, then write to file");
console.log(result.debrief?.rationale);
// [
// { iteration: 1, decision: "tool-selection", toolName: "github/list_commits",
// rationale: { why: "Need the raw commit list before any summarization can begin", confidence: 0.95 } },
// { iteration: 2, decision: "curator-dropped",
// rationale: { why: "Observation contained no audit trail", refs: ["obs:scrape-1"] } },
// { iteration: 3, decision: "tool-selection", toolName: "file-write",
// rationale: { why: "Save the final summary to a local file for future reference", confidence: 0.9 } },
// { iteration: 4, decision: "termination:quality_threshold",
// rationale: { why: "quality 0.92 ≥ threshold 0.90" } }
// ]

The rendered debrief.markdown includes a ## Decision Rationale section automatically — strategy switches, reactive interventions, curator decisions, and terminations all surface alongside tool selections.

Tool authors don’t need to do anything: the rationale lives on the model side and is coaxed by the kernel-injected system prompt. There is no opt-in flag — every tool call is expected to carry rationale, and the plan-execute strategy retries plan generation if the model forgets.

  • Not LLM-as-judge. Rationale is the model’s own stated reasoning. A separate judge layer (post-run) can score whether the rationale matches actual behavior; the trace captures the claim, not the verdict.
  • Not a confabulation guard. If a model emits a refs: ["obs:99"] that doesn’t exist, the trace records it as-is. A planned anti-confabulation guard will reject calls citing unknown refs.
  • Not synthesized. If a small model fails to comply after the strict retry, the field stays empty and a plan_rationale_missing metric fires. Rationale is intentional model output or nothing — never a generated stand-in derived from the instruction text.