Build a Local AI Agent with Ollama in TypeScript
This is a complete, runnable guide to building a local AI agent in TypeScript with Ollama. You will install Ollama, pull a small open model, wire it into Reactive Agents, give it tools, and tune it for small-model reliability. Then comes the payoff: you swap to a frontier API by changing only the provider, model, and context-tier settings. The same agent code runs on a 4B model on your laptop and on Claude or GPT in production.
Why run an AI agent locally?
Running a local LLM agent has three concrete advantages over calling a hosted API:
- Privacy — prompts, tool results, and documents never leave your machine. Nothing is logged by a third party.
- Cost — local inference has no per-token charge. You pay for your hardware and electricity, not API billing, so an agent that loops through many reasoning steps runs up no API bill.
- No API key, no rate limits — pull a model and go. No account, no quota, no network dependency.
The historical downside was quality: small open models were unreliable at tool calling, the core skill an agent needs. Reactive Agents closes most of that gap. A Healing Pipeline normalizes malformed tool calls from small models, and model-adaptive context profiles trim prompts and compact history so a 4B model isn’t drowned in tokens. The result is local-to-frontier parity: write the agent once, run it anywhere.
Prerequisites
- Ollama installed and running. On macOS/Linux:

```sh
curl -fsSL https://ollama.com/install.sh | sh
```

- A pulled model. Start with a small, fast one:

```sh
ollama pull qwen3:4b
```

  For tool-heavy work, qwen3:14b is the most reliable local model at its size (see Local Models for the full comparison).

- Bun ≥ 1.0 (or Node ≥ 20). This guide uses Bun.
Confirm Ollama is serving on its default port (http://localhost:11434):
```sh
ollama list # should show qwen3:4b
```

Step 1 — Install Reactive Agents
```sh
mkdir local-agent && cd local-agent
bun init -y
bun add reactive-agents
```

The effect package ships as a dependency of reactive-agents and installs automatically; you only import it directly if you write custom tools or hooks.
Step 2 — A minimal local agent
Create src/agent.ts. This is the smallest possible local agent; it talks to Ollama and needs no API key:
```typescript
import { ReactiveAgents } from "reactive-agents";

const agent = await ReactiveAgents.create()
  .withProvider("ollama")
  .withModel("qwen3:4b")
  .build();

const result = await agent.run("Explain what an AI agent is in two sentences.");
console.log(result.output);
```

Run it:

```sh
bun run src/agent.ts
```

The agent talks to your local Ollama server, so the prompt never leaves the machine. result.output holds the model’s answer; result.metadata carries { duration, cost, tokensUsed, stepsCount }, and cost is 0 because there’s no API meter.
Step 3 — Give the agent tools and reasoning
A model that only chats isn’t an agent. Add .withReasoning() to enable the Think → Act → Observe loop and .withTools() to register the built-in toolset (file read/write, HTTP, code execution, crypto prices, git, and more):
```typescript
import { ReactiveAgents } from "reactive-agents";

const agent = await ReactiveAgents.create()
  .withProvider("ollama")
  .withModel("qwen3:4b")
  .withReasoning() // Think → Act → Observe loop
  .withTools({ allowedTools: ["http-get", "file-write"] }) // scope to a small tool set
  .build();

const result = await agent.run(
  "Fetch https://api.github.com/repos/oven-sh/bun and write the star count to stars.txt",
);
console.log(result.output);
```

Tools are passed to the model through Ollama’s native function-calling API. When the model decides to act, the framework validates the arguments against the tool schema, runs the tool in a sandbox, and feeds the real result back into the loop.
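Ollama’s /api/chat endpoint accepts tools as JSON Schema function definitions of the form { type: "function", function: { name, description, parameters } }. The sketch below shows what such a definition might look like for an http-get tool, plus a simplified check of a model’s arguments against the schema’s required and typed fields. The http-get schema and the validateArgs helper are hypothetical illustrations, not Reactive Agents’ actual tool schema or validator.

```typescript
// A sketch of a tool definition in the shape Ollama's /api/chat accepts,
// plus a simplified argument check. The "http-get" schema is hypothetical.
type JsonType = "string" | "number" | "boolean" | "object" | "array";

interface ToolDefinition {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: {
      type: "object";
      properties: Record<string, { type: JsonType; description?: string }>;
      required: string[];
    };
  };
}

const httpGetTool: ToolDefinition = {
  type: "function",
  function: {
    name: "http-get",
    description: "Fetch a URL and return the response body",
    parameters: {
      type: "object",
      properties: { url: { type: "string", description: "Absolute URL to fetch" } },
      required: ["url"],
    },
  },
};

// Map a runtime value to its JSON type name ("array" and "null" are special-cased).
function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Returns a list of problems; an empty list means the arguments look valid.
function validateArgs(tool: ToolDefinition, args: Record<string, unknown>): string[] {
  const { properties, required } = tool.function.parameters;
  const errors: string[] = [];
  for (const name of required) {
    if (!(name in args)) errors.push(`missing required argument "${name}"`);
  }
  for (const [name, value] of Object.entries(args)) {
    const spec = properties[name];
    if (!spec) {
      errors.push(`unknown argument "${name}"`);
    } else if (jsonTypeOf(value) !== spec.type) {
      errors.push(`argument "${name}" should be ${spec.type}, got ${jsonTypeOf(value)}`);
    }
  }
  return errors;
}
```

A call like validateArgs(httpGetTool, { link: "https://example.com" }) reports both the missing url and the unknown link, which is the kind of feedback a loop can send back to the model instead of crashing.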
Step 4 — Tune for small models
Small models need leaner prompts and a right-sized context window. Two methods do the heavy lifting.
Context profile — .withContextProfile({ tier: "local" }) switches on lean prompts, aggressive history compaction, and 800-character tool-result truncation. Without it the framework defaults to the verbose "large" tier, which wastes tokens and confuses small models.
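To make tool-result truncation and history compaction concrete, here is a minimal sketch. The 800-character limit comes from the description above; the Message type, the keep-the-last-N compaction rule, and the truncation marker are assumptions for illustration, not the local tier’s actual algorithm.

```typescript
// Illustrative only: hypothetical versions of two "local" tier behaviors.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

const TOOL_RESULT_LIMIT = 800; // characters, per the local tier description

// Cut long tool output and record how many characters were dropped.
function truncateToolResult(text: string, limit = TOOL_RESULT_LIMIT): string {
  if (text.length <= limit) return text;
  return `${text.slice(0, limit)}\n[truncated ${text.length - limit} chars]`;
}

// Keep every system message plus the most recent `keepLast` other messages,
// truncating any tool results that survive.
function compactHistory(history: Message[], keepLast: number): Message[] {
  const system = history.filter((m) => m.role === "system");
  const rest = history.filter((m) => m.role !== "system");
  const recent = keepLast > 0 ? rest.slice(-keepLast) : []; // slice(-0) would keep everything
  return [...system, ...recent].map((m) =>
    m.role === "tool" ? { ...m, content: truncateToolResult(m.content) } : m,
  );
}
```

Both steps trade completeness for headroom: a small model sees the instructions and the latest exchange at full fidelity, while older turns and bulky tool output are dropped or shortened.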
Context window — pass the object form of .withModel() to set Ollama’s num_ctx exactly. The profile tunes how the prompt is built; numCtx sets how much context Ollama allocates.
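For reference, num_ctx is a request option in Ollama’s own REST API. The sketch below builds a raw /api/chat request that sets it directly, which is handy for checking whether a model and context size fit on your machine outside the framework. How Reactive Agents constructs its requests internally is not shown here and may differ; the function names are illustrative, and nothing is sent until you call chat() yourself.

```typescript
// A sketch of a raw request to Ollama's /api/chat endpoint with num_ctx set.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
  stream: boolean;
  options: { num_ctx: number };
}

function buildChatRequest(model: string, prompt: string, numCtx: number): ChatRequest {
  return {
    model,
    messages: [{ role: "user", content: prompt }],
    stream: false, // ask for one JSON response instead of streamed chunks
    options: { num_ctx: numCtx }, // context window Ollama allocates for this request
  };
}

// Send the request and return the assistant's reply text.
async function chat(body: ChatRequest, baseUrl = "http://localhost:11434"): Promise<string> {
  const res = await fetch(`${baseUrl}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { message: { content: string } };
  return data.message.content;
}

// Usage (requires a running Ollama server):
// console.log(await chat(buildChatRequest("qwen3:4b", "Say hi.", 32768)));
```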
```typescript
import { ReactiveAgents } from "reactive-agents";

const agent = await ReactiveAgents.create()
  .withName("local-researcher")
  .withProvider("ollama")
  .withModel({ model: "qwen3:4b", numCtx: 32768 }) // exact num_ctx sent to Ollama
  .withReasoning({ defaultStrategy: "reactive" }) // ReAct is the most reliable local strategy
  .withTools({ allowedTools: ["http-get", "file-read", "file-write"] })
  .withContextProfile({ tier: "local" }) // lean prompts + aggressive compaction
  .withMaxIterations(8) // cap the loop so it can't run away
  .build();

const result = await agent.run(
  "Read notes.md, summarize the key points, and write the summary to summary.md",
);
console.log(result.output);
console.log(result.metadata); // { duration, cost: 0, tokensUsed, stepsCount }
```

Stick with the "reactive" (ReAct) strategy on local models. Heavier strategies like Plan-Execute or Tree-of-Thought rely on structured generation that’s fragile below ~14B parameters.
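The reactive strategy follows the ReAct pattern: the model alternates between reasoning and acting until it produces a final answer, and .withMaxIterations() bounds how many rounds it gets. This self-contained sketch shows that control flow with a scripted stand-in for the model; the Step type, the tool map, runReact, and fakeModel are all hypothetical and are not the framework’s internals.

```typescript
// Conceptual ReAct loop: Think → Act → Observe, capped at maxIterations.
type Step =
  | { kind: "act"; tool: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

type Model = (transcript: string[]) => Step;
type Tool = (args: Record<string, unknown>) => string;

function runReact(
  model: Model,
  tools: Record<string, Tool>,
  task: string,
  maxIterations: number,
): { answer: string | null; steps: number } {
  const transcript = [`Task: ${task}`];
  for (let step = 1; step <= maxIterations; step++) {
    const decision = model(transcript); // Think: the model picks the next move
    if (decision.kind === "final") return { answer: decision.answer, steps: step };
    const tool = tools[decision.tool];
    // Act + Observe: run the tool (or report an unknown one) and record the result
    const observation = tool ? tool(decision.args) : `Error: unknown tool "${decision.tool}"`;
    transcript.push(`Action: ${decision.tool}(${JSON.stringify(decision.args)})`);
    transcript.push(`Observation: ${observation}`);
  }
  return { answer: null, steps: maxIterations }; // cap reached without a final answer
}

// A scripted stand-in for the model: read a file once, then answer.
const fakeModel: Model = (transcript) =>
  transcript.some((line) => line.startsWith("Observation:"))
    ? { kind: "final", answer: "Summary written." }
    : { kind: "act", tool: "file-read", args: { path: "notes.md" } };

const tools: Record<string, Tool> = {
  "file-read": (args) => `contents of ${String(args.path)}`,
};
```

With these stubs, runReact(fakeModel, tools, "Summarize notes.md", 8) finishes in two steps. A model that never emits a final answer stops at the cap and returns answer: null, which is the runaway case the iteration limit guards against.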
Step 5 — Swap to a frontier model
Here’s the parity payoff. Nothing about the agent’s logic, tools, or prompts is tied to Ollama. To run the same agent on a frontier API, change the provider, the model, and the context tier, then add the relevant API key to your environment:
```typescript
import { ReactiveAgents } from "reactive-agents";

const agent = await ReactiveAgents.create()
  .withName("cloud-researcher")
  .withProvider("anthropic") // was "ollama"
  .withModel("claude-sonnet-4-6") // was "qwen3:4b"
  .withReasoning({ defaultStrategy: "reactive" })
  .withTools({ allowedTools: ["http-get", "file-read", "file-write"] })
  .withContextProfile({ tier: "frontier" }) // was "local"
  .withMaxIterations(8)
  .build();

const result = await agent.run(
  "Read notes.md, summarize the key points, and write the summary to summary.md",
);
console.log(result.output);
```

```sh
# .env (only needed for hosted providers)
ANTHROPIC_API_KEY=sk-ant-...
```

Bun loads .env automatically. Develop and iterate locally for free, then ship the same code against a frontier model when you need maximum quality. The "frontier" context tier takes advantage of the larger window. Apart from the API key, those three settings are the whole change.
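If you would rather make the swap without editing code, one option is to read those settings from the environment. The sketch below is a plain function that returns the provider, model, and context tier used in the two examples above; the AGENT_TARGET variable and the selectTarget helper are hypothetical names, not something the framework reads, and you pass the result to the builder yourself.

```typescript
// Hypothetical helper: choose local or hosted settings from environment variables.
// AGENT_TARGET is an illustrative name; the framework does not read it.
interface AgentTarget {
  provider: "ollama" | "anthropic";
  model: string;
  tier: "local" | "frontier";
}

function selectTarget(env: Record<string, string | undefined>): AgentTarget {
  if (env.AGENT_TARGET === "cloud") {
    // Fail fast rather than at the first API call.
    if (!env.ANTHROPIC_API_KEY) {
      throw new Error("AGENT_TARGET=cloud requires ANTHROPIC_API_KEY");
    }
    return { provider: "anthropic", model: "claude-sonnet-4-6", tier: "frontier" };
  }
  // Default: everything stays on the local Ollama server.
  return { provider: "ollama", model: "qwen3:4b", tier: "local" };
}

// Usage with the builder from this guide:
// const target = selectTarget(process.env);
// const agent = await ReactiveAgents.create()
//   .withProvider(target.provider)
//   .withModel(target.model)
//   .withContextProfile({ tier: target.tier })
//   .build();
```

Defaulting to the local target keeps development free and offline; the hosted path only activates when you opt in and the key is present.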
Troubleshooting
model "qwen3:4b" not found — the model isn’t pulled. Run ollama pull qwen3:4b and confirm with ollama list. The model name in .withModel() must match an entry in that list exactly.
Connection refused / agent hangs at start — the Ollama server isn’t running. Start it (the desktop app, or ollama serve) and verify it answers on http://localhost:11434.
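Both of the checks above can be scripted. This sketch calls Ollama’s GET /api/tags endpoint, which lists locally pulled models, and separates the failure modes: server unreachable, server error, or model missing. The function names and the 2-second timeout are illustrative choices.

```typescript
// Diagnose the common startup failures against a local Ollama server.
interface TagsResponse {
  models: { name: string }[];
}

type Diagnosis = "ok" | "server-unreachable" | "server-error" | "model-missing";

// Pure check: does the /api/tags payload contain this exact model name?
function hasModel(tags: TagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model);
}

async function diagnose(model: string, baseUrl = "http://localhost:11434"): Promise<Diagnosis> {
  let tags: TagsResponse;
  try {
    const res = await fetch(`${baseUrl}/api/tags`, { signal: AbortSignal.timeout(2000) });
    if (!res.ok) return "server-error";
    tags = (await res.json()) as TagsResponse;
  } catch {
    return "server-unreachable"; // connection refused or timed out
  }
  return hasModel(tags, model) ? "ok" : "model-missing";
}

// Usage: console.log(await diagnose("qwen3:4b"));
```

The exact-match comparison mirrors the rule above: "qwen3" and "qwen3:4b" are different names as far as this check is concerned.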
Tool calls fail or use wrong parameter names — this is the classic small-model failure, and it’s largely handled for you: the Healing Pipeline corrects malformed tool names, parameter names, paths, and types before they error out. To improve it further, set .withContextProfile({ tier: "local" }), keep the tool set to 3–5 via .withTools({ allowedTools: [...] }), and prefer qwen3:14b over a 4B model for tool-heavy work.
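To give a feel for what correcting a malformed tool name can mean, here is a deliberately small sketch that maps variants such as http_get, HTTP-GET, or httpGet onto a registered name by ignoring case and separators. It is a hypothetical illustration of one repair, not the Healing Pipeline’s implementation, which also handles parameter names, paths, and types.

```typescript
// Hypothetical tool-name repair: compare names with case and separators removed.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Return the registered tool the model most likely meant, or null if none match.
function healToolName(requested: string, registered: string[]): string | null {
  if (registered.includes(requested)) return requested; // already valid
  const key = normalizeName(requested);
  const matches = registered.filter((name) => normalizeName(name) === key);
  return matches.length === 1 ? (matches[0] ?? null) : null; // refuse to guess between ties
}
```

Returning null on ambiguity is the conservative choice: an unknown-tool error the model can react to is safer than silently calling the wrong tool. A smaller tool set also makes collisions like this less likely.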
The agent loops without making progress — the circuit breaker catches most loops, but you can tighten the cap with .withMaxIterations(5) and simplify the prompt.
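One common way to spot a stuck agent is to watch for the same tool call, with the same arguments, repeating. The sketch below shows that heuristic; the threshold of 3 and the isStuck name are assumptions, and the framework’s circuit breaker may rely on different signals.

```typescript
// Hypothetical loop detector: trip when the last `threshold` tool calls are identical.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

function isStuck(history: ToolCall[], threshold = 3): boolean {
  if (history.length < threshold) return false;
  // Simple signature; JSON.stringify is sensitive to argument key order.
  const recent = history.slice(-threshold).map((c) => `${c.tool}:${JSON.stringify(c.args)}`);
  return recent.every((sig) => sig === recent[0]);
}
```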
Out of memory / Ollama crashes — use a smaller model or a quantized build, e.g. ollama pull qwen3:14b-q4_K_M (~60% less memory, minimal quality loss).
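A rough way to reason about quantization savings is to estimate the memory taken by the weights alone: parameter count × bits per weight ÷ 8 bytes. The sketch below applies that to a 14B model. The ~4.8 bits per weight used for q4_K_M is an approximate figure, and real usage is higher because the KV cache (which grows with num_ctx) and runtime overhead do not shrink with weight quantization, so the overall saving you observe is smaller than the weights-only ratio.

```typescript
// Back-of-envelope weight memory: params × bits per weight ÷ 8 bytes, in GB (1e9 bytes).
function weightMemoryGB(params: number, bitsPerWeight: number): number {
  return (params * bitsPerWeight) / 8 / 1e9;
}

const params14B = 14e9;
const fp16 = weightMemoryGB(params14B, 16); // 16-bit weights
const q4km = weightMemoryGB(params14B, 4.8); // 4.8 bits/weight is an approximation for q4_K_M
const savedPct = Math.round((1 - q4km / fp16) * 100);

console.log(`fp16 ≈ ${fp16.toFixed(1)} GB, q4_K_M ≈ ${q4km.toFixed(1)} GB, ~${savedPct}% less for weights`);
```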
Next steps
You now have a working local AI agent in TypeScript that runs entirely on Ollama and ports to frontier APIs without a rewrite. Go deeper:
- Local Models Guide — model recommendations by task, context tiers, strategy fit, and cost comparison.
- Tools Guide — built-in tools, custom tools via ToolBuilder, MCP servers, and tool-result compression.
- Quickstart — the broader 5-minute walkthrough and HarnessProfile presets.