How OpenClaw Works #3 — The Agent Loop
How OpenClaw turns a single LLM call into a resilient execution loop with session serialization, auto-compaction, and model fallback.
Post #3 in How OpenClaw Works — a series studying patterns from the OpenClaw codebase, an open-source personal AI assistant.
Source: run.ts
Key dependencies:
- @mariozechner/pi-agent-core — The underlying LLM session runtime. OpenClaw doesn't implement tool-use loops from scratch — it delegates the model ↔ tool cycle to pi-agent-core and wraps it with production concerns: retries, compaction, auth rotation.
- TypeBox — JSON Schema builder for validating RPC params at the gateway boundary. Keeps validation schemas co-located with handler code instead of in separate .json files.
The Problem
A single LLM call is stateless — you send messages, you get a response. An agent needs more: call the model, execute whatever tools it requests, feed the results back, repeat until the model decides it's done.
But production loops break in ways that toy implementations don't. The context window fills up mid-conversation. API keys hit rate limits. The model picks a thinking level the provider doesn't support. Two messages arrive for the same session simultaneously. A while(true) loop handles the happy path; the other 900 lines of run.ts handle everything else.
The Pipeline
WebSocket RPC
→ Validate params, respond { status: "accepted" }
→ Resolve model, skills, thinking level
→ Serialize into session queue
→ while(true): run attempt
→ Build prompt, call LLM, execute tools
→ Bridge events to stream
→ Handle failure: compact / rotate / fallback
→ Assemble reply payloads
Fire and Forget
The agent RPC handler doesn't wait for the model to finish. It validates params, resolves the session, and returns immediately:
const accepted = {
runId,
status: 'accepted' as const,
acceptedAt: Date.now(),
};
context.dedupe.set(`agent:${idem}`, {
ts: Date.now(),
ok: true,
payload: accepted,
});
respond(true, accepted, undefined, { runId });
void agentCommand({ message, images, sessionId, sessionKey /* ... */ }, runtime, deps)
.then((result) => {
const payload = { runId, status: 'ok', summary: 'completed', result };
context.dedupe.set(`agent:${idem}`, { ts: Date.now(), ok: true, payload });
respond(true, payload, undefined, { runId });
})
.catch((err) => {
const payload = { runId, status: 'error', summary: String(err) };
context.dedupe.set(`agent:${idem}`, { ts: Date.now(), ok: false, payload });
respond(false, payload, errorShape(ErrorCodes.UNAVAILABLE, String(err)), { runId });
});

The client gets { status: "accepted" } within milliseconds. The actual agent run streams events back over the same WebSocket connection as they happen. When the run finishes — or fails — a second response frame arrives with the final result.
The dedupe map keyed by idempotency key prevents duplicate runs. If the client retries the same request (network hiccup, reconnect), the handler returns the cached result instead of spawning a second run.
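A minimal sketch of that idempotency cache. The class name, method names, and TTL constant here are illustrative, not from the codebase; the roughly 20-minute lifetime is an assumption based on the restart table later in the post, which notes duplicates become possible for about 20 minutes after a restart.

```typescript
type CachedResult = { ts: number; ok: boolean; payload: unknown };

// Assumed TTL; the post only says the cache is in-memory with a ~20 min window.
const DEDUPE_TTL_MS = 20 * 60 * 1000;

class DedupeCache {
  private entries = new Map<string, CachedResult>();

  // Returns the cached result for a retried request, or undefined if unseen or expired.
  check(idemKey: string, now = Date.now()): CachedResult | undefined {
    const hit = this.entries.get(`agent:${idemKey}`);
    if (!hit) return undefined;
    if (now - hit.ts > DEDUPE_TTL_MS) {
      this.entries.delete(`agent:${idemKey}`);
      return undefined;
    }
    return hit;
  }

  store(idemKey: string, ok: boolean, payload: unknown): void {
    this.entries.set(`agent:${idemKey}`, { ts: Date.now(), ok, payload });
  }
}
```

The handler would call check before spawning a run and store on every terminal state, so a retried request replays the cached frame instead of starting a second run.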
Session Serialization
Before the agent loop starts, runEmbeddedPiAgent acquires two queue slots:
const sessionLane = resolveSessionLane(params.sessionKey || params.sessionId);
const globalLane = resolveGlobalLane(params.lane);
return enqueueSession(() =>
enqueueGlobal(async () => {
// ... the entire agent run happens inside these queues
}),
);

enqueueSession serializes runs per session. enqueueGlobal optionally serializes across sessions (used by messaging channels that need ordered delivery).
Why not a lock? Locks protect shared data. Queues serialize work. The distinction matters — a lock would block the caller, while a queue lets the RPC handler return immediately and the run waits its turn. When two messages arrive for the same session, the second queues behind the first instead of racing it.
This prevents a class of bugs that are hard to reproduce: tool calls mutating session state while a parallel run reads it, or two runs appending to the same transcript simultaneously.
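The per-session queue can be sketched as a promise chain keyed by session. This is a minimal illustration of the pattern, not OpenClaw's implementation — the real enqueueSession/enqueueGlobal helpers handle lanes and cleanup that this omits.

```typescript
// One pending-work tail per session key.
const sessionTails = new Map<string, Promise<unknown>>();

function enqueueSession<T>(sessionKey: string, task: () => Promise<T>): Promise<T> {
  const tail = sessionTails.get(sessionKey) ?? Promise.resolve();
  // Chain the new task behind whatever is already queued; swallow the prior
  // task's error so one failed run doesn't poison the lane for later runs.
  const next = tail.catch(() => undefined).then(task);
  sessionTails.set(sessionKey, next);
  return next;
}
```

Note the caller gets a promise back immediately; nothing blocks. Two tasks enqueued for the same key run strictly in order, while tasks for different keys run concurrently.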
The Retry Loop
The core of run.ts is a while(true) with three retry paths:
const MAX_OVERFLOW_COMPACTION_ATTEMPTS = 3;
let overflowCompactionAttempts = 0;
while (true) {
attemptedThinking.add(thinkLevel);
const attempt = await runEmbeddedAttempt({
sessionId: params.sessionId,
onPartialReply: params.onPartialReply,
onBlockReply: params.onBlockReply,
onToolResult: params.onToolResult,
onAgentEvent: params.onAgentEvent,
// ... dozens of parameters
});
// Path 1: Context overflow → compact → retry
if (contextOverflowError) {
const compactResult = await compactEmbeddedPiSessionDirect({
/* ... */
});
if (compactResult.compacted) {
overflowCompactionAttempts++;
continue;
}
}
// Path 2: Auth failure → rotate API key → retry
if (shouldRotate) {
const rotated = await advanceAuthProfile();
if (rotated) continue;
if (fallbackConfigured) {
throw new FailoverError(message, { reason, provider, model, profileId });
}
}
// Path 3: Thinking level unsupported → downgrade → retry
// (handled via attemptedThinking set)
// Success
const payloads = buildEmbeddedRunPayloads({
/* ... */
});
return {
payloads: payloads.length ? payloads : undefined,
meta: { durationMs: Date.now() - started, agentMeta, aborted },
didSendViaMessagingTool: attempt.didSendViaMessagingTool,
messagingToolSentTexts: attempt.messagingToolSentTexts,
};
}

Context overflow is the most common failure. Long conversations or large tool results push the transcript past the model's context window. The loop calls compactEmbeddedPiSessionDirect — which asks the model to summarize the conversation so far — then retries with the shorter transcript. Up to 3 attempts, because compaction itself can produce output that's still too long. This is why the heartbeat runner (Post #2) prunes no-op cycles — 48 empty heartbeats per day would accelerate context overflow significantly.
Auth rotation handles rate limits and billing failures. OpenClaw supports multiple API keys per provider. When one key hits a limit, advanceAuthProfile switches to the next. If all keys are exhausted and a fallback model is configured, it throws FailoverError — caught by the outer runWithModelFallback wrapper, which retries with the fallback provider.
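Key rotation can be pictured as a single scan over the configured profiles. This is a hypothetical sketch: advanceAuthProfile is named in the post, but the profile shape and cursor logic here are assumptions.

```typescript
type AuthProfile = { id: string; apiKey: string; exhausted: boolean };

// Returns the next usable profile after currentId, or undefined if every
// other key is exhausted (the cue to throw FailoverError).
function advanceAuthProfile(
  profiles: AuthProfile[],
  currentId: string,
): AuthProfile | undefined {
  const start = profiles.findIndex((p) => p.id === currentId);
  // Scan the ring of profiles once, skipping exhausted keys and the current one.
  for (let i = 1; i <= profiles.length; i++) {
    const candidate = profiles[(start + i) % profiles.length];
    if (!candidate.exhausted && candidate.id !== currentId) return candidate;
  }
  return undefined;
}
```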
Thinking level fallback tracks which thinking levels have been attempted. If a provider rejects "high" thinking, the loop downgrades to "medium" or "low" and retries — but never tries the same level twice.
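The downgrade logic might look like the sketch below. The post only confirms that attempted levels are tracked in a Set; the specific ladder of levels and the walk-down strategy are assumptions for illustration.

```typescript
type ThinkLevel = 'high' | 'medium' | 'low' | 'off';

// Assumed ordering from most to least thinking.
const LADDER: ThinkLevel[] = ['high', 'medium', 'low', 'off'];

// Walk down the ladder from the current level, skipping levels already tried.
// Returns undefined when nothing is left, so the provider error surfaces.
function nextThinkLevel(
  current: ThinkLevel,
  attempted: Set<ThinkLevel>,
): ThinkLevel | undefined {
  for (const level of LADDER.slice(LADDER.indexOf(current) + 1)) {
    if (!attempted.has(level)) return level;
  }
  return undefined;
}
```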
The alternative — propagating these errors to the caller — would mean every client needs retry logic for context overflow, rate limits, and provider quirks. Handling it inside the loop keeps the failure modes invisible to callers.
Event Bridging
The model runs inside pi-agent-core. OpenClaw clients consume a different event stream. subscribeEmbeddedPiSession bridges the gap:
export function createEmbeddedPiSessionEventHandler(ctx: EmbeddedPiSubscribeContext) {
return (evt: EmbeddedPiSubscribeEvent) => {
switch (evt.type) {
case 'message_start':
handleMessageStart(ctx, evt);
return;
case 'message_update':
handleMessageUpdate(ctx, evt);
return;
case 'message_end':
handleMessageEnd(ctx, evt);
return;
case 'tool_execution_start':
handleToolExecutionStart(ctx, evt);
return;
case 'tool_execution_end':
handleToolExecutionEnd(ctx, evt);
return;
case 'auto_compaction_start':
handleAutoCompactionStart(ctx);
return;
case 'auto_compaction_end':
handleAutoCompactionEnd(ctx, evt);
return;
case 'agent_end':
handleAgentEnd(ctx);
return;
}
};
}

Each handler translates a pi-agent-core event into zero or more OpenClaw emitAgentEvent calls. But translation isn't 1:1 — the bridge layer does real work:
Think-tag stripping — some models wrap reasoning in <think> or <thinking> tags. The bridge strips these before they reach the client, routing reasoning content to a separate stream.
Block chunking — long model responses get split into paragraph-sized chunks for incremental delivery. The client renders each chunk as it arrives instead of waiting for the full response.
Messaging tool deduplication — when the model sends a message via a tool call (e.g., sending a WhatsApp message), the bridge suppresses the duplicate text that would otherwise appear in the assistant's block reply. Without this, the user sees the same message twice.
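Two of those transforms can be shown on complete strings. The real bridge handlers are streaming and stateful; these function names and the exact regex are illustrative, not from the codebase.

```typescript
// Move <think>/<thinking> content into a separate reasoning channel,
// leaving only user-visible text behind.
function splitThinkTags(text: string): { reasoning: string; visible: string } {
  const parts: string[] = [];
  const visible = text.replace(
    /<think(?:ing)?>([\s\S]*?)<\/think(?:ing)?>/g,
    (_match, inner: string) => {
      parts.push(inner.trim());
      return '';
    },
  );
  return { reasoning: parts.join('\n'), visible: visible.trim() };
}

// Split a long response into paragraph-sized chunks for incremental delivery.
function chunkByParagraph(text: string): string[] {
  return text.split(/\n{2,}/).map((p) => p.trim()).filter(Boolean);
}
```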
Reply Shaping
After the loop exits, buildEmbeddedRunPayloads assembles the final output:
export type EmbeddedPiRunResult = {
payloads?: Array<{
text?: string;
mediaUrl?: string;
mediaUrls?: string[];
isError?: boolean;
}>;
meta: EmbeddedPiRunMeta;
didSendViaMessagingTool?: boolean;
messagingToolSentTexts?: string[];
};Two edge cases stand out.
NO_REPLY is a sentinel token the model outputs when it has nothing to say (common after tool-only turns). The payload builder filters it — callers never see it:
// NO_REPLY is treated as a silent token and filtered from outgoing payloads

Error fallback — if the model produced no renderable payloads and a tool errored during the run, the builder synthesizes an error payload. This prevents silent failures where the user sends a message and gets nothing back.
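Both behaviors fit in a few lines. A sketch, assuming a simplified payload shape and an invented fallback message; NO_REPLY is the sentinel the post names, everything else here is illustrative.

```typescript
type Payload = { text?: string; isError?: boolean };

function shapePayloads(raw: Payload[], toolErrored: boolean): Payload[] {
  // Drop the NO_REPLY sentinel so callers never see it.
  const visible = raw.filter((p) => (p.text ?? '').trim() !== 'NO_REPLY');
  if (visible.length === 0 && toolErrored) {
    // Synthesize an error payload rather than returning nothing at all.
    return [{ text: 'The run hit a tool error and produced no reply.', isError: true }];
  }
  return visible;
}
```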
Run Lifecycle
Active runs are tracked in a global Map, giving the rest of the system a handle to interact with in-flight runs:
const ACTIVE_EMBEDDED_RUNS = new Map<string, EmbeddedPiQueueHandle>();
type EmbeddedPiQueueHandle = {
queueMessage: (text: string) => Promise<void>;
isStreaming: () => boolean;
isCompacting: () => boolean;
abort: () => void;
};

queueMessage lets external systems inject follow-up messages into an active run without starting a new one. abort triggers the abort signal passed to pi-agent-core, which stops the model mid-stream. isCompacting gates message injection — you don't want to queue a message while the session is being compacted, because the compaction would immediately invalidate it.
This is how the /stop command works: the gateway calls abortEmbeddedPiRun(sessionId), which sets the abort signal. The pi-agent-core runtime checks this signal between tool calls and stops the loop.
What Survives a Restart
Everything above — active runs, session queues, dedupe caches, event sequences — lives in process memory. Kill the process and it's gone. So what does survive?
Session transcripts. Each conversation is an append-only JSONL file managed by pi-agent-core's SessionManager:
_persist(entry: SessionEntry): void {
if (!this.flushed) {
// First flush: bulk write header + user + assistant
for (const e of this.fileEntries) {
appendFileSync(this.sessionFile, `${JSON.stringify(e)}\n`);
}
this.flushed = true;
} else {
// Subsequent: append one line per message
appendFileSync(this.sessionFile, `${JSON.stringify(entry)}\n`);
}
}

Two details matter. First, nothing is written until the first assistant message arrives — if the process dies between receiving a user message and the model's first response, the user message is lost from the file. Second, appendFileSync is synchronous and writes one line at a time, so a crash mid-write loses at most one partial JSONL line. On restart, the loader skips malformed lines.
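The tolerant-loading behavior is simple enough to sketch: parse line by line and drop anything that isn't valid JSON. The function name is illustrative.

```typescript
// Load JSONL content, skipping malformed lines (e.g. a partial line left
// behind by a crash mid-append).
function loadJsonlEntries(content: string): unknown[] {
  const entries: unknown[] = [];
  for (const line of content.split('\n')) {
    if (!line.trim()) continue; // ignore blank lines
    try {
      entries.push(JSON.parse(line));
    } catch {
      // Malformed line — almost certainly a truncated write; drop it.
    }
  }
  return entries;
}
```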
OpenClaw adds a repair layer on top:
const cleaned = entries.map((entry) => JSON.stringify(entry)).join('\n') + '\n';
const backupPath = `${sessionFile}.bak-${process.pid}-${Date.now()}`;
const tmpPath = `${sessionFile}.repair-${process.pid}-${Date.now()}.tmp`;
await fs.writeFile(backupPath, content, 'utf-8');
await fs.writeFile(tmpPath, cleaned, 'utf-8');
await fs.rename(tmpPath, sessionFile);

Backup the corrupt file, write a cleaned version to a temp path, atomic rename into place. The original is never modified in-place.
Session write locks also persist — they're lock files with a PID. Stale locks (from a crashed process) are auto-detected after 30 minutes and reclaimed.
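A plausible staleness check, combining the post's two facts (lock files carry a PID; locks are reclaimed after 30 minutes) with an assumed liveness probe — the actual reclaim logic may differ.

```typescript
const STALE_LOCK_MS = 30 * 60 * 1000; // 30-minute threshold from the post

function isLockStale(
  lock: { pid: number; acquiredAt: number },
  now = Date.now(),
): boolean {
  if (now - lock.acquiredAt > STALE_LOCK_MS) return true; // held too long
  try {
    process.kill(lock.pid, 0); // signal 0: existence check, sends no signal
    return false; // owning process is still alive
  } catch {
    return true; // owner is gone (crashed) — safe to reclaim
  }
}
```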
Everything else is ephemeral:
| Component | Persisted? | On restart |
|---|---|---|
| Session transcripts | JSONL files | Lose at most one partial line |
| Active runs | In-memory Map | All in-flight runs lost |
| Session queues | In-memory Map | Queued messages lost |
| Dedupe caches | In-memory Map | Duplicates possible for ~20 min |
| Event sequences | In-memory Map | Counters reset; SSE clients see gaps |
| Write locks | Lock files | Stale locks auto-reclaimed |
The architecture is session-file-durable but run-ephemeral. Conversation history survives crashes. In-flight execution doesn't.
This is a deliberate tradeoff. Persisting run state would mean a write-ahead log, checkpoint recovery, and replaying partially-executed tool calls — complexity that buys little when the failure mode is "user sends the message again." The session file gives the model enough context to pick up where it left off. The run itself is disposable.
Before restarting, the gateway tries to drain gracefully:
export function deferGatewayRestartUntilIdle(opts: {
getPendingCount: () => number;
pollMs?: number; // default 500ms
maxWaitMs?: number; // default 30 seconds
}): void;

It polls pending work every 500ms and delays the restart signal until idle — or until the 30-second timeout expires, whichever comes first. Config hot-reload (Post #1) triggers this path, so most restarts wait for active runs to finish.
Takeaways
- Fire-and-forget with idempotency — return immediately, stream results async, deduplicate by idempotency key. Clients don't need to hold connections open for the full run.
- Serialize with queues, not locks — enqueueSession serializes work per session without blocking callers. The queue makes ordering explicit and prevents state corruption from concurrent runs.
- Retry inside the loop, not outside — context overflow, rate limits, and thinking level mismatches are handled by the loop itself. Callers don't need to know these failure modes exist.
- Bridge events at the boundary — subscribeEmbeddedPiSession is a translation layer that strips, chunks, and deduplicates. Keeping this at the boundary means neither side needs to know about the other's event format.
- Track active runs as handles — a Map<sessionId, handle> gives the system abort, queue, and status capabilities without coupling to the run's internals.
- Persist conversations, not runs — session transcripts survive crashes via append-only JSONL. Run state is ephemeral. Recovering a run would require checkpointing and tool replay; recovering a conversation just means re-opening the file.
Next in the series: how OpenClaw assembles system prompts from workspace config, skills, context files, and runtime info — the "brain loading" phase before the first model call.