Inside Agent Orchestration: Breaking Down Claude Code’s System Design

Summary
The coordination system implemented in the public mirror of the Claude Code source is real and non-trivial, but the main finding is that its “magic” is not a hidden new search algorithm or a deeply novel multi-agent planner. The strongest coordination logic is a hybrid design: a centralized coordinator policy expressed in a long system prompt, plus deterministic runtime machinery for spawning workers, queueing messages, persisting task state, resuming agents, compacting context, and routing tool output back into the main conversation. In other words, the repository’s advantage comes from prompted decomposition + robust execution plumbing, not from a secret scheduler that replaces the model.
There are really two orchestration layers in the inspected code. The first is the coordinator/subagent path: AgentTool launches local agents, runAgent() drives each worker through the same query loop, LocalAgentTask tracks background state, and workers report back through structured <task-notification> messages. The second is an experimental “swarm/teammate” layer: TeamCreateTool, SendMessageTool, TeamDeleteTool, and InProcessTeammateTask enable named teammates, mailbox-based messaging, persistent team files, and in-process or pane-backed teammates. That second layer is closer to a persistent multi-agent runtime than the simpler coordinator/worker path.
The most important practical insight for coding efficiency is this: Claude Code’s architecture is optimized to lower the coordination tax that usually kills multi-agent coding systems. Subagents get isolated context windows, the coordinator is instructed to parallelize read-only work and serialize conflicting writes, stopped workers can be resumed from transcript, mid-flight messages can be queued for delivery at the next tool round, and the main loop aggressively manages transcript persistence, compaction, and budget/turn limits. That combination is exactly the kind of engineering that makes the same underlying model behave much more effectively in coding workflows.
My bottom-line judgment is that the repository represents a strong production-grade implementation of centralized prompt-driven orchestration, with some experimental movement toward more actor-like agent teams. It is ahead of naïve “call one model repeatedly” assistants, but it is not evidence of a fundamentally new planning algorithm on the level of a new academic orchestration paradigm. The novelty is in the integration quality: prompt design, task runtime, recovery, context hygiene, tool governance, and developer ergonomics.
Assumptions and scope
This report is based on the public mirror on GitHub, the files reachable through that mirror, and public documentation from Anthropic. I did not execute the code, build the repo, or verify runtime-only branches behind feature flags and environment variables. I therefore treat runtime backend details, exact model selection, and server-side behavior as partially unknown unless directly stated in the source comments.
The current analysis is strongest on the modules that visibly govern orchestration and turn-taking: coordinatorMode.ts, AgentTool/*, SendMessageTool/*, TaskStopTool/*, LocalAgentTask/*, InProcessTeammateTask/*, QueryEngine.ts, query.ts, Task.ts, and the team tools. I did not perform an exhaustive commit-by-commit or fork-by-fork differential audit, so where I discuss the “implemented state of the art,” I mean the architecture visible in the currently accessible files, with one narrow use of the official issue tracker solely to note that agent-team features still had open rough edges in April 2026.
The key unknowns that remain are the exact runtime environment, the exact model APIs and server-side policy enforcement, the full behavior of compile-time feature flags, and any internal services not represented in the inspected source. Those unknowns matter because several comments explicitly reference feature gates, server-side task-budget handling, and environment-driven behavior.
Repository map and the modules that actually coordinate work
The repository structure itself already tells a clear story. The src/ tree contains explicit coordination-related directories for coordinator, tasks, tools, state, and the top-level QueryEngine.ts and query.ts, which strongly suggests that orchestration is a first-class subsystem, not a thin wrapper around a single LLM call.
The highest-leverage files are these:
- src/coordinator/coordinatorMode.ts: this is the coordinator’s policy brain. It defines the phase structure, the concurrency rules, the “always synthesize” rule, the continue-versus-spawn heuristic, and the canonical <task-notification> protocol used to return worker results. This file is extremely important because it shows that much of the orchestration policy lives in prompt text rather than in hard-coded scheduling logic.
- src/tools/AgentTool/AgentTool.tsx: this is the launcher. It decides whether agent execution should be backgrounded, assembles the worker tool pool, supports worktree and team-aware execution modes, and passes control into runAgent().
- src/tools/AgentTool/runAgent.ts: this is the worker runtime. It resolves prompts and tools, chooses abort-controller behavior, runs subagent start hooks, feeds the worker through the query loop, records a sidechain transcript, forwards progress, and cleans up agent-local state.
- src/QueryEngine.ts and src/query.ts: together these implement the reusable turn loop. QueryEngine.submitMessage() constructs context, persists transcripts, invokes query(), tracks usage and denials, handles compact boundaries, and emits budget/turn-limit results. query.ts is the lower-level streaming loop with cross-iteration state, max-turn handling, and recovery logic.
- src/tasks/LocalAgentTask/LocalAgentTask.tsx: this is the task-state layer for background workers. It defines task state, pending-message queues, progress updates, summary updates, task completion/failure transitions, and the function that turns task completion into a structured <task-notification> message sent back to the main session.
- src/tools/SendMessageTool/SendMessageTool.ts: this is the continuation channel. It can queue a plain-text message for a running worker so the worker receives it “at its next tool round,” or it can auto-resume a stopped worker from its transcript if needed. It also doubles as the mailbox transport for teammate swarms and contains explicit safety checks for bridge-style cross-session messaging.
- src/tools/TaskStopTool/TaskStopTool.ts and src/tasks/stopTask.ts: these provide controlled termination of background tasks via a task ID, including the special handling that suppresses noisy shell-stop notifications while preserving semantically useful agent-stop notifications.
- src/tools/TeamCreateTool/TeamCreateTool.ts, src/tools/TeamDeleteTool/TeamDeleteTool.ts, and src/tasks/InProcessTeammateTask/InProcessTeammateTask.tsx: these form the more advanced swarm layer. TeamCreateTool creates persistent team files and corresponding task-list directories, SendMessageTool routes mailbox messages, InProcessTeammateTask manages queued teammate messages and shutdown requests, and TeamDeleteTool refuses cleanup while active members remain.
- src/Task.ts: this is the common task abstraction, with explicit task types such as local_agent, remote_agent, and in_process_teammate, plus task-ID generation and terminal-state semantics.

The most revealing architectural fact is that there is no single “orchestration algorithm” file that performs hard scheduling, graph search, auctioning, bandit allocation, or learned arbitration. Instead, the orchestration layer is distributed across prompt policy, task runtime, and message persistence. That is precisely why the system feels more like “well-instrumented agent operations” than “a new theorem in planning.”
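To make the task-runtime half of that split concrete, the shared abstraction in src/Task.ts can be pictured roughly as follows. This is a hypothetical sketch: only the three task types and the terminal-state idea come from the source; every field name is an assumption.

```typescript
// Hypothetical sketch of the shared task abstraction in src/Task.ts.
// Only the three task types and the terminal-state idea come from the
// source; the field names here are illustrative.
type TaskType = "local_agent" | "remote_agent" | "in_process_teammate";
type TaskStatus = "pending" | "running" | "stopped" | "completed" | "failed";

interface Task {
  id: string;                // deterministic task ID
  type: TaskType;
  status: TaskStatus;
  summary?: string;          // rolling progress summary
  pendingMessages: string[]; // steering messages queued for the next tool round
}

// Terminal-state semantics: no further transitions once completed or failed.
const isTerminal = (t: Task): boolean =>
  t.status === "completed" || t.status === "failed";
```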
Control flow, message passing, and turn management
At the top level, QueryEngine.submitMessage() creates a turn, loads system-prompt parts, injects user context, builds a ProcessUserInputContext, persists the accepted user message to transcript before the model turn begins, and then enters the streaming query() loop. As events come back, it updates mutableMessages, persists assistant and progress records, handles compact boundaries, tracks cost/usage, and emits terminal results if max turns or budget are exceeded. That means the engine is not just “call model, show output”; it is a stateful event processor for long-lived coding sessions.
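In skeletal form, that loop looks roughly like the sketch below. Session, the event shape, and the query() signature are stand-ins for the real engine internals, not its API.

```typescript
// Skeletal reconstruction of the persist-early turn loop; all names
// here are stand-ins, not QueryEngine's real API.
interface Session {
  transcript: { append(msg: unknown): Promise<void> };
  mutableMessages: unknown[];
  turns: number;
  maxTurns: number;
  costUsd: number;
  maxBudgetUsd: number;
}

type QueryEvent =
  | { kind: "assistant" | "progress"; message: unknown }
  | { kind: "compact_boundary" };

async function submitMessageSketch(
  prompt: string,
  session: Session,
  query: (s: Session) => AsyncIterable<QueryEvent>,
) {
  // Persist the accepted user message BEFORE the model turn begins,
  // so an interrupted session can still resume from transcript.
  await session.transcript.append({ role: "user", content: prompt });

  for await (const event of query(session)) {
    if (event.kind === "assistant" || event.kind === "progress") {
      session.mutableMessages.push(event.message);
      await session.transcript.append(event.message); // incremental persistence
    }
    if (session.turns >= session.maxTurns || session.costUsd >= session.maxBudgetUsd) {
      return { terminal: "limit_reached" as const };
    }
  }
  return { terminal: "complete" as const };
}
```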
The coordinator path then delegates through AgentTool. AgentTool computes whether a subagent should run asynchronously, builds a separate worker tool pool, optionally applies worktree isolation, and hands execution to runAgent(). The async decision is not just user-controlled; it can be forced by coordinator mode, background defaults on the agent definition, or other flags. This matters because it means the runtime was intentionally designed to treat delegation as a first-class concurrency primitive, not as a one-off plugin.
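The multi-source nature of that decision is easy to sketch. The flag names below are assumptions standing in for whatever AgentTool actually reads:

```typescript
// Hypothetical reconstruction of AgentTool's backgrounding decision;
// the flag names are assumptions, not the tool's real fields.
interface SpawnRequest {
  userRequestedBackground?: boolean;   // explicit user choice
  agentDefaultsToBackground?: boolean; // default on the agent definition
  coordinatorModeActive?: boolean;     // coordinator mode can force async
}

function shouldRunInBackground(req: SpawnRequest): boolean {
  // Any single signal is enough to make delegation asynchronous.
  return Boolean(
    req.userRequestedBackground ||
      req.agentDefaultsToBackground ||
      req.coordinatorModeActive,
  );
}
```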
Inside runAgent(), the worker receives its own prompt messages, tools, hooks, model selection, abort controller, and sidechain transcript. The worker then runs through the same query() loop as the parent, records each new recordable message incrementally, forwards progress, and preserves parent-chain continuity through recorded UUID linkage. That design gives each worker its own conversational execution trace while keeping the parent session resumable and inspectable.
When a worker completes, LocalAgentTask.enqueueAgentNotification() constructs a <task-notification> payload containing the task ID, status, output-file path, summary, optional result, usage, and worktree information, then places that message into the pending notification queue. The coordinator prompt explicitly teaches the model to interpret that notification as the event that closes the fan-out and begins synthesis or correction. This is the pivotal fan-in mechanism.
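A hedged sketch of building that payload (the field list follows the prose above; the exact XML shape and function signature in LocalAgentTask may differ):

```typescript
// Illustrative construction of a <task-notification> payload; this is
// not a copy of LocalAgentTask.enqueueAgentNotification().
interface TaskResult {
  taskId: string;
  status: "completed" | "failed";
  outputFile: string;
  summary: string;
  result?: string;
  worktree?: string;
}

function renderTaskNotification(r: TaskResult): string {
  const lines = [
    `<task-notification>`,
    `  <task-id>${r.taskId}</task-id>`,
    `  <status>${r.status}</status>`,
    `  <output-file>${r.outputFile}</output-file>`,
    `  <summary>${r.summary}</summary>`,
    r.result ? `  <result>${r.result}</result>` : "",
    r.worktree ? `  <worktree>${r.worktree}</worktree>` : "",
    `</task-notification>`,
  ];
  return lines.filter(Boolean).join("\n");
}
```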
Message passing back into a running worker is handled by SendMessageTool. If the named recipient maps to a running local agent task, the message is not injected immediately; instead it is appended to pendingMessages and delivered “at its next tool round.” If the task is stopped, the tool attempts to resume it from transcript. That is a subtle but important design choice: it avoids interrupting a worker mid-tool-call while still enabling asynchronous steering and recovery.
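In sketch form, the queue-or-resume branch reduces to something like this; the helper names and task shape are assumptions:

```typescript
// Sketch of SendMessageTool's local-agent branch; the helper names and
// shapes below are assumptions, not the tool's real internals.
interface WorkerTask {
  status: "running" | "stopped";
  pendingMessages: string[];
}

declare function resumeFromTranscript(task: WorkerTask): Promise<void>;

async function deliverToWorker(
  task: WorkerTask | undefined,
  text: string,
): Promise<"queued" | "resumed"> {
  if (!task) throw new Error("no matching local agent task");

  if (task.status === "running") {
    // Never interrupt a worker mid-tool-call: the message waits in
    // pendingMessages until the worker's next tool round.
    task.pendingMessages.push(text);
    return "queued";
  }
  // Stopped worker: rebuild it from its persisted transcript, then queue.
  await resumeFromTranscript(task);
  task.pendingMessages.push(text);
  return "resumed";
}
```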
The following diagram captures the main coordinator/worker path:
```mermaid
sequenceDiagram
    participant U as User
    participant Q as QueryEngine/query
    participant C as Coordinator model
    participant A as AgentTool
    participant T as LocalAgentTask
    participant W as Worker runAgent/query
    participant N as Notification queue
    U->>Q: submitMessage(prompt)
    Q->>C: main turn with messages + system prompt
    C->>A: spawn worker(s)
    A->>T: register async task(s)
    T->>W: run worker with isolated context/tools
    W-->>T: progress / sidechain transcript / result
    T->>N: enqueue <task-notification>
    N->>Q: synthetic user-visible notification
    Q->>C: next coordinator turn with findings
    C->>A: continue worker OR spawn fresh verifier
```
That sequence is directly supported by the task-notification prompt contract, async task registration, sidechain transcript handling, and the queue/resume logic.
The teammate/swarm layer adds another path. TeamCreateTool creates a persistent team file, assigns a deterministic lead ID, initializes a task-list directory, and stores team context in app state. SendMessageTool then routes plain-text or structured messages to named teammates through mailbox writes, and InProcessTeammateTask stores pending teammate messages and shutdown requests while retaining task-local conversation history. TeamDeleteTool refuses cleanup if non-lead members are still active. This is much closer to a classical multi-agent runtime with durable identity and inboxes.
```mermaid
flowchart LR
    Lead["Team lead session"] --> TC["TeamCreateTool"]
    TC --> TF["Team file + task-list dir"]
    TF --> TM["Named teammates"]
    Lead --> SM["SendMessageTool"]
    SM --> MB["Mailbox / pending queue"]
    MB --> TP["InProcessTeammateTask or pane-backed teammate"]
    TP --> MB
    TP --> Lead
    Lead --> TD["TeamDeleteTool"]
    TD -->|only if no active members| Cleanup["Directory + context cleanup"]
```
The teammate layer is architecturally significant because it moves from “ephemeral subagent helper” to “persistent peer with identity, inbox, and lifecycle.”
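A minimal sketch of the file-based mailbox idea follows. The directory layout, filenames, and JSON shape are assumptions; the repository's actual team-file format is not reproduced here.

```typescript
import { promises as fs } from "node:fs";
import * as path from "node:path";

// Hypothetical mailbox write in the spirit of the teammate layer;
// the real team-file layout is not copied here.
async function writeMailboxMessage(
  teamDir: string,
  recipient: string,
  from: string,
  body: string,
): Promise<void> {
  const inbox = path.join(teamDir, "mailboxes", recipient);
  await fs.mkdir(inbox, { recursive: true });
  // One file per message keeps writes atomic-ish and ordering inspectable.
  const file = path.join(inbox, `${Date.now()}-${from}.json`);
  await fs.writeFile(file, JSON.stringify({ from, body, ts: Date.now() }));
}
```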
A concise pseudocode sketch of the effective orchestration logic looks like this:
```
accept user turn
persist transcript early
build system/user context
while turn not terminal:
    run model/query iteration
    if assistant requests tools:
        execute tools
        if AgentTool async: register worker task
        if SendMessage to running worker: queue for next tool round
        if SendMessage to stopped worker: resume from transcript
    record progress and assistant output
    if task notification arrives:
        coordinator synthesizes finding
        choose continue(existing worker) or spawn(fresh worker)
    if compact/budget/maxTurns condition:
        compact or terminate
return result
```
This pseudocode is not a copy of one function; it is a synthesis of submitMessage(), query(), AgentTool, runAgent(), LocalAgentTask, and SendMessageTool.
Extracted algorithms, patterns, and tradeoffs
The repository’s dominant coordination pattern is master-worker orchestration with an LLM-controlled dispatcher. The coordinator prompt explicitly defines phases — research, synthesis, implementation, verification — and tells the coordinator when to parallelize and when not to. Read-only work should fan out aggressively; conflicting writes should not. This is a real concurrency policy, but it lives in prompt instructions rather than in a deterministic scheduler.
A second major pattern is event-driven fan-out/fan-in. Workers run asynchronously, and completion is reified as structured task notifications that re-enter the main conversation as events. That turns worker finish states into new turn-taking stimuli for the coordinator. The design is cleaner than ad hoc polling because the notification format carries task ID, status, summary, result, usage, and worktree metadata.
The implementation is also strongly planner-executor-verifier in spirit. The coordinator prompt tells the model to research first, then synthesize a concrete spec, then choose whether to continue the same worker or spawn a clean one, and finally use a fresh verification worker when appropriate. That is not merely “ask another agent”; it is a worked-out decomposition rule set meant to reduce context pollution and anchoring errors.
There is also a continuation-versus-fresh-spawn arbitration heuristic. The prompt explicitly says to continue the same worker when existing context is helpful, but to spawn fresh when a broad research context would pollute a focused implementation or when verification should be independent. That heuristic is simple, but it is one of the most practically valuable ideas in the whole system because it addresses one of the hardest problems in coding agents: when prior context helps and when it hurts.
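The source keeps this heuristic in prompt text; if one were to mechanize it, it might look like this sketch:

```typescript
// The repository encodes this rule in the coordinator prompt, not in
// code; this function is a hypothetical mechanization for illustration.
interface NextStep {
  kind: "research" | "implementation" | "verification";
  dependsOnWorkerContext: boolean; // would the worker's prior context help?
}

function continueOrSpawn(step: NextStep): "continue" | "spawn_fresh" {
  if (step.kind === "verification") return "spawn_fresh"; // verifier stays independent
  if (!step.dependsOnWorkerContext) return "spawn_fresh"; // avoid context pollution
  return "continue"; // existing context genuinely helps this step
}
```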
For state management, the code uses a layered approach: mutableMessages for live session state, transcript persistence for resumability, sidechain transcripts for subagents, explicit task state in app state, deterministic task IDs, and optional team files and task-list directories for swarm mode. The result is a coordination system that is resilient across interruptions and strong enough to support “resume stopped worker from transcript” behavior.
For context management, the design is unusually deliberate. Official docs explicitly recommend delegating research to subagents “to keep your main context clean,” and the code supports that with independent worker conversations, transcript compaction, snip replay, memory prefetch, and skill-discovery prefetch in the query loop. This is one of the clearest reasons the system can outperform otherwise similar front-end chat experiences built on the same model family.
Timeout, retry, and budget policy are present but not highly sophisticated in a scheduling-theory sense. query.ts defines a MAX_OUTPUT_TOKENS_RECOVERY_LIMIT of 3 for a specific recovery path, and QueryEngine enforces max-turn and max-budget termination while surfacing API-retry metadata through system messages. That is robust operational plumbing, but it is not a complex adaptive controller.
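A sketch of those guards, keeping the one constant the prose names and inventing the rest:

```typescript
// MAX_OUTPUT_TOKENS_RECOVERY_LIMIT is the constant named in query.ts
// per this analysis; the surrounding guard logic is an illustrative
// reconstruction, not the real code.
const MAX_OUTPUT_TOKENS_RECOVERY_LIMIT = 3;

interface LoopLimits {
  turns: number;
  maxTurns: number;
  costUsd: number;
  maxBudgetUsd: number;
}

// Termination is a blunt check on turns and spend, surfaced as a
// terminal result rather than an exception.
function shouldTerminate(s: LoopLimits): boolean {
  return s.turns >= s.maxTurns || s.costUsd >= s.maxBudgetUsd;
}

// Bounded retry for the specific max-output-tokens recovery path.
function canRecoverTruncatedOutput(attemptsSoFar: number): boolean {
  return attemptsSoFar < MAX_OUTPUT_TOKENS_RECOVERY_LIMIT;
}
```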
The repository also implements isolation at multiple levels. Async workers can get independent abort controllers; docs recommend parallel worktrees; and in-process teammates are described as using AsyncLocalStorage for isolation. Again, the point is not a mathematically new algorithm; it is reduced interference between concurrent coding threads.
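Node's built-in AsyncLocalStorage makes that isolation pattern straightforward. A minimal sketch, assuming each in-process teammate runs inside its own store as the docs describe; the context shape and names are illustrative.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Each in-process teammate gets its own implicit context without
// threading it through every call; concurrent teammates never see
// each other's store. Names here are assumptions.
interface TeammateContext {
  teammateId: string;
  teamName: string;
}

const teammateStore = new AsyncLocalStorage<TeammateContext>();

// Any code awaited inside runTeammate's callback sees its own context,
// even across async hops.
function currentTeammate(): TeammateContext | undefined {
  return teammateStore.getStore();
}

async function runTeammate(
  ctx: TeammateContext,
  work: () => Promise<void>,
): Promise<void> {
  await teammateStore.run(ctx, work);
}
```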
The table below compares the key runtime components and their tradeoffs.
| Component | Inputs → outputs | Coordination role | Latency / complexity tradeoff | Typical failure mode |
|---|---|---|---|---|
| `QueryEngine.submitMessage()` and `query()` | User prompt + session state → streamed events, messages, terminal result | Stateful turn loop, persistence, compaction, usage/budget accounting | Higher implementation complexity, but much lower session fragility and better resumability | Transcript inconsistency, compaction edge cases, max-turn/max-budget termination |
| `AgentTool` + `runAgent()` | Prompt + agent type + tool context → worker stream or async task handle | Worker launch, isolation, worktree/tool selection, progress forwarding | Added launch overhead and task bookkeeping, but enables parallel work and cleaner contexts | Wrong tool pool, bad isolation choice, worker reaches max turns |
| `LocalAgentTask` | Worker lifecycle events → state transitions, summaries, task notifications | Background worker state machine and fan-in bridge | Extra app-state machinery, but enables notifications, retention, UI/task visibility | Duplicate notifications, stale task state, task killed before useful result |
| `SendMessageTool` | Recipient + plain/structured message → queued delivery, mailbox write, or resume | Mid-flight steering, swarm messaging, cross-session bridge transport | Queued delivery avoids unsafe interruption, but introduces delayed response semantics | Recipient missing, no transcript to resume, bridge disconnected or permission denied |
| `TaskStopTool` + `stopTask()` | Task ID → stopped-task result | Safe cancellation and cleanup | Simple, low-latency stop path | Task not found, not running, unsupported type |
| `TeamCreateTool` / `InProcessTeammateTask` / `TeamDeleteTool` | Team name / mailbox messages / shutdown requests → persistent teammates and cleanup | Persistent swarm runtime with identity and mailboxes | More powerful than ephemeral subagents, but operationally heavier and more failure-prone | Active-member cleanup block, shutdown approval flow friction, stale team artifacts |
Three concrete examples make the coding advantage easier to see. First, on a large bug investigation, the coordinator can launch two read-only workers in parallel — one to inspect the failing code path and one to enumerate relevant tests — then synthesize only the returned findings into a precise implementation prompt. That saves both wall-clock time and main-context capacity.
Second, if an implementation worker is already deep in context and a new correction arrives, the system can queue a message for delivery at the next safe tool round instead of discarding the worker or forcing the coordinator to re-prompt from scratch. That is a very practical coding acceleration mechanism, especially during test-fix loops.
Third, if the worker dies or is stopped, the runtime can resume it from transcript. That is much closer to “persistent task execution” than to the disposable subagent calls seen in many simpler coding assistants. For long-running changes, that is a major ergonomic win.
Comparison to academic and industry patterns
The closest academic analogue for the inner loop is ReAct: the code repeatedly interleaves model reasoning and tool-mediated action, with decisions about what to do next informed by intermediate results. The presence of the iterative query() loop and tool execution path fits that paradigm well.
The coordinator prompt also resembles Plan-and-Solve prompting. Its explicit “research → synthesis → implementation → verification” flow is a stronger, coding-specific cousin of the broader idea that you should plan before you solve. Official Claude Code docs separately recommend plan mode before editing, which lines up with that design philosophy.
At the tool layer, the repository is conceptually close to Toolformer in that tool calls are central to performance, but it differs in an important way: there is no evidence in the inspected coordination files that the tool-calling policy is learned by a dedicated self-supervised training objective. Instead, the runtime exposes tools, schemas, and permissions, and the model decides tool use at inference time inside the prompt/runtime loop. That is a product-engineering approach rather than a novel learned orchestration algorithm.
The teammate/swarm subsystem pulls the architecture closer to AutoGen-style multi-agent conversation. Agents have names, identities, message channels, and lifecycles, and the system supports more persistent inter-agent exchange than the simpler subagent path. However, the inspected implementation is still more centralized and operationally opinionated than a general actor framework: the main coordinator remains central, and much of the handoff logic is still encoded in prompts and tool contracts rather than in a declarative actor graph. Official research from Microsoft on AutoGen emphasizes multi-agent conversation and, in the later v0.4 work, an actor model for orchestration; the Claude Code mirror looks like a partial move in that direction, not a fully actor-native runtime.
Compared with official Claude Code documentation, the inspected source makes the advantage more concrete. The docs say subagents keep the main context clean and that custom subagents are configured as Markdown files with their own prompts and tool permissions. The source shows the runtime mechanisms that make those claims operational: separate worker tool pools, isolated worker contexts, transcript sidechains, independent abort controllers, and stateful resume logic.
The most important negative finding is also useful: in the coordination-relevant files I reviewed, I did not find explicit implementations of auction-based task assignment, bandit-style worker selection, Monte Carlo tree search, quorum-based voting over competing patches, or a typed DAG executor in the core path. That absence is meaningful. It suggests the repository’s effectiveness comes primarily from careful workflow design, not from an advanced formal arbitration algorithm.
Novel or leaked techniques, security implications, and recommended improvements
The most interesting “leaked” techniques are not fundamentally new algorithms, but high-value implementation details that explain why the product can feel unusually effective. The strongest of these are: structured XML task notifications; strict coordinator instructions against lazy delegation; asynchronous background workers with progress/state summaries; queued mid-turn steering; automatic resume from transcript; sidechain transcripts for worker continuity; compaction/snipping to keep long sessions viable; and worktree-aware isolation. That combination is operationally sophisticated and directly relevant to coding performance.
From a security and privacy standpoint, the leak matters because it exposes internal coordination prompts, message formats, persistent team/task directory layout, and bridge-message behavior. That makes it easier for clone projects to replicate the architecture, but it also makes it easier for attackers or red-teamers to imitate expected control messages, probe workflow assumptions, or craft prompt-injection strategies that target known coordination contracts. The settings docs also show how subagents are stored and how sensitive-file denial rules are expected to work, which further clarifies the product’s filesystem trust boundaries.
One particularly important safety design appears in SendMessageTool: bridge-style cross-session messaging requires explicit user consent because the message arrives as a prompt in another Claude session, “possibly another machine,” via Anthropic’s servers. That is a concrete example of the code recognizing that cross-session agent handoff is not a routine local action but a higher-risk communication boundary.
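A consent gate of that kind is simple to sketch. Everything below is hypothetical apart from the requirement that the user approves before the message crosses the session boundary.

```typescript
// Hypothetical consent gate for bridge-style cross-session messages;
// the real SendMessageTool check is richer than this sketch.
interface BridgeMessage {
  toSessionId: string;
  body: string;
}

async function sendBridgeMessage(
  msg: BridgeMessage,
  askUser: (question: string) => Promise<boolean>,
  transport: (m: BridgeMessage) => Promise<void>,
): Promise<void> {
  const approved = await askUser(
    `Send this prompt to session ${msg.toSessionId} (possibly another machine)?`,
  );
  if (!approved) throw new Error("bridge message denied by user");
  await transport(msg); // only crosses the session boundary after consent
}
```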
The official issue tracker also suggests that agent-team features were still somewhat rough in practice. The public issues page showed open agent-teams-related items, including teammate permission-request handling and per-teammate effort-tier support. That does not undermine the architecture, but it does indicate that the more advanced swarm layer was still evolving and not yet fully smoothed into a stable product surface.
The best improvements would be fairly clear if one were building a next iteration. First, move some of the phase rules out of prompt text and into an explicit scheduler with conflict-aware file locking and typed “phase complete” transitions. Right now, the concurrency policy is smart, but mostly advisory. That makes the system flexible, yet it also means correctness depends heavily on model compliance. This recommendation is an inference from the current prompt-centric design.
Second, add a typed artifact channel instead of relying so heavily on plain text or XML-ish message blocks. Workers should be able to return structured findings, patch plans, candidate file sets, and verification results in schema-checked objects. The code already has examples of structured output handling and strongly typed task state, so the system is halfway there. This would reduce ambiguity and help prevent mis-synthesis by the coordinator.
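One way to realize that recommendation, sketched here with zod for illustration; the schema shape is invented, and the repository's actual schemas are not reproduced.

```typescript
import { z } from "zod";

// Hypothetical typed artifact channel: the coordinator would refuse to
// synthesize from any worker output that fails schema validation.
const WorkerFinding = z.object({
  taskId: z.string(),
  kind: z.enum(["research", "patch_plan", "verification"]),
  files: z.array(z.string()),       // candidate file set
  summary: z.string().max(2000),
  confidence: z.number().min(0).max(1),
});
type WorkerFinding = z.infer<typeof WorkerFinding>;

function parseFinding(raw: unknown): WorkerFinding {
  return WorkerFinding.parse(raw); // throws with a precise error on mismatch
}
```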
Third, introduce stronger verifier independence and, for high-stakes code changes, optional multi-verifier or verifier-plus-static-check pipelines. The prompt already recommends using a fresh verifier; formalizing that into a first-class runtime pattern would likely improve robustness further.
Fourth, in swarm mode, harden mailbox and artifact provenance. The visible design is powerful, but once coordination depends on task files, inboxes, and output paths, file integrity and spoof-resistance matter much more. Even if the runtime is local-first, typed signing or stronger origin metadata would reduce protocol abuse risk. That recommendation follows directly from the repository’s file-based coordination layout.
Open questions and limitations
The biggest unresolved question is how much of the perceived performance delta comes from runtime/server behavior not visible here. Some comments mention server-side task-budget semantics and several behaviors are gated behind features or environment variables. So this report can confidently describe the visible orchestration design, but it cannot prove that no additional hidden orchestration exists elsewhere.
A second limitation is that I did not complete a systematic audit of public commits, forks, and every open issue in the mirror. My conclusions therefore rest mainly on the currently visible source modules and official documentation, not on a longitudinal evolution analysis of the architecture.
A third limitation is that I cannot validate runtime ergonomics without execution. Some features that look elegant in source may be unstable, expensive, or rarely enabled in practice, especially in the more advanced swarm layer. The existence of open official issues around agent-team behavior supports treating that part of the design as promising but still somewhat experimental.
Key Findings
The public mirror shows a strong, real orchestration system, but its primary innovation is not a secret planning algorithm. The main advances are a prompt-defined coordinator policy, async worker tasks, task notifications, resumable subagent transcripts, context compaction, worktree isolation, and mailbox-style teammate messaging. That makes the same model materially better at coding because it can decompose work, preserve clean contexts, recover from interruptions, and keep long-running tasks operationally coherent. Architecturally, the repository is best described as centralized prompt-driven orchestration with a production-grade task runtime and an experimental step toward actor-like agent teams.