Architecture

Muster is a glass-box, multi-runtime, multi-domain agentic orchestrator. Supported native lanes include Claude Code, Codex, Kimi, and Cowork. The deterministic CLI makes no model calls and those native lanes need no Muster-specific model API. The private/local ChatGPT Work lane is the exception: Secure MCP Tunnel uses a separately billed OpenAI Platform API key and remains proof-gated. This page is the source-level map. For the gentler tour, start with Concepts.

Two layers

Muster is split into two layers with a hard boundary between them.

Layer	Lives in	Runtime	Talks to a model?
Deterministic CLI	`src/*.js`	Plain Node ESM	No
Model-facing	`plugin/` (commands, skills, agents)	Claude Code and Codex	Yes

The CLI layer is ordinary Node. The published package has two runtime dependencies, yaml and esbuild, requires Node 20 or newer, and normally makes no model calls. Direct CLI verbs need no precompile, while tests, publishing, and Codex plugin installation run the Codex build pipeline. Most verbs are fully local; five boundaries can use the network or the active model account:

issue shells out to gh issue view for an explicit GitHub issue reference.
vendor fetches sources declared with kind: github in vendor/manifest.yaml (local sources stay offline).
doctor uses gh api to verify vendor note SHAs against pinned refs. If that remote check is unavailable, it reports the check as skipped offline, while a missing, unreadable, or invalid local vendor manifest still fails health.
install kimi --probe (opt-in) performs an authenticated models request; omit --probe for an offline install.
codex-plan starts codex app-server --stdio and begins one model turn on the active Codex account.

The remaining deterministic verbs run without network access.

The model-facing layer is what the harness loads as a plugin — Claude Code's plugin format, and a built Codex plugin (skills and custom-agent profiles) generated from the same sources. It is markdown: slash commands, skills, and agents. These files instruct the model how to drive a run. They call the CLI for every deterministic decision, then use the harness's built-in subagent dispatch to do the judgment work. The split is deliberate. Routing, scoring, and validation are reproducible because code owns them. Drafting, reviewing, and classifying are the model's job.

Portable MCP surface

The MCP implementation has one neutral core in mcp/server.mjs and explicit host adapters: mcp/codex-server.mjs for the generated Codex bundle, mcp/chatgpt-work-server.mjs for the generated Work bundle, and cowork/mcp-server.mjs for Claude Cowork. cowork/chatgpt-work-server.mjs is the compatibility shim retained for older Work/source-checkout callers. Build/install no longer string-rewrite Cowork source to produce Codex or Work runtimes; each bundle is built from its adapter directly. The public Work plugin still exposes runtime/chatgpt-work-server.mjs as its server command.

The capability and domain router

The router is the novel core. The problem it solves: you have an outcome and a pile of tools (some you installed, some Muster ships), and you need to pick the right tool for each piece of work, predictably.

Muster names a fixed vocabulary of roles, the kinds of work a crew might need. There are 26 of them (src/roles.js): implement, code-review, test-author, debug, refactor, architecture-review, security-review, author, research, score, humanize, prompt-quality, improve, image, video, lifecycle, and more. Roles are the stable interface. Pipelines and commands ask for a role, not for a specific tool.

Each role resolves through a ladder of provider sources, best-available first:

An installed external provider (a plugin, agent, or MCP server you already have)
A Muster built-in agent
A Muster built-in skill
Inline (the model does it directly, with no specialist attached)

muster capabilities walks this ladder for every role. For each role you get chosen (the winning provider), chain (the full ordered fallback list, always ending in inline), recommendations (installable external providers that would beat the current fallback), and model. The resolution is a single deterministic pass over the catalog, sorted by rank (src/capabilities.js). Because the ladder always terminates at inline, every role resolves to something.
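The ladder walk can be pictured as a filter, a sort by rank, and a guaranteed inline tail. The following is a minimal sketch under assumptions, not the code in src/capabilities.js: the entry shape ({ role, source, name, installed }), the SOURCE_RANK table, and the resolveRole function are all hypothetical names chosen for illustration.

```javascript
// Hypothetical ladder order: lower rank wins. Labels are illustrative only.
const SOURCE_RANK = { external: 0, "builtin-agent": 1, "builtin-skill": 2, inline: 3 };

// Resolve one role against a catalog of provider entries.
// Assumed entry shape: { role, source, name, installed }.
function resolveRole(role, catalog) {
  const candidates = catalog.filter((p) => p.role === role);
  const usable = candidates
    // External providers only count when installed; built-ins always ship.
    .filter((p) => p.source !== "external" || p.installed)
    .sort((a, b) => SOURCE_RANK[a.source] - SOURCE_RANK[b.source]);
  // The ladder always terminates at inline, so every role resolves.
  const chain = [...usable, { role, source: "inline", name: "inline" }];
  const chosen = chain[0];
  // Uninstalled external providers that would outrank the current choice.
  const recommendations = candidates.filter(
    (p) =>
      p.source === "external" &&
      !p.installed &&
      SOURCE_RANK.external < SOURCE_RANK[chosen.source]
  );
  return { chosen, chain, recommendations };
}

const catalog = [
  { role: "code-review", source: "builtin-skill", name: "review-skill", installed: true },
  { role: "code-review", source: "external", name: "acme-reviewer", installed: false },
];
const result = resolveRole("code-review", catalog);
// chosen is review-skill; the chain ends at inline; acme-reviewer is recommended.
console.log(result.chosen.name, result.chain.map((p) => p.name));
```

Because inline is appended unconditionally, a role with no catalog entries still returns a one-element chain rather than an error.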

The role enum is fixed, but the set of providers is not, and that creates a reach problem: some specialists do not map cleanly onto a named role. The escape hatch is description-search. muster match "<task>" is a deterministic token-overlap ranker (src/match.js, no LLM call). It tokenizes the task, builds a weighted bag of searchable tokens for every catalog provider, scores each by overlap, and boosts installed providers so a present tool edges out an equal-scoring fallback.
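A token-overlap ranker of this kind can be sketched in a few lines. This is an illustration, not src/match.js: the weights, the size of the installed boost, and the provider shape ({ name, description, installed }) are assumptions.

```javascript
// Lowercase, split on non-alphanumerics, drop empties.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Assumed weights: name tokens count more than description tokens.
const NAME_WEIGHT = 3;
const DESC_WEIGHT = 1;
const INSTALLED_BOOST = 0.5; // assumed: small enough to only break ties

function scoreProvider(taskTokens, provider) {
  // Weighted bag of searchable tokens for this provider.
  const bag = new Map();
  for (const t of tokenize(provider.name)) bag.set(t, (bag.get(t) || 0) + NAME_WEIGHT);
  for (const t of tokenize(provider.description)) bag.set(t, (bag.get(t) || 0) + DESC_WEIGHT);
  let score = 0;
  for (const t of new Set(taskTokens)) score += bag.get(t) || 0;
  // Boost only providers that matched at all, so an installed non-match stays at 0.
  if (score > 0 && provider.installed) score += INSTALLED_BOOST;
  return score;
}

function match(task, providers) {
  const taskTokens = tokenize(task);
  return providers
    .map((p) => ({ name: p.name, score: scoreProvider(taskTokens, p) }))
    .filter((r) => r.score > 0)
    .sort((a, b) => b.score - a.score || a.name.localeCompare(b.name));
}

const ranked = match("profile slow sql queries", [
  { name: "query-tuner", description: "profile and tune slow sql queries", installed: false },
  { name: "db-helper", description: "sql profile tool for slow queries", installed: true },
  { name: "image-gen", description: "generate images", installed: false },
]);
// Both database providers overlap on four tokens; the installed one ranks first.
console.log(ranked);
```

The final sort falls back to name order so that equal scores still produce a stable, reproducible ranking.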

Per-role model selection

Each resolved role carries a model, picked to fit the work (src/model.js):

Tier	Roles	Why
scout	`code-navigation`, `docs-research`, `research`	Mechanical locating, gathering, and scanning
core	everything else (the default)	Implementation, review, authoring, and scoring
prime	fallback for apex	Frontier judgment when apex is disabled
apex	the tournament `judge`, `architecture-review`, `improve`, `advisor`	Rare peak judgment

The conceptual tier comes back as roles[<role>].model from muster capabilities, with the harness-concrete dispatch value for that lane beside it: claudeModel (plus claudeProfile for an agent-backed role) on the Claude Code and Cowork lanes, codexModel or kimiModel under their flags. Where a role's chosen agent has a profile in catalog/agents.manifest.json, that profile is the authoritative pin and claudeModel equals its claudeProfile.model; the role's own tier is the fallback for roles with no agent-backed profile. The orchestrator dispatches on the concrete value, never the raw tier. Set MUSTER_MAX_TIER=core for budget mode or MUSTER_MAX_TIER=prime to exclude apex. Apex is disabled by default; modelForRole degrades it to prime deterministically unless MUSTER_ENABLE_APEX=1 is set.

Compatibility only: legacy input aliases haiku, sonnet, opus, and fable normalize to scout, core, prime, and apex; they are not a second conceptual ladder. MUSTER_ENABLE_FABLE remains an alias for MUSTER_ENABLE_APEX.
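The degradation and capping rules above can be sketched as one pure function. This is not modelForRole itself: the function names, the order in which the two rules are applied, the assumption that MUSTER_MAX_TIER also accepts legacy aliases, and the assumption that MUSTER_ENABLE_FABLE uses the same "1" value are all illustrative.

```javascript
const TIERS = ["scout", "core", "prime", "apex"]; // ascending capability
const LEGACY_ALIASES = { haiku: "scout", sonnet: "core", opus: "prime", fable: "apex" };

// Map a legacy alias onto its conceptual tier; reject anything else.
function normalizeTier(name) {
  const tier = Object.hasOwn(LEGACY_ALIASES, name) ? LEGACY_ALIASES[name] : name;
  if (!TIERS.includes(tier)) throw new Error(`unknown tier: ${name}`);
  return tier;
}

// Assumed order: apply the apex switch first, then the MUSTER_MAX_TIER cap.
function effectiveTier(requested, env = {}) {
  let tier = normalizeTier(requested);
  const apexEnabled = env.MUSTER_ENABLE_APEX === "1" || env.MUSTER_ENABLE_FABLE === "1";
  if (tier === "apex" && !apexEnabled) tier = "prime"; // apex is off by default
  if (env.MUSTER_MAX_TIER) {
    const cap = normalizeTier(env.MUSTER_MAX_TIER);
    if (TIERS.indexOf(tier) > TIERS.indexOf(cap)) tier = cap;
  }
  return tier;
}

console.log(effectiveTier("apex"));                                // prime
console.log(effectiveTier("apex", { MUSTER_ENABLE_APEX: "1" }));   // apex
console.log(effectiveTier("prime", { MUSTER_MAX_TIER: "core" }));  // core
```

Passing the environment as a parameter rather than reading process.env keeps the function deterministic and easy to test.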

Provider kinds

A provider resolves to one of four kinds, which decides how the orchestrator dispatches it:

agent: a subagent definition, dispatched by subagent_type.
skill: a markdown skill injected into a generic subagent.
mcp: an installed MCP server, surfaced as a tool.
inline: no specialist; the model does the work directly.

Dispatch honors chosen.kind, but the mechanism and the not-dispatchable contract differ by harness.

On Claude Code, an agent is dispatched by subagent_type and anything else gets a generic subagent with the relevant skill injected; the crew member's model is always passed as the Agent tool's model override. If the agent type is not dispatchable in the running session (a plugin installed mid-session), the orchestrator falls back to a generic subagent with the provider's brief injected and records the fallback in STATE — the model override still applies, so model selection survives the fallback (plugin/skills/orchestrator/SKILL.md).

On Codex, an agent routes through its installed Codex profile using agent_type and a bounded context fork. Because dispatch has no cwd field, worktree briefs carry absolute worktree, manifest, and STATE paths and require that worktree as every tool call's working directory. If a named agent profile is not dispatchable, the task fails closed with a reinstall/new-session diagnostic; it never silently falls back and loses the pinned role/model policy. Generic skill/MCP/inline dispatch inherits the parent model because Codex does not expose a per-call model override for that path.

Codex approve-first launch has two paths. From an interactive terminal, muster codex-plan owns a new local App Server connection, discovers the experimental Plan preset, combines its mask with the effective thread model, and starts the installed muster-plan skill using turn/start.collaborationMode. The effective thread/settings/updated receipt is the activation proof. The launcher deliberately omits approval-policy, approval-reviewer, permissions, and sandbox fields, relays structured user input to the terminal, and declines rather than accepts action requests. It cannot change an already-running chat. For existing sessions, non-interactive callers, or unavailable proof, /plan $muster-plan ... is the explicit fallback; no fallback response claims native activation. Both paths change only the authoring surface. Muster's deterministic scope, manifest validation, and approve/adjust/cancel gate remain authoritative.

Pipelines

A pipeline is a phased, gated recipe for producing one kind of artifact. Each declares a domain, an ordered list of phases (each phase names a role), an optional optional_phases list (run only when the outcome explicitly asks for it, e.g. a publish prep phase), and a gate (src/pipeline.js validates the shape). Routing is deterministic: muster route matches the outcome against each pipeline's match keywords on word boundaries and picks the earliest keyword hit position in the outcome text (ties break by longer phrase, then file order), falling back to the domain default when nothing matches.
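The routing rule (earliest hit, then longer phrase, then file order) can be sketched as follows. The pipeline shape ({ name, match }) and the route signature are assumptions for illustration; the actual implementation may differ.

```javascript
// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Assumed pipeline shape: { name, match: [keywords] }, listed in file order.
function route(outcome, pipelines, domainDefault) {
  const text = outcome.toLowerCase();
  let best = null;
  pipelines.forEach((pipeline, order) => {
    for (const keyword of pipeline.match) {
      // \b anchors the keyword on word boundaries, so "api" does not hit "apis".
      const re = new RegExp(`\\b${escapeRegExp(keyword.toLowerCase())}\\b`);
      const hit = re.exec(text);
      if (!hit) continue;
      const cand = { name: pipeline.name, pos: hit.index, len: keyword.length, order };
      const better =
        !best ||
        cand.pos < best.pos ||
        (cand.pos === best.pos && cand.len > best.len) ||
        (cand.pos === best.pos && cand.len === best.len && cand.order < best.order);
      if (better) best = cand;
    }
  });
  return best ? best.name : domainDefault;
}

const pipelines = [
  { name: "article", match: ["blog"] },
  { name: "blog-post", match: ["blog post"] },
  { name: "api", match: ["api"] },
];
console.log(route("Write a blog post about the API", pipelines, "default")); // blog-post
console.log(route("Document the API, then a blog", pipelines, "default"));   // api
console.log(route("Fix the login bug", pipelines, "default"));               // default
```

In the first call both "blog" and "blog post" hit at the same position, so the longer phrase wins even though "article" comes first in file order.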

Gating uses a floor principle (src/score.js): the weakest dimension must clear the gate's floor, and the total must clear pass_total. A strong average cannot rescue one weak dimension. The model only estimates the per-dimension scores; the code decides pass or fail.
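A small sketch makes the floor principle concrete. It assumes a gate object with floor and pass_total fields and at least one scored dimension; the function name and return shape are hypothetical, not taken from src/score.js.

```javascript
// Assumed shapes: scores = { dimension: number }, gate = { floor, pass_total }.
function applyGate(scores, { floor, pass_total }) {
  const entries = Object.entries(scores);
  const total = entries.reduce((sum, [, value]) => sum + value, 0);
  // Find the weakest dimension; it alone can fail the gate.
  const [weakest, min] = entries.reduce((a, b) => (b[1] < a[1] ? b : a));
  const pass = min >= floor && total >= pass_total;
  return { pass, total, weakest, min };
}

const gate = { floor: 6, pass_total: 20 };
// Total is 22, above pass_total, but depth is below the floor.
console.log(applyGate({ clarity: 9, accuracy: 9, depth: 4 }, gate));
// Raising depth to the floor lets the same artifact pass.
console.log(applyGate({ clarity: 9, accuracy: 9, depth: 6 }, gate));
```

The first call fails despite a total of 22, which is the point of the floor: a strong average does not compensate for one weak dimension.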

See Pipelines for the full set and the prioritization models.

Execution model

Model work uses the account or subscription of the active runtime. On Claude Code it goes through built-in subagent dispatch, not the Agent SDK. Codex, Kimi, and Cowork use their documented native or MCP dispatch lanes. The CLI itself makes no model calls. The practical consequences:

Muster draws the active runtime's normal quota. On Claude Code it does not hit a separate Agent-SDK credit pool.
Fan-out spends that same quota faster, since parallel subagents are parallel quota.
Native CLI/harness lanes have no separate Muster runtime to deploy or model key to manage. The private/local ChatGPT Work tunnel is a separately billed Platform-key exception.

Orchestration loops until done via a Ralph-style primitive (src/loop.js). Each wave re-runs implement, review, and fix until the gate passes or the iteration cap escalates, so subagents drive toward the success criteria rather than stopping after one pass. A pre-flight plan-conflict review runs before wave 1, the muster-strategist is dispatched for root-cause analysis when a review gate escalates (before any human prompt), and concurrent file-writing wave tasks each run in their own git worktree so parallel edits never collide.

Plan tasks may declare owns and frozen arrays. These are opaque path-label strings: shape-validated only, never glob-matched or overlap-checked. The orchestrator copies them verbatim into the dispatch brief as scope fences and dispatches same-wave tasks in parallel only when their owns sets are disjoint.

A manifest or task may also declare forbiddenActions (send/sign/submit/publish/purchase/delete-remote). The orchestrator writes the run's effective set to .muster/forbidden-actions at start, copies it into each brief, and removes the file just before the run's declared merge disposition executes.

Every dispatch brief also ends with a mandatory return contract: implementers return raw data (<=2000 chars), reviewers return a verdict first with <=1500 chars of findings, and the orchestrator reads each subagent result exactly once with no cross-wave accumulation.

Immediately after each wave commit, the orchestrator attaches a git notes --ref=muster record of the wave's intent (decisions, review cycles, findings fixed/accepted). The review gate reads it back to check the implementation against recorded intent, not just the diff against the spec. It also runs muster citation-check on research/content artifacts before dispatching reviewers, so a dangling citation travels in their briefs as a finding.

Before wave 1, go's pre-wave spec gate dispatches a fresh strategist-tier agent to probe the validated manifest and plan as a lazy-or-malicious implementer would. Rounds are capped at 3 (the initial round plus 2 amendments):

A round-1 FAIL always loops the findings back to the router (amendment 1).
A round-2 FAIL compares its itemized findings against round 1's. Any repeated or unresolved round-1 finding hard-aborts. If every round-2 finding is disjoint (genuinely new, not a restatement), the gate allows a second and final amendment (amendment 2) instead of escalating immediately. The disjointness call is recorded in STATE.
A round-3 FAIL always hard-aborts, regardless of disjointness.

The gate is skippable for trivial single-task plans. At finish, a manifest-declared mergeDisposition (merge-local/merge-push/pr/keep) executes without asking; an absent value or ask falls back to the interactive merge-decision prompt, and unattended (Routine) runs always downgrade merge-local/merge-push to pr rather than push to a base branch.
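The round logic of the pre-wave spec gate reduces to a small decision table. The sketch below assumes findings carry stable ids so repetition can be checked by equality; in Muster the disjointness call is a judgment recorded in STATE, so the id comparison and the function name are stand-ins for illustration only.

```javascript
// Decide what the spec gate does after a round.
// round: 1, 2, or 3. verdict: "PASS" or "FAIL".
// findings: ids from this round. round1Findings: ids from round 1 (assumed stable).
function specGateDecision(round, verdict, findings = [], round1Findings = []) {
  if (verdict === "PASS") return { action: "proceed" };
  if (round === 1) return { action: "amend", amendment: 1 };
  if (round === 2) {
    const seen = new Set(round1Findings);
    const repeated = findings.filter((id) => seen.has(id));
    return repeated.length > 0
      ? { action: "abort", reason: "repeated findings", repeated }
      : { action: "amend", amendment: 2, disjoint: true };
  }
  // Round 3 and beyond: the cap is reached, disjoint or not.
  return { action: "abort", reason: "round cap reached" };
}

console.log(specGateDecision(1, "FAIL", ["F1", "F2"]));
console.log(specGateDecision(2, "FAIL", ["F3"], ["F1", "F2"]));       // second amendment
console.log(specGateDecision(2, "FAIL", ["F2", "F3"], ["F1", "F2"])); // abort, F2 repeated
console.log(specGateDecision(3, "FAIL", ["F4"], ["F1", "F2"]));       // abort, cap reached
```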

After a run, the improve role (the read-only muster-improver agent) mines the run STATE, escalations, and review-gate fix-loops for recurring friction and proposes user-gated edits to Muster's own skills, agents, and rules. It proposes; it never applies, and never edits during a run.

Tournament synthesis is tunable via two env vars: MUSTER_FUSE_TOPK (default 3) caps the number of candidates passed to the synthesizer; MUSTER_FUSE_MIN_DISAGREEMENT (default 1) is the minimum disagreement score required to activate fusion -- below this threshold muster fuse falls back to the single best candidate.
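How the two knobs interact can be sketched as below. The disagreement score is taken as an input because the way it is computed is not described here; the candidate shape ({ id, score }), the function names, and the numeric parsing are assumptions.

```javascript
// Read a numeric env value, falling back when unset, empty, or not a number.
function readNumber(env, key, fallback) {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// candidates: [{ id, score }]; disagreement: a precomputed number (assumed input).
function planFusion(candidates, disagreement, env = {}) {
  const topK = readNumber(env, "MUSTER_FUSE_TOPK", 3);
  const minDisagreement = readNumber(env, "MUSTER_FUSE_MIN_DISAGREEMENT", 1);
  const ranked = [...candidates].sort((a, b) => b.score - a.score);
  // Below the threshold, fusion is skipped and the single best candidate wins.
  if (disagreement < minDisagreement) return { mode: "single", picks: ranked.slice(0, 1) };
  return { mode: "fuse", picks: ranked.slice(0, topK) };
}

const candidates = [
  { id: "b", score: 0.8 },
  { id: "a", score: 0.9 },
  { id: "d", score: 0.6 },
  { id: "c", score: 0.7 },
];
console.log(planFusion(candidates, 2));                            // fuse a, b, c
console.log(planFusion(candidates, 0));                            // single a
console.log(planFusion(candidates, 2, { MUSTER_FUSE_TOPK: "2" })); // fuse a, b
```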

The advisor role lets a lower-tier worker consult the apex tier, degrading to prime, at a hard decision point. The worker returns a structured advice-request, a consult budget (MUSTER_ADVISOR_MAX_CONSULTS, default 3) bounds cost, the consult is logged to STATE (glass box), and the advice is fed back so the worker keeps the decision. The advisor informs; the worker decides. The mechanism is native (Claude Code Agent-tool dispatch, no external server tools) and autonomous-first (no human prompt).

Driving Muster remotely uses Claude Code's own features, not a transport Muster ships. A Routine can fire /muster:go as a scheduled cloud run. Channels deliver steering events (approve, stop, status, retarget) to a running session. Remote Control hands phone or web access to a running local session.

Session hooks

Muster ships four plugin-native hooks in plugin/hooks/. All are declared in plugin/hooks/hooks.json, activate when Muster is enabled, and are removed when Muster is disabled. None write to your ~/.claude files. Every hook is fail-safe: any error returns a minimal valid result and exits cleanly.

Enforcement follows the run's EXTERNAL effects, not the orchestrator's own in-repo edits: the action-class fence below is the only DENY the PreToolUse hook can emit. A second, narrower hook-enforced block lives on TaskCompleted (below), gating the native task board's own completed tick instead of a tool call. Everything else is a single warn-only "border invitation" (guidance.js: CREW_INVITATION) that sells the value of a crew run -- parallel dispatch, adversarial review, a receipts trail -- rather than commanding, once per crossing. Review gates remain Muster's actual quality enforcement.

SessionStart (session-start.js) injects a one-line pointer ("muster available; /muster:plan for orchestration-scale work") at the start of every session. A Claude Code plugin cannot auto-load a CLAUDE.md, but a SessionStart hook can return additionalContext, which Claude Code prepends to the session. A fresh start clears per-session drift state. The repository-global .muster/wave-active and .muster/run-active markers use one-hour activity leases: fresh or unverifiable markers are preserved, only regular markers with expired leases are recovered, and PreToolUse renews live leases. A mid-session /compact or resume leaves all of that intact.

UserPromptSubmit (user-prompt-submit.js) fires the only prompt-time nudge: the isDirective-triggered border invitation. A prompt qualifies when it is directive-shaped (an imperative verb like fix/build/implement, optionally after a polite lead-in; "Update:"/"Fix for" declaratives and questions are excluded), lands with no active Muster run, and is corroborated by at least one distinct file already touched this crossing. A directive verb alone is opener detection, not scale, so "fix typo" never invites on its own. A qualifying prompt injects the value-toned invitation once per crossing. The hook then stays silent until re-armed by a Muster run starting, SessionStart, or 60 minutes of inactivity, and even then only once the shared invite cooldown (below) has cleared.

PreToolUse (pre-tool-use.js) first classifies every payload through the action-class fence (the only deny this hook can emit). A recognized forbidden external effect remains fenced even when the call is delegated or carries a .muster/.claude metadata path. After that check, subagent, metadata-path, and out-of-tree calls are exempt from the advisory inline-drift accounting; qualifying in-tree calls reach the warn-only border invitation and all others are allowed.

Action-class fence (action-guard.js): when both .muster/run-active and .muster/forbidden-actions exist, the hook classifies the tool call and denies a match against the run's declared forbiddenActions. Classification checks an mcp__-prefixed tool name against a send/submit/publish/sign/purchase keyword set (word-boundary matched), or a Bash command against a small high-confidence allowlist: git push, npm publish, gh release create, gh pr merge, curl -X POST. The fence honours MUSTER_ACTION_GUARD (off, warn, or deny; deny is the default). If either file is absent, or no class matches, the hook is a no-op.
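The classify-then-decide flow of the fence can be sketched as two small functions. Several details here are assumptions rather than facts about action-guard.js: the class assigned to each Bash command, treating underscores as word separators in tool names, the exact regular expressions, and the function names.

```javascript
const MCP_KEYWORDS = ["send", "submit", "publish", "sign", "purchase"];

// Assumed command-to-class mapping; the text lists the commands, not their classes.
const BASH_PATTERNS = [
  { re: /\bgit\s+push\b/, cls: "publish" },
  { re: /\bnpm\s+publish\b/, cls: "publish" },
  { re: /\bgh\s+release\s+create\b/, cls: "publish" },
  { re: /\bgh\s+pr\s+merge\b/, cls: "submit" },
  { re: /\bcurl\b.*-X\s*POST\b/, cls: "send" },
];

// Return the action class of a tool call, or null when nothing matches.
function classify(toolName, input = {}) {
  if (toolName.startsWith("mcp__")) {
    // Split on non-alphanumerics so "send_email" matches but "resend" does not.
    const words = toolName.toLowerCase().split(/[^a-z0-9]+/);
    return MCP_KEYWORDS.find((k) => words.includes(k)) || null;
  }
  if (toolName === "Bash" && typeof input.command === "string") {
    const hit = BASH_PATTERNS.find((p) => p.re.test(input.command));
    return hit ? hit.cls : null;
  }
  return null;
}

// forbidden is the run's declared set, or null when .muster/forbidden-actions is absent.
function fenceDecision({ runActive, forbidden, toolName, input, mode = "deny" }) {
  if (mode === "off" || !runActive || !forbidden) return "allow";
  const cls = classify(toolName, input);
  if (!cls || !forbidden.includes(cls)) return "allow";
  return mode === "warn" ? "warn" : "deny";
}

console.log(classify("mcp__gmail__send_email"));   // send
console.log(classify("mcp__docs__resend_invite")); // null
console.log(
  fenceDecision({
    runActive: true,
    forbidden: ["publish"],
    toolName: "Bash",
    input: { command: "npm publish" },
  })
); // deny
```

Keeping the pattern list short and specific trades recall for precision: an unrecognized command is allowed rather than guessed at.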

Border invitation: independent of the fence, an Edit/Write/NotebookEdit with a resolved target, or a high-confidence Bash file write, feeds a cumulative cross-turn distinct-file counter whenever no Muster run is active. Crossing MUSTER_INLINE_SCALE (default 3) for the first time this crossing window, with the shared cooldown below not active, warns once (never a deny) with the value-toned copy; further files in the same crossing stay silent. A live Muster run resets the counter instead of recording (that work is tracked/dispatched, not drift).

Invite cooldown (hysteresis): a crossing re-arming only makes the next touch eligible to invite -- a shared, session-keyed cooldown (MUSTER_INVITE_COOLDOWN_MS, default 15 minutes) started by the last actual invite from either signal still gates whether that eligible touch is spoken. This absorbs a noisy border -- a rapid Muster-run restart, or a drift counter oscillating around the threshold -- so it cannot flap a repeat invite seconds apart, while a genuinely long-lived session (crossings spaced hours or days apart) still gets one invite per crossing.
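The interplay between the distinct-file counter, the once-per-crossing rule, and the shared cooldown can be sketched as a small state machine. The state shape and function names are hypothetical, the sketch covers only the file-write signal, and it assumes the threshold is crossed when the count reaches MUSTER_INLINE_SCALE.

```javascript
const DEFAULT_SCALE = 3;                    // MUSTER_INLINE_SCALE default
const DEFAULT_COOLDOWN_MS = 15 * 60 * 1000; // MUSTER_INVITE_COOLDOWN_MS default

function createDriftState() {
  return { files: new Set(), invitedThisCrossing: false, lastInviteAt: null };
}

// Record one file write at time `now` (ms); returns true when the invitation is shown.
function recordWrite(state, file, now, opts = {}) {
  const { runActive = false, scale = DEFAULT_SCALE, cooldownMs = DEFAULT_COOLDOWN_MS } = opts;
  if (runActive) {
    // A live run resets the counter and re-arms the crossing instead of recording.
    state.files.clear();
    state.invitedThisCrossing = false;
    return false;
  }
  state.files.add(file); // a Set counts distinct files only
  if (state.invitedThisCrossing || state.files.size < scale) return false;
  const cooling = state.lastInviteAt !== null && now - state.lastInviteAt < cooldownMs;
  if (cooling) return false; // eligible, but the shared cooldown keeps it silent
  state.invitedThisCrossing = true;
  state.lastInviteAt = now;
  return true;
}

const state = createDriftState();
const MIN = 60 * 1000;
console.log(recordWrite(state, "a.js", 0));                          // false
console.log(recordWrite(state, "b.js", 0));                          // false
console.log(recordWrite(state, "c.js", 0));                          // true: threshold reached
console.log(recordWrite(state, "d.js", 0));                          // false: already invited
console.log(recordWrite(state, "run.js", 1 * MIN, { runActive: true })); // false: reset
```

After the reset, three new files inside the cooldown window stay silent; the next write after the window elapses is the one that invites.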

TaskCompleted (task-completed-gate.js) gates the native task board's own completion tick: the orchestrator writes .muster/task-board.json (one entry per native task Muster created) at TaskCreate time and flips it to reviewGate: "pass" only once review-gate actually passes that task, before ever marking it completed via TaskUpdate. This hook denies (exit 2) a completion attempt on a tracked task with no recorded PASS, and fails open for anything the map doesn't track. MUSTER_TASK_GATE=off disables it.
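The gate's decision can be sketched as a pure function over the parsed task board. The board shape (a map from task id to an entry with a reviewGate field) and the function name are assumptions; the exit codes follow the description above, where 2 denies the completion.

```javascript
// Assumed board shape: { [taskId]: { reviewGate: "pass" | other } }.
function taskGateExitCode(board, taskId, env = {}) {
  if (env.MUSTER_TASK_GATE === "off") return 0;       // gate disabled
  if (!board || typeof board !== "object") return 0;  // no readable board: fail open
  if (!Object.hasOwn(board, taskId)) return 0;        // untracked task: fail open
  return board[taskId].reviewGate === "pass" ? 0 : 2; // 2 blocks the completion
}

const board = {
  "task-1": { reviewGate: "pass" },
  "task-2": { reviewGate: "pending" },
};
console.log(taskGateExitCode(board, "task-1")); // 0
console.log(taskGateExitCode(board, "task-2")); // 2
console.log(taskGateExitCode(board, "task-9")); // 0, not tracked by Muster
```

Failing open for untracked tasks keeps the hook from interfering with task-board entries that Muster did not create.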

Codex install architecture

Codex is a first-class runtime, but it installs differently enough to be worth spelling out. muster install codex [--scope project|user] is an npm-installer action, not a Codex one.

Managed manifests. Every artifact the installer writes is recorded in a Muster-owned manifest (.muster-managed.json, owner: "muster", format: 1) listing the exact profile files, hook runtime files, and hook groups it owns. Uninstall removes only what a manifest claims, so unrelated profiles and hook groups in the same hooks.json survive untouched. A manifest that fails validation is a hard conflict — the command refuses rather than guessing at ownership.

packageVersion as the coherence key. Each manifest records the package version that wrote it. An install scope is only "healthy" when its own recorded packageVersion matches the selected plugin's, so a scope left behind by an older Muster is reported as stale instead of passing a health check on internal self-agreement. muster doctor --codex reads the same key.

Hook-scope collapse. The user scope is canonical for hooks. If $CODEX_HOME (or ~/.codex) already carries a healthy Muster hook install, a --scope project install skips its own hook merge entirely — profiles still install. Rerunning --scope project on a machine with both scopes therefore converges to one firing scope instead of double-firing every lifecycle event.
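The coherence check and the hook-scope collapse can be sketched together. The manifest fields owner, format, and packageVersion come from the description above; the status labels "absent" and "conflict", the function names, and the simplification of judging hook health from the user-scope manifest alone are assumptions.

```javascript
// Classify one install scope from its parsed .muster-managed.json (or null if missing).
function scopeHealth(manifest, selectedPluginVersion) {
  if (!manifest) return "absent";
  const valid =
    manifest.owner === "muster" &&
    manifest.format === 1 &&
    typeof manifest.packageVersion === "string";
  if (!valid) return "conflict"; // refuse rather than guess at ownership
  // Healthy only when the recorded version matches the selected plugin's.
  return manifest.packageVersion === selectedPluginVersion ? "healthy" : "stale";
}

// Plan a --scope project install: profiles always install, hooks only when
// the user scope does not already carry a healthy hook install.
function projectInstallPlan(userManifest, selectedPluginVersion) {
  const userHealthy = scopeHealth(userManifest, selectedPluginVersion) === "healthy";
  return { installProfiles: true, mergeHooks: !userHealthy };
}

const current = { owner: "muster", format: 1, packageVersion: "2.0.0" };
const older = { owner: "muster", format: 1, packageVersion: "1.9.0" };
console.log(scopeHealth(current, "2.0.0"));        // healthy
console.log(scopeHealth(older, "2.0.0"));          // stale
console.log(projectInstallPlan(current, "2.0.0")); // { installProfiles: true, mergeHooks: false }
```

Comparing against the selected plugin's version, rather than checking the manifest only against itself, is what lets a leftover install from an older release show up as stale.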

The Codex plugin is deliberately hooks-free. Codex executes plugin-bundled hooks by default, so bundling them there would double-fire against the installed ones. Instead, the hook runtime lives under .codex/muster/ (or the user-scope equivalent) and is installed by the npm installer, which merges owned hook groups into .codex/hooks.json through Codex's supported project/user hook layer. Trust applies to the exact hook definition. Review again after an update changes hook content, because Codex skips the changed hook until it is trusted; project trust and managed-only policy can also suppress hooks. Inspect the active definitions with /hooks and confirm them with muster doctor --codex. The hooks inject orchestration context and surface diagnostics and policy warnings; todo and spawn enforcement stay advisory on Codex, and write-capable waves must use isolated Git worktrees.

Vendoring

Muster ships a curated set of built-in skills and agents, imported from upstream projects rather than hand-copied. vendor/manifest.yaml lists every source (repository, license, ref) and the items pulled from each, mapped to the Muster roles they serve. muster vendor generates the built-ins into plugin/ and writes provenance into NOTICE. See Credits for the sources.
