Over the previous 49 lessons we explored every subsystem in isolation: the boot sequence, query engine, tool system, skills, agents, permissions, MCP, state management, and more. This capstone lesson stitches them all together. By the end you will have a single mental model that explains exactly what happens — at the code level — from the moment you press Enter on a prompt to the moment the response finishes rendering.
`main.tsx` → `setup.ts` → `QueryEngine.ts` → `query.ts` → `tools.ts` → `Tool.ts` → `bootstrap/state.ts` → `state/AppStateStore.ts` → `replLauncher.tsx` → `screens/REPL.tsx` → `services/api/` → `services/mcp/`
Claude Code is a TypeScript application built on Bun, React/Ink (terminal UI), and the Anthropic API. Its architecture has six clearly-separated layers that hand off responsibility in sequence:
- `main.tsx`, `setup.ts`, `entrypoints/init.ts` — process startup, settings, migrations, session wiring
- `replLauncher.tsx`, `screens/REPL.tsx`, `components/App.tsx` — Ink-rendered interactive terminal interface
- `state/AppStateStore.ts`, `bootstrap/state.ts`, `state/store.ts` — immutable AppState + global singleton state
- `QueryEngine.ts`, `query.ts` — conversation lifecycle, system prompt assembly, API streaming loop
- `tools.ts`, `Tool.ts`, `tools/*/` — capability registry: Bash, file I/O, agents, search, MCP, skills
- `services/api/`, `services/mcp/`, `services/compact/` — Anthropic API client, MCP connections, compaction
The component map above traces every major component and its relationship to the others. Read it top-down: the user's keystroke flows downward through each layer until it reaches the API, then the response bubbles back up through tools and the UI.
Now let's trace a single user message — say "refactor this function" —
through the entire stack. Every numbered step maps to real code in the source tree.
1. The REPL calls `processUserInput()`, which handles slash commands, attaches images and memory files, and builds the `UserMessage` array, returning `{ messages, shouldQuery, allowedTools }` to the QueryEngine.
2. `QueryEngine` calls `fetchSystemPromptParts()` to assemble system + user + system context.
3. `recordTranscript(messages)` persists the user message to disk BEFORE the API call.
4. `query({ messages, systemPrompt, canUseTool, ... })` sends `POST /v1/messages` as a streaming SSE request.
5. The API streams `message_start` and `content_block_delta` events; partial `AssistantMessage`s are yielded to the REPL, which renders the streaming text.
6. A `content_block` arrives with `tool_use: Bash`; `canUseTool()` runs the permission check.
7. `BashTool.call({ command: "..." })` executes, yielding live `BashProgress` events that surface in the REPL as `ProgressMessage` tool progress.
8. The `ToolResult` is appended as a `tool_result` and a new `POST /v1/messages` request starts the next assistant turn.
9. On `stop_reason: end_turn`, the engine records the final transcript and yields the terminal `SDKResultMessage`; the REPL completes the final render.
See `QueryEngine.ts:450` for the comment explaining this design decision.
The boot sequence is carefully orchestrated to minimize time-to-first-render. Three categories of work run in parallel as early as possible:
Background I/O
startMdmRawRead() — spawns plutil subprocesses to read MDM policy.
startKeychainPrefetch() — prefetches macOS keychain reads.
Both fire before the ~135ms module eval completes.
Network Warm-up
preconnectAnthropicApi() establishes a TCP connection to the API endpoint before the user types anything, so the first API request doesn't pay TCP handshake cost.
Deferred Prefetches
startDeferredPrefetches() — user/git context, tips, model capabilities, file count, change detectors. Runs after paint so it doesn't block the prompt.
Session Wiring
captureHooksConfigSnapshot() must run after setCwd() but before any query. The hooks config is read once and frozen so mid-session file modifications can't inject new hooks.
Plugin Cache
getCommands() and loadPluginHooks() are prefetched as background tasks. They populate caches consumed at first query time without blocking the render path.
Config Upgrades
runMigrations() checks migrationVersion against CURRENT_MIGRATION_VERSION=11 and runs only the needed model string / settings schema migrations.
Why does main.tsx import everything statically but still feel fast?
The ~135ms module eval cost of all static imports is overlapped with the
startMdmRawRead() and startKeychainPrefetch() subprocess
calls that fire at the very top of the file before any imports complete.
By the time JavaScript finishes evaluating the module graph, both subprocess
calls have already been dispatched to the OS.
Heavy modules like OpenTelemetry (~400KB) and gRPC (~700KB) are lazy-loaded
via dynamic import() inside init() only when telemetry
is actually needed — they never touch the critical path.
React and Ink are also lazy: launchRepl() in replLauncher.tsx
only import()s App.tsx and REPL.tsx at call
time. In headless mode (-p), these are never loaded at all.
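The lazy-loading pattern can be sketched as a memoized dynamic `import()`. The loader body below is a stand-in (the real code imports `App.tsx`/`REPL.tsx`); only the memoization shape is the point:

```typescript
// Sketch of the lazy-import pattern: the loader runs at most once, and
// headless code paths that never call launchReplSketch() never pay for it.
type Loader<T> = () => Promise<T>;

function lazy<T>(load: Loader<T>): Loader<T> {
  let cached: Promise<T> | undefined;
  return () => (cached ??= load()); // memoize the in-flight promise
}

// In the real code this would be something like `() => import('./screens/REPL.js')`.
const loadReplModule = lazy(async () => ({ render: () => 'repl rendered' }));

async function launchReplSketch(): Promise<string> {
  const repl = await loadReplModule(); // first call triggers the "import"
  return repl.render();
}
```

Because the memoized value is the promise itself, concurrent callers share one load rather than racing to import twice.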
QueryEngine (introduced in a later refactor) extracts what was once the monolithic ask() function into a class that owns the full conversation lifecycle. One instance lives per conversation session.
State the engine owns
```typescript
class QueryEngine {
  private config: QueryEngineConfig              // tools, commands, mcpClients, model, ...
  private mutableMessages: Message[]             // full conversation history (grows each turn)
  private abortController: AbortController       // shared with all tools in this session
  private permissionDenials: SDKPermissionDenial[] // accumulated for SDK result
  private totalUsage: NonNullableUsage           // token counts across all turns
  private readFileState: FileStateCache          // snapshot of files read this session
  private discoveredSkillNames: Set<string>      // skills seen in this turn (for telemetry)
  private loadedNestedMemoryPaths: Set<string>   // CLAUDE.md files already injected
}
```
submitMessage() — the turn lifecycle
Each call to submitMessage() is an async generator yielding
SDKMessage events. The sequence every turn:
| Step | Code | Purpose |
|---|---|---|
| 1. Process slash commands | processUserInput() | Handle /commands, build UserMessage array, determine shouldQuery |
| 2. Persist user message | recordTranscript(messages) | Write to disk BEFORE API so kill-mid-request is resumable |
| 3. Assemble system prompt | fetchSystemPromptParts() | Combine default + custom + memory + coordinator context |
| 4. Load skills + plugins | getSlashCommandToolSkills() | Cache-only load for headless; full refresh for interactive |
| 5. Yield system init | buildSystemInitMessage() | SDK callers receive the list of tools, commands, agents |
| 6. Enter query loop | query() | Streaming API call + tool execution until end_turn |
| 7. Yield result | SDKResultMessage | Final cost, usage, permission_denials, stop_reason |
The query() loop — iteration anatomy
```typescript
// query.ts — simplified loop skeleton
async function* queryLoop(params) {
  let state = { messages, turnCount: 1, autoCompactTracking, ... }
  const config = buildQueryConfig() // snapshot env/statsig state
  while (true) {
    // 1. Optionally start skill/job prefetch (async, consumes settled results only)
    // 2. Send streaming API request via deps.sendRequest()
    for await (const event of streamEvents) {
      yield event // passes text deltas directly to REPL
      if (event.type === 'tool_use') collectToolUse(event)
    }
    // 3. Check stop reason
    if (stopReason === 'end_turn') return 'success'
    // 4. Execute tools (StreamingToolExecutor — parallel where possible)
    for await (const result of runTools(toolUseBlocks, canUseTool, context)) {
      yield result
    }
    // 5. Token budget / compact checks → may compact and continue
    // 6. Append tool_results to messages, increment turnCount, loop
  }
}
```
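The generator threading that carries events from this loop to the UI can be illustrated with a toy chain (the event and message shapes are simplified stand-ins, not the real `StreamEvent`/`SDKMessage` types):

```typescript
// Toy sketch of the async-generator chain: an inner "query" generator yields
// stream events, an outer "engine" generator re-yields them as SDK messages,
// and the consumer sees each event the moment it is produced.
type StreamEvent = { type: 'text_delta' | 'tool_result'; data: string };
type SDKMessage = { kind: 'stream' | 'result'; payload: string };

async function* querySketch(): AsyncGenerator<StreamEvent> {
  yield { type: 'text_delta', data: 'Hello' };
  yield { type: 'tool_result', data: 'ls: ok' };
}

async function* submitMessageSketch(): AsyncGenerator<SDKMessage> {
  for await (const event of querySketch()) {
    yield { kind: 'stream', payload: event.data }; // pass-through, no buffering
  }
  yield { kind: 'result', payload: 'end_turn' }; // terminal SDK result
}

async function collect(): Promise<string[]> {
  const seen: string[] = [];
  for await (const msg of submitMessageSketch()) seen.push(msg.payload);
  return seen;
}
```

Each layer only re-yields, so backpressure and cancellation propagate naturally through the chain.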
Every capability Claude can invoke is a Tool. The tool system is
intentionally flat — there is no tool hierarchy, just a registry function
getAllBaseTools() in tools.ts that returns the authoritative list.
Tool interface (Tool.ts)
```typescript
// Simplified Tool interface
interface Tool {
  name: string           // must be stable (used in Statsig cache key)
  description: string    // injected into system prompt
  inputSchema: ZodSchema // validation before call()
  isEnabled(): boolean   // feature-gate / env check
  call(input, context: ToolUseContext):        // async generator
    AsyncGenerator<ToolProgressData, ToolResult>
  renderToolResult(result, context): React.ReactNode // Ink UI rendering
}
```
ToolUseContext — the tool's window into the world
Every tool call receives a ToolUseContext that bundles together everything
the tool might need without coupling it to global state:
| Property | Type | Purpose |
|---|---|---|
| `messages` | `Message[]` | Full conversation history |
| `mainLoopModel` | `ModelSetting` | Current model for sub-agent spawning |
| `tools` | `Tools` | Available tool set (for AgentTool to pass down) |
| `mcpClients` | `MCPServerConnection[]` | Active MCP connections |
| `agentDefinitions` | `AgentDefinitionsResult` | Custom agent configs |
| `abortController` | `AbortController` | Shared abort signal (Ctrl-C propagation) |
| `readFileState` | `FileStateCache` | Snapshot of files read (for diff/undo) |
| `setAppState` | `Setter<AppState>` | Tools can mutate UI state (e.g. TodoWriteTool) |
| `handleElicitation` | `ElicitFn` | MCP URL elicitation (OAuth flows) |
Feature-gated tools
Many tools are conditionally included based on feature() flags
(Bun bundle-time dead code elimination) or environment variables. This keeps
the tool list deterministic for Anthropic's prompt cache key:
```typescript
// tools.ts — feature gate pattern
const SleepTool =
  feature('PROACTIVE') || feature('KAIROS')
    ? require('./tools/SleepTool/SleepTool.js').SleepTool
    : null
// getAllBaseTools() filters nulls from the array
// NOTE: this list is synced to Statsig console for prompt cache invalidation
```
Claude Code has a two-layer state model. Understanding which layer to use for what is essential to understanding the codebase.
bootstrap/state.ts
Process-lifetime constants: sessionId, cwd, projectRoot, model, auth token, telemetry meter, hook registry. Explicitly NOT a React store. Comments in the file warn: "DO NOT ADD MORE STATE HERE".
state/AppStateStore.ts
DeepImmutable<AppState> — everything the UI needs: messages, mcpClients, permission context, speculation state, settings, task list, agent definitions, file history. Updated immutably via setAppState(prev => ...).
The key design principle: bootstrap/state.ts is a module-level singleton (plain JS object) while AppState is React context. This separation means the query engine and tools can access session identity without importing React, while the UI can re-render reactively on any AppState change.
```typescript
// bootstrap/state.ts — the singleton shape (partial)
type State = {
  originalCwd: string
  projectRoot: string
  totalCostUSD: number
  totalAPIDuration: number
  cwd: string
  modelUsage: { [modelName: string]: ModelUsage }
  mainLoopModelOverride: ModelSetting | undefined
  isInteractive: boolean
  sessionId: SessionId
  sdkBetas: BetaMessageStreamParams['betas']
  hookRegistry: RegisteredHookMatcher[]
  meter: Meter | undefined
  tokenBudgetInfo: { remainingTokens: number; ... }
  // ... ~40 more fields, all process-lifetime
}
```
```typescript
// state/AppStateStore.ts — the React state shape (partial)
type AppState = DeepImmutable<{
  settings: SettingsJson
  mainLoopModel: ModelSetting
  toolPermissionContext: ToolPermissionContext
  messages: Message[]
  mcpClients: MCPServerConnection[]
  agentDefinitions: AgentDefinitionsResult
  speculation: SpeculationState
  fileHistory: FileHistoryState
  plugins: LoadedPlugin[]
  tasks: TaskState | null
  // ... ~50 more fields
}>
```
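The immutable-update style implied by `setAppState(prev => ...)` can be sketched with a toy store (the real store also notifies React subscribers):

```typescript
// Toy immutable store: every update produces a new state object so React-style
// reference-equality change detection works; the previous state is never mutated.
type MiniAppState = Readonly<{ messages: readonly string[]; turnCount: number }>;

let miniAppState: MiniAppState = { messages: [], turnCount: 0 };

function setAppStateSketch(update: (prev: MiniAppState) => MiniAppState): MiniAppState {
  miniAppState = update(miniAppState); // swap the reference, never mutate
  return miniAppState;
}

const before = miniAppState;
setAppStateSketch(prev => ({
  ...prev,
  messages: [...prev.messages, 'refactor this function'], // copy, don't push
  turnCount: prev.turnCount + 1,
}));
```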
A "session" in Claude Code is a persistent conversation with a unique UUID.
Sessions are stored as JSONL transcript files under
~/.claude/projects/<cwd-hash>/<session-id>.jsonl.
Session lifecycle
```typescript
// Startup: generate or restore session ID
getSessionId()               // reads bootstrap/state.ts
registerSession()            // registers in concurrent sessions tracking
countConcurrentSessions()    // used for display in status bar

// During conversation:
recordTranscript(messages)   // enqueues write (lazy 100ms JSONL flush)
flushSessionStorage()        // forced flush (EAGER_FLUSH env / cowork)
cacheSessionTitle()          // first user message → title for resume UI

// Resume path (--continue / --resume):
loadTranscriptFromFile()     // reads JSONL back into Message[]
processResumedConversation() // validates + replays into initial messages
```
The recordTranscript() call is fire-and-forget for assistant messages but awaited for user messages.
This is intentional — the comment in QueryEngine.ts:727 explains that
awaiting assistant writes would block the streaming generator, preventing
message_delta events from processing.
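That write policy can be sketched as follows, with a toy in-memory "disk" standing in for the JSONL file; only the await-vs-fire-and-forget split is the point:

```typescript
// Sketch of the transcript-write policy: user messages are awaited (the turn
// must be durable before the API call), while assistant writes are
// fire-and-forget so they never block the streaming generator.
const disk: string[] = [];

async function writeLine(line: string): Promise<void> {
  await Promise.resolve(); // stands in for async file I/O
  disk.push(line);
}

async function recordTranscriptSketch(role: 'user' | 'assistant', text: string) {
  const write = writeLine(JSON.stringify({ role, text })); // one JSONL row
  if (role === 'user') await write; // durable BEFORE the API request
  // assistant writes: promise intentionally not awaited
}
```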
MCP (Model Context Protocol) servers connect as MCPServerConnection objects
held inside AppState.mcpClients. They are initialized before the first
query and passed into every ToolUseContext.
MCP tools are not in getAllBaseTools() — they are dynamically
added alongside the base tools at session startup via
getMcpToolsCommandsAndResources(). This is why MCP tool names can
conflict with base tool names: the deduplication happens at load time.
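A sketch of that load-time merge, with a toy dedup policy (the real resolution rules live in `getMcpToolsCommandsAndResources()` and may differ):

```typescript
// Sketch of load-time tool merging: MCP tools join the base list, and name
// collisions are resolved here. In this toy version, base tools win.
type NamedTool = { name: string; source: 'base' | 'mcp' };

function mergeToolsSketch(base: NamedTool[], mcp: NamedTool[]): NamedTool[] {
  const byName = new Map<string, NamedTool>();
  for (const t of base) byName.set(t.name, t);
  for (const t of mcp) {
    if (!byName.has(t.name)) byName.set(t.name, t); // skip conflicting names
  }
  return [...byName.values()];
}
```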
Every tool call passes through canUseTool() before execution.
This single function is the architectural choke point for all permission decisions.
```typescript
// The permission gate — called by query.ts before runTools()
const wrappedCanUseTool: CanUseToolFn = async (
  tool, input, toolUseContext, assistantMessage, toolUseID, forceDecision
) => {
  const result = await canUseTool(tool, input, toolUseContext, ...)
  if (result.behavior !== 'allow') {
    // Track for SDK result reporting
    this.permissionDenials.push({ tool_name, tool_use_id, tool_input })
  }
  return result
}
```
| Permission Mode | Behavior | Configured via |
|---|---|---|
| `default` | Ask user for any tool not in allow-list | Normal CLI startup |
| `auto` | Automatically allow safe tools, block dangerous | `--permission-mode auto` |
| `bypass` | Allow all tools without asking | `--dangerously-skip-permissions` |
| `alwaysAllow` rules | Per-tool allow-list (from settings + session) | User accepts during session |
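The three modes plus `alwaysAllow` rules can be condensed into a toy decision function (the real `canUseTool()` also consults settings, hooks, and per-tool logic):

```typescript
// Toy decision table for the permission modes described above.
type PermissionMode = 'default' | 'auto' | 'bypass';
type Decision = 'allow' | 'ask' | 'deny';

function decideSketch(
  mode: PermissionMode,
  toolName: string,
  opts: { alwaysAllow: Set<string>; safeTools: Set<string> },
): Decision {
  if (mode === 'bypass') return 'allow';              // --dangerously-skip-permissions
  if (opts.alwaysAllow.has(toolName)) return 'allow'; // user accepted earlier
  if (mode === 'auto') return opts.safeTools.has(toolName) ? 'allow' : 'deny';
  return 'ask';                                       // default: prompt the user
}
```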
When the conversation grows large enough to threaten the model's context window, Claude Code triggers automatic compaction. This is transparent to the user.
Token threshold
calculateTokenWarningState() compares current context token count against the model's context window. At ~80% fill, auto-compact triggers.
buildPostCompactMessages()
Sends the conversation to Claude with a summarization prompt. Returns a single compact summary message plus any preserved recent messages.
Snip compaction
Feature-gated alternative: snipCompact.ts yields a compact_boundary system message. The SDK path truncates in-memory; the REPL preserves full scrollback and projects on demand.
500k continuation
checkTokenBudget() handles the case where a single API response exceeds max_output_tokens. It auto-continues with "Please continue" until the response is complete.
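The auto-compact threshold check is simple arithmetic; the sketch below uses the ~80% figure cited above as an illustrative constant (the real `calculateTokenWarningState()` returns a richer state object):

```typescript
// Sketch of the auto-compact trigger: compare context tokens against the
// model's window and flag when fill crosses the threshold.
const AUTO_COMPACT_FILL = 0.8; // illustrative constant from the lesson text

function tokenWarningSketch(contextTokens: number, contextWindow: number) {
  const fill = contextTokens / contextWindow;
  return {
    fill,
    shouldAutoCompact: fill >= AUTO_COMPACT_FILL,
    remainingTokens: Math.max(0, contextWindow - contextTokens),
  };
}
```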
Hooks let users inject shell commands or callbacks at specific lifecycle points.
They are configured in settings.json and captured once at startup
(immutable snapshot pattern).
| Hook type | Fires when | Can block? |
|---|---|---|
| `PreToolUse` | Before any tool executes | Yes — can deny the tool |
| `PostToolUse` | After any tool completes | No |
| `PreCompact` | Before context compaction | No |
| `PostCompact` | After compaction finishes | No |
| `Stop` | When Claude outputs `stop_reason=end_turn` | Yes — can continue |
| `Notification` | Any assistant notification event | No |
| `FileChanged` | Watched file modified on disk | No |
| `SessionStart` | Before first query in new session | Yes — delays first query |
captureHooksConfigSnapshot() must run after setCwd()
and before any query. Once snapshotted, the hooks config is frozen for the
session. This prevents a malicious project from modifying settings.json
mid-session to inject hook commands that execute with the current permissions.
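The snapshot pattern can be sketched as a capture-once, deep-freeze helper (a hypothetical shape; the real snapshot covers matchers and more):

```typescript
// Sketch of the immutable snapshot pattern: read the hooks config once,
// freeze it, and hand out only the frozen copy so later settings.json edits
// cannot change what runs this session.
type HookEntry = { event: string; command: string };

let hooksSnapshot: readonly HookEntry[] | null = null;

function captureHooksSnapshotSketch(read: () => HookEntry[]): readonly HookEntry[] {
  if (hooksSnapshot) return hooksSnapshot; // capture at most once per session
  const entries = read().map((e) => Object.freeze({ ...e }));
  hooksSnapshot = Object.freeze(entries);
  return hooksSnapshot;
}
```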
The codebase has two distinct runtime paths that share QueryEngine but
differ significantly in their UI and startup behavior:
| Aspect | Interactive (default) | Headless (-p / --print) |
|---|---|---|
| UI | Ink/React terminal rendering | stdout text output only |
| Trust dialog | Shown on first launch | Skipped (implicit trust) |
| Session transcript | Awaited before API call | Fire-and-forget |
| React imports | Fully loaded | Never imported |
| Plugin prefetch | Background during setup | Skipped (isBareMode()) |
| Deferred prefetches | Run after first render | Skipped entirely |
| QueryEngine path | REPL → ask() | print.ts → QueryEngine.submitMessage() |
| Entrypoint label | cli | sdk-cli |
isBareMode() returns true when --print/-p is active.
The codebase uses this flag extensively to skip all interactive-only work.
This is also the flag SDK callers rely on to get predictable, low-latency execution.
The AgentTool enables recursive execution: Claude can spawn sub-agents,
each with their own QueryEngine instance and a restricted tool set.
In swarm mode (ENABLE_AGENT_SWARMS=true), agents communicate via
the Unix Domain Socket (UDS) messaging server started in setup.ts.
Each agent registers with TeamCreateTool and can send messages back
to the coordinator via SendMessageTool.
Everything comes together in the complete path from cold start to streaming response, which rests on six recurring design patterns:
1. Async generator threading
The entire data flow from API to UI is a chain of async generators.
query() yields StreamEvents, QueryEngine.submitMessage()
yields SDKMessages, and the REPL consumes them.
This enables true streaming without callbacks or event buses.
2. Dead code elimination via feature()
Bun's bundle-time feature('FLAG_NAME') completely removes disabled
feature branches from the compiled binary. This means the tool list is deterministic
per build (important for Anthropic's prompt cache key), and disabled features add
zero runtime overhead.
3. Cache-warming for latency
Critical paths (system prompt, tools, commands, model capabilities) are all pre-warmed in parallel during setup/startup. By the time the user submits their first prompt, nearly all expensive I/O has already completed. The pattern: fire async work, discard the promise, and memo/cache the result.
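That fire-and-discard pattern can be sketched with a memoized promise cache (`prefetch`/`consume` are illustrative names, not the real APIs):

```typescript
// Sketch of the warm-cache pattern: kick off the fetch early, discard the
// promise at the call site, and let the eventual consumer await the same
// memoized promise (already settled by then on the happy path).
const prefetchCache = new Map<string, Promise<unknown>>();

function prefetch<T>(key: string, fetcher: () => Promise<T>): void {
  if (!prefetchCache.has(key)) prefetchCache.set(key, fetcher()); // fire and forget
}

function consume<T>(key: string, fetcher: () => Promise<T>): Promise<T> {
  prefetch(key, fetcher); // cold-path fallback if never prefetched
  return prefetchCache.get(key) as Promise<T>;
}
```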
4. Immutable AppState + mutable bootstrap/state
React state is immutable (DeepImmutable) to prevent accidental mutation
and enable React's change detection. But session-level constants (cwd, sessionId,
model) live in a plain module singleton that is intentionally not React state —
these values are accessed by non-React code deep inside the query engine.
5. The isBareMode() fast path
Every expensive startup operation is guarded by if (!isBareMode()).
This single flag (true when running headless) skips React, Ink, UDS messaging,
plugin prefetch, deferred prefetches, and all interactive-only setup. Headless
execution becomes nearly pure compute.
6. Parallel subprocess investment
Instead of sequential I/O, the codebase fires subprocesses and async operations
as early as possible and lets them run in parallel with JavaScript execution.
startMdmRawRead() and startKeychainPrefetch() both fire
before the 135ms module graph finishes evaluating. By the time the code that
consumes their results runs, they're usually already done.
Capstone Takeaways
- Boot is parallel by design. MDM reads, keychain reads, TCP warm-up, and command prefetching all fire before they're needed to eliminate sequential I/O cost.
- QueryEngine is the conversation owner. One instance per conversation. It holds message history, token usage, file cache, and abort controller across all turns.
- The query loop is a pure async generator. Every message — text delta, tool progress, tool result — flows through `yield` from API to UI. No callbacks, no event buses.
- Tools are a flat registry. `getAllBaseTools()` in `tools.ts` is the single source of truth. The list is stable per build for prompt cache purposes.
- Two state layers serve different masters. `bootstrap/state.ts` (singleton) for the query engine; `AppStateStore.ts` (React) for the UI.
- Permissions are a single choke point. `canUseTool()` is called before every tool execution. All three permission modes (default, auto, bypass) flow through it.
- The transcript is written before the API call. This ensures sessions are resumable even if the process dies mid-request.
- Headless mode is architecturally distinct. `isBareMode()` strips out React, Ink, UDS, plugins, and all deferred work. SDK callers get near-zero overhead.
- Feature gates are bundle-time, not runtime. `feature('FLAG')` is dead-code eliminated by Bun at build time. Disabled features genuinely do not exist in the binary.
- MCP servers are first-class peers. Their tools, commands, and resources integrate into the same registries as built-in tools and are passed through the same `ToolUseContext`.
Capstone Quiz
1. Which of these happens first during startup? A) `startKeychainPrefetch()` B) Module imports complete C) `Commander.parse()`
   Answer: A. `startKeychainPrefetch()` is a top-level side effect at the top of `main.tsx` that fires before the module graph finishes evaluating (~135ms); this parallelism is intentional, overlapping I/O with module eval time. `Commander.parse()` runs well after all imports complete.
2. Why is `recordTranscript(messages)` called BEFORE the API request in `QueryEngine.submitMessage()`?
3. What is the difference between `bootstrap/state.ts` and `state/AppStateStore.ts`?
4. Every tool Claude can invoke is returned by `getAllBaseTools()` in `tools.ts`. True or false?
5. Why must `captureHooksConfigSnapshot()` run after `setCwd()` but before any query?

Course Complete
You've completed all 50 lessons of the Claude Code source code course. You now have a complete mental model of how Claude Code works — from the first keystroke to the final rendered token. This knowledge is the foundation for contributing to, extending, or simply deeply understanding one of the most sophisticated AI coding tools ever built.