The Agentic CLI, Today: Claude, Codex & Gemini Compared

This is Part 1 of a two-part series. Audience: architects, security engineers, and platform leads designing or evaluating agentic coding tools. This brief catalogues how three production tools — Claude CLI, OpenAI Codex, Gemini CLI — actually build their agentic surface; Part 2 distils what an ideal design would look like from the patterns that recur.

All three CLIs converge on a similar shape: a REPL with tools, an MCP plugin layer, hierarchical project memory, and a sandbox. Look one layer deeper and the engineering priorities diverge sharply. The point of this brief is not to crown a winner; each tool optimises for a defensible bet given its parent organisation’s constraints. The point is to make those bets legible, so an architect can borrow from each with eyes open.

Production agentic CLIs

Claude CLI, OpenAI Codex, Gemini CLI

~5,800

Source files across the three

Across TypeScript and Rust

Architectural dimensions compared

From transport to governance

~50

Novel mechanisms identified

Non-obvious bets unique to one tool

The shape of the comparison — three tools, three stacks, eighteen dimensions.

How to read this brief

Each dimension is treated as a three-card comparison — what Claude does, what Codex does, what Gemini does — with a few words on each tool’s distinctive mechanism (marked NOVEL where appropriate). Source-file paths are included so the curious reader can verify. The brief deliberately understates novelty when a tool has merely adopted a third-party SDK; “NOVEL” means in-tree engineering investment, not “first to ship.”

Two acts. This brief is the descriptive half — what each tool is, today. Part 2 of the series is the synthesis — what an ideal design looks like, distilled from the patterns that recur.

TL;DR — three personalities

Tool	One-line characterisation
Claude CLI	The cache-aware, careful one. Heavy investment in prompt-cache economics (single-marker policy, `cache_reference` for tool results, on-disk cache-break diffs) and safety classifiers (LLM-based YOLO and Bash classifiers gating dangerous operations). Aggressive enterprise plumbing: remote-managed settings, policy-limits, an upstream-proxy WebSocket relay with MITM CA injection inside managed-session containers, compile-time PII guards on telemetry.
OpenAI Codex	The systems-engineered one. Rust + multi-OS sandboxing (Seatbelt, bubblewrap+seccomp, Windows restricted-token, a `process-hardening` ctor that disables core dumps before `main`). An app-server daemon cleanly separates engine from UI; the TUI is a JSON-RPC client that can run remotely. Guardian is a second LLM session that auto-approves on-request tool calls fail-closed. Three-tier persistence (JSONL rollouts + SQLite index + rollout-trace reducer).
Gemini CLI	The platform-integrated one. The most protocol-coupled: A2A (Agent-to-Agent) server with GCS persistence, ACP stdio adapter so external clients (e.g. Zed) drive Gemini as a sub-process, MCP, plus a `vscode-ide-companion`. Routing uses a local on-device Gemma classifier (LiteRT binary) to pick model tier; context management is a graph pipeline of processors, not a single compactor. CCPA remote admin pushes MCP servers, required-servers, and extension toggles at runtime.

Codebase shapes

	Claude CLI	OpenAI Codex	Gemini CLI
Language	TypeScript · React/Ink	Rust workspace · Ratatui (+ TS shell)	TypeScript · npm monorepo
Source files	~1,900	~1,865 `.rs`	~1,977
Tree size	34 MB	51 MB	108 MB
Layout	Flat root: `QueryEngine.ts`, `Tool.ts`, `query.ts`, `cost-tracker.ts`, `tools/` (40+ first-party tools), `bridge/`, `buddy/`, `coordinator/`, `memdir/`, `remote/`, `plugins/`, `skills/`, `upstreamproxy/`	`codex-rs/core/` (engine), `codex-rs/tui/` (Ratatui), sandbox family (`sandboxing/`, `linux-sandbox/`, `bwrap/`, `windows-sandbox-rs/`, `execpolicy/`, `process-hardening/`), persistence (`rollout`, `state`, `agent-graph-store`), governance (`cloud-tasks`, `cloud-requirements`, `responses-api-proxy`)	`packages/cli` (Ink front-end), `packages/core` (engine: `agent/`, `scheduler/`, `confirmation-bus/`, `context/`, `policy/`, `routing/`, `safety/`, `sandbox/`), `packages/a2a-server`, `packages/vscode-ide-companion`, `packages/sdk` + `devtools` + `test-utils`; dedicated `evals/`, `perf-tests/`, `memory-tests/` harnesses

The shape itself tells the story. Claude has the flattest tree and the most ad-hoc UI experiments (voice STT, animated “buddy” companion sprite, vim mode, perfetto-style profiling). Codex is the cleanest workspace separation — each piece of the architecture is its own crate. Gemini is the most ops-aware monorepo, with separate harnesses for evals, perf, and memory.

The 18-dimension matrix

The differentiator per cell — what’s distinctive, not what’s the same. NOVEL marks novel/unique mechanisms; otherwise cells describe approach in 6–12 words.

A · Model interaction

Dimension	Claude CLI	Codex	Gemini CLI
Transport & streaming	NOVEL Chunked heartbeat yields under unattended-retry; stall detector + fallback to non-streaming	NOVEL Responses-over-WebSocket with prewarm; sticky session reconnect	NOVEL Mid-stream retry with UI-visible RETRY event; SSL `BAD_RECORD_MAC` in retry allow-list
Prompt economics & caching	NOVEL Single-marker `cache_control`, `cache_reference` on tool results, on-disk cache-break diffs	Server-side prompt cache keyed by thread id; trim-from-head to preserve prefix	Implicit context caching only; no explicit cache wiring
Context-window management	NOVEL Three-tier: auto-compact + microcompact + session-memory; reactive on `prompt_too_long`	NOVEL Remote (server-side) compaction supported per provider	NOVEL Directed-graph pipeline with processors and profiles
Prompt assembly	Hierarchical `CLAUDE.md` + auto-`memdir/MEMORY.md`; sectioned cache-scoped blocks	NOVEL Root-to-cwd concat of all `AGENTS.md`; `AGENTS.override.md` for local override	4-tier `GEMINI.md` (global → user-project → extension → project); inode dedup
Model routing	Provider strategy (firstParty / Bedrock / Vertex / Foundry) + small-fast/main split	Data-driven `ModelProviderInfo`; built-in Ollama & LM Studio crates; Responses API only	NOVEL Composite strategy chain incl. local Gemma classifier (LiteRT)

B · Execution & safety

Dimension	Claude CLI	Codex	Gemini CLI
Tool dispatch & concurrency	Per-tool `isConcurrencySafe`; parallel groups; shell-task tracking per agent	NOVEL RwLock parallelism: read-lock = parallel-safe, write-lock = exclusive	Stateful scheduler; `wait_for_previous` flag batches parallelizable calls
Trust & permission	NOVEL LLM-based YOLO/bash classifiers gating each call; killswitch on bypass mode	NOVEL Guardian LLM auto-approver (fail-closed); 5-category `Granular` approval config	NOVEL Per-prompt LLM-generated `Conseca` security policy; subagent-sanitizing message bus
Sandboxing & isolation	External `@anthropic-ai/sandbox-runtime` + CCR upstream-proxy WS relay with MITM CA	NOVEL Seatbelt + bwrap+seccomp + Windows restricted-token + process-hardening ctor	NOVEL Six backends: docker / podman / sandbox-exec / runsc / lxc / windows-native
Sub-agent isolation	NOVEL Built-in adversarial verification agent; coordinator + tmux swarm	Full child `Session`; dedicated `agent-graph-store` + `agent-identity` JWT	NOVEL Full A2A protocol (Agent-to-Agent); GCS-backed task store
Error handling	Typed errors; query-source-aware retry; reactive compaction on prompt-too-long	Split `request_max_retries` vs `stream_max_retries`; explicit `PendingUnauthorizedRetry`	Per-model terminal/sticky-retry tracking; `onPersistent429` validation flow

C · State & extensibility

Dimension	Claude CLI	Codex	Gemini CLI
State persistence	JSONL transcripts; `conversationRecovery` with orphan filters; sidechain transcripts for sub-agents	NOVEL JSONL rollouts + SQLite state-db + `rollout-trace` reducer (three-tier)	NOVEL Shadow-Git checkpoint repo per session for tool-call rollback
Plugins / MCP / extensions	In-process + SDK + VS Code MCP transports; `.mcpb` zipped bundles; hooks event-bus	Layered: `plugin` / `core-plugins` / `skills` / `connectors`; `external-agent-migration` imports Claude config	Integrity-checked extensions; `requiredMcpServers` from admin policy
IDE integration	Lockfile-based detection across 13+ JetBrains products; VS Code SDK transport; diff push	App-server JSON-RPC bridge; `SessionSource::VSCode` first-class	NOVEL MCP companion + ACP stdio adapter (driven-from-outside)
Terminal UI	Ink (React) + voice STT, vim mode, buddy companion sprite, FPS tracker	Ratatui; TUI is app-server client (in-process or remote)	Ink + Kitty-keyboard protocol + dedicated screen-reader layout

D · Ops & governance

Dimension	Claude CLI	Codex	Gemini CLI
Usage & cost tracking	Full per-session/per-tool/per-model accounting; persisted cost; 5-hour Pro/Max windows	Token usage with `cached_input_tokens` + `BASELINE_TOKENS` floor; server-side billing	NOVEL “Google One AI credits” overage currency; in-band consumption per stream chunk
Telemetry & privacy	NOVEL Compile-time PII guards (typed-`never` casts); Datadog + 1P BigQuery; 3-tier privacy levels	Separate OTel + analytics pipelines; W3C trace context; SQLite log-db	OTel SDK + Google Cloud exporters + Clearcut (Google-internal logging)
Auth & identity	OAuth+PKCE; API key; Bedrock/Vertex creds; CCR JWT bridge; OAuth-revoked recovery	NOVEL PKCE + device-code login; `AgentIdentity` JWKS-verified JWT for M2M	Six auth types incl. Code Assist; loopback OAuth2; ADC; gateway URL override
Remote admin	NOVEL Remote-managed settings + policy-limits; CCR session viewer; upstream-proxy WS	NOVEL App-server daemon + cloud-tasks + cloud-requirements + responses-api-proxy	NOVEL CCPA admin push (MCP servers, required-MCP, extensions); IPC-forwarded across sandbox

Selected deep dives

The full matrix is dense by design — it’s a reference, not a narrative. Below are the four dimensions where the divergence between tools is most architecturally instructive.

Prompt economics & caching — three philosophies

Claude is the most invested. Exactly one cache_control: ephemeral marker per request on the last message; for fire-and-forget forks (skipCacheWrite) it’s shifted one back so the fork does a server-side no-op merge. System prompts are split into named, cache-scoped text blocks (buildSystemPromptBlocks). cache_reference is injected on tool-result blocks by tool_use_id, letting the server re-use already-cached tool outputs. A cache-break detector hashes system/tool schemas/betas/extra-body every call, diffs them, and dumps a cache-break-*.diff artifact when something invalidates the cache.

Codex leans on OpenAI’s server-side cache. Every Responses API request carries prompt_cache_key = thread_id. The CLI deliberately keeps the cached prefix stable: when the context window overflows mid-turn it remove_first_item() rather than the last — “preserve cache (prefix-based) and keep recent messages intact” (compact.rs:223–231).

Gemini has no explicit Gemini context-cache wiring in source. Caching is delegated to Gemini’s implicit context cache via stable history shape.

Trust & permission — three central novel ideas

All three layer rule-based gates with LLM-based judgment, but each has its own central novel mechanism.

Claude has the YOLO classifier: in auto mode, every tool call is run through an LLM-based classifier (YOLO_CLASSIFIER_TOOL_NAME = 'classify_result') that takes a stripped transcript — only user text plus assistant tool_use blocks, never assistant text, preventing the model from influencing its own classifier — and returns {thinking, shouldBlock, reason}. Bypass-permissions has a remotely-disablable killswitch.

Codex has Guardian: a second LLM session (codex-auto-review) that auto-grants or denies on-request approvals using a structured-output contract — fail-closed on timeout or malformed output, capped at MAX_CONSECUTIVE_GUARDIAN_DENIALS_PER_TURN = 3. The five-mode AskForApproval enum includes a uniquely fine-grained Granular(GranularApprovalConfig) variant with separate booleans for sandbox approval, exec-policy rules, skill approval, MCP elicitations.

Gemini has Conseca plus a sub-agent-sanitising MessageBus. Conseca is a per-prompt LLM-generated security policy enforced in-process. The MessageBus’s derive(subagentName) produces an untrusted child bus that scrubs forcedDecision, metadata, and rewrites subagent identity — so a sub-agent cannot impersonate its parent’s policy.

Sandboxing — the most divergent dimension

Claude delegates to an external package — @anthropic-ai/sandbox-runtime — and adds a CCR upstream-proxy: inside a managed-session container, the CLI reads a session token, sets PR_SET_DUMPABLE=0 to block ptrace of the heap, downloads a MITM CA cert, and starts a CONNECT-over-WebSocket relay wrapping bytes in UpstreamProxyChunk protobufs so every subprocess (curl, gh, python) goes through the org’s egress proxy.

Codex ships three per-platform sandbox backends picked by get_platform_sandbox: MacosSeatbelt, LinuxSeccomp (bubblewrap+seccomp via the codex-linux-sandbox helper binary), and WindowsRestrictedToken. macOS uses /usr/bin/sandbox-exec hardcoded against PATH injection with .sbpl policies modeled on Chromium’s renderer sandbox. A separate codex-process-hardening crate runs pre_main_hardening() via #[ctor] — disables core dumps, ptrace, strips LD_PRELOAD / DYLD_* before main executes.

Gemini ships six backends: docker, podman, sandbox-exec, runsc (Docker + gVisor), lxc, windows-native. The CLI re-execs itself inside the chosen sandbox; supports a project-level .gemini/sandbox.Dockerfile. Separate SandboxPolicyManager for shell-command-level filtering inside the sandbox.

State persistence — and the closest thing to event sourcing in the comparison

Claude persists session transcripts append-only at ~/.claude/projects/<projectHash>/<sessionId>.jsonl. On resume, conversationRecovery.ts rebuilds the chain with orphan-filtering and copy-forwards fileHistory plus plan snapshots. Compact boundaries are first-class persisted message types (SystemCompactBoundaryMessage, MicrocompactBoundaryMessage, RequestStartEvent, TombstoneMessage, ToolUseSummaryMessage) — so the log can replay non-trivial state including microcompact deletions. Sub-agent transcripts are stored as sidechains.

Codex does the cleanest job. Three-tier persistence: JSONL rollouts at ~/.codex/sessions/rollout-<rfc3339>-<uuid>.jsonl, a SQLite state DB (state_db.rs, sqlite_metrics.rs) for metadata indexes, and a separate rollout-trace crate for replay and reduction of conversations. RolloutItem discriminates ResponseItem / EventMsg / Compacted / TurnContext / SessionMeta. Two persistence policies (Limited vs Extended) tunable per session. The TUI directly resumes from any rollout via find_thread_path_by_id_str.

Gemini persists sessions to <projectTempDir>/chats/session-<timestamp>-<id>.jsonl. The unique mechanism is a separate shadow-Git checkpoint repo (author “Gemini CLI”) created per session for tool-call rollback — file-state snapshots captured as Git commits, allowing undo by checkout.

Where each one teaches the others

A short list of patterns worth borrowing, by tool of origin:

From Claude: compile-time PII type-guards in telemetry (AnalyticsMetadata_I_VERIFIED_THIS_IS_NOT_CODE_OR_FILEPATHS cast through typed-never), cache_reference by content identity, the adversarial verification sub-agent pattern.
From Codex: the RwLock concurrency model (read-shared, write-exclusive), the three-tier persistence design, the JWKS-verified AgentIdentity JWT for M2M sub-agents, the process-hardening ctor.
From Gemini: the MessageBus derive() that strips parent-only fields when forking to a sub-agent (genuinely capability-shaped), the directed-graph context pipeline with explicit invariant checks, the shadow-Git for filesystem rollback, ACP as a standard external-driving protocol.

What the comparison doesn’t answer

The descriptive half stops at “what is.” It doesn’t tell you which patterns to adopt or what an ideal design would look like if you started fresh tomorrow. That’s the synthesis — and the subject of Part 2 of this series.

Some recurring questions the data raises and Part 2 takes up: Should cache be a content-addressed Merkle DAG? Are LLM-based safety classifiers load-bearing or hedge? Is one microVM per tool call practical? What does a portable agent ↔ IDE protocol look like? Why does an event-sourced session log keep emerging as the right shape?

The brief catalogue you’ve just read is the raw material; the next one is the synthesis.