A working cheat sheet for engineers who already drive agentic CLIs (Claude Code, Aider, Cursor CLI…) and want the shortest path to being productive in OpenAI’s Codex CLI. Focused on what changes how you work — not an exhaustive flag dump.
Open-source Rust binary (openai/codex on GitHub). Sign in with your ChatGPT plan (Plus/Pro/Business/Enterprise) or an API key.
# npm
npm install -g @openai/codex
# macOS Homebrew
brew install --cask codex
cd your-repo
codex # interactive TUI in this repo
codex doctor # diagnostic report
codex update # check for updates
codex completion zsh # shell completions
Run from the repo root — the working directory is the workspace the sandbox protects, and where AGENTS.md is read from.
codex login # ChatGPT plan, or login --api-key
codex
> /init # scaffold AGENTS.md from the codebase
> /model # pick model + reasoning effort
> /permissions # tune what runs without asking
> /status # model, sandbox, token usage
Then ask a real question before asking for code: “give me an overview of this codebase”. Codex explores with its own tools — don’t paste files in; @-mention them if you must.
Billing: plan usage by default; /status shows limits. API-key auth bills per token.
Same genre, different philosophy: Codex leans on an OS-level sandbox (Seatbelt/Landlock) plus an approval policy, where Claude Code leans on per-tool permission rules. Codex defaults to more autonomy inside the workspace, less outside it.
| You know (Claude Code) | Codex equivalent | Notes |
|---|---|---|
| claude | codex | Interactive TUI in cwd. |
| claude -p "…" | codex exec "…" | Headless mode; --json for machine-readable events. |
| CLAUDE.md | AGENTS.md | Repo root + nested dirs + ~/.codex/AGENTS.md global. Same idea, open standard. |
| Shift+Tab permission modes | /permissions · --sandbox · -a | Read Only / Auto / Full Access presets; fine-grained via approval_policy + sandbox_mode in config.toml. |
| /clear | /new | Gotcha: Codex’s /clear only resets the visible UI. /new starts a fresh conversation. |
| /rewind · Esc Esc checkpoints | Esc (backtrack) · /fork | Esc steps back through the transcript to edit an earlier turn; /fork branches the thread. No file-state restore — lean on git. |
| claude -c / -r | codex resume --last / codex resume | Picker or by session ID; codex fork branches from an old one. |
| Tab thinking · “ultrathink” | model_reasoning_effort | minimal → low → medium → high → xhigh, set via /model or -c model_reasoning_effort=high. |
| ~/.claude/skills | ~/.codex/skills | Same SKILL.md format — skills are portable across both tools. |
| settings.json hooks | [[hooks.*]] in config.toml | Same lifecycle-hook concept, TOML syntax. Not yet Windows-compatible. |
| WebSearch tool | built-in (cached) · --search for live | Web search is on by default against a cached index. |
~40 built-ins; these are the ones that earn their keystrokes. Type / to fuzzy-search them all.
- /new: fresh conversation between unrelated tasks. The highest-leverage habit. (/clear only wipes the screen.)
- /compact: summarize history to free tokens; Codex also auto-compacts on long-horizon work.
- /resume: reopen a past thread (also codex resume --last from shell).
- /fork: branch the conversation to explore an alternative without losing the original.
- /side: ephemeral side conversation for a quick tangent; main thread stays clean.
- /status: model, sandbox, approvals, token usage at a glance.
- /init: scaffold AGENTS.md for the repo.
- /model: switch model and reasoning effort mid-session.
- /fast: toggle the Fast service tier (same model, faster tokens) when available.
- /permissions: adjust what runs without asking, in-session.
- /plan: demand an execution plan before any implementation.
- /goal: pin a persistent objective the agent keeps optimizing for.
- /mention / /ide: attach files / pull editor context into the next prompt.
- /review: dedicated reviewer over uncommitted changes, a base branch, or a specific commit. Run it before every push.
- /diff: git diff including untracked files, without leaving the TUI.
- /approve: retry something the auto-reviewer denied.
- /mcp: list configured MCP servers and tools.
- /skills: inspect local skills (~/.codex/skills).
- /plugins: installed + discoverable plugins (skills + MCP + apps bundles).
- /hooks: review lifecycle hooks.
- /apps: insert ChatGPT apps into a prompt as $app-slug.
- /memories: toggle memory injection/generation.
- /experimental: opt into feature flags (also codex features).
- /agent: switch between agent threads (multi-agent sessions).
- /ps: background terminals + recent output (dev servers, tails).
- /stop: kill all background terminal work.
- /statusline / /title: model/branch/tokens in the footer & terminal title.
- /theme: syntax-highlight theme; /keymap: remap any shortcut; /vim: vim composer.
- /copy: copy last response; /raw: raw scrollback for clean terminal selection.
- /personality: communication style; /debug-config: which config layer won; /feedback: file an issue with diagnostics.

Everything is remappable via /keymap (persisted under [tui.keymap]). Defaults below.
- @: fuzzy file search (@src/auth/jwt.ts) to attach files.
- $app-slug: mention a connected app (via /apps).
- Paste an image (or pass -i mock.png from the shell) for UI work.
- /quit: exit.

Two orthogonal dials: sandbox (what the OS lets Codex touch) and approval policy (when it asks you). The presets combine both.
| Preset | Behavior | Use when |
|---|---|---|
| Read Only (-s read-only) | Browse and answer; no edits, no commands with side effects. | Code spelunking, audits, “explain this” sessions. |
| Auto (default) | Reads, edits, and runs commands freely inside the workspace; asks before network access or touching anything outside it (sandbox_mode = "workspace-write" + approval_policy = "on-request"). | The 90% case. The sandbox is the safety net. |
| Full Access (-s danger-full-access + -a never) | No sandbox, no prompts, machine-wide. | Containers/CI only. The flag name is honest. |
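
Read as data, each preset is a sandbox mode plus an optional approval policy. The Python below is a minimal sketch that assumes only the -s and -a values shown in the table. The Preset type, the PRESETS table, and codex_argv are hypothetical helper names, not part of Codex, and Read Only leaves -a unset because the table gives only its sandbox mode.

```python
from typing import NamedTuple, Optional


class Preset(NamedTuple):
    sandbox: str             # value passed to -s (sandbox_mode)
    approval: Optional[str]  # value passed to -a (approval_policy); None = not set


# Hypothetical lookup table; the values are copied from the presets table.
PRESETS = {
    "read-only": Preset("read-only", None),
    "auto": Preset("workspace-write", "on-request"),
    "full-access": Preset("danger-full-access", "never"),
}


def codex_argv(preset_name: str) -> list[str]:
    """Build the argument list for launching codex with a named preset."""
    preset = PRESETS[preset_name]
    argv = ["codex", "-s", preset.sandbox]
    if preset.approval is not None:
        argv += ["-a", preset.approval]
    return argv


if __name__ == "__main__":
    for name in PRESETS:
        print(name, "->", " ".join(codex_argv(name)))
```

A wrapper script could hand the result to subprocess.run; keeping the mapping in one place makes it harder to launch Full Access by accident.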

Adjust with /permissions in-session, or in ~/.codex/config.toml: approval_policy is "untrusted", "on-request", or "never"; sandbox_mode is "read-only", "workspace-write", or "danger-full-access"; and set [sandbox_workspace_write] network_access = true if your tests need the network. Save combos as profiles (codex --profile ci).

Codex is also a Unix tool: pipe data in, get JSON events out, wire it into CI. codex exec (alias codex e) is the workhorse.
codex "fix the flaky retry test" # prompt as arg
codex resume --last # continue last session
codex resume # session picker
codex fork # branch an old session
codex -m gpt-5.4 # model override
codex -c model_reasoning_effort=high # any config key inline
codex -s workspace-write -a on-request
codex --profile ci # named config layer
codex -i mock.png "build this screen"
codex --search "…" # live web search
codex sandbox -- npm test # run any cmd under the sandbox
# one-shot, non-interactive
codex exec "update the changelog for the next release"
# pipe anything in
cat error.log | codex exec "root-cause this stack trace"
# machine-readable event stream
codex exec --json "lint my commit messages"
# CI: full autonomy inside a container
codex exec -s danger-full-access -a never \
"fix the failing tests and commit"
Building something bigger? The Codex SDK (TypeScript/Python) and codex app-server expose the same agent programmatically; codex cloud manages cloud tasks from the terminal, and codex remote-control runs a daemon you can drive remotely.
Four complementary layers: AGENTS.md shapes behavior, memories carry context forward between sessions, skills package repeatable processes, MCP connects external systems.
~/.codex/AGENTS.md # you, everywhere (style, tools)
./AGENTS.md # repo root — commit it
./packages/api/AGENTS.md # nested — closest file wins
What belongs there: build/test commands, conventions the model gets wrong, repo etiquette (“never commit to main”), gotchas. What doesn’t: anything derivable from the code. It’s loaded every session — bloat costs context and attention.
It’s an open standard (agents.md) — the same file serves Codex, Claude Code, Cursor, and friends.
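
To see which AGENTS.md files apply to a given directory, the lookup order above can be sketched in Python. This is an illustration under assumptions, not Codex's actual loader: agents_chain is a hypothetical helper, and it only lists the files from most general to closest. How Codex combines their contents is not covered here.

```python
from pathlib import Path


def agents_chain(repo_root: Path, cwd: Path, global_dir: Path) -> list[Path]:
    """Return existing AGENTS.md files, most general first, closest to cwd last."""
    repo_root, cwd = repo_root.resolve(), cwd.resolve()
    # Directories from the repo root down to cwd, outermost first.
    dirs = [repo_root]
    for part in cwd.relative_to(repo_root).parts:
        dirs.append(dirs[-1] / part)
    candidates = [global_dir / "AGENTS.md"] + [d / "AGENTS.md" for d in dirs]
    return [p for p in candidates if p.is_file()]


if __name__ == "__main__":
    # Treats the current directory as the repo root for the demo.
    for path in agents_chain(Path.cwd(), Path.cwd(), Path.home() / ".codex"):
        print(path)
```

The last entry in the returned list is the file closest to the working directory, which is the one that wins under the "closest file wins" rule.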
Memories: toggle with /memories; enable via [features] memories = true.

- Use /new between tasks. Stale context degrades quality more than anything else.
- /compact at a natural breakpoint beats an automatic one mid-thought.
- /side and /fork keep tangents and experiments out of the main thread.
- /status shows token usage when things feel sluggish; prune unused MCP servers, because tool schemas are a silent context hog.

Same field-tested patterns as any serious agent CLI: give the model a target it can verify itself against, such as tests, a screenshot, or a plan.
For anything non-trivial: /plan (or start in -s read-only) and have Codex research the change. Read the plan, push back, approve. Implement in Auto mode, then /review against the base branch before committing.
Skipping the plan step is the #1 cause of “it confidently did the wrong thing.”
Ask for failing tests first, from the spec — explicitly not implementation. Commit them. Then: “make these pass; don’t modify the tests.” The red/green loop gives Codex an objective target it will grind against autonomously.
For UI: paste a mock (or codex -i mock.png), give it eyes — a Playwright/Chrome DevTools MCP server for screenshots — and say “iterate until it matches.” Agents with visual feedback converge in 2–3 rounds; agents without don’t.
git worktree add ../proj-auth feature/auth
cd ../proj-auth && codex
One Codex per worktree = zero collisions. In-session, /agent switches between agent threads, and gpt-5.4-mini makes cheap subagents.
Start locally, delegate the long grind: codex cloud manages Codex Cloud tasks from the terminal, and the /apps→Desktop handoff moves a CLI thread into the Codex app. @codex on a GitHub PR or issue triggers cloud work and review there.
Mechanical sweeps (lint, codemods, dependency bumps): run inside a container with -s danger-full-access -a never and let it grind. Outside a container, prefer Auto mode — workspace-write with no network is already “safe YOLO.” Checkpoint with git; there’s no /rewind, so commits are your undo.
Two plausible designs? /fork and try both, then keep the winner. Quick tangent (“what does this env var do?”) — /side so the main thread’s context stays on task. Wrong direction mid-turn — Esc, edit the earlier message, resend.
/review runs a dedicated reviewer model over your diff — uncommitted work, a branch delta, or one commit — and it doesn’t modify files, so it’s safe to run constantly. Pair with /diff for your own pass. You still own the merge.
codex exec --json in a pipeline step covers triage, changelog generation, failure analysis. Use a ci profile pinning model, sandbox, and approval policy so behavior is reproducible. GitHub-side, @codex mentions and Codex code review handle PR flow.
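
A pipeline step usually wants a summary of the event stream rather than the raw output. The sketch below assumes that codex exec --json writes one JSON object per line and that each object carries a "type" field; both are assumptions about the stream format, and the event names in the demo are hypothetical placeholders. count_event_types is an illustrative helper, not part of Codex.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Tally events by their "type" field.

    Blank lines are skipped; lines that are not JSON objects are
    counted under "<unparsed>".
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            counts["<unparsed>"] += 1
            continue
        if isinstance(event, dict):
            counts[str(event.get("type", "<no type>"))] += 1
        else:
            counts["<unparsed>"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines standing in for a real event stream.
    sample = [
        '{"type": "turn.started"}',
        '{"type": "item.completed"}',
        '{"type": "item.completed"}',
    ]
    for event_type, n in count_event_types(sample).most_common():
        print(f"{n:5d}  {event_type}")
```

In a real step, passing sys.stdin instead of the sample list lets the script sit at the end of a pipe fed by codex exec --json.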
A skill is a folder with a SKILL.md (frontmatter + instructions, optional scripts/resources) that loads only when relevant. Plugins (first-class since v0.117, Mar 2026) bundle skills, MCP servers, and app connectors into one versioned, installable unit — skills are the authoring primitive, plugins the distribution primitive.
~/.codex/skills/deploy-checks/SKILL.md
# SKILL.md
---
name: deploy-checks
description: Pre-deploy validation for our k8s services.
  Use before any production deploy.
---
1. Run ./scripts/preflight.sh
2. Verify migrations are reversible…
codex plugin # install / manage plugins
/plugins # browse discoverable plugins in-session
/skills # inspect what's loaded
The description is the trigger — write it as a “when to use” sentence. Restart Codex after installing or editing a skill so metadata reloads.
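
Because the description drives when a skill loads, it is worth checking that a SKILL.md has one before installing it. The sketch below is a deliberately small reader, assuming the frontmatter is a leading block between two --- lines made of key: value pairs with optional continuation lines. It is not a YAML parser, and read_frontmatter and missing_fields are hypothetical helper names.

```python
import re

KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*):\s*(.*)$")


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse the leading '---' block into a flat dict of strings.

    Simplification: handles only `key: value` lines plus continuation
    lines, which are folded into the previous value.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    last_key = None
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        match = KEY_LINE.match(line)
        if match:
            last_key = match.group(1)
            fields[last_key] = match.group(2).strip()
        elif last_key and line.strip():
            fields[last_key] = (fields[last_key] + " " + line.strip()).strip()
    return {}  # no closing '---': treat the frontmatter as missing


def missing_fields(text: str) -> list[str]:
    """Return which of name/description are absent or empty."""
    fields = read_frontmatter(text)
    return [key for key in ("name", "description") if not fields.get(key)]


if __name__ == "__main__":
    skill = (
        "---\n"
        "name: deploy-checks\n"
        "description: Pre-deploy validation for our k8s services.\n"
        "  Use before any production deploy.\n"
        "---\n"
        "1. Run ./scripts/preflight.sh\n"
    )
    print(read_frontmatter(skill))
    print("missing:", missing_fields(skill))
```

For anything beyond flat keys, a real YAML library is the safer choice; this version only aims to catch a skill with no name or no trigger sentence.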
- ComposioHQ/awesome-codex-skills: the curated community index; the fastest way to see what exists before writing your own.
- anthropics/skills document suite (pdf, docx, xlsx, pptx): the SKILL.md format is a cross-tool standard, so the best Claude-ecosystem skills work in Codex too. Real Office files via scripts instead of hallucinated markup.

Skills ship executable scripts that run with your permissions, so provenance is the whole game. Audit GitHub-sourced skills before installing: read the SKILL.md and every script it calls.
MCP servers give Codex tools beyond the filesystem; ChatGPT apps (mentioned as $app-slug) bring connected services straight into prompts. Web search needs neither — it’s built in.
# CLI
codex mcp add github -- npx -y @modelcontextprotocol/server-github
codex mcp list
# or config.toml — with per-tool control
[mcp_servers.playwright]
command = "npx"
args = ["-y", "@playwright/mcp@latest"]
[mcp_servers.github]
command = "npx"
args = ["-y", "@modelcontextprotocol/server-github"]
disabled_tools = ["delete_repository"]
default_tools_approval_mode = "auto"
/mcp in-session lists servers and tools. enabled_tools/disabled_tools trim context cost and risk per server.
- GitHub server (gh via shell is often enough).
- Apps (/apps): Figma, Linear, and other ChatGPT app connectors as $mentions, no MCP config needed.

Every server’s tool schemas consume context. Connect what you use; prune when /status shows bloat.
One TOML file rules everything. Precedence: CLI flags → profile → project .codex/config.toml → user ~/.codex/config.toml → system /etc/codex/config.toml. /debug-config shows which layer won.
# ~/.codex/config.toml
[[hooks.PreToolUse]]
matcher = { tool_name = "shell" }
[[hooks.PreToolUse.hooks]]
command = "./scripts/block-prod-db.sh"
[[hooks.PostCompact]]
[[hooks.PostCompact.hooks]]
command = "notify-send 'History compacted'"
Deterministic shell commands at lifecycle points — guarantees, not suggestions. Classic uses: auto-format after edits, block dangerous commands, notify on long-running turns. /hooks to inspect. Caveat: not yet Windows-compatible.
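
As an illustration of the "block dangerous commands" use, here is a sketch of the decision a guard script might make. It covers only the yes/no logic: how Codex passes the pending command to a hook, and how a hook signals a veto, are not described here and are left out. The host name, the client list, and should_block are all hypothetical.

```python
import shlex

# Hypothetical deny list; replace with the hosts and clients that matter to you.
PROTECTED_HOSTS = {"prod-db.internal"}
DB_CLIENTS = {"psql", "mysql", "pg_dump", "mongosh"}


def should_block(command: str) -> bool:
    """True if the command runs a database client against a protected host."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: refuse rather than guess
    # Compare on the basename so /usr/bin/psql is treated like psql.
    uses_client = any(tok.rsplit("/", 1)[-1] in DB_CLIENTS for tok in tokens)
    names_host = any(host in tok for tok in tokens for host in PROTECTED_HOSTS)
    return uses_client and names_host


if __name__ == "__main__":
    for cmd in ("psql -h prod-db.internal -c 'select 1'", "psql -h localhost"):
        print(f"{cmd!r}: {'block' if should_block(cmd) else 'allow'}")
```

The check is intentionally conservative: a client name anywhere in the command plus a protected host is enough to block, which trades a few false positives for fewer misses.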
model = "gpt-5.5"
model_reasoning_effort = "medium" # minimal…xhigh
approval_policy = "on-request" # untrusted|on-request|never
sandbox_mode = "workspace-write"
web_search = "cached" # disabled|cached|live
file_opener = "vscode"
notify = ["terminal-notifier", "-title", "Codex", "-message"]
[sandbox_workspace_write]
network_access = false
[features]
memories = true
[tui]
notifications = true
theme = "github-light"
Profiles: ~/.codex/ci.config.toml overlays the base when you pass --profile ci — pin a model/sandbox/approval combo per context.
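
The layering in this section (CLI flags over profile over project over user over system) amounts to "last writer wins" across an ordered list of layers. A minimal sketch in Python, assuming flat top-level keys only; how Codex merges nested tables is not modelled, and LAYERS and resolve are hypothetical names. It also records which layer supplied each value, in the spirit of /debug-config.

```python
from typing import Any

# Lowest to highest precedence, mirroring the order described in this section.
LAYERS = ["system", "user", "project", "profile", "cli"]


def resolve(configs: dict[str, dict[str, Any]]) -> dict[str, tuple[Any, str]]:
    """Map each key to (winning value, name of the layer that supplied it)."""
    resolved: dict[str, tuple[Any, str]] = {}
    for layer in LAYERS:  # later layers overwrite earlier ones
        for key, value in configs.get(layer, {}).items():
            resolved[key] = (value, layer)
    return resolved


if __name__ == "__main__":
    demo = {
        "user": {"model": "gpt-5.5", "sandbox_mode": "workspace-write"},
        "profile": {"sandbox_mode": "danger-full-access", "approval_policy": "never"},
        "cli": {"model_reasoning_effort": "high"},
    }
    for key, (value, layer) in sorted(resolve(demo).items()):
        print(f"{key} = {value!r}  (from {layer})")
```

In the demo, the profile overrides the user-level sandbox_mode while the user-level model survives, because no higher layer sets it.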
| Model | ID | Use for |
|---|---|---|
| GPT-5.5 | gpt-5.5 | Current flagship and recommended default — complex coding, computer use, research workflows. |
| GPT-5.4 | gpt-5.4 | Strong agentic workhorse; slightly cheaper/faster than 5.5. |
| GPT-5.4 mini | gpt-5.4-mini | Fast + cheap: responsive edits, subagents, headless pipelines. |
| Codex Spark | gpt-5.3-codex-spark | Near-instant real-time iteration (research preview, Pro plans). |
Reasoning effort: minimal → low → medium → high → xhigh, set per-session via /model or per-run with -c model_reasoning_effort=high. More effort is slower and costlier: keep it at medium for routine edits, crank it for architecture. /fast toggles the Fast service tier where available. The older gpt-5.2-codex / gpt-5.3-codex models are deprecated for ChatGPT sign-in, so don’t pin them in configs.

- Use /plan for anything non-trivial. Cheap insurance against confident nonsense.
- Run /new between tasks, and remember /clear doesn’t do what you think.
- Tune network_access, writable_roots, and profiles. Approval fatigue means your config is wrong.
- xhigh for architecture, minimal/mini/Spark for chores.
- Treat /review like it’s a teammate’s PR, then read the diff yourself. You still own the merge.