OpenAI Codex Agent Loop How AI Coding Agents Work

How OpenAI’s Codex CLI Turns Your Terminal Into an AI Coding Agent

Just imagine opening your terminal and typing a simple command like “Optimize this Python script for speed.” Within minutes, an AI agent analyzes your code, runs benchmarks, refactors functions, commits the changes to Git, and even opens a pull request all securely on your local machine.
No copy-pasting between IDEs, no context switching, just pure productivity. This isn’t science fiction. It’s OpenAI’s Codex CLI in action, revealed in their groundbreaking technical blog post, Unrolling the Codex Agent Loop.

Penned by Michael Bolin, OpenAI’s technical staff wizard with deep roots in developer tools (he built Kythe at Google), this January 22, 2026, post launches a series dissecting the agent’s guts. If you’re knee deep in AI agent development, software engineering automation, or just geeking out on large language model orchestration, buckle up. We’ll unpack the agent loop step by step, dive into performance hacks like prompt caching and context compaction, compare architectures, and show why Codex CLI’s open source foundation makes it a must try for every developer stack.

Launched April 2025 alongside o3 and o4 mini models, Codex CLI targets ChatGPT Plus, Pro, Business, Edu, and Enterprise users. Install via npm or Homebrew, and it integrates natively with your workflow. Fully Rust powered for terminal speed, it supports local models via Ollama or LM Studio too. Bolin stresses: The agent’s true output is your modified codebase, not chat fluff.

Core Architecture: The Heartbeat of AI Coding Agents

At its essence, the Codex agent loop is a state machine for AI driven software tasks. Bolin illustrates it with crystal clear diagrams: User prompt → Prompt assembly → Responses API call → Handle output (tool call or final message) → Loop or respond.

Unlike chatty copilots, Codex focuses on action. It uses OpenAI’s Responses API a streaming endpoint for structured tool calls over raw Chat Completions. Why? Tool parallelism, native caching, and ZDR compliance (stateless, no history storage).

Key loop phases:

Input Processing: Your message joins conversation history. Codex injects system prompts first (config.toml or model md files), then developer instructions, aggregated AGENTS.md directives (global/repo/folder scoped, max 32KiB), env details (pwd, shell), and tools manifest.
Inference: POST to /v1/responses (chatgpt.com or custom). Model samples tokens, streams SSE: text for display, json tool_calls for execution.
Tool Handling: Sandboxed shell (e.g., ls, vim, git), MCP servers for custom tools. Output appends as tool message; re prompt with prefix match for caching.
Termination: Assistant message sans tools ends turn. History persists for next.

This enables complex flows: “Implement user auth” → Inspect deps → Edit files → npm test → Fix fails → Commit.

Performance Mastery: Caching and Context Optimization

Without optimization, agent loops explode: N turns = O(N²) tokens. Codex flips this with prompt caching. Static prefixes (system instr, tools) cache perfectly; dynamic history prefixes match prior prompts. Cache hit? Reuse computation, linear scaling.

Bolin shares metrics: High hit rates from tool order consistency, no mid loop model swaps. ZDR? Stateless calls, encrypted compaction items.

Enter context compaction: Hit auto_compact_limit? Call /responses/compact. It summarizes history into a compact message + encrypted reasoning blob (type=compaction). Model “remembers” via this artifact. Evolved from CLI /compact command.

Git smarts add flair: status/diff/history in context; full repo preload for monorepos.

Optimization Technique	How It Works	Benefits	Tradeoffs
Prompt Caching	Static prefix + history prefix match	Linear token cost; 90%+ hit rates	Requires consistent structure
Context Compaction	/compact endpoint summary + encrypted item	Reclaims 70-90% tokens	Minor reasoning loss (rare)
AGENTS.md Hierarchy	Global/repo/folder instructions	Contextual without bloat	32KiB cap per level
Git Awareness	Preloads status/history	Repo scale handling	Initial load time

Open Source Ecosystem: Customize and Extend Freely

Codex CLI’s crown jewel? Open source (github.com/openai/codex). Rust core, active PRs/issues detail every decision. Bolin links src like AgentLoop::run(). 128k+ workflows prove vitality.

Extensibility rocks:

MCP Servers: Remote tools (e.g., databases, APIs).
Sandbox: Containerized, no net, configurable perms via md templates.
Integrations: Datadog for metrics during tasks; Skyscanner JetBrains MCP.

Forks like open codex run local models. Community raves on Reddit/X: “Game changer for async coding.”

Real World Deployments and Developer Wins

Case studies abound

Kodiak: Autonomous refactors/tests for shipping features.
Datadog: Real time observability in agent runs fix infra with live data.
Superhuman: Non eng folks iterating UI logic.

Bolin’s LinkedIn: “Intentional links to code for builders.” More posts ahead: CLI arch, tools, sandboxing.

Use Case	Tools Used	Outcomes	Source
Code Refactor	edit, test, git	80% faster iterations	DevOps.com
DevOps Debug	shell, logs	Live fixes with metrics	Datadog Blog
Repo Audit	git status, grep	Architecture docs gen	PromptLayer
Auth Impl	npm, edit files	Secure flows in mins	OpenAI Blog

Building Your First Codex Workflow

npm i -g @openai/codex or brew install codex.
codex auth with API key.
Create AGENTS.md: Repo rules like “Always add tests.”
Prompt: codex "Migrate to TypeScript".
Customize: ~/.codex/config.toml for models/endpoints.

Pro tips: Use /compact manually; monitor cache via logs; chain with VS Code extensions.

Future of Agentic Coding

OpenAI’s candor empowers: Stateless APIs, compaction patterns apply broadly. As models like o5 loom, agents will orchestrate teams. Training consistency (sandbox=prod) cuts flakiness.

Codex isn’t perfect hallucinations persist, needs human oversight but it’s the closest to reliable autonomy yet. Fork it, build on it, shape the future.

OpenAI Codex Agent Loop Deep Dive

OpenAI’s Michael Bolin unveils the Codex agent loop in a must-read technical blog post, detailing how the open source Codex CLI transforms software development with autonomous AI coding agents.

This comprehensive guide breaks down the ReAct style agent loop from prompt assembly and Responses API inference to sandboxed tool execution and iterative refinement powering tasks like code refactoring, debugging, and Git commits directly in your terminal.

Key highlights include prompt caching for linear performance scaling, context compaction to manage token limits, Git awareness for repo scale projects, and full extensibility via MCP servers and AGENTS.md configs.

Explore real world wins from Datadog DevOps automation to Superhuman PM coding, backed by comparison tables on optimizations and use cases. Fully Rust built and GitHub hosted (github.com/openai/codex), Codex CLI delivers 2 5x productivity gains for ChatGPT Pro users.

Perfect for AI agent developers, the post sets blueprints for stateless APIs and agentic workflows first in a series covering CLI architecture, tools, and sandboxing. Install via npm or Homebrew and start building today.

What’s your Codex story? Drop in comments. Follow for series updates.