Inside OpenAI’s Codex Agent Loop How AI Is Learning to Build Software
AI visual of OpenAI Codex powering autonomous AI coding agents.
How OpenAI’s Codex CLI Turns Your Terminal Into an AI Coding Agent
Just imagine opening your terminal and typing a simple command like “Optimize this Python script for speed.” Within minutes, an AI agent analyzes your code, runs benchmarks, refactors functions, commits the changes to Git, and even opens a pull request all securely on your local machine.
No copy-pasting between IDEs, no context switching, just pure productivity. This isn’t science fiction. It’s OpenAI’s Codex CLI in action, revealed in their groundbreaking technical blog post, Unrolling the Codex Agent Loop.
Penned by Michael Bolin, OpenAI’s technical staff wizard with deep roots in developer tools (he built Kythe at Google), this January 22, 2026, post launches a series dissecting the agent’s guts. If you’re knee deep in AI agent development, software engineering automation, or just geeking out on large language model orchestration, buckle up. We’ll unpack the agent loop step by step, dive into performance hacks like prompt caching and context compaction, compare architectures, and show why Codex CLI’s open source foundation makes it a must try for every developer stack.
Launched April 2025 alongside o3 and o4 mini models, Codex CLI targets ChatGPT Plus, Pro, Business, Edu, and Enterprise users. Install via npm or Homebrew, and it integrates natively with your workflow. Fully Rust powered for terminal speed, it supports local models via Ollama or LM Studio too. Bolin stresses: The agent’s true output is your modified codebase, not chat fluff.
Core Architecture: The Heartbeat of AI Coding Agents
At its essence, the Codex agent loop is a state machine for AI driven software tasks. Bolin illustrates it with crystal clear diagrams: User prompt → Prompt assembly → Responses API call → Handle output (tool call or final message) → Loop or respond.
Unlike chatty copilots, Codex focuses on action. It uses OpenAI’s Responses API a streaming endpoint for structured tool calls over raw Chat Completions. Why? Tool parallelism, native caching, and ZDR compliance (stateless, no history storage).
Key loop phases:
Input Processing: Your message joins conversation history. Codex injects system prompts first (config.toml or model md files), then developer instructions, aggregated AGENTS.md directives (global/repo/folder scoped, max 32KiB), env details (pwd, shell), and tools manifest.
Inference: POST to /v1/responses (chatgpt.com or custom). Model samples tokens, streams SSE: text for display, json tool_calls for execution.
Tool Handling: Sandboxed shell (e.g., ls, vim, git), MCP servers for custom tools. Output appends as tool message; re prompt with prefix match for caching.
Termination: Assistant message sans tools ends turn. History persists for next.
This enables complex flows: “Implement user auth” → Inspect deps → Edit files → npm test → Fix fails → Commit.
Performance Mastery: Caching and Context Optimization
Without optimization, agent loops explode: N turns = O(N²) tokens. Codex flips this with prompt caching. Static prefixes (system instr, tools) cache perfectly; dynamic history prefixes match prior prompts. Cache hit? Reuse computation, linear scaling.
Bolin shares metrics: High hit rates from tool order consistency, no mid loop model swaps. ZDR? Stateless calls, encrypted compaction items.
Enter context compaction: Hit auto_compact_limit? Call /responses/compact. It summarizes history into a compact message + encrypted reasoning blob (type=compaction). Model “remembers” via this artifact. Evolved from CLI /compact command.
Git smarts add flair: status/diff/history in context; full repo preload for monorepos.
| Optimization Technique | How It Works | Benefits | Tradeoffs |
|---|---|---|---|
| Prompt Caching | Static prefix + history prefix match | Linear token cost; 90%+ hit rates | Requires consistent structure |
| Context Compaction | /compact endpoint summary + encrypted item | Reclaims 70-90% tokens | Minor reasoning loss (rare) |
| AGENTS.md Hierarchy | Global/repo/folder instructions | Contextual without bloat | 32KiB cap per level |
| Git Awareness | Preloads status/history | Repo scale handling | Initial load time |
Codex CLI’s crown jewel? Open source (github.com/openai/codex). Rust core, active PRs/issues detail every decision. Bolin links src like AgentLoop::run(). 128k+ workflows prove vitality.
Extensibility rocks:
MCP Servers: Remote tools (e.g., databases, APIs).
Sandbox: Containerized, no net, configurable perms via md templates.
Integrations: Datadog for metrics during tasks; Skyscanner JetBrains MCP.
Forks like open codex run local models. Community raves on Reddit/X: “Game changer for async coding.”
Real World Deployments and Developer Wins

Case studies abound
Kodiak: Autonomous refactors/tests for shipping features.
Datadog: Real time observability in agent runs fix infra with live data.
Superhuman: Non eng folks iterating UI logic.
Bolin’s LinkedIn: “Intentional links to code for builders.” More posts ahead: CLI arch, tools, sandboxing.
| Use Case | Tools Used | Outcomes | Source |
|---|---|---|---|
| Code Refactor | edit, test, git | 80% faster iterations | DevOps.com |
| DevOps Debug | shell, logs | Live fixes with metrics | Datadog Blog |
| Repo Audit | git status, grep | Architecture docs gen | PromptLayer |
| Auth Impl | npm, edit files | Secure flows in mins | OpenAI Blog |
npm i -g @openai/codexorbrew install codex.codex authwith API key.Create AGENTS.md: Repo rules like “Always add tests.”
Prompt:
codex "Migrate to TypeScript".Customize: ~/.codex/config.toml for models/endpoints.
Pro tips: Use /compact manually; monitor cache via logs; chain with VS Code extensions.
Future of Agentic Coding
OpenAI’s candor empowers: Stateless APIs, compaction patterns apply broadly. As models like o5 loom, agents will orchestrate teams. Training consistency (sandbox=prod) cuts flakiness.
Codex isn’t perfect hallucinations persist, needs human oversight but it’s the closest to reliable autonomy yet. Fork it, build on it, shape the future.
OpenAI Codex Agent Loop Deep Dive
OpenAI’s Michael Bolin unveils the Codex agent loop in a must-read technical blog post, detailing how the open source Codex CLI transforms software development with autonomous AI coding agents.
This comprehensive guide breaks down the ReAct style agent loop from prompt assembly and Responses API inference to sandboxed tool execution and iterative refinement powering tasks like code refactoring, debugging, and Git commits directly in your terminal.
Key highlights include prompt caching for linear performance scaling, context compaction to manage token limits, Git awareness for repo scale projects, and full extensibility via MCP servers and AGENTS.md configs.
Explore real world wins from Datadog DevOps automation to Superhuman PM coding, backed by comparison tables on optimizations and use cases. Fully Rust built and GitHub hosted (github.com/openai/codex), Codex CLI delivers 2 5x productivity gains for ChatGPT Pro users.
Perfect for AI agent developers, the post sets blueprints for stateless APIs and agentic workflows first in a series covering CLI architecture, tools, and sandboxing. Install via npm or Homebrew and start building today.
What’s your Codex story? Drop in comments. Follow for series updates.



