The Architecture of Agentic Engineering
Project structuring, context isolation, and parallel workflows in the Anthropic development model.
TL;DR: Agentic coding changes the developer's role from "writing every line of code" to orchestrating autonomous engineering loops. The strongest workflows are not based on one perfect prompt; they are based on architecture. Keep the main agent focused on core goals, delegate implementation to sub-agents with isolated context, use branching for experiments, use Plan Mode before implementation, use CLAUDE.md and AGENTS.md for durable project memory, use Git worktrees for parallel agents, and treat the developer as a systems designer, reviewer, and orchestrator.
Agentic engineering works best when code, context, memory, and execution environments are deliberately separated.
Why Agentic Engineering Is Different
Traditional IDEs help developers write code faster. Autocomplete tools suggest the next line, complete a function, or generate small snippets inside an existing workflow. They operate close to the surface of programming.
Agentic coding environments are different. Tools like Claude Code are designed to read across a whole project, understand file structure and dependencies, plan multi-file changes, edit code directly, run commands, execute tests, debug failures, iterate through implementation loops, and create commits or pull requests.
This changes the software development workflow from:
Human writes code → tool assists locally.
To:
Human defines intent → agent explores, plans, implements, verifies, and reports back.
That shift creates a new engineering problem. The challenge is no longer only "how do we prompt the model?" It becomes: how do we structure the project, context, tools, memory, and environment so that agents can work reliably?
The Core Idea: Core-and-Branch Engineering
A useful way to understand the Anthropic-style workflow is through a core-and-branch model.
The "core" contains the high-level goal, the architectural direction, the project constraints, the implementation roadmap, and the final decision-making context.
The "branches" contain local experiments, implementation attempts, error logs, failed fixes, file-specific debugging, temporary reasoning, and disposable exploration.
The main agent should not absorb every messy detail from every failed implementation attempt. If it does, the context window becomes noisy. As the context window fills with irrelevant logs, repeated failures, half-correct assumptions, and stale reasoning, the model's performance can degrade.
The core-and-branch model avoids this by isolating work. The main session stays clean. Sub-agents handle messy tasks. Failed paths can be discarded. Successful outputs are summarized and merged back into the core.
The Origin of Claude Code as an Internal Tool
Claude Code reportedly began less as a polished flagship product and more as an internal experiment. Boris Cherny, Head of Claude Code, built the early prototype to test how far Anthropic's own APIs could be pushed for software engineering tasks.
The tool spread internally because it solved an immediate problem: it helped people move faster inside real codebases. Its adoption expanded beyond traditional engineering teams into adjacent functions such as data science, product management, design, operations, and technical research.
That internal usage gave Anthropic a live testing environment for agentic workflows. Instead of designing the product around abstract AI demos, the team could observe how people actually used agents inside complex work.
Development Pattern
| Phase | Focus | Main Users |
|---|---|---|
| Experimental prototype | API stress testing and internal utility | Boris Cherny and early engineering users |
| Internal adoption | Workflow automation and codebase navigation | Anthropic employees |
| Cross-functional expansion | Broader technical assistance | Designers, PMs, data scientists |
| Enterprise scaling | Large migrations and refactors | External engineering teams |
The key product principles that emerged were radical simplicity and deep extensibility. Claude Code's terminal-first interface forces the product to work with the existing developer environment rather than replacing it.
That means the agent can use familiar primitives: grep, rg, find, cat, sed, awk, npm test, git, shell scripts, linters, formatters, and project-specific CLIs.
In this model, Claude Code behaves less like a closed IDE and more like a Unix-style engineering utility.
Context Isolation: The Most Important Architectural Pattern
The main technical constraint in agentic coding is not only model intelligence. It is context quality.
When agents work on complex tasks, they generate noise: long stack traces, failed test outputs, repeated build errors, large file reads, incorrect hypotheses, half-completed implementation paths, and dead-end debugging attempts. All of this consumes context.
The larger problem is that failed reasoning can linger. If the main agent keeps every bad attempt in memory, the next decision may be influenced by irrelevant or incorrect assumptions. This is why Anthropic-style workflows emphasize uncorrelated context windows: separate sessions that do not share each other's history, so a mistake made in one cannot bias another.
Main Agent vs Sub-Agent
| Agent Type | Responsibility | Context Should Contain |
|---|---|---|
| Main agent | Owns architecture, goal, constraints, roadmap | Clean summaries, final decisions, task status |
| Sub-agent | Handles local implementation or investigation | Logs, local failures, file-specific details |
| Reviewer agent | Audits or challenges output | Risk analysis, false positives, verification notes |
| Synthesizer agent | Merges results back into the main plan | Summaries, completed steps, unresolved blockers |
The main agent should receive only the useful output: what changed, what passed, what failed, what remains blocked, and what decision is needed. It should not absorb every intermediate failure.
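As a rough illustration of that hand-off, the sketch below models what a sub-agent might return to the main session. The `BranchReport` shape and the `summarize_for_core` helper are hypothetical names invented for this example, not part of any Claude Code API. The point is only that the raw log never crosses the boundary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BranchReport:
    """What a sub-agent hands back to the main agent (hypothetical shape)."""
    task: str
    changed_files: list[str] = field(default_factory=list)
    tests_passed: bool = False
    blockers: list[str] = field(default_factory=list)
    decision_needed: Optional[str] = None


def summarize_for_core(report: BranchReport, raw_log: str) -> str:
    """Build the short summary the main agent sees.

    raw_log is accepted only to make the design explicit: it is deliberately
    dropped, so stack traces and failed attempts stay inside the branch.
    """
    lines = [
        f"task: {report.task}",
        f"changed: {', '.join(report.changed_files) or 'nothing'}",
        f"tests: {'passed' if report.tests_passed else 'failed'}",
    ]
    if report.blockers:
        lines.append(f"blocked on: {'; '.join(report.blockers)}")
    if report.decision_needed:
        lines.append(f"decision needed: {report.decision_needed}")
    return "\n".join(lines)
```

A summary built this way stays a few lines long regardless of how many thousands of lines the branch produced while working.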
Branching as a Failure-Management Strategy
In traditional software engineering, Git branches let developers explore changes without affecting the main branch. In agentic engineering, branching applies at two levels.
Code branching includes Git branches, worktrees, isolated file states, and pull requests. Context branching includes isolated sessions, sub-agents, forked reasoning paths, and disposable experiments.
This distinction matters. A bad code branch can be deleted. A bad reasoning branch should also be disposable. That is the deeper architectural principle.
If an agent tries an approach and fails, the workflow should allow that failure to be abandoned cleanly. The developer or main agent can then try a different approach without dragging the failed context forward.
This is especially useful for tasks like legacy migrations, large refactors, security audits, test suite repairs, API changes, dependency upgrades, and multi-module redesigns. The branch is allowed to fail. The core remains stable.
Map-Reduce for Large Code Migrations
One of the clearest examples of agentic branching is the map-reduce pattern for large migrations. Imagine a codebase with thousands of files that needs to move from one testing framework to another. A single agent handling the entire task in one context window would quickly become overloaded.
A better workflow looks like this: the main agent explores the project, creates a migration plan, generates a task list, and assigns independent files or directories to sub-agents. Each sub-agent works locally and reports results. A synthesizer merges the successful work. Failures are retried, rerouted, or escalated to a human.
| Component | Role | Context Load |
|---|---|---|
| Main orchestrator | Defines the migration plan and global rules | Low |
| Sub-agent | Converts one file, module, or directory | High |
| Test runner | Verifies each local change | Medium |
| Synthesizer | Summarizes completed work and unresolved failures | Medium |
| Human reviewer | Resolves ambiguous architectural decisions | Low but critical |
This prevents a single edge case from blocking the whole migration. If one file has unusual dependencies, only that branch gets stuck. Other sub-agents continue working.
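The map-reduce shape described above can be sketched in a few lines. This is a minimal illustration, assuming each sub-agent is represented by a plain `convert` callable supplied by the caller. `TaskResult` and `run_migration` are hypothetical names, and a real orchestrator would launch agent sessions rather than call a function.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskResult:
    path: str
    ok: bool
    note: str = ""


def run_migration(
    paths: list[str],
    convert: Callable[[str], TaskResult],
    max_workers: int = 4,
) -> dict[str, list[TaskResult]]:
    """Map: one isolated conversion per path. Reduce: group the outcomes."""

    def safe_convert(path: str) -> TaskResult:
        # A failure in one branch is recorded rather than raised, so a single
        # unusual file cannot block the rest of the migration.
        try:
            return convert(path)
        except Exception as exc:
            return TaskResult(path=path, ok=False, note=f"error: {exc}")

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input paths.
        results = list(pool.map(safe_convert, paths))

    return {
        "merged": [r for r in results if r.ok],
        "escalate": [r for r in results if not r.ok],
    }
```

The `escalate` list is what the synthesizer would retry, reroute, or pass to a human reviewer.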
The EPIC Workflow: Explore, Plan, Implement, Commit
A strong agentic workflow needs phase separation. The most practical structure is: Explore, Plan, Implement, Commit.
Phase 1: Explore
The first phase should be read-only. The agent searches the codebase and builds a model of the system before making changes.
During exploration, the agent should identify existing file structure, relevant modules, dependency paths, existing conventions, similar implementations, reusable utilities, test patterns, build commands, linting rules, and known architectural constraints.
The goal is to avoid premature implementation. A good exploration phase answers: how does this codebase already solve this type of problem? That question matters because agentic tools often fail when they invent new patterns instead of following existing ones.
Phase 2: Plan
Plan Mode is the safety layer. Before editing files, the agent proposes a structured implementation plan.
A good plan should include target files, files that should not be modified, step-by-step implementation order, expected tests, possible edge cases, dependencies, rollback strategy, risks, and open questions.
The human should challenge the plan before implementation. Useful prompts at this stage include:
What existing patterns are you following?
Which files will you modify and why?
What are the main failure modes?
What should remain unchanged?
How will you verify the implementation?
Which tests are most relevant?
The goal is not to micromanage the agent. The goal is to align the agent's execution path with the project's architecture.
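Some of those review questions can be checked mechanically before a human reads the plan. The sketch below assumes a plan is captured as a small data structure. The `Plan` fields and the `review_plan` helper are illustrative assumptions, not a format that Plan Mode itself produces.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Plan:
    target_files: list[str]
    protected_globs: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)
    rollback: str = ""


def review_plan(plan: Plan) -> list[str]:
    """Return objections; an empty list means nothing was flagged."""
    problems = []
    for path in plan.target_files:
        for pattern in plan.protected_globs:
            # fnmatch's "*" also matches "/" so "migrations/*" covers subpaths.
            if fnmatch(path, pattern):
                problems.append(f"{path} matches protected pattern {pattern}")
    if not plan.tests:
        problems.append("no verification tests listed")
    if not plan.rollback:
        problems.append("no rollback strategy")
    return problems
```

A check like this does not replace human judgment about architecture; it only catches plans that are missing the basics.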
Phase 3: Implement
Once the plan is approved, the agent starts making changes. For small tasks, this can happen in the main session. For large tasks, implementation should be delegated to sub-agents or isolated branches.
During implementation, the agent should make scoped changes, run relevant tests, fix failures, keep a clear task checklist, avoid unrelated refactors, report deviations from the plan, and summarize completed work.
Auto-accept workflows can be useful here, but only when the plan is narrow and the verification loop is strong. The more freedom the agent has, the more important automated checks become.
Phase 4: Commit
The final phase packages the work. The agent should produce a clean diff, a concise summary, a commit message, test results, known limitations, follow-up tasks, and an optional pull request description.
A good final agent report should answer: what changed, why did it change, how was it tested, and what should the reviewer pay attention to?
Custom slash commands can automate this stage. A command like /commit-push-pr could run git status, run tests, format code, generate a commit message, push the branch, open a pull request, and draft a PR summary.
The point is not the command itself. The point is that repeatable development rituals should become reusable agent workflows.
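One way to picture such a ritual is as an ordered list of commands that stops at the first failure. The sketch below is an assumption-heavy illustration: it presumes an npm project with a `format` script, a remote named `origin`, and the GitHub CLI (`gh`) installed. It is not how a Claude Code slash command is actually defined.

```python
import subprocess
from typing import Callable, Optional, Sequence

Runner = Callable[[Sequence[str]], int]


def commit_push_pr_steps(branch: str, message: str) -> list[list[str]]:
    """Ordered commands for the ritual (project-specific; adjust as needed)."""
    return [
        ["git", "status", "--short"],
        ["npm", "test"],
        ["npm", "run", "format"],  # assumes the project defines a "format" script
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
        ["git", "push", "--set-upstream", "origin", branch],
        ["gh", "pr", "create", "--fill"],  # assumes the GitHub CLI is available
    ]


def run_steps(
    steps: Sequence[Sequence[str]],
    runner: Runner = lambda cmd: subprocess.run(list(cmd)).returncode,
) -> Optional[list[str]]:
    """Run steps in order; return the first failing command, or None."""
    for cmd in steps:
        if runner(cmd) != 0:
            return list(cmd)
    return None
```

Passing a custom `runner` makes it possible to dry-run the sequence without touching the repository.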
Structural Memory: CLAUDE.md, AGENTS.md, and Project Instructions
Agentic systems perform better when project knowledge is made explicit. The core mechanism for this is persistent memory.
For Claude Code, this often means files like CLAUDE.md, AGENTS.md, MEMORY.md, CLAUDE.local.md, and .claude/rules/*.md. These files act as durable context for the agent.
They reduce the need to repeatedly explain how the project is built, how tests are run, which conventions matter, which files are sensitive, which patterns should be reused, which commands are safe, and how pull requests should be written.
| File | Purpose | Typical Contents |
|---|---|---|
| CLAUDE.md | Codebase context | Tech stack, commands, architecture, conventions |
| AGENTS.md | Process context | Branch naming, PR format, review process |
| MEMORY.md | Session learnings | Repeated corrections, recurring project-specific lessons |
| CLAUDE.local.md | Personal overrides | Local URLs, sandbox credentials, personal shortcuts |
| .claude/rules/*.md | Scoped rules | Path-specific instructions for frontend, backend, tests, docs |
The important distinction is between codebase context and process context. CLAUDE.md should describe the project. AGENTS.md should describe how work gets done. This separation keeps the memory clean.
Keep Project Memory Short and Operational
A common mistake is turning CLAUDE.md into a long essay. That creates the same problem agentic workflows are trying to avoid: bloated context. A better CLAUDE.md is short, direct, and operational.
It should include a brief project overview, tech stack, common commands, architecture notes, conventions, testing rules, and files that should not be modified. Good memory files are not inspirational. They are executable context.
Hierarchical Memory Loading
Agentic memory should be scoped. Not every instruction should apply everywhere.
| Level | Example | Scope |
|---|---|---|
| Global | ~/.claude/CLAUDE.md | Personal defaults across all projects |
| Project | ./CLAUDE.md | Repository-wide conventions |
| Modular | .claude/rules/frontend.md | Specific directories or domains |
| Local | CLAUDE.local.md | Personal machine-specific notes |
| Session | MEMORY.md | Recent corrections and repeated lessons |
This matters because frontend rules should not always apply to backend code. Database migration rules should not always apply to UI components. Security-sensitive modules may need stricter instructions than ordinary feature code. The more precisely memory is scoped, the less likely the agent is to apply the wrong rule in the wrong place.
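To make the scoping idea concrete, here is a minimal sketch of resolving which memory files apply to a given path. The `SCOPED_RULES` mapping, its directory names, and the `memory_for` helper are hypothetical; this illustrates the principle and is not a description of how Claude Code itself resolves rule files.

```python
from fnmatch import fnmatch

# Hypothetical mapping from rule file to the path patterns it governs.
SCOPED_RULES = {
    ".claude/rules/frontend.md": ["web/*"],
    ".claude/rules/backend.md": ["api/*"],
    ".claude/rules/migrations.md": ["api/migrations/*"],
}


def memory_for(path: str, rules: dict[str, list[str]] = SCOPED_RULES) -> list[str]:
    """List the memory files relevant to a path, broadest scope first."""
    files = ["~/.claude/CLAUDE.md", "./CLAUDE.md"]
    for rule_file, patterns in rules.items():
        # fnmatch's "*" also matches "/", so "api/*" covers nested paths.
        if any(fnmatch(path, pattern) for pattern in patterns):
            files.append(rule_file)
    files.append("CLAUDE.local.md")
    return files
```

With this mapping, a file under `web/` picks up the frontend rules only, while a file under `api/migrations/` picks up both the backend and the stricter migration rules.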
Parallelism: Running Multiple Agents at Once
One of the most powerful agentic workflows is parallel development. Instead of using one agent for one task, developers can run multiple sessions simultaneously.
Examples: one agent migrates tests, one agent fixes TypeScript errors, one agent audits authentication logic, one agent writes documentation, one agent investigates flaky tests, one agent prepares a PR summary.
This can create enormous leverage, but only if the environments are isolated. Running multiple agents in the same working directory is risky. They can edit the same file, overwrite each other's work, corrupt local state, break each other's test runs, pollute session history, and create confusing diffs.
The solution is environment isolation.
Git Worktrees for Agent Isolation
Git worktrees allow multiple branches of the same repository to exist in separate directories at the same time. This makes them well-suited for parallel agentic development.
git worktree add ../feature-auth feature/auth
git worktree add ../fix-tests fix/tests
git worktree add ../docs-update docs/update-api
Each worktree can have its own branch, its own terminal, its own Claude session, its own .env, its own test runs, and its own local changes.
The parallel agent workflow follows these steps: create one worktree per independent task, open a terminal in each worktree, start a separate agent session, give each agent a narrow plan, run verification inside each worktree, review diffs independently, merge completed branches back into main, and remove completed worktrees.
git worktree remove ../feature-auth
For database-heavy applications, each worktree should also have separate local state. That may mean separate SQLite files, separate local schemas, separate test databases, separate ports, or separate .env overrides. Otherwise, agents may interfere with each other through shared infrastructure even if their file systems are isolated.
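A small planning helper can make that separation routine. The sketch below derives a directory, port, and SQLite file per branch and emits the matching `git worktree add` command. The `PORT` and `DATABASE_FILE` variable names and the `dev-*.sqlite3` naming are assumptions for illustration; use whatever your application actually reads.

```python
from dataclasses import dataclass


@dataclass
class WorktreeEnv:
    branch: str
    path: str
    port: int
    database_file: str

    def add_command(self) -> list[str]:
        # Same form as the commands above: git worktree add <path> <branch>.
        # The branch must already exist for this form to succeed.
        return ["git", "worktree", "add", self.path, self.branch]

    def dotenv(self) -> str:
        # Hypothetical variable names; each worktree gets its own values.
        return f"PORT={self.port}\nDATABASE_FILE={self.database_file}\n"


def plan_worktrees(branches: list[str], base_port: int = 3000) -> list[WorktreeEnv]:
    """Give every branch its own directory, port, and local database file."""
    envs = []
    for offset, branch in enumerate(branches):
        slug = branch.replace("/", "-")
        envs.append(
            WorktreeEnv(
                branch=branch,
                path=f"../{slug}",
                port=base_port + offset,
                database_file=f"dev-{slug}.sqlite3",
            )
        )
    return envs
```

Writing each `dotenv()` result into the corresponding worktree's `.env` keeps two agents from competing for the same port or database.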
Extensibility: Bash, Hooks, and the Unix Model
One of the strongest ideas behind Claude Code is that it does not need a custom integration for every task. The existing Unix ecosystem already provides powerful primitives.
| Task | Tools |
|---|---|
| Search | grep, rg, find |
| File inspection | cat, less, head, tail |
| Data manipulation | sed, awk, jq |
| Git workflow | git status, git diff, git log |
| Testing | npm test, pytest, go test, cargo test |
| Formatting | prettier, black, gofmt, rustfmt |
| Linting | eslint, ruff, clippy |
| Web access | curl, fetch tools, docs search |
| Build verification | npm run build, make, project-specific scripts |
This makes the agent more useful because it can interact with real project tools. The agent is not just predicting code. It is operating inside the development environment.
Hooks: Automating the Verification Loop
Hooks are scripts that run at specific points in the agent workflow. They are critical because they turn "agent output" into "verified output."
Post-edit formatting hooks can automatically run bun run format or npm run lint -- --fix after the agent edits a file. This prevents style errors from accumulating.
Post-task test verification hooks run npm test, pytest, or go test ./... when the agent thinks it is done. If tests fail, the hook can instruct the agent to continue fixing the issue rather than stopping prematurely.
This creates a tighter autonomy loop: edit → format → test → fix → retest → summarize. The human should receive work only when it reaches a known-good state.
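The loop can be sketched independently of any particular hook system. In the example below, `checks_pass` runs the project's check commands and `verify_loop` hands failures back to a `fix` callable, which stands in for "ask the agent to keep working". Both names are hypothetical, and real hooks are configured in the tool rather than written this way.

```python
import subprocess
from typing import Callable, Sequence


def checks_pass(commands: Sequence[Sequence[str]]) -> tuple[bool, str]:
    """Run each check in order; stop at the first failure and return its output."""
    for cmd in commands:
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        if proc.returncode != 0:
            return False, proc.stdout + proc.stderr
    return True, ""


def verify_loop(
    commands: Sequence[Sequence[str]],
    fix: Callable[[str], None],
    max_rounds: int = 3,
) -> bool:
    """Check, hand failures to fix(), and recheck, up to max_rounds fixes."""
    for _ in range(max_rounds):
        ok, output = checks_pass(commands)
        if ok:
            return True
        fix(output)  # in practice: feed the failure output back to the agent
    # One final check after the last fix attempt.
    return checks_pass(commands)[0]
```

The `max_rounds` cap matters: without it, an agent that cannot fix the failure would loop indefinitely instead of escalating to the human.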
Opponent Agents and Adversarial Review
One of the more advanced patterns is adversarial agent review. Instead of trusting one agent's output, you can assign another agent to attack it.
Agent A finds security vulnerabilities. Agent B checks whether those vulnerabilities are false positives. Agent C tries to reproduce the issue. Agent D writes the final risk report.
This is useful because single-agent outputs often sound more confident than they should. Opponent agents create structured skepticism.
| Agent | Role |
|---|---|
| Scout agent | Searches for possible vulnerabilities |
| Critic agent | Challenges each finding |
| Reproduction agent | Attempts to verify exploitability |
| Patch agent | Implements fixes |
| Review agent | Checks the patch for regressions |
This is especially useful for security reviews, API correctness, performance optimization, migration validation, compliance-sensitive changes, and incident response. The goal is not to create more AI output. The goal is to produce better-filtered output.
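The filtering step can be pictured as a small pipeline in which each finding must survive a critic and then a reproduction attempt. The sketch below assumes the critic and reproduction agents are represented as plain callables returning booleans. `Finding` and `filter_findings` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    title: str
    location: str


def filter_findings(
    findings: list[Finding],
    critic: Callable[[Finding], bool],
    reproduce: Callable[[Finding], bool],
) -> dict[str, list[Finding]]:
    """Sort scout findings by how much scrutiny they survived.

    critic(f) returns True when the finding still looks plausible;
    reproduce(f) returns True when the issue was actually demonstrated.
    """
    confirmed, unverified, rejected = [], [], []
    for finding in findings:
        if not critic(finding):
            rejected.append(finding)  # judged a false positive
        elif reproduce(finding):
            confirmed.append(finding)  # plausible and demonstrated
        else:
            unverified.append(finding)  # plausible but not reproduced
    return {"confirmed": confirmed, "unverified": unverified, "rejected": rejected}
```

Keeping an explicit `unverified` bucket avoids the two easy mistakes: reporting everything the scout found, or silently dropping findings that were merely hard to reproduce.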
Safety: The Agentic Harness
Giving an agent shell access, file-write access, and tool access creates real risk. The model itself is only one part of the system. The more important layer is the harness: the software environment that controls what the model can do.
A good harness provides permission checks, tool restrictions, filesystem boundaries, network controls, sandboxing, logging, revert mechanisms, and human approval gates.
| Mode | Description | Risk Level |
|---|---|---|
| Human approval | Requires confirmation before actions | Low |
| Plan mode | Read-only exploration and planning | Low |
| Auto mode | Allows low-risk actions automatically | Medium |
| Sandbox mode | Restricts filesystem or network access | Low to medium |
| Skip permissions | Allows actions without prompts | High |
The safest default is deny-first. The agent should earn autonomy through constraints, not receive unlimited access by default.
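A deny-first policy is easy to state in code: anything not explicitly recognized is refused. The sketch below is a deliberately small illustration with hypothetical allowlists. A production harness would need far more than this, including argument-level checks, filesystem boundaries, and sandboxing.

```python
import shlex

# Hypothetical policy: command prefixes that may run without a prompt,
# and prefixes that require human approval. Everything else is denied.
ALLOWED = {("git", "status"), ("git", "diff"), ("git", "log"), ("npm", "test")}
NEEDS_APPROVAL = {("git", "commit"), ("git", "push")}

# Characters that could chain or redirect commands in a shell.
SHELL_METACHARACTERS = ";&|<>$`\n"


def decide(command: str) -> str:
    """Return 'allow', 'ask', or 'deny'. Unrecognized input is denied."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "deny"
    try:
        parts = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "deny"
    key = tuple(parts[:2])
    if key in ALLOWED:
        return "allow"
    if key in NEEDS_APPROVAL:
        return "ask"
    return "deny"
```

Note the order of the checks: the function looks for reasons to refuse before it looks for reasons to permit, which is what makes the default safe.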
Failure Erasure and Disposable Reasoning
A subtle but important idea in agentic workflows is that failed reasoning should be disposable. If a sub-agent enters a loop, hallucinates a false cause, or pursues an invalid fix, that branch should not contaminate the main session.
The system should be able to stop the branch, discard the failed attempt, preserve only useful diagnostics, return to a clean planning state, and try a different approach.
This is the reasoning equivalent of deleting a bad Git branch. It helps prevent the agent from building future decisions on top of flawed assumptions.
Preserve verified results. Discard noisy reasoning.
Industrial Impact: Why This Matters
The reported productivity gains from agentic workflows are significant because they come from parallelism, not just faster typing. The developer is no longer limited to one implementation thread at a time. They can supervise multiple workstreams.
| Organization | Reported Use Case | Claimed Result |
|---|---|---|
| Stripe | Large Scala-to-Java migration | 10,000-line migration in four days |
| Wiz | Python-to-Go library migration | 50,000-line migration in around 20 hours |
| Rakuten | Feature delivery acceleration | Reduced delivery time from 24 working days to 5 |
| Ramp | Incident response | Reduced investigation time by 80% |
These figures are reported claims rather than independently verified results, but the pattern is credible: agentic productivity comes from delegating bounded workstreams, not from asking one model to do everything.
The Developer's New Role
Agentic engineering does not remove the developer. It changes the developer's job.
The developer becomes responsible for defining the target architecture, writing clear constraints, reviewing plans, designing verification loops, managing context, isolating environments, auditing outputs, deciding which branches to merge, and deciding which branches to delete.
The core skill is no longer just syntax production. It is systems orchestration. A good agentic engineer knows how to answer: what should the agent know, what should the agent ignore, what can be delegated, what must be reviewed, what can safely run in parallel, what should be isolated, and what must be verified before merge?
This is closer to being a technical lead than a line-by-line implementer.
Practical Checklist for Agentic Engineering
Before starting
- Define the task clearly.
- Identify the relevant project area.
- Open or update CLAUDE.md.
- Confirm test and build commands.
- Decide whether the task needs a separate branch or worktree.
During exploration
- Keep the agent read-only.
- Ask it to find existing patterns.
- Ask it to identify files likely to change.
- Ask it to summarize architecture before coding.
During planning
- Require a step-by-step plan.
- Ask for risk areas.
- Set files or directories that must not change.
- Ask which tests will verify success.
- Challenge unnecessary abstractions.
During implementation
- Use narrow tasks.
- Prefer sub-agents for large work.
- Run tests frequently.
- Keep unrelated refactors out of scope.
- Stop branches that become noisy or confused.
During review
- Inspect the diff.
- Check test output.
- Ask another agent to critique the work.
- Verify edge cases manually.
- Merge only clean, explainable changes.
After completion
- Update memory if the agent learned something useful.
- Remove stale worktrees.
- Delete failed branches.
- Capture reusable commands.
- Improve hooks or scripts for next time.
Conclusion: Architecture Beats Prompting
The biggest lesson from Anthropic-style agentic workflows is that success does not come from a single perfect prompt. It comes from engineering the environment around the agent.
The strongest workflows combine clean context, explicit memory, planning before implementation, isolated branches, parallel agents, automated verification, human review, and disposable failure paths.
This is the real architecture of agentic engineering. The agent can write code, run commands, and debug failures, but the developer still owns the system. The best results come when the human provides structure and the agent provides execution.
In that model, software engineering becomes less about manually producing every line of syntax and more about designing reliable loops of delegated work.
The future developer is not just a coder. The future developer is an orchestrator of agents, contexts, branches, tools, and verification systems.