
The Architecture of Agentic Engineering

Project structuring, context isolation, and parallel workflows in the Anthropic development model.


TL;DR: Agentic coding changes the developer's role from "writing every line of code" to orchestrating autonomous engineering loops. The strongest workflows rest on architecture rather than on one perfect prompt: keep the main agent focused on core goals, delegate implementation to sub-agents with isolated context, use branching for experiments, use Plan Mode before implementation, use CLAUDE.md and AGENTS.md for durable project memory, use Git worktrees for parallel agents, and treat the developer as a systems designer, reviewer, and orchestrator.

Agentic engineering works best when code, context, memory, and execution environments are deliberately separated.

Why Agentic Engineering Is Different

Traditional IDEs help developers write code faster. Autocomplete tools suggest the next line, complete a function, or generate small snippets inside an existing workflow. They operate close to the surface of programming.

Agentic coding environments are different. Tools like Claude Code are designed to read across a whole project, understand file structure and dependencies, plan multi-file changes, edit code directly, run commands, execute tests, debug failures, iterate through implementation loops, and create commits or pull requests.

This changes the software development workflow from:

Human writes code → tool assists locally.

To:

Human defines intent → agent explores, plans, implements, verifies, and reports back.

That shift creates a new engineering problem. The challenge is no longer only "how do we prompt the model?" It becomes: how do we structure the project, context, tools, memory, and environment so that agents can work reliably?

The Core Idea: Core-and-Branch Engineering

A useful way to understand the Anthropic-style workflow is through a core-and-branch model.

The "core" contains the high-level goal, the architectural direction, the project constraints, the implementation roadmap, and the final decision-making context.

The "branches" contain local experiments, implementation attempts, error logs, failed fixes, file-specific debugging, temporary reasoning, and disposable exploration.

The main agent should not absorb every messy detail from every failed implementation attempt. If it does, the context window becomes noisy. As the context window fills with irrelevant logs, repeated failures, half-correct assumptions, and stale reasoning, the model's performance can degrade.

The core-and-branch model avoids this by isolating work. The main session stays clean. Sub-agents handle messy tasks. Failed paths can be discarded. Successful outputs are summarized and merged back into the core.

The Origin of Claude Code as an Internal Tool

Claude Code reportedly began less as a polished flagship product and more as an internal experiment. Boris Cherny, Head of Claude Code, built the early prototype to test how far Anthropic's own APIs could be pushed for software engineering tasks.

The tool spread internally because it solved an immediate problem: it helped people move faster inside real codebases. Its adoption expanded beyond traditional engineering teams into adjacent functions such as data science, product management, design, operations, and technical research.

That internal usage gave Anthropic a live testing environment for agentic workflows. Instead of designing the product around abstract AI demos, the team could observe how people actually used agents inside complex work.

Development Pattern

| Phase | Focus | Main Users |
| --- | --- | --- |
| Experimental prototype | API stress testing and internal utility | Boris Cherny and early engineering users |
| Internal adoption | Workflow automation and codebase navigation | Anthropic employees |
| Cross-functional expansion | Broader technical assistance | Designers, PMs, data scientists |
| Enterprise scaling | Large migrations and refactors | External engineering teams |

The key product principle that emerged was radical simplicity and deep extensibility. Claude Code's terminal-first interface forces the product to work with the existing developer environment rather than replacing it.

That means the agent can use familiar primitives: grep, rg, find, cat, sed, awk, npm test, git, shell scripts, linters, formatters, and project-specific CLIs.

In this model, Claude Code behaves less like a closed IDE and more like a Unix-style engineering utility.

Context Isolation: The Most Important Architectural Pattern

The main technical constraint in agentic coding is not only model intelligence. It is context quality.

When agents work on complex tasks, they generate noise: long stack traces, failed test outputs, repeated build errors, large file reads, incorrect hypotheses, half-completed implementation paths, and dead-end debugging attempts. All of this consumes context.

The larger problem is that failed reasoning can linger. If the main agent keeps every bad attempt in memory, the next decision may be influenced by irrelevant or incorrect assumptions. This is why Anthropic-style workflows emphasize uncorrelated context windows: each agent reasons from its own context rather than inheriting another agent's failed attempts.

Main Agent vs Sub-Agent

| Agent Type | Responsibility | Context Should Contain |
| --- | --- | --- |
| Main agent | Owns architecture, goal, constraints, roadmap | Clean summaries, final decisions, task status |
| Sub-agent | Handles local implementation or investigation | Logs, local failures, file-specific details |
| Reviewer agent | Audits or challenges output | Risk analysis, false positives, verification notes |
| Synthesizer agent | Merges results back into the main plan | Summaries, completed steps, unresolved blockers |

The main agent should receive only the useful output: what changed, what passed, what failed, what remains blocked, and what decision is needed. It should not absorb every intermediate failure.
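One way to picture that hand-off is a small report type that a sub-agent fills in before returning to the core. This is a minimal sketch, not a Claude Code API: the names BranchReport and summarize_for_core are hypothetical, and the fields simply mirror the five questions above.

```python
from dataclasses import dataclass, field


@dataclass
class BranchReport:
    """Compact summary a sub-agent hands back (hypothetical shape)."""
    task: str
    changed: list[str] = field(default_factory=list)
    passed: list[str] = field(default_factory=list)
    failed: list[str] = field(default_factory=list)
    blocked: list[str] = field(default_factory=list)
    decision_needed: str = ""


def summarize_for_core(report: BranchReport, transcript: list[str]) -> str:
    """Render only the report; the noisy transcript is deliberately dropped."""
    del transcript  # stays with the sub-agent, never reaches the core
    lines = [f"Task: {report.task}"]
    for label, items in (
        ("Changed", report.changed),
        ("Passed", report.passed),
        ("Failed", report.failed),
        ("Blocked", report.blocked),
    ):
        lines.append(f"{label}: {', '.join(items) if items else 'none'}")
    if report.decision_needed:
        lines.append(f"Decision needed: {report.decision_needed}")
    return "\n".join(lines)
```

The transcript is accepted as an argument only to make the point explicit: it exists, and it is thrown away before anything reaches the main session.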

Branching as a Failure-Management Strategy

In traditional software engineering, Git branches let developers explore changes without affecting the main branch. In agentic engineering, branching applies at two levels.

Code branching includes Git branches, worktrees, isolated file states, and pull requests. Context branching includes isolated sessions, sub-agents, forked reasoning paths, and disposable experiments.

This distinction matters. A bad code branch can be deleted. A bad reasoning branch should also be disposable. That is the deeper architectural principle.

If an agent tries an approach and fails, the workflow should allow that failure to be abandoned cleanly. The developer or main agent can then try a different approach without dragging the failed context forward.

This is especially useful for tasks like legacy migrations, large refactors, security audits, test suite repairs, API changes, dependency upgrades, and multi-module redesigns. The branch is allowed to fail. The core remains stable.

Map-Reduce for Large Code Migrations

One of the clearest examples of agentic branching is the map-reduce pattern for large migrations. Imagine a codebase with thousands of files that needs to move from one testing framework to another. A single agent handling the entire task in one context window would quickly become overloaded.

A better workflow looks like this: the main agent explores the project, creates a migration plan, generates a task list, and assigns independent files or directories to sub-agents. Each sub-agent works locally and reports results. A synthesizer merges the successful work. Failures are retried, rerouted, or escalated to a human.

| Component | Role | Context Impact |
| --- | --- | --- |
| Main orchestrator | Defines the migration plan and global rules | Low |
| Sub-agent | Converts one file, module, or directory | High |
| Test runner | Verifies each local change | Medium |
| Synthesizer | Summarizes completed work and unresolved failures | Medium |
| Human reviewer | Resolves ambiguous architectural decisions | Low but critical |

This prevents a single edge case from blocking the whole migration. If one file has unusual dependencies, only that branch gets stuck. Other sub-agents continue working.
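The map-reduce shape can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular tool schedules sub-agents: convert_file is a hypothetical stand-in for one sub-agent run, and the "legacy" check only simulates a file with unusual dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Result:
    path: str
    ok: bool
    note: str


def convert_file(path: str) -> Result:
    """Stand-in for one sub-agent run; a real version would invoke an agent."""
    if "legacy" in path:  # simulated edge case that blocks only this branch
        raise RuntimeError("unusual dependency")
    return Result(path, True, "converted")


def run_branch(path: str) -> Result:
    """Map step: contain each failure so it cannot stop the other branches."""
    try:
        return convert_file(path)
    except Exception as exc:
        return Result(path, False, str(exc))


def migrate(paths: list[str], workers: int = 4) -> dict[str, list[str]]:
    """Reduce step: return a short summary instead of per-file logs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_branch, paths))
    return {
        "done": [r.path for r in results if r.ok],
        "escalate": [f"{r.path}: {r.note}" for r in results if not r.ok],
    }
```

Calling migrate(["a_test.py", "legacy_test.py", "b_test.py"]) finishes the two ordinary files and lists the legacy one under "escalate" for a retry or a human decision.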

The EPIC Workflow: Explore, Plan, Implement, Commit

A strong agentic workflow needs phase separation. The most practical structure is: Explore, Plan, Implement, Commit.

Phase 1: Explore

The first phase should be read-only. The agent searches the codebase and builds a model of the system before making changes.

During exploration, the agent should identify existing file structure, relevant modules, dependency paths, existing conventions, similar implementations, reusable utilities, test patterns, build commands, linting rules, and known architectural constraints.

The goal is to avoid premature implementation. A good exploration phase answers: how does this codebase already solve this type of problem? That question matters because agentic tools often fail when they invent new patterns instead of following existing ones.

Phase 2: Plan

Plan Mode is the safety layer. Before editing files, the agent proposes a structured implementation plan.

A good plan should include target files, files that should not be modified, step-by-step implementation order, expected tests, possible edge cases, dependencies, rollback strategy, risks, and open questions.

The human should challenge the plan before implementation. Useful prompts at this stage include:

  • What existing patterns are you following?
  • Which files will you modify and why?
  • What are the main failure modes?
  • What should remain unchanged?
  • How will you verify the implementation?
  • Which tests are most relevant?

The goal is not to micromanage the agent. The goal is to align the agent's execution path with the project's architecture.
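Part of that plan review can be made mechanical. The sketch below assumes the plan has been captured as structured data; the Plan type and review_plan function are hypothetical, and the checks cover only a subset of the plan contents listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Structured form of an agent's proposed plan (hypothetical shape)."""
    target_files: list[str] = field(default_factory=list)
    protected_files: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    tests: list[str] = field(default_factory=list)
    rollback: str = ""


def review_plan(plan: Plan) -> list[str]:
    """Return reasons to send the plan back; an empty list means proceed."""
    problems = []
    if not plan.target_files:
        problems.append("no target files listed")
    if not plan.steps:
        problems.append("no implementation steps")
    if not plan.tests:
        problems.append("no verification tests named")
    if not plan.rollback:
        problems.append("no rollback strategy")
    overlap = sorted(set(plan.target_files) & set(plan.protected_files))
    if overlap:
        problems.append(f"plan edits protected files: {', '.join(overlap)}")
    return problems
```

A check like this does not replace the human questions; it only catches plans that are missing a section or that touch files marked as off limits.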

Phase 3: Implement

Once the plan is approved, the agent starts making changes. For small tasks, this can happen in the main session. For large tasks, implementation should be delegated to sub-agents or isolated branches.

During implementation, the agent should make scoped changes, run relevant tests, fix failures, keep a clear task checklist, avoid unrelated refactors, report deviations from the plan, and summarize completed work.

Auto-accept workflows can be useful here, but only when the plan is narrow and the verification loop is strong. The more freedom the agent has, the more important automated checks become.

Phase 4: Commit

The final phase packages the work. The agent should produce a clean diff, a concise summary, a commit message, test results, known limitations, follow-up tasks, and an optional pull request description.

A good final agent report should answer: what changed, why did it change, how was it tested, and what should the reviewer pay attention to?

Custom slash commands can automate this stage. A command like /commit-push-pr could run git status, run tests, format code, generate a commit message, push the branch, open a pull request, and draft a PR summary.

The point is not the command itself. The point is that repeatable development rituals should become reusable agent workflows.
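As a rough illustration of a ritual turned into a reusable workflow, the sketch below runs a fixed list of commands in order and stops at the first failure. It is not how Claude Code implements slash commands; the STEPS list is an assumed example built from commands mentioned in this article, and the runner is injectable so the sequence can be exercised without touching a real repository.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Assumed example ritual; swap in the project's own commands.
STEPS: list[list[str]] = [
    ["git", "status", "--short"],
    ["npm", "test"],
    ["npm", "run", "lint", "--", "--fix"],
]


def shell_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd), check=False).returncode


def run_ritual(
    steps: Sequence[Sequence[str]],
    runner: Callable[[Sequence[str]], int] = shell_runner,
) -> tuple[list[list[str]], Optional[list[str]]]:
    """Run steps in order; return (completed steps, first failing step or None)."""
    completed: list[list[str]] = []
    for cmd in steps:
        if runner(cmd) != 0:
            return completed, list(cmd)
        completed.append(list(cmd))
    return completed, None
```

Stopping at the first failure keeps the later, riskier steps (pushing, opening a pull request) from running on top of a broken state.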

Structural Memory: CLAUDE.md, AGENTS.md, and Project Instructions

Agentic systems perform better when project knowledge is made explicit. The core mechanism for this is persistent memory.

For Claude Code, this often means files like CLAUDE.md, AGENTS.md, MEMORY.md, CLAUDE.local.md, and .claude/rules/*.md. These files act as durable context for the agent.

They reduce the need to repeatedly explain how the project is built, how tests are run, which conventions matter, which files are sensitive, which patterns should be reused, which commands are safe, and how pull requests should be written.

| File | Purpose | Typical Contents |
| --- | --- | --- |
| CLAUDE.md | Codebase context | Tech stack, commands, architecture, conventions |
| AGENTS.md | Process context | Branch naming, PR format, review process |
| MEMORY.md | Session learnings | Repeated corrections, recurring project-specific lessons |
| CLAUDE.local.md | Personal overrides | Local URLs, sandbox credentials, personal shortcuts |
| .claude/rules/*.md | Scoped rules | Path-specific instructions for frontend, backend, tests, docs |

The important distinction is between codebase context and process context. CLAUDE.md should describe the project. AGENTS.md should describe how work gets done. This separation keeps the memory clean.

Keep Project Memory Short and Operational

A common mistake is turning CLAUDE.md into a long essay. That creates the same problem agentic workflows are trying to avoid: bloated context. A better CLAUDE.md is short, direct, and operational.

It should include a brief project overview, tech stack, common commands, architecture notes, conventions, testing rules, and files that should not be modified. Good memory files are not inspirational. They are executable context.

Hierarchical Memory Loading

Agentic memory should be scoped. Not every instruction should apply everywhere.

| Level | Example | Scope |
| --- | --- | --- |
| Global | ~/.claude/CLAUDE.md | Personal defaults across all projects |
| Project | ./CLAUDE.md | Repository-wide conventions |
| Modular | .claude/rules/frontend.md | Specific directories or domains |
| Local | CLAUDE.local.md | Personal machine-specific notes |
| Session | MEMORY.md | Recent corrections and repeated lessons |

This matters because frontend rules should not always apply to backend code. Database migration rules should not always apply to UI components. Security-sensitive modules may need stricter instructions than ordinary feature code. The more precisely memory is scoped, the less likely the agent is to apply the wrong rule in the wrong place.
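The scoping idea can be shown with a small resolver that decides which memory files apply to a given path. This is a sketch of the concept only, not Claude Code's actual loading logic: the RULE_SCOPES mapping, the glob patterns, the migrations.md rule file, and the ordering are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from rule file to the path globs it covers.
RULE_SCOPES = {
    ".claude/rules/frontend.md": ["web/*"],
    ".claude/rules/migrations.md": ["db/migrations/*"],
}


def memory_for(path: str, rule_scopes: dict[str, list[str]] = RULE_SCOPES) -> list[str]:
    """List the memory files for one edited path, broadest scope first."""
    files = ["~/.claude/CLAUDE.md", "./CLAUDE.md"]
    for rule_file, globs in rule_scopes.items():
        # fnmatchcase lets "*" span directory separators, so "web/*"
        # also covers nested paths such as web/components/Button.tsx.
        if any(fnmatchcase(path, pattern) for pattern in globs):
            files.append(rule_file)
    files.append("CLAUDE.local.md")
    return files
```

With this mapping, an edit to web/components/Button.tsx picks up the frontend rules, while an edit to api/server.py sees only the global, project, and local files.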

Parallelism: Running Multiple Agents at Once

One of the most powerful agentic workflows is parallel development. Instead of using one agent for one task, developers can run multiple sessions simultaneously.

Examples: one agent migrates tests, one agent fixes TypeScript errors, one agent audits authentication logic, one agent writes documentation, one agent investigates flaky tests, one agent prepares a PR summary.

This can create enormous leverage, but only if the environments are isolated. Running multiple agents in the same working directory is risky. They can edit the same file, overwrite each other's work, corrupt local state, break each other's test runs, pollute session history, and create confusing diffs.

The solution is environment isolation.

Git Worktrees for Agent Isolation

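Git worktrees allow multiple branches of the same repository to exist in separate directories at the same time. This makes them well-suited for parallel agentic development. The commands below assume each branch already exists; to create the branch at the same time, use git worktree add -b <branch> <path>.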

git worktree add ../feature-auth feature/auth
git worktree add ../fix-tests fix/tests
git worktree add ../docs-update docs/update-api

Each worktree can have its own branch, its own terminal, its own Claude session, its own .env, its own test runs, and its own local changes.

The parallel agent workflow follows these steps: create one worktree per independent task, open a terminal in each worktree, start a separate agent session, give each agent a narrow plan, run verification inside each worktree, review diffs independently, merge completed branches back into main, and remove completed worktrees.

git worktree remove ../feature-auth

For database-heavy applications, each worktree should also have separate local state. That may mean separate SQLite files, separate local schemas, separate test databases, separate ports, or separate .env overrides. Otherwise, agents may interfere with each other through shared infrastructure even if their file systems are isolated.
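A small planning helper makes the isolation explicit by deriving a directory, a port, and a database file for each branch. This is a sketch under assumptions: WorktreePlan, plan_worktrees, the base port, and the dev.sqlite3 file name are hypothetical, and the function only builds the git command as a list without running it.

```python
from dataclasses import dataclass


@dataclass
class WorktreePlan:
    branch: str
    path: str
    port: int
    database: str
    command: list[str]


def plan_worktrees(branches: list[str], base_port: int = 3000) -> list[WorktreePlan]:
    """Give each branch its own directory, port, and SQLite file."""
    plans = []
    for offset, branch in enumerate(branches):
        path = "../" + branch.replace("/", "-")
        plans.append(
            WorktreePlan(
                branch=branch,
                path=path,
                port=base_port + offset,  # one port per worktree
                database=f"{path}/dev.sqlite3",  # assumed file name
                # Same form as the commands above; assumes the branch exists.
                command=["git", "worktree", "add", path, branch],
            )
        )
    return plans
```

For example, plan_worktrees(["feature/auth", "fix/tests"]) yields ../feature-auth on port 3000 and ../fix-tests on port 3001, each with its own database file, so two agents never share a port or a schema by accident.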

Extensibility: Bash, Hooks, and the Unix Model

One of the strongest ideas behind Claude Code is that it does not need a custom integration for every task. The existing Unix ecosystem already provides powerful primitives.

| Task | Tools |
| --- | --- |
| Search | grep, rg, find |
| File inspection | cat, less, head, tail |
| Data manipulation | sed, awk, jq |
| Git workflow | git status, git diff, git log |
| Testing | npm test, pytest, go test, cargo test |
| Formatting | prettier, black, gofmt, rustfmt |
| Linting | eslint, ruff, clippy |
| Web access | curl, fetch tools, docs search |
| Build verification | npm run build, make, project-specific scripts |

This makes the agent more useful because it can interact with real project tools. The agent is not just predicting code. It is operating inside the development environment.

Hooks: Automating the Verification Loop

Hooks are scripts that run at specific points in the agent workflow. They are critical because they turn "agent output" into "verified output."

Post-edit formatting hooks can automatically run bun run format or npm run lint -- --fix after the agent edits a file. This prevents style errors from accumulating.

Post-task test verification hooks run npm test, pytest, or go test ./... when the agent thinks it is done. If tests fail, the hook can instruct the agent to continue fixing the issue rather than stopping prematurely.

This creates a tighter autonomy loop: edit → format → test → fix → retest → summarize. The human should receive work only when it reaches a known-good state.
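The loop itself is simple enough to sketch independently of any hook system. The version below is a generic illustration, not Claude Code's hook interface: run_checks and attempt_fix are hypothetical callables standing in for "run the formatter and tests" and "ask the agent to fix the failures", and max_rounds is an assumed budget that keeps the loop from running forever.

```python
from typing import Callable


def verify_until_green(
    run_checks: Callable[[], list[str]],
    attempt_fix: Callable[[list[str]], None],
    max_rounds: int = 3,
) -> tuple[bool, int]:
    """Alternate check and fix until checks pass or the budget runs out.

    run_checks returns failure messages (an empty list means green).
    Returns (passed, rounds_used).
    """
    for round_number in range(1, max_rounds + 1):
        failures = run_checks()
        if not failures:
            return True, round_number
        attempt_fix(failures)
    # Budget spent: check one last time so the caller gets an honest status.
    return not run_checks(), max_rounds
```

The round budget matters as much as the loop: when it is exhausted, the work goes back to a human with a clear "still failing" status instead of being reported as done.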

Opponent Agents and Adversarial Review

One of the more advanced patterns is adversarial agent review. Instead of trusting one agent's output, you can assign another agent to attack it.

Agent A finds security vulnerabilities. Agent B checks whether those vulnerabilities are false positives. Agent C tries to reproduce the issue. Agent D writes the final risk report.

This is useful because single-agent outputs often sound more confident than they should. Opponent agents create structured skepticism.

| Agent | Role |
| --- | --- |
| Scout agent | Searches for possible vulnerabilities |
| Critic agent | Challenges each finding |
| Reproduction agent | Attempts to verify exploitability |
| Patch agent | Implements fixes |
| Review agent | Checks the patch for regressions |

This is especially useful for security reviews, API correctness, performance optimization, migration validation, compliance-sensitive changes, and incident response. The goal is not to create more AI output. The goal is to produce better-filtered output.
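The filtering step can be expressed as a short pipeline in which a finding must survive both the critic and a reproduction attempt. This is a minimal sketch: Finding and adversarial_review are hypothetical names, and the critic and reproduce callables stand in for separate agents.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    title: str
    evidence: str


def adversarial_review(
    findings: list[Finding],
    critic: Callable[[Finding], bool],
    reproduce: Callable[[Finding], bool],
) -> dict[str, list[str]]:
    """Sort findings by outcome.

    critic returns True when a finding survives the challenge;
    reproduce returns True when the issue could be demonstrated.
    """
    report: dict[str, list[str]] = {"confirmed": [], "dismissed": [], "unreproduced": []}
    for finding in findings:
        if not critic(finding):
            report["dismissed"].append(finding.title)
        elif not reproduce(finding):
            report["unreproduced"].append(finding.title)
        else:
            report["confirmed"].append(finding.title)
    return report
```

Keeping "dismissed" and "unreproduced" as separate buckets preserves the distinction between a likely false positive and a finding that still needs a closer look.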

Safety: The Agentic Harness

Giving an agent shell access, file-write access, and tool access creates real risk. The model itself is only one part of the system. The more important layer is the harness: the software environment that controls what the model can do.

A good harness provides permission checks, tool restrictions, filesystem boundaries, network controls, sandboxing, logging, revert mechanisms, and human approval gates.

| Mode | Description | Risk Level |
| --- | --- | --- |
| Human approval | Requires confirmation before actions | Low |
| Plan mode | Read-only exploration and planning | Low |
| Auto mode | Allows low-risk actions automatically | Medium |
| Sandbox mode | Restricts filesystem or network access | Low to medium |
| Skip permissions | Allows actions without prompts | High |

The safest default is deny-first. The agent should earn autonomy through constraints, not receive unlimited access by default.
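A deny-first check can be sketched as an allowlist in which anything not explicitly permitted falls back to asking the human. This is an illustration of the principle, not the permission system of any real harness: the ALLOWED_PREFIXES list and the decide function are assumptions, and a production harness would need far stricter parsing.

```python
import shlex

# Hypothetical allowlist: command prefixes the agent may run without asking.
ALLOWED_PREFIXES = [
    ["git", "status"],
    ["git", "diff"],
    ["npm", "test"],
]

# Characters that could chain, redirect, or substitute another command.
SHELL_METACHARACTERS = ";|&<>`$\n"


def decide(command: str, allowed: list[list[str]] = ALLOWED_PREFIXES) -> str:
    """Return "allow" for allowlisted commands and "ask" for everything else."""
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return "ask"
    try:
        argv = shlex.split(command)
    except ValueError:
        return "ask"  # unparseable input is never auto-approved
    for prefix in allowed:
        if argv[: len(prefix)] == prefix:
            return "allow"
    return "ask"
```

Here decide("git status --short") returns "allow", while decide("git push origin main") and decide("git status && rm -rf build") both return "ask", because the default branch of the function is refusal rather than permission.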

Failure Erasure and Disposable Reasoning

A subtle but important idea in agentic workflows is that failed reasoning should be disposable. If a sub-agent enters a loop, hallucinates a false cause, or pursues an invalid fix, that branch should not contaminate the main session.

The system should be able to stop the branch, discard the failed attempt, preserve only useful diagnostics, return to a clean planning state, and try a different approach.

This is the reasoning equivalent of deleting a bad Git branch. It helps prevent the agent from building future decisions on top of flawed assumptions.

Preserve verified results. Discard noisy reasoning.
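In code, disposable reasoning amounts to running each attempt on a private copy of the core context and keeping only a one-line diagnostic when it fails. The sketch below assumes the context is a plain dictionary and that each approach is a callable that either returns a verified result or raises; try_approaches is a hypothetical name.

```python
import copy
from typing import Callable


def try_approaches(
    core_context: dict,
    approaches: list[Callable[[dict], str]],
) -> tuple[dict, list[str]]:
    """Run each approach on its own copy of the core context.

    A failed approach leaves one diagnostic line; its scratch state is dropped.
    """
    diagnostics: list[str] = []
    for approach in approaches:
        branch = copy.deepcopy(core_context)  # the branch may scribble freely
        try:
            result = approach(branch)
        except Exception as exc:
            diagnostics.append(f"{approach.__name__}: {exc}")
            continue
        # Only the verified result is merged; branch notes are discarded.
        return dict(core_context, verified_result=result), diagnostics
    return core_context, diagnostics
```

Whatever a failed approach wrote into its copy, including wrong theories and stale notes, never reaches the core; the next approach starts from the same clean state the first one did.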

Industrial Impact: Why This Matters

The reported productivity gains from agentic workflows are significant because they come from parallelism, not just faster typing. The developer is no longer limited to one implementation thread at a time. They can supervise multiple workstreams.

| Organization | Reported Use Case | Claimed Result |
| --- | --- | --- |
| Stripe | Large Scala-to-Java migration | 10,000-line migration in four days |
| Wiz | Python-to-Go library migration | 50,000-line migration in around 20 hours |
| Rakuten | Feature delivery acceleration | Reduced delivery time from 24 working days to 5 |
| Ramp | Incident response | Reduced investigation time by 80% |

These figures are company-reported and have not been independently verified here, but the pattern is credible: agentic productivity comes from delegating bounded workstreams, not from asking one model to do everything.

The Developer's New Role

Agentic engineering does not remove the developer. It changes the developer's job.

The developer becomes responsible for defining the target architecture, writing clear constraints, reviewing plans, designing verification loops, managing context, isolating environments, auditing outputs, deciding which branches to merge, and deciding which branches to delete.

The core skill is no longer just syntax production. It is systems orchestration. A good agentic engineer knows how to answer: what should the agent know, what should the agent ignore, what can be delegated, what must be reviewed, what can safely run in parallel, what should be isolated, and what must be verified before merge?

This is closer to being a technical lead than a line-by-line implementer.

Practical Checklist for Agentic Engineering

Before starting

  • Define the task clearly.
  • Identify the relevant project area.
  • Open or update CLAUDE.md.
  • Confirm test and build commands.
  • Decide whether the task needs a separate branch or worktree.

During exploration

  • Keep the agent read-only.
  • Ask it to find existing patterns.
  • Ask it to identify files likely to change.
  • Ask it to summarize architecture before coding.

During planning

  • Require a step-by-step plan.
  • Ask for risk areas.
  • Set files or directories that must not change.
  • Ask which tests will verify success.
  • Challenge unnecessary abstractions.

During implementation

  • Use narrow tasks.
  • Prefer sub-agents for large work.
  • Run tests frequently.
  • Keep unrelated refactors out of scope.
  • Stop branches that become noisy or confused.

During review

  • Inspect the diff.
  • Check test output.
  • Ask another agent to critique the work.
  • Verify edge cases manually.
  • Merge only clean, explainable changes.

After completion

  • Update memory if the agent learned something useful.
  • Remove stale worktrees.
  • Delete failed branches.
  • Capture reusable commands.
  • Improve hooks or scripts for next time.

Conclusion: Architecture Beats Prompting

The biggest lesson from Anthropic-style agentic workflows is that success does not come from a single perfect prompt. It comes from engineering the environment around the agent.

The strongest workflows combine clean context, explicit memory, planning before implementation, isolated branches, parallel agents, automated verification, human review, and disposable failure paths.

This is the real architecture of agentic engineering. The agent can write code, run commands, and debug failures, but the developer still owns the system. The best results come when the human provides structure and the agent provides execution.

In that model, software engineering becomes less about manually producing every line of syntax and more about designing reliable loops of delegated work.

The future developer is not just a coder. The future developer is an orchestrator of agents, contexts, branches, tools, and verification systems.