The Architecture of Agentic Engineering

May 10, 2026

Dovas Slanina

Project structuring, context isolation, and parallel workflows in the Anthropic development model.

TL;DR: Agentic coding changes the developer's role from "writing every line of code" to orchestrating autonomous engineering loops. The strongest workflows are not based on one perfect prompt — they are based on architecture: keep the main agent focused on core goals, delegate implementation to sub-agents with isolated context, use branching for experiments, use Plan Mode before implementation, use CLAUDE.md and AGENTS.md for durable project memory, use Git worktrees for parallel agents, and treat the developer as a systems designer, reviewer, and orchestrator.

Agentic engineering works best when code, context, memory, and execution environments are deliberately separated.

Why Agentic Engineering Is Different

Traditional IDEs help developers write code faster. Autocomplete tools suggest the next line, complete a function, or generate small snippets inside an existing workflow. They operate close to the surface of programming.

Agentic coding environments are different. Tools like Claude Code are designed to read across a whole project, understand file structure and dependencies, plan multi-file changes, edit code directly, run commands, execute tests, debug failures, iterate through implementation loops, and create commits or pull requests.

This changes the software development workflow from:

Human writes code → tool assists locally.

To:

Human defines intent → agent explores, plans, implements, verifies, and reports back.

That shift creates a new engineering problem. The challenge is no longer only "how do we prompt the model?" It becomes: how do we structure the project, context, tools, memory, and environment so that agents can work reliably?

The Core Idea: Core-and-Branch Engineering

A useful way to understand the Anthropic-style workflow is through a core-and-branch model.

The "core" contains the high-level goal, the architectural direction, the project constraints, the implementation roadmap, and the final decision-making context.

The "branches" contain local experiments, implementation attempts, error logs, failed fixes, file-specific debugging, temporary reasoning, and disposable exploration.

The main agent should not absorb every messy detail from every failed implementation attempt. If it does, the context window becomes noisy. As the context window fills with irrelevant logs, repeated failures, half-correct assumptions, and stale reasoning, the model's performance can degrade.

The core-and-branch model avoids this by isolating work. The main session stays clean. Sub-agents handle messy tasks. Failed paths can be discarded. Successful outputs are summarized and merged back into the core.

The Origin of Claude Code as an Internal Tool

Claude Code reportedly began less as a polished flagship product and more as an internal experiment. Boris Cherny, Head of Claude Code, built the early prototype to test how far Anthropic's own APIs could be pushed for software engineering tasks.

The tool spread internally because it solved an immediate problem: it helped people move faster inside real codebases. Its adoption expanded beyond traditional engineering teams into adjacent functions such as data science, product management, design, operations, and technical research.

That internal usage gave Anthropic a live testing environment for agentic workflows. Instead of designing the product around abstract AI demos, the team could observe how people actually used agents inside complex work.

Development Pattern

Phase	Focus	Main Users
Experimental prototype	API stress testing and internal utility	Boris Cherny and early engineering users
Internal adoption	Workflow automation and codebase navigation	Anthropic employees
Cross-functional expansion	Broader technical assistance	Designers, PMs, data scientists
Enterprise scaling	Large migrations and refactors	External engineering teams

The key product principle that emerged was radical simplicity and deep extensibility. Claude Code's terminal-first interface forces the product to work with the existing developer environment rather than replacing it.

That means the agent can use familiar primitives: grep, rg, find, cat, sed, awk, npm test, git, shell scripts, linters, formatters, and project-specific CLIs.

In this model, Claude Code behaves less like a closed IDE and more like a Unix-style engineering utility.

Context Isolation: The Most Important Architectural Pattern

The main technical constraint in agentic coding is not only model intelligence but also context quality.

When agents work on complex tasks, they generate noise: long stack traces, failed test outputs, repeated build errors, large file reads, incorrect hypotheses, half-completed implementation paths, and dead-end debugging attempts. All of this consumes context.

The larger problem is that failed reasoning can linger. If the main agent keeps every bad attempt in memory, the next decision may be influenced by irrelevant or incorrect assumptions. This is why Anthropic-style workflows emphasize isolated, independent context windows: each sub-agent's dead ends stay in its own window instead of accumulating in the main session.

Main Agent vs Sub-Agent

Agent Type	Responsibility	Context Should Contain
Main agent	Owns architecture, goal, constraints, roadmap	Clean summaries, final decisions, task status
Sub-agent	Handles local implementation or investigation	Logs, local failures, file-specific details
Reviewer agent	Audits or challenges output	Risk analysis, false positives, verification notes
Synthesizer agent	Merges results back into the main plan	Summaries, completed steps, unresolved blockers

The main agent should receive only the useful output: what changed, what passed, what failed, what remains blocked, and what decision is needed. It should not absorb every intermediate failure.
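A minimal sketch of that hand-off, assuming a hypothetical `SubAgentReport` type (our own illustration, not a Claude Code API). The sub-agent's raw log stays in its own context, and only a compact, fixed-shape summary is rendered for the main agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubAgentReport:
    """Hypothetical hand-off record; not part of any Claude Code API."""

    task: str
    changed: list[str] = field(default_factory=list)  # files touched
    passed: list[str] = field(default_factory=list)  # checks that passed
    failed: list[str] = field(default_factory=list)  # checks still failing
    blocked: list[str] = field(default_factory=list)  # unresolved blockers
    decision_needed: str | None = None  # question for the main agent
    raw_log: str = ""  # stays in the sub-agent's own context


def summarize_for_main(report: SubAgentReport) -> str:
    """Render only what the main agent needs; raw_log is deliberately dropped."""
    lines = [f"Task: {report.task}"]
    for label, items in (
        ("Changed", report.changed),
        ("Passed", report.passed),
        ("Failed", report.failed),
        ("Blocked", report.blocked),
    ):
        lines.append(f"{label}: {', '.join(items) if items else 'none'}")
    if report.decision_needed:
        lines.append(f"Decision needed: {report.decision_needed}")
    return "\n".join(lines)


report = SubAgentReport(
    task="Migrate tests/test_utils.py",
    changed=["tests/test_utils.py"],
    passed=["pytest tests/test_utils.py"],
    raw_log="(thousands of lines of failed runs)",
)
print(summarize_for_main(report))
```

Whatever the exact format, the design choice is the same: because the summary has a small, fixed shape, the main agent's context grows with the number of tasks rather than with the volume of logs.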

Branching as a Failure-Management Strategy

In traditional software engineering, Git branches let developers explore changes without affecting the main branch. In agentic engineering, branching applies at two levels.

Code branching includes Git branches, worktrees, isolated file states, and pull requests. Context branching includes isolated sessions, sub-agents, forked reasoning paths, and disposable experiments.

This distinction matters. A bad code branch can be deleted. A bad reasoning branch should also be disposable. That is the deeper architectural principle.

If an agent tries an approach and fails, the workflow should allow that failure to be abandoned cleanly. The developer or main agent can then try a different approach without dragging the failed context forward.

This is especially useful for tasks like legacy migrations, large refactors, security audits, test suite repairs, API changes, dependency upgrades, and multi-module redesigns. The branch is allowed to fail. The core remains stable.

Map-Reduce for Large Code Migrations

One of the clearest examples of agentic branching is the map-reduce pattern for large migrations. Imagine a codebase with thousands of files that needs to move from one testing framework to another. A single agent handling the entire task in one context window would quickly become overloaded.

A better workflow looks like this: the main agent explores the project, creates a migration plan, generates a task list, and assigns independent files or directories to sub-agents. Each sub-agent works locally and reports results. A synthesizer merges the successful work. Failures are retried, rerouted, or escalated to a human.

Component	Role	Context Impact
Main orchestrator	Defines the migration plan and global rules	Low
Sub-agent	Converts one file, module, or directory	High
Test runner	Verifies each local change	Medium
Synthesizer	Summarizes completed work and unresolved failures	Medium
Human reviewer	Resolves ambiguous architectural decisions	Low but critical

This prevents a single edge case from blocking the whole migration. If one file has unusual dependencies, only that branch gets stuck. Other sub-agents continue working.
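The map-reduce loop can be sketched in Python. This assumes a hypothetical `migrate_file` callable that launches one sub-agent per file and returns a `Result`; in practice it would wrap whatever agent runner you use. Failed files get a bounded number of retries before being escalated to a human.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    path: str
    ok: bool
    note: str = ""


def run_migration(
    paths: list[str],
    migrate_file: Callable[[str], Result],  # hypothetical: one sub-agent per file
    max_workers: int = 4,
    retries: int = 1,
) -> dict[str, list[str]]:
    """Map independent files to workers, retry failures, then reduce to a summary."""
    pending = list(paths)
    done: list[str] = []
    for _attempt in range(retries + 1):
        if not pending:
            break
        # Map: each file is handled in isolation, so one stuck file blocks only itself.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(migrate_file, pending))
        done += [r.path for r in results if r.ok]
        pending = [r.path for r in results if not r.ok]
    # Reduce: the orchestrator sees only what finished and what needs a human.
    return {"migrated": sorted(done), "escalate": sorted(pending)}
```

The orchestrator never reads a sub-agent's intermediate output; it only receives the final `migrated` and `escalate` lists, which keeps its context low even when thousands of files are in flight.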

The EPIC Workflow: Explore, Plan, Implement, Commit

A strong agentic workflow needs phase separation. The most practical structure is: Explore, Plan, Implement, Commit.
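Phase separation can be enforced in the harness instead of being left to convention. Here is a minimal sketch that assumes hypothetical action names (`read`, `search`, `edit`, `test`, `git`) and an `EpicSession` class of our own. Claude Code's actual Plan Mode is a built-in permission mode, not this code.

```python
from enum import Enum


class Phase(Enum):
    EXPLORE = 1
    PLAN = 2
    IMPLEMENT = 3
    COMMIT = 4


# Hypothetical policy: the kinds of action each phase permits.
ALLOWED = {
    Phase.EXPLORE: {"read", "search"},
    Phase.PLAN: {"read", "search"},
    Phase.IMPLEMENT: {"read", "search", "edit", "test"},
    Phase.COMMIT: {"read", "test", "git"},
}


class EpicSession:
    """Hypothetical session wrapper that gates actions by phase."""

    def __init__(self) -> None:
        self.phase = Phase.EXPLORE
        self.plan_approved = False

    def advance(self) -> Phase:
        """Move to the next phase; implementation requires an approved plan."""
        if self.phase is Phase.PLAN and not self.plan_approved:
            raise PermissionError("plan must be approved before implementation")
        if self.phase is Phase.COMMIT:
            raise ValueError("already in the final phase")
        self.phase = Phase(self.phase.value + 1)
        return self.phase

    def check(self, action: str) -> bool:
        """True if the action is permitted in the current phase."""
        return action in ALLOWED[self.phase]
```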

Phase 1: Explore

The first phase should be read-only. The agent searches the codebase and builds a model of the system before making changes.

During exploration, the agent should identify existing file structure, relevant modules, dependency paths, existing conventions, similar implementations, reusable utilities, test patterns, build commands, linting rules, and known architectural constraints.

The goal is to avoid premature implementation. A good exploration phase answers: how does this codebase already solve this type of problem? That question matters because agentic tools often fail when they invent new patterns instead of following existing ones.

Phase 2: Plan

Plan Mode is the safety layer. Before editing files, the agent proposes a structured implementation plan.

A good plan should include target files, files that should not be modified, step-by-step implementation order, expected tests, possible edge cases, dependencies, rollback strategy, risks, and open questions.

The human should challenge the plan before implementation. Useful prompts at this stage include:

What existing patterns are you following?
Which files will you modify and why?
What are the main failure modes?
What should remain unchanged?
How will you verify the implementation?
Which tests are most relevant?

The goal is not to micromanage the agent. The goal is to align the agent's execution path with the project's architecture.
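A plan is easier to challenge when it has a fixed shape. The sketch below assumes a hypothetical `Plan` structure whose fields follow the plan contents described in this section. It also assumes a `review_plan` helper that lists objections to resolve before approval.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical structured plan produced in Plan Mode."""

    target_files: list[str]
    protected_files: list[str]  # files that must not be modified
    steps: list[str]
    tests: list[str]
    risks: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def review_plan(plan: Plan) -> list[str]:
    """Return objections a human (or reviewer agent) should resolve before approval."""
    problems = []
    overlap = set(plan.target_files) & set(plan.protected_files)
    if overlap:
        problems.append(f"plan edits protected files: {sorted(overlap)}")
    if not plan.steps:
        problems.append("no implementation steps")
    if not plan.tests:
        problems.append("no verification tests named")
    if not plan.risks:
        problems.append("no failure modes identified")
    problems += [f"open question: {q}" for q in plan.open_questions]
    return problems
```

An empty list means the plan is structurally complete. It does not mean the plan is right, which is still the human's call.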

Phase 3: Implement

Once the plan is approved, the agent starts making changes. For small tasks, this can happen in the main session. For large tasks, implementation should be delegated to sub-agents or isolated branches.

During implementation, the agent should make scoped changes, run relevant tests, fix failures, keep a clear task checklist, avoid unrelated refactors, report deviations from the plan, and summarize completed work.

Auto-accept workflows can be useful here, but only when the plan is narrow and the verification loop is strong. The more freedom the agent has, the more important automated checks become.

Phase 4: Commit

The final phase packages the work. The agent should produce a clean diff, a concise summary, a commit message, test results, known limitations, follow-up tasks, and an optional pull request description.

A good final agent report should answer: what changed, why did it change, how was it tested, and what should the reviewer pay attention to?
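One way to make the final report non-optional is to refuse to render a PR description until all four questions are answered. Here is a minimal sketch with a hypothetical `FinalReport` type and section titles of our own choosing:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class FinalReport:
    """Hypothetical end-of-task report; one field per reviewer question."""

    what_changed: str
    why: str
    how_tested: str
    reviewer_focus: str


def render_pr_body(report: FinalReport) -> str:
    """Build a PR description; refuse if any of the four questions is unanswered."""
    sections = [
        ("What changed", report.what_changed),
        ("Why", report.why),
        ("How it was tested", report.how_tested),
        ("Reviewer focus", report.reviewer_focus),
    ]
    missing = [title for title, body in sections if not body.strip()]
    if missing:
        raise ValueError(f"report incomplete: {', '.join(missing)}")
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)
```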

Custom slash commands can automate this stage. In Claude Code, a custom slash command is a Markdown prompt file stored in .claude/commands/. A command like /commit-push-pr could run git status, run tests, format code, generate a commit message, push the branch, open a pull request, and draft a PR summary.

The point is not the command itself. The point is that repeatable development rituals should become reusable agent workflows.

Structural Memory: CLAUDE.md, AGENTS.md, and Project Instructions

Agentic systems perform better when project knowledge is made explicit. The core mechanism for this is persistent memory.

For Claude Code, this often means files like CLAUDE.md, AGENTS.md, MEMORY.md, CLAUDE.local.md, and .claude/rules/*.md. These files act as durable context for the agent.

They reduce the need to repeatedly explain how the project is built, how tests are run, which conventions matter, which files are sensitive, which patterns should be reused, which commands are safe, and how pull requests should be written.

File	Purpose	Typical Contents
`CLAUDE.md`	Codebase context	Tech stack, commands, architecture, conventions
`AGENTS.md`	Process context	Branch naming, PR format, review process
`MEMORY.md`	Session learnings	Repeated corrections, recurring project-specific lessons
`CLAUDE.local.md`	Personal overrides	Local URLs, sandbox credentials, personal shortcuts
`.claude/rules/*.md`	Scoped rules	Path-specific instructions for frontend, backend, tests, docs

The important distinction is between codebase context and process context. CLAUDE.md should describe the project. AGENTS.md should describe how work gets done. This separation keeps the memory clean.

Keep Project Memory Short and Operational

A common mistake is turning CLAUDE.md into a long essay. That creates the same problem agentic workflows are trying to avoid: bloated context. A better CLAUDE.md is short, direct, and operational.

It should include a brief project overview, tech stack, common commands, architecture notes, conventions, testing rules, and files that should not be modified. Good memory files are not inspirational. They are executable context.

Hierarchical Memory Loading

Agentic memory should be scoped. Not every instruction should apply everywhere.

Level	Example	Scope
Global	`~/.claude/CLAUDE.md`	Personal defaults across all projects
Project	`./CLAUDE.md`	Repository-wide conventions
Modular	`.claude/rules/frontend.md`	Specific directories or domains
Local	`CLAUDE.local.md`	Personal machine-specific notes
Session	`MEMORY.md`	Recent corrections and repeated lessons

This matters because frontend rules should not always apply to backend code. Database migration rules should not always apply to UI components. Security-sensitive modules may need stricter instructions than ordinary feature code. The more precisely memory is scoped, the less likely the agent is to apply the wrong rule in the wrong place.
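Path-scoped loading can be modeled as a lookup from a file path to the memory files that apply to it. This is a simplified sketch with a hypothetical scoping table and glob patterns. It illustrates the idea and is not how Claude Code's own loader is implemented.

```python
from __future__ import annotations

from fnmatch import fnmatch

# Hypothetical scoping table: memory file -> path patterns it applies to.
# Note that fnmatch's "*" also matches "/", so "src/frontend/*" covers subdirectories.
MEMORY_LAYERS = [
    ("~/.claude/CLAUDE.md", ["*"]),  # global
    ("./CLAUDE.md", ["*"]),  # project
    (".claude/rules/frontend.md", ["src/frontend/*"]),  # modular
    (".claude/rules/migrations.md", ["db/migrations/*"]),  # modular
    ("CLAUDE.local.md", ["*"]),  # local
    ("MEMORY.md", ["*"]),  # session
]


def memory_for(path: str) -> list[str]:
    """Return the memory files that apply to `path`, broadest scope first."""
    return [
        name
        for name, patterns in MEMORY_LAYERS
        if any(fnmatch(path, pattern) for pattern in patterns)
    ]
```

An edit to a migration file picks up the migration rules but not the frontend rules, which is exactly the narrowing the table above describes.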

Parallelism: Running Multiple Agents at Once

One of the most powerful agentic workflows is parallel development. Instead of using one agent for one task, developers can run multiple sessions simultaneously.

Examples: one agent migrates tests, one agent fixes TypeScript errors, one agent audits authentication logic, one agent writes documentation, one agent investigates flaky tests, one agent prepares a PR summary.

This can create enormous leverage, but only if the environments are isolated. Running multiple agents in the same working directory is risky. They can edit the same file, overwrite each other's work, corrupt local state, break each other's test runs, pollute session history, and create confusing diffs.

The solution is environment isolation.

Git Worktrees for Agent Isolation

Git worktrees allow multiple branches of the same repository to exist in separate directories at the same time. This makes them well-suited for parallel agentic development.

git worktree add ../feature-auth feature/auth
git worktree add ../fix-tests fix/tests
git worktree add ../docs-update docs/update-api

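These commands check out branches that already exist. To create a new branch at the same time, use git worktree add -b <new-branch> <path>. Each worktree can have its own branch, its own terminal, its own Claude session, its own .env, its own test runs, and its own local changes.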

The parallel agent workflow follows these steps:

Create one worktree per independent task.
Open a terminal in each worktree.
Start a separate agent session.
Give each agent a narrow plan.
Run verification inside each worktree.
Review diffs independently.
Merge completed branches back into main.
Remove completed worktrees.

git worktree remove ../feature-auth

For database-heavy applications, each worktree should also have separate local state. That may mean separate SQLite files, separate local schemas, separate test databases, separate ports, or separate .env overrides. Otherwise, agents may interfere with each other through shared infrastructure even if their file systems are isolated.
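Per-worktree isolation can be planned up front. The sketch below assumes a hypothetical naming scheme: `../<task>` directories, ports counting up from 3000, and one SQLite file per task. It also assumes hypothetical `PORT`/`DATABASE_URL` variable names, so substitute whatever your application actually reads. The sketch only builds commands and settings and does not run git.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WorktreeSpec:
    task: str
    branch: str
    path: str
    port: int
    db_file: str

    def add_command(self) -> list[str]:
        # `-b` creates the branch; drop it if the branch already exists.
        return ["git", "worktree", "add", "-b", self.branch, self.path]

    def env_overrides(self) -> str:
        # Hypothetical variable names; the DB path is relative to the worktree.
        return f"PORT={self.port}\nDATABASE_URL=sqlite:///{self.db_file}\n"


def plan_worktrees(tasks: dict[str, str], base_port: int = 3000) -> list[WorktreeSpec]:
    """Give each task (name -> branch) its own directory, branch, port, and SQLite file."""
    specs = []
    for i, (task, branch) in enumerate(sorted(tasks.items()), start=1):
        specs.append(WorktreeSpec(task, branch, f"../{task}", base_port + i, f"{task}.sqlite3"))
    return specs
```

Each `add_command()` list can be passed to `subprocess.run(cmd, check=True)`, and each `env_overrides()` string can be written to that worktree's local `.env` override. Distinct ports and database files keep two agents from colliding through shared infrastructure.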

Extensibility: Bash, Hooks, and the Unix Model

One of the strongest ideas behind Claude Code is that it does not need a custom integration for every task. The existing Unix ecosystem already provides powerful primitives.

Task	Tools
Search	`grep`, `rg`, `find`
File inspection	`cat`, `less`, `head`, `tail`
Data manipulation	`sed`, `awk`, `jq`
Git workflow	`git status`, `git diff`, `git log`
Testing	`npm test`, `pytest`, `go test`, `cargo test`
Formatting	`prettier`, `black`, `gofmt`, `rustfmt`
Linting	`eslint`, `ruff`, `clippy`
Web access	`curl`, fetch tools, docs search
Build verification	`npm run build`, `make`, project-specific scripts

This makes the agent more useful because it can interact with real project tools. The agent is not just predicting code. It is operating inside the development environment.

Hooks: Automating the Verification Loop

Hooks are scripts that run at specific points in the agent workflow. They are critical because they turn "agent output" into "verified output."

Post-edit formatting hooks can automatically run bun run format or npm run lint -- --fix after the agent edits a file. In Claude Code, this is a PostToolUse hook matched to the file-editing tools. It prevents style errors from accumulating.

Post-task test verification hooks run npm test, pytest, or go test ./... when the agent thinks it is done (the Stop event in Claude Code). If tests fail, the hook can block the stop and return the failure output to the agent, so it keeps fixing the issue rather than stopping prematurely.

This creates a tighter autonomy loop: edit → format → test → fix → retest → summarize. The human should receive work only when it reaches a known-good state.
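A Stop-event verification hook might look like this. It assumes Claude Code's hook contract as we understand it: a JSON payload on stdin containing `stop_hook_active`, and exit code 2 blocking the stop and feeding stderr back to the agent. It also assumes that `npm test` is the project's test command. Check the current hooks documentation before relying on it. The decision logic is kept separate from I/O so it can be tested without running npm.

```python
"""Sketch of a verification hook for Claude Code's Stop event.

Assumptions (verify against current docs): JSON payload on stdin,
`stop_hook_active` set when the agent is already continuing because of a
stop hook, and exit code 2 meaning "block the stop, show stderr to the agent".
"""
from __future__ import annotations

import json
import subprocess
import sys
from typing import Callable


def run_tests() -> tuple[int, str]:
    # Assumed project test command; swap in pytest, go test ./..., etc.
    proc = subprocess.run(["npm", "test"], capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def decide(payload: dict, tests: Callable[[], tuple[int, str]]) -> tuple[int, str]:
    """Return (exit_code, message) for the hook."""
    if payload.get("stop_hook_active"):
        # Already continuing because of this hook: let the agent stop to avoid loops.
        return 0, ""
    code, output = tests()
    if code == 0:
        return 0, ""
    tail = output[-2000:]  # keep feedback short; the full log stays out of context
    return 2, f"Tests failed. Fix them before finishing:\n{tail}"


def main() -> None:
    exit_code, message = decide(json.load(sys.stdin), run_tests)
    if message:
        print(message, file=sys.stderr)
    sys.exit(exit_code)
```

The script registered as the hook would call `main()`. Honoring `stop_hook_active` limits the hook to one forced continuation per stop, which prevents an endless fix-and-fail loop.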

Opponent Agents and Adversarial Review

One of the more advanced patterns is adversarial agent review. Instead of trusting one agent's output, you can assign another agent to attack it.

Agent A searches for possible security vulnerabilities. Agent B checks whether those findings are false positives. Agent C tries to reproduce the issue. Agent D writes the final risk report.

This is useful because single-agent outputs often sound more confident than they should. Opponent agents create structured skepticism.

Agent	Role
Scout agent	Searches for possible vulnerabilities
Critic agent	Challenges each finding
Reproduction agent	Attempts to verify exploitability
Patch agent	Implements fixes
Review agent	Checks the patch for regressions

This is especially useful for security reviews, API correctness, performance optimization, migration validation, compliance-sensitive changes, and incident response. The goal is not to create more AI output. The goal is to produce better-filtered output.
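The scout, critic, and reproduction roles can be wired together as a filter. In this sketch each role is a hypothetical callable standing in for a separate agent session. Only findings that survive the critic and are reproduced count as confirmed.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    id: str
    description: str


def adversarial_review(
    scout: Callable[[], list[Finding]],  # hypothetical: proposes possible issues
    critic: Callable[[Finding], bool],  # hypothetical: True if the finding survives challenge
    reproduce: Callable[[Finding], bool],  # hypothetical: True if the issue is reproduced
) -> dict[str, list[str]]:
    """Filter scout output through independent skeptics before anything is reported."""
    report: dict[str, list[str]] = {"confirmed": [], "unreproduced": [], "rejected": []}
    for finding in scout():
        if not critic(finding):
            report["rejected"].append(finding.id)
        elif reproduce(finding):
            report["confirmed"].append(finding.id)
        else:
            report["unreproduced"].append(finding.id)
    return report
```

Keeping `unreproduced` separate from `rejected` preserves plausible but unverified findings for a human, rather than silently dropping them or overstating them.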

Safety: The Agentic Harness

Giving an agent shell access, file-write access, and tool access creates real risk. The model itself is only one part of the system. The more important layer is the harness: the software environment that controls what the model can do.

A good harness provides permission checks, tool restrictions, filesystem boundaries, network controls, sandboxing, logging, revert mechanisms, and human approval gates.

Mode	Description	Risk Level
Human approval	Requires confirmation before actions	Low
Plan mode	Read-only exploration and planning	Low
Auto mode	Allows low-risk actions automatically	Medium
Sandbox mode	Restricts filesystem or network access	Low to medium
Skip permissions	Allows actions without prompts	High

The safest default is deny-first. The agent should earn autonomy through constraints, not receive unlimited access by default.
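Deny-first can be expressed as an ordered rule check. The rule lists below are hypothetical examples in the spirit of allow/ask/deny permission settings, not Claude Code's exact rule syntax:

```python
from __future__ import annotations

from fnmatch import fnmatch

# Hypothetical rule lists; patterns are shell-style globs over the full command.
DENY = ["rm -rf *", "git push --force*", "curl *|*sh"]
ASK = ["git push*", "npm publish*"]
ALLOW = ["git status", "git diff*", "npm test*", "pytest*", "rg *"]


def permission_for(command: str) -> str:
    """Deny-first: explicit denies win, then asks, then allows; anything else is denied."""
    if any(fnmatch(command, p) for p in DENY):
        return "deny"
    if any(fnmatch(command, p) for p in ASK):
        return "ask"
    if any(fnmatch(command, p) for p in ALLOW):
        return "allow"
    return "deny"  # not explicitly allowed, so not allowed
```

String matching like this is easy to evade. For example, `pytest && rm -rf ~` matches the `pytest*` allow rule. That is why the harness also needs sandboxing and filesystem boundaries rather than rule lists alone.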

Failure Erasure and Disposable Reasoning

A subtle but important idea in agentic workflows is that failed reasoning should be disposable. If a sub-agent enters a loop, hallucinates a false cause, or pursues an invalid fix, that branch should not contaminate the main session.

The system should be able to stop the branch, discard the failed attempt, preserve only useful diagnostics, return to a clean planning state, and try a different approach.

This is the reasoning equivalent of deleting a bad Git branch. It helps prevent the agent from building future decisions on top of flawed assumptions.

Preserve verified results. Discard noisy reasoning.
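Disposable reasoning can be sketched this way: run each approach in a fresh scratch context and let only a one-line diagnostic reach the core. The `Attempt` signature and all names here are hypothetical:

```python
from __future__ import annotations

from typing import Callable

# Hypothetical: an attempt gets a fresh scratch context and returns
# (success, result, one-line diagnostic).
Attempt = Callable[[list[str]], tuple[bool, str, str]]


def try_approaches(plan: str, approaches: dict[str, Attempt]) -> tuple[str | None, list[str]]:
    """Run approaches in isolated scratch contexts; keep results and short diagnostics only."""
    core_notes: list[str] = []  # the only thing the main session sees
    for name, attempt in approaches.items():
        scratch = [plan]  # fresh branch context: the plan, no earlier failures
        ok, result, diagnostic = attempt(scratch)
        if ok:
            core_notes.append(f"{name}: succeeded")
            return result, core_notes
        # Discard `scratch` (logs, bad hypotheses); preserve one useful line.
        core_notes.append(f"{name}: failed ({diagnostic})")
    return None, core_notes
```

Each approach starts from the plan alone, so a second attempt cannot inherit the first attempt's wrong hypothesis. It sees only the short diagnostics the core chose to keep.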

Industrial Impact: Why This Matters

The reported productivity gains from agentic workflows are significant because they come from parallelism, not just faster typing. The developer is no longer limited to one implementation thread at a time. They can supervise multiple workstreams.

Organization	Reported Use Case	Claimed Result
Stripe	Large Scala-to-Java migration	10,000-line migration in four days
Wiz	Python-to-Go library migration	50,000-line migration in around 20 hours
Rakuten	Feature delivery acceleration	Reduced delivery time from 24 working days to 5
Ramp	Incident response	Reduced investigation time by 80%

These are reported claims rather than independent benchmarks, so the exact figures deserve caution, but the pattern is credible: agentic productivity comes from delegating bounded workstreams, not from asking one model to do everything.

The Developer's New Role

Agentic engineering does not remove the developer. It changes the developer's job.

The developer becomes responsible for defining the target architecture, writing clear constraints, reviewing plans, designing verification loops, managing context, isolating environments, auditing outputs, deciding which branches to merge, and deciding which branches to delete.

The core skill is no longer just syntax production. It is systems orchestration. A good agentic engineer knows how to answer: what should the agent know, what should the agent ignore, what can be delegated, what must be reviewed, what can safely run in parallel, what should be isolated, and what must be verified before merge?

This is closer to being a technical lead than a line-by-line implementer.

Practical Checklist for Agentic Engineering

Before starting

Define the task clearly.
Identify the relevant project area.
Open or update CLAUDE.md.
Confirm test and build commands.
Decide whether the task needs a separate branch or worktree.

During exploration

Keep the agent read-only.
Ask it to find existing patterns.
Ask it to identify files likely to change.
Ask it to summarize architecture before coding.

During planning

Require a step-by-step plan.
Ask for risk areas.
List files or directories that must not change.
Ask which tests will verify success.
Challenge unnecessary abstractions.

During implementation

Use narrow tasks.
Prefer sub-agents for large work.
Run tests frequently.
Keep unrelated refactors out of scope.
Stop branches that become noisy or confused.

During review

Inspect the diff.
Check test output.
Ask another agent to critique the work.
Verify edge cases manually.
Merge only clean, explainable changes.

After completion

Update memory if the agent learned something useful.
Remove stale worktrees.
Delete failed branches.
Capture reusable commands.
Improve hooks or scripts for next time.

Conclusion: Architecture Beats Prompting

The biggest lesson from Anthropic-style agentic workflows is that success does not come from a single perfect prompt. It comes from engineering the environment around the agent.

The strongest workflows combine clean context, explicit memory, planning before implementation, isolated branches, parallel agents, automated verification, human review, and disposable failure paths.

This is the real architecture of agentic engineering. The agent can write code, run commands, and debug failures, but the developer still owns the system. The best results come when the human provides structure and the agent provides execution.

In that model, software engineering becomes less about manually producing every line of syntax and more about designing reliable loops of delegated work.

The future developer is not just a coder. The future developer is an orchestrator of agents, contexts, branches, tools, and verification systems.