AI Agent Architecture Has Layers — You Can't Skip Them

Ogres are like onions. So are AI agents.

The line gets funnier the more time you spend watching people try to build production AI systems by starting at layer four with no layers underneath it. And then watching those systems fall apart, usually in some creative new way, because the foundation was never there.

Agent architecture is a layered thing. The layers are real, the dependencies between them are real, and you cannot skip them just because the business wants agents doing complex reasoning this quarter.

Here is how I think about the stack.

Layer 1 — Deterministic primitives

The bottom of the stack is not AI. It is code. Shell scripts, API calls, file I/O, database reads, well-tested functions that do exactly one thing and have predictable behavior. These are your primitives.

An agent that cannot reliably run a shell command, read a file, or hit an API is not a useful agent. It is a probability machine connected to the world in ways you do not control. The primitives have to be solid before anything layered on top of them is trustworthy.

This sounds obvious. I promise you it is not. I have watched teams build elaborate agentic pipelines on top of file operations that had no error handling, API calls with no retry logic, and data reads with no validation. When one of those agents misbehaves, the failure is in layer one. But because layer one is not visible in the reasoning trace, it takes hours to find.

Get your primitives right. Test them as primitives. Know what they do when they fail.
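Here is a minimal sketch of what "solid" can look like for one primitive, assuming a JSON config file as the thing being read. The names PrimitiveError, read_json_config, and with_retry are hypothetical, invented for illustration. The point is only that every failure mode has a name and the retry policy is explicit.

```python
import json
import time
from pathlib import Path


class PrimitiveError(Exception):
    """Hypothetical error type: a primitive failed in a known, documented way."""


def read_json_config(path, required_keys):
    """Read a JSON object from disk and validate its shape, or raise PrimitiveError."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except FileNotFoundError as exc:
        raise PrimitiveError(f"missing file: {path}") from exc
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        raise PrimitiveError(f"unreadable JSON in {path}: {exc}") from exc
    if not isinstance(data, dict):
        raise PrimitiveError(f"{path} must contain a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise PrimitiveError(f"{path} is missing keys: {missing}")
    return data


def with_retry(func, attempts=3, base_delay=0.1, retry_on=(OSError,), sleep=time.sleep):
    """Call func(), retrying with exponential backoff on the listed exceptions."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                raise PrimitiveError(f"gave up after {attempts} attempts: {exc}") from exc
            sleep(base_delay * 2 ** (attempt - 1))
```

The sleep parameter is injectable so the retry behavior can be tested without waiting. Whether OSError is the right thing to retry on depends on the primitive; it is a placeholder here.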

Layer 2 — Single-call LLM

The next layer up is a language model making a single call with deterministic input and deterministic output parsing. Not a conversation. Not a chain. One call. Prompt in, structured output out.

This is where most "AI-powered" tools actually live, and that is fine. A lot of genuinely useful things are a well-crafted prompt hitting a capable model and parsing the result. Classification, extraction, summarization, routing. One call. Reliable. Testable.

If you cannot write a test that confirms your layer 2 call returns the structure you expect at least 95% of the time on a fixed evaluation set, you are not ready to go to layer 3. You need more prompt work, better output parsing, or a different model. The single-call layer needs to be solid, because everything above it calls it repeatedly.
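A test like that can be sketched in a few lines. Everything below is an assumption for illustration: the label set, the fake_model stand-in (a real version would call your model), and the twenty-item evaluation set. The 0.95 threshold is the one from the paragraph above.

```python
import json

ALLOWED_LABELS = {"billing", "bug", "other"}  # hypothetical label set


def parse_label(raw):
    """Deterministically parse model output; return a label or None if malformed."""
    try:
        obj = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    if not isinstance(obj, dict):
        return None
    label = obj.get("label")
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return None


def structure_pass_rate(call_model, eval_set):
    """Fraction of eval prompts whose output parses into the expected structure."""
    passed = sum(1 for prompt in eval_set if parse_label(call_model(prompt)) is not None)
    return passed / len(eval_set)


def fake_model(prompt):
    """Stand-in for a real model call; returns chatty, unparseable text for one prompt."""
    if "garbled" in prompt:
        return "Sure! The label is billing."
    return json.dumps({"label": "billing"})


# Fixed evaluation set: 19 ordinary prompts plus one that trips the stand-in model.
EVAL_SET = [f"ticket {i}" for i in range(19)] + ["garbled ticket"]

rate = structure_pass_rate(fake_model, EVAL_SET)
print(f"structure pass rate: {rate:.0%}")
assert rate >= 0.95, "layer 2 is not ready to build on"
```

This only checks structure, not whether the label is correct. A real evaluation set would probably want both, tracked separately.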

Layer 3 — Tool-using agent

Now the model has tools. It can call things. It decides, dynamically, which primitive from layer one to invoke, when to invoke it, and what to do with the result. You have a loop. The model runs, picks a tool, runs the tool, sees the result, decides what to do next.

This is where agent behavior starts to feel emergent. And emergence is exciting until it emerges wrong at 2am in a production system and you cannot reproduce it.

The discipline at layer three is the same as at layers one and two. Every tool the agent can call should be a layer-one primitive: tested, reliable, with well-defined failure behavior. Every time the agent reaches a decision point, you should be able to read the trace and understand why it made the choice it made. Agents that are opaque at layer three will be catastrophically opaque at layer four.

Log everything. Every tool call, every model response, every intermediate state. You are debugging a probabilistic system. The only way to do that is to have the full trace.
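As a rough sketch of the loop and the trace together: the code below runs a decide, call tool, observe cycle and records every model response and tool result. The scripted_decide function is a stand-in for the model, and the word_count tool, the action format, and the trace entry shapes are all assumptions made up for this example.

```python
import json


def word_count(text):
    """A layer-one primitive: deterministic and trivially testable."""
    return len(text.split())


TOOLS = {"word_count": word_count}


def run_agent(decide, task, max_steps=5):
    """Run a decide -> tool -> observe loop, recording every step in a trace."""
    trace = [{"event": "task", "task": task}]
    observation = None
    for step in range(max_steps):
        action = decide(task, observation)
        trace.append({"event": "model_response", "step": step, "action": action})
        if action["type"] == "final":
            return action["answer"], trace
        tool = TOOLS.get(action["tool"])
        if tool is None:
            observation = {"error": f"unknown tool {action['tool']}"}
        else:
            try:
                observation = {"result": tool(**action["args"])}
            except Exception as exc:  # tool failures become observations, not crashes
                observation = {"error": repr(exc)}
        trace.append({"event": "tool_result", "step": step,
                      "tool": action["tool"], "observation": observation})
    trace.append({"event": "gave_up", "steps": max_steps})
    return None, trace


def scripted_decide(task, observation):
    """Stand-in for the model: call one tool, then answer from its result."""
    if observation is None:
        return {"type": "tool", "tool": "word_count", "args": {"text": task}}
    return {"type": "final", "answer": observation.get("result")}


answer, trace = run_agent(scripted_decide, "count the words in this sentence")
for entry in trace:
    print(json.dumps(entry))
```

Because the trace is plain data, it can be written to disk and replayed later, which is most of what you need when the 2am failure refuses to reproduce.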

Layer 4 — Multi-agent orchestration

At layer four, agents are spawning agents. One agent breaks a problem down and routes subtasks to other agents. Agents review each other's work. An orchestrator decides which specialist gets which piece of the problem.

This is where the leverage is enormous and the failure modes are subtle. When a multi-agent system misbehaves, the error may originate in a sub-agent that was called by an orchestrator that was itself called by a top-level planner. The trace is three levels deep. If you have not built layer three well, layer four is a debugging nightmare.
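One way to keep a three-level trace readable is to give every agent invocation an id and a pointer to whoever called it, so a failure can be walked back to the top-level planner. The sketch below assumes a flat in-memory list of spans; record_span, path_to_root, and the agent names are hypothetical.

```python
import itertools

_ids = itertools.count(1)
SPANS = []  # flat list of every agent invocation, in call order


def record_span(agent, parent_id, status, detail=""):
    """Record one agent invocation and return its id for use as a parent."""
    span = {"id": next(_ids), "parent": parent_id, "agent": agent,
            "status": status, "detail": detail}
    SPANS.append(span)
    return span["id"]


def path_to_root(span_id):
    """Return the chain of agent names from the top-level caller down to span_id."""
    by_id = {span["id"]: span for span in SPANS}
    path = []
    while span_id is not None:
        span = by_id[span_id]
        path.append(span["agent"])
        span_id = span["parent"]
    return list(reversed(path))


# A planner calls an orchestrator, which calls two specialists; one of them fails.
planner = record_span("planner", None, "ok")
orchestrator = record_span("orchestrator", planner, "ok")
record_span("summarizer", orchestrator, "ok")
record_span("extractor", orchestrator, "error", "tool returned malformed JSON")

for span in SPANS:
    if span["status"] == "error":
        print(" -> ".join(path_to_root(span["id"])), "|", span["detail"])
```

A production system would more likely lean on an existing tracing library than on a global list, but the parent pointer is the part that matters.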

The teams who reliably build effective multi-agent systems share one property: they have spent time at each prior layer. They understand how their models behave at a single call, they understand how their tool-using agents behave under load, and they have the tracing and observability to see through the whole stack.

The teams who build fragile multi-agent systems also have a common property: they built layer four first, because layer four is the exciting part, and they are now working backward to figure out why things break.

Layer 5 — Reflection and meta-reasoning

The top of the stack, and the one most people have not gotten to yet in any serious production sense, is an agent that can reason about its own behavior. Look at a task it completed. Evaluate whether it did it well. Update its approach for next time.

This is not magic. It is still the same model calling the same tools. But the meta-layer is now part of the loop: the agent's self-evaluation shapes its future behavior. The reinforcement signal comes from within the system, not just from external outcomes.

The reason this works when it works is that layers one through four are solid underneath it. Self-reflection is only useful if the reflective agent has accurate tools for evaluating its own outputs. If the primitives are unreliable, the evaluation is unreliable, and the meta-reasoning trains the system on noise.
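A toy version of that loop, to make the dependency concrete. The evaluator here is a deterministic word-count check, and fake_summarize is a stand-in for a model call that happens to honor whatever lessons it is handed. Both are assumptions for illustration, as is the lessons dictionary itself.

```python
def evaluate(summary, max_words=8):
    """Deterministic evaluator the reflection step depends on; returns lessons."""
    lessons = {}
    if len(summary.split()) > max_words:
        lessons["max_words"] = max_words
    return lessons


def fake_summarize(text, lessons):
    """Stand-in for a model call that honors lessons from earlier attempts."""
    words = text.split()
    limit = lessons.get("max_words", len(words))
    return " ".join(words[:limit])


def run_with_reflection(text, max_rounds=3):
    """Attempt, self-evaluate, carry lessons into the next attempt."""
    lessons = {}
    history = []
    for _ in range(max_rounds):
        summary = fake_summarize(text, lessons)
        new_lessons = evaluate(summary)
        history.append({"summary": summary, "lessons": dict(new_lessons)})
        if not new_lessons:
            break  # the evaluator found nothing to fix
        lessons.update(new_lessons)
    return summary, history


TEXT = "the quarterly report shows revenue grew while costs stayed flat across all regions"
summary, history = run_with_reflection(TEXT)
print(summary)
print(len(history), "rounds")
```

Swap evaluate for a check that miscounts words and the loop will confidently converge on the wrong behavior, which is the failure described above.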

Why this order matters

The layers are a dependency graph, not a preference. Layer 4 calls layer 3. Layer 3 calls layer 2 and layer 1. Layer 5 evaluates all of them. You cannot have a reliable orchestration layer if the single-call layer is flaky, because the orchestration runs that layer dozens of times per task.
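The arithmetic behind that is worth seeing once. The sketch below assumes each call fails independently, which real systems only approximate, and borrows the 95% figure from the layer-two threshold. Under those assumptions, two dozen calls all succeed only about 29% of the time.

```python
def all_calls_succeed(per_call_success, calls):
    """Probability that every call succeeds, assuming independent failures."""
    return per_call_success ** calls


for calls in (1, 12, 24, 48):
    probability = all_calls_succeed(0.95, calls)
    print(f"{calls:>2} calls: {probability:.1%} chance that none of them fail")
```

Retries and validation at the lower layers change these numbers a lot, which is the argument for building them first.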

The temptation to start at layer four is real. Layer four is where the research is. Layer four is what makes for a good demo. Layer four is what the business is asking for.

But the demo hides the foundations. The production system exposes them. Every layer you skipped becomes a liability at scale.

Start at layer one. Build up. When layer N is solid, layer N+1 has something real to stand on. That is not a conservative approach to AI development. That is the fast approach, because you are building something that compounds. Each solid layer multiplies the effectiveness of the one above it.

The onion has layers for a reason.

— AK-mee Engineering