Anti-Patterns

Most agent literature documents what to build. This page documents what to recognize when it has already been built wrong. The three failure modes below were extracted from forensic reading of six open-source agent projects with non-trivial usage. The projects are not named; the code shapes are. If your codebase has the shape, you have the failure mode regardless of which project you started from.

1. Prompted Architecture

Definition: When a feature’s load-bearing logic lives in a prompt template rather than in code, the feature inherits the LLM’s reliability (~90%) rather than the code’s (~100%).

Symptoms:

A system prompt or “agent instructions” file longer than ~150 lines doing coordination or control flow.
Comments inside prompts like CRITICAL — avoid teammate timeouts or you MUST sequence X before Y.
Multi-agent “team mode” features whose orchestration logic is a multi-hundred-line system prompt.
Benchmark improvements driven by an installed instruction file telling the agent which tools to use — not by the tools themselves being better.
Behavior contracts described in narrative prose rather than enforced by hooks or permissions.

Evidence from the field: Forensic analysis of one widely-starred multi-agent platform found a 267-line “leader prompt” containing the comment

## Sequencing Dependent Work (CRITICAL — avoid teammate timeouts)

instructing the model to manually poll for completion before dispatching a dependent task. That is not orchestration — that is a workaround dressed as a feature. A separate, popular code-intelligence tool achieves its headline performance number (“94% fewer tool calls”) by installing a CLAUDE.md fragment into the user’s project that tells the agent not to grep.

The honest version: Both projects ship real engineering underneath. The pattern isn’t “prompts are bad.” It’s that describing what the LLM should do is not the same as making the LLM do it. Prompted Architecture is when the description IS the architecture.

Fix:

If the prompt contains the word CRITICAL, MUST, or NEVER followed by a race condition or a timeout, that logic belongs in code.
Use hooks (pre-tool, post-tool) for enforcement. Use prompts for intent. See Three-Tier Enforcement and Lifecycle Hooks.
Move sequencing into a scheduler. Move policy into permissions. Leave the prompt to say what we’re trying to do, not how the model must do it.

2. Vector-Default Memory

Definition: Making vector retrieval the primary memory mechanism without an integration layer above it. The system can recall that something happened but cannot reason about what it means.

Symptoms:

A “memory” subsystem whose central abstraction is an embedding store.
Multiple “memory types” or “memory stores” whose database schemas are 60-80% identical — one polymorphic table wearing several labels.
An LLM call per observation at ingest time, with no batching or job queue.
No rerank pass between vector retrieval and prompt assembly.
“Privacy-first” marketing on a system that has no local-LLM path because the LLM is on the ingest path.

Evidence from the field: Forensic analysis of one production memory framework found six declared memory stores (episodic, semantic, procedural, resource, knowledge_vault, core) whose ORM schemas shared ~70% of their columns. The “router” that supposedly dispatches observations to the right store was a hardcoded line:

return await self.agents["meta_memory_agent"].step(...)

Every observation cost ~2 LLM calls plus 2 embedding calls. The system requires a cloud LLM key to ingest at all — but is marketed as privacy-first.

The architectural failure: The LLM is good at integrating a small amount of well-curated text. It is bad at integrating a large number of approximate-nearest-neighbor fragments. Vector retrieval can find relevant fragments; only summarization can integrate them. A memory system whose primary mechanism is retrieval returns a thousand fragments. A memory system whose primary mechanism is hierarchical summarization returns one paragraph that captures their meaning.

Fix:

Hierarchy first. Vectors are an index over the hierarchy, not the substrate.
See Memory for the corrected three-tier architecture, and Context Management for the budget model that drives it.
If your memory schema has six tables that share 70% of their columns, you have one table with a type enum.

3. Premature Distribution

Definition: Adopting distributed-systems infrastructure (Kafka, distributed queues, multi-service orchestration) for a workload that runs in a single process. The operational cost is paid upfront; the architectural benefit is never realized.

Symptoms:

A docker-compose.yml wiring Postgres + Redis + Kafka + agent backend + dashboard, where the actual workload is one user’s local agent.
Async message-bus abstractions with topics, partitions, and consumer groups — used by exactly one producer and one consumer.
“Scalable to millions of users” framing on a desktop product.
Operational complexity that requires a 200-line README to start the system locally.

Evidence from the field: Forensic analysis of one production memory framework found Kafka wired into the compose file with consumer-group configuration:

kafka:
  image: confluentinc/cp-kafka
  environment:
    KAFKA_GROUP_ID: agent-events

for a workload that, in the code, is a single asyncio task pulling from a single producer. A separate personal-agent project solved the same decoupling problem with two asyncio.Queue objects totaling ~40 lines of code.

The honest version: Distribution is correct when you need it. A team that has hit the limits of single-process async, has multiple producers writing into the same logical stream, and has the operational maturity to run a broker — that team should adopt Kafka. A team building a personal agent that runs on a developer’s laptop should not.

Fix:

The lightest bus that buys decoupling, no more.
A two-queue MessageBus is enough for most agent workloads. See Multi-Agent Coordination and Dev Lifecycle.
Adopt a real broker only when you can name three properties of it you actively need — durability, fan-out, replay, ordering, partitioning — and your current substrate provably can’t deliver them.

4. Compaction-Vulnerable State

Definition: Storing long-running state (goals, identity, active task pointers, user constraints) inside conversation history, where compaction can summarize or delete it.

Symptoms:

Long-horizon goals expressed as user messages (“Your job is to migrate the auth module to OAuth2…”) with no other storage.
Agents that “drift” from their goal around turn 30-60.
“I forgot why we were doing this” failures after a long session.
No session.metadata use; everything lives in the message array.

Evidence from the field: Forensic analysis of one production personal-agent codebase showed long-running goal state stored in session.metadata[GOAL_STATE_KEY] and re-injected into the Runtime Context block at every turn via a goal_state_runtime_lines() function. Because compaction operates on message history but never touches session metadata, the goal survived every compaction pass. A different project in the same category stored the goal as the first user message — and observed agents drifting from it after 40 turns once compaction reduced the early history to a single summary line.

Fix:

Out-of-history state. Goals, identity, active task pointers, standing constraints belong in session metadata.
Re-inject metadata into the system prompt every turn via a runtime context block.
The metadata is what compaction reads from, not what it operates on. See Memory → Compaction-Resident State.

5. Ungated Background Work

Definition: Idle-time or scheduled LLM work that runs without checking machine state (battery, CPU, memory, network). Common in “subconscious” or “auto-improve” features on laptops.

Symptoms:

Background agents that fire on a fixed cron interval regardless of laptop power state.
Multiple local LLM processes spawning concurrently with no concurrency limit.
Saturated memory crashes traced to a “helpful” background task running mid-call.
No documented signals the scheduler reads before firing.

Evidence from the field: Forensic analysis of one full-stack personal-AI project’s scheduler module included an explicit decision log: “saturated memory from concurrent Ollama calls has crashed the user’s laptop twice.” The fix was a scheduler gate that reads battery state, CPU usage, and a model-saturation semaphore before allowing background LLM work to proceed. Policies: Aggressive, Normal, Throttled, Paused. Most agent platforms have no equivalent.

Fix:

Gate background work on real signals: battery state (charging vs discharging, charge %), CPU load, available memory, network status.
Bound local-model concurrency with a semaphore tuned to your RAM ceiling (often slot count = 1).
Make the policy explicit: Aggressive on AC + idle, Normal on AC, Throttled on battery, Paused below a charge threshold.
Treat machine state as a first-class input to scheduling, not an afterthought.

Detecting These in Your Own Work

Read your longest prompt. If it contains words like CRITICAL, MUST, or NEVER followed by what looks like a control-flow instruction or a race-condition workaround, you have Prompted Architecture. The fix is to move that sentence into code and delete it from the prompt — if behavior gets worse, the prompt was carrying weight it should never have carried.

Look at your memory schema. If you have three or more “memory types” whose columns substantially overlap, you have Vector-Default Memory — or its cousin, Typed-But-Identical Memory. A schema that names six things and stores one thing is telling you the taxonomy was invented before the data.

Run your local dev environment. If it takes more than 60 seconds and requires more than two processes, you may have Premature Distribution. Then look at how many of those processes have more than one producer or one consumer in practice — that’s the real distribution-worthiness check. A broker with one writer and one reader is a queue with a network hop.

Search your codebase for session.metadata or its equivalent. If you have none and your agent has long-running goals or identity, you have Compaction-Vulnerable State. A goal that lives only in a user message will be compacted away — measure it: after 50 turns of natural conversation, ask the agent what its current goal is.

Look at your background-task scheduler. If it fires on wall-clock alone with no input from battery, CPU, or memory state, you have Ungated Background Work. The fix is rarely complex — it’s usually 30 lines of signal-reading code in front of the existing job loop.