Why AI agent and chatbot programs turn into data plumbing—and the architectural choice that determines whether you ship solutions or infrastructure.
TL;DR
Most AI agent programs become data plumbing projects because business context is rebuilt on every request. Stitching data together across systems creates inconsistency, slows iteration, and breaks as workflows span multiple steps.
This approach can work for simple chatbot interactions. It doesn’t hold when systems need to maintain state, chain decisions, and take actions reliably over time.
The shift is architectural: context must be persistent. Pipelines reconstruct context every time. A persistent layer keeps it consistent, current, and ready to use.
Every agent program I’ve reviewed in the last two years starts with the same optimism and ends with the same question from the CTO or CPO:
“Why is 80% of our effort going into data plumbing instead of the business outcome?”
– Ravi Marwaha, Chief Operating Officer and Chief Product & Technology Officer, Arango
The answer is almost always architectural. Teams treat business context as something to be assembled on the way to the model rather than something the system is built on. That single framing choice is what determines whether the program delivers business outcomes or becomes a permanent integration project.
This is a data architecture problem, and it shows up at exactly the moment the program matters most — when the POC works, the executives are excited, and the team commits to “productionizing.”
What teams are actually building when they say they’re “building an agent”
On paper, the plan is: connect a model to our data, give it some tools, ship a chatbot, then evolve it into an agent.
Between weeks 6 and 12, teams aren’t building “an agent”—they’re building 12 interdependent systems in parallel, listed below. That interdependence is the insight: it’s why the build turns into an infrastructure project.
Agent Infrastructure: The Hidden Build — Weeks 1–14
- A retrieval layer (usually a vector DB)
- A relationship layer (graph or joins across relational stores)
- A keyword/lexical layer (BM25/full-text, often bolted on separately)
- An entity resolution and canonicalization layer
- A metadata and filter layer (tenancy, ACLs, recency, jurisdiction)
- A chunking + embedding pipeline, with re-embedding on drift
- A sync/CDC layer to keep all of the above coherent with systems of record
- A caching layer so latency is acceptable
- An evaluation and lineage layer so you can explain why the model said what it said
- A state/memory layer for multi-turn, multi-step interactions
- A tool/skill registry with permissions and routing
- An orchestration layer to tie it together
Each layer triggers the next. None of this was in the original scope doc. All of it is necessary. And none of it, individually, is the product.
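To make the stitching concrete, here is a minimal sketch of what per-request context assembly looks like once the first few layers exist. Every store below is a hypothetical in-memory stand-in for a separate product (vector DB, BM25 index, graph store, ACL service); the shape of the fan-out is the point, not the names.

```python
"""Sketch of per-request context assembly across stitched systems.

Every store below is an in-memory stand-in for a separate product
(vector DB, BM25 index, graph store, ACL service), each synced on its
own schedule in real deployments.
"""

VECTOR_HITS = {"refund policy": ["doc-12", "doc-40"]}      # vector DB (re-embedded nightly)
KEYWORD_HITS = {"refund policy": ["doc-12", "doc-7"]}      # BM25 index (synced hourly)
GRAPH_EDGES = {"doc-12": ["acct-9"], "doc-7": ["acct-3"]}  # graph store (CDC stream)
ACL = {"alice": {"doc-12", "doc-7", "acct-9"}}             # ACL service (its own cache)

def build_context(query: str, user: str) -> list[str]:
    dense = VECTOR_HITS.get(query, [])                                     # retrieval layer
    fused = list(dict.fromkeys(dense + KEYWORD_HITS.get(query, [])))       # lexical layer + fusion
    expanded = fused + [n for d in fused for n in GRAPH_EDGES.get(d, [])]  # relationship layer
    return [item for item in expanded if item in ACL.get(user, set())]     # governance layer

print(build_context("refund policy", "alice"))  # ['doc-12', 'doc-7', 'acct-9']
```

Four round-trips in miniature, and no single point in time at which those four views of the business were consistent with one another.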
The layer nobody scopes: keeping it current
Freshness isn’t a layer — it’s a continuous obligation that runs underneath every other layer simultaneously. The retrieval index goes stale when a policy changes. The graph goes inconsistent when an account is merged. The entity resolution layer drifts the moment a new data source arrives with slightly different naming conventions. Each requires a different propagation mechanism — streaming for some systems, batch for others, event-driven for others still — and none of those mechanisms owns itself.

The staffing consequence is the part that surprises teams most: keeping a multi-layer retrieval architecture current requires engineers who span data engineering, platform, and application — and when something breaks quietly, none of those teams owns it cleanly.

Users learn faster than you’d expect that the agent “doesn’t always know about recent changes.” Once that reputation sets in, it’s very hard to recover.
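A minimal sketch of what that obligation looks like in code, assuming a hypothetical change-event handler (`ChangeEvent` and the three propagation stubs are illustrative, not any CDC product’s API): one source-of-record event fans out to every derived representation, each on a different mechanism.

```python
"""Sketch: one upstream change, many derived representations to keep coherent.

ChangeEvent and the three handlers below are hypothetical stand-ins,
not a specific CDC product's API.
"""
from dataclasses import dataclass, field

@dataclass
class ChangeEvent:
    entity_id: str
    kind: str              # e.g. "policy_updated", "account_merged"
    payload: dict = field(default_factory=dict)

def reembed_and_upsert(entity_id: str, payload: dict) -> None:
    """Streaming path: re-chunk and re-embed so the retrieval index stays fresh."""

def rewrite_graph_edges(entity_id: str, merged_into: str) -> None:
    """Event-driven path: an account merge rewires edges, or multi-hop queries lie."""

def mark_for_next_resolution_run(entity_id: str) -> None:
    """Batch path: entity resolution rebuilds on a schedule, so it lags by design."""

def on_change(event: ChangeEvent) -> None:
    reembed_and_upsert(event.entity_id, event.payload)
    if event.kind == "account_merged":
        rewrite_graph_edges(event.entity_id, event.payload["merged_into"])
    mark_for_next_resolution_run(event.entity_id)

# One merged account touches three propagation mechanisms, none of which owns itself.
on_change(ChangeEvent("acct-9", "account_merged", {"merged_into": "acct-12"}))
```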
This is the 80% tax. It is paid in engineering hours, in on-call pages, in the gap between “the demo worked” and “we can trust this in production.” The business asked for an agent that helps underwriters, or triages tickets, or answers policy questions. What the team ends up shipping first — if they ship anything at all — is the scaffolding.
AI chatbots gave you an unearned sense of comfort. AI agents will expose it.
AI chatbots could get away with being a bit “loose” with information in a way AI agents can’t. If you asked a chatbot a one-off question, it was usually fine if the information was slightly out of date or pieced together on the fly.
But AI agents are different. They don’t just answer questions—they take actions, make decisions, and carry things forward over time. That means they need accurate, up-to-date, and connected business information at all times. If the underlying context is wrong or incomplete, the agent doesn’t just give a slightly off answer—it can make the wrong decision.
AI agents break that forgiveness in 5 specific ways:
- State. An AI agent needs to know what is true right now — not what was true when you last ran the pipeline. If your business context is reconstructed per call from eventually-consistent pieces, multi-step reasoning will contradict itself inside a single session.
- Tools. Tool use turns read operations into write operations. Now incorrect context has side effects: wrong ticket updated, wrong customer emailed, wrong invoice approved.
- Multi-step. AI agents chain decisions. Small inconsistencies in business context compound across steps. A 2% variance per hop, five hops deep, means roughly one run in ten is built on inconsistent context: a different system each time it runs (see the arithmetic after this list).
- Autonomy. Humans stopped being the final consistency check. Whatever your data layer produces, the agent acts on.
- Evolution. AI systems do not stay still. LLMs, retrieval strategies, data, and business rules all change. Without observability, automated test set creation, and continuous evaluation, the system drifts quietly over time.
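The compounding in the multi-step point is just arithmetic, but it is worth doing once; the numbers below follow the 2%-per-hop example from the list.

```python
# If each hop in a five-step chain has a 2% chance of acting on stale or
# inconsistent context, the chance a full run stays consistent throughout is:
per_hop_ok = 0.98
hops = 5
run_ok = per_hop_ok ** hops
print(f"{run_ok:.3f}")  # 0.904 -> roughly one run in ten diverges somewhere
```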
The architectural shortcuts you could hide behind a chatbot become the defining failure mode of the AI agent. What was once “good enough” context becomes the point of failure—because context isn’t being maintained, it’s being reconstructed.
“An AI agent needs to know what is true right now — not what was true when you last ran the pipeline.”
– Ravi Marwaha, Chief Operating Officer and Chief Product & Technology Officer, Arango
The cross-domain expertise tax nobody priced into the roadmap
Here is the part most roadmaps understate. Building your own context layer isn’t a side quest for a couple of engineers. It requires senior expertise across disciplines that rarely live in the same team:
- Data engineering and Change Data Capture (CDC), to keep sources of truth in sync with derived representations
- Information retrieval, for ranking, re-ranking, and fusion across dense/sparse signals
- Graph modeling and traversal, for entity resolution and multi-hop reasoning
- Embedding selection, evaluation, and lifecycle management
- Distributed systems, to reason about consistency, partitioning, and failure modes
- ML engineering, for evaluation harnesses, drift detection, and feedback loops
- Security and governance, for row/column/attribute-level access, lineage, audit, PII handling
- Platform engineering, to give the rest of the company a stable surface to build on
Most organizations have some of this. Very few have all of it on one team, under one set of priorities, with the authority to make architectural trade-offs across layers. So the work fragments — by team, by tool, by quarter — and the fragmentation bakes itself into the architecture.
You end up with a stack whose shape is determined by your org chart, not by the problem.
Context can’t be specified upfront. Which makes stitching worse, not better.
Here’s the trap that catches even strong teams: business context is inherently iterative.
You don’t know what “relevant business context” means for underwriting until you’ve watched underwriters use the system for a month. You don’t know which relationships matter for fraud investigation until an investigator tells you the third hop is the one that matters. You don’t know what “current” means for a policy agent until a policy changes mid-conversation and the wrong answer costs you.
This means the context layer must be reshaped continuously. New signals, new relationships, new filters, new invariants.
In a bolted-together architecture, every one of those iterations is a coordinated change across three to six systems. New attribute? Update the source schema, update the CDC pipeline, update the embedding pipeline, update the filter domain-specific language (DSL), update the graph model, update the eval harness, re-test. Multiply that by every use case, every domain, every team.
Iteration velocity collapses. Not because any one system is slow, but because the coordination between them is the system.
And the worst part: each use case re-solves the same problem. The fraud team’s graph, the support team’s graph, the compliance team’s graph — all derived from overlapping sources, all separately maintained, all drifting independently. You are paying the context tax N times — in engineering effort, system complexity, and LLM compute costs.
A Note on Context
A contextual data layer is an architecture where relationships, state, and meaning are continuously maintained and directly usable—rather than reconstructed at runtime. This distinction becomes critical as AI systems move from experimentation to production.
What the context layer is actually supposed to do
Strip away the vendor language for a moment. A context layer, done properly, has a short list of hard requirements:
- It represents the current state of the business, continuously. Not snapshots. Not pipeline outputs. State, with defined consistency semantics, governed by known update paths.
- It holds multiple representations of the same thing in one place. Documents, relationships, vectors, time series, full-text — whatever the domain needs — queryable together, not stitched together.
- It answers queries that span those representations without leaving the system. “Find things semantically similar to X, filtered by tenant Y, within two hops of entity Z, updated in the last 24 hours, that the requesting user is allowed to see.” One query. One consistency boundary. (See the sketch after this list.)
- It preserves lineage. Every answer the AI agent gives should be traceable back to the facts that produced it, with timestamps and provenance.
- It enforces governance at the layer, not at the pipeline. ACLs, PII, jurisdiction, retention — applied once, at the context layer, not reimplemented in every agent.
- It exposes one interface to the rest of the organization. Not six. One.
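For illustration, here is what that query surface can look like from the application side. `ContextLayer` and its parameters are hypothetical, a sketch of the interface shape rather than any product’s actual API; the property that matters is that similarity, traversal, tenancy, freshness, and ACLs all resolve inside one system, in one consistency boundary.

```python
"""Hypothetical single-query interface to a context layer.

ContextLayer is an illustrative stub, not a real library; a production
implementation would be backed by a multi-model store.
"""
from datetime import timedelta

class ContextLayer:
    def query(self, **constraints) -> list[dict]:
        # Stub: a real layer evaluates all constraints in one round-trip,
        # under one consistency boundary, with governance applied here.
        return []

ctx = ContextLayer()
results = ctx.query(
    similar_to="customer disputes a duplicate charge",  # dense similarity
    tenant="tenant-y",                                  # metadata filter
    within_hops=(2, "entity-z"),                        # graph constraint
    updated_within=timedelta(hours=24),                 # freshness constraint
    as_user="user-123",                                 # ACLs enforced by the layer
)
```

Contrast this with the fan-out sketch earlier: the same constraints, applied once inside the layer instead of re-implemented in every agent.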
If your current architecture can’t do those things without runtime reconstruction across systems, you don’t have a context layer. You have a pipeline that approximates one, and the approximation is where the cost lives.
Platform, not silo: one context layer, N use cases
This is the CPO/CTO decision that actually matters.
You can treat business context as something each use case builds for itself. That’s the default path — the fraud team builds theirs, the support team builds theirs, the compliance team builds theirs. Every team is re-deriving the same entities, the same relationships, the same embeddings from the same systems of record. Every team is solving access control lists (ACLs), lineage, and freshness on its own. Every team is paying the 80% tax independently.
Or you can treat the context layer as a platform: one layer, governed centrally, serving many domains and many use cases. Individual teams own their domain models and their AI agents. The platform owns the invariants — consistency, lineage, governance, freshness, the multi-representation query surface.
The platform approach is harder to start. It’s dramatically less expensive to scale. And it’s the only approach I’ve seen reach the point where agent programs stop being infrastructure projects and start being solution projects.
The economics are asymmetric. Per-use-case business context costs scale linearly (at best) with the number of use cases. Platform context costs amortize. By the third or fourth agent, the platform team is shipping capabilities the domain teams didn’t even have to ask for — a new relationship type, a new filter, a new data source — and every AI agent benefits at once.
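The asymmetry is easy to put rough numbers on. Everything below is an assumption for illustration, not a benchmark; only the shape of the two curves matters.

```python
# Illustrative first-year cost model (engineer-months); all figures assumed.
BUILD_PER_USE_CASE = 10    # stitch a context stack for one agent
MAINTAIN_PER_USE_CASE = 4  # keep that one stack current for a year
PLATFORM_BUILD = 24        # stand up a shared context layer
ONBOARD_ON_PLATFORM = 2    # add one agent to the platform
PLATFORM_MAINTAIN = 6      # keep the shared layer current for a year

def siloed_cost(n_agents: int) -> int:
    return n_agents * (BUILD_PER_USE_CASE + MAINTAIN_PER_USE_CASE)

def platform_cost(n_agents: int) -> int:
    return PLATFORM_BUILD + PLATFORM_MAINTAIN + n_agents * ONBOARD_ON_PLATFORM

for n in (1, 3, 6, 10):
    print(n, siloed_cost(n), platform_cost(n))
# 1:  14 vs 32  -> the platform is harder to start
# 3:  42 vs 36  -> crossover around the third agent
# 10: 140 vs 50 -> and dramatically cheaper to scale
```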
This is not a nice-to-have. It is the difference between an AI agent roadmap and a scalable AI agent program.
“The platform approach … it’s the only approach I’ve seen reach the point where AI agent programs stop being infrastructure projects and start being solution projects.”
– Ravi Marwaha, Chief Operating Officer and Chief Product & Technology Officer, Arango
The question to ask your team this quarter
Not: “Which vector DB should we use?”
Not: “Which framework should we adopt?”
Not: “Do we need a graph?”
Ask this:
“How many places is our business context being reconstructed at runtime today, and what would it take to maintain it in one?”
If your team can’t answer cleanly — if the answer is “it depends on the use case,” or “it lives across these six systems” — you are already paying the 80% tax. Every new agent you green-light will compound it.
The fix isn’t another model. It isn’t another framework. It isn’t another database bolted onto the last one.
The fix is to stop treating context as an implementation detail and start treating it as the platform the rest of your AI strategy sits on. Build it once, build it governed, build it multi-model, build it to be iterated on continuously — and let your teams go build the solutions the business actually asked for.
That’s where the 80% goes back where it belongs.