The 6 requirements your contextual data layer must deliver

A contextual data layer has six operating requirements: semantic clarity, relationships, freshness, provenance, an AI-native service layer, and unified multimodel coverage.

Semantic clarity: Data carries meaning, not just structure. Canonical entities resolve terminology drift across CRM, billing, and support.

Relationships: Connections between entities are first-class and native, enabling multi-hop reasoning without federated joins.

Freshness: Context reflects what is true now and can answer what was true at time T. Bi-temporal by design.

Provenance: Lineage, RBAC, and audit logging are embedded in the platform. Every AI output is traceable to source.

AI-native service layer: Retrieve, rank, cite, and ground are built in, not reimplemented in every consuming application.

Unified multimodel coverage: Graph, vector, document, key-value, and search share one foundation, one query language, and one governance model.

Semantic clarity

Schemas capture structure. They don’t capture meaning.

Your schema does not capture what your data actually means. The CRM says “active”; billing says “paying”; support says “green.” Each label stands for the same idea, an active customer, yet each of the three consuming systems defines it differently:

CRM, “active”: logged in within the last 30 days

Billing, “paying”: invoice paid this cycle

Support, “green”: no open Sev-1

A contextual data layer resolves those definitions into a single shared meaning: one answer to “what does active mean,” applied consistently across every system that asks.

Figure 3.1 — Same word, three meanings. One governed entity model.

Every consuming system invents its own join key. The same customer is cust_3781 in CRM, acme-corp in billing, and ACME Corporation, Inc. in support tickets.

AI agents operating on one slice cannot reason across the others without a brittle mapping table that drifts the moment a system changes, a new source appears, or someone enters data differently.

Arango resolves those fragmented identifiers into a single, stable entity, automatically. Whether a match is exact, approximate, or inferred from structural similarity, the result is one customer, one ID, one definition that every agent and application reads from. And when definitions evolve (because they always do), Arango tracks those changes over time. A query about what was true last quarter returns last quarter’s answer. A query about today returns today’s.
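As a rough illustration of the idea (not Arango's actual resolution logic, which the text does not specify), the sketch below maps the three identifiers above onto one canonical ID. The registry, alias table, suffix list, and similarity threshold are all hypothetical, and only exact and approximate matching are shown; matching inferred from structural similarity is out of scope here.

```python
import re
from difflib import SequenceMatcher

# Hypothetical canonical registry: one stable ID per real-world customer.
CANONICAL = {"customer/acme": "Acme Corporation"}

# Hypothetical exact aliases already known per source system.
ALIASES = {("crm", "cust_3781"): "customer/acme"}

# Legal suffixes ignored when comparing names (illustrative, not exhaustive).
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name: str) -> str:
    """Lowercase, split on non-alphanumerics, and drop legal suffixes."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t and t not in LEGAL_SUFFIXES)


def resolve(system: str, raw_id: str, threshold: float = 0.85):
    """Return (canonical_id, match_type), or (None, "unresolved")."""
    # 1. Exact match against the known alias table.
    if (system, raw_id) in ALIASES:
        return ALIASES[(system, raw_id)], "exact"

    # 2. Approximate match on normalized names.
    candidate = normalize(raw_id)
    best_id, best_score = None, 0.0
    for canonical_id, name in CANONICAL.items():
        score = SequenceMatcher(None, candidate, normalize(name)).ratio()
        if score > best_score:
            best_id, best_score = canonical_id, score
    if best_score >= threshold:
        return best_id, "approximate"
    return None, "unresolved"


for system, raw in [("crm", "cust_3781"),
                    ("billing", "acme-corp"),
                    ("support", "ACME Corporation, Inc.")]:
    print(system, raw, resolve(system, raw))
```

All three inputs resolve to `customer/acme`, so every consumer reads from one entity instead of maintaining its own mapping table.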

Freshness & temporal correctness

Context must reflect what is true now — and what was true at time T.

Time is the thing most AI systems handle worst. The index was built last quarter; the graph updates nightly; the warehouse lags a day. Agents reason across this drift and quietly produce wrong answers. Persistent context carries an evolving state rather than reconstructing it. Bi-temporal modeling tracks both when an event occurred (valid time) and when it was recorded (transaction time) — so “what is true now” and “what was true at time T” come from the same model.

Valid time: when it happened in the world

Transaction time: when the system learned about it

Query: state as-of, or state as-known-at

Figure 3.3 — Current state vs. historical state, answerable from the same layer.

Freshness gaps between stores mean agents answer “what is true” with stale state. Reconciling a warehouse-lagged record against a live graph takes ad-hoc code, which itself goes stale.

Historical queries (“what did we know on March 3rd?”) are almost always impossible without rebuilding from logs.

Arango tracks two things simultaneously: what is true in the world right now, and what the system knew at any given point in time.

That means you can query the actual state of a customer, service, or incident at any moment in history — not just what was recorded, but when it was recorded. AutoGraph ingests continuously, so freshness is a structural property of the platform — not an operational scramble after the fact.
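To make the two time axes concrete, here is a minimal in-memory sketch of a bi-temporal lookup. It is an illustration under stated assumptions, not Arango's storage model: the `Fact` class, the `state` function, and the sample data are all hypothetical. Each fact carries a valid time (when it became true in the world) and a transaction time (when the system recorded it).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity: str
    value: str
    valid_from: datetime   # valid time: when it became true in the world
    recorded_at: datetime  # transaction time: when the system learned it


# Hypothetical sample data: the upgrade took effect on March 1
# but was only recorded on March 5.
FACTS = [
    Fact("customer/acme", "trial", datetime(2024, 1, 1), datetime(2024, 1, 1)),
    Fact("customer/acme", "paid", datetime(2024, 3, 1), datetime(2024, 3, 5)),
]


def state(facts, entity, as_of: datetime, as_known_at: Optional[datetime] = None):
    """Value of `entity` at valid time `as_of`.

    If `as_known_at` is given, only facts recorded by that moment are used,
    which answers "what did we believe at that point?".
    """
    known = [
        f for f in facts
        if f.entity == entity
        and f.valid_from <= as_of
        and (as_known_at is None or f.recorded_at <= as_known_at)
    ]
    if not known:
        return None
    # Latest valid_from wins; ties go to the most recently recorded fact.
    return max(known, key=lambda f: (f.valid_from, f.recorded_at)).value


march_3 = datetime(2024, 3, 3)
print(state(FACTS, "customer/acme", as_of=march_3))                       # paid
print(state(FACTS, "customer/acme", as_of=march_3, as_known_at=march_3))  # trial
```

The first query returns the state as-of March 3 using everything known today; the second returns the state as-known-at March 3, before the late-arriving upgrade was recorded. Both answers come from the same set of facts.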

Provenance & trust

All data must be traceable, governed, and explainable.

An AI output without lineage is not defensible. A SOC 2 audit, an EU AI Act review, or a regulator asking why a customer was denied all require the same thing: traceability back to source.

In a contextual data layer, every answer carries a citation set traceable to source records; access is controlled at the engine, not the API gateway; every record is logged with source, timestamp, and transformation chain; and every query emits an explainable plan alongside its results.

Figure 3.4 — Provenance and trust

When provenance lives outside the data layer, it goes stale within weeks. Pipelines fork, records get re-derived, and the lineage graph diverges from reality. Audits then require forensic reconstruction — engineers reading commit logs to explain why a model cited a document that no longer exists.

Lineage is captured at write time and at query time. Every record carries source, ingest timestamp, and transformation chain; every AQL execution emits a query plan and citation set returned alongside results. RBAC is enforced at document, edge, and field level. Audit logs are immutable and exportable for SOC 2, ISO 27001, HIPAA, and EU AI Act review.
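The shape of that contract can be sketched in a few lines. The example below is a toy stand-in, not Arango's API: the `Record` and `Provenance` classes, the `FIELD_POLICY` table, and the `read` function are hypothetical names, and the in-process redaction only imitates what the text describes as enforcement inside the engine.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Provenance:
    source: str
    ingested_at: datetime
    transformations: tuple  # ordered transformation chain


@dataclass(frozen=True)
class Record:
    key: str
    fields: dict
    provenance: Provenance


# Hypothetical field-level policy: role -> fields that role may read.
FIELD_POLICY = {
    "support_agent": {"name", "status"},
    "auditor": {"name", "status", "credit_limit"},
}

# Append-only list standing in for an immutable audit log.
AUDIT_LOG = []

RECORDS = [
    Record(
        key="customer/acme",
        fields={"name": "Acme", "status": "active", "credit_limit": 50000},
        provenance=Provenance(
            source="crm",
            ingested_at=datetime(2024, 3, 1, tzinfo=timezone.utc),
            transformations=("dedupe", "normalize_name"),
        ),
    )
]


def read(records, role):
    """Return (results, citations); fields outside the role's policy are dropped."""
    allowed = FIELD_POLICY.get(role, set())
    results, citations = [], []
    for rec in records:
        visible = {k: v for k, v in rec.fields.items() if k in allowed}
        if not visible:
            continue  # nothing this role may see in this record
        results.append({"key": rec.key, **visible})
        citations.append({
            "key": rec.key,
            "source": rec.provenance.source,
            "ingested_at": rec.provenance.ingested_at.isoformat(),
            "transformations": list(rec.provenance.transformations),
        })
    # Every read is logged, including reads that return nothing.
    AUDIT_LOG.append({
        "role": role,
        "at": datetime.now(timezone.utc).isoformat(),
        "returned": [c["key"] for c in citations],
    })
    return results, citations


results, citations = read(RECORDS, "support_agent")
print(results)
print(citations)
```

The point of the design is that results and citations are produced by the same call, so a citation cannot drift away from the record it describes.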

Unified multimodel data platform

Context must be unified across all data types in a single model.

The mistake most teams make is assuming they have to pick one or two data representations and work around the limits of the rest. The Frankenstack tax — sync lag, no cross-model transactions, five access policies — is permanent.

Figure 3.6 — One query, four models, one execution path.
Model | Description
Graph | Native edges, multi-hop traversal in milliseconds
Document | Nested JSON, schema-flexible
Vector | Semantic similarity and nearest-neighbor search
Key-value | Low-latency state & lookups
+ AQL | One query language spanning all four

Stitched stacks (Postgres + Neo4j + Pinecone + Mongo + Elasticsearch) carry a permanent tax: sync lag, no cross-model transactions, five credential sets, and application code fanning out and re-merging results on every request. Latency compounds; consistency degrades; the team runs a distributed systems project instead of an AI project.

Graph, document, vector, key-value, and search live in one data engine and share one query language. A single AQL statement can traverse a customer subgraph, filter by document fields, rank by vector similarity, and join to a key-value lookup — in one execution path, with one transaction boundary, one access policy.
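The logical shape of such a query can be sketched in plain Python. This is not AQL and not how the engine executes it; the in-memory dictionaries, the toy two-dimensional embeddings, and the function names are all hypothetical. It only shows the four steps (traverse, filter, rank, look up) composed in one function rather than fanned out across separate stores.

```python
import math
from collections import deque

# Hypothetical in-memory stand-ins for the four models.
EDGES = {  # graph: adjacency list
    "customer/acme": ["contract/c1", "ticket/t1"],
    "contract/c1": ["ticket/t2", "ticket/t3"],
}
DOCS = {  # document: schema-flexible fields per ticket
    "ticket/t1": {"status": "open", "title": "Login fails"},
    "ticket/t2": {"status": "closed", "title": "Invoice typo"},
    "ticket/t3": {"status": "open", "title": "SSO timeout"},
}
VECTORS = {  # vector: toy 2-d embeddings
    "ticket/t1": [1.0, 0.0],
    "ticket/t2": [0.0, 1.0],
    "ticket/t3": [0.8, 0.6],
}
KV = {"sla:customer/acme": "gold"}  # key-value: low-latency lookup


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def traverse(start, max_hops):
    """Breadth-first traversal returning nodes within max_hops of start."""
    seen, queue, reached = {start}, deque([(start, 0)]), []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                reached.append(nxt)
                queue.append((nxt, depth + 1))
    return reached


def open_tickets_ranked(customer, query_vec, max_hops=2):
    reachable = traverse(customer, max_hops)                   # graph
    open_ids = [n for n in reachable
                if DOCS.get(n, {}).get("status") == "open"]    # document filter
    ranked = sorted(open_ids,
                    key=lambda n: cosine(VECTORS[n], query_vec),
                    reverse=True)                              # vector ranking
    sla = KV.get(f"sla:{customer}")                            # key-value lookup
    return [{"ticket": n, "title": DOCS[n]["title"], "sla": sla} for n in ranked]


for row in open_tickets_ranked("customer/acme", [0.6, 0.8]):
    print(row)
```

In a stitched stack, each of the four commented steps would be a network call to a different system with its own credentials and consistency guarantees; the argument in the text is that a single engine collapses them into one execution path.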

Architecture defines requirements. Requirements define AI outcomes. Miss any of the six, and agentic AI will hit a ceiling no amount of engineering can fix after the fact.

Chapter 3 FAQs

The six requirements of a contextual data layer are semantic clarity, connected relationships, freshness and temporal correctness, provenance and trust, an AI-native service layer, and unified multimodel coverage.

GraphRAG is a retrieval approach that uses a knowledge graph to augment generative AI responses with relationship-aware context, enabling multi-hop reasoning that pure vector search can’t provide.

Use GraphRAG when the question requires traversing explicit relationships. Use HybridRAG when you need a mix of structured relationships and semantic similarity in one pass.

A contextual data layer exposes itself as an MCP (Model Context Protocol) server so agents can discover and invoke its retrieval, graph, and governance services through a consistent interface.