The knowledge graph as ambient context for agents
An agent without context is just a model. Classic vector retrieval treats knowledge as a bag of independent chunks — which is why it fails on the questions that matter most to a company. We walk through the real problems RAG developers hit (with data), why graph structure wins where vectors fall short, how the landscape stacks up (Pinecone, LangChain, LlamaIndex, Neo4j, Microsoft GraphRAG), and how we plan to be the best at the context that actually counts: a company's operation.

The promise of RAG (retrieval-augmented generation) is simple: feed the model the right documents and it will answer well (Lewis et al., 2020). The reality for anyone shipping it is rougher. A company's knowledge isn't a stack of loose texts: it's a network of cases, customers, invoices, tasks and agents connected to each other. Flattening that into independent chunks loses exactly what gives it meaning — and the numbers confirm it.
This article is our technical thesis: why we treat knowledge as a graph and use it as the ambient context of agents, not as a bag of chunks.
The problems developers actually hit
Anyone who has built a real RAG has hit the same wall: adding more context doesn't improve the answer — sometimes it hurts it. That's not anecdote, it's measured. Liu et al. showed that models use information well at the start and end of the context, but lose it when it lands in the middle (Liu et al., 2023).
Same fact, same question: only the position of the relevant document within the context changes. The mid-context drop exceeds 20 points.
Source: Liu et al., 2023, Lost in the Middle (arXiv:2307.03172)
With 30 documents the effect is so severe that placing the fact in the middle (50.5%) does worse than answering with no documents at all (56.1%): badly ordered context actively lowers accuracy (Liu et al., 2023). And this is just one of several failure modes the literature documents.
Barnett et al. cataloged seven recurring failure points when taking a RAG system to production (Barnett et al., 2024): missing content, the relevant document falling outside the top-k, the document being dropped during prompt consolidation, the answer not being extracted despite being present, wrong format, wrong specificity, and an incomplete answer. And RAGTruth measured that even with retrieval, a non-trivial fraction of responses hallucinate: up to 27% on data-to-text tasks with GPT-4 (Wu et al., 2024).
The common root
Almost all of these failures share a cause: similarity retrieval brings chunks similar to the question, but blind to one another. If the answer requires connecting several pieces (multi-hop) or synthesizing a whole corpus, vector similarity has no way to see it (Tang & Yang, 2024).
On top of that comes chunking fragmentation: splitting documents into fixed-size chunks cuts a single fact across two chunks, and neither holds the complete answer (Gao et al., 2023).
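A minimal sketch of that fragmentation, assuming a character-based splitter and a made-up record (the case, customer and invoice identifiers are invented for illustration, and real splitters usually work on tokens, often with overlap):

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive chunks of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical record, invented for illustration.
doc = (
    "Case 4821 belongs to customer Acme. "
    "Invoice F-77 for that case is overdue by 45 days."
)
fact = "Invoice F-77 for that case is overdue"

chunks = fixed_size_chunks(doc, size=50)

# The fact is in the document, but the chunk boundary cuts through it,
# so no single chunk contains it in full.
holders = [chunk for chunk in chunks if fact in chunk]
print(chunks)
print("chunks holding the full fact:", len(holders))
```

Overlapping windows reduce how often this happens, but they do not remove the underlying issue: each chunk is still indexed and retrieved on its own.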
Three ways to retrieve (and why structure matters)
Not all retrieval architectures are equal. It helps to separate three paradigms:
| Paradigm | How it retrieves | Strong at | Blind spot |
|---|---|---|---|
| Vector RAG | k nearest neighbors by embedding similarity | Meaning, synonyms, speed | Multi-hop, relationships, global synthesis |
| Hybrid (BM25 + vector) | Fuses exact lexical + semantic (e.g. RRF) | Exact terms (codes, names) + semantics | Still ranking of disconnected passages |
| Graph RAG | Traverses explicit relationships + diffusion | Multi-hop, relational context, sensemaking | Cost of building the graph |
Hybrid fixes "vectors miss the exact term"; it does not fix "retrieval ignores how facts connect." That takes structure. And that's where the graph changes the rules.
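To make the hybrid row concrete, here is a small sketch of Reciprocal Rank Fusion (RRF), assuming the usual formulation in which each document scores the sum of 1 / (k + rank) over the rankings it appears in. The document IDs and the two input rankings are invented for illustration, and k = 60 is a conventional default rather than a value taken from this article:

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs into one scored list."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a lexical and a semantic retriever.
bm25_ranking = ["invoice_F77", "case_4821", "policy_refunds"]
vector_ranking = ["case_4821", "customer_acme", "invoice_F77"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

The output is still a flat ranked list. Nothing in it says that the invoice belongs to the case or that the case belongs to the customer, which is the blind spot the table points at.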
The graph as ambient context
We model the operation as a directed graph where nodes are entities — cases, documents, customers, tasks, agents — and edges are the real relationships between them. Some nodes concentrate a huge number of connections; we call them god nodes, and they tend to be the points the whole operation flows through.
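As an illustration of that model, the sketch below stores the graph as (source, relation, target) triples and flags highly connected nodes. The entities, the relation names and the degree threshold are all assumptions made for the example; the article does not define god nodes by a specific cutoff:

```python
from collections import defaultdict

# Hypothetical edges (source, relation, target), invented for illustration.
EDGES = [
    ("case_4821", "belongs_to", "customer_acme"),
    ("invoice_F77", "billed_to", "customer_acme"),
    ("invoice_F78", "billed_to", "customer_acme"),
    ("invoice_F77", "relates_to", "case_4821"),
    ("task_12", "part_of", "case_4821"),
    ("task_12", "assigned_to", "agent_billing"),
]


def degree_by_node(edges: list[tuple[str, str, str]]) -> dict[str, int]:
    """Count incoming plus outgoing edges for every node."""
    degree: dict[str, int] = defaultdict(int)
    for source, _relation, target in edges:
        degree[source] += 1
        degree[target] += 1
    return dict(degree)


def god_nodes(edges: list[tuple[str, str, str]], min_degree: int) -> list[str]:
    """Return nodes whose total degree reaches `min_degree` (assumed cutoff)."""
    degree = degree_by_node(edges)
    hubs = [node for node, count in degree.items() if count >= min_degree]
    return sorted(hubs, key=lambda node: (-degree[node], node))


print(god_nodes(EDGES, min_degree=3))
```

Raw degree is only a first approximation of importance, since it counts connections without asking how important the connected nodes are.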
To measure a node's importance we use PageRank (Page et al., 1999), defined recursively: a node is important if important nodes point to it.
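One common way to write that recursion, with symbols chosen here for illustration:

$$
PR(v) = \frac{1 - d}{N} + d \sum_{u \in \mathrm{In}(v)} \frac{PR(u)}{\mathrm{outdeg}(u)}
$$

Here $N$ is the number of nodes, $\mathrm{In}(v)$ is the set of nodes with an edge into $v$, $\mathrm{outdeg}(u)$ is the number of edges leaving $u$, and $d$ is the damping factor (0.85 is the value typically used). Each node splits its own score evenly across its outgoing edges, so a link from an important node with few outgoing edges is worth the most.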
When an agent needs context we don't just fire a similarity search: we seed the graph with the nodes most aligned to the query and let relevance diffuse to their neighbors through the normalized adjacency matrix.
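A minimal sketch of that diffusion as Personalized PageRank by power iteration. It assumes the update p ← (1 − d)·s + d·Pᵀp, where s is the normalized seed vector and P the row-normalized adjacency. The graph, the seed weights, the damping value and the choice to add reverse edges (so relevance can also flow against edge direction) are assumptions for the example, not a description of a production implementation:

```python
def personalized_pagerank(
    edges: list[tuple[str, str]],
    seeds: dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Diffuse relevance from seed nodes over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links: dict[str, list[str]] = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    total_seed = sum(seeds.get(node, 0.0) for node in nodes)
    if total_seed <= 0:
        raise ValueError("at least one seed must be a node in the graph")
    restart = {node: seeds.get(node, 0.0) / total_seed for node in nodes}

    rank = dict(restart)
    for _ in range(iterations):
        # Restart term: a fixed share of relevance always returns to the seeds.
        updated = {node: (1.0 - damping) * restart[node] for node in nodes}
        for source in nodes:
            targets = out_links[source]
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    updated[target] += share
            else:
                # Dangling node: hand its mass back to the seeds.
                for node in nodes:
                    updated[node] += damping * rank[source] * restart[node]
        rank = updated
    return rank


# Hypothetical operational links, invented for illustration.
links = [
    ("case_4821", "customer_acme"),
    ("invoice_F77", "case_4821"),
    ("invoice_F77", "customer_acme"),
    ("task_12", "case_4821"),
    ("case_9000", "customer_zeta"),
]
# Assumption: add reverse edges so diffusion ignores edge direction.
edges = links + [(target, source) for source, target in links]

# Assumption: the query matched only case_4821.
scores = personalized_pagerank(edges, seeds={"case_4821": 1.0})
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.3f}")
```

Nodes connected to the seed receive relevance even if they share no words with the query, while the unrelated case and its customer stay at zero. In practice the seed weights would come from the similarity between the query and each node.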
This isn't a hunch: it's exactly the mechanism HippoRAG demonstrated to solve multi-hop questions in a single retrieval step, using Personalized PageRank over a knowledge graph (Gutiérrez et al., 2024). The evidence is striking.
Dense vector retrieval (ColBERTv2) vs graph + Personalized PageRank, same reader. Structure recovers roughly twice the useful evidence on questions that require chaining facts.
Source: Gutiérrez et al., 2024, HippoRAG (arXiv:2405.14831)
And for global questions such as "what are the themes that run through the whole operation?", which have no single answer passage, Microsoft GraphRAG showed that detecting communities in the graph and summarizing them systematically beats vector RAG when an LLM judge scores comprehensiveness and diversity (Edge et al., 2024). Communities come from optimizing modularity with Louvain (Blondel et al., 2008) or its successor Leiden (Traag et al., 2019), which is the one GraphRAG uses.
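For reference, modularity in its standard form for an undirected graph, with notation chosen here for illustration:

$$
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)
$$

Here $A_{ij}$ is the weight of the edge between nodes $i$ and $j$, $k_i$ is the degree of node $i$, $m$ is the total edge weight, $c_i$ is the community assigned to node $i$, and $\delta(c_i, c_j)$ is 1 when both nodes are in the same community and 0 otherwise. The bracket compares the edges actually observed inside a community with the number expected if edges were placed at random while keeping every node's degree, so a high $Q$ means communities are denser inside than chance would explain.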
Intellectual honesty
The graph doesn't always win. On single-hop questions, or when literal conciseness is valued, vector RAG is enough, and sometimes better (Edge et al., 2024). That's why we don't replace vector retrieval: we combine it with the graph and a precision reranker. Structure is used where it helps: relationships, multi-hop and the big picture.
How the landscape stacks up
The ecosystem is excellent at what it does, but almost all of it is built around the passage, not the relationship:
| Tool | What it is | Retrieval mechanism | Relational blind spot |
|---|---|---|---|
| Pinecone | Managed vector database | Vector similarity (+ hybrid) | No native notion of relationships |
| Weaviate | Vector database (not a graph DB) | Vector + BM25F | Cross-refs discouraged for deep traversal |
| LangChain | Orchestration framework | Delegates to the backend you plug in | No native relational retrieval of its own |
| LlamaIndex | Data framework for RAG | Vector + PropertyGraphIndex | The graph depends on LLM extraction |
| Neo4j | Graph database | Cypher + vector index | You must build and model the graph first |
| Microsoft GraphRAG | Graph pipeline | Graph + communities (Leiden) | Expensive, LLM-intensive indexing |
| Elastic / OpenSearch | Search engines | BM25 + kNN (RRF) | No relationship traversal across documents |
The point isn't that these tools are bad: they're superb building blocks. It's that serving a knowledge graph as the live ambient context of an operation isn't the use case most of them were designed for.
How we plan to be the best
We don't compete on having the best vector index: we compete on understanding a company's operation better than anyone. That's where we focus our edge. Against the usual approach, this is what BiVelio does in each domain:
- Operational context, not just documents. Our graph isn't born from chunking PDFs: it's born from how the company actually works. That yields precise, up-to-date relationships, not inferred ones.
- Multi-hop and the big picture, built in. Personalized PageRank to retrieve coherent neighborhoods (Gutiérrez et al., 2024) and communities to reason at the right granularity (Edge et al., 2024) — the two modes the evidence rewards.
- Coherent context, not fragments. We retrieve the case plus its customer plus its related invoices, not three chunks that share a word. We attack "lost in the middle" (Liu et al., 2023) head-on by delivering less context but better connected.
- Precision and cost. We pair the graph with ephemeral reranking so only the best reaches the agent's window — the idea we develop in Ephemeral reranking.
- Governance and traceability. Every piece of context traces back to the graph: you can audit where a decision came from. In enterprise operations, that's not a nice-to-have, it's a requirement.
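The "coherent context" and "traceability" points can be illustrated together with a small sketch: a breadth-first walk from one case that returns every node within a hop limit along with the chain of edges that led to it. The triples, the relation names and the `inverse:` prefix are hypothetical choices for the example, not BiVelio's actual schema or retrieval code:

```python
from collections import deque

Edge = tuple[str, str, str]  # (source, relation, target)


def neighborhood_with_paths(
    edges: list[Edge], start: str, max_hops: int = 1
) -> dict[str, list[Edge]]:
    """Map each node within `max_hops` of `start` to the edge path reaching it."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for source, relation, target in edges:
        adjacency.setdefault(source, []).append((relation, target))
        # Assumption: edges can be followed backwards, marked as inverse.
        adjacency.setdefault(target, []).append((f"inverse:{relation}", source))

    paths: dict[str, list[Edge]] = {start: []}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if len(paths[node]) >= max_hops:
            continue  # Hop budget used up: do not expand further.
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in paths:
                paths[neighbor] = paths[node] + [(node, relation, neighbor)]
                queue.append(neighbor)
    return paths


# Hypothetical operational graph, invented for illustration.
EDGES: list[Edge] = [
    ("case_4821", "belongs_to", "customer_acme"),
    ("invoice_F77", "relates_to", "case_4821"),
    ("invoice_F78", "billed_to", "customer_acme"),
    ("case_9000", "belongs_to", "customer_zeta"),
]

context = neighborhood_with_paths(EDGES, start="case_4821", max_hops=2)
for node, path in context.items():
    print(node, "<-", path)
```

Every item handed to the agent carries the path that justifies its presence, which is what makes an audit possible. The unrelated case never enters the context, however similar its text might be.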
Note: the figures in this article come from the cited literature (Liu et al., Edge et al., Gutiérrez et al., Barnett et al., Wu et al.) and describe graph approaches in general. They are the motivation for our design, not a fixed product benchmark.