Evaluating graph retrieval without drowning in metrics
A small, opinionated set of offline checks keeps graph retrieval honest before you spend weeks on human eval.
By Perseus team

Teams new to graph-augmented retrieval often track too many dashboards at once. Start with a narrow task suite that mirrors the questions your users actually ask, then expand.
Build a golden set
For each task, record the question, the expected supporting facts (node IDs or relation paths), and the acceptable answer shape. That makes regressions visible when you change embedding models or rerankers.
task_id: q-184
question: Which subsidiary owns the trademark?
expected_nodes: [org:acme-eu, asset:mark-7712]
notes: Answer must cite the ownership edge, not a press release summary.
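A record like this makes scoring mechanical. As a minimal sketch, here is one way to check retrieved node IDs against a golden-set record; the `node_recall` helper and the hard-coded `retrieved` list are illustrative, and you would substitute the output of your own retriever:

```python
def node_recall(expected, retrieved):
    """Fraction of expected supporting nodes present in the retrieved set."""
    if not expected:
        return 1.0
    retrieved_set = set(retrieved)
    hits = sum(1 for node in expected if node in retrieved_set)
    return hits / len(expected)

# Golden-set record, mirroring the example above.
golden = {
    "task_id": "q-184",
    "question": "Which subsidiary owns the trademark?",
    "expected_nodes": ["org:acme-eu", "asset:mark-7712"],
}

# Stand-in for what your retriever actually returned.
retrieved = ["org:acme-eu", "org:acme-hq", "asset:mark-7712"]

print(node_recall(golden["expected_nodes"], retrieved))  # 1.0
```

Run this over the whole golden set on every model or reranker change; a drop in mean recall is your regression signal.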
When to add online metrics
Once offline precision is stable, layer in click-through rates, thumbs-up signals, or human spot checks—but only after you trust that the graph is returning the right neighborhood most of the time.
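One way to make "stable" concrete is a simple gate: aggregate per-task recall over the golden set and only turn on online instrumentation once it clears a threshold. The 0.9 cutoff below is an assumed value for illustration, not a recommendation from this article:

```python
def offline_stable(scores, threshold=0.9):
    """True when mean golden-set recall meets or exceeds the threshold.

    `threshold=0.9` is an illustrative assumption; tune it to your task suite.
    """
    return sum(scores) / len(scores) >= threshold

# Per-task recall scores from an offline golden-set run (example values).
scores = [1.0, 0.5, 1.0, 1.0]

if offline_stable(scores):
    print("enable click-through and thumbs-up logging")
else:
    print("keep iterating offline")
```

The gate keeps noisy online signals from masking a retriever that is still missing whole neighborhoods.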
