Perseus


Evaluating graph retrieval without drowning in metrics

A small, opinionated set of offline checks keeps graph retrieval honest before you spend weeks on human eval.

By Perseus team

Teams new to graph-augmented retrieval often track too many dashboards at once. Start with a narrow task suite that mirrors the questions your users actually ask, then expand.

Build a golden set

For each task, record the question, the expected supporting facts (node IDs or relation paths), and the acceptable answer shape. That makes regressions visible when you change embedding models or rerankers.

task_id: q-184
question: Which subsidiary owns the trademark?
expected_nodes: [org:acme-eu, asset:mark-7712]
notes: Answer must cite the ownership edge, not a press release summary.
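A record like the one above lends itself to a simple scoring loop. The sketch below is a minimal illustration, not a Perseus API: the field names (task_id, question, expected_nodes) mirror the record shown, and the retrieve callable stands in for whatever retrieval function you are testing.

```python
def node_recall(expected, retrieved):
    """Fraction of expected supporting nodes present in the retrieved set."""
    if not expected:
        return 1.0
    retrieved_set = set(retrieved)
    hits = sum(1 for node in expected if node in retrieved_set)
    return hits / len(expected)


def evaluate(golden_set, retrieve):
    """Run every golden task through `retrieve` and report per-task recall.

    `golden_set` is a list of dicts shaped like the record above;
    `retrieve` is any callable that maps a question to a list of node IDs.
    """
    results = {}
    for task in golden_set:
        retrieved = retrieve(task["question"])
        results[task["task_id"]] = node_recall(task["expected_nodes"], retrieved)
    return results
```

Because the golden set pins expected node IDs rather than free-text answers, the same loop stays valid when you swap embedding models or rerankers; only the retrieve callable changes.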

When to add online metrics

Once offline precision is stable, layer in click-through rates, thumbs-up signals, or human spot checks, but only after you trust that the graph is returning the right neighborhood most of the time.
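"Stable" can be made concrete with a small gate that compares the current run against a stored baseline. The sketch below is one possible check, with an illustrative tolerance; the threshold and baseline values are assumptions you would tune for your own suite.

```python
def is_stable(scores, baseline_mean, tolerance=0.02):
    """True if mean offline score has not regressed beyond `tolerance`.

    `scores` is the per-task recall (or precision) from the latest run;
    `baseline_mean` is the mean from the last accepted run.
    """
    if not scores:
        return False
    current_mean = sum(scores) / len(scores)
    return current_mean >= baseline_mean - tolerance
```

Running this gate in CI before enabling online metrics keeps a regressed retriever from polluting click-through or thumbs-up data with results you already know are wrong.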