Every term Yoke speaks, defined.
Short, opinionated definitions for the RAG and agent evaluation terms you will run into across the docs, the dashboard, and the reports. Each page has the formula (if there is one), a concrete example, and how Yoke Agent uses the term in practice.
Faithfulness (RAG metric): Fraction of claims in the answer that are supported by the retrieved context.
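As a minimal sketch of the ratio, assuming claim extraction and support-checking have already happened upstream (the function name and counts here are illustrative, not Yoke's API):

```python
def faithfulness(supported_claims: int, total_claims: int) -> float:
    """Faithfulness = claims supported by retrieved context / all claims in the answer."""
    if total_claims == 0:
        return 0.0  # an answer with no claims has nothing to support
    return supported_claims / total_claims

# An answer making 4 claims, 3 of them backed by the retrieved chunks:
score = faithfulness(3, 4)  # 0.75
```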
Answer relevancy (RAG metric): How well the answer responds to the question that was actually asked.
Context precision (RAG metric): Signal-to-noise ratio of the chunks the retriever returned.
Context recall (RAG metric): Fraction of the ground-truth information the retriever actually found.
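Context precision and recall can be sketched as plain set ratios. This is the unweighted form for illustration; rank-weighted variants also exist, and the chunk-ID inputs here are an assumption, not Yoke's actual data model:

```python
def context_precision(retrieved: list[str], relevant: set[str]) -> float:
    """Of the chunks the retriever returned, what fraction were relevant?"""
    if not retrieved:
        return 0.0
    return sum(chunk in relevant for chunk in retrieved) / len(retrieved)

def context_recall(retrieved: list[str], ground_truth: set[str]) -> float:
    """Of the ground-truth chunks, what fraction did the retriever find?"""
    if not ground_truth:
        return 0.0
    return len(ground_truth & set(retrieved)) / len(ground_truth)

# Retriever returned 4 chunks, 2 relevant; ground truth had 4 chunks, 2 found:
p = context_precision(["a", "b", "x", "y"], {"a", "b"})        # 0.5
r = context_recall(["a", "b", "x"], {"a", "b", "c", "d"})       # 0.5
```

Precision punishes noise in what came back; recall punishes what never came back at all.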
Noise sensitivity (RAG metric): How much answer quality drops when irrelevant chunks are injected into context.
Entity recall (RAG metric): Fraction of named entities from the ground-truth answer actually recalled.
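Entity recall follows the same shape as context recall, just over named entities instead of chunks. A sketch, assuming entity extraction is done elsewhere:

```python
def entity_recall(answer_entities: set[str], truth_entities: set[str]) -> float:
    """Fraction of ground-truth named entities that appear in the answer."""
    if not truth_entities:
        return 0.0
    return len(answer_entities & truth_entities) / len(truth_entities)

# Ground truth names 4 entities; the answer recalled 2 of them:
score = entity_recall({"Paris", "2021"}, {"Paris", "2021", "France", "Macron"})  # 0.5
```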
Hallucination (quality signal): Factual claims in the output that are not supported by retrieved context or input.
G-Eval (framework): LLM-as-judge evaluation using chain-of-thought rubrics and weighted scoring.
LLM-as-judge (method): Using an LLM to score another LLM's output against a rubric.
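The LLM-as-judge pattern reduces to: build a prompt from a rubric and the candidate output, send it to a judge model, parse a score. A minimal sketch with the judge call stubbed out; the prompt shape, function names, and 0-1 scale are assumptions for illustration:

```python
def judge_score(rubric: str, output: str, call_llm) -> float:
    """Ask a judge model to score `output` against `rubric`; expects a 0-1 reply."""
    prompt = (
        f"Rubric:\n{rubric}\n\n"
        f"Candidate output:\n{output}\n\n"
        "Reply with a single score from 0 to 1:"
    )
    return float(call_llm(prompt))

# Stub judge for illustration; in practice this would be an LLM API call.
stub_judge = lambda prompt: "0.8"
score = judge_score("Answer must cite the retrieved context.",
                    "Paris, per chunk 2.", stub_judge)  # 0.8
```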
Tool-call accuracy (agent metric): Whether an agent invoked the right tool with the right arguments.
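A single tool call is correct only when both the tool name and the arguments match the expectation. A sketch using exact-match comparison (the dict shape here is a common convention, not necessarily Yoke's trace format):

```python
def tool_call_correct(call: dict, expected: dict) -> bool:
    """Right tool AND right arguments; either alone is a miss."""
    return (call.get("name") == expected["name"]
            and call.get("arguments") == expected["arguments"])

call = {"name": "get_weather", "arguments": {"city": "Oslo"}}
tool_call_correct(call, {"name": "get_weather", "arguments": {"city": "Oslo"}})   # True
tool_call_correct(call, {"name": "get_weather", "arguments": {"city": "Bergen"}}) # False
```

Exact match is strict on purpose: a fuzzier comparison (ignoring argument order, normalizing strings) is a separate design decision.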
Refusal accuracy (agent metric): When the agent refused, was the refusal correct? When it didn't refuse, should it have?
Grid search (workflow): Exhaustive evaluation of every combination in a defined configuration space.
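"Every combination in a defined configuration space" is exactly a Cartesian product. A sketch with hypothetical parameter names (`chunk_size`, `top_k`, `model` are examples, not Yoke's config keys):

```python
from itertools import product

space = {
    "chunk_size": [256, 512],
    "top_k": [3, 5],
    "model": ["small", "large"],
}

# Every combination: 2 * 2 * 2 = 8 configurations to evaluate.
configs = [dict(zip(space, values)) for values in product(*space.values())]
```

The run count multiplies with each new axis, which is why grid search is exhaustive but expensive.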
Poka-yoke (origin): Japanese manufacturing term for a jig that makes it impossible to ship a defective part.