Noise sensitivity.
How much answer quality drops when irrelevant context is injected alongside relevant chunks. Lower is better.
How it’s computed
Run the same question twice: once with clean context (only relevant chunks), and once with the actual retrieval output (relevant chunks plus some noise). The drop in faithfulness or correctness between the two runs is the noise sensitivity.
noise_sensitivity = score(clean) − score(with_noise)
A score near zero means the generator is robust — it ignores irrelevant chunks and produces the same answer either way. A large positive delta means the noise is actively degrading the output.
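The delta computation is trivial; the work is in producing the two scores. A minimal sketch, where the faithfulness scores are assumed to come from whatever evaluator the pipeline already uses (the numbers below are the worked example from this page, not real measurements):

```python
def noise_sensitivity(score_clean: float, score_noisy: float) -> float:
    """Quality delta between a clean-context run and a run with realistic
    retrieval noise. Near zero = robust generator; large positive = noise
    is degrading the output."""
    return score_clean - score_noisy

# Faithfulness 0.92 on hand-picked context, 0.78 on the real retriever's top-5.
delta = noise_sensitivity(0.92, 0.78)
print(round(delta, 2))  # 0.14
```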
Worked example
Faithfulness with clean hand-picked context = 0.92. Faithfulness with the real retriever’s top-5 (which includes 2 irrelevant chunks) = 0.78. Noise sensitivity = 0.14 — the model loses 14 percentage points when facing realistic retrieval noise.
How Yoke Agent uses it
Noise sensitivity is an optional RAGAS metric, run on demand rather than by default. It is particularly useful when debugging pipelines that score well on small test sets but drift on larger corpora where retrievers return more noise.
Frequently asked
Why does this matter?
Real retrievers always return some noise. A model that collapses when given a couple of irrelevant chunks will perform worse in production than clean-set evaluation suggests.
How do I reduce noise sensitivity?
Two paths: make the retriever cleaner (rerankers, a smaller top-k, stricter similarity thresholds), or make the generator more robust to noise (tighter prompts, lower temperature, explicit "ignore irrelevant context" instructions).
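For the second path, the "ignore irrelevant context" instruction can be baked directly into the prompt. A hypothetical sketch (the prompt wording and `build_prompt` helper are illustrative, not a prescribed template):

```python
# Hypothetical system prompt hardening the generator against retrieval noise.
ROBUST_SYSTEM_PROMPT = (
    "Answer using only the context passages that are relevant to the question. "
    "If a passage is unrelated to the question, ignore it entirely; "
    "do not let irrelevant passages change your answer."
)

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a generation prompt with numbered context chunks."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"{ROBUST_SYSTEM_PROMPT}\n\nContext:\n{context}\n\nQuestion: {question}"
```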
How many noise chunks should I inject for testing?
Two or three. More than that and you are stress-testing a pathological case rather than simulating realistic production retrieval.
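A controlled-injection harness for this test can be sketched as follows (a minimal illustration; the chunk lists and seed are assumptions, and the shuffle keeps chunk position from revealing which entries are noise):

```python
import random

def inject_noise(relevant: list[str], noise_pool: list[str],
                 n_noise: int = 2, seed: int = 0) -> list[str]:
    """Build a test context: the relevant chunks plus a small number of
    sampled noise chunks, shuffled together. Score the answer on this
    context against the clean run to get noise sensitivity."""
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    noisy = relevant + rng.sample(noise_pool, n_noise)
    rng.shuffle(noisy)
    return noisy
```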