Glossary · Workflow

Grid-search.

Exhaustive evaluation of every combination in a defined configuration space, scored against a fixed test set.

How it’s computed

Define the axes (chunking, embedding, retriever, reranker, advanced strategy, etc.), each with a small set of candidate values. The grid is the Cartesian product of those axes. Run the test set against every cell, score every cell with the metric suite, and rank the results.

number of configurations = product of axis cardinalities
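As a quick illustration, here is that formula in Python; the axis names and candidate values are made up for the sketch and are not a real grid definition.

```python
# Hypothetical axes, purely illustrative; not a real grid definition.
from math import prod

axes = {
    "chunking":  ["fixed-512", "recursive", "semantic", "by-heading"],
    "embedding": ["small", "large", "multilingual"],
    "retriever": ["dense", "hybrid", "bm25"],
}

# Product of axis cardinalities = number of cells in the grid.
number_of_configurations = prod(len(values) for values in axes.values())
print(number_of_configurations)  # 4 * 3 * 3 = 36
```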

The output is a ranked leaderboard plus an improvement report that surfaces what the top cells have in common.

Worked example

Axes: 4 chunking × 3 embedding × 3 retriever × 2 advanced strategy = 72 configurations. With a 100-question dataset, that is 72 × 100 = 7,200 pipeline runs, and with the RAGAS core four metrics roughly four judge calls per run on top of that. On OpenAI flex-tier pricing, this lands in the $30-$80 range.
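The same arithmetic spelled out below; the per-judge-call price is an assumed placeholder, not an actual OpenAI rate.

```python
# Back-of-the-envelope cost check for the worked example above.
configurations = 4 * 3 * 3 * 2      # 72 grid cells
questions = 100                     # test-set size
metrics = 4                         # RAGAS core four, at least one judge call each

pipeline_runs = configurations * questions   # 7,200
judge_calls = pipeline_runs * metrics        # ~28,800

assumed_price_per_judge_call = 0.002         # USD, placeholder only
estimate = judge_calls * assumed_price_per_judge_call
print(f"{pipeline_runs} runs, {judge_calls} judge calls, ~${estimate:.0f} in judge cost")
```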

How Yoke Agent uses it

Grid-search is the core workflow of the RAG and agent workbenches. You do not write Python loops; you declare axes in a YAML-ish grid definition and the workbench orchestrates the runs, cost tracking, judge calls, and leaderboard.
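For intuition, here is a rough sketch of what the orchestrated sweep amounts to; run_pipeline and score are hypothetical stand-ins, not Yoke Agent APIs, and the real workbench adds cost tracking and judge management on top.

```python
# Hypothetical sweep loop; run_pipeline() and score() are stand-ins.
from itertools import product

def sweep(axes: dict, dataset, run_pipeline, score):
    leaderboard = []
    for combo in product(*axes.values()):
        config = dict(zip(axes, combo))                        # one grid cell
        answers = [run_pipeline(config, q) for q in dataset]   # run the full test set
        leaderboard.append((score(dataset, answers), config))  # aggregate metric score
    leaderboard.sort(key=lambda row: row[0], reverse=True)     # rank best-first
    return leaderboard
```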

The result is the same ranked leaderboard and improvement report described above.

Frequently asked

When is grid-search overkill?

When the corpus and configuration are fixed and you are only evaluating a single pipeline, there are no axes to sweep. In that case regression testing (DeepEval-style assertions) is the right tool.
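A minimal sketch of what such a regression check can look like, assuming a generic faithfulness scorer; it mirrors the spirit of DeepEval assertions but is not DeepEval's actual API.

```python
# Generic regression check; pipeline() and faithfulness() are illustrative stand-ins.
THRESHOLD = 0.8

def check_no_regression(pipeline, dataset, faithfulness):
    for example in dataset:
        answer = pipeline(example["question"])
        score = faithfulness(question=example["question"],
                             answer=answer,
                             contexts=example["contexts"])
        assert score >= THRESHOLD, (
            f"faithfulness {score:.2f} dropped below {THRESHOLD} "
            f"on {example['question']!r}"
        )
```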

Why not random search?

For small grids (under ~200 configurations) grid-search is exhaustive, reproducible, and no harder to run. Random search starts to pay off when the space grows to many axes or thousands of cells, or when only a few axes actually move the score; that is not the typical RAG case.
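For contrast, random search simply samples a fixed number of cells instead of enumerating them all; the sketch below reuses the axes-as-dict shape from earlier and is illustrative only.

```python
# Random-search alternative: evaluate a sampled subset of the grid.
import random
from itertools import product

def random_search(axes: dict, n_samples: int, seed: int = 0):
    rng = random.Random(seed)   # fixed seed keeps the sample reproducible
    cells = [dict(zip(axes, combo)) for combo in product(*axes.values())]
    return rng.sample(cells, min(n_samples, len(cells)))
```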

How do I keep cost under control?

Set a budget cap before the sweep starts. Keep dataset size tight during exploration (50-100 questions); expand only for the final run of the top candidates.
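One way to make the cap concrete is a pre-flight estimate that refuses to start the sweep when it would blow the budget; the per-run cost below is an assumed figure, not a real price.

```python
# Pre-sweep budget guard; cost_per_run is an assumed placeholder.
def enforce_budget(num_configs: int, num_questions: int,
                   cost_per_run: float, budget_usd: float) -> None:
    estimated = num_configs * num_questions * cost_per_run
    if estimated > budget_usd:
        raise RuntimeError(
            f"Estimated sweep cost ${estimated:.2f} exceeds budget ${budget_usd:.2f}; "
            "shrink the grid or the dataset before starting."
        )

enforce_budget(num_configs=72, num_questions=100, cost_per_run=0.01, budget_usd=80.0)
```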