SearchBench docs

Test the interfaces your coding agents use.

SearchBench is a work-in-progress harness for evaluating agent-facing interfaces on benchmark tasks; it is not yet a stable public API. It wraps benchmark families as games, compares an incumbent interface against a challenger on the same dataset slice, and records evidence bundles that back a promote, review, or reject decision.

The first game is code localization (symbol/code-search with lookahead on bug-localization slices).

```text
Game → Round → Match → Evidence → Decision → NextChallenger
```
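For orientation, here is a minimal Go sketch of that ordering. The `Stage` type, its constants, and `Next` are hypothetical names, not SearchBench's packages, and looping `NextChallenger` back to `Round` is an assumption about how the next challenger gets played.

```go
package main

import "fmt"

// Stage is a hypothetical enum for one step of the lifecycle above.
type Stage int

const (
	Game Stage = iota
	Round
	Match
	Evidence
	Decision
	NextChallenger
)

var stageNames = [...]string{"Game", "Round", "Match", "Evidence", "Decision", "NextChallenger"}

// String lets fmt print a Stage by name.
func (s Stage) String() string { return stageNames[s] }

// Next returns the stage after s. Assumption: NextChallenger wraps back to
// Round, i.e. the next challenger is evaluated in a new round of the same game.
func (s Stage) Next() Stage {
	if s == NextChallenger {
		return Round
	}
	return s + 1
}

func main() {
	s := Game
	for i := 0; i < 7; i++ {
		fmt.Print(s, " → ")
		s = s.Next()
	}
	fmt.Println(s)
}
```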

Current interface

SearchBench’s user-facing surface today:

| Surface | Example |
|---|---|
| Pkl round manifest | `configs/rounds/local-ic-vs-jcodemunch/round.pkl` |
| CLI | `./searchbench run --manifest=... --bundle-root=...` |
| Evidence bundle | `configs/rounds/local-ic-vs-jcodemunch/evidence/bundle/games/code-localization/rounds/round-001/` |
| Continuation | `continuation.pkl` in that bundle |
| Proof gates | `buck2 test //:check`, `buck2 test //:check_full` |

Excerpt from `configs/rounds/local-ic-vs-jcodemunch/round.pkl`:

```pkl
round = (game.defineFromScratch("round-001")) {
  incumbent = game.jcodemunch()
  challenger = game.iterativeContext("policies/challenger_policy.py") { ... }
  matches = game.lca("py", "dev", 5)
  evaluator = game.fakeEvaluator()
  scoring = game.objective("scoring/localization-objective.pkl")
}
```

In short, Pkl declares intent; Go runs the matches, scores the evidence, and writes the bundle. See `start-here.md` for the full walkthrough.
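The Go side of that split can be pictured with a small sketch of one round: both interfaces play the same tasks, one objective scores both, and the per-task evidence feeds a decision. Every name here (`Interface`, `Scorer`, `runRound`, `decide`) is hypothetical, the hit@1 scorer and toy tasks are made up, and the margin rule for promote / review / reject is an illustrative assumption, not what `localization-objective.pkl` defines.

```go
package main

import "fmt"

// Interface is a hypothetical stand-in for an agent-facing interface such as
// the incumbent or challenger: given a task ID, it returns ranked file paths.
type Interface func(task string) []string

// Scorer is a hypothetical stand-in for the round's scoring objective.
type Scorer func(task string, ranked []string) float64

// MatchResult is the per-task evidence from one incumbent-vs-challenger match.
type MatchResult struct {
	Task            string
	IncumbentScore  float64
	ChallengerScore float64
}

type Decision string

const (
	Promote Decision = "promote"
	Review  Decision = "review"
	Reject  Decision = "reject"
)

// runRound plays every task against both interfaces and scores both with the
// same objective, so the comparison uses an identical dataset slice.
func runRound(tasks []string, incumbent, challenger Interface, score Scorer) []MatchResult {
	results := make([]MatchResult, 0, len(tasks))
	for _, t := range tasks {
		results = append(results, MatchResult{
			Task:            t,
			IncumbentScore:  score(t, incumbent(t)),
			ChallengerScore: score(t, challenger(t)),
		})
	}
	return results
}

// decide applies an illustrative margin rule (an assumption, not SearchBench's):
// a clear mean gain promotes, a clear loss rejects, and anything in between,
// including an empty round, goes to review.
func decide(results []MatchResult, margin float64) Decision {
	if len(results) == 0 {
		return Review
	}
	var delta float64
	for _, r := range results {
		delta += r.ChallengerScore - r.IncumbentScore
	}
	delta /= float64(len(results))
	switch {
	case delta > margin:
		return Promote
	case delta < -margin:
		return Reject
	default:
		return Review
	}
}

func main() {
	// Toy gold locations and a hit@1 scorer, made up for illustration.
	gold := map[string]string{"t1": "a.py", "t2": "b.py"}
	hitAt1 := func(task string, ranked []string) float64 {
		if len(ranked) > 0 && ranked[0] == gold[task] {
			return 1
		}
		return 0
	}
	incumbent := func(task string) []string { return []string{"a.py"} }
	challenger := func(task string) []string { return []string{gold[task]} }

	results := runRound([]string{"t1", "t2"}, incumbent, challenger, hitAt1)
	fmt.Println(decide(results, 0.1)) // mean gain 0.5 > 0.1, so "promote"
}
```

A margin rule is one simple way to keep small, noisy differences out of automatic promotion and route them to review instead.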

| Doc | Purpose |
|---|---|
| Start here | Manifest → CLI → bundle walkthrough |
| Concepts | Vocabulary with file anchors |
| Components | Monorepo trees, example paths, proof targets |
| Architecture | Go lifecycle and implementation paths |
| Development | Nix, Buck gates, docs build |

Reference

| Doc | Anchors on |
|---|---|
| `pkl-rounds.md` | `configs/rounds/*/round.pkl`, `configs/schema/SearchBenchRound.pkl` |
| `pkl-objectives.md` | `configs/rounds/local-ic-vs-jcodemunch/scoring/localization-objective.pkl` |
| `bundles.md` | Checked-in `round-001` artifact tree |
| `live-matrix-debugging.md` | `evaluate_matrix_*` stderr triage, stalls, MCP / max-iteration signatures |
| `candidate-workspaces.md` | Sandboxing, `local_path` / `buck_descriptor` |
| `package-boundaries.md` | Go import rules |
| `optimizer-policy-validation.md` | Iterative-context (IC) candidate pipeline |

Research

See `research/agent-interface-research.md` and `research/bxl-meta-harness.md`.