Start here
Work in progress — research harness, not a polished platform. First game: code localization (bug-localization slices, symbol/code-search with lookahead).
1. Round manifest
File: configs/rounds/local-ic-vs-jcodemunch/round.pkl
Excerpt:
amends "../../schema/games/code-localization.pkl"
import "../../schema/games/code-localization-helpers.pkl" as game
name = "example-local-ic-vs-jcodemunch-round-001"
round = (game.defineFromScratch("round-001")) {
incumbent = game.jcodemunch()
challenger = (game.iterativeContext("policies/challenger_policy.py")) {
selectionPolicy { id = "challenger-policy-round-001" }
}
matches = game.lca("py", "dev", 5)
evaluator = game.fakeEvaluator()
scoring = game.objective("scoring/localization-objective.pkl")
}| Stanza | Meaning | Example path |
|---|---|---|
incumbent | Baseline interface | jCodeMunch backend |
challenger | Interface under test | IC + policies/challenger_policy.py |
matches | Dataset slice (5 LCA rows) | game.lca("py", "dev", 5) |
evaluator | Match runner (fake = offline) | No live models |
scoring | Pkl objective module | scoring/localization-objective.pkl |
Schema: configs/schema/games/code-localization.pkl · Helpers: configs/schema/games/code-localization-helpers.pkl
2. Run it
From repo root (after building the CLI):
./searchbench run \
--manifest=configs/rounds/local-ic-vs-jcodemunch/round.pkl \
--bundle-root="$(pwd)/.tmp-artifacts"Proves with: same flow as buck2 test //src/searchbench-go:check (Go tests exercise manifests).
Offline only today — game.fakeEvaluator(), no live MCP or provider calls.
3. Bundle output
Path: {bundle-root}/games/code-localization/rounds/round-001/
Checked-in reference copy:
configs/rounds/local-ic-vs-jcodemunch/evidence/bundle/games/code-localization/rounds/round-001/
COMPLETE
resolved-round.json
round-report.json
round-report.txt
evidence.pkl
objective.json
decision.json
metadata.json
continuation.json
continuation.pkl
policies/challenger_policy.py| File | Inspect for |
|---|---|
decision.json | PROMOTE_CHALLENGER / REVIEW / REJECT |
objective.json | Scored values from the Pkl objective |
evidence.pkl | Round evidence fed into scoring |
continuation.pkl | Input for the next round manifest |
Details: reference/bundles.md.
4. Next round (continuation)
File: configs/rounds/optimize-ic/round.pkl
Excerpt:
amends "../local-ic-vs-jcodemunch/evidence/bundle/games/code-localization/rounds/round-001/continuation.pkl"
import "../../schema/games/code-localization-helpers.pkl" as game
round {
id = "round-002"
challenger {
generate {
optimizer = game.fakeOptimizer()
artifactName = "next_challenger_policy.round-002.py"
}
}
}Meaning: amends prior bundle continuation; optimizer proposes a new challenger policy for round 002. See reference/pkl-rounds.md.
5. Read next
- concepts.md — vocabulary → files
- reference/pkl-rounds.md — manifest API
- reference/bundles.md — bundle fields
- components.md — who owns what
- development.md —
buck2 test //:check_full