SearchBench run entrypoints
Repo-owned runs and evaluations use Buck targets only. Buck is the only supported public interface; the Go binary is private implementation plumbing invoked by Buck. Do not start runs via ad-hoc shell scripts or direct searchbench commands.
After any target completes, inspect report.json (then report.txt) in the bundle directory first.
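The inspection order above can be sketched as a small read-only helper. This is a minimal sketch, not part of the repo: the function name `first_report` is hypothetical, and it assumes only that `report.json` and `report.txt` sit directly in the bundle directory.

```python
from pathlib import Path
from typing import Optional


def first_report(bundle_dir: Path) -> Optional[Path]:
    """Return the report to read first: report.json, else report.txt, else None.

    Hypothetical helper; it only looks for the two file names mentioned above.
    """
    for name in ("report.json", "report.txt"):
        candidate = bundle_dir / name
        if candidate.is_file():
            return candidate
    return None
```

The helper only reads the filesystem, so it does not conflict with the rule that runs start through Buck targets.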
Live IC vs jCodeMunch (canonical Buck package)
| Target | Command | Mode | Network / secrets |
|---|---|---|---|
| validate | buck2 test | Pkl manifest validation | No |
| validate_bundle | buck2 test | Deterministic bundle replay | No |
| preflight_matrix_10 | buck2 run | BXL cost_plan + estimate_cost | No |
| preflight_matrix_50 | buck2 run | BXL cost_plan + estimate_cost | No |
| estimate_cost | buck2 run | Go cost estimate (historical_sources / artifact_root / bundles + optional pricing.pkl); BLOCK exits non-zero | No |
| approve_cost | buck2 run | Writes cost/approval.json (human consent pinned to digest of cost-estimate.json) | No |
| evaluate_matrix_10 | buck2 run | Parallel matrix (10 matches) | Yes (.env) |
| evaluate_matrix_50 | buck2 run | Parallel matrix (50 matches) | Yes (.env) |
| materialize_dataset | buck2 run | LCA HF export | Optional HF_TOKEN |
| run | buck2 run | Single round execution | Depends on manifest |
| live_smoke | buck2 test | Synthetic/local MCP + bundle smoke | Yes (.env) |
| real_lca_smoke | buck2 test | Real HF LCA materialization + scoring proof | Yes (.env, HF_TOKEN) |
| e2e | buck2 test | Alias for live_smoke | Yes |
| evaluate_n | buck2 run | Multi-attempt promotion evaluation | Yes |
| stability_probe | buck2 run | Same-input variance probe | Yes |
Spend gate: evaluate_matrix_*, evaluate_n, stability_probe, live_smoke, real_lca_smoke, and the round run target for manifests with non-fake backends require both evidence/cost/cost-estimate.json with a non-BLOCK result and an approval written by approve_cost. Only tests may set SEARCHBENCH_SKIP_COST_SPEND_APPROVAL=1.
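To illustrate what "consent pinned to a digest" means, here is a minimal sketch of such a check. It is not the Go implementation: the JSON field names `verdict` and `estimate_sha256`, the use of SHA-256, and the function name `spend_gate_ok` are all assumptions made for illustration, and the real file schemas may differ.

```python
import hashlib
import json
from pathlib import Path


def spend_gate_ok(estimate_path: Path, approval_path: Path) -> bool:
    """Return True only if the estimate is not BLOCK and the approval matches it.

    Assumed (hypothetical) schema:
      cost-estimate.json: {"verdict": "..."}          BLOCK means do not spend
      approval.json:      {"estimate_sha256": "..."}  digest of the estimate file
    """
    if not estimate_path.is_file() or not approval_path.is_file():
        return False
    estimate_bytes = estimate_path.read_bytes()
    estimate = json.loads(estimate_bytes)
    if estimate.get("verdict") == "BLOCK":
        return False
    approval = json.loads(approval_path.read_text())
    # Pinning: the approval is only valid for the exact bytes it was given for.
    digest = hashlib.sha256(estimate_bytes).hexdigest()
    return approval.get("estimate_sha256") == digest
```

Under these assumptions, regenerating the estimate changes its digest, so an older approval no longer satisfies the gate and approve_cost has to be run again.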
Estimate details: USD bands prefer scaled historical billed totals from prior round-report.json samples when each sample has nonzero cost_usd and an ancestor report.json in the allowed live freshness/mode set; otherwise the tool uses token blend bands. Inspect cost/usage-summary.json after preflight_matrix_* for historical_roots_resolved, usd_scaled_to_plan, and usd_estimate_source.
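The source-selection rule above can be sketched as follows. This is a simplified illustration, not the tool's code: the `Sample` type, the `ancestor_allowed` flag (standing in for "has an ancestor report.json in the allowed live freshness/mode set"), and the returned labels are hypothetical. Treating an empty sample list as a fallback to token blend is also an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Sample:
    """One prior round-report.json sample (hypothetical shape)."""
    cost_usd: float
    ancestor_allowed: bool  # stands in for the freshness/mode check


def estimate_source(samples: Iterable[Sample]) -> str:
    """Pick the USD band source: historical totals only if every sample qualifies."""
    samples = list(samples)
    if samples and all(s.cost_usd != 0 and s.ancestor_allowed for s in samples):
        return "historical"  # hypothetical label
    return "token_blend"  # hypothetical label
```

Note that a single sample with zero cost_usd, or one without an allowed ancestor report, is enough to switch the whole estimate to token blend bands in this sketch.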
Canonical Buck package: //configs/rounds/live-ic-vs-jcodemunch:<target> (BUCK next to manifests + bundle tree; macro-only: searchbench_release)
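As an illustration of how the table and the canonical package combine into a command line, the sketch below builds the argument list for a given target. The helper name `buck_command` is hypothetical; the target names and the test/run split are copied from the table above. It only builds the list and does not execute anything.

```python
from typing import List

PACKAGE = "//configs/rounds/live-ic-vs-jcodemunch"

# Targets listed with "buck2 test" in the table above.
TEST_TARGETS = {"validate", "validate_bundle", "live_smoke", "real_lca_smoke", "e2e"}

# Targets listed with "buck2 run" in the table above.
RUN_TARGETS = {
    "preflight_matrix_10", "preflight_matrix_50", "estimate_cost", "approve_cost",
    "evaluate_matrix_10", "evaluate_matrix_50", "materialize_dataset", "run",
    "evaluate_n", "stability_probe",
}


def buck_command(target: str) -> List[str]:
    """Return the buck2 argv for a known target; raise ValueError otherwise."""
    if target in TEST_TARGETS:
        subcommand = "test"
    elif target in RUN_TARGETS:
        subcommand = "run"
    else:
        raise ValueError(f"unknown target: {target}")
    return ["buck2", subcommand, f"{PACKAGE}:{target}"]
```

For example, `buck_command("validate")` yields the argument list for `buck2 test //configs/rounds/live-ic-vs-jcodemunch:validate`.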
Release manifest (release.pkl): configs/schema/SearchBenchRelease.pkl; live amend at configs/rounds/live-ic-vs-jcodemunch/release.pkl. Loaded in Go via config.LoadReleaseFromPath (generated bindings under generatedrelease/).
Graph planners: buck2 bxl //buck/bxl/searchbench.bxl:cost_plan, release_gate, repo_policy, affected_plan
Secrets: repo-root .env — CEREBRAS_API_KEY and optional HF_TOKEN only.
Round README: configs/rounds/live-ic-vs-jcodemunch/README.md.
Repo gates
| Gate | Command |
|---|---|
| Fast | buck2 test //buck:check (alias //:check) |
| Full | buck2 test //buck:check_full (alias //:check_full) |
| Policy | buck2 test //buck:policy_check |
Deprecated / removed
- preflight.sh and round-package genrule preflight targets
- scripts/run-live-e2e.sh
- Direct searchbench CLI as a product workflow
Implementation details (not public)
Buck targets use searchbench_round_op, searchbench_release, and uv_project_test. Round operations invoke //src/searchbench-go/cmd/searchbench:searchbench. There are no repo-owned .sh operation entrypoints under configs/rounds/.