SearchBench run entrypoints

Repo-owned runs and evaluations use Buck targets only. Buck is the only supported public interface; the Go binary is private implementation plumbing invoked by Buck. Do not start runs via ad-hoc shell scripts or direct searchbench commands.

After any target completes, inspect report.json (then report.txt) in the bundle directory first.
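The reading order above can be sketched as a small read-only helper. This is an illustration only, not part of the repo: the function name is hypothetical, and it assumes nothing about the report contents beyond the two file names given above.

```python
from pathlib import Path

# Reading order from the guidance above: report.json first, then report.txt.
REPORT_ORDER = ("report.json", "report.txt")


def reports_in_reading_order(bundle_dir):
    """Return the report files present in bundle_dir, in the order to read them."""
    bundle = Path(bundle_dir)
    return [bundle / name for name in REPORT_ORDER if (bundle / name).is_file()]
```

The helper only lists files; it does not start or modify a run, so it stays within the Buck-only rule.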

Live IC vs jCodeMunch (canonical Buck package)

| Target | Command | Mode | Network / secrets |
|---|---|---|---|
| validate | buck2 test | Pkl manifest validation | No |
| validate_bundle | buck2 test | Deterministic bundle replay | No |
| preflight_matrix_10 | buck2 run | BXL cost_plan + estimate_cost | No |
| preflight_matrix_50 | buck2 run | BXL cost_plan + estimate_cost | No |
| estimate_cost | buck2 run | Go cost estimate (historical_sources / artifact_root / bundles + optional pricing.pkl); BLOCK exits non-zero | No |
| approve_cost | buck2 run | Writes cost/approval.json (human consent pinned to digest of cost-estimate.json) | No |
| evaluate_matrix_10 | buck2 run | Parallel matrix (10 matches) | Yes (.env) |
| evaluate_matrix_50 | buck2 run | Parallel matrix (50 matches) | Yes (.env) |
| materialize_dataset | buck2 run | LCA HF export | Optional HF_TOKEN |
| run | buck2 run | Single round execution | Depends on manifest |
| live_smoke | buck2 test | Synthetic/local MCP + bundle smoke | Yes (.env) |
| real_lca_smoke | buck2 test | Real HF LCA materialization + scoring proof | Yes (.env, HF_TOKEN) |
| e2e | buck2 test | Alias for live_smoke | Yes |
| evaluate_n | buck2 run | Multi-attempt promotion evaluation | Yes |
| stability_probe | buck2 run | Same-input variance probe | Yes |

Spend gate: evaluate_matrix_*, evaluate_n, stability_probe, live_smoke, real_lca_smoke, and the Buck round run target for manifests with non-fake backends each require an evidence/cost/cost-estimate.json whose result is not BLOCK, plus an approval recorded by approve_cost. Only tests may bypass the gate, by setting SEARCHBENCH_SKIP_COST_SPEND_APPROVAL=1.
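As a rough illustration of the gate logic, the sketch below checks the conditions described above: the estimate must not be BLOCK, and the approval must be pinned to the digest of the current cost-estimate.json. The JSON field names (`decision`, `estimate_sha256`) and the choice of SHA-256 are assumptions made for this example; the real file schemas and digest algorithm are not documented here.

```python
import hashlib
import json
import os
from pathlib import Path


def spend_allowed(estimate_path, approval_path, env=None):
    """Return True if a spending target would pass the sketched gate."""
    env = os.environ if env is None else env
    # Test-only bypass named in the text above.
    if env.get("SEARCHBENCH_SKIP_COST_SPEND_APPROVAL") == "1":
        return True

    estimate_path, approval_path = Path(estimate_path), Path(approval_path)
    if not estimate_path.is_file() or not approval_path.is_file():
        return False

    raw = estimate_path.read_bytes()
    # "decision" is a hypothetical field name for the BLOCK / non-BLOCK result.
    if json.loads(raw).get("decision") == "BLOCK":
        return False

    # "estimate_sha256" is a hypothetical field holding the pinned digest.
    approval = json.loads(approval_path.read_text())
    return approval.get("estimate_sha256") == hashlib.sha256(raw).hexdigest()
```

Pinning the approval to a digest means any later change to the estimate invalidates the earlier consent, so approve_cost would need to be run again.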

Estimate details: USD bands prefer scaled historical billed totals from prior round-report.json samples when each sample has nonzero cost_usd and an ancestor report.json in the allowed live freshness/mode set; otherwise the tool uses token blend bands. Inspect cost/usage-summary.json after preflight_matrix_* for historical_roots_resolved, usd_scaled_to_plan, and usd_estimate_source.
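The selection rule above can be sketched as follows. This is a simplified model, not the Go implementation: the sample key `has_eligible_report` and the returned labels are hypothetical stand-ins, and the actual values written to usd_estimate_source may differ.

```python
def choose_usd_estimate_source(samples):
    """Pick the estimate source for a list of prior round-report samples.

    Each sample is a dict with:
      cost_usd            -- billed cost recorded in the sample
      has_eligible_report -- hypothetical flag: an ancestor report.json exists
                             in the allowed live freshness/mode set
    """
    if not samples:
        return "token_blend"
    for sample in samples:
        # Every sample must qualify; one bad sample forces the fallback.
        if sample.get("cost_usd", 0) <= 0:
            return "token_blend"
        if not sample.get("has_eligible_report", False):
            return "token_blend"
    return "historical_scaled"
```

For example, a single sample with zero cost_usd is enough to fall back to token blend bands, even if every other sample qualifies.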

Canonical Buck package: //configs/rounds/live-ic-vs-jcodemunch:<target> (BUCK next to manifests + bundle tree; macro-only: searchbench_release)
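To make the label pattern concrete, the sketch below composes the full Buck invocation for a target in the canonical package, using the target-to-command pairing from the table above. It only formats the command; it does not execute anything. The function and constant names are hypothetical.

```python
PACKAGE = "//configs/rounds/live-ic-vs-jcodemunch"

# buck2 subcommand per target, as listed in the target table.
BUCK_SUBCOMMAND = {
    "validate": "test",
    "validate_bundle": "test",
    "preflight_matrix_10": "run",
    "preflight_matrix_50": "run",
    "estimate_cost": "run",
    "approve_cost": "run",
    "evaluate_matrix_10": "run",
    "evaluate_matrix_50": "run",
    "materialize_dataset": "run",
    "run": "run",
    "live_smoke": "test",
    "real_lca_smoke": "test",
    "e2e": "test",
    "evaluate_n": "run",
    "stability_probe": "run",
}


def buck_command(target):
    """Return the argv for a target; raises KeyError for unknown targets."""
    return ["buck2", BUCK_SUBCOMMAND[target], f"{PACKAGE}:{target}"]


print(" ".join(buck_command("validate")))
```

The printed line is `buck2 test //configs/rounds/live-ic-vs-jcodemunch:validate`.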

Release manifest (release.pkl): configs/schema/SearchBenchRelease.pkl; live amend at configs/rounds/live-ic-vs-jcodemunch/release.pkl. Loaded in Go via config.LoadReleaseFromPath (generated bindings under generatedrelease/).

Graph planners: buck2 bxl //buck/bxl/searchbench.bxl:<planner>, where <planner> is cost_plan, release_gate, repo_policy, or affected_plan

Secrets: repo-root .env, containing CEREBRAS_API_KEY and optional HF_TOKEN only.
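A quick way to check a .env file against this rule is sketched below. It assumes the rule means the file holds CEREBRAS_API_KEY, optionally HF_TOKEN, and no other keys, and it uses a deliberately simple KEY=value parser. Neither assumption is confirmed by the repo tooling.

```python
REQUIRED_KEYS = {"CEREBRAS_API_KEY"}
OPTIONAL_KEYS = {"HF_TOKEN"}


def env_key_problems(text):
    """Return a list of problems found in the text of a .env file."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=value pairs.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)

    problems = [f"missing {key}" for key in sorted(REQUIRED_KEYS - keys)]
    allowed = REQUIRED_KEYS | OPTIONAL_KEYS
    problems += [f"unexpected {key}" for key in sorted(keys - allowed)]
    return problems
```

An empty list means the file matches the rule as interpreted here; values are never inspected or printed, so the check is safe to run on real secrets.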

Round README: configs/rounds/live-ic-vs-jcodemunch/README.md.

Repo gates

| Gate | Command |
|---|---|
| Fast | buck2 test //buck:check (alias //:check) |
| Full | buck2 test //buck:check_full (alias //:check_full) |
| Policy | buck2 test //buck:policy_check |

Deprecated / removed

  • preflight.sh and round-package genrule preflight targets
  • scripts/run-live-e2e.sh
  • Direct searchbench CLI as a product workflow

Implementation details (not public)

Buck targets use searchbench_round_op, searchbench_release, and uv_project_test. Round operations invoke //src/searchbench-go/cmd/searchbench:searchbench. There are no repo-owned .sh operation entrypoints under configs/rounds/.