SearchBench run entrypoints

Repo-owned runs and evaluations are started through Buck targets only. Buck is the sole supported public interface; the Go binary is private implementation plumbing that Buck invokes. Do not start runs with ad-hoc shell scripts or direct searchbench commands.

After any target completes, start by inspecting report.json, then report.txt, in the bundle directory.
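The inspection order can be sketched as a small read-only helper. This is a minimal sketch, not code from the repo: the function name `reports_to_inspect` and the bundle directory argument are hypothetical, and only the file names report.json and report.txt come from this page.

```python
from pathlib import Path
from typing import List

# File names come from this page; the tuple order encodes
# "report.json first, then report.txt".
REPORT_ORDER = ("report.json", "report.txt")


def reports_to_inspect(bundle_dir: str) -> List[Path]:
    """Return the report files present in bundle_dir, in inspection order."""
    root = Path(bundle_dir)
    return [root / name for name in REPORT_ORDER if (root / name).is_file()]
```

The helper only lists files; it never starts a run, so it stays within the Buck-only rule above.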

Live IC vs jCodeMunch (canonical Buck package)

| Target | Command | Mode | Network / secrets |
| --- | --- | --- | --- |
| `validate` | `buck2 test` | Pkl manifest validation | No |
| `validate_bundle` | `buck2 test` | Deterministic bundle replay | No |
| `preflight_matrix_10` | `buck2 run` | BXL `cost_plan` + `estimate_cost` | No |
| `preflight_matrix_50` | `buck2 run` | BXL `cost_plan` + `estimate_cost` | No |
| `estimate_cost` | `buck2 run` | Go cost estimate (`historical_sources` / `artifact_root` / bundles + optional `pricing.pkl`); `BLOCK` exits non-zero | No |
| `approve_cost` | `buck2 run` | Writes `cost/approval.json` (human consent pinned to digest of `cost-estimate.json`) | No |
| `evaluate_matrix_10` | `buck2 run` | Parallel matrix (10 matches) | Yes (`.env`) |
| `evaluate_matrix_50` | `buck2 run` | Parallel matrix (50 matches) | Yes (`.env`) |
| `materialize_dataset` | `buck2 run` | LCA HF export | Optional `HF_TOKEN` |
| `run` | `buck2 run` | Single round execution | Depends on manifest |
| `live_smoke` | `buck2 test` | Synthetic/local MCP + bundle smoke | Yes (`.env`) |
| `real_lca_smoke` | `buck2 test` | Real HF LCA materialization + scoring proof | Yes (`.env`, `HF_TOKEN`) |
| `e2e` | `buck2 test` | Alias for `live_smoke` | Yes |
| `evaluate_n` | `buck2 run` | Multi-attempt promotion evaluation | Yes |
| `stability_probe` | `buck2 run` | Same-input variance probe | Yes |

Spend gate: evaluate_matrix_*, evaluate_n, stability_probe, live_smoke, real_lca_smoke, and Buck round runs for manifests with non-fake backends each require an evidence/cost/cost-estimate.json that is not BLOCK, plus an approval recorded by approve_cost. Only tests may bypass the gate, by setting SEARCHBENCH_SKIP_COST_SPEND_APPROVAL=1.
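The logic of the spend gate can be illustrated as follows. This is a sketch under stated assumptions, not the Go implementation: the JSON keys `verdict` and `estimate_sha256`, the use of SHA-256, and the function name `spend_allowed` are all hypothetical. The page only states that the estimate must not be BLOCK and that the approval is pinned to a digest of cost-estimate.json.

```python
import hashlib
import json
from pathlib import Path


def spend_allowed(estimate_path: str, approval_path: str) -> bool:
    """Illustrative gate: estimate is not BLOCK and approval matches its digest.

    ASSUMPTIONS (not taken from the repo): the estimate stores its outcome
    under "verdict", and the approval stores a hex SHA-256 of the estimate
    file under "estimate_sha256".
    """
    estimate_file = Path(estimate_path)
    approval_file = Path(approval_path)
    if not estimate_file.is_file() or not approval_file.is_file():
        return False  # missing evidence never passes the gate

    raw = estimate_file.read_bytes()
    if json.loads(raw).get("verdict") == "BLOCK":
        return False  # a blocked estimate cannot be approved into a run

    approval = json.loads(approval_file.read_text())
    # Pinning to a digest means any later edit to the estimate voids the approval.
    return approval.get("estimate_sha256") == hashlib.sha256(raw).hexdigest()
```

Hashing the exact bytes of the estimate file, rather than comparing selected fields, is what makes the consent specific to one estimate: regenerating the estimate requires a fresh approval.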

Estimate details: USD bands prefer scaled historical billed totals taken from prior round-report.json samples. This applies when each sample has a nonzero cost_usd and an ancestor report.json in the allowed live freshness/mode set; otherwise the tool falls back to token blend bands. After preflight_matrix_*, inspect cost/usage-summary.json for historical_roots_resolved, usd_scaled_to_plan, and usd_estimate_source.
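To check which estimate source was used, the three fields named above can be pulled out of the summary file. This is a minimal sketch: the field names come from this page, but the assumption that they are top-level keys of the JSON object, and the function name `summarize_usage`, are mine.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Field names taken from this page.
FIELDS = ("historical_roots_resolved", "usd_scaled_to_plan", "usd_estimate_source")


def summarize_usage(path: str) -> Dict[str, Any]:
    """Return the three fields, with None for any that are absent.

    ASSUMPTION: the fields are top-level keys of the JSON object.
    """
    data = json.loads(Path(path).read_text())
    return {name: data.get(name) for name in FIELDS}
```

A `None` value for usd_estimate_source would suggest the key lives elsewhere in the file than assumed here, so treat it as a prompt to open the file by hand.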

Canonical Buck package: //configs/rounds/live-ic-vs-jcodemunch:<target> (the BUCK file sits next to the manifests and bundle tree; macro-only: searchbench_release)
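Combining the canonical package with the Command column of the table above gives the full invocation for each target. The sketch below only builds the argument list and does not execute anything; the helper name `buck_argv` is hypothetical, the mapping lists a subset of targets, and whether a given target accepts extra arguments is not covered here.

```python
from typing import List

# Canonical package from this page.
PACKAGE = "//configs/rounds/live-ic-vs-jcodemunch"

# buck2 subcommand per target, copied from the table above (subset shown).
SUBCOMMAND = {
    "validate": "test",
    "validate_bundle": "test",
    "preflight_matrix_10": "run",
    "estimate_cost": "run",
    "approve_cost": "run",
    "evaluate_matrix_10": "run",
    "live_smoke": "test",
}


def buck_argv(target: str) -> List[str]:
    """Build the argv for a target; raises KeyError for targets not listed here."""
    return ["buck2", SUBCOMMAND[target], f"{PACKAGE}:{target}"]


print(" ".join(buck_argv("validate")))
# prints: buck2 test //configs/rounds/live-ic-vs-jcodemunch:validate
```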

Release manifest (release.pkl): configs/schema/SearchBenchRelease.pkl; live amend at configs/rounds/live-ic-vs-jcodemunch/release.pkl. Loaded in Go via config.LoadReleaseFromPath (generated bindings under generatedrelease/).

Graph planners: buck2 bxl //buck/bxl/searchbench.bxl:cost_plan, release_gate, repo_policy, affected_plan

Secrets: repo-root .env — CEREBRAS_API_KEY and optional HF_TOKEN only.
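The "these keys only" rule can be checked mechanically before a live run. This is a sketch that assumes a plain KEY=VALUE file with optional `#` comments and optional `export ` prefixes; the function names are hypothetical, and the repo may parse .env differently.

```python
from pathlib import Path
from typing import List, Set

# Key names from this page: one required, one optional, nothing else.
REQUIRED = {"CEREBRAS_API_KEY"}
OPTIONAL = {"HF_TOKEN"}


def env_keys(path: str) -> Set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and # comments."""
    keys = set()
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def check_env(path: str) -> List[str]:
    """Return a list of problems; an empty list means the file matches the rule."""
    keys = env_keys(path)
    problems = [f"missing {k}" for k in sorted(REQUIRED - keys)]
    problems += [f"unexpected {k}" for k in sorted(keys - REQUIRED - OPTIONAL)]
    return problems
```

The check reads only key names and never prints values, so its output is safe to paste into a review.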

Round README: configs/rounds/live-ic-vs-jcodemunch/README.md.

Repo gates

| Gate | Command |
| --- | --- |
| Fast | `buck2 test //buck:check` (alias `//:check`) |
| Full | `buck2 test //buck:check_full` (alias `//:check_full`) |
| Policy | `buck2 test //buck:policy_check` |

Deprecated / removed

preflight.sh and round-package genrule preflight targets
scripts/run-live-e2e.sh
Direct searchbench CLI as a product workflow

Implementation details (not public)

Buck targets use searchbench_round_op, searchbench_release, and uv_project_test. Round operations invoke //src/searchbench-go/cmd/searchbench:searchbench. There are no repo-owned .sh operation entrypoints under configs/rounds/.
