CRSBench is the benchmark suite for OSS-CRS. It evaluates the full bug-finding and bug-fixing pipeline of any OSS-CRS-compatible CRS under production-style infrastructure (pre-collected corpora, incremental builds, and regression test selection), and is upstreamed into OSS-CRS as its standard evaluation framework.
Supports every CRS
Fuzzers, LLM agents, and hybrid systems run on the same sanitizer-based harness with the same resource limits. Any OSS-CRS-compatible CRS runs without changes.
The framework feeds the PoVs (proofs of vulnerability) found by the bug-finding CRS into the patching stage, so bug finding and patching are evaluated as one connected pipeline.
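The PoV hand-off can be pictured as a small driver loop. The Python sketch below illustrates the idea under stated assumptions and is not CRSBench's implementation: the PoV fields, the run_pipeline helper, and the choice to patch each distinct crash signature only once are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PoV:
    """A proof of vulnerability from a bug-finding CRS (hypothetical fields)."""

    harness: str          # fuzzing harness that crashed
    crash_signature: str  # e.g. sanitizer bug type plus top stack frame
    data: bytes           # input that triggers the crash


def run_pipeline(
    find_bugs: Callable[[], list[PoV]],
    patch: Callable[[PoV], bool],
) -> dict[str, bool]:
    """Send each distinct PoV found by bug finding to the patcher.

    Returns {crash_signature: patch_succeeded}. Deduplicating by signature
    is an assumption made for this sketch.
    """
    results: dict[str, bool] = {}
    for pov in find_bugs():
        if pov.crash_signature in results:
            continue  # same bug already sent to patching
        results[pov.crash_signature] = patch(pov)
    return results


# Usage with stub callables in place of real bug-finding and patching CRSs.
found = [
    PoV("fuzz_parse", "heap-buffer-overflow@parse_header", b"A" * 64),
    PoV("fuzz_parse", "heap-buffer-overflow@parse_header", b"B" * 80),
    PoV("fuzz_io", "use-after-free@close_stream", b"\x00"),
]
report = run_pipeline(lambda: found, lambda pov: pov.harness == "fuzz_parse")
print(report)
# {'heap-buffer-overflow@parse_header': True, 'use-after-free@close_stream': False}
```

Keeping the two stages behind plain callables is what lets a single driver score bug finding and patching together without either CRS knowing about the other.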
Redis/RQ workers run trials across machines. Docker snapshot-based incremental builds skip full project rebuilds after each patch attempt, giving CRSs more tries within the same LLM budget.
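The incremental-build idea comes down to paying for a full build once per project and starting every later patch attempt from the saved snapshot. The toy Python model below only counts builds. SnapshotBuilder and its snapshot naming are hypothetical stand-ins for the Docker snapshots, and no Docker API is called.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SnapshotBuilder:
    """Hypothetical toy model of snapshot-based incremental builds (no Docker)."""

    snapshots: dict[str, str] = field(default_factory=dict)  # project -> snapshot
    full_builds: int = 0
    incremental_builds: int = 0

    def build(self, project: str, patch: str | None = None) -> str:
        if project not in self.snapshots:
            # First time: pay for the full build once and keep the result.
            self.full_builds += 1
            self.snapshots[project] = f"{project}-base"
        if patch is None:
            return self.snapshots[project]
        # Patch attempt: start from the snapshot and rebuild only what changed.
        self.incremental_builds += 1
        return f"{self.snapshots[project]}+{patch}"


builder = SnapshotBuilder()
builder.build("example-project")
for attempt in range(3):
    builder.build("example-project", patch=f"attempt-{attempt}")
print(builder.full_builds, builder.incremental_builds)  # 1 3
```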
Pre-collected fuzzing corpora and Regression Test Selection (RTS) reflect the setup real deployments already maintain, so scores focus on CRS performance instead of infrastructure overhead.
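Regression test selection is commonly implemented by intersecting a patch's changed files with per-test coverage recorded in an earlier run. The sketch below shows that file-level variant. select_tests and the coverage-map format are assumptions for illustration, not the RTS tooling CRSBench actually uses.

```python
from collections.abc import Iterable, Mapping


def select_tests(
    changed_files: Iterable[str],
    coverage: Mapping[str, Iterable[str]],
) -> list[str]:
    """Return the tests whose recorded coverage touches a changed file.

    `coverage` maps each test name to the source files it executed during
    an earlier, unpatched run.
    """
    changed = set(changed_files)
    return sorted(
        test for test, files in coverage.items() if changed.intersection(files)
    )


# Hypothetical coverage map for illustration.
coverage = {
    "test_parse": ["src/parse.c", "src/util.c"],
    "test_io": ["src/io.c"],
    "test_util": ["src/util.c"],
}
print(select_tests(["src/util.c"], coverage))  # ['test_parse', 'test_util']
```

File-level granularity keeps the sketch short. A real selector would also run tests that have no recorded coverage, and may track functions or lines instead of files.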
CRSBench comprises C/C++ and Java projects with both manually curated synthetic vulnerabilities and real-world bugs, packaged with ground-truth PoVs, patches, and functionality tests.
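The ground-truth artifacts make automated patch checking possible. One plausible acceptance rule, sketched below in Python, is that the ground-truth PoV must no longer crash the patched build and every functionality test must still pass. BenchmarkEntry, its fields, and patch_is_valid are hypothetical names, not the dataset's schema or CRSBench's scoring code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BenchmarkEntry:
    """One benchmark vulnerability (hypothetical fields, not the real schema)."""

    project: str
    language: str           # "c", "c++", or "java"
    synthetic: bool         # manually curated injection vs. real-world bug
    pov: bytes              # ground-truth proof-of-vulnerability input
    tests: tuple[str, ...]  # functionality tests that must keep passing


def patch_is_valid(
    entry: BenchmarkEntry,
    pov_crashes: Callable[[bytes], bool],
    test_passes: Callable[[str], bool],
) -> bool:
    """Accept a patched build only if the PoV no longer crashes it and
    every functionality test still passes."""
    if pov_crashes(entry.pov):
        return False  # vulnerability still reachable
    return all(test_passes(name) for name in entry.tests)


entry = BenchmarkEntry("example-project", "c", True, b"\xff" * 8, ("test_a", "test_b"))
print(patch_is_valid(entry, lambda pov: False, lambda name: True))  # True
print(patch_is_valid(entry, lambda pov: False, lambda name: name != "test_b"))  # False
```

Requiring the functionality tests as well as the PoV check rules out "patches" that simply delete the vulnerable feature.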
CRSBench runs on Linux hosts with Docker. A minimal first run installs CRSBench, pulls the managed dependencies, downloads the sanity benchmark suite, and runs one experiment with a local queue-backed worker.
The benchmark dataset on HuggingFace is gated. Before anything else, open huggingface.co/datasets/sslab-gatech/crsbench-dataset, request access, and wait for approval. Without approval, crsbench download fails.
```shell
git clone --recurse-submodules https://github.com/sslab-gatech/CRSBench.git
cd CRSBench
uv sync
./scripts/setup-third-party.sh
uv run crsbench prepare
uv run crsbench prepare --coverage
# Gated dataset: accept the DUA on HuggingFace first
uv run hf auth login
uv run crsbench download --benchmark-suite sanity
```
Save the following as first-run.yaml. It runs atlantis-multilang-given_fuzzer, the bundled starter CRS, and sets litellm.skip: true, so no external LLM API keys are required.
```yaml
experiment:
  name: first-run
  task: bugfinding
  mode: full
  benchmark_suite: sanity
  sanitizers: [address]
runtime:
  trials: 1
  max_total_time: 3600
  redis_host: localhost:6379
litellm:
  skip: true
storage:
  experiment_filestore: ./results/experiment-data
  report_filestore: ./results/report-data
crs_compose:
  atlantis-multilang-given_fuzzer:
    num_cores: 4
```
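To catch typos before a long run, a config can be sanity-checked once it is loaded. The sketch below mirrors first-run.yaml as the plain dict a YAML loader would return, assuming experiment, runtime, litellm, storage, and crs_compose are top-level sections, and applies a few illustrative checks. check_config and its rules are hypothetical, not CRSBench's own validation.

```python
from __future__ import annotations

from typing import Any

# first-run.yaml as the nested dict a YAML loader would return.
FIRST_RUN: dict[str, Any] = {
    "experiment": {
        "name": "first-run",
        "task": "bugfinding",
        "mode": "full",
        "benchmark_suite": "sanity",
        "sanitizers": ["address"],
    },
    "runtime": {"trials": 1, "max_total_time": 3600, "redis_host": "localhost:6379"},
    "litellm": {"skip": True},
    "storage": {
        "experiment_filestore": "./results/experiment-data",
        "report_filestore": "./results/report-data",
    },
    "crs_compose": {"atlantis-multilang-given_fuzzer": {"num_cores": 4}},
}


def check_config(cfg: dict[str, Any]) -> list[str]:
    """Return a list of problems (empty if none). Hypothetical rules only."""
    problems: list[str] = []
    for section in ("experiment", "runtime", "storage", "crs_compose"):
        if section not in cfg:
            problems.append(f"missing section: {section}")
    runtime = cfg.get("runtime", {})
    if runtime.get("trials", 0) < 1:
        problems.append("runtime.trials must be >= 1")
    if runtime.get("max_total_time", 0) <= 0:
        problems.append("runtime.max_total_time must be positive")
    # redis_host is written as host:port, e.g. localhost:6379.
    host, _, port = str(runtime.get("redis_host", "")).rpartition(":")
    if not host or not port.isdigit():
        problems.append("runtime.redis_host must look like host:port")
    for name, crs in cfg.get("crs_compose", {}).items():
        if crs.get("num_cores", 0) < 1:
            problems.append(f"crs_compose.{name}.num_cores must be >= 1")
    return problems


print(check_config(FIRST_RUN))  # []
```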
```shell
# Start a local Valkey (Redis-compatible) server for the job queue
uv run python scripts/valkey-helper.py start
# Terminal 1: worker executes CRS trial jobs
uv run crsbench worker --experiment-config first-run.yaml
# Terminal 2: orchestrator enqueues jobs
uv run crsbench run --experiment-config first-run.yaml
```
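The worker/orchestrator split follows a standard producer/consumer pattern. In the sketch below, a standard-library queue.Queue stands in for the Redis-backed RQ queue and threads stand in for crsbench worker processes. orchestrate and run_trial are hypothetical names, and a long-running RQ worker would keep waiting for new jobs instead of exiting once the queue is empty.

```python
import queue
import threading
from typing import Callable


def orchestrate(
    benchmarks: list[str],
    trials: int,
    num_workers: int,
    run_trial: Callable[[str, int], object],
) -> dict[tuple[str, int], object]:
    """Enqueue one job per (benchmark, trial) and let workers drain the queue.

    queue.Queue is a local stand-in for the Redis-backed RQ queue.
    """
    jobs: queue.Queue = queue.Queue()
    results: dict[tuple[str, int], object] = {}
    lock = threading.Lock()

    # Orchestrator side (the role of `crsbench run`): enqueue every trial job.
    for bench in benchmarks:
        for trial in range(trials):
            jobs.put((bench, trial))

    def worker() -> None:
        # Worker side (the role of `crsbench worker`): pull jobs until none remain.
        while True:
            try:
                bench, trial = jobs.get_nowait()
            except queue.Empty:
                return
            outcome = run_trial(bench, trial)
            with lock:
                results[(bench, trial)] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


done = orchestrate(["bench-a", "bench-b"], trials=3, num_workers=2,
                   run_trial=lambda bench, trial: f"{bench}#{trial} ok")
print(len(done))  # 6
```

Because jobs live in a shared queue rather than in the orchestrator, adding workers on more machines scales throughput without changing the orchestrator.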
The CRS lifecycle reuses oss-crs prepare, oss-crs build-target, oss-crs artifacts, and oss-crs run, so any CRS listed in the OSS-CRS Registry plugs straight into crs_compose. For the distributed-experiment guide and configuration reference, see the upstream README and docs/.