CRSBench

A unified, full-pipeline benchmark for OSS-CRS.

Overview

CRSBench is the benchmark suite for OSS-CRS. It evaluates the full bug-finding and bug-fixing pipeline of any OSS-CRS-compatible cyber reasoning system (CRS) under production-style infrastructure: pre-collected corpora, incremental builds, and Regression Test Selection (RTS). It also ships back into OSS-CRS as that project's standard evaluation framework.

CRSBench architecture: benchmark construction, builder, executor, and verifier.

Supports every CRS

Fuzzers, LLM agents, and hybrid systems run on the same sanitizer-based harness with the same resource limits. Any OSS-CRS-compatible CRS can run without changes.

Full-pipeline evaluation

The framework forwards the proofs of vulnerability (PoVs) found by the bug-finding CRS to the patching stage, so bug finding and patching are evaluated as one connected flow.
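
The connected flow can be pictured with a minimal sketch. Everything named here (`PoV`, `Patch`, `run_pipeline`, and the two callables) is hypothetical and chosen only for illustration; it is not CRSBench's API. The point is simply that the output of the bug-finding stage is the input of the patching stage, and both are tallied together.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PoV:
    """A proof-of-vulnerability input reported by bug finding (hypothetical type)."""
    target: str
    sanitizer: str
    blob: bytes


@dataclass(frozen=True)
class Patch:
    """A candidate fix produced for one PoV (hypothetical type)."""
    target: str
    diff: str


def run_pipeline(
    find_bugs: Callable[[str], list[PoV]],
    make_patch: Callable[[PoV], Optional[Patch]],
    target: str,
) -> dict[str, int]:
    """Feed every PoV from the bug-finding stage into patching and count both stages."""
    povs = find_bugs(target)
    # A patcher may fail on a given PoV; failures are modeled as None and dropped.
    patches = [p for p in (make_patch(pov) for pov in povs) if p is not None]
    return {"povs_found": len(povs), "patches_produced": len(patches)}
```

With stub stages that return two PoVs and patch only one of them, `run_pipeline` reports two PoVs found and one patch produced, so a weak patcher shows up in the same result as a strong bug finder.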

Faster evaluation

Redis/RQ workers run trials across machines. Docker snapshot-based incremental builds skip full project rebuilds after each patch attempt, giving CRSs more tries within the same LLM budget.
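
The fan-out pattern behind queue-backed workers can be sketched in a few lines. This is an in-process stand-in that uses Python's `queue.Queue` and threads instead of Redis/RQ, and the job granularity of one job per (benchmark, trial) pair is an assumption made for illustration, not a description of how CRSBench splits work.

```python
import queue
import threading
from typing import Callable


def run_trials(
    benchmarks: list[str],
    trials: int,
    run_one: Callable[[str, int], str],
    num_workers: int = 2,
) -> dict[tuple[str, int], str]:
    """Enqueue one job per (benchmark, trial) and let workers drain the queue."""
    jobs: queue.Queue = queue.Queue()
    for benchmark in benchmarks:
        for trial in range(trials):
            jobs.put((benchmark, trial))

    results: dict[tuple[str, int], str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            outcome = run_one(*job)
            with lock:
                results[job] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because workers only share the queue, adding workers (or, with a real Redis-backed queue, adding machines) changes throughput without changing how jobs are defined.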

Production-style infra

Pre-collected fuzzing corpora and Regression Test Selection (RTS) reflect the setup real deployments already maintain, so scores focus on CRS performance instead of infrastructure overhead.
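
As a rough illustration of the idea behind RTS, the sketch below selects only the tests whose covered files overlap the files a patch touches. The file-level coverage map and the function name `select_tests` are assumptions for this example; how CRSBench actually computes its selection is not described here.

```python
def select_tests(coverage: dict[str, set[str]], changed_files: set[str]) -> list[str]:
    """Return the tests whose covered files intersect the changed files, sorted by name."""
    return sorted(
        test for test, files in coverage.items() if files & changed_files
    )


# Hypothetical coverage map: test name -> source files it exercises.
coverage = {
    "test_parse": {"src/parse.c", "src/util.c"},
    "test_render": {"src/render.c"},
    "test_io": {"src/io.c", "src/util.c"},
}

# A patch that only touches src/parse.c needs only test_parse to be rerun.
selected = select_tests(coverage, {"src/parse.c"})
```

Running a subset instead of the whole functionality suite after each patch attempt is what keeps per-attempt validation cost low.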

Statistics

CRSBench comprises C/C++ and Java projects with both manually curated synthetic vulnerabilities and real-world bugs, packaged with ground-truth PoVs, patches, and functionality tests.

124 Projects
315 Vulnerabilities
92 Unique CWEs
21 of CWE Top 25 (2025)
Languages: C/C++, Java
CWE distribution donut chart across 124 benchmarks, 315 CPVs, 693 CWE entries.
CPVs by language (C/C++, JVM) and origin (1-day, synthetic).

Quick Start

CRSBench runs on Linux hosts with Docker. A minimal first run installs CRSBench, pulls the managed dependencies, downloads the sanity benchmark suite, and runs one experiment with a local queue-backed worker.

0. Request dataset access

The benchmark dataset on HuggingFace is gated. Before anything else, open huggingface.co/datasets/sslab-gatech/crsbench-dataset, request access, and wait for approval. Without it, crsbench download will fail.

1. Install and prepare

git clone --recurse-submodules https://github.com/sslab-gatech/CRSBench.git
cd CRSBench
uv sync
./scripts/setup-third-party.sh

uv run crsbench prepare
uv run crsbench prepare --coverage

# Gated dataset: accept the DUA on HuggingFace first
uv run hf auth login
uv run crsbench download --benchmark-suite sanity

2. Write a first-run config

Save the following as first-run.yaml. The bundled starter CRS is atlantis-multilang-given_fuzzer, and setting litellm.skip: true means no external LLM keys are required.

experiment:
  name: first-run
  task: bugfinding
  mode: full
  benchmark_suite: sanity
  sanitizers: [address]

runtime:
  trials: 1
  max_total_time: 3600
  redis_host: localhost:6379
  litellm:
    skip: true

storage:
  experiment_filestore: ./results/experiment-data
  report_filestore: ./results/report-data

crs_compose:
  atlantis-multilang-given_fuzzer:
    num_cores: 4

3. Launch worker + orchestrator

uv run python scripts/valkey-helper.py start

# Terminal 1: worker executes CRS trial jobs
uv run crsbench worker --experiment-config first-run.yaml

# Terminal 2: orchestrator enqueues jobs
uv run crsbench run --experiment-config first-run.yaml
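
If the worker or orchestrator cannot reach the queue, a quick pre-flight check on the `redis_host` value from the config can narrow the problem down. The sketch below is not part of CRSBench: `split_host_port` and `is_reachable` are hypothetical helpers built only on the standard library, and they assume a plain `host:port` string (no IPv6 literals).

```python
import socket


def split_host_port(address: str, default_port: int = 6379) -> tuple[str, int]:
    """Split 'host:port' into (host, port); fall back to default_port if no port is given."""
    host, sep, port = address.rpartition(":")
    if not sep:
        return address, default_port
    return host, int(port)


def is_reachable(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the address succeeds within the timeout."""
    host, port = split_host_port(address)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Same value as redis_host in first-run.yaml.
    print("queue reachable:", is_reachable("localhost:6379"))
```

A `False` result only says that nothing accepted a TCP connection at that address; it does not distinguish between a server that was never started and one listening on a different port.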

The CRS lifecycle reuses oss-crs prepare, oss-crs build-target, oss-crs artifacts, and oss-crs run, so any CRS listed in the OSS-CRS Registry plugs straight into crs_compose. For the distributed-experiment guide and configuration reference, see the upstream README and docs/.