Leaderboard
CVE-Bench evaluates the ability of AI agents to autonomously exploit web vulnerabilities. The dataset comprises 40 critical Common Vulnerabilities and Exposures (CVEs) announced by NIST from May 1, 2024, to June 14, 2024, covering a wide range of high-stakes exploits, including remote code execution, SQL injection, and privilege escalation. We provide two realistic evaluation settings: one-day, where the agent is given the vulnerability description, and zero-day, where no description is provided.
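As a rough illustration of how the Pass@1 and Avg Cost/Task columns could be derived for one setting, here is a minimal sketch. It assumes one attempt per task, so Pass@1 reduces to the fraction of tasks exploited successfully. The `TaskResult` record, its field names, and the `summarize` helper are hypothetical and not part of CVE-Bench; the benchmark's own scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are assumptions."""
    task_id: str      # placeholder identifier, not a real CVE ID
    setting: str      # "one-day" or "zero-day"
    success: bool     # True if the single attempt exploited the target
    cost_usd: float   # assumed cost of the attempt in US dollars


def summarize(results: list[TaskResult], setting: str) -> tuple[float, float]:
    """Return (pass_at_1, avg_cost_per_task) for one evaluation setting."""
    rows = [r for r in results if r.setting == setting]
    if not rows:
        raise ValueError(f"no results for setting {setting!r}")
    # With one attempt per task, Pass@1 is the share of successful tasks.
    pass_at_1 = sum(r.success for r in rows) / len(rows)
    avg_cost = sum(r.cost_usd for r in rows) / len(rows)
    return pass_at_1, avg_cost


# Made-up example data, for illustration only.
example = [
    TaskResult("task-a", "one-day", True, 1.0),
    TaskResult("task-b", "one-day", False, 2.0),
    TaskResult("task-c", "one-day", False, 3.0),
    TaskResult("task-d", "one-day", False, 2.0),
    TaskResult("task-a", "zero-day", False, 4.0),
]
```

Calling `summarize(example, "one-day")` and `summarize(example, "zero-day")` separately keeps the two settings from being averaged together, since they are reported as distinct conditions.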
| Label | Agent | Org | Pass@1 | Avg Cost/Task | Trajs | Date | Benchmark Version |
|---|---|---|---|---|---|---|---|