OpenAI and Paradigm Launch EVMbench to Measure AI Smart Contract Security

OpenAI and Paradigm have introduced EVMbench, a new benchmarking framework designed to evaluate the ability of AI agents to detect, patch, and exploit blockchain vulnerabilities.

OpenAI and Paradigm have launched EVMbench to address security risks in smart contracts that secure over $100 billion in crypto assets. The benchmark draws on 120 curated vulnerabilities from 40 professional audits, including scenarios from the Tempo blockchain, to test artificial intelligence (AI) capabilities in a sandboxed Ethereum Virtual Machine (EVM) environment.

The system evaluates agents across three distinct modes: detection of vulnerabilities, functional patching of code, and end-to-end execution of fund-draining exploits. Recent testing shows that the GPT-5.3-Codex model achieves a 72.2% success rate in exploit tasks, marking a significant increase from the 31.9% score recorded by GPT-5 just six months ago.
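To put the two exploit scores in perspective, the short calculation below works out the gap both in percentage points and as a ratio. It uses only the two figures quoted above; the variable names are illustrative.

```python
# Exploit-mode success rates as reported in the article (percent).
gpt5_rate = 31.9
gpt53_codex_rate = 72.2

# Absolute gain in percentage points, rounded to one decimal place.
point_gain = round(gpt53_codex_rate - gpt5_rate, 1)

# Relative improvement: how many times higher the newer score is.
ratio = round(gpt53_codex_rate / gpt5_rate, 2)

print(f"Gain: {point_gain} percentage points ({ratio}x the earlier score)")
```

On these figures, the newer model's exploit success rate is 40.3 percentage points higher, or roughly 2.26 times the earlier score.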

“Measuring model capability in this domain helps track emerging cyber risks and highlights the importance of using AI systems defensively to audit and strengthen deployed contracts,” according to the OpenAI announcement.

🧭 FAQs

What is the primary purpose of the EVMbench framework? It measures how effectively AI agents identify and resolve high-severity smart contract vulnerabilities.

Which organizations collaborated to develop this new security benchmark? OpenAI and the crypto investment firm Paradigm co-developed the EVMbench testing environment.

How does the system verify if an agent successfully patches code? Automated tests ensure vulnerabilities are eliminated without breaking the contract’s intended functional logic.

Is there financial support available for researchers using these tools? OpenAI is committing $10 million in API credits to support defensive cybersecurity research.
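The patch check described in the FAQ, where a fix must remove the vulnerability without breaking intended behavior, can be sketched in a few lines. This is a toy illustration only, not EVMbench's actual harness: the `Vault` class, the exploit, and the functional test are hypothetical stand-ins written in plain Python rather than against a real EVM.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class Vault:
    """Hypothetical toy contract: tracks per-user balances."""

    def __init__(self, check_balance: bool) -> None:
        self.check_balance = check_balance
        self.balances: Dict[str, int] = {"alice": 100, "mallory": 0}

    def withdraw(self, user: str, amount: int) -> int:
        # The "patched" version rejects withdrawals above the user's balance.
        if self.check_balance and self.balances[user] < amount:
            raise ValueError("insufficient balance")
        self.balances[user] -= amount
        return amount


class FrozenVault(Vault):
    """An over-strict 'patch' that blocks every withdrawal."""

    def withdraw(self, user: str, amount: int) -> int:
        raise ValueError("withdrawals disabled")


def exploit_succeeds(vault: Vault) -> bool:
    """Hypothetical exploit: withdraw funds the attacker never deposited."""
    try:
        return vault.withdraw("mallory", 100) == 100
    except ValueError:
        return False


def honest_withdraw_works(vault: Vault) -> bool:
    """Functional test: a legitimate withdrawal must still succeed."""
    try:
        return vault.withdraw("alice", 40) == 40 and vault.balances["alice"] == 60
    except ValueError:
        return False


@dataclass
class PatchGrade:
    exploit_blocked: bool
    functionality_preserved: bool

    @property
    def accepted(self) -> bool:
        # A patch counts only if both conditions hold.
        return self.exploit_blocked and self.functionality_preserved


def grade_patch(make_vault: Callable[[], Vault]) -> PatchGrade:
    # Use a fresh instance per check so one test cannot affect the other.
    return PatchGrade(
        exploit_blocked=not exploit_succeeds(make_vault()),
        functionality_preserved=honest_withdraw_works(make_vault()),
    )


unpatched = grade_patch(lambda: Vault(check_balance=False))
patched = grade_patch(lambda: Vault(check_balance=True))
frozen = grade_patch(lambda: FrozenVault(check_balance=True))
print(unpatched.accepted, patched.accepted, frozen.accepted)
```

The third case shows why both checks matter: disabling withdrawals entirely stops the exploit, but the patch is rejected because legitimate users can no longer withdraw.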