AI Agents Are Getting Better at Writing Code—and Hacking It as Well

By Staff

The latest artificial intelligence models are reshaping cybersecurity by outperforming traditional tools at finding software vulnerabilities. At the University of California, Berkeley, researchers used a newly developed benchmark called CyberGym to test how well AI models can identify vulnerabilities in 188 large open-source codebases. The models, including the latest from OpenAI and Anthropic along with open-source models from Meta and DeepSeek, discovered 17 new bugs, 15 of which were previously unknown, or "zero-day," vulnerabilities. Dawn Song, a professor at UC Berkeley who led the work, emphasized that many of these flaws are potentially dangerous but also valuable as a way to learn about security.

One AI-powered agent currently tops HackerOne's bug-hunting leaderboard, and its maker recently announced $75 million in new funding. "This is a pivotal moment," Song said, adding that the results exceeded her expectations. Her team's work adds to growing evidence that AI can automate the discovery of zero-day vulnerabilities, a capability that malicious hackers could just as easily replicate. "We didn't even try that hard," Song said, suggesting that with more resources these tools could push the boundaries much further. Taken together, the results suggest AI could become a powerful tool for both finding and exploiting vulnerabilities.

The UC Berkeley team gave cybersecurity agents powered by frontier AI models descriptions of known vulnerabilities from the 188 software projects and tested whether they could rediscover the same flaws; the researchers then turned the agents loose on previously unexplored code to hunt for new bugs. The agents generated hundreds of proof-of-concept exploits, from which the researchers identified 15 previously unseen vulnerabilities and two that had already been disclosed and patched.
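The core of this kind of benchmark is a verification loop: an agent proposes a proof-of-concept input, and a harness runs the target on it and checks whether the program actually crashes. The sketch below is a hypothetical illustration of that check, not CyberGym's actual code; the `reproduces_crash` helper and the stand-in target are assumptions for demonstration, whereas a real benchmark would run each project's instrumented fuzz harness.

```python
import subprocess
import sys

def reproduces_crash(target_cmd, poc_input: bytes) -> bool:
    """Feed a candidate proof-of-concept input to the target on stdin
    and report whether the process crashed. On POSIX, a negative return
    code means the process was killed by a signal (e.g. SIGSEGV); this
    sketch treats any nonzero exit as a crash."""
    result = subprocess.run(
        target_cmd,
        input=poc_input,
        capture_output=True,
        timeout=10,
    )
    return result.returncode != 0

# Stand-in "target": a tiny script that exits nonzero on an overlong
# input, loosely mimicking a buffer-overflow check.
TARGET = [
    sys.executable, "-c",
    "import sys; data = sys.stdin.buffer.read(); "
    "sys.exit(1 if len(data) > 8 else 0)",
]

print(reproduces_crash(TARGET, b"short"))   # benign input -> False
print(reproduces_crash(TARGET, b"A" * 64))  # crashing input -> True
```

An agent in such a setup would loop: read the vulnerability description, generate a candidate input, run it through a checker like this, and refine until a crash is confirmed or a budget is exhausted.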

The findings are poised to have a transformative impact on the cybersecurity industry, and they echo other recent results. Security researcher Sean Heelan recently used an OpenAI reasoning model to identify a zero-day flaw in the widely used Linux kernel. Similarly, last November Google announced that it had discovered a previously unknown software vulnerability using AI, further demonstrating the technology's potential in cybersecurity.

However, while the tools and methods tested by Song and her team show real promise, AI is not yet perfect. The agents were unable to reproduce every known vulnerability in the benchmark, and complex flaws proved especially difficult. Zero-day discovery remains a significant challenge for AI, highlighting the need for further development and testing in the cybersecurity community. The study underscores AI's potential to automate security tasks, but also the limitations and ongoing work needed to fully realize it.

In conclusion, the research demonstrates that AI could become an essential tool in the cybersecurity industry, capable of rapidly identifying zero-day vulnerabilities, and represents a rare and valuable chance to harness artificial intelligence for improved security. While the results are promising, the technology is still far from fully realized, and deploying big-budget AI cybersecurity tools will require careful consideration of these limitations.
