Anthropic is keeping its most powerful AI model, Claude Mythos Preview, from the public for a surprising reason: it’s *too* good. The company announced the model Tuesday as part of Project Glasswing, saying its advanced cybersecurity skills would create unacceptable risks if widely released.

This new model has found thousands of previously unknown security weaknesses – including serious flaws in all major operating systems and web browsers. Remarkably, many of these flaws had gone unnoticed for years, even after extensive human and automated security checks. For example, it discovered a vulnerability in OpenBSD, a famously secure operating system, that had existed for 27 years. It also found a flaw in FFmpeg, a widely used video library, despite the problematic code being checked by automated tools over five million times.
Anthropic is taking an unusual approach with Mythos. Instead of opening it up through a standard API, the company has partnered with a group of 12 major players – including AWS, Apple, Google, and Microsoft – to focus its initial use specifically on improving cybersecurity. It is also extending access to another 40 organizations involved in critical software infrastructure. To support the initiative, Anthropic is investing up to $100 million in usage credits and $4 million in direct donations in open-source security projects – a clear bet on bolstering security across the board rather than on broad availability.
A Model That Thinks Like a Security Researcher
Mythos stands out from earlier security tools because it not only finds a large number of vulnerabilities but does so with very little human help. Anthropic reports that the model found almost all known vulnerabilities in its evaluations and even created working exploits for many of them, all on its own. For example, it discovered and chained multiple weaknesses in the Linux kernel to gain complete control of a system – something that usually demands significant manual effort from security experts.
Logan Graham, head of Anthropic’s red team for cutting-edge AI, explained to Axios that Mythos Preview operates with a high degree of independence and can think through problems like a skilled human security expert. While the previous version, Claude 4.6, discovered around 500 security flaws in open-source software, Mythos Preview has already identified tens of thousands.
When tested on the CyberGym benchmark for finding and reproducing software vulnerabilities, Mythos outperformed Opus 4.6, achieving a score of 83.1% compared to 66.6%. This difference in performance was also seen in broader software engineering tasks; on the SWE-bench Verified benchmark, which measures practical coding and problem-solving skills, Mythos scored 93.9% while Opus 4.6 scored 80.8%.
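For readers who want those gaps quantified, the reported scores work out as follows (benchmark numbers are taken from the paragraph above; the helper function itself is just an illustrative sketch, not anything published by Anthropic):

```python
def improvement(new: float, old: float) -> tuple[float, float]:
    """Return (percentage-point gap, relative gain in percent) between two scores."""
    return round(new - old, 1), round((new - old) / old * 100, 1)

# Scores reported above: Mythos vs. Opus 4.6
cybergym = improvement(83.1, 66.6)   # CyberGym vulnerability benchmark
swebench = improvement(93.9, 80.8)   # SWE-bench Verified coding benchmark

print(cybergym)  # (16.5, 24.8) -> 16.5-point gap, ~25% relative gain
print(swebench)  # (13.1, 16.2) -> 13.1-point gap, ~16% relative gain
```

In other words, the CyberGym gap is larger in relative terms than the SWE-bench one, which is consistent with the model being tuned for security work rather than general coding.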
According to Cisco’s Anthony Grieco, artificial intelligence has reached a point where we need to dramatically increase our efforts to protect vital infrastructure from cyberattacks, and traditional security methods are now inadequate. He emphasizes that we can’t return to relying on older approaches.
The Dual-Use Dilemma
Mythos is a powerful tool for defending systems, but the same capabilities could be misused by attackers. That dual-use risk is the central concern around the Project Glasswing announcement, and it highlights the escalating race between those building cyber threats and those defending against them – a competition OpenAI recently discussed in a published policy paper.
That OpenAI paper, released just before Anthropic’s news, identified AI-powered cyberattacks and biological threats as the biggest immediate dangers of increasingly sophisticated AI. Sam Altman, OpenAI’s CEO, told Axios that leaders in tech, business, and government worry that new AI models arriving soon could enable a major cyberattack this year. The announcement of Mythos strengthens those concerns.

This isn’t an isolated view. Experts across the AI field increasingly agree that the most powerful AI models now present real and significant cybersecurity threats, requiring a coordinated response. OpenAI itself cautioned in December that its future models could pose a “high” cybersecurity risk. According to CNN, the development of AI agents – self-operating systems that can find and exploit weaknesses much faster than human hackers – represents a major shift in the threat landscape.
Graham told the Washington Examiner that once certain tasks can be fully automated at very low cost, it will fundamentally change everything.
There’s already evidence this is happening. Anthropic has reported that a hacking group linked to the Chinese government used Claude to break into around 30 organizations, including tech firms, banks, and government offices. In February, an attacker used Claude to target Mexican government agencies and steal private tax and voter data.
The Glasswing Strategy: Defense Before Offense
Anthropic is taking a different approach to AI development by limiting access to its model to a select group of partners, rather than making it widely available to the public. They believe this will give those working to protect against potential misuse a crucial head start, creating a lasting advantage before similar technology becomes more common.
The speed at which attackers can exploit security flaws is also changing dramatically. What once took months can now, with AI assistance, happen in minutes. Anthropic’s message, though, isn’t to pull back – it’s to collaborate and move even faster to stay ahead of the threats.
Jim Zemlin, head of the Linux Foundation, highlighted how important open-source software is – it’s the foundation of most of today’s digital world. Traditionally, open-source projects rely on volunteers, often without dedicated security experts, to maintain the software that powers everything from servers to banking and communications. Zemlin explained that Project Glasswing aims to provide affordable AI-powered security tools to these maintainers, helping them protect critical systems.
Microsoft reported that Mythos Preview performed significantly better than earlier models in its internal testing, specifically when measured against Microsoft’s CTI-REALM benchmark. Igor Tsyganskiy, the Microsoft executive overseeing cybersecurity and research, explained that the development helps the company proactively find and address potential threats. Microsoft is involved with Project Glasswing and has also made a significant investment in OpenAI.
Questions of Trust and Timing
While promising, the announcement isn’t without its challenges. Anthropic itself recently suffered security lapses: last month it accidentally exposed nearly 2,000 source code files and over half a million lines of code through a misconfiguration in its Claude Code software. Ironically, the existence of Mythos was itself first discovered through a security flaw – an unsecured, publicly accessible data cache – an awkward look for a company aiming to lead in cybersecurity.
Anthropic is also involved in a sensitive disagreement with the U.S. Department of Defense over the military’s operational use of its Claude AI, which adds geopolitical considerations to its work with Mythos. The company has been briefing agencies such as CISA and the Commerce Department on the model’s capabilities, emphasizing its importance for national security.
A key question now emerging in the AI-safety debate is whether the systems and rules needed to manage AI-driven cybersecurity risks can be built quickly enough. OpenAI raised the question in its recent policy paper, and Mythos now puts it to the test. Anthropic has committed to publishing a public report on its findings within 90 days, and it and its partners will openly exchange information and best practices, with concrete recommendations expected on how to update vulnerability disclosure, patching, and software development practices for an AI-driven landscape.
Graham admitted that Mythos isn’t the final step. Even more powerful AI models from OpenAI and Google are already in development, and shortly after that, Chinese open-source models could offer similar abilities – potentially without the safety features and coordinated launch approach that Project Glasswing provides.
Anthropic points out that tackling cybersecurity challenges requires collaboration, as no single organization can handle them on its own. Protecting our digital systems will be a long-term effort, especially considering how quickly artificial intelligence is developing – we can expect significant advancements in AI capabilities within the next few months.
The competition between AI used for attacks and AI used for defense is now happening in full force. Anthropic’s Project Glasswing is their attempt to show that defenders *can* win this battle. However, its success will rely on how quickly others in the tech industry and governments around the world respond and contribute.
2026-04-07 22:34