Nothing to Penetrate

A few weeks ago, an AI lab grabbed headlines. Their newest model, they announced, was dangerously good at finding security vulnerabilities in source code. So dangerously good that they refused to release it to the public. It would go only to a select group of partners. Those partners could scan their own software and patch the holes before attackers got the same idea.

The headlines followed. Is cybersecurity finished? Are we headed for a flood of machine-generated exploits? The marketing worked. The conversation was everywhere.

Then someone pointed the model at a codebase that actually matters.

This is not some forgotten library with no maintainer and no test suite. This is software deployed in tens of billions of instances worldwide. Every smartphone. Every connected car. Every server. A C codebase spanning over a hundred thousand lines, maintained for decades by a small team. Fuzzed nonstop. Audited by paid professionals multiple times. Every commit reviewed by both humans and automated bots. Every compiler warning treated as an error.

The lead maintainer once shut down the project's bug bounty program entirely. AI-generated fake reports had buried the queue. His small team could no longer tell which submissions were real. So he pulled the plug and walked away from the bounty platform. When your defense against bad actors becomes a vector for noise, the math stops working.

If you want to prove your model means the end of cybersecurity, this is exactly the target you need to crack. And if you want to prove that disciplined engineering still works, this is exactly the codebase you want to be.

The scan ran. The report came back. And it was quiet.

The model flagged a small handful of issues. The security team sat down to triage each one. Most were false positives. Things that looked suspicious to the algorithm but turned out to be documented, intentional behavior. One item was a regular software bug, not a security concern. Exactly one was a genuine security vulnerability. Low severity. No memory safety impact. The kind of finding that gets patched in the next release cycle without anyone writing a breathless thread about it.

Zero memory corruption vulnerabilities. In a C codebase. By a model that was supposedly too dangerous for the public to handle.

The lead maintainer wrote about it afterward with the dry understatement you earn after decades of shipping code that runs inside every device on the planet. The hype, he concluded, was primarily marketing. He saw nothing in the results that justified any panic.

But here is where the story complicates, because elsewhere this same model found a lot. In other major codebases, it uncovered hundreds of confirmed security issues with remarkably few false positives. Partner organizations reported thousands of high-severity findings across the broader software ecosystem. This is not a model that fails to work. It is a model that finds what exists to be found.

So what made this particular codebase different?

It comes down to what the maintainer calls his security infrastructure. Fuzzing running nonstop in CI, not just during releases. Review bots on every pull request that flag suspicious patterns before a human even looks. Thousands of tests that run against every commit, across hundreds of build configurations, with every sanitizer turned on. A policy of fixing all compiler warnings immediately, treating each one as a defect. Private security reporting with a dedicated triage process. Signed releases verified by external parties. The code itself uses defensive patterns everywhere. Capped buffers. Explicit bounds on every numeric parse. Overflow guards on memory operations. Format-string enforcement at the compiler level.
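To make those defensive patterns concrete, here is a minimal sketch in C of what they tend to look like. It is not taken from the project in question. The names parse_bounded, alloc_array, safe_format, and the MAX_BUF_SIZE cap are hypothetical, chosen only for illustration, and the format attribute assumes a GCC- or Clang-compatible compiler.

```c
#include <errno.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical hard cap on any single allocation (1 MiB). */
#define MAX_BUF_SIZE ((size_t)1024 * 1024)

/* Ask the compiler to check format strings against their arguments.
 * Assumes GCC or Clang; expands to nothing elsewhere. */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt, args) __attribute__((format(printf, fmt, args)))
#else
#define PRINTF_LIKE(fmt, args)
#endif

/* Explicit bounds on a numeric parse: accept only a complete decimal
 * number in [0, max]. Returns 0 on success, -1 on any rejection. */
static int parse_bounded(const char *s, long max, long *out)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return -1;
    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return -1;              /* overflow, no digits, or trailing junk */
    if (v < 0 || v > max)
        return -1;              /* outside the caller's stated range */
    *out = v;
    return 0;
}

/* Overflow guard plus capped buffer: refuse if count * size would wrap
 * around or exceed the cap, instead of allocating a too-small block. */
static void *alloc_array(size_t count, size_t size)
{
    if (count == 0 || size == 0)
        return NULL;
    if (count > SIZE_MAX / size)
        return NULL;            /* multiplication would overflow */
    if (count * size > MAX_BUF_SIZE)
        return NULL;            /* larger than the cap allows */
    return malloc(count * size);
}

static int safe_format(char *dst, size_t dstlen, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

/* Bounded formatting: treat truncation as an error rather than
 * silently handing back a cut-off string. */
static int safe_format(char *dst, size_t dstlen, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (dst == NULL || dstlen == 0)
        return -1;
    va_start(ap, fmt);
    n = vsnprintf(dst, dstlen, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= dstlen)
        return -1;              /* encoding error or output truncated */
    return 0;
}
```

Each helper refuses bad input instead of trying to cope with it, which is the common thread in the list above. Building with -Wall -Werror on GCC or Clang then turns a mismatched format string passed to safe_format into a failed build rather than a runtime surprise.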

None of this is exciting. None of it makes headlines. It is the slow accumulation of habits applied consistently, week after week, year after year. While the rest of the industry chased features, this team chased correctness.

That is why the scanner found nothing to penetrate. Not because the scanner is weak. Because the surface was hardened through decades of discipline.

Most software is not maintained this way. Most projects do not have fuzzing in CI. Most do not treat compiler warnings as errors. Most do not have a maintainer who treats every line of code like it might run inside a billion devices. Most codebases have vulnerabilities. And the models are getting better at finding them, fast.

The trend is undeniable. A year ago, the best scanners caught maybe a third of the known bugs in a benchmark. The newest generation claims to catch most of them. You can question the metrics when the labs publishing them profit from the narrative. But you cannot question the direction. These tools are improving. They can now do things that once required skilled researchers. Following logic across files. Understanding how components interact. Chaining small weaknesses into larger exploits in ways traditional static analyzers never could.

The question is not whether the tools will get good at finding flaws. The question is whether your codebase will have flaws for them to find.

The lab over-promised, and the model under-delivered against this particular target. That is fine. I would still bet it is better than the previous generation, even if only slightly. That is how progress works in software. Not giant leaps. Not singular breakthroughs that change everything overnight. Incremental improvement, compounding over time. The new compiler catches one more class of error. The new linter has one more rule. The new fuzzer explores one more code path. And a decade later, your codebase makes the most advanced scanner look lost.

The best defense against tomorrow's AI-powered vulnerability research is the same as the best defense against yesterday's human-powered research. Write less buggy code. Review everything. Fuzz relentlessly. Fix warnings. Treat maintenance like it matters, because it does.

The scanners are improving. So should you.
