Socrates comments on Against Credulous AI Hype

Socrates 13 May 2026 22:01 UTC
6 points
I wanted to follow up on this thread and bring in some additional evidence on the whole AISLE thing. It isn’t definitive or anything, but both AISLE and Mythos have been used to scan curl and the results are interesting.

- AISLE identified 5 CVEs and 24 bugs (plus 2 more CVEs in a dependency).
- Mythos identified 1 CVE and potentially 20 bugs.

Now, Mythos scanned after AISLE. We don’t know what would have happened if Mythos had come first. But here are some quotes from the maintainer of curl about the Mythos results:

> curl is certainly getting better thanks to this report, but counted by the volume of issues found, all the previous AI tools we have used have resulted in larger bugfix amounts.

> I see no evidence that this setup finds issues to any particular higher or more advanced degree than the other tools have done before Mythos. Maybe this model is a little bit better, but even if it is, it is not better to a degree that seems to make a significant dent in code analyzing.

> Any project that has not scanned their source code with AI powered tooling will likely find huge number of flaws, bugs and possible vulnerabilities with this new generation of tools. Mythos will, and so will many of the others.

Quotes taken from here: https://daniel.haxx.se/blog/2026/05/11/mythos-finds-a-curl-vulnerability/

Daniel’s blog post is written in imperfect English, and his tone is less than objective. And of course, this is not a rigorous scientific comparison. But he is an expert in software and in writing secure software, and he has demonstrated a willingness to change his mind on AI for cybersecurity,^[1] so I think he’s worth listening to.

AISLE wrote a little news release about their findings vis-a-vis Mythos in the curl project: https://aisle.com/blog/curl-adopts-aisle-after-its-ai-agents-discovered-5-cves

AISLE also wrote up another blog post building on the earlier one. In this case, they showcase a simple pipeline that is able to recreate a Mythos result without any steering towards the relevant snippet of code: https://aisle.com/blog/system-over-model-zero-day-discovery-at-the-jagged-frontier

Interestingly, the same harness is not as successful at recreating AISLE’s own results, but this is all a bit selective.

On my earlier claim that harnesses and pipelines enabling Mythos-like results can and will be built in the near future: AISLE has open-sourced nano-analyzer, the harness used in this blog post to discover maybe as many as 40 bugs in FreeBSD, so it would appear that my prediction had already been fulfilled at the time of writing.

AISLE quotes $100 in spend to find the 40 bugs in FreeBSD and to recreate a Mythos result. That is presumably 1/100th or less of Anthropic’s compute spend. This makes me think that Anthropic’s compute spend plays an important role in how the Mythos results should be assessed. My belief now is that Anthropic’s high compute spend is probably due to an inefficient pipeline, which is somewhat of a reversal of my earlier belief that the pipeline was the key determinant in model capabilities. However, this doesn’t mean that I think model capabilities are hugely improved. It just means that I think they would have spent less to get these results with a better pipeline, given that AISLE was able to find some of these results for $100 with dumber models.
1. Here’s Daniel’s highly negative blog post on AI cybersecurity work from last summer: https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-slops/ Here’s where he changed his mind just a few months later, after ZeroPath started to systematically find bugs using AI: https://daniel.haxx.se/blog/2025/10/10/a-new-breed-of-analyzers/
- tobycrisford 🔸 14 May 2026 18:20 UTC
  1 point
  This is all very interesting, thanks for following up!