I’m almost centrally the guy claiming LLMs will d/acc us out of AI takeover by fixing infrastructure. Technically I’m usually hedging more than that, but it’s accurate in spirit.
I’m happy this is reaching exactly the right people :-D
As for proving invariants, that makes sense as a goal, and I like it. If I do any follow-up I’ll try to estimate how many more tokens that would take, since IIRC seL4 or CakeML had proofs that exceeded 10× the length of their source code.
A recent experience: I tried using LLMs to generate Lean definitions and proofs for a novel mathematical structure I’m toying with. They do well with anything below 10 lines, but they start to falter on more complicated proofs and sorry their way out of anything I’d call non-trivial. My understanding is that a lot of software formal verification is gruntwork, but there also need to be interwoven moments of brilliance.
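To make that concrete, here’s a minimal Lean 4 sketch (hypothetical toy definitions, not the actual structure I’m working on): the short, mechanical proof is the kind of thing current models reliably get right, while anything that needs an actual idea tends to come back with the goal punted to sorry.

```lean
-- Hypothetical toy definition, standing in for the kind of thing I feed the model.
def double (n : Nat) : Nat := 2 * n

-- Short, mechanical proofs like this one are usually generated correctly.
theorem double_add (m n : Nat) : double (m + n) = double m + double n := by
  unfold double
  omega

-- Anything requiring a real idea tends to come back like this instead.
theorem double_le (n : Nat) : n ≤ double n := by
  sorry
```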
I’m always unsure what to think of claims that markets will incentivize the correct level of investment in software security. Like, computer security in the aughts seems to me to have been pretty bad, and while it got better over time, that took at least a decade. From afar, markets look efficient; from close up, you can watch efficiency establish itself. And then there’s the question of how much of the cost is internalized, which I feel should be close to 100% for private companies? For open source projects, of course, that number drops close to zero.
It’d be cool to see a time series of the number of exploits found in open source software; thanks for the curl numbers. You picked a fairly old, established codebase with an especially dedicated developer, so I wonder what the picture looks like for newer ones, and whether more exploits get discovered in the early, middle, or late stage of development. The adoption of better programming languages than C/C++ of course helps.
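For what it’s worth, here’s a rough, hypothetical sketch of how one might pull such a time series from the public NVD 2.0 API. The keyword search is noisy (it matches any CVE whose description mentions “curl”), so treat the counts as an upper bound rather than the project’s own numbers.

```python
# Rough sketch: bucket curl-related CVEs by publication year via the NVD 2.0 API.
import requests
from collections import Counter

NVD_URL = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def cve_counts_by_year(keyword: str) -> Counter:
    counts: Counter = Counter()
    start = 0
    while True:
        resp = requests.get(
            NVD_URL,
            params={"keywordSearch": keyword, "resultsPerPage": 2000, "startIndex": start},
            timeout=60,
        )
        resp.raise_for_status()
        data = resp.json()
        for item in data.get("vulnerabilities", []):
            published = item["cve"]["published"]  # e.g. "2023-02-15T17:15:11.000"
            counts[published[:4]] += 1
        start += data["resultsPerPage"]
        if start >= data["totalResults"]:
            break
    return counts

if __name__ == "__main__":
    for year, n in sorted(cve_counts_by_year("curl").items()):
        print(year, n)
```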
Lean synthesis capabilities aren’t maximally elicited right now, because a lot of people treat it as a text-to-text problem, which leads to a lot of stress about the small amount of Lean syntax in the pretraining data and the high code velocity (until about a year ago, language models still hadn’t fully internalized the migration from Lean 3 to Lean 4). Techniques like Cobblestone or the stuff Higher Order Company talks about (a hybrid architecture of logic programming and language model API calls) seem really promising to me (and, as HOC especially points out, so much cheaper!).
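As a tiny illustration of the syntax drift that makes older pretraining data misleading, here’s the same trivial lemma in Lean 3-era and Lean 4 style (the Lean 3 version is shown only in comments, since it wouldn’t compile under Lean 4):

```lean
-- Lean 3 (mathlib-era) phrasing, kept as a comment for comparison:
--   lemma add_zero' (n : ℕ) : n + 0 = n :=
--   begin
--     simp,
--   end

-- The same statement in Lean 4 syntax:
theorem add_zero' (n : Nat) : n + 0 = n := by
  simp
```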
Thanks for your comment. I had broken my ankle in three places and was on too much oxycodone to engage the first time I read it. I continue to recommend your essay a lot.
Yeah, I goofed by using Claude for math, not any of the OpenAI models, which are much better at math.