This could be a long slog, but I think it could be valuable to identify the top ~100 open-source (OS) libraries and assess how well-resourced each one is, to help avoid future supply-chain attacks like the XZ backdoor. In general, I think work on hardening these systems is an underrated aspect of defending against future highly capable autonomous AI agents.
I'm not sure whether such a study would also be helpful to potential attackers (perhaps even more helpful to attackers than defenders), so you might need to be careful about whether and how you disseminate the information.
My sense is that 100 is an underestimate of the number of OS libraries as important as that one. But I'm not sure whether the correct order of magnitude is 1k, 10k, or 100k.
To comment further, this seems like it might be an intractable task, as the term "dependency hell" implies. You'd likely have to scrape all of GitHub and calculate which libraries are used most frequently across all projects to get an accurate assessment. Then it's not clear to me how you'd identify their level of resourcing. Number of contributors? Frequency of commits?
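To make the dependency-counting idea concrete, here's a toy sketch of the tallying step. All project names here are made up; a real version would scrape dependency manifests (requirements.txt, package.json, etc.) from GitHub or a registry dataset rather than use a hard-coded mapping:

```python
from collections import Counter

def dependency_frequency(projects: dict[str, list[str]]) -> Counter:
    """Count how many projects in the corpus depend on each library."""
    counts: Counter = Counter()
    for deps in projects.values():
        counts.update(set(deps))  # count each library at most once per project
    return counts

# Toy corpus; in practice this would come from scraped manifests.
corpus = {
    "project-a": ["requests", "lodash"],
    "project-b": ["requests"],
    "project-c": ["flask", "requests"],
}
print(dependency_frequency(corpus).most_common(1))  # [('requests', 3)]
```

The hard part, of course, is building the corpus, not the tally; this only illustrates what "used most frequently" would mean once you have the data.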
Also, with your example of the XZ attack, it's not even clear who carried it out. If you suspect it was, say, the NSA, would you want to thwart them if their purpose was to protect American interests? (I'm assuming you're pro-American.) Things like zero-days are frequently used by various state actors, and whether those uses are justified is a morally grey question.
I also, as a computer scientist and programmer, doubt you'd ever be able to 100% eliminate the risk of zero-days or something like the XZ attack happening in open-source code. Given how common zero-days seem to be, I suspect many remain undiscovered in existing open-source work, and that XZ was just a rare case where someone was caught.
Yes, hardening these systems might somewhat mitigate the risk, but I wouldn't know how to evaluate how effective such an intervention would be, or even how you'd harden them exactly. Even if you identify the at-risk projects, you'd need to do something about them. Would you hire software engineers to shore up the weaker projects? Given the cost of competent SWEs these days, that seems potentially expensive, and could compete for funding with actual AI safety work.
I’d be interested in exploring funding this and the broader question of ensuring funding stability and security robustness for critical OS infrastructure. @Peter Wildeford is this something you guys are considering looking at?
That said, this is a nice project; if you have a budget, it shouldn't be hard to find one or a few OS enthusiasts to delegate it to.
Relevant XKCD comic.
@Peter Wildeford @Matt_Lerner, interested in similar. This in-depth analysis was a bit strict, in my opinion, in looking at file-level criteria:
https://www.metabase.com/blog/bus-factor
These massive projects were mostly maintained by one person when I last checked, a year ago:
https://github.com/curl/curl/graphs/contributors
https://github.com/vuejs/vue/graphs/contributors
https://github.com/twbs/bootstrap/graphs/contributors
https://github.com/laravel/laravel/graphs/contributors
https://github.com/pallets/flask/graphs/contributors
https://github.com/expressjs/express/graphs/contributors
https://github.com/redis/redis/graphs/contributors
https://github.com/tiangolo/fastapi/graphs/contributors
https://github.com/lodash/lodash/graphs/contributors
https://github.com/psf/requests/graphs/contributors
https://github.com/babel/babel/graphs/contributors
https://github.com/mastodon/mastodon/graphs/contributors (seemingly improved since)
https://github.com/BurntSushi/ripgrep/graphs/contributors
https://github.com/FFmpeg/FFmpeg/graphs/contributors
https://github.com/gorhill/uBlock/graphs/contributors
https://github.com/evanw/esbuild/graphs/contributors
I’d love to be able to maintain more polished, current data.
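Refreshing this kind of data could be partly automated. A sketch using GitHub's public contributors endpoint, summarizing the top contributor's share of listed commits (first page only; real use would need pagination, authentication, and rate-limit handling):

```python
import json
from urllib.request import urlopen

def top_share(contributors: list[dict]) -> float:
    """Fraction of listed commits made by the single biggest contributor."""
    counts = [c["contributions"] for c in contributors]
    return max(counts) / sum(counts)

def fetch_contributors(owner: str, repo: str) -> list[dict]:
    """Fetch the first page of contributor stats from the GitHub API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/contributors?per_page=100"
    with urlopen(url) as resp:
        return json.load(resp)

# Example (requires network access):
#   top_share(fetch_contributors("curl", "curl"))
```

A cron job running something like this over the list above would at least flag which projects still have one contributor dominating the commit history.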