I’m Jeffrey Ladish. I’m a security researcher and risk consultant focused on global catastrophic threats. My website is at https://jeffreyladish.com
Really appreciate you! It’s felt stressful at times just as someone in the community, and it’s hard to imagine how stressful it would feel to be in your shoes. Really appreciate your hard work; I think the EA movement is significantly better for your work maintaining, improving, and moderating the forum, and for all the mostly-unseen-but-important work mitigating conflicts & potential harm in the community.
I think it’s worth noting that I’d expect you to gain a significant relative advantage if you get out of cities before other people, such that acting later would be a lot less effective at furthering your survival & rebuilding goals.
I expect the bulk of the risk of an all-out nuclear war to fall in the couple of weeks after the first nuclear use. If I’m right, then the way to avoid the failure mode you’re identifying is to return after a few weeks if no new nuclear weapons have been used, or something similar.
I think the problem is the vagueness of the kind of commitment the GWWC pledge represents. If it’s an ironclad commitment, people should lose a lot of trust in you. If it’s a “best of intentions” type of commitment, people should only lose a modest amount of trust in you. I think the difference matters!
I super agree it’s important not to conflate “do you keep the actually-thoughtful promises you think people expected you to interpret as real commitments” with “do you treat all superficially promise-like things as serious promises”! And while I generally want people to think harder about what they’re asking for wrt commitments, I don’t think going overboard on strict-promise interpretations is good. Good promises rest on a shared understanding between both parties. I think a big part of building trust with people is figuring out a good shared language and context for what you mean, including when making strong and weak commitments.
I wrote something related in my first draft but removed it since it seemed a little tangential, but I’ll paste it here:
“It’s interesting that there are special ways of saying things that carry more weight than other ways of saying things. If I say “I absolutely promise I will come to your party”, you will probably have a much higher expectation that I’ll attend than if I say “yeah, I’ll be there”. Humans have fallible memory; they sometimes set intentions and then can’t carry them through. I think some of this is a bit bad and some is okay. I don’t think everyone would be better off if, every time they said they would do something, they treated it as an ironclad commitment and always followed through. But I do think it would be better if we could move at least somewhat in this direction.”
Based on your comment, I now think the thing to move toward is not just “interpreting commitments as stronger” but rather “more clarity in communication about which kind of commitment is which.”
I think it will require us to reshape / redesign most ecosystems & probably pretty large parts of many / most animals. This seems difficult but well within the bounds of a superintelligence’s capabilities. I think that at least within a few decades of greater-than-human-AGI we’ll have superintelligence, so in the good future I think we can solve this problem.
I don’t think an ordinary small/medium tech company can succeed at this. I think it’s possible with significant (extraordinary) effort, but that sort of remains to be seen.
As I said in another thread:
>> I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis (that we can secure X kind of system for Y application of resources), and to try to get more information to test the hypothesis. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.
I agree that a lot of the research today by leading labs is being published. I think the norms are slowly changing, at least for some labs. Deciding not to (initially) release the model weights of GPT-2 was a big change in norms iirc, and I think the trend towards being cautious with large language models has continued. I expect that as these systems get more powerful, and the ways they can be misused gets more obvious, norms will naturally shift towards less open publishing. That being said, I’m not super happy with where we’re at now, and I think a lot of labs are being pretty irresponsible with their publishing.
The dual-use question is a good one, I think. Offensive security knowledge is pretty dual-use, yes. Pen testers can use their knowledge to hack illegally if they want to. But the incentives in the US are pretty good regarding legal vs. illegal hacking, less so in other countries. I’m not super worried about people learning hacking skills to protect AGI systems only to use those skills to cause harm, mostly because the offensive security field is already very big / well resourced. In terms of using AI systems to create hacking tools, that’s an area where I think dual-use concerns can definitely come into play, and people should be thoughtful & careful there.
I liked your shortform post. I’d be happy to see people apply infosec skills toward securing nuclear weapons (and in the biodefense area as well). I’m not very convinced this would mitigate risk from superintelligent AI, since nuclear weapons would greatly damage infrastructure without killing everyone, and thus wouldn’t be very helpful for eliminating humans imo. You’d still need some kind of manufacturing capability in order to create more compute, and if you have the robotics capability to do this then wiping out humans probably doesn’t take nukes—you could do it with drones or bioweapons or whatever. But this is all highly speculative, of course, and I think there is a case for securing nuclear weapons without looking at risks from superintelligence. Improving the security of nuclear weapons may increase the stability of nuclear weapons states, and that seems good for their ability to negotiate with one another, so I could see there being some route to AI existential risk reduction via that avenue.
I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis (that we can secure X kind of system for Y application of resources), and to try to get more information to test the hypothesis. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.
I expect most people to think that either AMF or MIRI is much more likely to do good. So from most agents’ perspectives, the unilateral defection is only better if their chosen org wins. If someone has more of a portfolio approach that weights longtermist and global poverty efforts similarly, then your point holds. I expect that’s a minority position though.
Thanks!
I see you define it a few paragraphs down, but defining it at the top would be helpful, I think.
Could you define ESG investing at the beginning of your post?
Yeah, I would agree with that! I think radiological weapons are some of the most relevant nuclear capabilities / risks to consider from a longterm perspective, due to their risk of being developed in the future.
The part I added was:
“By a full-scale war, I mean a nuclear exchange between major world powers, such as the US, Russia, and China, using the complete arsenals of each country. The total number of warheads today (14,000) is significantly smaller than during the height of the Cold War (70,000). While extinction from nuclear war is unlikely today, it may become more likely if significantly more warheads are deployed or if designs of weapons change significantly.”
I also think indirect extinction from nuclear war is unlikely, but I would like to address this more in a future post. I disagree that additional clarifications are needed. I think people made these points clearly in the comments, and that anyone motivated to investigate this area seriously can read those. If you want to try to doublecrux on why we disagree here I’d be up for that, though on a call might be preferable for saving time.
Thanks for this perspective!
Strong agree!
I mean that the amount required to cover every part of the Earth’s surface would serve no military purpose. Or rather, it might enhance one’s deterrent a little bit, but it would:
1) kill all of one’s own people, which is the opposite of a defense objective
2) not be a very cost effective way to improve one’s deterrent. In nearly all cases it would make more sense to expand second strike capabilities by adding more submarines, mobile missile launchers, or other stealth second strike weapons.
Which isn’t to say this couldn’t happen! Military research teams have proposed crazy plans like this before. I’m just arguing, as have many others at RAND and elsewhere, that a doomsday machine isn’t a good deterrent, compared to the other options that exist (and given the extraordinary downside risks).
FWIW, my guess is that you’re already planning to do this, but I think it could be valuable to carefully consider information hazards before publishing on this [both because of messaging issues similar to the one we discussed here and potentially on the substance, e.g. unclear if it’d be good to describe in detail “here is how this combination of different hazards could kill everyone”]. So I think e.g. asking a bunch of people what they think prior to publication could be good. (I’d be happy to review a post prior to publication, though I’m not sure if I’m particularly qualified.)
Yes, I was planning to get review prior to publishing this. In general, when it comes to risks from biotechnology, I’m trying to follow the principles we developed here: https://www.lesswrong.com/posts/ygFc4caQ6Nws62dSW/bioinfohazards. I’d be excited to see, or help workshop, better guidance for navigating information hazards in this space in the future.
Thanks, fixed!
@Daniel_Eth asked me why I chose 1:1 offsets. The answer is that I did not have a principled reason for doing so, and I don’t think there’s anything special about 1:1 offsets except that they’re a decent Schelling point. I think any offsets are better than no offsets here. I don’t feel like BOTECs of harm caused are likely to be a particularly useful way to calculate offsets here, but I’d be interested in arguments to that effect if people have them.
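For concreteness, here’s a minimal sketch of what I mean by the two approaches. All of the numbers, the `safety_multiplier`, and the function names are hypothetical illustrations I made up for this comment, not figures from any actual calculation:

```python
# Hypothetical illustration only: a 1:1 offset simply matches the amount spent
# on the potentially harmful activity, while a BOTEC-based offset tries to
# estimate the harm caused (in dollars) and donate some multiple of that.
# Every number below is made up.

def one_to_one_offset(amount_spent: float) -> float:
    """Donate exactly as much as was spent on the activity (the 1:1 rule)."""
    return amount_spent

def botec_offset(estimated_harm_dollars: float, safety_multiplier: float = 2.0) -> float:
    """Donate a multiple of a back-of-the-envelope estimate of the harm caused."""
    return estimated_harm_dollars * safety_multiplier

amount_spent = 100.0    # hypothetical dollars spent on the activity
estimated_harm = 30.0   # hypothetical BOTEC of harm caused, in dollars

print(one_to_one_offset(amount_spent))  # 100.0 -- simple, easy-to-coordinate-on rule
print(botec_offset(estimated_harm))     # 60.0  -- entirely driven by the harm estimate
```

The point of the sketch is just that the BOTEC version is only as good as the harm estimate and the multiplier you pick, which is part of why the simplicity of the 1:1 rule appeals to me as a Schelling point.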