I’m Jeffrey Ladish. I’m a security researcher and risk consultant focused on global catastrophic threats. My website is at https://jeffreyladish.com
Donation offsets for ChatGPT Plus subscriptions
Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?
Really appreciate you! It’s felt stressful sometimes as just someone in the community and it’s hard to imagine how stressful it would feel for me in your shoes. Really appreciate your hard work, and I think the EA movement is significantly improved through your hard work maintaining and improving and moderating the forum, and all the mostly-unseen-but-important work mitigating conflicts & potential harm in the community.
When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️
I think it’s worth noting that I’d expect you would gain a significant relative advantage if you get out of cities before other people, such that acting later would be a lot less effective at furthering your survival & rebuilding goals.
I expect the bulk of the risk of an all out nuclear war to happen in the couple of weeks after the first nuclear use. If I’m right, then the way to avoid the failure mode you’re identifying is returning in a few weeks if no new nuclear weapons have been used, or similar.
I think the problem is the vagueness of the type of commitment the GWWC represents. If it’s an ironclad commitment, people should lose a lot of trust in you. If it was a “best of intention” type commitment, people should only lose a modest amount of trust in you. I think the difference matters!
I super agree it’s important not to conflate “do you keep actually-thoughtful promises you think people expected you to interpret as real commitments” and “do you take all superficially-promise-like-things as serious promises”! And while I generally want people to think harder about what they’re asking for wrt commitments, I don’t think going overboard on strict-promise interpretations is good. Good promises have a shared understanding between both parties. I think a big part of building trust with people is figuring out a good shared language and context for what you mean, including when making strong and weak commitments.
I wrote something related in my first draft but removed it since it seemed a little tangential, but I’ll paste it here:
“It’s interesting that there are special kinds of ways of saying things that hold more weight than other ways of saying things. If I say “I absolutely promise I will come to your party”, you will probably have a much higher expectation that I’ll attend than if I say “yeah I’ll be there”. Humans have fallible memory, they sometimes set intentions and then can’t carry through. I think some of this is a bit bad and some is okay. I don’t think everyone would be better off if every time they said they would do something they treated this as an ironclad commitment and always followed through. But I do think it would be better if we could move at least somewhat in this direction.”
Based on your comment, I now think the thing to move towards is not just “interpreting commitments as stronger” but rather “more clarity in communication about what kind of commitments are what type.”
Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments
I think it will require us to reshape / redesign most ecosystems & probably pretty large parts of many / most animals. This seems difficult but well within the bounds of a superintelligence’s capabilities. I think that at least within a few decades of greater-than-human-AGI we’ll have superintelligence, so in the good future I think we can solve this problem.
My vision of a good future, part I
US Citizens: Targeted political contributions are probably the best passive donation opportunities for mitigating existential risk
I don’t think an ordinary small/medium tech company can succeed at this. I think it’s possible with significant (extraordinary) effort, but that sort of remains to be seen.
As I said in another thread:
>> I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis ( that we can secure X kind of system for Y application of resources ), and to try to get more information to test the hypothesis. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.
I agree that a lot of the research today by leading labs is being published. I think the norms are slowly changing, at least for some labs. Deciding not to (initially) release the model weights of GPT-2 was a big change in norms iirc, and I think the trend towards being cautious with large language models has continued. I expect that as these systems get more powerful, and the ways they can be misused get more obvious, norms will naturally shift towards less open publishing. That being said, I’m not super happy with where we’re at now, and I think a lot of labs are being pretty irresponsible with their publishing.
The dual-use question is a good one, I think. Offensive security knowledge is pretty dual-use, yes. Pen testers can use their knowledge to illegally hack if they want to. But the incentives in the US are pretty good regarding legal vs. illegal hacking, less so in other countries. I’m not super worried about people learning hacking skills to protect AGI systems only to use those skills to cause harm—mostly because the offensive security area is already very big / well resourced. In terms of using AI systems to create hacking tools, that’s an area where I think dual-use concerns can definitely come into play, and people should be thoughtful & careful there.
I liked your shortform post. I’d be happy to see people apply infosec skills towards securing nuclear weapons (and in the biodefense area as well). I’m not very convinced this would mitigate risk from superintelligent AI, since nuclear weapons would greatly damage infrastructure without killing everyone, and thus not be very helpful for eliminating humans imo. You’d still need some kind of manufacturing capability in order to create more compute, and if you have the robotics capability to do this then wiping out humans probably doesn’t take nukes—you could do it with drones or bioweapons or whatever. But this is all highly speculative, of course, and I think there is a case for securing nuclear weapons without looking at risks from superintelligence. Improving the security of nuclear weapons may increase the stability of nuclear weapons states, and that seems good for their ability to negotiate with one another, so I could see there being some route to AI existential risk reduction via that avenue.
I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis ( that we can secure X kind of system for Y application of resources ), and to try to get more information to test the hypothesis. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.
Information security considerations for AI and the long term future
I expect most people to think either that AMF or MIRI is much more likely to do good. So from most agents’ perspectives, the unilateral defection is only better if their chosen org wins. If someone has more of a portfolio approach that weights longtermist and global poverty efforts similarly, then your point holds. I expect that’s a minority position though.
EA Hangout Prisoners’ Dilemma
Retrospective on Catalyst, a 100-person biosecurity summit
@Daniel_Eth asked me why I chose 1:1 offsets. The answer is that I did not have a principled reason for doing so, and do not think there’s anything special about 1:1 offsets except that they’re a decent Schelling point. I think any offsets are better than no offsets here. I don’t feel like BOTECs of harm caused are likely to be a particularly useful way to calculate offsets here, but I’d be interested in arguments to this effect if people had them.