Jeffrey Ladish

Karma: 786

I’m Jeffrey Ladish. I’m a security researcher and risk consultant focused on global catastrophic threats. My website is at https://jeffreyladish.com

Bounty for Evidence on Some of Palisade Research’s Beliefs

bwr 23 Sep 2024 20:05 UTC

5 points

0 comments · 1 min read · EA link

AI companies are not on track to secure model weights

Jeffrey Ladish 18 Jul 2024 15:13 UTC

73 points

3 comments · 19 min read · EA link

Palisade is hiring Research Engineers

Charlie Rogers-Smith 11 Nov 2023 3:09 UTC

23 points

0 comments · 3 min read · EA link

Jeffrey Ladish 17 Mar 2023 0:05 UTC
8 points
2 ∶ 0
on: Donation offsets for ChatGPT Plus subscriptions
@Daniel_Eth asked me why I chose 1:1 offsets. The answer is that I did not have a principled reason for doing so, and do not think there’s anything special about 1:1 offsets except that they’re a decent Schelling point. I think any offsets are better than no offsets here. I don’t think BOTECs of harm caused are likely to be a particularly useful way to calculate offsets here, but I’d be interested in arguments to the contrary if people have them.

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish 16 Mar 2023 23:11 UTC

76 points

10 comments · 3 min read · EA link

Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?

Jeffrey Ladish 10 Mar 2023 8:20 UTC

12 points

0 comments · 9 min read · EA link

Jeffrey Ladish 6 Feb 2023 7:57 UTC
16 points
7 ∶ 0
on: Thank you so much to everyone who helps with our community’s health and forum.
Really appreciate you! It’s felt stressful sometimes even as just someone in the community, and it’s hard to imagine how stressful it would feel for me in your shoes. I think the EA movement is significantly improved through your hard work maintaining, improving, and moderating the forum, and through all the mostly-unseen-but-important work mitigating conflicts & potential harm in the community.

When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️

Jeffrey Ladish 5 Jan 2023 1:55 UTC

16 points

2 comments · 2 min read · EA link

Jeffrey Ladish 5 Oct 2022 0:41 UTC
17 points
9 ∶ 1
in reply to: Kelsey Piper’s comment on: Overreacting to current events can be very costly
I think it’s worth noting that I’d expect you to gain a significant relative advantage if you get out of cities before other people, such that acting later would be a lot less effective at furthering your survival & rebuilding goals.

I expect the bulk of the risk of an all out nuclear war to happen in the couple of weeks after the first nuclear use. If I’m right, then the way to avoid the failure mode you’re identifying is returning in a few weeks if no new nuclear weapons have been used, or similar.

Jeffrey Ladish 12 Jul 2022 0:35 UTC
9 points
0 ∶ 0
in reply to: Henry Howard🔸’s comment on: Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments
I think the problem is the vagueness of the type of commitment the GWWC pledge represents. If it’s an ironclad commitment, people should lose a lot of trust in you. If it’s a “best of intention” type commitment, people should only lose a modest amount of trust in you. I think the difference matters!

Jeffrey Ladish 12 Jul 2022 0:32 UTC
3 points
0 ∶ 0
in reply to: ClaireZabel’s comment on: Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments
I super agree it’s important not to conflate “do you keep actually-thoughtful promises you think people expected you to interpret as real commitments” and “do you take all superficially-promise-like-things as serious promises”! And while I generally want people to think harder about what they’re asking for wrt commitments, I don’t think going overboard on strict-promise interpretations is good. Good promises have a shared understanding between both parties. I think a big part of building trust with people is figuring out a good shared language and context for what you mean, including when making strong and weak commitments.

I wrote something related in my first draft but removed it since it seemed a little tangential, so I’ll paste it here:

“It’s interesting that there are special kinds of ways of saying things that hold more weight than other ways of saying things. If I say “I absolutely promise I will come to your party”, you will probably have a much higher expectation that I’ll attend than if I say “yeah I’ll be there”. Humans have fallible memory, they sometimes set intentions and then can’t carry through. I think some of this is a bit bad and some is okay. I don’t think everyone would be better off if every time they said they would do something they treated this as an ironclad commitment and always followed through. But I do think it would be better if we could move at least somewhat in this direction.”

Based on your comment, I now think the thing to move toward is not just “interpreting commitments as stronger” but rather “more clarity in communication about which commitments are of which type.”

Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments

Jeffrey Ladish 11 Jul 2022 19:38 UTC

60 points

9 comments · 6 min read · EA link

Jeffrey Ladish 6 Jul 2022 2:00 UTC
1 point
0 ∶ 0
in reply to: Fai’s comment on: My vision of a good future, part I
I think it will require us to reshape / redesign most ecosystems & probably pretty large parts of many / most animals. This seems difficult but well within the bounds of a superintelligence’s capabilities. I think that at least within a few decades of greater-than-human-AGI we’ll have superintelligence, so in the good future I think we can solve this problem.

My vision of a good future, part I

Jeffrey Ladish 6 Jul 2022 1:23 UTC

34 points

3 comments · 9 min read · EA link

US Citizens: Targeted political contributions are probably the best passive donation opportunities for mitigating existential risk

Jeffrey Ladish 5 May 2022 23:04 UTC

51 points

20 comments · 5 min read · EA link

Jeffrey Ladish 5 May 2022 5:37 UTC
5 points
0 ∶ 0
in reply to: HaydnBelfield’s comment on: Information security considerations for AI and the long term future
I don’t think an ordinary small/medium tech company can succeed at this. I think it’s possible with significant (extraordinary) effort, but that sort of remains to be seen.

As I said in another thread:

>> I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis (that we can secure X kind of system for Y application of resources), and to try to get more information to test the hypothesis. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.

Jeffrey Ladish 5 May 2022 5:33 UTC
5 points
0 ∶ 0
in reply to: aog’s comment on: Information security considerations for AI and the long term future
I agree that a lot of the research today by leading labs is being published. I think the norms are slowly changing, at least for some labs. Deciding not to (initially) release the model weights of GPT-2 was a big change in norms iirc, and I think the trend towards being cautious with large language models has continued. I expect that as these systems get more powerful, and the ways they can be misused get more obvious, norms will naturally shift towards less open publishing. That being said, I’m not super happy with where we’re at now, and I think a lot of labs are being pretty irresponsible with their publishing.

The dual-use question is a good one, I think. Offensive security knowledge is pretty dual-use, yes. Pen testers can use their knowledge to illegally hack if they want to. But the incentives in the US are pretty good regarding legal vs. illegal hacking, less so in other countries. I’m not super worried about people learning hacking skills to protect AGI systems only to use those skills to cause harm—mostly because the offensive security area is already very big / well resourced. In terms of using AI systems to create hacking tools, that’s an area where I think dual-use concerns can definitely come into play, and people should be thoughtful & careful there.

I liked your shortform post. I’d be happy to see people apply infosec skills towards securing nuclear weapons (and in the biodefense area as well). I’m not very convinced this would mitigate risk from superintelligent AI, since nuclear weapons would greatly damage infrastructure without killing everyone, and thus not be very helpful for eliminating humans imo. You’d still need some kind of manufacturing capability in order to create more compute, and if you have the robotics capability to do this then wiping out humans probably doesn’t take nukes—you could do it with drones or bioweapons or whatever. But this is all highly speculative, of course, and I think there is a case for securing nuclear weapons without looking at risks from superintelligence. Improving the security of nuclear weapons may increase the stability of nuclear weapons states, and that seems good for their ability to negotiate with one another, so I could see there being some route to AI existential risk reduction via that avenue.

Jeffrey Ladish 5 May 2022 5:20 UTC
3 points
0 ∶ 0
in reply to: AdamGleave’s comment on: Information security considerations for AI and the long term future
I think it’s an open question right now. I expect it’s possible with the right resources and environment, but I might be wrong. I think it’s worth treating as an untested hypothesis (that we can secure X kind of system for Y application of resources), and to try to get more information to test the hypothesis. If AGI development is impossible to secure, that cuts off a lot of potential alignment strategies. So it seems really worth trying to find out if it’s possible.

Information security considerations for AI and the long term future

Jeffrey Ladish 2 May 2022 20:53 UTC

134 points

8 comments · 11 min read · EA link

Jeffrey Ladish 28 Sep 2021 0:43 UTC
5 points
0 ∶ 0
in reply to: Marcel2’s comment on: EA Hangout Prisoners’ Dilemma
I expect most people to think that either AMF or MIRI is much more likely to do good. So from most agents’ perspectives, the unilateral defection is only better if their chosen org wins. If someone has more of a portfolio approach that weights longtermist and global poverty efforts similarly, then your point holds. I expect that’s a minority position though.