I lead the DeepMind mechanistic interpretability team
Neel Nanda
Thanks for writing this up! I thought it was really interesting (and this seems a really excellent talk to be doing at student groups :) ). Especially the arguments about the economic impact of AGI, and the focus on what it costs—that’s an interesting perspective I haven’t heard emphasised elsewhere.
The parts I feel most unconvinced by:
The content in Crux 1 seems to argue that AGI will be important when it scales and becomes cheap, because of the economic impact. But the arguments for the actual research being done seem more focused on AGI as a single monolithic thing, eg framings like a safety tax/arms race, comparing costs of building an unaligned AGI vs an aligned AGI.
My best guess for what you mean is that “If AGI goes well, for economic reasons, the world will look very different and so any future plans will be suspect. But the threat from AGI comes the first time one is made”, ie that Crux 1 is an argument for prioritising AGI work over other work, but unrelated to the severity of the threat of AGI—is this correct?
The claim that good alignment solutions would be put to use. The fact that so many computer systems put minimal effort into security today seems a very compelling counter-argument.
I’m especially concerned if the problems are subtle. My impression is that a lot of what MIRI thinks about sounds weird, the kind of thing where my reaction is “I could maybe buy this”, but I could just as easily not buy it. And I have much lower confidence that companies would invest heavily in security for more speculative, abstract concerns.
This seems bad, because intuitively AI Safety research seems more counterfactually useful the more subtle the problems are—I’d expect people to solve obvious problems before deploying AGI even without AI Safety as a field.
Related to the first point, I have much higher confidence AGI would be safe if it’s a single, large project eg a major $100 billion deployment, that people put a lot of thought into, than if it’s cheap and used ubiquitously.
It seems like the obvious problem with this is that identifying the best investment opportunities is hard.
More specifically, I think EA really shines when identifying the problems nobody really cares about or is trying to solve already (eg, evaluating charity cost-effectiveness, improving the long-term future). It makes sense that there would be low-hanging fruit for a competent altruist, because most of the world doesn’t care about those causes and isn’t trying. So there’s no reason to expect the low-hanging fruit to already have been plucked.
Investment, on the other hand, gives EAs no such edge. The desire to make a lot of money seems near universal, so a lot of optimisation power is going into investment and into finding the best sources of returns, and you should expect the best investment returns to have already been taken. So I can’t see any clear edge for EAs here.
Arguably EAs have an edge in terms of caring an unusual amount about long time horizons? So I could believe that there are neglected investment opportunities that aren’t great in the short term but which sound excellent over 10+ year time horizons. And I’d be excited about seeing thought in that direction. This is still an area a lot of other people care about, but I think most investors care about shorter time horizons, so I can believe there are mispricings. It’d definitely require looking for things that aren’t also obviously good ideas in the short term though (ie not the Medallion Fund)
Long time-horizon institutions like university endowments, pension funds etc might be interesting places to look for what good strategies here look like.
It also seems plausible that an EA worldview isn’t fully priced into markets yet, eg if you believe there’s a realistic chance of transformative AI in the next few decades, tech/hardware companies might be relatively underpriced. Or more generally GCRs, like climate change, antibiotic resistance, risks of great power war, artificial pandemics might not be sufficiently priced in? (I’d have put natural pandemics on that list, but that’s probably priced in now?)
The world is full of wasted motion
What altruism means to me
Nice post! I definitely agree that being willing to call friends on their BS can be a super valuable service.
I think the right way to pull it off depends on the person you’re talking to though—it’s easy to get somebody else feeling defensive, or overwhelmed, and this detracts from the actual goal of getting them to do something. I have two approaches that seem pretty widely effective here:
1. “Socratic butt-kicking”—when I think somebody is obviously procrastinating, rather than outright telling them, I come up with an argument in my head for why I think this, and then ask a series of leading questions to lead them through that thought process. Eg, if someone is procrastinating on applying for something, I might ask “How long has it been since you decided you wanted to apply for this?”, and “Would you be surprised if it’s 2 weeks from now and you still haven’t gotten round to it?”. Or, if somebody is being insecure/imposter syndrome-y, asking “what’s the worst thing that could happen if you apply?” and “do you think you’d learn anything valuable from applying?”
I think this works really well for avoiding defensiveness, because you’re leading them through the thought process, which is generally a lot more motivating than it being externally imposed on them. And, if I am wrong in my thought process, this fails pretty gracefully, because they’ll give an unexpected answer to a question.
It can also be a good way to get them to take the Outside View—thinking about whether a typical candidate might feel the way they do, or whether they’ll ever get round to it. And to appreciate the value of cheap tests—that you should obviously do low-effort things with no real downside, even if they’re stressful. Which are both pretty obvious insights that take a lot of willpower and attention to ensure you do yourself.
2. Ensuring they leave the conversation with a concrete next action. I think a lot of stress/procrastination comes from something feeling fuzzy, stressful and overwhelming. And that there’s a lot of cognitive work in processing an overwhelming task and figuring out what to actually do about it. So I think a really valuable thing to do is to ask “what’s a concrete thing you could do to make progress towards ___?” And then once they give a vague idea, poke at it until it becomes specific and concrete.
It’s also great to ensure they have a specific time and plan, especially if you can get them to explicitly put time in their calendar for it. Long-term admin like applications sucks because it never feels urgent compared to short-term stuff in your life, so the default state of the world is that they put it off indefinitely. I often offer to message them after that block to check on them, and set myself a reminder to follow up.
Suppose, in 10 years, that the Research Scholar’s Programme has succeeded way beyond what you expected now. What happened?
You have a pure maths research background. What areas/problems do you think this background and way of thinking give you the strongest comparative advantage at?
Can you give any examples of times your background has felt like it helped you come to valuable insights?
What do you think is the most valuable research you’ve produced so far? Did you think it would be so valuable at the time?
What common belief in EA do you most strongly disagree with?
I was thinking some more about how I approach butt-kicking, and generally helping debug others and helping them to be agenty, and wrote up a blog post on my thoughts
I think outreach directed at high schoolers feels more ethically questionable to me than outreach directed at students. I roughly think that high-schoolers tend to be significantly more impressionable/vulnerable, especially when talking to people who they consider worthy of respect. Admittedly, this also seems true of college students, albeit to a lesser degree, so I think I’m drawing arbitrary lines in the sand. But it feels different to do it with a minor/somebody still in school.
With all that said, I went to ESPR, and had an incredibly positive experience, that I think has significantly increased my expected lifetime impact! (I first went at 17). But I know people who also had pretty negative experiences (much more with the rationality side than EA, which wasn’t strongly emphasised)
I disagree that the counterfactual is comparable. I agree that they will have SOME influences, but I think the magnitude of influence really matters. By default, people aren’t exposed to strong, deliberate influence of the kind described in this post, for any set of ideas/values.
I guess you could argue that living in the West is a process of ambient influence towards Western values?
I’d expect a more significant risk to be that the outreach just wouldn’t work. I expect that for EA outreach to be effective, you need to significantly filter for a bunch of things, like altruism, truth-seeking, reliance on evidence and reason, meta-cognition, etc. I’d expect a school like Eton to filter pretty hard for expected future influence on the world, but not for probability of being interested in EA?
Though I guess it somewhat filters for intelligence, which correlates a bit with those things
Fairly strongly agreed—I think it’s much easier to express disagreement than agreement on the margin, and that on the margin people find it too intimidating to post to the EA Forum and it would be better to be perceived as friendlier. (I have a somewhat adjacent blog post about going out of your way to be a nicer person)
I strongly feel this way for specific positive feedback, since I think that’s often more neglected and can be as useful as negative feedback (at least, useful to the person making the post). I feel less strongly for “I really enjoyed this post”-esque comments, though I think more of those on the margin would be good.
An alternate approach would be to PM people the positive feedback—I think this adds a comparable amount to the person, but removes the “changing people’s perceptions of how scary posting on the EA Forum is” part
This is a great post, thanks for writing it! And I’m glad you’ve made a bunch of progress on this failure mode
Helping each other become more effective
Make a Public Commitment to Writing EA Forum Posts
Commitment: I commit to posting a post-mortem on some rationality workshops I organised for EA Cambridge by 7pm on December 6th
Commitment: I commit to writing a post-mortem about ‘a series of EA Cambridge events I organised, where members prepare & give talks on EA topics as a commitment device for learning more about EA’ by 7pm Sunday 20th Dec
I have low confidence in this, but I’m pretty excited about this idea! I’ve had many more conversations with people super into EA over the last few months and this has definitely had a major impact on me, especially with regards to getting a better understanding of the ideas, and just making things concrete. Going from “this is some weird abstract stuff” to “these are ideas that some super awesome and smart people believe, and that I could realistically apply in my life or build my career around”.
I’m somewhat biased, because I personally much prefer talking to people to eg reading things. I think a large part is just really liking the people and finding them interesting. I also got a lot of this value from going to parties and being in an EA social environment, which this wouldn’t directly generalise to, but I conjecture that someone explicitly trying to create a good environment for this could do much better?
I’m wondering how much of the value of this could be captured by just having calls with people interested in EA but not at EA Hubs? This seems like it cuts out a lot of the logistical hassle of a residency, though at the cost of not being able to go to meetups, and losing out on the in-person interaction. I think it could capture much of the value of talking to someone highly into EA though.
This sounds good, but really hard to pull off well. I personally found that “highly dedicated EAs who have spent a lot of time thinking about this sometimes disagree on important points” only really felt visceral to me after having several IRL conversations with smart people who held different viewpoints. And after only talking to one person, it’s easy for their view and justifications to dominate, especially if they’ve thought about it a lot more than I have. Even if they give frequent caveats of “this is just my opinion”, I don’t think that feels visceral in the same way as talking to somebody really smart.
Suggested patches:
Actively try to be balanced in conversations, eg give steelmans for the positions you don’t hold
Point people towards high quality write-ups of opposing viewpoints
Some further thoughts from previous discussions with Buck:
For 1 on 1 chats with people super into EA (I’ve had a reasonable amount of experience being on both sides of this), I think one big failure mode is not being sure what to talk about. Eg, if I’m talking to somebody who actively researches an area that interests me, there are obviously a lot of things they know a lot about that I’d find interesting to talk about, but I struggle to come up with good questions to access those. I also expect this to be exacerbated if you’re having many conversations with people already somewhat engaged with EA, as you first need to figure out their prior level of context and knowledge. This seems a difficult problem to solve, but here are a few ideas:
Focusing on career conversations, where this seems less of an issue
Brainstorming common things people misunderstand and trying to bring those up
Having longer conversations and trying to ensure the person in residency is a great conversationalist (this one is much less concrete, but I think the skill of finding worthwhile things to talk about varies a lot between people)
(Being on either side of these conversations and not knowing what to talk about is a problem I frequently run into, so I’d love to hear anyone’s suggestions for helping with this generally!)
Another potential failure mode: I’d guess there are a lot of people who might really benefit from a 1 on 1 but who might feel socially awkward expressing interest or trying to arrange one, eg concern about taking up the person’s time, that they’re not impressive enough, general social anxiety/aversion to meeting a stranger 1 on 1, etc. An immediate thought for how to partially resolve this is asking local group organisers for introductions, as a friendlier point of contact? I think it’d also help to put a lot of thought into how to market this, for example whether people need to consider themselves high-achievers/high-potential. I think younger EAs systematically underestimate how much more experienced ones want to talk to them (at least in contexts like this, reaching out to people at EAG, etc)
The situation of “a conversation with somebody you’ll probably never see again” is weird, and the way to maximise impact probably differs from how I’d normally approach a conversation, since much of the value will come from things they do on their own after the conversation without (much) further prompting. Best levers to pull are probably suggesting options they wouldn’t have considered, eg career paths, or more generally challenging the narrative they’re framing their life with (though this seems high variance); connecting them with useful people to speak to; Buck’s argument about understanding their view of core EA arguments and addressing objections; and pointing them towards good resources they wouldn’t otherwise have found