Agreed, thanks for the pushback!
Ways of framing EA that (extremely anecdotally*) make it seem less ick to newcomers. These are all obvious/boring; I’m mostly recording them here for my own consolidation.
EA as a bet on a general way of approaching how to do good, that is almost certainly wrong in at least some ways—rather than a claim that we’ve “figured out” how to do the most good (like, probably no one claims the latter, but sometimes newcomers tend to get this vibe). Different people in the community have different degrees of belief in the bet, and (like all bets) it can make sense to take it even if you still have a lot of uncertainty.
EA as about doing good on the current margin. That is, we’re not trying to work out the optimal allocation of altruistic resources in general, but rather: given how the rest of the world is spending its money and time to do good, which approaches could do with more attention? Corollary: you should expect to see EA behaviour changing over time (for this and other reasons). This is a feature not a bug.
EA as diverse in its ways of approaching how to do good. Some people work on global health and wellbeing. Others on animal welfare. Others on risks from climate change and advanced technology.
These frames can also apply to any specific cause area.
*like, I remember talking to a few people who became more sympathetic when I used these frames.
I’m still confused about the distinction you have in mind between inside view and independent impression (which also have the property that they feel true to me)?
Or do you have no distinction in mind, but just think that the phrase “inside view” captures the sentiment better?
Thanks—good points, I’m not very confident either way now
Thanks, I appreciate this post a lot!
Playing the devil’s advocate for a minute, I think one main challenge to this way of presenting the case is something like “yeah, and this is exactly what you’d expect to see for a field in its early stages. Can you tell a story for how these kinds of failures end up killing literally everyone, rather than getting fixed along the way, well before they’re deployed widely enough to do so?”
And there, it seems you do need to start talking about agents with misaligned goals, and the reasons to expect misalignment that we don’t manage to fix?
Thanks for writing this!
There are yet other views about what exactly AI catastrophe will look like, but I think it is fair to say that the combined views of Yudkowsky and Christiano provide a fairly good representation of the field as a whole.
I disagree with this.
We ran a survey of prominent AI safety and governance researchers, where we asked them to estimate the probability of five different AI x-risk scenarios.
Arguably, the “terminator-like” scenarios are the “Superintelligence” scenario, and part 2 of “What failure looks like” (as you suggest in your post).[1]
Conditional on an x-catastrophe due to AI occurring, the median respondent gave those scenarios 10% and 12% probability (mean 16% each). The other three scenarios[2] got median 12.5%, 10% and 10% (means 18%, 17% and 15%).
So I don’t think that the “field as a whole” thinks Terminator-like x-risk scenarios are the most likely. Accordingly, I’d prefer if the central claim of this post were “AI risk could actually be like Terminator; stop saying it’s not”.
1. Part 1 of “What failure looks like” probably doesn’t look that much like Terminator (disaster unfolds more slowly and is caused by AI systems just doing their jobs really well). ↩︎
2. That is, the following three scenarios: Part 1 of “What failure looks like”, existentially catastrophic AI misuse, and existentially catastrophic war between humans exacerbated by AI. See the post for full scenario descriptions. ↩︎
After practising some self-love I am now noticeably less stressed about work in general. I sleep better, have more consistent energy, enjoy having conversations about work-related stuff more (so I just talk about EA and AI risk more than I used to, which was a big win on my previous margin). I think I maybe work fewer hours than I used to because before it felt like there was a bear chasing me and if I wasn’t always working then it was going to eat me, whereas now that isn’t the case. But my working patterns feel healthy and sustainable now; before, I was going through cycles of half-burning out every 3 months or so (which was bad enough for my near-term productivity, not to mention long-term productivity and health). I also spend relatively less time just turning the handle on my mainline tasks (vs zooming out, having random conversations that feel useful but won’t pay off immediately, reading more widely), which again I think was a win on my previous margin (maybe reduced it from ~90% to ~80% of my research hours).
I’m confused about how this happened. My model is that before there were two parts of me that strongly disagreed about whether work is good, and that these parts have now basically resolved (they agree that doing sensible amounts of work is good), because both feel understood and loved. Basically the part that didn’t think work was good just needed its needs to be understood and taken into account.
I think this model is quite different from Charlie’s main model of what happens (which is to do with memory consolidation), so I’m especially confused.
I haven’t attained persistent self-love of the sort described here.
I found this helpful and am excited to try it—thanks for sharing!
Also, a nitpick, but I find “inside view” a more confusing and jargony way of just saying “independent impressions” (okay, also jargon to some extent, but closer to plain English), which also avoids the problem you point out: inside view is not the opposite of the Tetlockian sense of outside view (and the other ambiguities with outside view that another commenter pointed out).
Nice post! I agree with ~everything here. Parts that felt particularly helpful:
There are even more reasons than I thought why paraphrasing is great—a good reminder to be doing this more often
The way you put this point was v crisp and helpful: “Empirically, there’s a lot of smart people who believe different and contradictory things! It’s impossible for all of them to be right, so you must disagree with some of them. Internalising that you can do this is really important for being able to think clearly”
The importance of “how much feedback do they get from the world” in deferring intelligently
One thing I disagree with is your take on the importance of forming inside views for community epistemic health: I think it’s pretty important. E.g. I think that ~2 years ago, the arguments for the longterm importance of AGI safety were pretty underdeveloped; that since then lots more people have come out with their inside views about it; and that now the arguments are in much better shape.
Note: the deadline has been extended to 27 February 2022
Yes that would be helpful, thanks!
CSER is hiring for a senior research associate on longterm AI risk and governance
Maybe this process generalises and so longtermist AI governance can learn from other communities?
In some sense, this post explains how the longtermist AI governance community is trying to go from “no one understands this issue well”, to actually improving concrete decisions that affect the issue.
It seems plausible that the process described here is pretty general (i.e. not specific to AI governance). If that’s true, then there could be opportunities for AI governance to learn from how this process has been implemented in other communities/fields and vice-versa.
Something that would improve this post but I didn’t have time for:
For each kind of work, give a sense of:
The amount of effort currently going into it
What the biggest gaps/bottlenecks/open questions are
What kinds of people might be well-suited to it
Thanks!
I agree with your quibble. Other than the examples you list here, I’m curious whether you have any other favourite reports/topics in the broader space of AI governance—esp. ones that you think are at least as relevant to longtermist AI governance as the average example I give in this post?
The longtermist AI governance landscape: a basic overview
Clarifications about structural risk from AI
Note: “If you want to add one or more co-authors to your post, you’ll need to contact the Forum team...” is no longer the easiest way to add co-authors, so it might be worth updating accordingly.
And by the way, thanks for adding this new feature!
Another (unoriginal) way that heavy AI regulation could be counterproductive for safety: AGI alignment research probably increases in productivity as you get close to AGI. So, regulation in jurisdictions with the actors who are closest to AGI (currently, the US/UK) would give those actors less time to do high-productivity AGI alignment research before the 2nd-place actor catches up.
And within a jurisdiction, you might think that responsible actors are most likely to comply with regulation, differentially slowing them down.