What do you think are the biggest mistakes that the AI Safety community is currently making?
Sam Clarke
My sense of the current general landscape of AI Safety is: various groups of people pursuing quite different research agendas, and not very many explicit and written-up arguments for why these groups think their agenda is a priority (a notable exception is Paul’s argument for working on prosaic alignment). Does this sound right? If so, why has this dynamic emerged and should we be concerned about it? If not, then I’m curious about why I developed this picture.
Thanks for the reply! Could you give examples of:
a) two agendas that seem to be “reflecting” the same underlying problem despite appearing very different superficially?
b) a “deep prior” that you think some agenda is (partially) based on, and how you would go about working out how deep it is?
On crux 4: I agree with your argument that good alignment solutions will be put to use, in worlds where AI risk comes from AGI being an unbounded maximiser. I’m less certain that they would be in worlds where AI risk comes from structural loss of control leading to influence-seeking agents (the world still gets better in Part I of the story, so I’m uncertain whether there would be sufficient incentive for corporations to use AIs aligned with complex values rather than AIs aligned with profit maximisation).
Do you have any thoughts on this or know if anyone has written about it?
FYI, broken link here:
I think my views on this are pretty similar to those Beckstead expresses here
I helped run the other survey mentioned, so I’ll jump in here with the relevant results and my explanation for the difference. The full results will be coming out this week.
Results
We asked participants to estimate the probability of an existential catastrophe due to AI (see definitions below). We got:
mean: 0.23
median: 0.1
Our question isn’t directly comparable with Rob’s, because we don’t condition on the catastrophe being “as a result of humanity not doing enough technical AI safety research” or “as a result of AI systems not doing/optimizing what the people deploying them wanted/intended”. However, that means that our results should be even higher than Rob’s.
Also, we operationalise existential catastrophe/risk differently, though I think the operationalisations are similar to the point that they wouldn’t affect my estimate. Nonetheless:
it’s possible that some respondents mistook “existential catastrophe” for “extinction” in our survey, despite our clarifications (survey respondents often don’t read the clarifications!)
while “the overall value of the future will be drastically less than it could have been” and “existential catastrophe” are intended to be basically the same, the former intuitively “sounds” more likely than the latter, which might have affected some responses.
My explanation
I think it’s probably a combination of things, including this difference in operationalisation, random noise, and Rob’s suggestion that “respondents who were following the forum discussion might have been anchored in some way by that discussion, or might have had a social desirability effect from knowing that the survey-writer puts high probability on AI risk. It might also have made a difference that I work at MIRI.”
I can add a bit more detail to how it might have made a difference that Rob works at MIRI:
In Rob’s survey, 5⁄27 of respondents who specified an affiliation said they work at MIRI (~19%)
In our survey, 1⁄43 of respondents who specified an affiliation said they work at MIRI (~2%)
(Rob’s survey had 44 respondents in total, ours had 75)
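As a quick check on the percentages above, here is a minimal Python sketch that recomputes them from the counts given. The helper name `share` is made up for this example; the counts are the ones quoted for the two surveys.

```python
def share(count: int, total: int) -> float:
    """Return count/total expressed as a percentage."""
    return 100 * count / total

# MIRI-affiliated respondents among those who specified an affiliation.
rob_miri = share(5, 27)   # Rob's survey
our_miri = share(1, 43)   # our survey

# Share of all respondents who specified an affiliation at all.
rob_specified = share(27, 44)
our_specified = share(43, 75)

print(f"Rob's survey: {rob_miri:.0f}% MIRI ({rob_specified:.0f}% specified an affiliation)")
print(f"Our survey:   {our_miri:.0f}% MIRI ({our_specified:.0f}% specified an affiliation)")
```

Note that both percentages are relative to respondents who specified an affiliation, not to all respondents, so the true share of MIRI staff in each sample could differ somewhat.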
Definitions from our survey
Define an existential catastrophe as the premature extinction of Earth-originating intelligent life or the permanent and drastic destruction of its potential for desirable future development (Bostrom, 2013).
Define an existential catastrophe due to AI as an existential catastrophe that could have been avoided had humanity’s development, deployment or governance of AI been otherwise. This includes cases where:
AI directly causes the catastrophe.
AI is a significant risk factor in the catastrophe, such that no catastrophe would have occurred without the involvement of AI.
Humanity survives but its suboptimal use of AI means that we fall permanently and drastically short of our full potential.
Other results from our survey
We also asked participants to estimate the probability of an existential catastrophe due to AI under two other conditions.
Within the next 50 years
mean: 0.12
median: 0.05
In a counterfactual world where AI safety and governance receive no further investment or work from people aligned with the ideas of “longtermism”, “effective altruism” or “rationality” (but there are no other important changes between this counterfactual world and our world, e.g. changes in our beliefs about the importance and tractability of AI risk issues).
mean: 0.32
median: 0.25
Survey on AI existential risk scenarios
Thanks Rob, interesting question. Here are the correlation coefficients between pairs of scenarios (sorted from max to min):
So it looks like there are only weak correlations between some scenarios.
It’s worth bearing in mind that we asked respondents not to give an estimate for any scenarios they’d thought about for less than 1 hour. The correlations could be stronger if we didn’t have this requirement.
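To illustrate how coefficients like these can be computed when respondents skip some scenarios, here is a minimal sketch of Pearson correlation with pairwise deletion. This is not necessarily the method used for the survey, and the scenario names and numbers below are invented purely for illustration; they are not survey data.

```python
from itertools import combinations
from math import sqrt
from typing import Optional, Sequence

# Hypothetical responses (NOT survey data): one probability per respondent,
# with None where a respondent gave no estimate for that scenario.
responses = {
    "scenario_a": [0.10, 0.20, None, 0.40, 0.30],
    "scenario_b": [0.20, 0.40, 0.10, 0.80, None],
    "scenario_c": [0.50, None, 0.30, 0.10, 0.20],
}

def pearson(xs: Sequence[Optional[float]],
            ys: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson correlation over respondents who answered both scenarios."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return None  # too few shared respondents to say anything
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None  # no variation, so the correlation is undefined
    return sxy / sqrt(sxx * syy)

# Correlation for every pair of scenarios.
results = {
    (a, b): pearson(responses[a], responses[b])
    for a, b in combinations(responses, 2)
}

# Sort from max to min, dropping pairs where no coefficient could be computed.
ranked = sorted(
    ((pair, r) for pair, r in results.items() if r is not None),
    key=lambda item: item[1],
    reverse=True,
)

for (a, b), r in ranked:
    print(f"{a} vs {b}: r = {r:.2f}")
```

With pairwise deletion, each coefficient is based only on respondents who answered both scenarios, so different pairs can rest on different (and sometimes small) subsets of the sample.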
An effective mental health intervention, for me, is listening to a podcast which ideally (1) discusses the thing I’m struggling with and (2) has EA, Rationality or both in the background. I gain both in-the-moment relief, and new hypotheses to test or tools to try.
Esp since it would be scalable, this makes me think that creating an EA mental health podcast would be an intervention worth testing—I wonder if anyone is considering this?
In the meantime, I’m on the lookout for good mental health podcasts in general.
Incidentally, that 80k episode and some from Clearer Thinking are the exact examples I had in mind!
Maybe one could promote specific podcast episodes of this type, see if people found them useful in that way, and if so then encourage those podcasts to have more such eps or a new such podcast to start?
As a step towards this, and in case anyone else finds it independently useful, here are the episodes of Clearer Thinking that I recall finding helpful for my mental health (along with the issues they helped with).
#11 Comfort Languages and Nuanced Thinking (for thinking through what I need, and what loved ones need, in difficult times)
#21 Antagonistic Learning and Civilization (had some useful thoughts about how education has taught me that breaking rules makes me bad, whereas in reality, breaking rules is just a cost to include in my calculation of what the best action is)
#22 Self-Improvement and Research Ethics (getting more traction on why my attempts at self-improvement often don’t work)
#25 Happiness and Hedonic Adaptation (hedonic adaptation seems like a very important concept for living a happier life, and this is the best discussion of it that I’ve heard)
#26 Past / Future Selves and Intrinsic Values (I recall something being useful about how I relate to past and future me)
#43 Online and IRL Relationships (relationships are a big part of my happiness and this had a very dense collection of insights about how to do relationships well; other dense insights have come from reading Nonviolent Communication and doing Circling with partners)
#54 Self-Improvement and Behavior Change (lots of stuff, most important was realising that many “negative” behaviour patterns are actually bringing you some benefit in a convoluted way, and until you identify a substitute for that benefit, they’ll be very hard to change)
#60 Heaven and hell on earth (thinking about the value of “bad” mental states like anxiety and depression)
#65 Utopia on earth and morality without guilt (thinking through how I relate to my desire to do good, guilt vs bright desire; the handle of “clingy-ness” for a certain flavour of mental experiences)
#68 How to communicate better with the people in your life (getting more traction on why some social interactions leave me feeling disconnected/isolated)
Thanks for writing this, I found it helpful and really clearly written!
One reaction: if you’re testing research as a career (rather than having committed and now aiming to maximise your chances of success), your goal isn’t exactly to succeed as an early stage researcher. It might be that trying your best to succeed is approximately the best way to test your fit—but it seems like there are a few differences:
“Going where there’s supervision” might be especially important, since a supervisor who comes to know you very well is a big and reliable source of information about your fit for research—which seems esp. important given that feedback in the form of “how much other people like your ideas” is often biased (e.g. because most of your early ideas are bad) or noisy (e.g. because some factors that influence the success of your research aren’t under your control).
It might be important to test your fit for different fields or flavours (e.g. quantitative vs qualitative, empirical vs theoretical) of research. This can come apart from the goal of trying to succeed as an early-stage researcher, since moving into unfamiliar territory might mean your outputs are less good in the short term.
Relatedly, it might be important to select at least some of your projects based on the skills or knowledge gaps they help you fill. Again, this goal might come apart from short term success (e.g. you pick a forecasting project to improve those skills, despite not expecting it to generate interesting findings)
Probably you want to spend less energy marketing your work, except to the extent that it’s helpful in getting more people to give you feedback on your fit for a research career.
[most uncertain] “Someone senior tells you what to work on” might actually not be the ideal solution to your problem 1. If the skills of research execution and research planning are importantly different, then you might fail to get enough info about your competence/enjoyment/fit for research planning skills (but I’m pretty uncertain if they are importantly different).
I’d be curious how much you agree with any of these points :)
Write out at least 10 project ideas, and ask somebody more senior to rank the best few
For bonus points, try to understand how they did the ranking. That way, you can start building up a model of how senior researchers think about evaluating project ideas, and refining your own research taste explicitly.
The ‘lean startup’ approach reminds me of Jacob Steinhardt’s post about his approach to research, of which the key takeaways are:
When working on a research project, you should basically either be in “de-risking mode” (determining if the project is promising as quickly as possible) OR “execution mode” (assuming the project is promising and trying to do it quickly). This probably looks like trying to do an MVP version of the project quickly, and then iterating on that if it’s promising.
If a project doesn’t work out, ask why. That way you:
avoid trying similar things that will fail for the same reasons.
will find out whether it didn’t work because your implementation was broken, or the high-level approach you were taking isn’t promising.
Try hard, early on, to show that your project won’t solve the problem.
Some questions that aren’t super related to Redwood/applied ML AI safety, so feel free to ignore if not your priority:
- Assuming that it’s taking too long to solve the technical alignment problem, what might be some of our other best interventions to reduce x-risk from AI? E.g., regulation, institutions for fostering cooperation and coordination between AI labs, public pressure on AI labs/other actors to slow deployment, …
- If we solve the technical alignment problem in time, what do you think are the other major sources of AI-related x-risk that remain? How likely do you think these are, compared to x-risk from not solving the technical alignment problem in time?
- How crucial a role do you expect x-risk-motivated AI alignment will play in making things go well? What are the main factors you expect will influence this? (e.g. the occurrence of medium-scale alignment failures as warning shots)
What might be an example of a “much better weird, theory-motivated alignment research” project, as mentioned in your intro doc? (It might be hard to say at this point, but perhaps you could point to something in that direction?)
Lessons learned running the Survey on AI existential risk scenarios
General vs specific arguments for the longtermist importance of shaping AI development
Thanks for the detailed reply, all of this makes sense!
I added a caveat to the final section mentioning your disagreements with some of the points in the “Other small lessons about survey design” section
Paul Christiano is a lot more optimistic than MIRI about whether we could align a Prosaic AGI. In a relatively recent interview with AI Impacts he said he thinks “probably most of the disagreement” about this lies in the question of “can this problem [alignment] just be solved on paper in advance” (Paul thinks there’s “at least a third chance” of this, but suggests MIRI’s estimate is much lower). Do you have a sense of why MIRI and Paul disagree so much on this estimate?