What is your theory of change for work on clarifying arguments for AI risk?
Is the focus more on immediate impact on funding/research or on the next generation? Do you feel this is important more to direct work toward the most important paths, or to understand how sure we are about all this AI stuff and grow the field or deprioritize it accordingly?
I think the work is mainly useful for EA organizations making cause prioritization decisions (how much attention should they devote to AI risk relative to other cause areas?) and young/early-stage people deciding between different career paths. The idea is mostly to help clarify and communicate the state of arguments, so that more fully informed and well-calibrated decisions can be made.
A couple other possible positive impacts:
Developing and shifting to improved AI risk arguments—and publicly acknowledging uncertainties/confusions—may, at least in the long run, cause other people to take the EA community and existential-risk-oriented AI safety communities more seriously. As one particular point, I think that a lot of vocal critics (e.g. Pinker) are mostly responding to the classic arguments. If the classic arguments actually have significant issues, then it’s good to acknowledge this; if other arguments (e.g. these) are more compelling, then it’s good to work them out more clearly and communicate them more widely. As another point, I think that sharing this kind of work might reduce perceptions that the EA community is more group-think-y/unreflective than it actually is. I know that people have sometimes pointed to my EAG talk from a couple years back, for example, in response to concerns that the EA community is too uncritical in its acceptance of AI risk arguments.
I think that it’s probably useful for the AI safety community to have a richer and more broadly shared understanding of different possible “AI risk threat models”; presumably, this would feed into research agendas and individual prioritization decisions to some extent. I think that work that analyzes newer AI risk arguments, especially, would be useful here. For example, it seems important to develop a better understanding of the role that “mesa-optimization” plays in driving existential risk.
(There’s also the possibility of negative impact, of course: focusing too much on the weaknesses of various arguments might cause people to downweight or de-prioritize risks more than they actually should.)
I haven’t thought very much about the timelines on which this kind of work is useful, but I think it’s plausible that the delayed impact on prioritization and perception is more important than the immediate impact.