I guess as you disclaimed might be the case up front, I don't think these are the strongest or most informed examples of EA's impact on AI safety.
In many cases of such impact, one can quibble about many things:
Whether that impact was clearly positive, or whether it had some kind of indirect harmful effect, most commonly via speeding up AI development. See Paul Christiano's reflections on the impact of Reinforcement Learning from Human Feedback as an example.
The counterfactuality and persistence of the impact: e.g., like you said for many of these, would this have happened (eventually) anyway?
How attributable that was to EA (and unfortunately in some cases, due to EA having a toxic brand in many places, it's actually best if it is not that attributable to EA).
And last: "Does any of that matter? All of EA's impact, for better or worse, has been its influence on Anthropic."
Yet, taken as a whole, I think EA has punched above its weight in many ways with respect to making AI go well. It's led to:
More and better staffed AI safety/security institutes
A richer non-profit ecosystem of safety research (like Truthful AI, FAR AI, Redwood Research, etc.)
More and better staffed third-party evaluations, auditing, and science (METR, AVERI)
Large amounts of field-building that encourages talented people to work on making AI go well (MATS, BlueDot, 80k)
A significant amount of policy advocacy and public communications about AI risk.
Probably other examples, too.
A lot of the effort to make this happen relied on EA-motivated people willing to take lower-paid or less glamorous jobs.[1] While some specific organizations, research or policy wins, or public communications would have happened otherwise, some wouldn't, and even for those that would have, happening earlier is still better.
I started out in EA caring about global health, and my first EA job was as a Researcher at GWWC. Even after becoming pretty convinced by AI risk and longtermism, I was still fairly sympathetic to concerns like "AI Safety alienating people". For instance, I was pretty against 80,000 Hours becoming explicitly focused on longtermism, and also pretty skeptical / worried about its pivot last year into leaning even more into AI. Now, looking at just how fast AI is progressing, how much there is still to be done to make it go well, and how valuable (I think) EA has been to date, I think I got a lot of that wrong.
[1] And of course, in some cases, they happened to get pretty well-paid jobs that ended up being fairly glamorous (even if they weren't in the beginning). I don't think that undermines the impact much. I don't really begrudge the quant finance folks who give >50% of their income to charities, even if they're still pretty rich at the end of the day.
I'm not sure this addresses Henry's critiques? In general, every bullet listed under "I think EA has punched above its weight in many ways with respect to making AI go well" is a proxy somewhere in the middle of the ToC chain, while his comment is more end-of-ToC focused, as he's skeptical of the proxies actually being beneficial. None of these bullets address the counterfactuality he brought up. In particular, you mentioned the founding of Redwood Research as an example of EA making AI go well despite Henry explicitly being skeptical of its impact so far:
AI Safety organisations like MIRI and Redwood Research have been operating for 25 and 5 years respectively. As an outsider I couldn't point to any particular breakthrough they've made in AI alignment. Redwood seems to do some kinda interesting work on measuring rogue behaviour and creating checks. I dunno. Seems like any organisation trying to make a reliable AI product would be heavily incentivised to do this stuff regardless.
To be clear, I'm not taking sides or anything. I'm just disheartened by what I perceive to be a lot of talking past each other between AIS advocates and skeptics on this forum, some of which seems easily preventable, like in this case.
Fair enough. I think I was trying to say something along the lines of "going through any specific example invites a lot of genuinely thorny and difficult questions about counterfactuality/sign of impact/attribution to EA" (and again, many of these are hard to discuss on a public forum), but I think, zooming out, you can see EA's fingerprints in various important places. I think this leads to an overall common-sense perspective that EA has helped improve the situation.
Also, I agree I pointed to work in the middle of the ToC chain, but that seems kind of reasonable to me given that AI is currently not that powerful and not really that scary. AI hasn't yet been capable of causing a disaster, so it's not really possible to have prevented one (yet).
On the specific example: Redwood Research is doing a lot of really valuable safety work. I think pioneering Control has been a fairly useful accomplishment, and I suspect if someone wanted to dig into the details, they'd find that it was fairly counterfactual.