the AI safety research seems unlikely to have strong enough negative unexpected consequences to outweigh the positive ones in expectation.
The word “unexpected” sort of makes that sentence trivially true. If we remove it, I’m not sure the sentence is true. [EDIT: while writing this I misinterpreted the sentence as: “AI safety research seems unlikely to end up causing more harm than good”] Some of the things to consider (written quickly, plausibly contains errors, not a complete list):
The AIS field (and the competition between AIS researchers) can give decision makers a false sense of safety. It may be that solving AI alignment is simply not feasible in a competitive environment without strong coordination, but researchers are biased towards saying good things about the field, their colleagues, and their (potential) employers. AIS researchers can also make more people inclined to pursue capabilities research (which can contribute to race dynamics). Here’s Alexander Berger:
[Michael Nielsen] has tweeted about how he thinks one of the biggest impacts of EA concerns with AI x-risk was to cause the creation of DeepMind and OpenAI, and to accelerate overall AI progress. I’m not saying that he’s necessarily right, and I’m not saying that that is clearly bad from an existential risk perspective, I’m just saying that strikes me as a way in which well-meaning increasing salience and awareness of risks could have turned out to be harmful in a way that has not been… I haven’t seen that get a lot of grappling or attention from the EA community. I think you could tell obvious parallels around how talking a lot about biorisk could turn out to be a really bad idea.
And here’s the CEO of Conjecture (59:50) [EDIT: this is from 2020, probably before Conjecture was created]:
If you’re a really good machine learning engineer, consider working for OpenAI, consider working for DeepMind, or someone else with good safety teams.
AIS work can “patch” small-scale problems that might otherwise make our civilization better at avoiding some existential catastrophes. Here’s Nick Bostrom:
On the one hand, small-scale catastrophes might create an immune response that makes us better, puts in place better safeguards, and stuff like that, that could protect us from the big stuff. If we’re thinking about medium-scale catastrophes that could cause civilizational collapse, large by ordinary standards but only medium-scale in comparison to existential catastrophes, which are large in this context, again, it is not totally obvious what the sign of that is: there’s a lot more work to be done to try to figure that out. If recovery looks very likely, you might then have guesses as to whether the recovered civilization would be more likely to avoid existential catastrophe having gone through this experience or not.
The AIS field (and the competition between AIS researchers) can cause the dissemination of info hazards. If a researcher thinks they came up with an impressive insight, they will probably be biased towards publishing it, even if it may draw attention to potentially dangerous information. Their career capital, future compensation, and status may be on the line. Here’s Alexander Berger again:
I think if you have the opposite perspective and think we live in a really vulnerable world — maybe an offense-biased world where it’s much easier to do great harm than to protect against it — I think that increasing attention to anthropogenic risks could be really dangerous in that world. Because I think not very many people, as we discussed, go around thinking about the vast future.
If one in every 1,000 people who go around thinking about the vast future decide, “Wow, I would really hate for there to be a vast future; I would like to end it,” and if it’s just 1,000 times easier to end it than to stop it from being ended, that could be a really, really dangerous recipe where again, everybody’s well intentioned, we’re raising attention to these risks that we should reduce, but the increasing salience of it could have been net negative.
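To make the arithmetic in that hypothetical concrete, here is a toy calculation. It is a minimal sketch that just plugs in Berger’s illustrative ratios; the population size and “effort” units are made-up assumptions, not estimates:

```python
# Toy model of Berger's hypothetical: raising the salience of the long-term
# future reaches some number of people; a tiny fraction become hostile, and
# harm is assumed to be far "easier" than protection (an offense-biased world).
# All numbers here are illustrative assumptions, not estimates.

n_people = 1_000_000           # people who start thinking about the vast future
n_hostile = n_people // 1000   # "one in every 1,000 people"
offense_multiplier = 1_000     # "1,000 times easier to end it than to stop it"

protective_effort = (n_people - n_hostile) * 1        # each well-intentioned person contributes 1 unit
destructive_effort = n_hostile * offense_multiplier   # each hostile person contributes 1,000 units

net_effect = protective_effort - destructive_effort
print(net_effect)  # -1000: slightly net negative under these assumptions
```

Under those two ratios the net effect comes out negative even though 99.9% of the newly aware people are well intentioned, which is the “really dangerous recipe” the quote is pointing at.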
Also, low-quality research or poor discussion can make it less likely that important decision makers will take AI safety seriously.

Hey there!

Important point. I changed

… AI safety research seems unlikely to have strong enough negative unexpected consequences to outweigh the positive ones in expectation.

to

… Still, it’s possible that there will be a strong enough flow of negative (unforeseen) consequences to outweigh the positives. We should take these seriously, and try to make them less unforeseen so we can correct for them, or at least have more accurate expected-value estimates. But given what’s at stake, they would need to be pretty darn negative to pull down the expected values enough to outweigh a non-trivial risk of extinction.

I added an EDIT block in the first paragraph after quoting you (I misinterpreted your sentence).