(Note: I’m so far very in favor of work on AI safety. This question isn’t intended to oppose work on AI safety, but to better understand it and its implications.)
(Edit: The point of this question is also to brainstorm some possible harms of AI safety work and to see whether any of them yield practical considerations to keep in mind as the field develops.)
Is there any content that investigates the harms that could come from AI safety? So far I’ve only found the scattered comments listed below. All types of harm are relevant, but I mostly have in mind harm that could come from AI safety work going as intended, as opposed to the opposite (for example, the work being misrepresented, de-legitimized as a result, and then neglected in a way that causes harm). In a sense, the latter seems much less surprising, because the final mechanism of harm is still what proponents of AI safety are concerned about (chiefly, unaligned AI). Here, I’m a bit more interested in “surprising” ways the work could cause harm.
(This is not my area of expertise, although part of my relative disinterest in AI safety so far is that I’m not convinced AI safety work is accomplishing much at all, nor that it’s doing more good than harm. I’m more sympathetic to the cooperation side of AI safety, since there’s a stronger argument for it from a suffering-focused perspective.)
I mentioned a few more risks in this comment, in which I used AI safety work as an example of the problem of cluelessness for longtermist interventions:
Also from the same comment, and a concern for any work affecting extinction risks:
There’s a potential for concerns (even legitimate concerns) about AI safety to increase the cost of AI research, to the point that relatively attainable and extremely wealth-generating AI technologies simply don’t get developed because of the barriers placed in front of them. Even if they do still get developed, AI safety concerns can certainly slow that development down. Whether that’s a good thing or not depends on both the potential dangers of AI and its potential benefits.
Another related issue is that while AI presents risks, it can also help us to deal with other risks. To the extent that AI safety research slows down the development of AI at all, it contributes to the other risks that AI could help us to mitigate. If AI can help us develop vaccines to prevent the next pandemic, failing to get AI developed before the next pandemic puts us at greater risk, for example.
Or, to sum up: opportunity costs.
To the extent that you are concerned about intrinsically multipolar negative outcomes (that is, failure modes which only arise in multipolar scenarios), AI safety work that helps only to narrowly align individual automated services with their owners could accelerate such dangers.
Critch recently outlined this sort of concern well.
A classic which I personally consider related is Meditations on Moloch.
[slightly lazy response] You may be interested in some of the sources linked to from the following pages:
accidental harm
information hazard
differential progress
Some of the sources list ways AI safety work specifically could be harmful, whereas others list more general types of / pathways to accidental harm that also happen to be relevant to AI safety work.
(Overall, I think a lot of AI safety work is very valuable, and people shouldn’t let somewhat generic worries about accidental harm strongly push them away from doing AI safety work, but that it’s also good to be aware of some accidental harm pathways, get feedback from sensible people before making big moves, etc. Obviously that sentence is fairly vague! But the above links can provide more details.)
This is not about direct harm, but if AI risks are exaggerated to the point that the worst scenarios are not even possible, then a lot of EA talent might be wasted.
Those who are skeptical about AI skepticism may be interested in reading Magnus Vinding’s “Why Altruists Should Perhaps Not Prioritize Artificial Intelligence: A Lengthy Critique”.
OK, so maybe my idea is just nonsense, but I think we could come up with super-smart humans who could then understand what AI is doing. Like, genetically engineer them, or put a machine in their brain to make them super-smart. So someone who is working on AI safety research isn’t working on how to enhance humans like this, and maybe they miss out on that opportunity, which causes relative (though not absolute) harm.
[Main takeaway: to some degree, this might increase the expected value of making AI safety measures performant.]
One I thought of:
Consider the forces pushing for untethered, rapid, maximal AI development and those pushing for controlled, safe, possibly slower AI development. Something could happen in the future that makes the “safety forces” much stronger than the “development forces” (for example, an AI accident that causes significant harm, generates a lot of public attention, and leads to regulations being imposed on AI development). That could make AI development slower, or mean that AI doesn’t get as advanced as it otherwise would. I haven’t read much on the case for economic growth improving welfare, but if those arguments hold and the above scenario significantly reduces economic growth, and therefore welfare, then this could be one avenue for harm.
There are some caveats to this scenario:
If AI safety work goes really well, then it may not hinder AI development or performance. I’m not yet very knowledgeable about the field of AI safety, but from what I’ve heard, making AI safety measures performant is an area of active consideration (and possibly active work) in the field. If development and performance aren’t hindered, and economic growth is thus unaffected, then the above scenario isn’t a cause of harm.
This scenario and line of reasoning rely on the harm from stunted economic growth outweighing the benefit of having safer AI. This is a very questionable assumption.