I do agree that there is some risk, and it’s certainly worth some thought and research. However, in the EA context, cause areas should have effective interventions. Given all this uncertainty, AI risk seems like a very low-priority cause, since we cannot be sure whether the research and other projects funded have any real impact. It would seem more beneficial to use the money for interventions that have been proven effective. That is why I think that EA is the wrong platform for AI risk discussion.
fergusq
Reasons for my negative feelings towards the AI risk discussion
While the AI did not understand your instructions, I don’t think this is the same as value misalignment.
If we imagine a superhuman AI, I don’t think the problem will be that it doesn’t understand the instructions. The problem will be that it doesn’t care about the instructions. An ASI would most probably understand what humans want it to do, and it might even pretend to follow the instructions in order to reach its misaligned goals. If it didn’t understand what the humans want, it wouldn’t be superhuman.
Stable Diffusion is just an imperfect model: it cannot transform your request perfectly into its vector space, so it creates an approximation that loses information. It’s certainly a non-general, non-superhuman AI. It doesn’t have any misaligned goals; in fact, it’s uncertain whether we can say it has its own goals at all, not counting the prompt.
Thank you for these references, I’ll take a close look at them. I’ll write a new comment if I have any thoughts after going through them.
Before reading them, I want to say that I’m interested in research on risk estimation and AI progress forecasting. General research about possible AI risks without assigning them any probabilities is not very useful for determining whether a threat is relevant. If anyone has papers specifically on that topic, I’m very interested in reading them too.