Thank you for these references, I'll take a close look at them. I'll write a new comment if I have any thoughts after going through them.
Before reading them, I want to say that I'm interested in research about risk estimation and AI progress forecasting. General research about possible AI risks that assigns them no probabilities is not very useful for determining whether a threat is relevant. If anyone has papers specifically on that topic, I'm very interested in reading them too.
While the AI did not understand your instructions, I don't think this is the same as value misalignment.
If we imagine a superhuman AI, I don't think the problem will be that it doesn't understand the instructions. The problem will be that it doesn't care about the instructions. An ASI would most likely understand what humans want it to do and could even pretend to follow the instructions in order to reach its misaligned goals. If it didn't understand what humans want, it wouldn't be superhuman.
Stable Diffusion is just an imperfect model. It cannot transform your request perfectly into its vector space, so it creates an approximation that loses information. So it's certainly a non-general, non-superhuman AI. It doesn't have any misaligned goals; in fact, it's unclear whether we can say it has goals of its own at all, beyond the prompt.
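To make the "lossy approximation" point concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the CLIP text encoder that Stable Diffusion 1.x conditions on) of how a prompt is compressed into a fixed-size embedding. The hard 77-token limit and fixed dimensionality are exactly where information about your request gets lost, well before any image is generated.

```python
# Sketch: how Stable Diffusion 1.x turns a prompt into a fixed-size tensor of
# CLIP text embeddings. Anything past 77 tokens is simply truncated, and all
# remaining meaning must fit into a 77 x 768 tensor: a lossy compression of the request.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a photorealistic painting of a cat playing chess on the moon"  # example prompt
tokens = tokenizer(prompt, padding="max_length", max_length=77,
                   truncation=True, return_tensors="pt")

with torch.no_grad():
    embeddings = text_encoder(tokens.input_ids).last_hidden_state

print(embeddings.shape)  # torch.Size([1, 77, 768]): the whole request, compressed
```

Whatever nuance of the request doesn't survive that encoding simply isn't available to the image model, which is a capability limitation rather than the model pursuing a goal of its own.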