Hey Andreas! Thanks for writing this up, it was a really interesting read and I’m glad you shared it!
Some quick rambling thoughts after reading:
I think some of the distinctions might be semantic: some of what you describe would fall under misuse risk/malicious use, which could indeed be a real problem. (If an AI is causing harm because its values are aligned with a malicious human, is it aligned or misaligned overall? I’m not sure, but the human alignment problem seems to be the issue here.) I’m also not sure how to weigh that against the risk of unaligned AI. I think given that we are nowhere close to solving the alignment problem, people tend to assume that if we have AGI, it will be misaligned “by definition”. In terms of s-risks, I would really recommend checking out the work of CLR, as they seem to be the ones who have spent the most time thinking about s-risks. I think they also have a course on s-risks coming up sometime!
Awesome, thanks for the links and thoughts. I had actually been debating applying to the s-risk fellowship, and with your mention I finally applied.
Agreed that the big picture falls under human alignment / malicious human use. It’s likely the area that has been researched more historically, and I need to delve deeper into it. I’ve been putting off getting more involved in LessWrong, but now that you’ve highlighted it I will make an account there as well.
Thank you