Thanks so much, this was unusually clearly written, with only a small amount of technicality a global health chump like me couldn't follow, and I could still understand most of it. Please write more!
My initial reaction is: let's assume you are right and alignment is nowhere near as difficult as Yudkowsky claims.

This might not be relevant to your point that alignment might not be so hard, but it seemed like your arguments assume that the people making the AI are aiming for alignment, not misalignment.
For example, your comment: "As far as I can tell, the answer is: don't reward your AIs for taking bad actions."

What if someone does decide to reward it for that? Do your optimistic arguments still hold? Maybe this is outside the scope of your points!