Thanks! I’ve commented on your post. I think you are assuming that the major unsolved problems in alignment (reward hacking, corrigibility, inner alignment, outer alignment) have somehow already been solved. Sorry, but it reads as though you are unaware of what the major problems in AI alignment are.