A presumption in favor of human values over unaligned AI values, for reasons that aren't based on strict impartial utilitarian arguments. These could include the beliefs that: (1) humans are more likely to have "interesting" values than AIs, and (2) humans are more likely to be motivated by moral arguments than AIs, and are more likely to reach a deliberative equilibrium of something like "ideal moral values."
I don’t think this is a crux. Even if you prefer unaligned AI values over likely human values (weighted by power), you’d probably prefer doing research on further improving AI values over speeding things up.