For an argument that the probability of other civilizations is low, see https://arxiv.org/abs/1806.02404.
Humans don’t have obviously formalized goals. But you can formalize human motivation, in which case our final goal is going to be abstract and multifaceted, and it is probably going to include a very broad sense of well-being. The model applies just fine.
Because it is tautologically true that agents are motivated against changing their final goals, this just isn’t possible to dispute. The proof is trivial: it follows from the very stipulation of what a goal is in the first place. It is just a framework for describing an agent. Within this framework, humans’ final goals happen to be complex and difficult to discern, and maybe AI goals will be like that too. But we tend to think that AI goals will not be like that. Omohundro gives some economic reasons in his paper on the “basic AI drives,” but it also just seems clear that you can program an AI with a particular goal function, and that will be all there is to it.
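As a minimal sketch of what “program an AI with a particular goal function” means here (illustrative only; the agent class and utility function below are my own invented example, not anything from Omohundro or Bostrom): the final goal is simply whatever function the agent is constructed with, and nothing in its decision procedure gives it a reason to change that function.

```python
from typing import Callable, Iterable

class GoalDirectedAgent:
    """Toy agent whose final goal is exactly the utility function it is built with."""

    def __init__(self, utility: Callable[[str], float]):
        # The goal is fixed at construction time; the agent's decision
        # procedure never evaluates whether to modify `utility` itself.
        self.utility = utility

    def choose(self, actions: Iterable[str]) -> str:
        # Pick the action whose outcome scores highest under the agent's
        # own utility function.
        return max(actions, key=self.utility)

# Hypothetical goal: count occurrences of the letter "p", standing in for
# any simple, fully specified objective (e.g. "paperclips produced").
agent = GoalDirectedAgent(utility=lambda outcome: outcome.count("p"))
print(agent.choose(["make paperclips", "write poetry", "do nothing"]))
```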
Yes, an AI may end up with a very different interpretation of its given goal, but that seems to be one of the core issues in the value alignment problem that Bostrom is worried about, no?
A lot of baggage goes into selecting a threshold for “highly accurate” or “ensured safe” or statements of that sort. The idea is that early safety work helps even though it won’t get you a guarantee. I don’t see any good reason to believe AI safety is any more or less tractable than preemptive safety for any other technology; it just happens to have higher stakes. You’re right that the track record doesn’t look great; however, I really haven’t seen any strong reason to believe that preemptive safety is generally ineffective. It seems like it just isn’t tried much.