Great post! I think conflating “colloquial”-type goals with “fanatical utility function maximization”-type goals is a key flaw in a lot of x-risk arguments. I think the former could extend to some mild scheming, but is unlikely to extend to “kill everyone and tile the universe with paperclips”.
I really don’t get the “simplicity” arguments for fanatical maximising behaviour. Once you consider the subgoals involved, secretly plotting to take over the world seems obviously much more complicated. Do you have any idea how much computing power and how many subgoals it would take to try to conquer the entire planet?
I don’t buy the story that an AI starts with the simple “goal” of “maximise paperclips”, then gets yelled at for demolishing a homeless shelter to expand the factory, and then updates to a goal of “maximise paperclips in the long term, by hiding your intentions and conducting a secret world domination plot”. Why not update to “make lots of paperclips, but don’t try any galaxy brained shit”? It seems simpler and less computationally expensive.
> I really don’t get the “simplicity” arguments for fanatical maximising behaviour. Once you consider the subgoals involved, secretly plotting to take over the world seems obviously much more complicated. Do you have any idea how much computing power and how many subgoals it would take to try to conquer the entire planet?
I think this is underspecified, because:
- The hard part of taking over the whole planet is being able to execute a strategy that actually works in a world with other agents (who are themselves vying for power), rather than the compute or complexity cost of having the subgoal of taking over the world.
- The difficulty of taking over the world depends on the level of technology, among other factors. For example, taking over the world in the year 1000 AD was arguably impossible because you just couldn’t manage an empire that large. Taking over the world in 2024 is perhaps more feasible, since we’re already globalized, but it’s still essentially an ~impossible task.
My best guess is that if some agent “takes over the world” in the future, it will look more like “being elected president of Earth” than “secretly plotting to release a nanoweapon at a precise time, killing everyone else simultaneously”. That’s because in the latter scenario, by the time some agent has access to super-destructive nanoweapons, the rest of the world will likely have access to similarly powerful technology, including potential defenses against these nanoweapons (or nanoweapons of their own that they can threaten you with).