Moloch and the Pareto optimal frontier
Moloch is a poetic way of describing failures of coordination and coherence within an agent or between agents, and the resulting generation of harmful subcomponents or harmful agents. Perhaps this could be decomposed further, or at least partially covered, by three categories: randomly generated accidents, Goodhart’s law failures, and conflicts of optimization. Let’s zoom in on one of these: conflicts of optimization.
What are conflicts of optimization? They are situations where more than one criterion is being optimized for, and in practice improving one criterion makes at least one other criterion worse.
When does this occur? It occurs when you cannot find a way to improve all optimization criteria at the same time. For instance, if you cannot produce both more swords and more shields because you only have a limited amount of iron, then you have a conflict of optimization.
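To make the trade-off concrete, here is a minimal sketch; the iron budget and per-unit costs are invented numbers for illustration, not anything from the example itself:

```python
# Minimal sketch of the sword/shield trade-off (all numbers are made up).
IRON = 100          # total iron available
SWORD_COST = 2      # iron per sword
SHIELD_COST = 5     # iron per shield

def max_shields(swords: int) -> int:
    """Given a number of swords, the most shields the remaining iron allows."""
    remaining = IRON - swords * SWORD_COST
    return remaining // SHIELD_COST

for swords in range(0, 51, 10):
    print(f"{swords:2d} swords -> at most {max_shields(swords):2d} shields")
# Once all the iron is allocated, every extra sword costs shields.
```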
This can be described by the concept of Pareto optimality. If you’re at a Pareto optimal point, then there is no way to improve any criterion without making another worse, and the set of all such points is called the Pareto frontier.
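As a sketch of that definition in code (the candidate points are made up, and both criteria are treated as higher-is-better), here is one simple way to extract the Pareto frontier from a finite set of points:

```python
def pareto_frontier(points):
    """Return the points not dominated by any other point.

    A point p is dominated if some other point q is at least as good
    on both criteria and strictly better on at least one.
    """
    def dominated(p, q):
        return q[0] >= p[0] and q[1] >= p[1] and q != p

    return [p for p in points if not any(dominated(p, q) for q in points)]

points = [(1, 5), (2, 4), (3, 3), (2, 2), (4, 1), (1, 1)]
print(pareto_frontier(points))  # -> [(1, 5), (2, 4), (3, 3), (4, 1)]
```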
What this looks like: if you’re near or on the Pareto frontier, there are few to zero ways to improve all criteria, and the further away you are, the more such options there are. Iteratively, then, you can imagine that around every point there are some known possible moves, which may be visualized as vectors from that point; a sequence of such changes is then a sequence of movements along those vectors. Generally what you’d expect (with caveats) is that the trajectory moves up and to the right until it hits the Pareto frontier, and then skates along the frontier until one of the optimization processes wins out or the two settle into an equilibrium.
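Here is a toy simulation of that trajectory, under assumed dynamics: a linear budget constraint stands in for the frontier, candidate moves are small random vectors, and the rule is "prefer moves that improve both criteria; otherwise let y win". None of these choices come from the text; they are one way to see the up-and-right-then-skate shape:

```python
import random

random.seed(0)

BUDGET = 10.0  # feasible region: x + y <= BUDGET; its boundary is the frontier

def feasible(x, y):
    return x >= 0 and y >= 0 and x + y <= BUDGET

x, y = 1.0, 1.0
for step in range(200):
    # Known moves around the current point, drawn as small random vectors.
    moves = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(20)]
    # Far from the frontier, some move improves both criteria at once...
    both = [(dx, dy) for dx, dy in moves
            if dx > 0 and dy > 0 and feasible(x + dx, y + dy)]
    if both:
        dx, dy = max(both, key=lambda m: m[0] + m[1])
    else:
        # ...near the frontier no such move exists, so the criterion that can
        # still gain (here y) does, and the point skates along the frontier.
        ys = [(dx, dy) for dx, dy in moves
              if dy > 0 and feasible(x + dx, y + dy)]
        if not ys:
            break  # equilibrium: no feasible improving move remains
        dx, dy = max(ys, key=lambda m: m[1])
    x, y = x + dx, y + dy

print(f"ended at x={x:.2f}, y={y:.2f} (x+y={x + y:.2f}, budget={BUDGET})")
# Typically ends near x ~ 0, y ~ BUDGET: all the slack converted into y.
```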
An example of this dynamic is a job negotiation done well. At first both parties work towards finding changes that benefit them both, but as time goes on such opportunities run out, and the last parts of the negotiation proceed in a zero-sum way (over salary, perhaps).
In practice the Pareto frontier isn’t necessarily static, because background variables may be changing over time. As long as the process of moving towards the frontier is much faster than the speed at which the frontier itself changes, though, we’d expect the same motion: towards the frontier, then skating along it.
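A small variation on the earlier sketch illustrates this timescale argument, under the assumption that the budget (and so the frontier) drifts slowly outward while the inner optimization takes a step every tick:

```python
import random

random.seed(1)

budget = 10.0
x, y = 1.0, 1.0

for step in range(500):
    budget += 0.01  # the frontier itself drifts slowly outward
    # Fast inner loop: take the best feasible improving move, if any.
    moves = [(random.uniform(-1, 1), random.uniform(0, 1)) for _ in range(20)]
    moves = [(dx, dy) for dx, dy in moves
             if x + dx >= 0 and (x + dx) + (y + dy) <= budget]
    if moves:
        dx, dy = max(moves, key=lambda m: m[0] + m[1])
        x, y = x + dx, y + dy
    if step % 100 == 0:
        print(f"step {step:3d}: x+y = {x + y:5.2f}, budget = {budget:5.2f}")
# Because the inner loop is much faster than the drift, x+y first climbs to
# the budget and then tracks it: towards the frontier, then skating along it.
```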
Between the criteria, this translates to a series of positive-sum transformations far from the Pareto frontier; as you get closer, the available transformations become less and less positive-sum, until they are finally zero-sum (one criterion can only gain at the expense of the other losing). This has natural implications for how game-theoretic actors relate to one another as progress is made towards a Pareto frontier.
Let’s now relate this to Moloch. Let the x axis be optimization for humanity’s ultimate values, let the y axis be optimization for competitiveness (things like winning in politics, wars, persuasion, and profit-making), let the points represent the world’s state in terms of x and y, and let the trajectory through the points be how the world develops. Given the above, we’d expect that at first competitiveness and the accomplishment of humanity’s ultimate values improve together, but that eventually they come apart and the trajectory skates along the Pareto frontier (roughly speaking, this happens when we are at maximum technology, or when technological change becomes sufficiently slow) until it maximizes competitiveness.
This is one of Moloch’s tools: movement towards competitive advantage over the achievement of humanity’s ultimate values, because the set of available transformations is constrained near the Pareto frontier.