Having given this a bit more thought, I think the starting point for something like this might be to generalize: assume the ASI simply has “different” interests (we don’t know what those interests are right now, both because we don’t know how ASI will be developed and because we haven’t solved alignment yet), and assume the ASI has just enough power relative to humans to make the interaction interesting to model (not because that assumption is realistic, but because if the ASI were far too weak or far too strong, the modeling exercise would be uninformative).
I don’t know where to go from here, however. Maybe Buterin’s def/acc world that I linked in my earlier comment would be a good scenario to start with.
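To make the “just enough power” point a bit more concrete, here is a minimal toy sketch, entirely my own framing rather than anything from def/acc: treat the interaction as a Nash bargaining problem over a unit of surplus, where the ASI’s relative power p sets each side’s payoff if negotiation breaks down. Every number in it (the conflict cost, the power values) is a made-up placeholder.

```python
# Toy sketch: Nash bargaining between an ASI and humanity over a unit surplus.
# p is the ASI's relative power; if talks break down, each side keeps roughly
# its share of power minus a conflict cost proportional to the other side's
# power. All numbers are illustrative placeholders, not claims.

import numpy as np

def disagreement_payoffs(p, conflict_cost=0.3):
    """Payoffs if negotiation fails and the two sides fight."""
    asi = p - conflict_cost * (1.0 - p)
    humans = (1.0 - p) - conflict_cost * p
    return asi, humans

def nash_bargain(p, grid=1001):
    """ASI's share of the surplus under the Nash bargaining solution,
    found by grid search over all splits both sides prefer to conflict."""
    d_asi, d_hum = disagreement_payoffs(p)
    shares = np.linspace(0.0, 1.0, grid)          # candidate ASI shares
    gains_asi = shares - d_asi
    gains_hum = (1.0 - shares) - d_hum
    feasible = (gains_asi >= 0) & (gains_hum >= 0)
    if not feasible.any():
        return None                               # no deal beats conflict for both sides
    product = np.where(feasible, gains_asi * gains_hum, -np.inf)
    return shares[np.argmax(product)]

for p in [0.05, 0.3, 0.5, 0.7, 0.95]:
    print(f"ASI power {p:.2f} -> predicted ASI share {nash_bargain(p):.2f}")
```

The only point of the sketch is the shape of the output: when p is near 0 or 1, the “prediction” collapses to “the stronger side gets essentially everything,” which is why I think only the intermediate-power case is worth modeling. The “different interests” part would then come in by replacing the single shared surplus with outcomes the two sides value differently.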