I’m excited to see you posting this. My views align very closely with yours; I summarised them a few days ago here.
One of the most important similarities is that we both emphasise the importance of decision-making and of supporting it with institutions. This could be seen as an “enactivist” view of agent (human, AI, hybrid, team/organisation) cognition.
The biggest difference between our views is that I think the “cognitivist” agenda (i.e., agent internals and algorithms) is as important as the “enactivist” agenda (institutions), whereas you seem to almost disregard the “cognitivist” agenda.
Try to constrain, delay, or obstruct AI, in order to reduce risk, mitigate negative impacts, or give us more time to solve essential issues. This includes, for example, trying to make sure AIs aren’t able to take certain actions (i.e. ensure they are controlled).
I disagree with putting risk-detection/mitigation mechanisms, algorithms, and monitoring in that bucket. I think we should instead distinguish between engineering approaches (cf. A plea for solutionism on AI safety) and non-engineering approaches (policy, legislation, treaties, commitments, advocacy). In particular, the “scheming control” agenda that you link will be a concrete engineering practice that should be used in the training of safe AI models in the future, even if we have good institutions, good decision-making algorithms wrapped on top of these AI models, etc. It’s not an “alternative path” just for “non-AI-dominated worlds”. The same applies to monitoring, interpretability, evals, and similar processes. All of these will require very elaborate engineering in their own right.
I 100% agree with your reasoning about Frames 1 and 2. I want to discuss the following point in detail because it’s a rare view in EA/LW circles:
It (IMO) wrongly imagines that the risk of coups comes primarily from the personal values of actors within the system, rather than institutional, cultural, or legal factors.
In my post, I also made a similar point: “‘aligning LLMs with human values’ is hardly a part of [the problem of context alignment] at all”. But my framing there was not very clear, so I’ll try to improve it and integrate it with your take here:
Context alignment is a pervasive process that happens (and is sometimes needed) on all timescales: evolutionary, developmental, and online (examples of the latter in humans: understanding, empathy, rapport). The skill of context alignment is extremely important and should be practiced often by all kinds of agents in their interactions (and therefore we should build this skill into AIs), but it’s not something that we should “iron out once and for all”. That would be neither possible (agents’ contexts constantly diverge from each other) nor desirable: (partial) misalignment is also important, as it is the source of diversity that enables evolution[1]. Institutions (norms, legal systems, etc.) are critical for channelling and controlling this misalignment so that it is optimally productive and doesn’t pose excessive risk (though some risk is unavoidable: that’s the essence of misalignment!).
Flexible yet resilient legal and social structures that can adapt to changing conditions without collapsing
This is interesting. I’ve also discussed this issue as “morphological intelligence of socioeconomies” just a few days ago :)
Good incentives for agents within the system, e.g. the economic value of trade is mostly internalized
Rafael Kaufmann and I have a take on this in our Gaia Network vision. Gaia Network’s term for the internalised economic value of trade is subjective value. The unit of subjective accounting is called FER. Trade with FER induces a flow that defines the intersubjective value, i.e., the “exchange rates” between “subjective FERs”. See the post for more details.
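To make this concrete, here is a minimal toy sketch of the accounting idea (my own illustration under simplifying assumptions, not actual Gaia Network code; names like Trade, ToyValueEconomy, and exchange_rate are invented for this example): each agent books trades in its own subjective FER units, and an intersubjective “exchange rate” between two agents’ units can be read off from the accumulated flow of their trades.

```python
# Hypothetical illustration only: a toy ledger where each agent accounts for trades
# in its own subjective FER units, and intersubjective "exchange rates" are estimated
# from the flow of completed trades. Not a Gaia Network API.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Trade:
    giver: str                  # agent providing the good/service
    receiver: str               # agent receiving it
    value_for_giver: float      # value in the giver's own subjective FER
    value_for_receiver: float   # value in the receiver's own subjective FER

class ToyValueEconomy:
    def __init__(self):
        self.trades: list[Trade] = []
        self.balances: dict[str, float] = defaultdict(float)  # net subjective FER per agent

    def record(self, trade: Trade) -> None:
        """Each party books the same trade in its own subjective units."""
        self.trades.append(trade)
        self.balances[trade.giver] -= trade.value_for_giver
        self.balances[trade.receiver] += trade.value_for_receiver

    def exchange_rate(self, a: str, b: str) -> float:
        """How many of b's subjective FER one unit of a's FER is 'worth',
        read off from the accumulated flow of trades from a to b."""
        flow_a = sum(t.value_for_giver for t in self.trades if t.giver == a and t.receiver == b)
        flow_b = sum(t.value_for_receiver for t in self.trades if t.giver == a and t.receiver == b)
        if flow_a == 0:
            raise ValueError("no recorded flow from a to b")
        return flow_b / flow_a

# Usage: two agents value the same deliveries differently in their own units.
economy = ToyValueEconomy()
economy.record(Trade("alice", "bob", value_for_giver=3.0, value_for_receiver=6.0))
economy.record(Trade("alice", "bob", value_for_giver=2.0, value_for_receiver=5.0))
print(economy.exchange_rate("alice", "bob"))  # 2.2 units of Bob's FER per unit of Alice's
```

The point of the sketch is only that intersubjective value is derived from trade flows rather than declared centrally; the real design has much more to it.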
While sharing some features of the other two frames, the focus is instead on the institutions that foster AI development, rather than micro-features of AIs, such as their values
As I mentioned at the beginning, I think you are too dismissive of the “cognitivist” perspective. We shouldn’t paint all “micro-features of AIs” with the same brush. I agree that value alignment is over-emphasized[2], but other engineering mechanisms and algorithms (decision-making algorithms, “scheming control” procedures, context alignment algorithms), as well as architectural features (namely, being world-model-based[3] and being amenable to computational proofs[4]), are very important and couldn’t be recovered at the institutional/interface/protocol level. We demonstrated in the Gaia Network post above that, for the “value economy” to work as intended, agents should make decisions based on maximum entropy rather than maximum likelihood estimates[5], and they should share and compose their world models (even if in a privacy-preserving way, with zero-knowledge computations).
Indeed, this observation makes it evident that the recurring question “AI should be aligned with whom?” doesn’t and shouldn’t have a satisfactory answer if “alignment” is meant as totalising value alignment, as often conceptualised on LessWrong; on the other hand, if “alignment” is meant as context alignment as a practice, the question becomes as nonsensical (in its general form) as the question “AI should interact with whom?”: well, with someone, depending on the situation, in the way and to the degree appropriate!
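To illustrate the maximum-entropy vs. maximum-likelihood point above, here is a toy decision example (my own sketch, not code from the Gaia Network post; the scenario, numbers, and names like best_action are invented): with sparse evidence, a maximum-likelihood point estimate discards residual uncertainty, while retaining the full posterior under a uniform (maximum-entropy) prior changes which action looks best.

```python
# Toy illustration: an agent decides whether to "proceed" or "hold" given sparse
# evidence about whether the world state is unsafe. A maximum-likelihood point
# estimate throws away uncertainty; a posterior under a uniform (max-entropy)
# prior, updated by Laplace's rule, keeps it and flips the decision.

observations = [0, 0, 0]          # three observations, none of them flagged "unsafe"
utilities = {                     # utility of each action in each world state
    "proceed": {"safe": 10.0, "unsafe": -100.0},
    "hold":    {"safe": -1.0, "unsafe": -1.0},
}

# Maximum-likelihood estimate of P(unsafe): fraction of "unsafe" observations.
p_unsafe_mle = sum(observations) / len(observations)                # = 0.0

# Posterior mean under a uniform (max-entropy) prior, via Laplace's rule of succession.
p_unsafe_bayes = (sum(observations) + 1) / (len(observations) + 2)  # = 0.2

def best_action(p_unsafe: float) -> str:
    """Pick the action with the highest expected utility given P(unsafe)."""
    def expected_utility(action: str) -> float:
        return (1 - p_unsafe) * utilities[action]["safe"] + p_unsafe * utilities[action]["unsafe"]
    return max(utilities, key=expected_utility)

print(best_action(p_unsafe_mle))    # 'proceed' -- the point estimate sees no risk at all
print(best_action(p_unsafe_bayes))  # 'hold'    -- residual uncertainty makes proceeding too costly
```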
However, still not completely irrelevant, at least for practical reasons: having shared values at the pre-training/hard-coded/verifiable level, as a minimum, reduces transaction costs, because AI agents then don’t have to painstakingly “eval” each other’s values before doing any business together.
[3] Both Bengio and LeCun argue for this: see “Scaling in the service of reasoning & model-based ML” (Bengio and Hu, 2023) and “A Path Towards Autonomous Machine Intelligence” (LeCun, 2022).
[4] See “Provably safe systems: the only path to controllable AGI” (Tegmark and Omohundro, 2023).
[5] Which is just another way of saying that they should minimise their (expected) free energy in their model updates/inferences and in the course of their actions.
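For reference, one common formulation of the free energy and expected free energy mentioned in footnote [5] (standard active-inference notation, added by me rather than taken from the post) is:

```latex
% Variational free energy of beliefs q(s) about hidden states s, given observations o
F[q] = \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
     = D_{\mathrm{KL}}\!\left[q(s) \,\|\, p(s \mid o)\right] - \ln p(o)

% Expected free energy of a policy \pi, accumulated over future time steps \tau,
% where \tilde{p} encodes the agent's preferences over future outcomes and states
G(\pi) = \sum_{\tau} \mathbb{E}_{q(o_\tau, s_\tau \mid \pi)}\!\left[\ln q(s_\tau \mid \pi) - \ln \tilde{p}(o_\tau, s_\tau)\right]
```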