Thank you for that substantive response, I really appreciate it! It was also very nice that you mentioned the Turner et al. definitions, I wasn't expecting that.
(Maybe write a post on that? There's a comment that mentions uptake from major players in the EA ecosystem, and maybe if you acknowledge you understand the arguments they would be more sympathetic? Just a quick thought, but it might be worth engaging there a bit more?)
I just wanted to clarify some of the points I was trying to make yesterday, as I do realise that they didn't all get across as I wanted them to.
I completely agree with you on the advancing progress point. I personally am quite against it at a "general" level: I do not believe that we will be able to counterfactually change the "rowing" speed that much in the grand scheme of things. I also believe that is the conclusion of Toby's posts, if I remember correctly. Toby was rather stating that existential risk reduction is worth a lot compared to any progress that we might be able to make. "Steering" away from the bad stuff is worth more. (That's the implicit claim from the modelling, even though he's as epistemically humble as you philosophers always are (which is commendable!).)
Now for the power-seeking stuff. I appreciate your careful reasoning about these things and I see what you mean in that there's no threat model from that claim in itself. If we say that the classical way it is construed is equivalent to minimizing free energy, then it is a tautological statement and doesn't help with existential risk.
I think I can agree with you that we're not clear enough about the existential risk angle to have a clearly defined goal for what to do. I do think there's an argument there, but we have to be quite clear about how we're defining it for it to make foundational sense. A question that arises is whether, in the process of working on it, we get more clarity about what it fundamentally is, similar to a startup figuring out what they're doing along the way? It might still be worth the resources from an unknown-unknowns perspective and from the perspective of shifting institutional practices, if that makes sense? TAI is such a big thing and it will only happen once, so spending those resources on relatively shaky foundations might still make sense?
I'm, however, not sure that this is the case, and Wei Dai, for example, has an entire agenda about "metaphilosophy" where the claim is that we're too philosophically confused to make sense of alignment. In general, I would agree that ensuring the philosophical and mathematical basis is very important to coordinate the field, and it is something I've been thinking about for a while.
I personally am trying to import ideas from existing fields that deal with generally intelligent agents in biology and cognitive science, such as Active Inference and Computational Biology, into the mix to see how TAI will affect society. If we see smaller branches of science as specific offshoots of philosophy, then I think the places with the most rigorous thinking on the foundations are the ones that have dealt with it for a long time. I've found a lot of interesting models about misalignment in these areas that I think can be transferred into the AI Safety frame.
I really appreciate the deconstructive approach that you have to the intellectual foundations of the field. I do believe that there are alternatives to the classic risk story, but to some extent you have to break down the flaws in the existing arguments in order to advocate for new ones.
Finally, I think these threat models come from arguments similar to the ones in "What Failure Looks Like" by Paul Christiano and the "going out with a whimper" idea. This is also explored in Yuval Noah Harari's books Nexus and Homo Deus. This threat model is closer to the authoritarian capture idea than to something like a runaway intelligence explosion.
I'm looking forward to more work in this area from you!