I sort of wonder if some people in the AI community -- and maybe you, from what you've said here? -- are using precise probabilities to get to the conclusion that you want to work primarily on AI stuff, and then spotlighting that cause area when you're analyzing at the level of interventions.
I think someone using precise probabilities all the way down is building a lot more explicit models every time they consider a specific intervention. Like if you're contemplating running a fellowship program for AI-interested people, and you have animals in your moral circle, you're going to have to build a BOTEC that includes the probability that X% of the people you bring into the fellowship won't care about animals and, if they get a policy role, will be likely to pass policies that are really bad for them. And all sorts of things like that. So your output would be a bunch of hypotheses about exactly how these fellows are going to benefit AI policy, and some precise probabilities about how those policy benefits are going to help people, and possibly animals, to what degree, etc.
I sort of suspect that only a handful of people are trying to do this, and I get why! I made a reasonably straightforward BOTEC for calculating the benefits to birds of bird-safe glass, one that accounted for backfire to birds, and it took a lot of research effort. If you asked me how bird-safe glass policy is going to affect AI risk after all that, I might throw my computer at you. But I think the precise probabilities approach would imply that I should.
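To make the shape of that concrete, here's a minimal sketch of what such a model looks like once you add even one backfire pathway. Every number is a made-up placeholder for illustration, not a figure from my actual BOTEC:

```python
# Illustrative only: every number below is a made-up placeholder, not a real estimate.

# Direct benefit: bird deaths averted by a bird-safe glass policy
buildings_covered = 1_000              # hypothetical number of buildings affected
collisions_per_building_per_year = 10  # hypothetical collision rate
mortality_rate = 0.5                   # fraction of collisions that are fatal
glass_effectiveness = 0.9              # fraction of fatal collisions prevented

birds_saved = (buildings_covered * collisions_per_building_per_year
               * mortality_rate * glass_effectiveness)

# Backfire pathway: a precise probability that the policy ends up net-harmful
# for birds (e.g. by displacing a more effective intervention)
p_backfire = 0.1                       # hypothetical
birds_harmed_if_backfire = 2_000       # hypothetical

expected_net_birds_helped = birds_saved - p_backfire * birds_harmed_if_backfire
print(expected_net_birds_helped)       # 4500 - 200 = 4300 birds per year (illustrative)

# "All the way down" would then demand further terms with their own precise
# probabilities, e.g. how the policy interacts with AI risk, other species, etc.
```

Each additional downstream consideration adds another precise probability that needs some kind of justification, which is where the research effort piles up.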
Re:
It might be interesting to move out of high-level reason zone entirely and just look at the interventions, e.g. directly compare the robustness of installing bird-safe glass in a building vs. something like developing new technical techniques to help us avoid losing control of AIs.
I'm definitely interested in robustness comparisons but not always sure how they would work, especially given uncertainty about what robustness means. I suspect some of these things will hinge on how optimistic you are about the value of life. I think the animal community attracts a lot more folks who are skeptical about humans being good stewards of the world, and so are less convinced that a rogue AI would be worse in expectation (and even folks who are skeptical that extinction would be bad). So I worry AI folks would view "preserving the value of the future" as extremely obviously positive by default, and that (at least some) animal folks wouldn't, and that would end up being the crux about whether these interventions are in fact robust. But perhaps you could still have interesting discussions among folks who are aligned on certain premises.
Re:
What would the justification standards in wild animal welfare say about uncertainty-laden decisions that involve neither AI nor animals: e.g. as a government, deciding which policies to enact, or as a US citizen, deciding who to vote for as President?
Yeah, I think this is a feeling that the folks working on bracketing are trying to capture: that in quotidian decision-making contexts, we generally use the factors we aren't clueless about (@Anthony DiGiovanni: I think I recall a bracketing piece explicitly making a comparison to day-to-day decision making, but now can't find it... so correct me if I'm wrong!). So I'm interested to see how that progresses.
I suspect, though, that people generally just don't think about justification that much. In the case of WAW-tractability-skeptics, I'd guess some large percentage are likely more driven by the (not unreasonable at first glance) intuition that messing around in nature is risky. The problem of course is that all of life is just messing around in nature, so there's no avoiding it.
What would the justification standards in wild animal welfare say about uncertainty-laden decisions that involve neither AI nor animals: e.g. as a government, deciding which policies to enact, or as a US citizen, deciding who to vote for as President?
Yeah, I think this is a feeling that the folks working on bracketing are trying to capture: that in quotidian decision-making contexts, we generally use the factors we aren't clueless about (@Anthony DiGiovanni: I think I recall a bracketing piece explicitly making a comparison to day-to-day decision making, but now can't find it... so correct me if I'm wrong!). So I'm interested to see how that progresses.
I think the vast majority of people making decisions about public policy or who to vote for either aren't ethically impartial, or they're "spotlighting", as you put it. I expect the kind of bracketing I'd endorse upon reflection to look pretty different from such decision-making.
That said, maybe you're thinking of this point I mentioned to you on a call: I think even if someone is purely self-interested (say), they plausibly should be clueless about their actions' impact on their expected lifetime welfare, because of strange post-AGI scenarios (or possible afterlives, simulation hypotheses, etc.).[1] See this paper. So it seems like the justification for basic prudential decision-making might have to rely on something like bracketing, as far as I can tell, even if it's not the formal theory of bracketing given here. (I have a draft about this on the backburner, happy to share if interested.)
I used to be skeptical of this claim, for the reasons argued in this comment. I like the "impartial goodness is freaking weird" intuition pump for cluelessness given in the comment. But I've come around to thinking "time-impartial goodness, even for a single moral patient who might live into the singularity, is freaking weird".
I think the vast majority of people making decisions about public policy or who to vote for either aren't ethically impartial, or they're "spotlighting", as you put it. I expect the kind of bracketing I'd endorse upon reflection to look pretty different from such decision-making.
But suppose I want to know which of two candidates to vote for, and I'd like to incorporate impartial ethics into that decision. What do I do then?
That said, maybe you're thinking of this point I mentioned to you on a call
Hmm, I don't recall this; another Eli perhaps? : )
@Eli Rose🔸 I think Anthony is referring to a call he and I had :)
@Anthony DiGiovanni I think I meant more that there was a justification of the basic intuition bracketing is trying to capture as being similar to how someone might make decisions in their own life, where we may also be clueless about many of the effects of moving home or taking a new job, but still move forward. But I could be misremembering! Just read your comment more carefully and I think you're right that this conversation is what I was thinking of.
Like if you're contemplating running a fellowship program for AI-interested people, and you have animals in your moral circle, you're going to have to build a BOTEC that includes the probability that X% of the people you bring into the fellowship won't care about animals and, if they get a policy role, will be likely to pass policies that are really bad for them...
...I sort of suspect that only a handful of people are trying to do this, and I get why! I made a reasonably straightforward BOTEC for calculating the benefits to birds of bird-safe glass, one that accounted for backfire to birds, and it took a lot of research effort. If you asked me how bird-safe glass policy is going to affect AI risk after all that, I might throw my computer at you. But I think the precise probabilities approach would imply that I should.
Just purely on the descriptive level and not the normative one:
I agree, but even more strongly: in AI safety I've basically never seen a BOTEC this detailed. I think Eric Neyman's BOTEC of the cost-effectiveness of donating to congressional candidate Alex Bores is a good public example of the type of analysis common in EA-driven AI safety work: it bottoms out in pretty general goods like "government action on AI safety" and does not try to model second-order effects to the degree described here. It doesn't even model considerations like "what if AI safety legislation is passed, but that legislation backfires by increasing polarization on the issue?", let alone anything about animals.
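For contrast with the kind of model described above, here's a rough sketch of the resolution at which such analyses typically stop. The numbers and structure are purely hypothetical placeholders, not Eric's actual figures:

```python
# Rough sketch of the typical resolution of an EA/AI-safety BOTEC.
# All numbers are hypothetical placeholders, not anyone's actual estimates.

donation = 100_000               # dollars
p_marginal_win = 0.002           # chance the marginal donation flips the race
p_win_to_good_policy = 0.3       # chance a win translates into useful AI policy
value_of_good_policy = 1e9       # dollar-equivalent of "government action on AI safety"

expected_value = p_marginal_win * p_win_to_good_policy * value_of_good_policy
print(expected_value / donation) # expected value per dollar donated (6.0 here)

# The chain bottoms out at a broad proxy good. There are no further terms for
# polarization backfire, US-China dynamics, effects on animals, etc.; those
# considerations, when raised at all, get hashed out qualitatively.
```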
Instead, this kind of strategic discussion tends to be qualitative, and is hashed out in huge blocks of prose and comment threads e.g. on LessWrong, or verbally.
I sort of wonder if some people in the AI community -- and maybe you, from what you've said here? -- are using precise probabilities to get to the conclusion that you want to work primarily on AI stuff, and then spotlighting that cause area when you're analyzing at the level of interventions.
I see why you describe it this way, and directionally this seems right. But what we do doesn't really sound like "spotlighting" as you describe it in the post: focusing on specific moral patient groups and explicitly setting aside others.
Essentially I think the epistemic framework we use is just more anarchic and freeform than that! In AIS discourse, it feels like "but this intervention could slow down the US relative to China" or "but this intervention could backfire by increasing polarization" or "but this intervention could be bad for animals" exist at the same epistemic level, and all are considered valid points to raise.
(I do think that there is a significant body of orthodox AI safety thought which takes particular stances on each of these and other issues, which in a lot of contexts likely makes various points feel not "valid" to raise. I think this is unfortunate.)
Maybe it's similar to the difference between philosophy and experimental science: in philosophy a lot of discourse is fundamentally unstructured and qualitative, while in the experimental sciences there is much more structure, because any contribution needs to be an empirical experiment and there are specific norms and formats for those, which has certain implications for how second-order effects are or aren't considered. AI safety discourse also feels similar at times to wonk-ish policy discourse.
(Within certain well-scoped sub-areas of AI safety, things are less epistemically anarchic; e.g. research into AI interpretability usually needs empirical results if it's to be taken seriously.)
I think someone using precise probabilities all the way down is building a lot more explicit models every time they consider a specific intervention. Like if you're contemplating running a fellowship program for AI-interested people, and you have animals in your moral circle, you're going to have to build a BOTEC that includes the probability that X% of the people you bring into the fellowship won't care about animals and, if they get a policy role, will be likely to pass policies that are really bad for them. And all sorts of things like that. So your output would be a bunch of hypotheses about exactly how these fellows are going to benefit AI policy, and some precise probabilities about how those policy benefits are going to help people, and possibly animals, to what degree, etc.
Hmm, I wouldn't agree that someone using precise probabilities "all the way down" is necessarily building these kinds of explicit models. I wonder if the term "precise probabilities" is being understood differently in our two areas.
In the Bayesian epistemic style that EA x AI safety has, it's felt that anyone can attach precise probabilities to their beliefs with ~no additional thought, and that these probabilities are subjective things which may not be backed by any kind of explicit or even externally legible model. There's a huge focus on probabilities as betting odds, and betting odds don't require such things (diverging notably from how probabilities are used in science).
I mean, I think typically people have something to say to justify their beliefs, but this can be & often is something as high-level as "it seems good if AGI companies are required to be more transparent about their safety practices," with little in the way of explicit models about downstream effects thereof.[1]
Apologies for not responding to some of the other threads in your post; I ran out of time. Looking forward to discussing in person sometime.
While it's common for AI safety people to agree with my statement about transparency here, some may flatly disagree (i.e. disagree about sign), and others (more commonly) may disagree massively about the magnitude of the effect. There are many verbal arguments but relatively few explicit models to adjudicate these disputes.
All very interesting, and yes, let's talk more later!
One quick thing: sorry, my comment was unclear. When I said "precise probabilities" I meant the overall approach, which amounts to trying to quantify everything about an intervention when deciding its cost-effectiveness (perhaps the post was also unclear).
I think most people in EA/AW spaces use the general term "precise probabilities" the same way you're describing, but perhaps there is on average a tendency toward the more scientific style of needing more specific evidence for those numbers. That wasn't necessarily true of early actors in the WAW space, and I think it had some mildly unfortunate consequences.
But this makes me realize I should not have named the approach that way in the original post, and should have called it something like the "quantify as much as possible" approach. I think that approach requires using precise probabilities (since if you allow imprecise ones you end up with a lot of things being indeterminate), but there's more to it than just endorsing precise probabilities over imprecise ones (at least as I've seen it appear in WAW).
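As a toy illustration of that indeterminacy point (all numbers hypothetical):

```python
# Toy illustration (hypothetical numbers): precise vs. imprecise probabilities.

benefit_if_success = 100.0
harm_if_backfire = 50.0

# Precise probability: the expected value gets a definite sign.
p = 0.4
ev_precise = p * benefit_if_success - (1 - p) * harm_if_backfire
print(ev_precise)  # 10.0 > 0, so the intervention comes out positive

# Imprecise probability: a range of admissible probabilities instead of one number.
p_low, p_high = 0.2, 0.5
ev_low = p_low * benefit_if_success - (1 - p_low) * harm_if_backfire     # -20.0
ev_high = p_high * benefit_if_success - (1 - p_high) * harm_if_backfire  #  25.0
print(ev_low, ev_high)
# The expected-value range straddles zero, so the sign of the intervention
# (and hence the comparison with doing nothing) is left indeterminate.
```

With the single precise number you get a verdict; with the interval you often don't, which is why the "quantify as much as possible" approach leans on precise probabilities.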
Thanks Eli!