Hmm, it seems to me (and you can correct me) that we should be able to agree that there are SOME technical AGI safety research publications that are positive under some plausible beliefs/values and harmless under all plausible beliefs/values, and then we don’t have to talk about cluelessness and tradeoffs, we can just publish them.
And we both agree that there are OTHER technical AGI safety research publications that are positive under some plausible beliefs/values and negative under others. And then we should talk about your portfolios etc. Or more simply, on a case-by-case basis, we can go looking for narrowly-tailored approaches to modifying the publication in order to remove the downside risks while maintaining the upside.
I feel like we’re arguing past each other: I keep saying the first category exists, and you keep saying the second category exists. We should just agree that both categories exist! :-)
Perhaps the more substantive disagreement is what fraction of the work is in which category. I see most but not all ongoing technical work as being in the first category, and I think you see almost all ongoing technical work as being in the second category. (I think you agreed that “publishing an analysis about what happens if a cosmic ray flips a bit” goes in the first category.)
(Luke says “AI-related” but my impression is that he mostly works on AGI governance not technical, and the link is definitely about governance not technical. I would not be at all surprised if proposed governance-related projects were much more heavily weighted towards the second category, and am only saying that technical safety research is mostly first-category.)
For example, if you didn’t really care about s-risks, then publishing useful considerations for those who are concerned about s-risks might take attention away from your own priorities, or it might increase cooperation. The default position to me should be deep uncertainty/cluelessness here, not that it’s good in expectation, bad in expectation, or 0 in expectation.
This points to another (possible?) disagreement. I think maybe you have the attitude where (to caricature somewhat) if there’s any downside risk whatsoever, no matter how minor or far-fetched, you immediately jump to “I’m clueless!”. Whereas I’m much more willing to say: OK, if you do anything at all there’s a “downside risk” in a sense, just because life is uncertain and who knows what will happen, but that’s not a good reason to just sit on the sidelines, let nature take its course, and hope for the best. If I have a project whose first-order effect is a clear and specific and strong upside opportunity, I don’t want to throw that project out unless there’s a comparably clear and specific and strong downside risk. (And of course we are obligated to try hard to brainstorm what such a risk might be.) It’s like a firefighter putting out a fire: when they aim their hose at the burning interior wall, they don’t stop and think, “Well, I don’t know what will happen if the wall gets wet, anything could happen, so I’ll just not pour water on the fire, y’know, don’t want to mess things up.”
The “cluelessness” intuition gets its force from having a strong and compelling upside story weighed against a strong and compelling downside story, I think.
If the first-order effect of a project is “directly mitigating an important known s-risk”, and the second-order effects of the same project are “I dunno, it’s a complicated world, anything could happen”, then I say we should absolutely do that project.
Perhaps the more substantive disagreement is what fraction of the work is in which category. I see most but not all ongoing technical work as being in the first category, and I think you see almost all ongoing technical work as being in the second category. (I think you agreed that “publishing an analysis about what happens if a cosmic ray flips a bit” goes in the first category.)
Ya, I think this is the crux. Also, considerations like the cosmic-ray bit flip tend to force a lot of things into the second category when they otherwise wouldn’t have been, although I’m not specifically worried about cosmic-ray bit flips, since they seem sufficiently unlikely and easy to avoid.
(Luke says “AI-related” but my impression is that he mostly works on AGI governance not technical, and the link is definitely about governance not technical. I would not be at all surprised if proposed governance-related projects were much more heavily weighted towards the second category, and am only saying that technical safety research is mostly first-category.)
(Fair.)
The “cluelessness” intuition gets its force from having a strong and compelling upside story weighed against a strong and compelling downside story, I think.
This is actually what I think is happening, though (unlike the firefighter example), but we aren’t really talking much about the specifics. There might indeed be specific cases where, if we worked through them, I’d agree that we shouldn’t be clueless. But I think there are important potential tradeoffs between incidental and agential s-risks, between s-risks and other existential risks, even between the same kinds of s-risks, etc., and there is so much uncertainty in the expected harm from these risks that it’s inappropriate to use a single distribution without sensitivity analysis to “reasonable” distributions (and with that sensitivity analysis, things look ambiguous), similar to this example. We’re talking about “sweetening” one side or the other, but that’s totally swamped by our uncertainty.
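To make the kind of ambiguity I mean concrete, here’s a rough sketch of that sensitivity analysis. All the numbers (the ranges of harm averted/added and the size of the “sweetener”) are made up purely for illustration, not estimates of any real risks:

```python
# Rough sketch: check whether the sign of an intervention's net expected effect
# is stable across a set of "reasonable" parameter combinations, rather than
# trusting a single distribution. All numbers are illustrative assumptions.

import itertools

# Plausible values for expected harm averted vs. added (arbitrary units),
# reflecting that different reasonable priors disagree by orders of magnitude.
harm_averted_incidental = [0.1, 1.0, 10.0]   # upside of the intervention
harm_added_agential     = [0.05, 1.0, 20.0]  # downside of the intervention
sweetener = 0.2  # a small, relatively certain side-benefit ("sweetening" one side)

signs = set()
for up, down in itertools.product(harm_averted_incidental, harm_added_agential):
    net = up - down + sweetener
    signs.add("positive" if net > 0 else "negative")

print(signs)
# Contains both 'positive' and 'negative': the sign of the net expected effect
# depends on which reasonable distribution you pick, and the small sweetener
# is swamped by the disagreement between priors.
```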
If the first-order effect of a project is “directly mitigating an important known s-risk”, and the second-order effects of the same project are “I dunno, it’s a complicated world, anything could happen”, then I say we should absolutely do that project.
What I have in mind is more symmetric in upsides and downsides (or at least, I’m interested in hearing why people think it isn’t in practice), and I don’t really distinguish between effects by order*. My post points out potential reasons that I actually think could dominate. The standard I’m aiming for is “Could a reasonable person disagree?”, and when I point out such tradeoffs, I default to believing a reasonable person could disagree until we actually work through them carefully and it turns out it’s pretty unreasonable to disagree.
*Although thinking more about it now, I suppose longer causal chains are more fragile and more likely to have unaccounted-for effects going in the opposite direction, so maybe we ought to give them less weight, and maybe this solves the issue if we do this formally? I think ignoring higher-order effects outright is formally irrational under vNM rationality or stochastic dominance, although it’s maybe fine in practice if what we’re actually doing is just an approximation of giving them far less weight via a skeptical prior, and then they end up completely dominated by more direct effects.
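Here’s a minimal sketch of what that down-weighting might look like, assuming a simple geometric “skeptical” discount by causal-chain length; the decay rate and effect sizes are just illustrative assumptions:

```python
# Sketch of the footnote's idea: weight each effect's expected value by a
# skeptical factor that shrinks with the length of the causal chain.
# The decay rate and effect sizes below are made-up illustrative numbers.

def weighted_value(effects, decay=0.2):
    """effects: list of (order, expected_value) pairs, where order 1 = direct effect.
    Each effect is down-weighted by decay**(order - 1)."""
    return sum(value * decay ** (order - 1) for order, value in effects)

# A clear first-order benefit plus a larger-magnitude but third-order worry:
effects = [(1, +1.0), (3, -5.0)]
print(weighted_value(effects))  # ~0.8: the direct effect dominates despite the
                                # larger nominal size of the higher-order effect.
```

Note this is only an approximation: formally, the higher-order effects aren’t ignored, they’re just given very little weight, which is why it can be compatible with vNM rationality while still letting direct effects dominate in practice.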
I don’t really distinguish between effects by order*
I agree that direct and indirect effects of an action are fundamentally equally important (in this kind of outcome-focused context) and I hadn’t intended to imply otherwise.