So, the tradeoff is something like 55% death spread over a five-year period vs no death for five years, for an eventual reward of reducing total chance of death (over a 10y period or whatever) from 89% to 82%.
Oh we disagree much more straightforwardly. I think the 89% should be going up, not down. That seems by far the most important disagreement.
(I thought you were saying that person-affecting views means that even if the 89% goes up that could still be a good trade.)
I still don’t know why you expect the 89% to go down instead of up given public advocacy. (And in particular I don’t see why optimism vs pessimism has anything to do with it.) My claim is that it should go up.
I was thinking of something like the scenario you describe as “Variant 2: In addition to this widespread pause, there is a tightly controlled and monitored government project aiming to build safe AGI.” It doesn’t necessarily have to be government-led, but maybe the government has talked to evals experts and demands a tight structure where large expenditures of compute always have to be approved by a specific body of safety evals experts.
But why do evals matter? What’s an example story where the evals prevent Molochian forces from leading to us not being in control? I’m just not seeing how this scenario intervenes on your threat model to make it not happen.
(It does introduce government bureaucracy, which all else equal reduces the number of actors, but there’s no reason to focus on safety evals if the theory of change is “introduce lots of bureaucracy to reduce number of actors”.)
Maybe I’m wrong: If the people who are closest to DC are optimistic that lawmakers would be willing to take ambitious measures soon enough […]
This seems like the wrong criterion. The question is whether this strategy is more likely to succeed than others. Your timelines are short enough that no ambitious measure is going to come into place fast enough if you aim to save ~all worlds.
But getting ambitious measures in place in ~5 years seems very doable (which seems to be around your median, so still in time for half of worlds). We’re already seeing signs of life:
note the existence of the UK Frontier AI Taskforce and the people on it, as well as the intent bill SB 294 in California about “responsible scaling”
You could also ask people in DC; my prediction is they’d say something reasonably similar.
Oh we disagree much more straightforwardly. I think the 89% should be going up, not down. That seems by far the most important disagreement.
I think we agree (at least as far as my example model with the numbers was concerned). The way I meant it, 82% goes up to 89%.
(My numbers were confusing because I initially said 89% was my all-things-considered probability, but in the example model I was using 89% as the probability for a scenario where we take an action that is, on your view, suboptimal. In the example model, 82% is the best chance we can get with the optimal course of action, but it comes at the price of a much higher risk of death in the first five years.)
In any case, my assumptions for this example model were:
(1) Public advocacy is the only way to put an ambitious pause in place soon enough to reduce the risks that materialize within the first 5 years.
(2) If it succeeds at the above, public advocacy will likely also come with negative side effects that increase the risks later on.
And I mainly wanted to point out how, from a person-affecting perspective, the difference between 82% and 89% isn’t necessarily huge, whereas getting 5 years of zero risk vs. 5 years of 55% cumulative risk feels like something that could matter a lot.
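To make that concrete, here is a minimal sketch of the comparison, using only the numbers already in the example model above (nothing else is load-bearing):

```python
# Illustrative comparison of the two courses of action in the example model.
# The 55%, 82%, and 89% figures are the ones from the discussion above.

def survival(risk_first_5y, total_risk_10y):
    """Return (P(survive the first 5 years), P(survive the full 10 years))."""
    return 1 - risk_first_5y, 1 - total_risk_10y

# No ambitious pause: 55% cumulative risk in the first 5 years, 82% over 10 years.
no_pause = survival(risk_first_5y=0.55, total_risk_10y=0.82)

# Pause via public advocacy: ~0% risk during the 5-year pause, 89% over 10 years.
pause = survival(risk_first_5y=0.0, total_risk_10y=0.89)

print(no_pause)  # ~(0.45, 0.18): under half survive the next 5 years, 18% survive 10 years
print(pause)     # ~(1.00, 0.11): everyone survives the next 5 years, 11% survive 10 years
```

That is the sense in which the first five years can dominate on a person-affecting view, even though ten-year survival ends up lower.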
But one can also discuss the validity of (1) and (2). It sounds like you don’t buy (1) at all. By contrast, I think (1) is plausible, but I’m not confident in my stance here, and you raise good points.
Regarding (2), I probably agree that if you achieve an ambitious pause with public advocacy and public pressure playing a large role, this makes some things harder later on.
But why do evals matter? What’s an example story where the evals prevent Molochian forces from leading to us not being in control? I’m just not seeing how this scenario intervenes on your threat model to make it not happen.
Evals prevent the accidental creation of misaligned transformative AI by the project that’s authorized to go beyond the compute cap for safety research (if necessary; obviously they don’t have to go above the cap if the returns from alignment research are high enough at lower levels of compute).
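To illustrate the kind of gating I have in mind, here is a purely hypothetical sketch; the threshold, the eval names, and the sign-off step are all made up for illustration, not a claim about how a real scheme would be specified:

```python
# Hypothetical sketch of the gating described above: compute expenditures beyond
# the cap are only allowed if dangerous-capability evals come back clean AND the
# designated body of safety-evals experts signs off. All names/values are made up.

COMPUTE_CAP_FLOP = 1e25  # hypothetical cap; runs at or below it need no exception

def may_proceed(planned_compute_flop, eval_flags, body_signs_off):
    """eval_flags maps eval name -> True if that eval flagged a dangerous capability.
    body_signs_off is whether the safety-evals body approved this specific run."""
    if planned_compute_flop <= COMPUTE_CAP_FLOP:
        return True             # within the cap: no special approval needed
    if any(eval_flags.values()):
        return False            # an eval flagged a problem: no exception to the cap
    return body_signs_off       # above the cap: requires explicit sign-off

# Example: an above-cap safety-research run with clean evals and sign-off goes ahead.
print(may_proceed(5e25, {"autonomous-replication": False, "deception": False}, True))  # True
```

The point is just that the eval gate sits in front of the one actor allowed to exceed the cap, which is where the accidental-creation risk would come from.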
Molochian forces are one part of my threat model, but I also think alignment difficulty is high and hard takeoff is more likely than soft takeoff. (Not all of these components of my worldview are entirely independent. You could argue that being unusually concerned about Molochian forces and expecting high alignment difficulty are both produced by the same underlying sentiment. Arguably, most humans aren’t really aligned with human values, for Hansonian reasons, which we can think of as a subtype of Moloch problems. Likewise, if it’s already hard to align humans to human values, it’ll probably also be hard to align AIs to those values [or at least to create an AI that is high-integrity friendly towards humans, while perhaps pursuing some of its own aims as well – I think that would be enough to generate a good outcome, so we don’t necessarily have to create AIs that care about nothing else besides human values].)
Oops, sorry for the misunderstanding.

Taking your numbers at face value, and assuming that people have on average 40 years of life ahead of them (Google suggests median age is 30 and typical lifespan is 70-80), the pause gives an expected extra 2.75 years of life during the pause (delaying a 55% chance of doom by 5 years) while removing an expected extra 2.1 years of life later on (7% of the roughly 30 years that would remain by then). This looks like a win on current-people-only views, but it does seem sensitive to the numbers.
I’m not super sold on the numbers. Removing the full 55% is effectively assuming that the pause definitely happens and is effective—it neglects the possibility that advocacy succeeds enough to have the negative effects, but still fails to lead to a meaningful pause. I’m not sure how much probability I assign to that scenario but it’s not negligible, and it might be more than I assign to “advocacy succeeds and effective pause happens”.
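Here is a rough sketch of that back-of-envelope calculation, with the probability of actually getting an effective pause left as a free parameter; everything else is just the numbers above, and the break-even point is what falls out of them:

```python
# Rough sketch of the expected-life-years comparison above (per currently existing person).
# Numbers from the discussion: 55% doom delayed by 5 years if the pause happens; an extra
# 7% total doom later on as a side effect of advocacy; ~30 years of life remaining by the
# time that extra risk would materialize.

def net_life_years(p_pause):
    """Net expected life-years from advocacy, conditional on advocacy succeeding enough
    to have its negative side effects; p_pause is P(an effective pause actually happens)."""
    gain = p_pause * 0.55 * 5   # up to 2.75 years from delaying 55% doom by 5 years
    loss = 0.07 * 30            # 2.1 years from the extra 7% doom later on
    return gain - loss

print(net_life_years(1.0))   # +0.65: the original calculation (pause definitely happens)
print(net_life_years(0.76))  # ~0.0: roughly the break-even probability
print(net_life_years(0.5))   # -0.725: side effects land, but a meaningful pause only half the time
```

So the sign of the trade flips once the chance of actually getting an effective pause, given that the side effects occur, drops much below ~75%.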
It sounds like you don’t buy (1) at all.
I’d say it’s more like “I don’t see why we should believe (1) currently”. It could still be true. Maybe all the other methods really can’t work for some reason I’m not seeing, and that reason is overcome by public advocacy.