Thanks titotal for taking the time to dig deep into our model and write up your thoughts, it’s much appreciated. This comment speaks for Daniel Kokotajlo and me, not necessarily any of the other authors on the timelines forecast or AI 2027. It addresses most but not all of titotal’s post.
Overall view: titotal pointed out a few mistakes and communication issues which we will mostly fix. We are therefore going to give titotal a $500 bounty to represent our appreciation. However, we continue to disagree on the core points regarding whether the model’s takeaways are valid and whether it was reasonable to publish a model with this level of polish. We think titotal’s critiques aren’t strong enough to overturn the core conclusion that superhuman coders by 2027 are a serious possibility, nor to significantly move our overall median (edit: I now think it’s plausible that changes made as a result of titotal’s critique will move our median significantly). Moreover, we continue to think that AI 2027’s timelines forecast is (unfortunately) the world’s state-of-the-art, and challenge others to do better. If instead of surpassing us, people simply want to offer us critiques, that’s helpful too; we hope to surpass ourselves every year in part by incorporating and responding to such critiques.
Clarification regarding the updated model
My apologies for quietly updating the timelines forecast without announcing it; we are aiming to announce the update soon. I’m glad that titotal was able to see it.
A few clarifications:
titotal says “it predicts years longer timescales than the AI2027 short story anyway.” While the medians are indeed 2029 and 2030, the models still give ~25-40% probability to superhuman coders by the end of 2027.
Other team members (e.g. Daniel K) haven’t reviewed the updated model in depth, and have not integrated it into their overall views. Daniel is planning to do this soon, and will publish a blog post about it when he does.
Most important disagreements
I’ll let titotal correct us if we misrepresent them on any of this.
Whether to estimate and model dynamics for which we don’t have empirical data. E.g., titotal says there is “very little empirical validation of the model,” and especially criticizes the modeling of superexponentiality as having no empirical backing. We agree that it would be great to have more empirical validation of more of the model components, but unfortunately that’s not feasible at the moment while incorporating all of the highly relevant factors.[1]
Whether to adjust our estimates based on factors outside the data. For example, titotal criticizes us for making judgmental forecasts for the date of RE-Bench saturation, rather than plugging in the logistic fit. I’m strongly in favor of allowing intuitive adjustments on top of quantitative modeling when estimating parameters.
[Unsure about level of disagreement] The value of a “least bad” timelines model. While the model is certainly imperfect due to limited time and the inherent difficulties around forecasting AGI timelines, we still think overall it’s the “least bad” timelines model out there and it’s the model that features most prominently in my overall timelines views. I think titotal disagrees, though I’m not sure which one they consider least bad (perhaps METR’s simpler one in their time horizon paper?). But even if titotal agreed that ours was “least bad,” my sense is that they might still be much more negative on it than us. Some reasons I’m excited about publishing a least bad model:
Reasoning transparency. We wanted to justify the timelines in AI 2027, given limited time. We think it’s valuable to be transparent about where our estimates come from even if the modeling is flawed in significant ways. Additionally, it allows others like titotal to critique it.
Advancing the state of the art. Even if a model is flawed, it seems best to publish to inform others’ opinions and to allow others to build on top of it.
The likelihood of time horizon growth being superexponential, before accounting for AI R&D automation. See this section for our arguments in favor of superexponentiality being plausible, and titotal’s responses (I put it at 45% in our original model). This comment thread has further discussion. If you are very confident in no inherent superexponentiality, superhuman coders by end of 2027 become significantly less likely, though they are still >10% likely if you agree with the rest of our modeling choices (see here for a side-by-side graph generated from my latest model).
How strongly superexponential the progress would be. This section argues that our choice of superexponential function is arbitrary. While we agree that the choice is fairly arbitrary and ideally we would have uncertainty over the best function, my intuition is that titotal’s proposed alternative curve feels less plausible than the one we use in the report, conditional on some level of superexponentiality.
Whether the argument for superexponentiality is stronger at higher time horizons. titotal is confused about why the superexponential would sometimes be delayed rather than starting at the simulation’s starting point. The reasoning here is that the conceptual argument for superexponentiality is much stronger at higher time horizons (e.g. going from a 100-year to a 1,000-year horizon seems likely to be much easier than going from 1 to 10 days, while the comparison of 1 to 10 weeks vs. 1 to 10 days is less clear). It’s unclear that the delayed superexponential is exactly the right way to model that, but it’s what I came up with for now.
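As a concrete illustration of what a “delayed superexponential” can look like, here is a minimal sketch in Python. It is not the actual AI 2027 timelines code; the parameter names and values (starting horizon, threshold, decay factor) are made-up assumptions chosen only to show the shape of the dynamic.

```python
# Minimal sketch of a "delayed superexponential" time-horizon trajectory.
# Not the actual AI 2027 code; all parameter values are made up for illustration.

def days_until_target_horizon(
    start_horizon_minutes: float = 30.0,             # current time horizon (assumed)
    base_doubling_time_days: float = 120.0,          # doubling time in the exponential regime (assumed)
    superexp_threshold_minutes: float = 8 * 60.0,    # horizon where superexponentiality kicks in (assumed)
    doubling_time_decay: float = 0.9,                # each later doubling is 10% faster (assumed)
    target_horizon_minutes: float = 52 * 40 * 60.0,  # ~1 work-year horizon (52 weeks * 40 hours)
) -> float:
    """Return days until the horizon reaches the target under these assumptions."""
    horizon = start_horizon_minutes
    doubling_time = base_doubling_time_days
    elapsed_days = 0.0
    while horizon < target_horizon_minutes:
        elapsed_days += doubling_time
        horizon *= 2.0
        # The "delayed" part: doubling times only start shrinking once the horizon
        # passes the threshold where the conceptual argument is claimed to be strong.
        if horizon >= superexp_threshold_minutes:
            doubling_time *= doubling_time_decay
    return elapsed_days


if __name__ == "__main__":
    print(f"Days to reach a ~1 work-year horizon: {days_until_target_horizon():.0f}")
```

Setting doubling_time_decay to 1.0 recovers the plain exponential case, which makes the two regimes easy to compare side by side.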
Other disagreements
Intermediate speedups: Unfortunately we haven’t had the chance to dig deeply into this section of titotal’s critique, and it’s mostly based on the original version of the model rather than the updated one so we probably will not get to this. The speedup from including AI R&D automation seems pretty reasonable intuitively at the moment (you can see a side-by-side here).
RE-Bench logistic fit (section): We think it’s reasonable to set the ceiling of the logistic at wherever we think the maximum achievable performance would be. We don’t think it makes any sense to give weight to a fit that achieves a maximum of 0.5 when we know reference solutions achieve 1.0 and we also have reason to believe it’s possible to get substantially higher. We agree that we are making a guess (or with more positive connotation, “estimate”) about the maximum score, but it seems better than the alternative of doing no fit.
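To make the ceiling point concrete, here is a toy sketch of the difference between letting a logistic fit estimate its own ceiling and fixing the ceiling at an estimate of the maximum achievable score. The dates, scores, and the 1.5 ceiling below are made-up placeholders, not real RE-Bench data or the report’s actual fitting code.

```python
# Toy comparison of a free-ceiling vs. fixed-ceiling logistic fit.
# The data and the assumed maximum score are made up; this is not RE-Bench data.
import numpy as np
from scipy.optimize import curve_fit

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])             # years (fake)
score = np.array([0.05, 0.08, 0.13, 0.20, 0.28, 0.35, 0.42])  # benchmark scores (fake)

def logistic(t, ceiling, midpoint, rate):
    return ceiling / (1.0 + np.exp(-rate * (t - midpoint)))

# Free ceiling: estimated from early data alone, so it can land far below
# what reference solutions demonstrate is achievable.
popt_free, _ = curve_fit(logistic, t, score, p0=[0.5, 2.0, 1.0])
free_ceiling = popt_free[0]

# Fixed ceiling: pinned at an estimate of maximum achievable performance.
# 1.5 is a made-up stand-in for "reference solutions score 1.0 and
# substantially higher seems possible".
ASSUMED_MAX_SCORE = 1.5

def logistic_fixed(t, midpoint, rate):
    return logistic(t, ASSUMED_MAX_SCORE, midpoint, rate)

popt_fixed, _ = curve_fit(logistic_fixed, t, score, p0=[2.0, 1.0])

print(f"Free-ceiling fit thinks the max score is ~{free_ceiling:.2f}")
print(f"Fixed-ceiling fit: midpoint {popt_fixed[0]:.2f} yr, rate {popt_fixed[1]:.2f}/yr")
```

The resulting saturation forecasts differ mainly because the two fits disagree about how much headroom remains above the latest observed score.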
Mistakes that titotal pointed out
We agree that the graph we’ve tweeted is not closely representative of the typical trajectory of our timelines model conditional on superhuman coders in March 2027. Sorry about that, we should have prioritized making it more precisely faithful to the model. We will fix this in future communications.
They convinced us to remove the public vs. internal argument as a consideration in favor of superexponentiality (section).
We like the analysis done regarding the inconsistency of the RE-Bench saturation forecasts with an interpolation of the time horizons progression. We agree that it’s plausible that we should just not have RE-Bench in the benchmarks and gaps model; this is partially an artifact of a version of the model that existed before the METR time horizons paper.
In accordance with our bounties program, we will award $500 to titotal for pointing these out.
Communication issues
There were several issues with communication that titotal pointed out which we agree should be clarified, and we will do so. These issues arose from lack of polish rather than malice. Two of the most important ones:
The “exponential” time horizon case still has superexponential growth once you account for automation of AI R&D.
The forecasts for RE-Bench saturation were adjusted based on other factors on top of the logistic fit.
Relatedly, titotal thinks that we made our model too complicated, while I think it’s important to make our best guess for how each relevant factor affects our forecast.
While the model is certainly imperfect due to limited time and the inherent difficulties around forecasting AGI timelines, we still think overall it’s the “least bad” timelines model out there and it’s the model that features most prominently in my overall timelines views. I think titotal disagrees, though I’m not sure which one they consider least bad
I also would be interested in learning what the “least bad” model is. Titotal says:
In my world, you generally want models to have strong conceptual justifications or empirical validation with existing data before you go making decisions based off their predictions
Are there alternative models that they believe have “strong conceptual justifications or empirical validation”? If not, then I feel confused about how they recommend people make decisions.
To make outcome-based decisions, you have to decide on the period over which you’re considering them. Considering any given period costs non-zero resources (reductio ad absurdum: considering all possible future timelines would cost infinite resources, so we presumably agree on the principle that excluding some from consideration is not only reasonable but necessary).
I think it’s a reasonable position to believe that if something can’t be empirically validated then it at least needs exceptionally strong conceptual justifications to inform such decisions.
This cuts both ways, so if the argument of AI2027 is ‘we shouldn’t dismiss this outcome out of hand’ then it’s a reasonable position (although I find Titotal’s longer backcasting an interesting counterweight, and it prompted me to wonder about a good way to backcast still further). If the argument is that AI safety researchers should meaningfully update towards shorter timelines based on the original essay, or that we should move a high proportion of the global or altruistic economy towards event planning for AGI in 2027 (which seems to be what the authors are de facto pushing for), that seems much less defensible.
And I worry that they’ll be fodder for views like Aschenbrenner’s, and used to justify further undermining US-China relations and increasing the risk of great power conflict or nuclear war, both of which seem to me like more probable events in the next decade than AGI takeover.
if the argument of AI2027 is ‘we shouldn’t dismiss this outcome out of hand’ then it’s a reasonable position
Yep, that is how Titotal summarizes the argument:
The scenario in the short story is not the median forecast for any AI futures author, and none of the AI2027 authors actually believe that 2027 is the median year for a singularity to happen. But the argument they make is that 2027 is a plausible year
And if titotal had ended their post with something like “… and so I think 2027 is a bit less plausible than the authors do” I would have no confusion. But they ended with:
What I’m most against is people taking shoddy toy models seriously and basing life decisions on them, as I have seen happen for AI2027
And I therefore am left wondering what less shoddy toy models I should be basing my life decisions on.[1]
I think their answer is partly “naively extrapolating the METR time horizon numbers forward is better than AI 2027”? But I don’t want to put words in their mouth and also I interpret them to have much longer timelines than this naive extrapolation would imply.
I think less selective quotation makes the line of argument clear.
Continuing the first quote:
The scenario in the short story is not the median forecast for any AI futures author, and none of the AI2027 authors actually believe that 2027 is the median year for a singularity to happen. But the argument they make is that 2027 is a plausible year, and they back it up with images of sophisticated looking modelling like the following:
[img]
This combination of compelling short story and seemingly-rigorous research may have been the secret sauce that let the article go viral and be treated as a serious project:
[quote]
Now, I was originally happy to dismiss this work and just wait for their predictions to fail, but this thing just keeps spreading, including a youtube video with millions of views. So I decided to actually dig into the model and the code, and try to understand what the authors were saying and what evidence they were using to back it up.
The article is huge, so I focussed on one section alone: their “timelines forecast” code and accompanying methodology section. Not to mince words, I think it’s pretty bad. It’s not just that I disagree with their parameter estimates, it’s that I think the fundamental structure of their model is highly questionable and at times barely justified, there is very little empirical validation of the model, and there are parts of the code that the write-up of the model straight up misrepresents.
So the summary of this would not be “… and so I think AI 2027 is a bit less plausible than the authors do”, but something like: “I think the work motivating AI 2027 being a credible scenario is, in fact, not good, and should not persuade those who did not believe this already. It is regrettable this work is being publicised (and perhaps presented) as much stronger than it really is.”
Continuing the second quote:
What I’m most against is people taking shoddy toy models seriously and basing life decisions on them, as I have seen happen for AI2027. This is just a model for a tiny slice of the possibility space for how AI will go, and in my opinion it is implemented poorly even if you agree with the author’s general worldview.
The right account for decision making under (severe) uncertainty is up for grabs, but in the ‘make a less shoddy toy model’ approach the quote would urge having a wide ensemble of different ones (including, say, those which are sub-exponential, ‘hit the wall’ or whatever else), and further urge we should put very little weight on the AI2027 model in whatever ensemble we will be using for important decisions.
Titotal actually ended their post with an alternative prescription:
I think people are going to deal with the fact that it’s really difficult to predict how a technology like AI is going to turn out. The massive blobs of uncertainty shown in AI 2027 are still severe underestimates of the uncertainty involved. If your plans for the future rely on prognostication, and this is the standard of work you are using, I think your plans are doomed. I would advise looking into plans that are robust to extreme uncertainty in how AI actually goes, and avoid actions that could blow up in your face if you turn out to be badly wrong.
I would advise looking into plans that are robust to extreme uncertainty in how AI actually goes, and avoid actions that could blow up in your face if you turn out to be badly wrong.
Seeing you highlight this now it occurs to me that I basically agree with this w.r.t. AI timelines (at least on one plausible interpretation, my guess is that titotal could have a different meaning in mind). I mostly don’t think people should take actions that blow up in their face if timelines are long (there are some exceptions, but overall I think long timelines are plausible and actions should be taken with that in mind).
A key thing that titotal doesn’t mention is how much probability mass they put on short timelines like, say, AGI by 2030. This seems very important for weighing various actions, even though we both agree that we should also be prepared for longer timelines.
In general, I feel like executing plans that are robust to extreme uncertainty is a prescription that is hard to follow without having at least a vague idea of the distribution of likelihood of various possibilities.
Thanks! This is helpful, although I would still be interested to hear if they believe there are models that “have strong conceptual justifications or empirical validation with existing data”.
I was going to reply with something longer here, but I think Gregory Lewis’ excellent comment highlights most of what I wanted to say, regarding how titotal does actually give an alternative suggestion in the piece.
So instead I’ll counter two claims I think you make (or imply) in your comments here:
1. A shoddy toy model is better than no model at all
I mean this seems clearly not true, if we take model to be referring to the sort of formalised, quantified exercise similar to AI 2027. Some examples here might be Samuelson’s infamous predictions of the Soviet Union inevitably overtaking the US in GNP.[1] This was a bad model of the world, and even if it was ‘better’ than the available alternatives or came from a more prestigious source, it was still bad and I think worse than no model (again, defined as a formal exercise à la AI2027).
A second example I can think of is the infamous Growth in a Time of Debt paper, which I remember being used to win arguments and justify austerity across Europe in the 2010s, being rendered much less convincing after an Excel error was corrected.[2]
TL;dr, as Thane said on LessWrong, we shouldn’t grade models on a curve
2. You need to base life decisions on a toy model
This also seems clearly false, unless we’re stretching “model” to mean simply “a reason/argument/justification” or defining “life decisions” narrowly as only those with enormous consequences instead of any ‘decision about my life’.
Even in the more serious cases, the role of models is to support arguments for or against some decision, or to frame some explanation about the world. Of course simplification and quantification can be useful and powerful, but they shouldn’t be the only game in town. Other schools of thought are available.[3]
The reproduction paper turned critique is here; it feels crazy that I can’t see the original data, but the ‘model’ here seemed to be just a spreadsheet of ~20 countries where the average only counted 15
Such as:
Decisionmaking under Deep Uncertainty
Do The Math, Then Burn The Math and Go With Your Gut
Make a decision based on the best explanation of the world
Go with common-sense heuristics since they likely encode knowledge gained from cultural evolution
This also seems clearly false, unless we’re stretching “model” to mean simply “a reason/argument/justification”
Yep, this is what I meant, sorry for the confusion. Or to phrase it another way: “I’m going off my intuition” is not a type of model which has privileged epistemic status; it’s one which can be compared with something like AI 2027 (and, like you say, may be found better).
Besides the point that “shoddy toy models” might be emotionally charged, I just want to point out that accelerating progress majorly increases variance and unknown unknowns. The higher energy a system is and the more variables you have, the more chaotic it becomes. So maybe an answer is that an agile short-range model is the best? Outside view it in moderation and plan with the next few years being quite difficult to predict?
You don’t really need another model to disprove an existing one, you might as well point out that we don’t know and that is okay too.
I’m strongly in favor of allowing intuitive adjustments on top of quantitative modeling when estimating parameters.
We had a brief thread on this over on LW, but I’m still keen to hear why you endorse using precise probability distributions to represent these intuitive adjustments/estimates. I take many of titotal’s critiques in this post to be symptoms of precise Bayesianism gone wrong (not to say titotal would agree with me on that).
ETA: Which, to be clear, is a question I have for EAs in general, not just you. :)
^ I’m also curious to hear from those who disagree-voted my comment why they disagree. This would be very helpful for my understanding of what people’s cruxes for (im)precision are.
I think philosophically, the right ultimate objective (if you were sufficiently enlightened etc) is something like actual EV maximization with precise Bayesianism (with the right decision theory and possibly with “true terminal preference” deontological constraints, rather than just instrumental deontological constraints). There isn’t any philosophical reason which absolutely forces you to do EV maximization, in the same way that nothing forces you not to have a terminal preference for flailing on the floor, but I think there are reasonably compelling arguments that something like EV maximization is basically right. The fact that something doesn’t necessarily get money pumped doesn’t mean it is a good decision procedure; it’s easy for something to avoid necessarily getting money pumped.
There is another question about whether it is a better strategy in practice to actually do precise Bayesianism given that you agree with the prior bullet (as in, you agree that terminally you should do EV maximization with precise Bayesianism). I think this is a messy empirical question, but in the typical case, I do think it’s useful to act on your best estimates (subject to instrumental deontological/integrity constraints, things like the unilateralist’s curse, and handling decision theory reasonably). My understanding is that your proposed policy would be something like ‘represent an interval of credences and only take “actions” if the action seems net good across your interval of credences’. I think that following this policy in general would lead to lower expected value, so I don’t do it. I do think that you should put weight on the unilateralist’s curse and robustness, but I think the weight varies by domain and can be derived by properly incorporating model uncertainty into your estimates and being aware of downside risk. E.g., for actions which have high downside risk if they go wrong relative to the upside benefit, you’ll end up being much less likely to take these actions due to various heuristics, incorporating model uncertainty, and deontology. (And I think these outperform intervals.)
A more basic point is that basically any interval which is supposed to include the plausible ranges of belief goes ~all the way from 0 to 1, which would naively be totally paralyzing such that you’d take no actions and do the default. (Starving to death? It’s unclear what the default should be which makes this heuristic more confusing to apply.) E.g., are chicken welfare interventions good? My understanding is that you work around this by saying “we ignore considerations which are further down the crazy train (e.g. simulations, long run future, etc) or otherwise seem more “speculative” until we’re able to take literally any actions at all and then proceed at that stop on the train”. This seems extremely ad hoc and I’m skeptical this is a good approach to decision making given that you accept the first bullet.
I’m worried that in practice you’re conflating these bullets. Your post on precise Bayesianism seems to focus substantially on empirical aspects of the current situation (potential arguments for (2)), but in practice, my understanding is that you actually think the imprecision is terminally correct but partially motivated by observations of our empirical reality. But I don’t think I care about motivating my terminal philosophy based on what we observe in this way!
(Edit: TBC, I get that you understand the distinction between these things, and your post discusses this distinction; I just think that you don’t really make arguments against (1) except implying that other things are possible.)
My understanding is that your proposed policy would be something like ‘represent an interval of credences and only take “actions” if the action seems net good across your interval of credences’. … you’d take no actions and do the default. (Starving to death? It’s unclear what the default should be which makes this heuristic more confusing to apply.)
Definitely not saying this! I don’t think that (w.r.t. consequentialism at least) there’s any privileged distinction between “actions” and “inaction”, nor do I think I’ve ever implied this. My claim is: For any A and B, if it’s not the case that EV_p(A) > EV_p(B) for all p in the representor P,[1] and vice versa, then both A and B are permissible. This means that you have no reason to choose A over B or vice versa (again, w.r.t. consequentialism). Inaction isn’t privileged, but neither is any particular action.
Now of course one needs to pick some act (“action” or otherwise) all things considered, but I explain my position on that here.
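For readers following along, here is a toy sketch of one way to operationalize the permissibility rule above: an option is ruled out only if some other option has strictly higher EV under every credence function in the representor. The options and numbers are entirely hypothetical.

```python
# Toy illustration of permissibility under a representor of credence functions.
# Rows are hypothetical options; columns are EVs under each member p of the
# representor P. All numbers are made up.
expected_values = {
    "fund_intervention_A": [10.0, -2.0, 4.0],
    "fund_intervention_B": [6.0, 5.0, 3.0],
    "do_nothing":          [0.0, 0.0, 0.0],
}

def permissible_options(evs: dict[str, list[float]]) -> list[str]:
    """Return options that no other option strictly beats under every p in P."""
    permissible = []
    for option, option_evs in evs.items():
        beaten_everywhere = any(
            all(o > s for o, s in zip(other_evs, option_evs))
            for other, other_evs in evs.items()
            if other != option
        )
        if not beaten_everywhere:
            permissible.append(option)
    return permissible

print(permissible_options(expected_values))
# ['fund_intervention_A', 'fund_intervention_B']: neither dominates the other
# across all of P, so both are permissible; 'do_nothing' is beaten by B under
# every p, so it is ruled out (inaction gets no special status).
```

(Per the footnote, the strict inequality could be weakened to weak inequality everywhere plus strict inequality somewhere; the sketch uses the simpler strict version.)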
properly incorporating model uncertainty into your estimates
What do you mean by “properly incorporating”? I think any answer here that doesn’t admit indeterminacy/imprecision will be arbitrary, as argued in my unawareness sequence.
basically any interval which is supposed to include the plausible ranges of belief goes ~all the way from 0 to 1
Why do you think this? I argue here and here (see Q4 and links therein) why that need not be the case, especially when we’re forming beliefs relevant to local-scale goals.
My understanding is that you work around this by saying “we ignore considerations which are further down the crazy train (e.g. simulations, long run future, etc) or otherwise seem more “speculative” until we’re able to take literally any actions at all and then proceed at that stop on the train”.
Also definitely not saying this. (I explicitly push back on such ad hoc ignoring of crazy-train considerations here.) My position is: (1) W.r.t. impartial consequentialism we can’t ignore any considerations. (2) But insofar as we’re making decisions based on ~immediate self-interest, parochial concern for others near to us, and non-consequentialist reasons, crazy-train considerations aren’t normatively relevant — so it’s not ad hoc to ignore them in that case. See also this great comment by Max Daniel. (Regardless, none of this is a positive argument for “make up precise credences about crazy-train considerations and act on them”.)
Technically this should be weakened to “weak inequality for all p + strict inequality for at least one p”.
(ETA: The parent comment contains several important misunderstandings of my views, so I figured I should clarify here. Hence my long comments — sorry about that.)
Thanks for this, Ryan! I’ll reply to your main points here, and clear up some less central yet important points in another comment.
Here’s what I think you’re saying (sorry the numbering clashes with the numbering in your comment, couldn’t figure out how to change this):
(1) The best representations of our actual degrees of belief given our evidence, intuitions, etc. — what you call the “terminally correct” credences — should be precise.[1]
(2) In practice, the strategy that maximizes EV w.r.t. our terminally correct credences won’t be “make decisions by actually writing down a precise distribution and trying to maximize EV w.r.t. that distribution”. This is because there are empirical features of our situation that hinder us from executing that strategy ideally.
(3) I (Anthony) am mistakenly inferring from (2) that (1) is false.
(3.a) In particular, any argument against (1) that relies on premises about the “empirical aspects of the current situation” must be making that mistake.
Is that right? If so:
I do disagree with (1), but for reasons that have nothing to do with (2). My case for imprecise credences is: “In our empirical situation, any particular precise credence [or expected value] we might pick would be highly arbitrary” (argued for in detail here). (So I’m also not just saying “you can have imprecise credences without getting money pumped”.)
I’m not saying that “heuristics” based on imprecise credences “outperform” explicit EV max. I don’t think that principles for belief formation can bottom out in “performance” but should instead bottom out in non-pragmatic principles — one of which is (roughly) “if our available information is so ambiguous that picking one precise credence over another seems arbitrary, our credences should be imprecise”.
However, when we use non-pragmatic principles to derive our beliefs, the appropriate beliefs (not the principles themselves) can and should depend on empirical features of our situation that directly bear on our epistemic state: E.g., we face lots of considerations about the plausibility of a given hypothesis, and we seem to have too little evidence (+ too weak constraints from e.g. indifference principles or Occam’s razor) to justify any particular precise weighing of these considerations.[2] Contra (3.a), I don’t see how/why the structure of our credences could/should be independent of very relevant empirical information like this.
Intuition pump: Even an “ideal” precise Bayesian doesn’t actually terminally care about EV, they terminally care about the ex post value. But their empirical situation makes them uncertain what the ex post value of their action will be, so they represent their epistemic state with precise credences, and derive their preferences over actions from EV. This doesn’t imply they’re conflating terminal goals with empirical facts about how best to achieve them.
Separately, I haven’t yet seen convincing positive cases for (1). What are the “reasonably compelling arguments” for precise credences + EV maximization? And (if applicable to you) what are your replies to my counterarguments to the usual arguments here[3] (also here and here, though in fairness to you, those were buried in a comment thread)?
So in particular, I think you’re not saying the terminally correct credences for us are the credences that our computationally unbounded counterparts would have. If you are saying that, please let me know and I can reply to that — FWIW, as argued here, it’s not clear a computationally unbounded agent would be justified in precise credences either.
This is true of pretty much any hypothesis we consider, not just hypotheses about especially distant stuff. This ~adds up to normality / doesn’t collapse into radical skepticism, because we have reasons to have varying degrees of imprecision in our credences, and our credences about mundane stuff will only have a small degree of imprecision (more here and here).
Quote: “[L]et’s revisit why we care about EV in the first place. A common answer: “Coherence theorems! If you can’t be modeled as maximizing EU, you’re shooting yourself in the foot.” For our purposes, the biggest problem with this answer is: Suppose we act as if we maximize the expectation of some utility function. This doesn’t imply we make our decisions by following the procedure “use our impartial altruistic value function to (somehow) assign a number to each hypothesis, and maximize the expectation”.” (In that context, I was talking about assigning precise values to coarse-grained hypotheses, but the same applies to assigning precise credences to any hypothesis.)
Side note: I appreciate that you actually sought out critiques with your bounty offer and took the time to respond and elaborate on your thinking here, thanks!