This other Ryan Greenblatt is my old account[1]. Here is my LW account.
[1] Account lost to the mists of time and expired university email addresses.
You might be interested in discussion here.
We know now that a) your results aren’t technically SOTA
I think my results are probably SOTA based on more recent updates.
It’s not an LLM solution, it’s an LLM + your scaffolding + program search, and I think that’s importantly not the same thing.
I feel like this is a pretty strange way to draw the line about what counts as an “LLM solution”.
Consider the following simplified dialogue as an example of why I don’t think this is a natural place to draw the line:
Human skeptic: Humans don’t exhibit real intelligence. You see, they’ll never do something as impressive as sending a human to the moon.
Humans-have-some-intelligence advocate: Didn’t humans go to the moon in 1969?
Human skeptic: That wasn’t humans sending someone to the moon; that was Humans + Culture + Organizations + Science sending someone to the moon! You see, humans don’t exhibit real intelligence!
Humans-have-some-intelligence advocate: … Ok, but do you agree that if we removed the Humans from the overall approach, it wouldn’t work?
Human skeptic: Yes, but same with the culture and organization!
Humans-have-some-intelligence advocate: Sure, I guess. I’m happy to just call it humans+etc. Do you have any predictions for specific technical feats which are possible to do with a reasonable amount of intelligence that you’re confident can’t be accomplished by building some relatively straightforward organization on top of a bunch of smart humans within the next 15 years?
Human skeptic: No.
Of course, I think actual LLM skeptics often don’t answer “No” to the last question. They often do have something that they think is unlikely to occur with a relatively straightforward scaffold on top of an LLM (a model descended from the current LLM paradigm, perhaps trained with semi-supervised learning and RLHF).
I actually don’t know what in particular Chollet thinks is unlikely here. E.g., I don’t know if he has strong views about the performance of my method when run with whatever the SOTA multimodal model is in 2 years.
Tom Davidson’s model is often referred to in the Community, but it is entirely reliant on the current paradigm + scale reaching AGI.
This seems wrong.
It does use constants from the history of the deep learning field to provide guesses for parameters, and it assumes that compute is an important driver of AI progress.
These are much weaker assumptions than you seem to be implying.
Note also that this work is based on earlier work like bio anchors which was done just as the current paradigm and scaling were being established. (It was published in the same year as Kaplan et al.)
But it won’t do anything until you ask it to generate a token. At least, that’s my intuition.
I think this seems like mostly a fallacy. (I feel like there should be a post explaining this somewhere.)
Here is an alternative version of what you said to indicate why I don’t think this is a very interesting claim:
Sure, you can have a very smart quadriplegic who is very knowledgeable. But they won’t do anything until you let them control some actuator.
If your view is that “prediction won’t result in intelligence”, fair enough, though it’s notable that the human brain seems to heavily utilize prediction objectives.
I can buy that GPT4o would be best, but perhaps other LLMs might reach ‘ok’ scores on ARC-AGI if directly swapped out? I’m not sure what you’re referring to by ‘careful optimization’ here though.
I think much worse LLMs like GPT-2 or GPT-3 would virtually eliminate performance.
This is very clear, as these LLMs basically can’t code at all.
If you instead consider LLMs which are only somewhat less powerful like llama-3-70b (which is perhaps 10x less effective compute?), the reduction in perf will be smaller.
It is also highly variable to what we mean by AGI though.
I’m happy to do timelines to the singularity and operationalize this with “we have the technological capacity to pretty easily build projects as impressive as a Dyson sphere”.
(Or 1000x electricity production, or whatever.)
In my view, this likely adds only a moderate number of years (3-20, depending on how various details go).
I think there are signal vs. noise tradeoffs, so I’m naively tempted to retreat toward more exclusivity.
This has costs of its own, so maybe I’d be in favor of differentiation (some more exclusive and some less exclusive versions).
Low confidence in this being good overall.
I’m not really referring to hardware here, in pre-training and RLHF the model weights are being changed and updated
Sure, I was just using this as an example. I should have made this more clear.
Here is a version of the exact same paragraph you wrote, but for activations and in-context learning:
in pre-training and RLHF the model activations are being changed and updated by each layer, and that’s where the ‘in-context learning’ (if we want to call it that) comes in—the activations are being updated/optimized to better predict the next token and understand the text. The layers learned to in-context learn (update the activations) across a wide variety of data in pretraining.
(We can show transformers learning to optimize in [very toy cases](https://www.lesswrong.com/posts/HHSuvG2hqAnGT5Wzp/no-convincing-evidence-for-gradient-descent-in-activation#Transformers_Learn_in_Context_by_Gradient_Descent__van_Oswald_et_al__2022_).)
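For intuition, here is a minimal numpy sketch in the spirit of those toy results (the setup, names, and constants are illustrative, not taken from the linked post): with completely frozen parameters, a linear-attention readout over in-context (x, y) pairs produces exactly the same prediction as one explicit gradient-descent step on those pairs, starting from a zero weight vector.

```python
# Toy illustration (assumed setup, not the linked construction verbatim):
# one gradient-descent step on in-context linear-regression data vs. a
# frozen-weight linear-attention readout over the same context.
import numpy as np

rng = np.random.default_rng(0)
n, d = 32, 8                 # number of in-context examples, input dimension
lr = 0.1                     # gradient-descent step size

X = rng.normal(size=(n, d))  # in-context inputs x_1..x_n
w_true = rng.normal(size=d)
y = X @ w_true               # in-context targets y_1..y_n
x_q = rng.normal(size=d)     # query input

# (a) One explicit gradient-descent step on L(w) = 1/(2n) * sum_i (w.x_i - y_i)^2,
#     starting from w_0 = 0, then predict on the query.
w_1 = (lr / n) * X.T @ y
pred_gd = w_1 @ x_q

# (b) The same prediction as a frozen-weight linear-attention readout: the query
#     attends to each context token with score x_q.x_i and sums the values y_i.
#     No parameter changes; only the activations computed from the context carry
#     the "learning".
scores = X @ x_q             # unnormalized attention scores
pred_attn = (lr / n) * scores @ y

assert np.allclose(pred_gd, pred_attn)
```

The point is just that the “update” lives entirely in activations computed from the context; no weight is touched.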
Fair enough if you want to say “the model isn’t learning, the activations are learning”, but then you should also say “short term (<1 minute) learning in humans isn’t the brain learning, it is the transient neural state learning”.
Perhaps in this case ARC-AGI is best used as a suite of benchmarks, where the same model and scaffolding should be used for each?
Yes, it seems reasonable to try out general purpose scaffolds (like what METR does) and include ARC-AGI in general purpose task benchmarks.
I expect substantial performance reductions from general purpose scaffolding, though some fraction of this will be due to not having prefix compute and to allocating test-time compute less effectively.
I still think the hard part is the scaffolding.
For this project? In general?
As far as this project goes, it seems extremely implausible to me that the hard part of the project is the scaffolding work I did. This probably holds for any reasonable scheme for dividing credit and determining what is difficult.
Sure, maybe in a few months we’ll see the top score on the ARC Challenge above 85%, but could such a model work in the real world?
It sounds like you agree with my claims that ARC-AGI isn’t that likely to track progress and that other benchmarks could work better?
(The rest of your response seemed to imply something different.)
Fifth and finally, I’m slightly disappointed at Buck and Dwarkesh for kinda posing this as a ‘mic drop’ against ARC.
I don’t think the objection is to ARC (the benchmark); I think the objection is to specific (very strong!) claims that Chollet makes.
I think the benchmark is a useful contribution as I note in another comment.
So, if I accept Ryan’s framing of the inconsistent triad, I’d reject the 3rd one, and say that “Current LLMs never “learn” at runtime (e.g. the in-context learning they can do isn’t real learning)”
You have to reject one of the three. So, if you reject the third (as I do), then you think LLMs do learn at runtime.
I’m quite confused, given the fact that all of the weights in the transformer are frozen after training and RLHF, why it’s called learning at all
In RLHF and training, no aspect of the GPU hardware is being updated at all; it’s all frozen. So why does that count as learning? I would say that a system can (potentially!) be learning as long as there is some evolving state. In the case of transformers and in-context learning, that state is the activations.
Third, and most importantly, I think Ryan’s solution shows that the intelligence is coming from him, and not from Chat-GPT4o. skybrian makes this point in the substack comments.
[...]
To my eyes, I think the hard part here was the scaffolding done by Ryan rather than the pre-training[4] of the LLM (this is another cruxy point I highlighted in my article).
Quoting from a substack comment I wrote in response:
Certainly some credit goes to me and some to GPT4o.
The solution would be much worse without careful optimization and wouldn’t work at all without GPT4o (or another LLM with similar performance).
It’s worth noting that a high fraction of my time went into writing prompts and optimizing the representation. (Which is perhaps better described as teaching GPT4o and making it easier for it to see the problem.)
There are different analogies here which might be illuminating:
Suppose that you strand a child out in the woods and never teach them anything. I expect they would be much worse at programming. So, some credit for their abilities goes to society and some to their brain.
If you remove my ability to see (or conversely, use fancy tools to make it easier for a blind person to see), this would greatly affect my ability to do ARC-AGI puzzles.
You can build systems around people which remove most of the interesting intelligence from various tasks.
I think what is going on here is analogous to all of these.
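To make the scaffolding side of this concrete, here is a minimal sketch of the general sample-and-filter pattern this kind of approach follows (a simplified illustration rather than the actual pipeline; `sample_candidate_programs` is a hypothetical stand-in for the prompting and representation work): ask the LLM for many candidate transformation programs, keep only those that reproduce every training example, and run a surviving candidate on the test input.

```python
# Simplified sketch of sample-and-filter program search over ARC-style grid tasks.
# `sample_candidate_programs` is a hypothetical placeholder for the LLM prompting
# and representation work, which is where most of the effort goes.
from typing import Callable, Optional

Grid = list[list[int]]
Program = Callable[[Grid], Grid]


def solves_training_examples(program: Program, train_pairs: list[tuple[Grid, Grid]]) -> bool:
    """True iff the candidate reproduces every training output."""
    try:
        return all(program(inp) == out for inp, out in train_pairs)
    except Exception:
        return False  # candidate programs are allowed to crash; just discard them


def solve_task(
    train_pairs: list[tuple[Grid, Grid]],
    test_input: Grid,
    sample_candidate_programs: Callable[[list[tuple[Grid, Grid]], int], list[Program]],
    num_samples: int = 1000,
) -> Optional[Grid]:
    # Ask the LLM for many candidate programs, then apply the first one that is
    # consistent with all of the training examples to the test input.
    for program in sample_candidate_programs(train_pairs, num_samples):
        if solves_training_examples(program, train_pairs):
            return program(test_input)
    return None
```

Nothing in the filtering loop is clever; the interesting work is in getting the model to propose good candidates in the first place, which is why the prompt and representation effort dominates.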
Separately, this tweet is relevant: https://x.com/MaxNadeau_/status/1802774696192246133
I think it’s much less conceptually hard to scrape the entire internet and shove it through a transformer architecture. A lot of leg work and cost sure, but the hard part is the ideas bit,
It is worth noting that hundreds (thousands?) of high quality researcher years have been put into making GPT4o more performant.
the claimed numbers are not SOTA, but that is because there are different training sets and I think the ARC-AGI team should be more clear about that
Agreed, though it is possible that my approach is/was SOTA on the private set. (E.g., because Jack Cole et al.’s approach is somewhat more overfit.)
I’m waiting on the private leaderboard results and then I’ll revise.
My only sadness here is that I get the impression you think this work is kind of a dead-end?
I don’t think it is a dead end.
As I say in the post:
ARC-AGI probably isn’t a good benchmark for evaluating progress towards TAI: substantial “elicitation” effort could massively improve performance on ARC-AGI in a way that might not transfer to more important and realistic tasks.
But, I still think that work like ARC-AGI can be good on the margin for getting a better understanding of current AI capabilities.
Pair this with the EA concern that we should be concerned about the counterfactual impact of our actions, and that there are opportunities to do good right here and now,[3] it shouldn’t be a primary EA concern.
As in, your crux is that the probability of AGI within the next 50 years is less than 10%?
I think from an x-risk perspective it is quite hard to beat AI risk even on pretty long timelines. (Where the main question is bio risk and what you think about (likely temporary) civilizational collapse due to nuclear war.)
It’s pretty plausible that on longer timelines technical alignment/safety work looks weak relative to other stuff focused on making AI go better.
I don’t comment or post much on the EA Forum because the quality of discourse there typically seems mediocre at best. This is especially true for x-risk.
I think this has been true for a while.
Farmed animals are also neglected relative to wild animals
Typo?
You might also be interested in discussion here.