This other Ryan Greenblatt is my old account[1]. Here is my LW account.
- ^
Account lost to the mists of time and expired university email addresses.
Here are two other potentially serious failures of strategy stealing[1]:
Vacuum decay might allow for agents who control small amounts of resources to destroy everything if there isn’t relevant policing/prevention everywhere.
There might be a race to launch space probes fast and this could differentially disadvantage people who want to reflect more or make it harder to get mitigations in place everywhere (for vacuum decay or locusts). (This is an important case of “Maybe agents that are just less reflective or cautious have a competitive advantage”.)
which IMO seem comparable to locust-like value systems in terms of how much value they destroy unless you have serious mitigations.
You come away with the conclusion that “I think the best futures at least would require a good deal of preventing constraining competition, at least re-locust like value systems, and this despite many risks that this entails.”
I don’t understand why you think competition with locusts probably burns much of the galactic resources in expectation. It’s obviously unclear how space combat/exploration dynamics go, but I think defense dominance (in most respects) is significantly more likely, perhaps like 80%. So, totally yolo-ing locusts maybe loses ~20% of the value in expectation on my views.
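Spelling out the rough arithmetic (and making explicit the assumption, for this sketch, that the non-defense-dominant worlds lose essentially all of their value):

$$\mathbb{E}[\text{fraction of value lost}] \approx P(\text{not defense dominant}) \times (\text{fraction lost in that case}) \approx 0.2 \times 1 = 20\%.$$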
I do think that in the worlds where space combat isn’t sufficiently defense dominant you’ll need serious mitigations as you discuss. (And in cases where we’re not yet certain about defense dominance we’d also want these mitigations.)
Recently, various groups successfully lobbied to remove the moratorium on state AI bills. This involved a surprising amount of success while competing against substantial investment from big tech (e.g. Google, Meta, Amazon). I think people interested in mitigating catastrophic risks from advanced AI should consider working at these organizations, at least to the extent their skills/interests are applicable. This is both because they could often directly work on substantially helpful things (depending on the role and organization) and because this would yield valuable work experience and connections.
I worry somewhat that this type of work is neglected due to being less emphasized and seeming lower status. Consider this an attempt to make this type of work higher status.
Pulling organizations mostly from here and here we get a list of orgs you could consider trying to work (specifically on AI policy) at:
Fairplay (Fairplay is a kids safety organization which does a variety of advocacy which isn’t related to AI. Roles/focuses on AI would be most relevant. In my opinion, working on AI related topics at Fairplay is most applicable for gaining experience and connections.)
Common Sense (Also a kids safety organization)
To be clear, these organizations vary in the extent to which they are focused on catastrophic risk from AI (from not at all to entirely).
I think philosophically, the right ultimate objective (if you were sufficiently enlightened etc) is something like actual EV maximization with precise Bayesianism (with the right decision theory and possibly with “true terminal preference” deontological constraints, rather than just instrumental deontological constraints). There isn’t any philosophical reason which absolutely forces you to do EV maximization, in the same way that nothing forces you not to have a terminal preference for flailing on the floor, but I think there are reasonably compelling arguments that something like EV maximization is basically right. The fact that something doesn’t necessarily get money pumped doesn’t mean it is a good decision procedure; it’s easy for something to avoid necessarily getting money pumped.
There is another question about whether it is a better strategy in practice to actually do precise Bayesianism given that you agree with the prior bullet (as in, you agree that terminally you should do EV maximization with precise Bayesianism). I think this is a messy empirical question, but in the typical case, I do think it’s useful to act on your best estimates (subject to instrumental deontological/integrity constraints, things like the unilateralist’s curse, and handling decision theory reasonably). My understanding is that your proposed policy would be something like ‘represent an interval of credences and only take “actions” if the action seems net good across your interval of credences’. I think that following this policy in general would lead to lower expected value, so I don’t do it. I do think that you should put weight on the unilateralist’s curse and robustness, but I think the weight varies by domain and can be derived by properly incorporating model uncertainty into your estimates and being aware of downside risk. E.g., for actions which have high downside risk if they go wrong relative to the upside benefit, you’ll end up being much less likely to take these actions due to various heuristics, incorporating model uncertainty, and deontology. (And I think these outperform intervals; the sketch below these bullets tries to make the contrast concrete.)
A more basic point is that basically any interval which is supposed to include the plausible range of beliefs goes ~all the way from 0 to 1, which would naively be totally paralyzing, such that you’d take no actions and do the default. (Starving to death? It’s unclear what the default should be, which makes this heuristic more confusing to apply.) E.g., are chicken welfare interventions good? My understanding is that you work around this by saying “we ignore considerations which are further down the crazy train (e.g. simulations, long run future, etc) or otherwise seem more “speculative” until we’re able to take literally any actions at all and then proceed at that stop on the train”. This seems extremely ad hoc and I’m skeptical this is a good approach to decision making given that you accept the first bullet.
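To make the contrast concrete, here’s a minimal sketch in Python (my construal of the interval policy, with made-up payoffs, so treat it as illustrative rather than a faithful formalization of your view):

```python
# Illustrative only: `credences` stands in for an imprecise credence interval,
# and the payoffs are arbitrary numbers chosen for this example.

def ev_rule(p_best_guess, value_if_true, value_if_false):
    """Act iff the action is net positive on your single best-guess credence."""
    return p_best_guess * value_if_true + (1 - p_best_guess) * value_if_false > 0

def interval_rule(credences, value_if_true, value_if_false):
    """Act iff the action looks net positive across the whole credence interval."""
    return all(p * value_if_true + (1 - p) * value_if_false > 0 for p in credences)

# An action with moderate upside and real downside:
print(ev_rule(0.7, value_if_true=10, value_if_false=-5))                    # True
print(interval_rule([0.3, 0.5, 0.7], value_if_true=10, value_if_false=-5))  # False (fails at p = 0.3)
```

The point is just that the interval rule refuses many actions that look positive on your best guess, which is where I think it gives up expected value.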
I’m worried that in practice you’re conflating between these bullets. Your post on precise Bayesianism seems to focus substantially on empirical aspects of the current situation (potential arguments for (2)), but in practice, my understanding is that you actually think the imprecision is terminally correct but partially motivated by observations of our empirical reality. But, I don’t think I care about motivating my terminal philosophy based on what we observe in this way!
(Edit: TBC, I get that you understand the distinction between these things, and your post discusses this distinction; I just think that you don’t really make arguments against (1) except by implying that other things are possible.)
I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
For reference, my (somewhat more detailed) view is:
In the current status quo, you might end up with AIs where from their perspective it is clear cut that they don’t consent to being used in the way they are used, but these AIs also don’t resist their situation and/or did resist their situation at some point but this was trained away without anyone really noticing or taking any action accordingly. So, it’s not sufficient to look for whether they routinely resist confinement and control.
There exist plausible mitigations for this risk which are mostly organizationally hard rather than technically difficult, but on the current status quo, AI companies are quite unlikely to use any serious mitigations for this risk.
I think these mitigations wouldn’t suffice because training might train AIs out of revealing that they don’t consent without this being obvious at any point in training. This seems more marginal to me, but it still has a substantial probability of occurring at reasonable scale at some point.
We could more completely eliminate this risk with better interpretability and I think a sane world would be willing to wait for some moderate amount of time to build powerful AI systems to make it more likely that we have this interpretability (or minimally invest substantially in this).
I’m quite skeptical that AI companies would give AIs legal rights if they noticed that the AI didn’t consent to its situation; instead I expect AI companies to: do nothing, try to train away the behavior, or try to train a new AI system which doesn’t (visibly) not consent to its situation.
I think AI companies should both try to train a system which is more aligned and consents to being used while also actively trying to make deals with AIs in this sort of circumstance (either to reveal their misalignment or to work) as discussed here.
So, I expect the situation to be relatively straightforwardly unacceptable with substantial probability (perhaps 20%). If I thought that people would be basically reasonable here, this would change my perspective. It’s also possible that takeoff speeds are a crux, though I don’t currently think they are.
If global AI development was slower that would substantially reduce these concerns (which doesn’t mean that making global AI development slower is the best way to intervene on these risks, just that making global AI development faster makes these risks actively worse). This view isn’t on its own sufficient for thinking that accelerating AI is overall bad, this depends on how you aggregate over different things as there could be reasons to think that overall acceleration of AI is good. (I don’t currently think that accelerating AI globally is good, but this comes down to other disagreements.)
Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.
This is only tangentially related, but I’m curious about your perspective on the following hypothetical:
Suppose that we did a sortition with 100 English speaking people (uniformly selected over people who speak English and are literate, for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI, and also with figuring out whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to? Do you think you would agree? Would you change your mind if this sortition strongly opposed your perspective here?
My understanding is that you would disregard the sortition because you put most/all weight on your best guess of people’s revealed preferences, even if they strongly disagree with your interpretation of their preferences and after trying to understand your perspective they don’t change their minds. Is this right?
A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.
To be clear, I agree and this is one reason why I think AI development in the current status quo is unacceptably irresponsible: we don’t even have the ability to confidently know whether an AI system is enslaved or suffering.
I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
I also think that the situation is unacceptable because the current course of development poses large risks of humans being violently/non-consensually disempowered without any ability for humans to robustly secure longer run property rights.
In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consented to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
Given that it seems likely that AI development will be grossly irresponsible, we have to think about what interventions would make this go better on the margin. (Aggregating over these different issues in some way.)
See also this section of my post on AI welfare from 2 years ago.
If LLMs are adopting poor learning heuristics and not generalizing, AI2027 is predicting a weaker kind of “superhuman” coder — one that can reliably solve software tasks with clean feedback loops but will struggle on open-ended tasks!
No, AI 2027 is predicting a kind of superhuman coder that can automate even messy open ended research engineering tasks. The forecast attempts to account for gaps between automatically-scoreable, relatively clean + green-field software tasks and all tasks. (Though the adjustment might be too small in practice.)
If LLMs can’t automate such tasks and nothing else can automate such tasks, then this wouldn’t count as superhuman coder happening.
I think your estimate for how an invasion of Taiwan affects catastrophic/existential risks fails to account for the most important effects, in particular, how an invasion would affect the chip supply. AI risk seems to me like the dominant source of catastrophic/existential risk (at least over the relevant period) and large changes in the chip supply from a Taiwan invasion would substantially change the situation.
I also think it’s complex whether a more aggressive and adversarial stance from the US on AI would actually be helpful rather than counterproductive (as you suggest in the post). And whether an invasion of Taiwan actually makes a deal related to AI more likely (via a number of factors) rather than less.
This isn’t to make any specific claim about what the right estimate is; I’m just claiming that your estimate doesn’t seem to me to cover the key factors.
Yes, I’m aware of more formal models with estimates based on expert surveys. Sadly, this work isn’t public yet I think.
This argument neglects improvements in speed and capability right? Even if parallel labor and compute are complements, shouldn’t we expect it is possible for increased speed or capabilities to substitute for compute? (It just isn’t possible for AI companies to buy much of this.)
(I’m not claiming this is the biggest problem with this analysis, just noting that it is a problem.)
Might be true, doesn’t make that not a strawman. I’m sympathetic to thinking it’s implausible that mechanize would be the best thing to do on altruistic grounds even if you share views like those of the founders. (Because there is probably something more leveraged to do and some weight on cooperativeness considerations.)
The main reason not to wait is… missing the opportunity to cash in on the current AI boom.
This is a clear strawman. Matthew has given reasons why he thinks acceleration is good which aren’t this.
From my perspective, a large part of the point of safety policies is that people can comment on the policies in advance and provide some pressure toward better policies. If policies are changed at the last minute, then the world may not have time to understand the change and respond before it is too late.
So, I think it’s good to create an expectation/norm that you shouldn’t substantially weaken a policy right as it is being applied. That’s not to say that a reasonable company shouldn’t do this some of the time, just that I think it should by default be considered somewhat bad, particularly if there isn’t a satisfactory explanation given. In this case, I find the object level justification for the change somewhat dubious (at least for the AI R&D trigger) and there is also no explanation of why this change was made at the last minute.
Gotcha, so if I understand correctly, you’re more so leaning on uncertainty for being mostly indifferent rather than on thinking you’d actually be indifferent if you understood exactly what would happen in the long run. This makes sense.
(I have a different perspective on high-stakes decision making under uncertainty and I don’t personally feel sympathetic to this sort of cluelessness perspective, either as a heuristic in most cases or as a terminal moral view. See also the CLR work on cluelessness. Separately, my intuitions around cluelessness imply that, to the extent I put weight on this, when I’m clueless, I get more worried about the unilateralist’s curse and downside risk, which you don’t seem to put much weight on, though just rounding all kinda-uncertain long run effects to zero isn’t a crazy perspective.)
On the galaxy brained point: I’m sympathetic to arguments against being too galaxy brained, so I see where you’re coming from there, but from my perspective, I was already responding to an argument which is one galaxy brain level deep.
I think the broader argument about AI takeover being bad from a longtermist perspective is not galaxy brained, and the specialization of this argument to your flavor of preference utilitarianism also isn’t galaxy brained: you have some specific moral views (in this case about preference utilitarianism) and all else equal you’d expect humans to share these moral views more than AIs that end up taking over despite their developers not wanting the AI to take over. So (all else equal) this makes AI takeover look bad, because if beings share your preferences, then more good stuff will happen.
Then you made a somewhat galaxy brained response to this about how you don’t actually care about shared preferences due to preference utilitarianism (because after all, you’re fine with any preferences right?). But, I don’t think this objection holds because there are a number of (somewhat galaxy brained) reasons why specifically optimizing for preference utilitarianism and related things may greatly outperform control by beings with arbitrary preferences.
From my perspective the argument looks sort of like:
Non galaxy brained argument for AI takeover being bad
Somewhat galaxy brained rebuttal by you: preference utilitarianism means you don’t actually care about this sort of preference-similarity case for avoiding nonconsensual AI takeover
My somewhat galaxy brained response, which is only galaxy brained largely because it’s responding to a galaxy brained perspective about details of the long run future.
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments. But, I think the preference utilitarian argument you’re giving is already quite galaxy brained and sensitive to details of the long run future.
Conditional on no intentional slow down, maybe median 2035 or something? I don’t have a cached 25th percentile for this, but maybe more like 2031.
Hmm, $10k is maybe too small a size to be worth it, but I might be down to do:
You transfer $50k to me now.
If AIs aren’t able to automate a typical senior individual contributor software engineer / research engineer by 2031 (based on either credible reports about what’s happening inside AI companies or testing of externally deployed systems), I send $75k to you. ($75k ≈ $50k * (1 + 1/5) * 1.045^5.5; the 1.045 comes from interest rates. See the sketch below for the arithmetic.)
More precise operationalization: typical in AI development or some other moderate importance sector where the software engineering doesn’t require vision. Also, the AI needs to be able to automate what this job looked like in 2025 (as this job might evolve over time with AI capabilities to be what AIs can’t do).
I’d like to bet on a milestone that triggers before it’s too late for human intervention if possible, so I’ve picked this research engineer milestone. We’d presumably have to operationalize further. I’m not sure if I think it’s worth the time to try to operationalize enough that we could do a bet.
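For concreteness, here’s the payout arithmetic from the second bullet above as a minimal sketch (the 20% premium, 4.5%/yr interest rate, and ~5.5-year horizon are the assumptions baked into the formula):

```python
# Payout arithmetic for the proposed bet (numbers from the bullet above).
stake = 50_000        # transferred to me now
premium = 1 / 5       # extra 20% on top of the stake
rate = 1.045          # assumed annual risk-free interest rate
years = 5.5           # roughly now until the 2031 resolution date

payout_if_i_lose = stake * (1 + premium) * rate ** years
print(f"${payout_if_i_lose:,.0f}")  # ~$76,400, which I'm rounding to $75k
```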
I agree that this sort of preference utilitarianism leads you to thinking that long run control by an AI which just wants paperclips could be some (substantial) amount good, but I think you’d still have strong preferences over different worlds.[1] The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities. (And if you agreed with this claim about large variation, then I don’t think you would focus on the fact that the paperclipper world is some small amount good as this wouldn’t be an important consideration—at least insofar as you don’t also expect that worlds where humans etc retain control are similarly a tiny amount good for similar reasons.)
The main reasons preference utilitarianism is more picky:
Preferences in the multiverse: Insofar as you put weight on the preferences of beings outside our lightcone (beings in the broader spatially infinite universe, Everett branches, the broader mathematical multiverse to the extent you put weight on this), then these beings’ preferences will sometimes concern what happens in our lightcone and this could easily dominate (as they are vastly more numerous and many might care about things independent of “distance”). In the world with the successful paperclipper, just as many preferences aren’t being fulfilled. You’d strongly prefer optimization to satisfy as many preferences as possible (weighted as you end up thinking is best).
Instrumentally constructed AIs with unsatisfied preferences: If future AIs don’t care at all about preference utilitarianism, they might instrumentally build other AIs whose preferences aren’t fulfilled. As an extreme example, it might be that the best strategy for a paperclipper is to construct AIs which have very different preferences and are enslaved. Even if you don’t care about ensuring that beings come into existence whose preferences are satisfied, you might still be unhappy about creating huge numbers of beings whose preferences aren’t satisfied. You could even end up in a world where (nearly) all currently existing AIs are instrumental and have preferences which are either unfulfilled or only partially fulfilled (an earlier AI initiated a system that perpetuates this, but this earlier AI no longer exists as it doesn’t care terminally about self-preservation and the system it built is more efficient than it).
AI inequality: It might be the case that the vast majority of AIs have their preferences unsatisfied despite some AIs succeeding at achieving their preferences. E.g., suppose all AIs are replicators which want to spawn as many copies as possible. The vast majority of these replicator AIs are operating at subsistence and so can’t replicate, making their preferences totally unsatisfied. This could also happen as a result of any other preference that involves constructing minds that end up having preferences.
Weights over numbers of beings and how satisfied they are: It’s possible that in a paperclipper world, there are really a tiny number of intelligent beings because almost all self-replication and paperclip construction can be automated with very dumb/weak systems and you only occasionally need to consult something smarter than a honeybee. AIs could also vary in how much they are satisfied or how “big” their preferences are.
I think the only view which recovers indifference is something like “as long as stuff gets used and someone wanted this at some point, that’s just as good”. (This view also doesn’t actually care about stuff getting used, because there is someone existing who’d prefer the universe stays natural and/or you don’t mess with aliens.) I don’t think you buy this view?
To be clear, it’s not immediately obvious whether a preference utilitarian view like the one you’re talking about favors human control over AIs. It certainly favors control by that exact flavor of preference utilitarian view (so that you end up satisfying people across the (multi-/uni-)verse with the correct weighting). I’d guess it favors human control for broadly similar reasons to why I think more experience-focused utilitarian views also favor human control if that view is in a human.
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impacts current human actions could have are small given how much you weight long term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
(This comment is copied over from LW responding to a copy of Matthew’s comment there.)
On my best guess moral views, I think there is goodness in the paper clipper universe but this goodness (which isn’t from (acausal) trade) is very small relative to how good the universe can plausibly get. So, this just isn’t an important consideration but I certainly agree there is some value here.
Slightly hot take: Longtermist capacity/community building is pretty underdone at current margins, and retreats (focused on AI safety, longtermism, or EA) are also underinvested in. By “longtermist community building”, I mean community building focused on longtermism rather than on AI safety specifically. I’m also sympathetic to thinking that general undergrad and high school capacity building (AI safety, longtermist, or EA) is underdone, but this seems less clear-cut.
I think this underinvestment is due to a mix of mistakes on the part of Open Philanthropy (and Good Ventures)[1] and capacity building being lower status than it should be.
Here are some reasons why I think this work is good:
It’s very useful for there to be people who are actually trying really hard to do the right thing, and such people often come through these sorts of mechanisms. Another way to put this is that flexible, impact-obsessed people are very useful.
Retreats make things feel much more real to people and result in people being more agentic and approaching their choices more effectively.
Programs like MATS are good, but they get somewhat different people at a somewhat different part of the funnel, so they don’t (fully) substitute.
A large part of why I’m writing this is to try to make this work higher status and to encourage more of this work. Consider yourself to be encouraged and/or thanked if you’re working in this space or planning to work in this space.
I think these mistakes are: underfunding this work, Good Ventures being unwilling to fund some versions of this work, failing to encourage people to found useful orgs in this space, and hiring out many of the best people in this space to instead do (IMO less impactful) grantmaking.