Oh, interesting, thanks for the link—I didn’t realize this was already an area of research. (I brought up my collusion idea with a couple of CLR researchers before and it seemed new to them, which I guess made me think that the idea wasn’t already being discussed.)
Perhaps this old comment from Rohin Shah could serve as the standard link?
(Note that it’s on the particular case of recommending people do/don’t work at a given org, rather than the general case of praise/criticism, but I don’t think this changes the structure of the argument other than maybe making point 1 less salient.)
Excerpting the relevant part:
On recommendations: Fwiw I also make unconditional recommendations in private. I don’t think this is unusual, e.g. I think many people make unconditional recommendations not to go into academia (though I don’t).
I don’t really buy that the burden of proof should be much higher in public. Reversing the position, do you think the burden of proof should be very high for anyone to publicly recommend working at lab X? If not, what’s the difference between a recommendation to work at org X vs an anti-recommendation (i.e. recommendation not to work at org X)? I think the three main considerations I’d point to are:
1. (Pro-recommendations) It’s rare for people to do things (relative to not doing things), so we differentially want recommendations vs anti-recommendations, so that it is easier for orgs to start up and do things.
2. (Anti-recommendations) There are strong incentives to recommend working at org X (obviously org X itself will do this), but no incentives to make the opposite recommendation (and in fact usually anti-incentives). Similarly I expect that inaccuracies in the case for the not-working recommendation will be pointed out (by org X), whereas inaccuracies in the case for working will not be pointed out. So we differentially want to encourage the opposite recommendations in order to get both sides of the story by lowering our “burden of proof”.
3. (Pro-recommendations) Recommendations have a nice effect of getting people excited and positive about the work done by the community, which can make people more motivated, whereas the same is not true of anti-recommendations.
Overall I think point 2 feels most important, and so I end up thinking that the burden of proof on critiques / anti-recommendations should be lower than the burden of proof on recommendations—and the burden of proof on recommendations is approximately zero. (E.g. if someone wrote a public post recommending Conjecture without any concrete details of why—just something along the lines of “it’s a great place doing great work”—I don’t think anyone would say that they were using their power irresponsibly.)
I would actually prefer a higher burden of proof on recommendations, but given the status quo if I’m only allowed to affect the burden of proof on anti-recommendations I’d probably want it to go down to ~zero. Certainly I’d want it to be well below the level that this post meets.
Thanks, I found this post helpful, especially the diagram.
What (if any) is the overlap of cooperative AI […] and AI safety?
One thing I’ve thought about a little is the possibility of a tension wherein making AIs more cooperative in certain ways might raise the chance that advanced collusion between AIs breaks an alignment scheme that would otherwise work.[1]
1. ^ I’ve not written anything up on this and likely never will; I figure here is as good a place as any to leave a quick comment pointing to the potential problem, appreciating that it’s but a small piece in the overall landscape and probably not the problem of highest priority.
Long Reflection Reading List
Hard to tell from the information given. Two sources saying an unknown number of people are threatening to resign could just mean that two people are disgruntled and might themselves resign.
Hmm, okay, so it sounds like you’re arguing that even if we measure the curvature of our observable universe to be negative, it could still be the case that the overall universe is positively curved and therefore finite? But surely your argument should be symmetric, such that you should also believe that if we measure the curvature of our observable universe to be positive, it could still be the case that the overall universe is negatively curved and thus infinite?
Thanks for replying, I think I now understand your position a bit better. Okay, so if your concern is around measurements only being finitely precise, then my exactly-zero example is not a great one, because I agree that it’s impossible to measure the universe as being exactly flat.
Maybe a better example: if the universe’s large-scale curvature is either zero or negative, then it necessarily follows that it’s infinite.
(I didn’t give this example originally because of the somewhat annoying caveats one needs to add. Firstly, in the flat case, that the universe has to be simply connected. And then in the negatively curved case, that our universe isn’t one of the unusual compact hyperbolic 3-manifolds studied in pure math, to which Mostow’s rigidity theorem applies. As far as I’m aware, all cosmologists believe that if the universe is negatively curved, then it’s infinite.)
I think this new example might address your concern? Because even though measurements are only finitely precise, and contain uncertainty, you can still be ~100% confident that the universe is negatively curved based on measurement. (To be clear, the actual measurements we have at present don’t point to this conclusion. But in theory one could obtain measurements to justify this kind of confidence.)
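(As a rough sketch of the standard picture, not something stated in the thread above: here is how a curvature measurement maps to geometry in FLRW cosmology, assuming the topology caveats already mentioned. Here Ω denotes the total density parameter inferred from observation.)

```latex
% Friedmann equation: the sign of (Omega - 1) fixes the sign of the
% spatial curvature k (assuming trivial, simply connected topology).
\Omega - 1 = \frac{k c^2}{a^2 H^2}
\qquad \Longrightarrow \qquad
\begin{cases}
\Omega > 1, & k = +1 \text{ (positively curved, finite)} \\
\Omega = 1, & k = 0 \text{ (flat, infinite)} \\
\Omega < 1, & k = -1 \text{ (negatively curved, infinite)}
\end{cases}
```

So a measurement pinning Ω strictly below 1 would, under these assumptions, imply an infinite universe even with finite measurement precision.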
(For what it’s worth, I personally have high credence in eternal inflation, which posits that there are infinitely many bubble/pocket universes, and that each pocket universe is negatively curved—very slightly—and infinitely large. (The latter on account of details in the equations.))
Hi Vasco, I’m having trouble parsing your comment. For example, if the universe’s large-scale curvature is exactly zero (and the universe is simply connected), then it follows that it’s infinite, and I’m confused as to why you think it could still be finite (if this is what you’re saying; apologies if I’m misinterpreting you).
I’m not sure what kind of background you already have in this domain, but if you’re interested in reading more, I’d recommend first going to the “Shape of the universe” Wikipedia page, and then, depending on how deep you want to go, lectures 10–13 of Alan Guth’s introductory cosmology lecture series.
Evidential Cooperation in Large Worlds: Potential Objections & FAQ
Everett branches, inter-light cone trade and other alien matters: Appendix to “An ECL explainer”
Cooperating with aliens and AGIs: An ECL explainer
I’m confused about why you think forecasting orgs should be trying to acquire commercial clients.[1] How do you see this as being on the necessary path for forecasting initiatives to reduce x-risk, contribute to positive trajectory change, etc.? Perhaps you could elaborate on what you mean by “real-world impact”?
COI note: I work for Metaculus.
1. ^ The main exception that comes to mind, for me, is AI labs. But I don’t think you’re talking about AI labs in particular as the commercial clients forecasting orgs should be aiming for?
What you describe in your first paragraph sounds to me like a good updating strategy, except I would say that you’re not updating your “natural independent opinion”; you’re updating your all-things-considered belief.
Related short posts I recommend—the first explains the distinction I’m pointing at, and the second shows how things can go wrong if people don’t track it:
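To make the distinction concrete, here is a toy sketch (my own construction, not taken from the linked posts). The independent impression is held fixed, while the all-things-considered belief pools it with peers’ credences via equal-weight log-odds averaging; the pooling rule and the example numbers are illustrative assumptions, not anything from the thread.

```python
import math

def logit(p: float) -> float:
    """Convert a probability to log-odds."""
    return math.log(p / (1 - p))

def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1 / (1 + math.exp(-x))

def all_things_considered(independent_impression: float,
                          peer_credences: list[float]) -> float:
    """Pool your independent impression with peers' credences by
    equal-weight averaging in log-odds space (one common pooling rule)."""
    logits = [logit(independent_impression)] + [logit(p) for p in peer_credences]
    return sigmoid(sum(logits) / len(logits))

# Hearing peers' views moves the all-things-considered belief,
# but the independent impression itself stays fixed.
independent = 0.30
peers = [0.70, 0.60]
print(all_things_considered(independent, peers))  # ~0.53
```

Roughly, the failure mode is feeding all-things-considered beliefs back into this pool as if they were independent impressions, which double-counts the same underlying evidence.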
Fair point, I’ve added a footnote to make this clearer.
AI x-risk is unique because humans would be replaced by other beings, rather than completely dying out. This means you can’t simply apply a naive argument that AI threatens total extinction of value
Paul Christiano wrote a piece a few years ago about ensuring that misaligned ASI is a “good successor” (in the moral value sense),[1] as a plan B to alignment (Medium version; LW version). I agree it’s odd that there hasn’t been more discussion since.[2]
Here’s a non-exhaustive list of guesses for why I think EAs haven’t historically been sympathetic [...]: A belief that AIs won’t be conscious, and therefore won’t have much moral value compared to humans.
I’ve wondered about this myself. My take is that this area was overlooked a year ago, but there’s now some good work being done. See Jeff Sebo’s Nov ’23 80k podcast episode, as well as Rob Long’s episode, and the paper that the two of them co-authored at the end of last year: “Moral consideration for AI systems by 2030”. Overall, I’m optimistic about this area becoming a new forefront of EA.
accelerationism would have, at best, temporary effects
I’m confused by this point, and for me this is the overriding crux between my view and yours. Do you really not think accelerationism could have permanent effects, through making AI takeover, or some other irredeemable outcome, more likely?
Are you sure there will ever actually be a “value lock-in event”?
I’m not sure there’ll be a lock-in event, in the way I can’t technically be sure about anything, but such an event seems clearly probable enough that I very much want to avoid taking actions that bring it closer. (Insofar as bringing the event closer raises the chance it goes badly, which I believe to be a likely dynamic. See, for example, the Metaculus question, “How does the level of existential risk posed by AGI depend on its arrival time?”, or discussion of the long reflection.)
1. ^ Although, Paul’s argument routes through acausal cooperation—see the piece for details—rather than through the ASI being morally valuable in itself. (And perhaps OP means to focus on the latter issue.) In Paul’s words:
Clarification: Being good vs. wanting good
We should distinguish two properties an AI might have:
- Having preferences whose satisfaction we regard as morally desirable.
- Being a moral patient, e.g. being able to suffer in a morally relevant way.

These are not the same. They may be related, but they are related in an extremely complex and subtle way. From the perspective of the long-run future, we mostly care about the first property.
2. ^ There was a little discussion a few months ago, here, but none of what was said built on Paul’s article.
Individually, altruists [...] can make a habit of asking themselves and others what risks they may be overlooking, dismissing, or downplaying.
Institutionally, we can rearrange organizational structures to take these individual tendencies into account, for example by creating positions dedicated to or focused on managing risk.
I’ve been surprised by how this seems to be a bit of a blind spot in our community.[1] I’ve previously written a couple of comments—excerpted below—on this theme, about the state of community building. These garnered a decent number of upvotes, but I don’t think they led to any concrete actions or changes. (For instance, the second comment never received a reply from Open Phil.)
My attempts to raise this concern [about optimizing for numbers/hype at the expense of i) cause prio, ii) addressing particular talent bottlenecks, and iii) mitigating downside risks] with other community builders, including those above me, were mostly dismissed. This worried me. It seemed like the community building machine was not open to the hypothesis that (some of) what it was doing might be ineffective, or, worse, net negative. (More on the latter below.)

On top of this, there seemed to be a tricky second-order effect at play: evaporative cooling whereby the community builders who developed concerns like mine exited, only to be replaced by more bullish community builders. The result: a disproportionately bullish community building machine. And there didn’t appear to be any countermeasures in place. For example, there was plenty of funding available if one wanted a paid role doing community building. But, in addition to the social disincentive, there was no funding available for evaluating/critiquing the impact of community building—at least, not that I was aware of.
(link)
There was near-consensus that Open Phil should generously fund promising AI safety community/movement-building projects they come across
Would you be able to say a bit about the extent to which members of this working group have engaged with the arguments around AI safety movement-building potentially doing more harm than good? For instance, points 6 through 11 of Oli Habryka’s second message in the “Shutting Down the Lightcone Offices” post (link). If they have strong counterpoints to such arguments, then I imagine it would be valuable for these to be written up.
(link)
1. ^ I mean, if one has a high prior on one’s actions being robustly positive, then it makes sense to continue full steam ahead without worrying about risks. (Because there is a tradeoff: spending time considering risks means spending less time acting.) However, I don’t think this level of confidence is warranted for the vast majority of longtermist interventions. For more, see this comment by Linch.
(Fwiw, the Forum moderation team does this for many of our cases.)
Objection 1: It is unlikely that there will be a point at which a unified agent will be able to take over the world, given the existence of competing AIs with comparable power
For what it’s worth, the Metaculus crowd forecast for the question “Will transformative AI result in a singleton (as opposed to a multipolar world)?” is currently 60%. That is, forecasters believe it’s more likely than not that there won’t be competing AIs with comparable power, which runs counter to your claim.
(I bring this up seeing as you make a forecasting-based argument for your claim.)
Following on from your saner-world illustration, I’d be curious to hear what kind of call to action you might endorse in our current world.
I personally find your writings on metaphilosophy, and the closely related problem of ensuring AI philosophical competence, persuasive. In other words, I think this area has been overlooked, and that more people should be working in it given the current margin in AI safety work. But I also have a hard time imagining anyone pivoting into this area, at present, given that:[1]
- There’s no research agenda with scoped-out subproblems (as far as I’m aware), only the overall, wicked problem of trying to get advanced AIs to do philosophy well.
- There are no streams within junior research programs, like MATS, to try one’s hand[2] in this area while gaining mentorship.
1. ^ A third reason, which I add here as a footnote since it seems far less solvable: Monetary and social incentives are pushing promising people into empirical/ML-based intent alignment work. (To be clear, I believe intent alignment is important. I just don’t think it’s the only problem that deserves attention.) It takes agency—and financial stability, and a disregard for status—to strike out on one’s own and work on something weirder, such as metaphilosophy or other neglected, non-alignment AI topics.
[ETA: Ten days after I posted this comment, Will MacAskill gave an update on his work: he has started looking into neglected, non-alignment AI topics, with a view to perhaps founding a new research institution. I find this encouraging!]
2. ^ Pun intended.
For what it’s worth, I endorse @Habryka’s old comment on this issue: