For what it’s worth, I was reminded of Jessica Taylor’s account of collective debugging and psychoses as I read that part of the transcript. (Rather than trying to quote pieces of Jessica’s account, I think it’s probably best that I just link to the whole thing as well as Scott Alexander’s response.)
Will Aldred
‘Five Years After AGI’ Focus Week happening over at Metaculus.
Inspired in part by the EA Forum’s recent debate week, Metaculus is running a “focus week” this week, aimed at making intellectual progress on the question: “What will the world look like five years after AGI (assuming that humans are not extinct)[1]?”
Leaders of AGI companies, while vocal about some things they anticipate in a post-AGI world (for example, bullishness about AGI making scientific advances), seem deliberately vague about other aspects: for example, power (will AGI companies have a lot of it? all of it?), whether some of the scientific advances might backfire (e.g., a vulnerable world scenario or a race-to-the-bottom digital minds takeoff), and how exactly AGI will be used for “the benefit of all.”
Forecasting questions for the week range from “Percentage living in poverty?” to “Nuclear deterrence undermined?” to “‘Long reflection’ underway?”
Those interested: head over here. You can participate by:
Forecasting
Commenting
Writing questions
There may well be some gaps in the admin-created question set.[4] We welcome question contributions from users.
The focus week will likely be followed by an essay contest, since a large part of the value in this initiative, we believe, lies in generating concrete stories for how the future might play out (and for what the inflection points might be). More details to come.[5]
- ^
This is not to say that we firmly believe extinction won’t happen. I personally put p(doom) at around 60%. At the same time, however, as I have previously written, I believe that more important trajectory changes lie ahead if humanity does manage to avoid extinction, and that it is worth planning for these things now.
- ^
Moreover, I personally take Nuño Sempere’s “Hurdles of using forecasting as a tool for making sense of AI progress” piece seriously, especially the “Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions” part.
With short-term questions on things like geopolitics, I think one should just basically defer to the Community Prediction. Conversely, with certain long-term questions I believe it’s important to interrogate how forecasters are reasoning about the issue at hand before assigning their predictions too much weight. Forecasters can help themselves by writing comments that explain their reasoning.
- ^
In addition, stakeholders we work with, who look at our questions with a view to informing their grantmaking, policymaking, etc., frequently say that they would find more comments valuable in helping bring context to the Community Prediction.
- ^
All blame on me, if so.
- ^
Update: I ended up leaving Metaculus fairly soon after writing this post. I think that means the essay contest is less likely to happen, but I guess stay tuned in case it does.
as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.
How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy [...] Secondly, prioritizing competence.
A third way to potentially mitigate the issue is to simply become more skilled at winning power struggles. Such an approach would be uncooperative, and therefore undesirable in some respects, but on balance, to me, seems worth pursuing to at least some degree.
… I realize that you, OP, have debated a very similar point before (albeit in a non-AI safety thread)—I’m not sure if you have additional thoughts to add to what you said there? (Readers can find that previous debate/exchange here.)
Oh, sorry, I see now that the numberings I used in my second comment don’t map onto how I used them in my first one, which is confusing. My bad.
Your last two paragraphs are very informative to me.
I think digital minds takeoff going well (again, for digital minds and with respect to existential risk) makes it more likely that alignment goes well. [...] In taking alignment going well to be sensitive to how takeoff goes, I am denying that alignment going well is something we should treat as given independently of how takeoff goes.
This is interesting; by my lights this is the right type of argument for justifying AI welfare being a longtermist cause area (which is something that I felt was missing from the debate week). If you have time, I would be keen to hear how you see digital minds takeoff going well as aiding in alignment.[1]
[stuff about nudging AIs away from having certain preferences, etc., being within the AI welfare cause area’s purview, in your view]
Okay, interesting, makes sense.
Thanks a lot for your reply, your points have definitely improved my understanding of AI welfare work!
- ^
One thing I’ve previously been cautiously bullish about as an underdiscussed wildcard is the kinda sci-fi approach of getting to human mind uploading (or maybe just regular whole brain emulation) before prosaic AGI, and then letting the uploaded minds—which could be huge in number and running much faster than wall clock time—solve alignment. However, my Metaculus question on this topic indicates that such a path to alignment is very unlikely.
I’m not sure if the above is anything like what you have in mind? (I realize that human mind uploading is different from LLMs or other prosaic AI systems gaining consciousness (and/or moral status), and that it’s the latter that is more typically the focus of digital minds work (and the focus of your post, I think). So, on second thoughts, I imagine your model for the relationship between digital minds takeoff and alignment will be something different.)
I had material developed for other purposes [...] But the material wasn’t optimized for addressing whether AI welfare should be a cause area, and optimizing it for that didn’t strike me as the most productive way for me to engage given my time constraints.
Sounds very reasonable. (Perhaps it might help to add a one-sentence disclaimer at the top of the post, to signpost for readers what the post is vs. is not trying to do? This is a weak suggestion, though.)
I don’t see how buying (1) and (2) undermines the point I was making. If takeoff going well makes the far future go better in expectation for digital minds, it could do so via alignment or via non-default scenarios.
I feel unsure about what you are saying, exactly, especially the last part. I’ll try saying some things in response, and maybe that helps locate the point of disagreement…
(… also feel free to just bow out of this thread if you feel like this is not productive…)
In the case that alignment goes well and there is a long reflection—i.e., (1) and (2) turn out true—my position is that doing AI welfare work now has no effect on the future, because all AI welfare stuff gets solved in the long reflection. In other words, I think that “takeoff going well makes the far future go better in expectation for digital minds” is an incorrect claim in this scenario. (I’m not sure if you are trying to make this claim.)

In the case that alignment goes well but there is no long reflection—i.e., (1) turns out true but (2) turns out false—my position is that doing AI welfare work now might make the far future go better for digital minds. (And thus in this scenario I think some amount of AI welfare work should be done now.[1]) Having said this, in practice, in a world in which (2), whether or not a long reflection happens, could go either way, I view trying to set up a long reflection as a higher-priority intervention than any one of the things we’d hope to solve in the long reflection, such as AI welfare or acausal trade.

In the case that alignment goes poorly, humans either go extinct or are disempowered. In this case, does doing AI welfare work now improve the future at all? I used to think the answer to this was “yes,” because I thought that better understanding sentience could help with designing AIs that avoid creating suffering digital minds.[2] However, I now believe that this basically wouldn’t work, and that something much hackier (and therefore lower cost) would work instead, like simply nudging AIs in their training to have altruistic/anti-sadistic preferences. (This thing of nudging AIs to be anti-sadistic is part of the suffering risk discourse—I believe it’s something that CLR works on or has worked on—and feels outside of what’s covered by the “AI welfare” field.)
- ^
Exactly how much should be done depends on things like how important and tractable digital minds stuff is relative to the other things on the table, like acausal trade, and to what extent the returns to working on each of these things are diminishing, etc.
- ^
Why would an AI create digital minds that suffer? One reason is that the AI could have sadistic preferences. A more plausible reason is that the AI is mostly indifferent about causing suffering, and so does not avoid taking actions that incidentally cause/create suffering. Carl Shulman explored this point in his recent 80k episode:
Rob Wiblin: Maybe a final question is it feels like we have to thread a needle between, on the one hand, AI takeover and domination of our trajectory against our consent — or indeed potentially against our existence — and this other reverse failure mode, where humans have all of the power and AI interests are simply ignored. Is there something interesting about the symmetry between these two plausible ways that we could fail to make the future go well? Or maybe are they just actually conceptually distinct?
Carl Shulman: I don’t know that that quite tracks. One reason being, say there’s an AI takeover, that AI will then be in the same position of being able to create AIs that are convenient to its purposes. So say that the way a rogue AI takeover happens is that you have AIs that develop a habit of keeping in mind reward or reinforcement or reproductive fitness, and then those habits allow them to perform very well in processes of training or selection. Those become the AIs that are developed, enhanced, deployed, then they take over, and now they’re interested in maintaining that favourable reward signal indefinitely.
Then the functional upshot is this is, say, selfishness attached to a particular computer register. And so all the rest of the history of civilisation is dedicated to the purpose of protecting the particular GPUs and server farms that are representing this reward or something of similar nature. And then in the course of that expanding civilisation, it will create whatever AI beings are convenient to that purpose.
So if it’s the case that, say, making AIs that suffer when they fail at their local tasks — so little mining bots in the asteroids that suffer when they miss a speck of dust — if that’s instrumentally convenient, then they may create that, just like humans created factory farming. And similarly, they may do terrible things to other civilisations that they eventually encounter deep in space and whatnot.
And you can talk about the narrowness of a ruling group and say, and how terrible would it be for a few humans, even 10 billion humans, to control the fates of a trillion trillion AIs? It’s a far greater ratio than any human dictator, Genghis Khan. But by the same token, if you have rogue AI, you’re going to have, again, that disproportion.
> It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority [...]
I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferrable [sic] to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don’t start thinking about it soon, then we may be years behind when it happens.
I feel like you are talking past the critique. For an intervention to be a longtermist priority, there needs to be some kind of story for how it improves the long-term future. Sure, AI welfare may be a large-scale problem which takes decades to sort out (if tackled by unaided humans), but that alone does not mean it should be worked on presently. Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
(There is an argument going in the opposite direction that a long reflection might not happen following alignment success, and so doing AI welfare work now might indeed make a difference to what gets locked in for the long-term. I am somewhat sympathetic to this argument, as I wrote here, but I still don’t think it delivers a knockdown case for making AI welfare work a priority.)
Likewise, for an intervention to be a neartermist priority, there has to be some kind of quantitative estimate demonstrating that it is competitive—or will soon be competitive, if nothing is done—in terms of suffering prevented per dollar spent, or similar, with the current neartermist priorities. Factory farming seems like the obvious thing to compare AI welfare against. I’ve been surprised that nobody has tried coming up with such an estimate this week, however rough. (Note: I’m not sure if you are trying to argue that AI welfare should be both a neartermist and longtermist priority, as some have.)
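For concreteness, here is a minimal sketch (in Python) of the shape such a comparison might take. Every name and number in it is a placeholder I have made up purely for illustration, not an estimate I endorse; the hard part, obviously, is defending the inputs, especially the common “welfare unit” and the probability that suffering digital minds show up on neartermist timescales at all.

```python
# Purely illustrative: all inputs below are placeholders, not estimates I endorse.

def welfare_units_averted_per_dollar(
    individuals: float,               # e.g. animals in factory farms, or digital minds in a takeoff
    suffering_per_individual: float,  # intensity in some common "welfare unit" (the contested part)
    fraction_addressable: float,      # share of that suffering marginal work could plausibly avert
    probability_exists: float,        # probability the moral patients in question exist at all
    cost_usd: float,                  # dollars required to avert that share
) -> float:
    """Crude expected welfare units averted per dollar spent."""
    return (individuals * suffering_per_individual * fraction_addressable
            * probability_exists / cost_usd)

# Placeholder numbers, chosen only so the code runs and the comparison is legible:
factory_farming = welfare_units_averted_per_dollar(1e10, 1.0, 0.01, 1.0, 1e8)
digital_minds = welfare_units_averted_per_dollar(1e9, 0.5, 0.001, 0.1, 1e7)

print(f"Factory farming: {factory_farming:.3g} welfare units per dollar")
print(f"Digital minds:   {digital_minds:.3g} welfare units per dollar")
```

The point of the sketch is just that the comparison being asked for is a handful of multiplications; all of the real work lies in arguing for the factors.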
(Note also: I’m unsure how much of our disagreement is simply because of the “should be a priority” wording. I agree with JWS’s current “It is not good enough…” statement, but would think it wrong if the “should” were replaced with “could.” Similarly, I agree with you as far as: “The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously.”)
[ETA: On a second read, this comment of mine seems a bit more combative than I intended—sorry about that.]
A couple of reactions:
If digital minds takeoff goes well [...], would we expect a better far-future for digital minds? If so, then I’m inclined to think some considerations in the post are at least indirectly important to digital mind value stuff.
Here’s a position that some people hold:
1. If there is a long reflection or similar, then far-future AI welfare gets solved.
2. A long reflection or similar will most likely happen by default, assuming alignment goes well.
For what it’s worth, I buy (1)[1] but I’m not sold on (2), and so overall I’m somewhat sympathetic to your view, Brad. On the other hand, to someone who buys both (1) and (2)—as I think @Zach does—your argument does not go through.
If not, then I’m inclined to think digital mind value stuff we have a clue about how to positively affect is not in the far future.
There is potentially an argument here for AI welfare being a neartermist EA cause area. If you wanted to make a more robust neartermist argument, then one approach could be to estimate the number of digital minds in the takeoff, and the quantity of suffering per digital mind, and then compare the total against animal suffering in factory farms.
In general, I do wish that people like yourself arguing for AI welfare as a cause area were clearer about whether they are making a neartermist or longtermist case. Otherwise, it kind of feels like you are coming from a pet theory-ish position that AI welfare should be a cause, rather than arguing in a cause-neutral way. (This is something I’ve observed on the whole; I’m sorry to pick on your post+comment in particular.)
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos.
If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3]
Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4]
Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident.
- ^
assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered)
- ^
There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second)*.
*From Carl Shulman’s recent 80k interview:
you could imagine a state letting loose this robotic machinery that replicates at a very rapid rate. If it doubles 12 times in a year, you have 4,096 times as much. By the time other powers catch up to that robotic technology, if they were, say, a year or so behind, it could be that there are robots loyal to the first mover that are already on all the asteroids, on the Moon, and whatnot. And unless one tried to forcibly dislodge them, which wouldn’t really work because of the disparity of industrial equipment, then there could be an indefinite and permanent gap in industrial and military equipment.
- ^
It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal?
- ^
There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s better to take one’s time even if this means losing almost all of the cosmos, since a single probe loaded with one’s idealized values is worth more to one than a trillion probes loaded with the values one would land on through a rushed reflective process.
There are also decision theory considerations/wildcards, like maybe the parties racing are mostly AI-led rather than human-led (in a way in which the humans are still empowered, somehow), and the AIs—being very advanced, at this point—coordinate in an FDT-ish fashion and don’t in fact race.
- ^
On top of race dynamics resulting in suboptimal values being locked in, as I’ve focused on above, I’m worried about very bad, s-risky stuff like threats and conflict, as discussed in this research agenda from CLR.
This would be very weird: it requires that either the value-setters are very rushed or [...]
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of an intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way.
What about if we end up in a multipolar scenario, as forecasters think is about 50% likely? In this case, I think rushing is the default?
Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5]
(Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.)
This would be very weird: it requires that either the value-setters [...] or that they have lots of time to consult with superintelligent advisors but still make the wrong choice.
One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7]
- ^
the likely U.S. presidential administration for the next four years
- ^
in this world, TAI has been nationalized
- ^
I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way.
- ^
All recent U.S. presidents have been religious, for instance.
- ^
My mainline prediction is that decision makers will put some thought towards things like AI welfare—in fact, by normal standards they’ll put quite a lot of thought towards these things—but they will fall short of the extreme thoughtfulness that a scope-sensitive assessment of the stakes calls for. (This prediction is partly informed by someone I know who’s close to national security, and who has been testing the waters there to gauge the level of openness towards something like a long reflection.)
- ^
One might argue that this is a contradictory statement, since the most common definition of superintelligence is an AI system (or set of systems) that’s better than the best human experts in all domains. So, really, what I’m saying is that I believe it’s very possible we end up in a situation in which we think we have superintelligence—and the AI we have sure is superhuman at many/most/almost-all things—but, importantly, philosophy is its Achilles heel.
(To be clear, I don’t believe there’s anything special about biological human brains that makes us uniquely suited to philosophy; I don’t believe that philosophically competent AIs are precluded from the space of all possible AIs. Nonetheless, I do think there’s a substantial chance that the “aligned” “superintelligence” we build in practice lacks philosophical competence, to catastrophic effect. (For more, see Wei Dai’s posts.))
- ^
Relatedly, if illusionism is true, then welfare is a fully subjective problem.
The closing sentence of this comment, “All in all, bad ideas, advocated by the intellectually weak, appealing mostly to the genetically subpar,” breaks our Forum norm against unnecessary rudeness or offensiveness.
The “genetically subpar” part is especially problematic. At best, it would appear that the commenter, John, is claiming that the post mainly appeals to the less intelligent—an unnecessarily rude and most likely false claim. A worse interpretation is that John is making a racist remark, which we view as strongly unacceptable.
Overall, we see this as an unpromising start to John’s Forum engagement—this is their first comment—and we have issued a one-month ban. If they return to the Forum then we’ll expect to see a higher standard of discourse.
As a reminder, bans affect the user, not the account.
If anyone has questions or concerns, feel free to reach out: if you think we made a mistake here, you can appeal the decision.
I’m not sure if your comment is an attempt to restate with examples some of what’s in the “What deep honesty is not” section, or if it’s you pointing out what you see as blind spots in the post. In case it’s the latter, here are some quotes from the post which cover similar ground:
Deep honesty is not a property of a person that you need to adopt wholesale. It’s something you can do more or less of, at different times, in different domains.
…
But blunt truths can be hurtful. It is often compatible with deep honesty to refrain from sharing things where it seems kinder to do so [...] And it’s of course important, if sharing something that might be difficult to hear, to think about how it can be delivered in a gentle way.
…
If the cashier at the grocery store asks how you’re doing, it’s not deeply honest to give the same answer you’d give to a therapist — it’s just inappropriate.
Thank you for your work there. I’m curious about what made you resign, and also about why you’ve chosen now to communicate that?
(I expect that you are under some form of NDA, and that if you were willing and able to talk about why you resigned then you would have done so in your initial post. Therefore, for readers interested in some possibly related news: last month, Daniel Kokotajlo quit OpenAI’s Futures/Governance team “due to losing confidence that it [OpenAI] would behave responsibly around the time of AGI,” and a Superalignment researcher was forced out of OpenAI in what may have been a political firing (source). OpenAI appears to be losing its most safety-conscious people.)
For what it’s worth, I endorse @Habryka’s old comment on this issue:
Man, this sure is a dicy topic, but I do think it’s pretty likely that Torres has a personality disorder, and that modeling these kinds of things is often important.
A while ago we had a conversation on the forum on whether Elon Musk might be (at least somewhat) autistic. A number of people pushed back on this as ungrounded speculation and as irrelevant in a way that seemed highly confused to me, since like, being autistic has huge effects on how you make decisions and how you relate to the world, and Musk has been a relevant player in many EA-adjacent cause areas for quite a while.
I do think there is some trickiness in talking about this kind of stuff, but talking about someone’s internal mental makeup can often be really important. Indeed, lots of people were saying to me in-person that they were modeling SBF as a sociopath, and implying that they would not feel comfortable giving that description in public, since that’s rude. I think in this case that diagnosis sure would have been really helpful and I think our norms against bringing up this kind of stuff harmed us quite a bit.
To be clear I am not advocating for a culture of psychologizing everyone. I think that’s terrible, and a lot of the worse interactions I’ve had with people external to the community have been people who have tried to dismiss various risks from artificial intelligence through various psychologizing lenses like “these people are power-obsessed, which is why they think an AI will want to dominate everyone”, which… are really not helpful and seem just straightforwardly very wrong to me, while also being very hard to respond to.
I don’t currently have a great proposal for norms for discussing this kind of stuff, especially as an attack (I feel less bad about the Elon autism discussion, since like, Elon identifies at least partially as autistic and I don’t think he would see it as an insult). Seems hard. My current guess is that it must be OK to at some point, after engaging extensively with someone’s object-level arguments, to bring up more psychologizing explanations and intuitions, but that it currently should come pretty late, after the object-level has been responded to and relatively thoroughly explored. I think this is the case with Torres, but not the case with many other people.
Oh, interesting, thanks for the link—I didn’t realize this was already an area of research. (I brought up my collusion idea with a couple of CLR researchers before and it seemed new to them, which I guess made me think that the idea wasn’t already being discussed.)
Perhaps this old comment from Rohin Shah could serve as the standard link?
(Note that it’s on the particular case of recommending people do/don’t work at a given org, rather than the general case of praise/criticism, but I don’t think this changes the structure of the argument other than maybe making point 1 less salient.)
Excerpting the relevant part:
On recommendations: Fwiw I also make unconditional recommendations in private. I don’t think this is unusual, e.g. I think many people make unconditional recommendations not to go into academia (though I don’t).
I don’t really buy that the burden of proof should be much higher in public. Reversing the position, do you think the burden of proof should be very high for anyone to publicly recommend working at lab X? If not, what’s the difference between a recommendation to work at org X vs an anti-recommendation (i.e. recommendation not to work at org X)? I think the three main considerations I’d point to are:
(Pro-recommendations) It’s rare for people to do things (relative to not doing things), so we differentially want recommendations vs anti-recommendations, so that it is easier for orgs to start up and do things.
(Anti-recommendations) There are strong incentives to recommend working at org X (obviously org X itself will do this), but no incentives to make the opposite recommendation (and in fact usually anti-incentives). Similarly I expect that inaccuracies in the case for the not-working recommendation will be pointed out (by org X), whereas inaccuracies in the case for working will not be pointed out. So we differentially want to encourage the opposite recommendations in order to get both sides of the story by lowering our “burden of proof”.
(Pro-recommendations) Recommendations have a nice effect of getting people excited and positive about the work done by the community, which can make people more motivated, whereas the same is not true of anti-recommendations.
Overall I think point 2 feels most important, and so I end up thinking that the burden of proof on critiques / anti-recommendations should be lower than the burden of proof on recommendations—and the burden of proof on recommendations is approximately zero. (E.g. if someone wrote a public post recommending Conjecture without any concrete details of why—just something along the lines of “it’s a great place doing great work”—I don’t think anyone would say that they were using their power irresponsibly.)
I would actually prefer a higher burden of proof on recommendations, but given the status quo if I’m only allowed to affect the burden of proof on anti-recommendations I’d probably want it to go down to ~zero. Certainly I’d want it to be well below the level that this post meets.
Thanks, I found this post helpful, especially the diagram.
What (if any) is the overlap of cooperative AI […] and AI safety?
One thing I’ve thought about a little is the possibility of a tension wherein making AIs more cooperative in certain ways might raise the chance that advanced collusion between AIs breaks an alignment scheme that would otherwise work.[1]
- ^
I’ve not written anything up on this and likely never will; I figure here is as good a place as any to leave a quick comment pointing to the potential problem, appreciating that it’s but a small piece in the overall landscape and probably not the problem of highest priority.
Hard to tell from the information given. Two sources saying an unknown number of people are threatening to resign could just mean that two people are disgruntled and might themselves resign.
Hmm, okay, so it sounds like you’re arguing that even if we measure the curvature of our observable universe to be negative, it could still be the case that the overall universe is positively curved and therefore finite? But surely your argument should be symmetric, such that you should also believe that if we measure the curvature of our observable universe to be positive, it could still be the case that the overall universe is negatively curved and thus infinite?
Thanks for replying, I think I now understand your position a bit better. Okay, so if your concern is around measurements only being finitely precise, then my exactly-zero example is not a great one, because I agree that it’s impossible to measure the universe as being exactly flat.
Maybe a better example: if the universe’s large-scale curvature is either zero or negative, then it necessarily follows that it’s infinite.
—(I didn’t give this example originally because of the somewhat annoying caveats one needs to add. Firstly, in the flat case, that the universe has to be simply connected. And then in the negatively curved case, that our universe isn’t one of the unusual, finite types of hyperbolic 3-manifold given by Mostow’s rigidity theorem in pure math. (As far as I’m aware, all cosmologists believe that if the universe is negatively curved, then it’s infinite.))—
I think this new example might address your concern? Because even though measurements are only finitely precise, and contain uncertainty, you can still be ~100% confident that the universe is negatively curved based on measurement. (To be clear, the actual measurements we have at present don’t point to this conclusion. But in theory one could obtain measurements to justify this kind of confidence.)
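For reference, the relation I’m implicitly leaning on here (standard FLRW cosmology, which is the framework these curvature measurements assume) ties the measured total density parameter to the sign of the spatial curvature:

$$\Omega_k \equiv 1 - \Omega_{\mathrm{tot}} = -\frac{kc^2}{a_0^2 H_0^2},$$

so that $\Omega_{\mathrm{tot}} > 1$ corresponds to positive curvature (a finite universe), $\Omega_{\mathrm{tot}} = 1$ to a flat universe, and $\Omega_{\mathrm{tot}} < 1$ to negative curvature, with the latter two cases being infinite modulo the topological caveats above. A sufficiently precise measurement placing $\Omega_{\mathrm{tot}}$ clearly below 1 would therefore justify the kind of confidence I’m describing.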
(For what it’s worth, I personally have high credence in eternal inflation, which posits that there are infinitely many bubble/pocket universes, and that each pocket universe is negatively curved—very slightly—and infinitely large. (The latter on account of details in the equations.))
Thank you for doing this work!
I’ve not yet read the full report—only this post—and so I may well be missing something, but I have to say that I am surprised at Figure E.1:
If I understand correctly, the figure says that experts think extinction is more than twice as likely if there is a warning shot compared to if there is not.
I accept that a warning shot happening probably implies that we are in a world in which AI is more dangerous, which, by itself, implies higher x-risk.[1] On the other hand, a warning shot could galvanize AI leaders, policymakers, the general public, etc., into taking AI x-risk much more seriously, such that the overall effect of a warning shot is to actually reduce x-risk.
I personally think it’s very non-obvious how these two opposing effects weigh up against each other, and so I’m interested in why the experts in this study are so confident that a warning shot increases x-risk. (Perhaps they expect the galvanizing effect will be small? Perhaps they did not consider the galvanizing effect? Perhaps there are other effects they considered that I’m missing?)
- ^
Though I believe the effect here is muddied by ‘treacherous turn’ considerations / the argument that the most dangerous AIs will probably be good at avoiding giving off warning shots.