I think your argument could go through if the person being praised were Holden, or Will MacAskill, or some other big name in EA. However, Michael seems pretty under the radar given the size of his contributions, so I don’t think your concerns check out in this case (and in fact this case might even align with your point about recognizing unsung contributors).
+1! I especially appreciate how Michael often writes very detailed responses as part of extended back-and-forth-type comment threads. These contributions aren’t rewarded so much karma-wise, but I think they’re extremely valuable: I personally owe a good deal of what I understand about welfare, moral uncertainty, infinite ethics and theories of consciousness to comments I’ve read by Michael.
Animal welfare getting so little[1] EA funding, at present, relative to global health, seems to be an artefact of Open Phil’s ‘worldview diversification,’ which imo is a lacklustre framework for decision-making, both in theory and (especially) in practice: see, e.g., Sempere (2022).
Cost-effectiveness analyses I’ve seen indicate that animal welfare interventions, like cage-free campaigns, are really excellent uses of money—orders of magnitude more effective than leading global health interventions.
Though not central to my argument, there’s also the meat-eater problem, which I think is under-discussed.
Responding to the bottom part of your second footnote:
To me, it seems pretty important for Forum Team members (especially the interim lead!) to be communicating with Forum users. I therefore think it’s a mistake for you to assign zero value to your posts and comments, relative to your other work.
How much value to assign to one of your posts or comments? I would crudely model this as:
(Size of your post’s or comment’s contribution)/(Size of all Forum contributions in a year) x (Forum’s total value per year)
You’ll have better-informed figures/estimates than I do, but I’d guess that the size of all Forum contributions, measured in karma,[1] in a year, is around 100,000, and that the value of the Forum per year is around $10M.[2] A thoughtful comment might get 10 karma, on average, and a thoughtful post might get 50.
I’d therefore roughly value a comment from you at (10 karma)/(100,000 karma) x $10M = $1000, and a post from you at $5000.
(My model may well be off in some way; I invite readers to improve on it.)
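For anyone who wants to tweak the numbers, here is the same BOTEC as a few lines of Python (the karma and dollar figures are just the rough assumptions above, not data):

```python
# Crude BOTEC: a contribution's value is its share of yearly Forum karma,
# multiplied by the Forum's assumed total yearly value.

TOTAL_FORUM_KARMA_PER_YEAR = 100_000   # assumed size of all Forum contributions in a year, in karma
FORUM_VALUE_PER_YEAR_USD = 10_000_000  # assumed total value of the Forum per year (~$10M)

def contribution_value_usd(karma: float) -> float:
    """Dollar value of a single post or comment, proportional to its karma share."""
    return karma / TOTAL_FORUM_KARMA_PER_YEAR * FORUM_VALUE_PER_YEAR_USD

print(contribution_value_usd(10))  # thoughtful comment -> 1000.0
print(contribution_value_usd(50))  # thoughtful post    -> 5000.0
```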
- ^
I don’t take karma to be a perfect measure of value by any means—see, e.g., ‘Karma overrates some topics’—but I think it’s a reasonable-enough measure for carrying out this BOTEC.
- ^
Why $10M? Mostly I’m working off the value of an ‘EA project’ as estimated in cell C4 of this spreadsheet by Nuño. (This was the accompanying post.)
+1. While I applaud the authors for doing this work at all, and share their hopes regarding automated forecasting, by my lights the opening paragraphs massively overstate their bot’s ability.
The moderators have reviewed the decision to ban @dstudiocode after users appealed the decision. Tl;dr: We are revoking the ban, and are instead rate-limiting dstudiocode and warning them to avoid posting content that could be perceived as advocating for major harm or illegal activities. The rate limit is due to dstudiocode’s pattern of engagement on the Forum, not simply because of their most recent post—for more on this, see the “third consideration” listed below.
More details:
Three moderators,[1] none of whom was involved in the original decision to ban dstudiocode, discussed this case.
The first consideration was “Does the cited norm make sense?” For reference, the norm cited in the original ban decision was “Materials advocating major harm or illegal activities, or materials that may be easily perceived as such” (under “What we discourage (and may delete or edit out)” in our “Guide to norms on the Forum”). The panel of three unanimously agreed that having some kind of Forum norm in this vein makes sense.
The second consideration was “Does the post that triggered the ban actually break the cited norm?” For reference, the post ended with the question “should murdering a meat eater be considered ‘ethical’?” (Since the post was rejected by moderators, users cannot see it.[2] We regret the confusion caused by us not making this point clearer in the original ban message.)
There was disagreement amongst the moderators involved in the appeal process about whether or not the given post breaks the norm cited above. I personally think that the post is acceptable since it does not constitute a call to action. The other two moderators see the post as breaking the norm; they see the fact that it is “just” a philosophical question as not changing the assessment.[3] (Note: The “meat-eater problem” has been discussed elsewhere on the Forum. Unlike the post in question, in the eyes of the given two moderators, these posts did not break the “advocating for major harm or illegal activities” norm because they framed the question as about whether to donate to save the life of a meat-eating person, rather than as about actively murdering people.)
Between the two appeals-panel moderators who see the post as norm-breaking, there was disagreement about whether the correct response would be a temporary ban or just a warning.
The third consideration was around dstudiocode’s other actions and general standing on the Forum. dstudiocode currently sits at −38 karma following 8 posts and 30 comments. This indicates that their contributions to the discourse have generally not been helpful.[4] Accordingly, all three moderators agreed that we should be more willing to (temporarily) ban dstudiocode for a potential norm violation.
dstudiocode has also tried posting very similar, low-quality (by our lights) content multiple times. The post that triggered the ban was similar to, though more “intense” than, this other post of theirs from five months ago. Additionally, they tried posting similar content through an alt account just before their ban. When a Forum team member asked them about their alt, they appeared to lie.[5] All three moderators agreed that this repeated posting of very similar, low-quality content warrants at least a rate limit (i.e., a cap on how much the user in question can post or comment).[6] (For context, eight months ago, dstudiocode published five posts in an eight-day span, all of which were low quality, in our view. We would like to avoid a repeat of that situation: a rate limit or a ban are the tools we could employ to this end.) Lying about their alt also makes us worried that the user is trying to skirt the rules.
Overall, the appeals panel is revoking dstudiocode’s ban, and is replacing the ban with a warning (instructing them to avoid posting content that could be perceived as advocating for major harm or illegal activities) and a rate limit. dstudiocode will be limited to at most one comment every three days and one post per week for the next three weeks—i.e., until their original ban would have ended. Moderators will be keeping an eye on their posting, and will remove their posting rights entirely if they continue to publish content that we consider sufficiently low quality or norm-bending.
We would like to thank @richard_ngo and @Neel Nanda for appealing the original decision, as well as @Jason and @dirk for contributing to the discussion. We apologize that the original ban notice was rushed, and failed to lay out all the factors that went into the decision.[7] (Reasoning along the lines of the “third consideration” given above went into the original decision, but we failed to communicate that.)
If anyone has questions or concerns about how we have handled the appeals process, feel free to comment below or reach out.
- ^
Technically, two moderators and one moderation advisor. (I write “three moderators” in the main text because that makes referring to them, as I do throughout the text, less cumbersome.)
- ^
The three of us discussed whether or not to quote the full version of the post that triggered the ban in this moderator comment, to allow users to see exactly what is being ruled on. By split decision (with me as the dissenting minority), we have decided not to do so: in general, we will probably avoid republishing content that is objectionable enough to get taken down in the first place.
- ^
I’m not certain, but my guess is that the disagreement here is related to the high vs. low decoupling spectrum (where high decouplers, like myself, are fine with entertaining philosophical questions like these, whereas low decouplers tend to see such questions as crossing a line).
- ^
We don’t see karma as a perfect measure of a user’s value by any means, but we do consider a user’s total karma being negative to be a strong signal that something is awry.
Looking through dstudiocode’s post and comment history, I do think that they are trying to engage in good faith (as opposed to being a troll, say). However, the EA Forum exists for a particular purpose, and has particular standards in place to serve that purpose, and this means that the Forum is not necessarily a good place for everyone who is trying to contribute. (For what it’s worth, I feel a missing mood in writing this.)
- ^
In response to our request that they stop publishing similar content from multiple accounts, they said: “Posted from multiple accounts? I feel it is possible that the same post may have been created because maybe the topic is popular?” However, we are >99% confident, based on our usual checks for multiple account use, that the other account that tried to publish this similar content is an alt controlled by them. (They did subsequently stop trying to publish from other accounts.)
- ^
We do not have an official policy on rate limits, at present, although we have used rate limits on occasion. We aim to improve our process here. In short, rate limits may be a more appropriate intervention than bans are for users who aren’t clearly breaking norms, but who are nonetheless posting low-quality content or repeatedly testing the edges of the norms.
- ^
Notwithstanding the notice we published, which was a mistake, I am not sure if the ban decision itself was a mistake. It turns out that different moderators have different views on the post in question, and I think the difference between the original decision to ban and the present decision to instead warn and rate limit can mostly be chalked up to reasonable disagreement between different moderators. (We are choosing to override the original decision since we spent significantly longer on the review, and we therefore have more confidence in the review decision being “correct”. We put substantial effort into the review because established users, in their appeal, made some points that we felt deserved to be taken seriously. However, this level of effort would not be tenable for most “regular” moderation calls—i.e., those involving unestablished or not-in-great-standing users, like dstudiocode—given the tradeoffs we face.)
For operations roles, and focusing on impact (rather than status), I notice that your view contrasts markedly with @abrahamrowe’s in his recent ‘Reflections on a decade of trying to have an impact’ post:
Impact Through Operations
I don’t really think my ops work is particularly impactful, because I think ops staff are relatively easy to hire for compared to other roles. However I have spent a lot of my time in EA doing ops work.
I was RP’s COO for 4 years, overseeing its non-research work (fiscal sponsorship, finance, HR, communications, fundraising, etc), and helping the organization grow from around 10 to over 100 staff within its legal umbrella.
Worked on several advising and consulting projects for animal welfare and AI organizations
I think the advising work is likely the most impactful ops work I’ve done, though I overall don’t know if I think ops is particularly impactful.
I see both Abraham and yourself as strong thinkers with expertise in this area, which makes me curious about the apparent disagreement. Meanwhile, the ‘correct’ answer to the question of an ops role’s impact relative to that of a research role should presumably inform many EAs’ career decisions, which makes the disagreement here pretty consequential. I wonder if getting to the ground truth of the matter is tractable? (I’m not sure how best to operationalize the disagreement / one’s starting point on the matter, but maybe something like “On the current margin, I believe that the ratio of early-career EAs aiming for operations vs. research roles should be [number]:1.”)
(I understand that you and Abraham overlapped for multiple years at the same org—Rethink Priorities—which makes me all the more curious about how you appear to have reached fairly opposite conclusions.)
Thank you for doing this work!
I’ve not yet read the full report—only this post—and so I may well be missing something, but I have to say that I am surprised at Figure E.1:
If I understand correctly, the figure says that experts think extinction is more than twice as likely if there is a warning shot compared to if there is not.
I accept that a warning shot happening probably implies that we are in a world in which AI is more dangerous, which, by itself, implies higher x-risk.[1] On the other hand, a warning shot could galvanize AI leaders, policymakers, the general public, etc., into taking AI x-risk much more seriously, such that the overall effect of a warning shot is to actually reduce x-risk.
I personally think it’s very non-obvious how these two opposing effects weigh up against each other, and so I’m interested in why the experts in this study are so confident that a warning shot increases x-risk. (Perhaps they expect the galvanizing effect will be small? Perhaps they did not consider the galvanizing effect? Perhaps there are other effects they considered that I’m missing?)
- ^
Though I believe the effect here is muddied by ‘treacherous turn’ considerations / the argument that the most dangerous AIs will probably be good at avoiding giving off warning shots.
For what it’s worth, I was reminded of Jessica Taylor’s account of collective debugging and psychoses as I read that part of the transcript. (Rather than trying to quote pieces of Jessica’s account, I think it’s probably best that I just link to the whole thing as well as Scott Alexander’s response.)
‘Five Years After AGI’ Focus Week happening over at Metaculus.
Inspired in part by the EA Forum’s recent debate week, Metaculus is running a “focus week” this week, aimed at trying to make intellectual progress on the issue of “What will the world look like five years after AGI (assuming that humans are not extinct)[1]?”
Leaders of AGI companies, while vocal about some things they anticipate in a post-AGI world (for example, bullishness in AGI making scientific advances), seem deliberately vague about other aspects. For example, power (will AGI companies have a lot of it? all of it?), whether some of the scientific advances might backfire (e.g., a vulnerable world scenario or a race-to-the-bottom digital minds takeoff), and how exactly AGI will be used for “the benefit of all.”
Forecasting questions for the week range from “Percentage living in poverty?” to “Nuclear deterrence undermined?” to “‘Long reflection’ underway?”
Those interested: head over here. You can participate by:
Forecasting
Commenting
Writing questions
There may well be some gaps in the admin-created question set.[4] We welcome question contributions from users.
The focus week will likely be followed by an essay contest, since a large part of the value in this initiative, we believe, lies in generating concrete stories for how the future might play out (and for what the inflection points might be). More details to come.[5]
- ^
This is not to say that we firmly believe extinction won’t happen. I personally put p(doom) at around 60%. At the same time, however, as I have previously written, I believe that more important trajectory changes lie ahead if humanity does manage to avoid extinction, and that it is worth planning for these things now.
- ^
Moreover, I personally take Nuño Sempere’s “Hurdles of using forecasting as a tool for making sense of AI progress” piece seriously, especially the “Excellent forecasters and Superforecasters™ have an imperfect fit for long-term questions” part.
With short-term questions on things like geopolitics, I think one should just basically defer to the Community Prediction. Conversely, with certain long-term questions I believe it’s important to interrogate how forecasters are reasoning about the issue at hand before assigning their predictions too much weight. Forecasters can help themselves by writing comments that explain their reasoning.
- ^
In addition, stakeholders we work with, who look at our questions with a view to informing their grantmaking, policymaking, etc., frequently say that they would find more comments valuable in helping bring context to the Community Prediction.
- ^
All blame on me, if so.
- ^
Update: I ended up leaving Metaculus fairly soon after writing this post. I think that means the essay contest is less likely to happen, but I guess stay tuned in case it does.
as power struggles become larger-scale, more people who are extremely good at winning them will become involved. That makes AI safety strategies which require power-seeking more difficult to carry out successfully.
How can we mitigate this issue? Two things come to mind. Firstly, focusing more on legitimacy [...] Secondly, prioritizing competence.
A third way to potentially mitigate the issue is to simply become more skilled at winning power struggles. Such an approach would be uncooperative, and therefore undesirable in some respects, but on balance, to me, seems worth pursuing to at least some degree.
… I realize that you, OP, have debated a very similar point before (albeit in a non-AI safety thread)—I’m not sure if you have additional thoughts to add to what you said there? (Readers can find that previous debate/exchange here.)
Oh, sorry, I see now that the numberings I used in my second comment don’t map onto how I used them in my first one, which is confusing. My bad.
Your last two paragraphs are very informative to me.
I think digital minds takeoff going well (again, for digital minds and with respect to existential risk) makes it more likely that alignment goes well. [...] In taking alignment going well to be sensitive to how takeoff goes, I am denying that alignment going well is something we should treat as given independently of how takeoff goes.
This is interesting; by my lights this is the right type of argument for justifying AI welfare being a longtermist cause area (which is something that I felt was missing from the debate week). If you have time, I would be keen to hear how you see digital minds takeoff going well as aiding in alignment.[1]
[stuff about nudging AIs away from having certain preferences, etc., being within the AI welfare cause area’s purview, in your view]
Okay, interesting, makes sense.
Thanks a lot for your reply, your points have definitely improved my understanding of AI welfare work!
- ^
One thing I’ve previously been cautiously bullish about as an underdiscussed wildcard is the kinda sci-fi approach of getting to human mind uploading (or maybe just regular whole brain emulation) before prosaic AGI, and then letting the uploaded minds—which could be huge in number and running much faster than wall clock time—solve alignment. However, my Metaculus question on this topic indicates that such a path to alignment is very unlikely.
I’m not sure if the above is anything like what you have in mind? (I realize that human mind uploading is different to the thing of LLMs or other prosaic AI systems gaining consciousness (and/or moral status), and that it’s the latter that is more typically the focus of digital minds work (and the focus of your post, I think). So, on second thoughts, I imagine your model for the relationship between digital minds takeoff and alignment will be something different.)
I had material developed for other purposes [...] But the material wasn’t optimized for addressing whether AI welfare should be a cause area, and optimizing it for that didn’t strike me as the most productive way for me to engage given my time constraints.
Sounds very reasonable. (Perhaps it might help to add a one-sentence disclaimer at the top of the post, to signpost for readers what the post is vs. is not trying to do? This is a weak suggestion, though.)
I don’t see how buying (1) and (2) undermines the point I was making. If takeoff going well makes the far future go better in expectation for digital minds, it could do so via alignment or via non-default scenarios.
I feel unsure about what you are saying, exactly, especially the last part. I’ll try saying some things in response, and maybe that helps locate the point of disagreement…
(… also feel free to just bow out of this thread if you feel like this is not productive…)
In the case that alignment goes well and there is a long reflection—i.e., (1) and (2) turn out true—my position is that doing AI welfare work now has no effect on the future, because all AI welfare stuff gets solved in the long reflection. In other words, I think that “takeoff going well makes the far future go better in expectation for digital minds” is an incorrect claim in this scenario. (I’m not sure if you are trying to make this claim.)
In the case that alignment goes well but there is no long reflection—i.e., (1) turns out true but (2) turns out false—my position is that doing AI welfare work now might make the far future go better for digital minds. (And thus in this scenario I think some amount of AI welfare work should be done now.[1]) Having said this, in practice, in a world in which (2), whether or not a long reflection happens, could go either way, I view trying to set up a long reflection as a higher-priority intervention than any one of the things we’d hope to solve in the long reflection, such as AI welfare or acausal trade.
In the case that alignment goes poorly, humans either go extinct or are disempowered. In this case, does doing AI welfare work now improve the future at all? I used to think the answer to this was “yes,” because I thought that better understanding sentience could help with designing AIs that avoid creating suffering digital minds.[2] However, I now believe that this basically wouldn’t work, and that something much hackier (and therefore lower cost) would work instead, like simply nudging AIs in their training to have altruistic/anti-sadistic preferences. (This thing of nudging AIs to be anti-sadistic is part of the suffering risk discourse—I believe it’s something that CLR works on or has worked on—and feels outside of what’s covered by the “AI welfare” field.)
- ^
Exactly how much should be done depends on things like how important and tractable digital minds stuff is relative to the other things on the table, like acausal trade, and to what extent the returns to working on each of these things are diminishing, etc.
- ^
Why would an AI create digital minds that suffer? One reason is that the AI could have sadistic preferences. A more plausible reason is that the AI is mostly indifferent about causing suffering, and so does not avoid taking actions that incidentally cause/create suffering. Carl Shulman explored this point in his recent 80k episode:
Rob Wiblin: Maybe a final question is it feels like we have to thread a needle between, on the one hand, AI takeover and domination of our trajectory against our consent — or indeed potentially against our existence — and this other reverse failure mode, where humans have all of the power and AI interests are simply ignored. Is there something interesting about the symmetry between these two plausible ways that we could fail to make the future go well? Or maybe are they just actually conceptually distinct?
Carl Shulman: I don’t know that that quite tracks. One reason being, say there’s an AI takeover, that AI will then be in the same position of being able to create AIs that are convenient to its purposes. So say that the way a rogue AI takeover happens is that you have AIs that develop a habit of keeping in mind reward or reinforcement or reproductive fitness, and then those habits allow them to perform very well in processes of training or selection. Those become the AIs that are developed, enhanced, deployed, then they take over, and now they’re interested in maintaining that favourable reward signal indefinitely.
Then the functional upshot is this is, say, selfishness attached to a particular computer register. And so all the rest of the history of civilisation is dedicated to the purpose of protecting the particular GPUs and server farms that are representing this reward or something of similar nature. And then in the course of that expanding civilisation, it will create whatever AI beings are convenient to that purpose.
So if it’s the case that, say, making AIs that suffer when they fail at their local tasks — so little mining bots in the asteroids that suffer when they miss a speck of dust — if that’s instrumentally convenient, then they may create that, just like humans created factory farming. And similarly, they may do terrible things to other civilisations that they eventually encounter deep in space and whatnot.
And you can talk about the narrowness of a ruling group and say, and how terrible would it be for a few humans, even 10 billion humans, to control the fates of a trillion trillion AIs? It’s a far greater ratio than any human dictator, Genghis Khan. But by the same token, if you have rogue AI, you’re going to have, again, that disproportion.
> It is not good enough to simply say that an issue might have a large scale impact and therefore think it should be an EA priority [...]
I think that this is wrong. The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously and provides prima facie evidence that it should be a priority. I think it is vastly preferrable [sic] to preempt problems before they occur rather than try to fix them once they have. For one, AI welfare is a very complicated topic that will take years or decades to sort out. AI persons (or things that look like AI persons) could easily be here in the next decade. If we don’t start thinking about it soon, then we may be years behind when it happens.
I feel like you are talking past the critique. For an intervention to be a longtermist priority, there needs to be some kind of story for how it improves the long-term future. Sure, AI welfare may be a large-scale problem which takes decades to sort out (if tackled by unaided humans), but that alone does not mean it should be worked on presently. Your points here do not engage with the argument, made by @Zach Stein-Perlman early on in the week, that we can just punt solving AI welfare to the future (i.e., to the long reflection / to once we have aligned superintelligent advisors), and in the meantime continue focusing our resources on AI safety (i.e., on raising the probability that we make it to a long reflection).
(There is an argument going in the opposite direction that a long reflection might not happen following alignment success, and so doing AI welfare work now might indeed make a difference to what gets locked in for the long-term. I am somewhat sympathetic to this argument, as I wrote here, but I still don’t think it delivers a knockdown case for making AI welfare work a priority.)
Likewise, for an intervention to be a neartermist priority, there has to be some kind of quantitative estimate demonstrating that it is competitive—or will soon be competitive, if nothing is done—in terms of suffering prevented per dollar spent, or similar, with the current neartermist priorities. Factory farming seems like the obvious thing to compare AI welfare against. I’ve been surprised by how nobody has tried coming up with such an estimate this week, however rough. (Note: I’m not sure if you are trying to argue that AI welfare should be both a neartermist and longtermist priority, as some have.)
(Note also: I’m unsure how much of our disagreement is simply because of the “should be a priority” wording. I agree with JWS’s current “It is not good enough…” statement, but would think it wrong if the “should” were replaced with “could.” Similarly, I agree with you as far as: “The fact that something might have a huge scale and we might be able to do something about it is enough for it to be taken seriously.”)
[ETA: On a second read, this comment of mine seems a bit more combative than I intended—sorry about that.]
A couple of reactions:
If digital minds takeoff goes well [...], would we expect a better far-future for digital minds? If so, then I’m inclined to think some considerations in the post are at least indirectly important to digital mind value stuff.
Here’s a position that some people hold:
1. If there is a long reflection or similar, then far-future AI welfare gets solved.
2. A long reflection or similar will most likely happen by default, assuming alignment goes well.
For what it’s worth, I buy (1)[1] but I’m not sold on (2), and so overall I’m somewhat sympathetic to your view, Brad. On the other hand, to someone who buys both (1) and (2)—as I think @Zach does—your argument does not go through.
If not, then I’m inclined to think digital mind value stuff we have a clue about how to positively affect is not in the far future.
There is potentially an argument here for AI welfare being a neartermist EA cause area. If you wanted to make a more robust neartermist argument, then one approach could be to estimate the number of digital minds in the takeoff, and the quantity of suffering per digital mind, and then compare the total against animal suffering in factory farms.
In general, I do wish that people like yourself arguing for AI welfare as a cause area were clearer about whether they are making a neartermist or longtermist case. Otherwise, it kind of feels like you are coming from a pet theory-ish position that AI welfare should be a cause, rather than arguing in a cause-neutral way. (This is something I’ve observed on the whole; I’m sorry to pick on your post+comment in particular.)
Yeah, I agree that it’s unclear how things get locked in in this scenario. However, my best guess is that solving the technological problem of designing and building probes that travel as fast as allowed by physics—i.e., just shy of light speed[1]—takes less time than solving the philosophical problem of what to do with the cosmos.
If one is in a race, then one is forced into launching probes as soon as one has solved the technological problem of fast-as-physically-possible probes (because delaying means losing the race),[2] and so in my best guess the probes launched will be loaded with values that one likely wouldn’t endorse if one had more time to reflect.[3]
Additionally, if one is in a race to build fast-as-physically-possible probes, then one is presumably putting most of one’s compute toward winning that race, leaving one with little compute for solving the problem of what values to load the probes with.[4]
Overall, I feel pretty pessimistic about a multipolar scenario going well,[5] but I’m not confident.
- ^
assuming that new physics permitting faster-than-light travel is ruled out (or otherwise not discovered)
- ^
There’s some nuance here: maybe one has a lead and can afford some delay. Also, the prize is continuous rather than discrete—that is, one still gets some of the cosmos if one launches late (although on account of how the probes reproduce exponentially, one does lose out big time by being second)*.
*From Carl Shulman’s recent 80k interview:
you could imagine a state letting loose this robotic machinery that replicates at a very rapid rate. If it doubles 12 times in a year, you have 4,096 times as much. By the time other powers catch up to that robotic technology, if they were, say, a year or so behind, it could be that there are robots loyal to the first mover that are already on all the asteroids, on the Moon, and whatnot. And unless one tried to forcibly dislodge them, which wouldn’t really work because of the disparity of industrial equipment, then there could be an indefinite and permanent gap in industrial and military equipment.
- ^
It’s very unclear to me how large this discrepancy is likely to be. Are the loaded values totally wrong according to one’s idealized self? Or are they basically right, such that the future is almost ideal?
- ^
There’s again some nuance here, like maybe one believes that the set of world-states/matter-configurations that would score well according to one’s idealized values is very narrow. In this case, the EV calculation could indicate that it’s better to take one’s time even if this means losing almost all of the cosmos, since a single probe loaded with one’s idealized values is worth more to one than a trillion probes loaded with the values one would land on through a rushed reflective process.
There are also decision theory considerations/wildcards, like maybe the parties racing are mostly AI-led rather than human-led (in a way in which the humans are still empowered, somehow), and the AIs—being very advanced, at this point—coordinate in an FDT-ish fashion and don’t in fact race.
- ^
On top of race dynamics resulting in suboptimal values being locked in, as I’ve focused on above, I’m worried about very bad, s-risky stuff like threats and conflict, as discussed in this research agenda from CLR.
This would be very weird: it requires that either the value-setters are very rushed or [...]
As an intuition pump: if the Trump administration,[1] or a coalition of governments led by the U.S., is faced all of a sudden—on account of intelligence explosion[2] plus alignment going well—with deciding what to do with the cosmos, will they proceed thoughtfully or kind of in a rush? I very much hope the answer is “thoughtfully,” but I would not bet[3] that way.
What about if we end up in a multipolar scenario, as forecasters think is about 50% likely? In this case, I think rushing is the default?
Pausing for a long reflection may be the obvious path to you or me or EAs in general if suddenly in charge of an aligned ASI singleton, but the way we think is very strange compared to most people in the world.[4] I expect that without a good deal of nudging/convincing, the folks calling the shots will not opt for such reflection.[5]
(Note that I don’t consider this a knockdown argument for putting resources towards AI welfare in particular: I only voted slightly in the direction of “agree” for this debate week. I do, however, think that many more EA resources should be going towards ASI governance / setting up a long reflection, as I have written before.)
This would be very weird: it requires that either the value-setters [...] or that they have lots of time to consult with superintelligent advisors but still make the wrong choice.
One thread here that feels relevant: I don’t think it’s at all obvious that superintelligent advisors will be philosophically competent.[6] Wei Dai has written a series of posts on this topic (which I collected here); this is an open area of inquiry that serious thinkers in our sphere are funding. In my model, this thread links up with AI welfare since welfare is in part an empirical problem, which superintelligent advisors will be great at helping with, but also in part a problem of values and philosophy.[7]
- ^
the likely U.S. presidential administration for the next four years
- ^
in this world, TAI has been nationalized
- ^
I apologize to Nuño, who will receive an alert, for not using “bet” in the strictly correct way.
- ^
All recent U.S. presidents have been religious, for instance.
- ^
My mainline prediction is that decision makers will put some thought towards things like AI welfare—in fact, by normal standards they’ll put quite a lot of thought towards these things—but they will fall short of the extreme thoughtfulness that a scope-sensitive assessment of the stakes calls for. (This prediction is partly informed by someone I know who’s close to national security, and who has been testing the waters there to gauge the level of openness towards something like a long reflection.)
- ^
One might argue that this is a contradictory statement, since the most common definition of superintelligence is an AI system (or set of systems) that’s better than the best human experts in all domains. So, really, what I’m saying is that I believe it’s very possible we end up in a situation in which we think we have superintelligence—and the AI we have sure is superhuman at many/most/almost-all things—but, importantly, philosophy is its Achilles heel.
(To be clear, I don’t believe there’s anything special about biological human brains that makes us uniquely suited to philosophy; I don’t believe that philosophically competent AIs are precluded from the space of all possible AIs. Nonetheless, I do think there’s a substantial chance that the “aligned” “superintelligence” we build in practice lacks philosophical competence, to catastrophic effect. (For more, see Wei Dai’s posts.))
- ^
Relatedly, if illusionism is true, then welfare is a fully subjective problem.
The closing sentence of this comment, “All in all, bad ideas, advocated by the intellectually weak, appealing mostly to the genetically subpar,” breaks our Forum norm against unnecessary rudeness or offensiveness.
The “genetically subpar” part is especially problematic. At best, it would appear that the commenter, John, is claiming that the post mainly appeals to the less intelligent—an unnecessarily rude and most likely false claim. A worse interpretation is that John is making a racist remark, which we view as strongly unacceptable.
Overall, we see this as an unpromising start to John’s Forum engagement—this is their first comment—and we have issued a one-month ban. If they return to the Forum then we’ll expect to see a higher standard of discourse.
As a reminder, bans affect the user, not the account.
If anyone has questions or concerns, feel free to reach out: if you think we made a mistake here, you can appeal the decision.
I’m not sure if your comment is an attempt to restate with examples some of what’s in the “What deep honesty is not” section, or if it’s you pointing out what you see as blind spots in the post. In case it’s the latter, here are some quotes from the post which cover similar ground:
Deep honesty is not a property of a person that you need to adopt wholesale. It’s something you can do more or less of, at different times, in different domains.
…
But blunt truths can be hurtful. It is often compatible with deep honesty to refrain from sharing things where it seems kinder to do so [...] And it’s of course important, if sharing something that might be difficult to hear, to think about how it can be delivered in a gentle way.
…
If the cashier at the grocery store asks how you’re doing, it’s not deeply honest to give the same answer you’d give to a therapist — it’s just inappropriate.
(Just pointing out that previous discussion of this paper on this forum can be found here.)