Cool! I just submitted a project—minor bit of feedback is that it’s slightly irritating to have the ‘project subtitle’ field be mandatory.
Arepo
Great post—I’m embarrassed to have missed it til now! One key point I disagree with:
there might be interventions that reduce risk a lot for not very long or not very much but for a long time. But actions that drastically reduce risk and do so for a long time are rare.
I think there are two big possible exceptions to the latter claim: benign AI and becoming sustainably multiplanetary. EAs have discussed the former a lot, and I don’t have much to add (though I’m highly sceptical of it as an arbitrary-value lock-in mechanism on cosmic timelines). I think the latter is more interestingly unexplored. Christopher Lankhof made a case for it here, but didn’t get much engagement, and what criticism he did get seems quite short-term to me: basically that shelters are a cheaper option, and therefore we should prioritise them.
Such criticism might or might not be true in the next few decades. But beyond that, if AI neither kills us nor locks us in to a dystopic or utopic path, and if there are no lightcone-threatening technologies available (e.g. the potential ability to trigger a false vacuum decay), then it seems like by far our best defence against extinction will be simple numbers. The more intelligent life there is in the more places, the bigger and therefore more improbable an event would have to be to kill everyone.
A naive—but I think reasonable, given above caveats—calculation would be to treat the destruction of life around each planet as at least somewhat independent. That would give us some kind of exponential decay function of extinction risk, such that your credence in extinction might be a(1-b)^(p-1), where a is some constant or function representing the risk of a single-planet civilisation going extinct, b is some decay rate—of max(1/2) for total complete independence of extinction on each planet—and p is the number of planets in your civilisation. Absent universe-destroying mechanisms or unstoppable AI, this credence would quickly approach 0.
Obviously ‘creating an self-sustaining settlement on a new planet’ isn’t exactly an everyday occurrence, but with a century or two of continuous technological progress (less, given rapid economic acceleration via e.g. moderately benign AI) it seems likely to progress via ‘doable’ to ‘actually pretty straightforward’. The same technologies that establish the first such colony will go a very long way towards establishing the next few.
In the shorter term, ‘self-sustainingness’ needn’t be an all or nothing proposition. A colony that could e.g. effectively recycle its nutrients for a decade or two would still likely serve as a better defence against e.g. biopandemics than any refuge on Earth—and unlike those on Earth, would be constantly pressure tested even before the apocalypse, so might end up being easier to make reliably robust (vs on-Earth shelters) than simple cost-analyses would suggest.
Katja and I date, so yes, I am biased, but I really think that’s a pretty unimportant fact about her
Congrats to both of you on your great catches! Say hi to her for me—it’s been a while :)
More generally, what incentives exist? In a normal for-profit environment there are various reasons for individuals to start their own company, to seek promotion, to do a good job, to do a bad job, to commit institutional fraud etc—we typically think of these as mainly financial, and often use the adage ‘follow the money’ as a methodology to try and discover these phenomena, to encourage the good ones and discourage the bad.
I want to know what the equivalent methodology would be to find out equivalent phenomena at EA organisations.
EA organizations don’t really have a great need for nurses, for history professors, for plumbers, etc.
Fwiw, I was involved with an EA organisation that that struggled for years with the admin of finding trustworthy tradespeople (especially plumbers).
More generally, I think a lot of EA individuals would benefit a lot from access to specialist knowledge from all sorts of fields, if people with that knowledge were willing to offer it free or at a discount to others in the community.
I have a stronger version of the same concerns, fwiw. I can’t imagine a ‘Long Reflection’ that didn’t involve an extremely repressive government clamping down on private industry every time a company tried to do anything too ambitious, and that didn’t effectively promote some caste of philosopher kings above all others to the resentment of the populace. It’s hard to believe this could lead to anything other than substantially worse social values.
I also don’t see any a priori reason to think ‘reflecting’ gravitates people towards moral truth or better values. Philosophers have been reflecting for centuries, and there’s still very little consensus among them or any particular sign that they’re approaching one.
Are there reasonably engaging narrative tropes (or could we invent effective new ones) that could easily be recycled in genre fiction to promote effective altruist principles, in much the same way that e.g. the noble savage trope can easily be used to promote ecocentric philosophies, no-one gets left behind trope promotes localism, etc?
A steel manned version of the best longtermist argument(s) against AI safety as the top priority cause area.
How can we make effective altruism more appealing to political conservatives without alienating engaged liberals? If there is an inevitable trade-off between the two, what is the optimal equilibrium, how close to it are we, and can we get closer?
Write a concrete proposal for a scalable bunker system that would be robust and reliable enough to preserve technological civilisation in the event of human extinction due to e.g. nuclear winter, biopandemics on the surface. How much would it cost? Given that many people assert it would be much easier than settling other planets, why hasn’t anyone started building such systems en mass, and how could we remove whatever the blocker is?
Investigating incentives in EA organisations. Is money still the primary incentive? If not, how should we think about the intra-EA economy?
What are the most likely scenarios in which we don’t see transformative AI this century or perhaps for even longer? Do they require strong assumptions about (e.g.) theory of mind?
Is there an underunexplored option to fund early stage for-profits that seem to have high potential social value? Might it sometimes be worth funding them in exchange for basically 0 equity so that it’s comparatively easy for them to raise further funding the normal way?
If we take utilitarianism at face value, what are the most likely candidates for the physical substrate of ‘a utilon’? Is it plausible there are multiple such substrates? Can we usefully speculate on any interesting properties they might have?
Some empirical research into the fragile world hypothesis, in particular with reference to energy return on investment (EROI). Is there a less extreme version of ‘The great energy descent’ that implies that average societal EROI could stay at sustainable levels but only absent shocks, and that one or two big shocks could push it below that point and make it a) impossible to recover or b) possible to recover but only after such a major restructuring of our economy that it would resemble the collapse of civiliation?
An updated version of Luisa Rodriguez’s ‘What is the likelihood that civilizational collapse would cause technological stagnation? (outdated research)’ post that took into account her subsequent concerns, and looked beyond ‘reaching an industrial revolution’ to ‘rebuilding an economy large enough to eventually become spacefaring’.
Some kind of investigation into feedback mechanisms to reward good nonprofit work and potentially penalise bad nonprofit work that are more organic/finer instruments than ‘you get a grant or you don’t’.
I know impact certificates are a possibility, but I don’t understand what the secondary market for those could be. They also seem more relevant to individuals and possibly organisations than regular employees at organisations.
I downvoted this post for the lack of epistemic humility. I don’t mind people playing with thought experiments as a way to yield insight, but they’re almost never a way of generating ‘evidence’. Saying things like CDT ‘falls short’ because of them is far too strong. Under a certain set of assumptions, it arguably recommends an action that some people intuitively take issue with—that’s not exactly a self-evident deficiency.
Personally, I can’t see any reason to reject CDT based on the arguments here (or any I’ve seen elsewhere). They seem to rely on sleights such amphiboly, vagueness around what we call causality, asking me to accepting magic or self-effacing premises in some form, and an assumption of perfect selfishness/attachment to closed individualism, both of which I’d much sooner give up than accept a theory that basically defies known physics. For example:
Betting on the past. In my pocket (says Bob) I have a slip of paper on which is written a proposition P. You must choose between two bets. Bet 1 is a bet on P at 10:1 for a stake of one dollar. Bet 2 is a bet on P at 1:10 for a stake of ten dollars. So your pay-offs are as in [the table below]. Before you choose whether to take Bet 1 or Bet 2 I should tell you what P is. It is the proposition that the past state of the world was such as to cause you now to take Bet 2.
Ahmed argues that any causal decision theory worthy of the name would recommend taking Bet 1, simply because taking Bet 1 causally dominates taking Bet 2.[9]
In this scenario I don’t actually see the intuition that supposedly pushes CDTs to take bet one. It’s not clearly phrased, so maybe I’m not parsing it as intended, but it seems to me like the essence is supposed to be that the piece of paper says ‘you will take bet 2.’ If I take bet 2, it’s true, if not, it’s false. Since I’m financially invested in the proposition being true, I ‘cause’ it to be so. I don’t see a case for taking bet 1, or any commitment to evidential weirdness from eschewing it.
The psychopath button. Paul is debating whether to press the “kill all psychopaths” button. It would, he thinks, be much better to live in a world with no psychopaths. Unfortunately, Paul is quite confident that only a psychopath would press such a button. Paul very strongly prefers living in a world with psychopaths to dying. Should Paul press the button?
This is doing some combination of amphiboly and asking us to accept self-effacing premises.
There are many lemmas to this scenario, depending on how we interpret it, and none given me any concern for CDT:
We might just reject the premises as being underspecified or false. It’s very unclear what the result of pressing the button would be, so if Paul were me I would need to ask a great deal of questions first to have any confidence that this were really a positive EV action. But if I were satisfied that it was, I don’t think I’m a psychopath, and wouldn’t see any real reason why I should assume that I am one just because I’m doing something that harms a few to help many.
Otherwise, if we’re imagining that pressing this button is extremely net good, then rather than switching to a belief system with spooky acausality, we might relax the highly unrealistic assumption that Paul is necessarily totally selfish (even if he does turn out to be a psychopath, they are perfectly capable of being altruistic), and believe that he would push the button and risk death. In such a scenario, a CDT Paul would altruistically push the button and this seems optimal for his goals.
If we insist on the claim that he is totally selfish, then we’re more or less defining him as a psychopath (see the dictionary definition, ‘no feeling for other people’), so pressing the button would provide no relevant evidence but would definitely kill him. In such a scenario CDT Paul would selfishly not press it and this seems optimal for his goals.
If we modify the definition of psychopathy to ‘both being willing to kill other people and also being totally selfish’, then if Paul were to press it expecting it to kill him but improve the world, he would (by evidencing himself to be a psychopath and therefore choosing self-sacrifice for the sake of others) be evidencing himself to be not a psychopath. So the premises in this interpretation are self-effacing, and CDT Paul could reasonably go either way; or he might reasonably believe that such premises are either inconsistent or insufficient to allow him to decide.
Maybe we define psychopathy being willing to kill other people at all even in extremely bizarre circumstances, and even altruistically at great cost to oneself (which is a strange definition, but we’re running out of options) and independent of that definition, we again insist insist on the claim that Paul is totally selfish. Then pressing the button will probably kill almost the entire world’s population including Paul, and be a very bad thing to do. CDT Paul would have both altruistic and selfish reason not to push the button, and this seems optimal for his goals.
Finally if we somehow still insist that pressing the button will necessarily make the world better and that CDT will require Paul to press it while other decision theories will not, this seems like a strike against all those other decision theories. Why would we want to promote algorithms that worsen the world?
If we’re imagining that we get to determine how an AGI thinks, I would rather than give it an easily comprehensible and somewhat altruistic motivation than a perfectly selfish motivation with greater complexity that’s supposed to undo that selfishness.
Newcomb’s problem is similar. I won’t go through all the lemmas in detail, because there are many and some are extremely convoluted, but an approximation of my view is that it’s incredibly underspecified how we know that Omega supposedly knows the outcome and that he’s being honest with us about his intentions, and what we should do as a consequence. For example:
If we imagine he simulated us countless times, he can’t know the outcome, because each simulation introduces a new variable (himself, with knowledge of N+1 simulations, where in the previous simulation he had had knowledge of N) that we could use to randomise generate a result that would seem random to him. So asserting him to be actually perfect requires magic and hence gives us no meaningful result under any decision theory.
If we don’t try to sabotage him via randomness and we believe that he’s doing endless simulations then a CDT without either the assumption of perfect selfishness or of closed individualism (without which, the ‘next guy’ is indistinguishable including to me from myself) will one-box because it will cause the next guy to get more money. Both of these assumptions seem reasonable to drop—or rather, I think they’re unreasonable to hold in the first place—and in this case everyone ends up richer because we one-box without any metaphysical spookiness.
If we believe we have full information about his simulation process and insist on being perfectly selfish, then in the first simulation we should obviously 2-box, since there’s nothing to lose (assuming he has no other way of knowing our intent; otherwise see below). Then, knowing that we’re the Sim 2, we should two-box, since we can infer that Omega will have predicted we do that based on Sim 1. And so on, ad infinitum—we always two-box.
If we assert some more mundane scenario, like Omega is just a human psychologist with a good track record, then we need to assume his hit rate is much lower for it to be non-magic. Then his best strategy (if he’s trying to optimise for correct predictions and not messing with us in some other way) is certainly going to be something trivial like ‘always assume they’ll one-box’, or perhaps something fractionally more sophisticated like ‘assume they’ll one-box iff they’ve ever posted on Less Wrong’. That’s going to dominate any personality-reading he does—and implies we should two-box.
If he’s a machine learning algorithm who looks at a huge amount of my personal historical data and generates a generally accurate prediction, then we’re bordering back on magic for him to have collected all that data without me knowing what was going to happen.
If I did know or had a strong suspicion, then if the expected payoff was sufficient then as a CDT I should have always publicly acted in ways that imply I would one day one-box (and then ultimately two-box and expect to get the maximum reward).
If I didn’t know, then choosing non-CDT examples throughout my life just in case I ever run into such an ML algorithm opens me to other easy exploitation. For eg, I have actually played this game with non-CDT friends, and ‘exploited’ them by just lying about which box I put how much money in (often after hinting in advance that I plan to do so). If they distrust me—even forewarned by a comment such as this—enough to call my bluff in such scenarios, they’re giving evidence to the ML algorithm that it should assume they’re two-boxers, and hence losing the purported benefit of rejecting CDT. Regardless, when we unexpectedly reach the event, we have no reason not to then two-box.
I realise there are many more scenarios, but these arguments feel liturgical to me. If the rejection of CDT can’t be explained in a single well defined and non-spooky case of how it evidently fails by its own standards, I don’t see any value in generating or ‘clearing up confusions about’ ever more scenarios, and strongly suspect those who try of motivated reasoning.
I’m really excited to see this work! I know they’re not your variables, but having identified some top N variables (and possibly others of interest), it would be really helpful if you could dig out a concrete explanation of how each one was generated. E.g. ‘social support’ seems like it could mean any number of things, possibly different things to different people (my girlfriend suggested it might commonly be interpreted as effectively ‘happiness’, and so not really be telling us anything). ‘Feelings of freedom’ could be similar - ‘shelter’ sounds a bit more robust, but it still seems really important to understand what the number being given is.
For social support I had a very shallow look as far as the Helliwell essay and gave up when I couldn’t find a definition of/survey question representing it in there. I don’t have the bandwidth to dig further, but would really like to understand this better.
I think much of this criticism is off. There are things I would disagree with Nuno on, but most of what you’re highlighting doesn’t seem to fairly represent his actual concerns.
He does. Also, I suspect his main concern is with people being banned rather than having their posts moderated.
I don’t know what Nuno actually believes, but he carefully couches both of these as hypotheticals, so I don’t think you should cite them as things he believes. (in the same section, he hypothetically imagines ‘What if EA goes (¿continues to go?) in the direction of being a belief that is like an attire, without further consequences. People sip their wine and participate in the petty drama, but they don’t act differently.’ - which I don’t think he advocates).
Also, you’re equivocating the claim that EA is too naive (which he certainly seems to believe), too consequentialist (which I suspect but don’t know he believes), ignores common sense (which I imagine he believes), what he’s actually said he believes—that he thinks it should optimise more vigorously—what the hypothetical you quote.
I’m not sure what you want here—his blog is full of criticisms of EA organisations, including those linked in the OP.
He literally links to why he thinks their priorities are bad in the same sentence.
I don’t think it’s reasonable to assert that he conflates them in a post that estimates the degree to which OP money dominates the EA sphere, that includes the header ‘That the structural organization of the movement is distinct from the philosophy’, and that states ‘I think it makes sense for the rank and file EAs to more often do something different from EA™’. I read his criticism as being precisely that EA, the non-OP part of the movement has a lot of potential value, which is being curtailed by relying too much on OP.
I think you’re mispresenting the exact sentence you quote, which contains the modifier ‘to constrain the rank and file in a hypocritical way’. I don’t know how in favour of maximisation Nuno is, but what he actually writes about in that section is the ways OP has pursued maximising strategies of their own that don’t seem to respect the concerns they profess.
You don’t have to agree with him on any of these points, but in general I don’t think he’s saying what you think he’s saying.