Thank you, this is helpful.
I’m having trouble understanding this. The part that comes closest to making sense to me is this summary:
“The fact that life has survived so long is evidence that the rate of potentially omnicidal events is low...[this and the anthropic shadow effect] cancel out, so that overall the historical record provides evidence for a true rate close to the observed rate.”

Are they just applying https://en.wikipedia.org/wiki/Self-indication_assumption_doomsday_argument_rebuttal to anthropic shadow without using any of the relevant terms, or is it something else I can’t quite get?
Also, how would they respond to the fine-tuning argument? That is, it seems like most planets (let’s say 99.9%) cannot support life (eg because they’re too close to their sun). It seems fantastically surprising that we find ourselves on a planet that does support life, but anthropics provides an easy way out of this apparent coincidence. That is, anthropics tells us that we overestimate the frequency of things that allow us to be alive. This seems like reverse anthropic shadow, where anthropic shadow is underestimating the frequency of things that cause us to be dead. So is the paper claiming that anthropics does change our estimates of the frequency of good things, but can’t change our estimate of the frequency of bad things? Why would this be?
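To make the selection effect I mean concrete, here’s a toy sketch (all numbers invented, and this obviously isn’t from the paper):

```python
import random

# Toy model of the observation-selection point above (numbers invented): only
# 0.1% of planets can support life, but every observer necessarily finds
# themselves on one of the planets that can.
N_PLANETS = 1_000_000
P_HABITABLE = 0.001  # assumed fraction of habitable planets

habitable = [random.random() < P_HABITABLE for _ in range(N_PLANETS)]

# Outside view: what fraction of planets are actually habitable?
true_rate = sum(habitable) / N_PLANETS  # ~0.001

# Inside view: what fraction of observers' home planets are habitable?
observer_rate = 1.0  # by construction, every observer lives on a habitable planet

print(f"true rate: {true_rate:.4f}, rate seen by observers: {observer_rate:.1f}")
```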
I mostly agree with this. The counterargument I can come up with is that the best AI think tanks right now are asking for grants in the range of $2 - $5 million and seem to be pretty influential, so it’s possible that a grantmaker who got $8 million could improve policy by 5%, in which case it’s correct to equate those two.
I’m not sure how that fits with the relative technical/policy questions.
Yes, I added them partway through after thinking about the question set more.
Sorry, I’ve fixed the Google Form.
Results of an informal survey on AI grantmaking
The article was obviously terrible, and I hope the listed mistakes get corrected, but I haven’t seen a request for correction on the claim that CFAR/Lightcone has $5 million of FTX money and isn’t giving it back. Is there any more information on whether this is true and, if so, what their reasoning is?
I think this is more over-learning and institutional scar tissue from FTX. The world isn’t divided into Bad Actors and Non-Bad-Actors such that the Bad Actors are toxic and will destroy everything they touch.
There’s increasing evidence that Sam Altman is a cut-throat businessman who engages in shady practices. This also describes, for example, Bill Gates and Elon Musk, both of whom also have other good qualities. I wouldn’t trust either of them to single-handedly determine the fate of the world, but they both seem like people who can be worked with in the normal paradigm of different interests making deals with each other while appreciating a risk of backstabbing.
I think “Sam Altman does shady business practices, therefore all AI companies are bad actors and alignment is impossible” is a wild leap. We’re still in the early (maybe early middle) stages of whatever is going to happen. I don’t think this is the time to pick winners and put all our eggs in a single strategy. Besides, what’s the alternative? Policy? Do you think politicians aren’t shady cut-throat bad actors? That the other activists we would have to work alongside aren’t? Every strategy involves shifting semi-coalitions with shady cut-throat bad actors of some sort or another; you just try to do a good job navigating them and keep your own integrity intact.
If your point is “don’t trust Sam Altman absolutely to pursue our interests above his own”, point taken. But there are vast gulfs between “don’t trust him absolutely” and “abandon all strategies that come into contact with him in any way”. I think the middle ground here is to treat him approximately how I think most people here treat Elon Musk. He’s a brilliant but cut-throat businessman who does lots of shady practices. He seems to genuinely have some kind of positive vision for the world, or want for PR reasons to seem like he has a positive vision for the world, or have a mental makeup incapable of distinguishing those two things. He’s willing to throw the AI safety community the occasional bone when it doesn’t interfere with business too much. We don’t turn ourselves into the We Hate Elon Musk movement or avoid ever working with tech companies because they contain people like Elon Musk. We distance ourselves from him enough that his PR problems aren’t our PR problems (already done in Sam’s case; thanks to the board the average person probably thinks of us as weird anti-Sam-Altman fanatics), describe his positive and negative qualities honestly if asked, try to vaguely get him to take whatever good advice we have that doesn’t conflict with his business too much, and continue having a diverse portfolio of strategies at any given time. Or, I mean, part of the shifting semi-coalitions is that if some great opportunity to get rid of him comes, we compare him to the alternatives and maybe take it. But we’re so far away from having that alternative that pining after it is a distraction from the real world.
Excerpts From The EA Talmud
Pause For Thought: The AI Pause Debate
I thought we already agreed the demon case showed that FDT wins in real life, since FDT agents will consistently end up with more utility than other agents.
Eliezer’s argument is that you can become the kind of entity that is programmed to do X, by choosing to do X. This is in some ways a claim about demons (they are good enough to predict even the choices you made with “your free will”). But it sounds like we’re in fact positing that demons are that good—I don’t know how to explain how they have a 999,999/million success rate otherwise—so I think he is right.
I don’t think the demon being wrong one in a million times changes much. 999,999 of the people created by the demon will be some kind of FDT decision theorist with great precommitment skills. If you’re the one who isn’t, you can observe that you’re the demon’s rare mistake and avoid cutting off your legs, but this just means you won the lottery—it’s not a generally winning strategy.
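To spell out why the FDT agents come out ahead here, a toy calculation with made-up payoffs (the specific numbers aren’t part of the original problem, they’re just for illustration):

```python
# Toy expected-utility comparison for the leg demon (payoffs invented): the
# demon creates an agent only if it predicts the agent will cut off its legs,
# and its predictions are right 999,999 times out of a million.
U_EXIST = 100.0    # assumed value of getting to exist at all
U_LEGS = -10.0     # assumed cost of cutting off your legs
ACCURACY = 999_999 / 1_000_000

# FDT-ish agent: correctly predicted to comply, so it gets created and pays
# the leg cost; otherwise it is never created at all.
ev_fdt = ACCURACY * (U_EXIST + U_LEGS) + (1 - ACCURACY) * 0   # ~90

# CDT-ish agent: correctly predicted to refuse, so it is only ever created by
# mistake, in which case it keeps its legs.
ev_cdt = ACCURACY * 0 + (1 - ACCURACY) * U_EXIST              # ~0.0001

print(f"expected utility, FDT-ish agent: {ev_fdt:.4f}")
print(f"expected utility, CDT-ish agent: {ev_cdt:.4f}")
```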
Decision theories are intended as theories of what is rational for you to do, so they describe which choices are wise and which are foolish.
I don’t understand why you think that the choices that get you more utility with no drawbacks are foolish, and the choices that cost you utility for no reason are wise.
On the Newcomb’s Problem post, Eliezer explicitly said that he doesn’t care why other people are doing decision theory; he would like to figure out a way to get more utility. Then he did that. I think if you disagree with his goal, you should be arguing “decision theory should be about looking good, not about getting utility” (so we can all laugh at you) rather than saying “Eliezer is confidently and egregiously wrong” and hiding the fact that one of your main arguments is that he said we should try to get utility instead of failing all the time, and then came up with a strategy that successfully does that.
I think rather than say that Eliezer is wrong about decision theory, you should say that Eliezer’s goal is to come up with a decision theory that helps him get utility, and your goal is something else, and you have both come up with very nice decision theories for achieving your goal.
(what is your goal?)
My opinion on your response to the demon question is “The demon would never create you in the first place, so who cares what you think?” That is, I think your formulation of the problem includes a paradox—we assume the demon is always right, but also, that you’re in a perfect position to betray it and it can’t stop you. What would actually happen is the demon would create a bunch of people with amputation fetishes, plus me and Eliezer who it knows wouldn’t betray it, and it would never put you in the position of getting to make the choice in real life (as opposed to in an FDT algorithmic way) in the first place. The reason you find the demon example more compelling than the Newcomb example is that it starts by making an assumption that undermines the whole problem—that is, that the demon has failed its omniscience check and created you who are destined to betray it. If your problem setup contains an implicit contradiction, you can prove anything.
I don’t think this is as degenerate a case as “a demon will torture everyone who believes FDT”. If that were true, and I expected to encounter that demon, I would simply try not to believe FDT (insofar as I can voluntarily change my beliefs). While you can always be screwed over by weird demons, I think decision theory is about what to choose in cases where you have all of the available knowledge and also a choice in the matter, and I think the leg demon fits that situation.
I guess any omniscient demon reading this to assess my ability to precommit will have learned I can’t even precommit effectively to not having long back-and-forth discussions, let alone cutting my legs off. But I’m still interested in where you’re coming from here since I don’t think I’ve heard your exact position before.
Have you read https://www.lesswrong.com/posts/6ddcsdA2c2XpNpE5x/newcomb-s-problem-and-regret-of-rationality ? Do you agree that this is our crux?
Would you endorse the statement “Eliezer, using his decision theory, will usually end out with more utility than me over a long life of encountering the sorts of weird demonic situations decision theorists analyze, I just think he is less formally-rational” ?
Or do you expect that you will, over the long run, get more utility than him?
Sorry if I misunderstood your point. I agree this is the strongest objection against FDT. I think there is some sense in which I can become the kind of agent who cuts off their legs (ie by choosing to cut off my legs), but I admit this is poorly specified.
I think there’s a stronger case for, right now, having heard about FDT for the first time, deciding I will follow FDT in the future. Various gods and demons can observe this and condition on my decision, so when the actual future comes around, they will treat me as an FDT-following agent rather than a non-FDT-following agent. Even though future-created-me isn’t exactly in a position to influence the (long-since gone) demon, current me is in a position to make this decision for future relevant situations, and should decide to follow FDT in general. Part of this decision I’ve made involves being the kind of person who would take the FDT option in hypothetical scenarios.
Then there’s the additional question of whether to defect against the demons/gods later, and say “Haha, back in August 2023 I resolved to become an FDT agent, and I fooled you into believing me, but now that I’ve been created I’m just going to not cut off my legs after all”. I think of this as—suppose every past being created by the demon has cut off its legs, ie the demon has a 100% predictive success rate over millions of cases. So the demon would surely predict if I would do this. That means I should (now) try really hard not to do this. Cf. Parfit’s Hitchhiker. Can I bind my future self like this? I think empirically yes—I think I have enough honor that if I tell hypothetical demon gods now that I’m going to do various things, I can actually do them when the time comes. This will be “irrational” in some sense, but I’ll still end up with more utility than everyone else.
Is there some sense in which, if I decide not to cut off my legs, I would wink out of existence? I admit feeling a superstitious temptation to believe this (a non-superstitious justification might be wondering if I’m the real me, or a version of me in the omniscient demon’s simulation to predict what I would do). I think the literal answer is no but that it’s practically useful to keep my superstitious belief in this to allow myself to do the irrational thing that gets me more utility. But this is a weird enough sidetrack that I’m really not sure I’m still in normal Eliezer-approved-decision-theory-land at all.
I think an easier question is whether you should program an AI to always keep its pre-emptive bargains with gods and demons; here the answer is just straightforwardly yes. You don’t have to assume that your actions alter your algorithm, you can just alter the algorithm directly. I think this is what Eliezer is most interested in, though I’m not sure.
Were there bright people who said they had checked his work, understood it, agreed with him, and were trying to build on it? Or just people who weren’t yet sure he was wrong?
I don’t want to get into a long back-and-forth here, but for the record I still think you’re misunderstanding what I flippantly described as “other Everett branches” and missing the entire motivation behind Counterfactual Mugging. It is definitely not supposed to directly make sense in the exact situation you’re in. I think this is part of why a variant of it is called “updateless”, because it makes a principled refusal to update on which world you find yourself in in order to (more flippant not-quite-right description) program the type of AIs that would win weird games played against omniscient entities.
If the demon would only create me conditional on me cutting off my legs after I existed, and it was the specific class of omniscient entity that FDT is motivated by winning games with, then I would endorse cutting off my legs in that situation.
(as a not-exactly-right-but-maybe-helpful intuition pump, consider that if the demon isn’t omniscient—but simply reads the EA Forum—or more strictly can predict the text that will appear on the EA Forum years in the future—it would now plan to create me but not you, and I with my decision theory would be better off than you with yours. And surely omniscience is a stronger case than just reads-the-EA-Forum!)
If this sounds completely stupid to you and you haven’t yet read the LW posts on Counterfactual Mugging, I would recommend starting there; otherwise, consider finding a competent and motivated FDT proponent (ie not me) and trying to do some kind of double-crux or debate with them; I’d be interested in seeing the results.
I won’t comment on the overall advisability of this piece, but I think you’re confused about the decision theory (I’m about ten years behind state of the art here, and only barely understood it ten years ago, so I might be wrong).
The blackmail situation seems analogous to the Counterfactual Mugging, which was created to highlight how Eliezer’s decision theories sometimes (my flippant summary) suggest you make locally bad decisions in order to benefit versions of you in different Everett branches. Schwartz objecting “But look how locally bad this decision is!” isn’t telling Eliezer anything he doesn’t already know, and isn’t engaging with the reasoning. I think I would pay Omega in Counterfactual Mugging; I agree Schwartz’s case is harder, but provisionally I think it unintentionally adds a layer of Pascal’s Wager + torture vs. dust specks by making the numbers so extreme, which are two totally unrelated reasoning vortices.
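For anyone who hasn’t seen Counterfactual Mugging, here is the tension with the usual numbers, as I remember them (you’re asked for $100 on tails, and would have been given $10,000 on heads iff Omega predicted you’d pay on tails):

```python
# Counterfactual Mugging with the usual numbers (as I remember them): Omega
# flips a fair coin; on tails it asks you for $100, on heads it pays you
# $10,000 iff it predicted you would have paid on tails.
P_HEADS = 0.5
REWARD = 10_000
COST = 100

# Evaluated as a policy, before you know how the coin landed:
ev_policy_pay = P_HEADS * REWARD + (1 - P_HEADS) * (-COST)   # 4950
ev_policy_refuse = 0.0

# Evaluated locally, once you already know the coin came up tails:
ev_local_pay = -COST      # -100: the "locally bad" decision
ev_local_refuse = 0.0

print(ev_policy_pay, ev_policy_refuse, ev_local_pay, ev_local_refuse)
```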
I think the “should you procreate to make your father procreate?” question only works if your father’s cognitive algorithms are perfectly correlated with yours, which no real father’s are. To make the example fair, it should be more like “You were created by Omega, a god who transcends time. It resolved to create you if and only if It predicted that you would procreate, and It is able to predict everything perfectly. Now should you procreate?” I would also accept “You were created by a clone of yourself in the exact same situation, down to the atom, that you find yourself in now, including worrying about being created by a clone of yourself and so on. Should you procreate?” In both of these, the question seems much more open than with a normal human father.
If Eliezer’s decision theories make no sense and are ignoring easy disproofs, then everyone else who finds them compelling (or at least not obviously wrong) after long study, including people like Wei Dai, Abram Demski, Scott Garrabrant, Benya Fallenstein, etc, is also bizarrely and inexplicably wrong. This is starting to sound less like “Eliezer is a uniquely bad reasoner” and more like “there’s something in the water supply here that makes extremely bright people with math PhDs make simple dumb mistakes that any rando can notice.”
Thanks for writing this.
I understand why you can’t go public with applicant-related information, but is there a reason grantmakers shouldn’t have a private Slack channel where they can ask things like “Please PM me if any of you have any thoughts on John Smith, I’m evaluating a grant request for him now”?
Okay, so GWWC, LW, and GiveWell, what are we going to do to reverse the trend?
Seriously, should we be thinking of this as “these sites are actually getting less effective at recruiting EAs” or as “there are so many more recruitment pipelines now that it makes sense that each one would drop in relative importance” or as “any site will naturally do better in its early years as it picks the low-hanging fruit in converting its target population, then do worse later”?
Habryka referred me to https://forum.effectivealtruism.org/posts/A47EWTS6oBKLqxBpw/against-anthropic-shadow , whose “Possible Solution 2” is what I was thinking of. It looks like anthropic shadow holds if you think there are many planets (which seems true) and you are willing to accept weird things about reference classes (which seems like the price of admission to anthropics). I appreciate the paper you linked for helping me distinguish between the claim that anthropic shadow is transparently true without weird assumptions, vs. the weaker claim in Possible Solution 2 that it might be true with about as much weirdness as all the other anthropic paradoxes.
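To make the many-planets version concrete, here’s a toy sketch of how I’m picturing it (parameters invented, and it ignores all the reference class subtleties):

```python
import random

# Toy many-planets anthropic shadow (all parameters invented): each planet
# runs T epochs, with a chance P_OMNICIDE per epoch of an event that ends life
# there. Observers only arise on planets whose history contains zero such
# events, so every observer's historical record looks calm, whatever the true
# rate is.
P_OMNICIDE = 0.02
T = 100
N_PLANETS = 100_000

surviving = sum(
    all(random.random() >= P_OMNICIDE for _ in range(T))
    for _ in range(N_PLANETS)
)

# With enough planets, some survive for almost any value of P_OMNICIDE, so the
# bare existence of observers carries little information, and every observer's
# naive frequency estimate (0 events in T epochs) sits below the true rate.
print(f"true per-epoch rate: {P_OMNICIDE}, surviving planets: {surviving}, "
      f"events in any observer's record: 0")
```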