I agree that this was probably a factor that contributed to the accuracy gains of people who made more frequent forecasts. It may even have been doing most of the work; I’m not sure.
The exact training module they used is probably not public, but they do have a training module on their website. It costs money though.
For sure, forecasters who devoted more effort to it tended to make more accurate predictions. It would be surprising if that wasn’t true!
...because that AI research is useful for some other goal the AI has, such as maximizing paperclips. See the instrumental convergence thesis.
The argument for doom by default seems to rest on the assumption that, by default, the AI will misunderstand human values as the programmer attempts to communicate them. If capability growth comes before a goal is granted, it seems less likely that misunderstanding will occur.
Eh, I could see arguments that it would be less likely and arguments that it would be more likely. Argument that it is less likely: We can use the capabilities to do something like “Do what we mean,” allowing us to state our goals imprecisely & survive. Argument that it is more likely: If we mess up, we immediately have an unaligned superintelligence on our hands. At least if the goals come before the capability growth, there is a period where we might be able to contain it and test it, since it isn’t capable of escaping or concealing its intentions.
I think the big disanalogy between AI and the Industrial and Agricultural revolutions is that there seems to be a serious chance that an AI accident will kill us all. (And moreover this isn’t guaranteed; it’s something we have leverage over, by doing safety research and influencing policy to discourage arms races and encourage more safety research.) I can’t think of anything comparable for the IR or AR. Indeed, there are only two other cases in history of risk on that scale: Nuclear war and pandemics.
Thanks for this talk/post—it’s a good example of the sort of self-skepticism that I think we should encourage.
FWIW, I think it’s a mistake to construe the classic model of AI accident catastrophe as capability gain first, then goal acquisition. I say this because (a) I never interpreted it that way when reading the classic texts, and (b) it doesn’t really make sense—the original texts are very clear that the massive jump in AI capability is supposed to come from recursive self-improvement, i.e. the AI helping to do AI research. So already we have some sort of goal-directed behavior (bracketing CAIS/ToolAI objections!) leading up to and including the point of arrival at superintelligence.
I would construe the little sci-fi stories about putting goals into goal slots as not being a prediction about the architecture of AI but rather illustrations of completely different points about e.g. orthogonality of value or the dangers of unaligned superintelligences.
At any rate, though, what does it matter whether the goal is put in after the capability growth, or before/during? Obviously it matters in some ways, but it doesn’t matter for purposes of evaluating the priority of AI safety work, since in both cases the potential for accidental catastrophe exists.
This research is very helpful, thanks! Two questions: (1) Sometimes I wonder if brain size is relevant, not just to probability-of-feeling-pain but to “amount of pain felt” or something like that. So, for example, perhaps a 1kg brain feels 1000x more pain than a 1g brain, on average. Do you include this in your analysis? If not, would it change things much if you did—e.g. making cows much higher-priority? (2) Your analysis is focused on the question of which animals should be prioritized in EA interventions. Does it also apply to the question of which animals are highest-priority to avoid eating? E.g. would it be better to be a reducetarian who eats beef but no other meats than a pescetarian?
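To make question (1) concrete, here is a toy sketch of what a linear-in-brain-mass weighting would look like. The sentience probabilities and brain masses below are made-up placeholders for illustration, not figures from your analysis:

```python
# Toy illustration only: how a "pain scales linearly with brain mass" assumption
# would change relative priorities. All numbers are placeholder guesses, not data.

def weight(prob_sentience, brain_mass_g, scale_with_mass):
    """Per-animal moral weight under a crude linear-in-mass model of pain intensity."""
    intensity = brain_mass_g if scale_with_mass else 1.0
    return prob_sentience * intensity

animals = {
    # name: (placeholder probability of sentience, placeholder brain mass in grams)
    "cow":     (0.8, 450.0),
    "chicken": (0.7, 4.0),
    "salmon":  (0.5, 1.0),
}

for name, (p, mass) in animals.items():
    print(f"{name}: flat weight = {weight(p, mass, False):.2f}, "
          f"mass-scaled weight = {weight(p, mass, True):.1f}")
```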
Hmmm, good point: If we carve up the space of possibilities finely enough, then every possibility will have a too-low probability. So to make an “ignore small probabilities” solution work, we’d need to include some sort of rule for how to carve up the possibilities. And yeah, this seems like an unpromising way to go…
I think the best way to do it would be to say “We lump together all possibilities that have the same utility.” The resulting profile of dots would be like a hollow bullet or funnel. If we combined that with an “ignore all possibilities below probability p” rule, it would work; a rough sketch is below. It would still have problems, of course.
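Here is a minimal sketch of that combined rule (my own illustration, not anything from the post): lump possibilities with equal utility, then drop any lump whose total probability falls below a threshold.

```python
from collections import defaultdict

def truncated_expected_utility(outcomes, p_min):
    """outcomes: list of (utility, probability) pairs for one action."""
    lumped = defaultdict(float)
    for utility, prob in outcomes:
        lumped[utility] += prob                                # lump possibilities with the same utility
    kept = {u: p for u, p in lumped.items() if p >= p_min}     # ignore low-probability lumps
    return sum(u * p for u, p in kept.items())

# Fifty distinct far-fetched scenarios, each individually below the threshold,
# jointly survive the cutoff once they are lumped by utility.
outcomes = [(1.0, 0.5), (-1.0, 0.3)] + [(10**9, 1e-7)] * 50
print(truncated_expected_utility(outcomes, p_min=1e-6))
```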
I believe this concern is addressed by the next post in the series. The current examples implicitly only consider two possible outcomes: “No effect” and “You do blah blah blah and this saves precisely X lives...” The next post expands the model to include arbitrarily many possible outcomes of each action under consideration, and after doing so ends up reasoning in much the way you describe to defuse the initial worry.
Good point. I put in some links at the beginning and end, and I’ll go through now and add the other links you suggest… I don’t think the forum software allows me to link to a part of a post, but I can at least link to the post.
On solution #6: Yeah, it only works if the profiles really do cancel out. But I classified it as involving the decision rule because if your rule is simply to sum up the utility × probability of all the possibilities, it doesn’t matter if they are perfectly symmetric around 0, your sum will still be undefined.
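To illustrate what I mean (my own toy example): suppose the outcomes come in pairs with utilities ±2^n and probabilities proportional to 2^-n, so each one contributes +1 or -1 to the sum. Because the series isn’t absolutely convergent, what the partial sums do depends entirely on the order in which you consider the outcomes:

```python
# Each outcome contributes utility * probability = +1 or -1 (up to normalization).
# Same collection of outcomes, two different orderings, very different behavior.

def running_totals(pattern, n_steps):
    total, trajectory = 0, []
    for i in range(n_steps):
        total += pattern(i)
        trajectory.append(total)
    return trajectory

alternate   = lambda i: 1 if i % 2 == 0 else -1   # +1, -1, +1, -1, ...  stays near 0
two_for_one = lambda i: 1 if i % 3 != 2 else -1   # +1, +1, -1, ...      drifts upward forever

print(running_totals(alternate, 12))
print(running_totals(two_for_one, 12))
```

So even with perfect symmetry, “the sum over all possibilities” doesn’t pick out any particular number.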
Yep, solution #1 involves biting the bullet and rejecting regularity. It has problems, but maybe they are acceptable problems.
Solution #2 would be great if it works, but I don’t think it will—I regret pushing that to the appendix, sorry!
Thanks again for all the comments, btw!
But the probability of those rare things will be super low. It’s not obvious that they’ll change the EV as much as nearer term impacts. … All this theorizing might be unnecessary if our actual expectations follow a different pattern.
Yes, if the profiles are not funnel-shaped then this whole thing is moot. I argue that they are funnel-shaped, at least for many utility functions currently in use (e.g. utility functions that are linear in QALYs). I’m afraid my argument isn’t up yet—it’s in the appendix, sorry—but it will be up in a few days!
Are we? Expected utility is still a thing. Some actions have greater expected utility than others even if the probability distribution has huge mass across both positive and negative possibilities. If infinite utility is a problem then it’s already a problem regardless of any funnel or oscillating type distribution of outcomes.
If the profiles are funnel-shaped, expected utility is not a thing. The shape of your action profiles depends on your probability function and your utility function. Yes, infinitely valuable outcomes are a problem—but I’m arguing that even if you ignore them, there’s still a big problem having to do with infinitely many possible finite outcomes. Moreover, even if you only consider finitely many outcomes of finite value, if the profiles are funnel-shaped then what you end up doing will be highly arbitrary, determined mostly by whatever is happening at the place where you happened to draw the cutoff.
Another way of describing this phenomenon is that we are simply seizing the low-hanging fruit, and hard intellectual progress isn’t even needed.
That’s what I’d like to think, and that’s what I do think. But this argument challenges that; this argument says that the low-hanging fruit metaphor is inappropriate here: there is no lowest-hanging fruit or anything close; there is an infinite series of fruit hanging lower and lower, such that for any fruit you pick, if only you had thought about it a little longer you would have found an even lower-hanging fruit that would have been so much easier to pick that it would easily justify the cost in extra thinking time needed to identify it… moreover, you never really “pick” these fruit, in that the fruit are gambles, not outcomes; they aren’t actually what you want, they are just tickets that have some chance of getting what you want. And the lower the fruit, the lower the chance...
Thanks! Yeah, sorry—I was thinking about putting it up all at once but decided against because that would make for a very long post. Maybe I should have anyway, so it’s all in one place.
Well, I don’t share your intuition, but I’d love to see it explored more. Maybe you can get an argument out of it. One way to try would be to find a class of at least 10^10^10^10 hypotheses that are at least as plausible as the Mugger’s story.
I’m not assuming it’s symmetric. It probably isn’t symmetric, in fact. Nevertheless, it’s still true that the expected utility of every action is undefined, and that if we consider increasingly large sets of possible outcomes, the partial sums will oscillate wildly as we consider more and more.
Yes, at any level of probability there should be a higher density of outcomes towards the center. That doesn’t change the result, as far as I can tell. Imagine you are adding new possible outcomes to consideration, one by one. Most of the outcomes you add won’t change the EV much. But occasionally you’ll hit one that makes everything that came before look like a rounding error, and it might flip the sign of the EV. And this occasional occurrence will never cease; it’ll always be true that if you keep considering more possibilities, the old possibilities will continue to be dwarfed and the sign will continue to flip. You can never rest easy and say “This is good enough;” there will always be more crucial considerations to uncover.
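Here’s a quick toy simulation of that process (my own illustration; the heavy-tailed distribution is just a stand-in for the funnel shape): most newly considered outcomes barely move the running total, but every so often one dwarfs everything before it and can flip the sign, and this never stops happening.

```python
import random

random.seed(0)

running_ev = 0.0
sign_flips = 0
prev_sign = 0
for n in range(1, 2001):
    # Each term stands for utility * probability of a newly considered outcome.
    # A heavy-tailed magnitude (infinite mean) mimics a funnel-shaped profile:
    # most contributions are modest, but occasionally one dwarfs all the rest.
    magnitude = random.paretovariate(0.5)
    running_ev += random.choice([-1, 1]) * magnitude
    sign = 1 if running_ev >= 0 else -1
    if prev_sign and sign != prev_sign:
        sign_flips += 1
    prev_sign = sign

print(f"running EV after 2000 outcomes: {running_ev:.1f}")
print(f"times the sign flipped along the way: {sign_flips}")
```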
So this is a problem in theory—it means we are approximating an ideal which is both stupid and incoherent—but is it a problem in practice?
Well, I’m going to argue in later posts in this series that it isn’t. My argument is basically that there are a bunch of reasonably plausible ways to solve this theoretical problem without undermining long-termism.
That said, I don’t think we should dismiss this problem lightly. One thing that troubles me is how superficially similar the failure mode I describe here is to the actual history of the EA movement: People say “Hey, let’s actually do some expected value calculations” and they start off by finding better global poverty interventions, then they start doing this stuff with animals, then they start talking about the end of the world, then they start talking about evil robots… and some of them talk about simulations and alternate universes...
Arguably this behavior is the predictable result of considering more and more possibilities in your EV calculations, and it doesn’t represent progress in any meaningful sense—it just means that EAs have gone farther down the funnel-shaped rabbit hole than everybody else. If we hang on long enough, we’ll end up doing crazier and crazier things until we are diverting all our funds from x-risk prevention and betting it on some wild scheme to hack into an alternate dimension and create uncountably infinite hedonium.
Yup. Also, even if the decision-theoretic move works, it doesn’t solve the more general problem. You’ll just “mug yourself” by thinking up more and more ridiculous hypotheses and chasing after them.
It’s good to know lots of people have this intuition—I think I do too, though it’s not super strong in me.
Arguably, when p is above the threshold you mention, we can make some sort of pseudo-law-of-large-numbers argument for expected utility maximization, like “If we all follow this policy, probably at least one of us will succeed.” But when p is below the threshold, we can’t make that argument.
So the idea is: Reject expected utility maximization in general (perhaps for reasons which will be discussed in subsequent posts!), but accept some sort of “If following a policy seems like it will probably work, then do it” principle, and use that to derive expected utility maximization in ordinary cases.
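To gesture at how that might go (rough numbers of my own, just for illustration): with N people independently following a policy that pays off with probability p, the chance that at least one of them succeeds is 1 - (1 - p)^N, which is near 1 when p is well above 1/N and negligible when p is far below it.

```python
# Rough illustration of the "probably at least one of us will succeed" argument.
# The population size and probabilities are arbitrary placeholders.

def prob_at_least_one_success(p, n_people):
    return 1 - (1 - p) ** n_people

n = 10_000  # hypothetical number of people following the policy
for p in [1e-2, 1e-4, 1e-9]:
    print(f"p = {p:g}: P(at least one success) = {prob_at_least_one_success(p, n):.6f}")

# p = 0.01 -> essentially 1     (the collective argument goes through)
# p = 1e-4 -> about 0.63
# p = 1e-9 -> about 0.00001     (Pascal's-mugging territory: it doesn't)
```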
All of this needs to be made more precise and explored in more detail. I’d love to see someone do that.
(BTW, upcoming posts remove the binary-outcomes assumption. Perhaps it was a mistake to post them in sequence instead of all at once...)