“The combination of these vastly different expressions of scale together with anchoring makes that we should expect people to over-estimate the probability of unlikely risks and hence to over-estimate the expected utility of x-risk prevention measures. ”
I am not entirely sure whether I understand this point. Is the argument that the anchoring effect would cause an overestimation because the “perceived distance” from an anchor grows faster per added zero than per increase of one in the exponent?
Directly relevant quotes from the articles for easier reference:
“This story seems consistent with the historical record. Things are usually preceded by worse versions, even in cases where there are weak reasons to expect a discontinuous jump. The best counterexample is probably nuclear weapons. But in that case there were several very strong reasons for discontinuity: physics has an inherent gap between chemical and nuclear energy density, nuclear chain reactions require a large minimum scale, and the dynamics of war are very sensitive to energy density.”
“I’m not aware of many historical examples of this phenomenon (and no really good examples)—to the extent that there have been “key insights” needed to make something important work, the first version of the insight has almost always either been discovered long before it was needed, or discovered in a preliminary and weak version which is then iteratively improved over a long time period. ”
“Over the course of training, ML systems typically go quite quickly from “really lame” to “really awesome”—over the timescale of days, not months or years.
But the training curve seems almost irrelevant to takeoff speeds. The question is: how much better is your AGI then the AGI that you were able to train 6 months ago?”
“Discontinuities larger than around ten years of past progress in one advance seem to be rare in technological progress on natural and desirable metrics. We have verified around five examples, and know of several other likely cases, though have not completed this investigation. ”
“Supposing that AlphaZero did represent discontinuity on playing multiple games using the same system, there remains a question of whether that is a metric of sufficient interest to anyone that effort has been put into it. We have not investigated this.
Whether or not this case represents a large discontinuity, if it is the only one among recent progress on a large number of fronts, it is not clear that this raises the expectation of discontinuities in AI very much, and in particular does not seem to suggest discontinuity should be expected in any other specific place.”
“We have not investigated the claims this argument is premised on, or examined other AI progress especially closely for discontinuities.”
Another point against the content overhang argument: while more data is definitely useful, it is not clear whether raw data about a world without a particular agent in it will be as useful to that agent as data obtained from its own interaction with the world (or from that of sufficiently similar agents). Depending on the actual implementation of a possible superintelligence, this raw data might be marginally helpful but far from being the most relevant bottleneck.
“Bostrom is simply making an assumption that such rapid rates of progress could occur. His intelligence spectrum argument can only ever show that the relative distance in intelligence space is small; it is silent with respect to likely development timespans. ”
It is not completely silent. I would expect any meaningful measure of distance in intelligence space to correlate at least somewhat with the timespans necessary to bridge that distance. So while the argument is not a decisive one regarding timespans, it also seems far from saying nothing.
“As such it seems patently absurd to argue that developments of this magnitude could be made on the timespan of days or weeks. We simply see no examples of anything like this from history, and Bostrom cannot argue that the existence of superintelligence would make historical parallels irrelevant, since we are precisely talking about the development of superintelligence in the context of it not already being in existence. ”
Note that the argument from historical parallels is extremely sensitive to the choice of reference class. It does seem like there has not been “anything like this” in science or engineering (although progress seems to have been quite discontinuous, but not self-reinforcing, by some metrics at times) or related to general intelligence. (Here it would be interesting to explore whether the evolution of human intelligence happened a lot faster than an outside observer would have expected from looking at the evolution of other animals, since hours and weeks seem like a somewhat anthropocentric frame of reference.) However, narrow AI has quite often gone from sub- to superhuman level in small time spans recently. This is once again very sensitive to framing, so take it more as a point about the complexity of arguments from historical parallels than as a direct argument for fast take-offs being likely.
“not consistent either with the slow but steady rate of progress in artificial intelligence research over the past 60 years”
Could you elaborate? I’m not extremely familiar with the history of artificial intelligence, but my impression was that progress was quite jumpy at times, instead of slow and steady.
Thanks for writing this!
I think you are pointing out some important imprecisions, but I think that some of your arguments aren’t as conclusive as you seem to present them to be:
“Bostrom therefore faces a dilemma. If intelligence is a mix of a wide range of distinct abilities as in Intelligence(1), there is no reason to think it can be ‘increased’ in the rapidly self-reinforcing way Bostrom speaks about (in mathematical terms, there is no single variable which we can differentiate and plug into the differential equation, as Bostrom does in his example on pages 75-76). ”
Those variables could be reinforcing each other, as one could argue they did in the evolution of human intelligence. (In mathematical terms, a linear vector-valued differential equation has a runaway dynamic similar to the one-dimensional case as long as all eigenvalues are positive.)
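To make the eigenvalue remark concrete, here is a minimal sketch of a two-variable system dx/dt = A x in which two abilities reinforce each other. The matrix entries, starting values and function names are made up for illustration; nothing here comes from Bostrom's example.

```python
import math


def eigenvalues_2x2_symmetric(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], smallest first."""
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mean - radius, mean + radius


def euler_integrate(a, b, d, x0, y0, t_end=10.0, dt=0.001):
    """Integrate dx/dt = a*x + b*y, dy/dt = b*x + d*y with forward Euler steps."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        dx = a * x + b * y
        dy = b * x + d * y
        x, y = x + dt * dx, y + dt * dy
    return x, y


if __name__ == "__main__":
    # Hypothetical coupling: each ability feeds itself (0.5) and the other (0.3).
    print(eigenvalues_2x2_symmetric(0.5, 0.3, 0.5))
    print(euler_integrate(0.5, 0.3, 0.5, x0=1.0, y0=0.1))
```

With both eigenvalues positive, both components grow exponentially even though neither is "the" intelligence variable, which is the sense in which the one-dimensional argument could carry over.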
“This should become clear if one considers that ‘essentially all human cognitive abilities’ includes such activities as pondering moral dilemmas, reflecting on the meaning of life, analysing and producing sophisticated literature, formulating arguments about what constitutes a ‘good life’, interpreting and writing poetry, forming social connections with others, and critically introspecting upon one’s own goals and desires. To me it seems extraordinarily unlikely that any agent capable of performing all these tasks with a high degree of proficiency would simultaneously stand firm in its conviction that the only goal it had reasons to pursue was tilling the universe with paperclips.”
Why does it seem unlikely? Also, do you mean unlikely as in “agents emerging in a world similar to ours as it is now probably won’t have this property” or as in “given that someone figured out how to construct a great variety of superintelligent agents, she would still have trouble constructing an agent with this property”?
Yes, exactly. When first reading your summary I interpreted it as the “for all” claim.
In your literature review you summarize the Smith and Winkler (2006) paper as “Prove that nonrandom, non-Bayesian decision strategies systematically overestimate the value of the selected option.”
At first sight, this claim seems like it might be stronger than the claim I have taken away from the paper (which is similar to what you write later in the text): if your decision strategy is to just choose the option you (naively) expect to be best, you will systematically overestimate the value of the selected option.
If you think the first claim is implied by the second (or by something in the paper I missed) in some sense, I’d love to learn about your arguments!
“In fact, I believe that choosing the winning option does maximize expected value if all measurements are unbiased and their reliability doesn’t vary too much.”
I think you are basically right, but the number of available options also plays a role here. If you consider a lot of non-optimal options for which your measurements are only slightly noisier than for the best option, you can still systematically underselect the best option. (For example, simulations suggest that with 99 N(0,1.1) and 1 N(0.1,1) variables, the last one will be the maximum of the 100 only 0.7% of the time, despite having the highest expected value.)
In this case, randomly taking one option would in fact have a higher expected value. (But it still seems very unclear how one would identify similar situations in reality, even if they existed.)
Some combination of moderately varying noise and lots of options seems like the most plausible condition under which not taking the winning option might be better for some real-world decisions.
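For anyone who wants to check the numbers, here is a minimal sketch of such a simulation. It assumes that the second parameter in N(0,1.1) and N(0.1,1) is the standard deviation (the comment above does not say), and it treats the means as the true values of the options. The function names, trial count and seed are made up for illustration.

```python
import random

TRUE_MEAN_BEST = 0.1  # true value of the single better option
N_NOISY = 99          # number of noisier options with true value 0


def win_rate(trials=200_000, seed=0):
    """Estimate how often the N(0.1, 1) measurement is the largest of all 100."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        best = rng.gauss(TRUE_MEAN_BEST, 1.0)
        # all() stops at the first noisy measurement that beats the better option.
        if all(rng.gauss(0.0, 1.1) < best for _ in range(N_NOISY)):
            wins += 1
    return wins / trials


def ev_pick_winner(rate):
    """True expected value of the option with the highest measurement."""
    return rate * TRUE_MEAN_BEST  # every other option has true value 0


def ev_pick_random():
    """True expected value of an option chosen uniformly at random."""
    return TRUE_MEAN_BEST / (N_NOISY + 1)


if __name__ == "__main__":
    rate = win_rate()
    print(f"best option wins in {rate:.2%} of trials")
    print(f"EV of picking the winner: {ev_pick_winner(rate):.5f}")
    print(f"EV of picking at random:  {ev_pick_random():.5f}")
```

Under these assumptions, picking at random beats picking the winner exactly when the win rate falls below 1%, since the better option is 1 of 100.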
I think that the assumption of the existence of a funnel-shaped distribution with undefined expected value for the things we care about is quite a bit stronger than assuming that there are infinitely many possible outcomes.
But even if we restrict ourselves to distributions with finite expected value, our estimates can still fluctuate wildly until we have gathered huge amounts of evidence.
So while I am sceptical of the assumption that there exists a sequence of world states with utilities tending to infinity, and even more sceptical of extremely high/low utility world states being reachable with sufficient probability for the expected value to be undefined (the absolute value of the utility of our action would have to have infinite expected value, and I’m sceptical of believing this without something at least close to “infinite evidence”), I still think your post is quite valuable for starting a debate on how to deal with low-probability events, crucial considerations and our decision making when expected values fluctuate a lot.
Also, even if my intuition about the impossibility of infinite utilities were true (I’m not exactly sure what that would actually mean, though), the problems you mentioned would still apply to anyone who does not share this intuition.
I think the argument is that additional information showing that a cause has high marginal impact might divert resources towards it and away from causes with less marginal impact. And getting this kind of information does seem more likely for causes without a track record allowing for a somewhat robust estimation of their (marginal) impact.
For clarification: is (PITi+ui) the “real” tractability and importance?
The text seems to make more sense that way, but reading “ui is the unknown (to you) importance and tractability of the cause.”, I at first interpreted ui as being the “real” tractability and importance instead of just a noise term.
Relatedly, the impromptu nature of some debating formats could also help with getting comfortable formulating answers to nontrivial questions under (time) pressure. Apart from being generally helpful, this might be especially valuable in some types of job interviews.
I’ve been considering investing some time in competitive debating, mostly in order to improve that skill, so if someone has data (even anecdotal) on that, please share :)
I am quite interested in your other arguments for why EV calculations won’t work for Pascal’s mugging and why they might extend to x-risks. I would probably have preferred a post already including all the arguments for your case.
About the argument from hypothetical updates: my intuition is that if you assign a probability of a lot more than 0.1^10^10^10 to the mugger actually being able to follow through, this might create other problems (like probabilities of distinct events adding up to more than 1, or priors inconsistent with Occam’s razor). If that intuition (and your argument) were true (my intuition might very well be wrong and seems at least slightly influenced by motivated reasoning), one would basically have to conclude that Bayesian EV reasoning fails as soon as it involves combinations of extreme utilities and minuscule probabilities.
However, I don’t think the credences for being able to influence x-risks are so low that updating becomes impossible, so your first argument does not convince me not to use EV to evaluate them. I’m quite eager to see the other arguments, though.
What exactly do you mean by utility here? The quasi-negative utilitarian framework seems to correspond to a shift of everyone’s personal utility such that the shifted utility for each person is 0 whenever this person’s life is neither worth living nor not worth living.
It seems to me that a reasonable notion of utility would have this property anyway (but I might just use the word differently than other people; please tell me if there is some widely used definition contradicting this!). This reframes the discussion into one about where the zero point of utility functions should lie, which seems easier to grasp. In particular, from this point of view quasi-negative utilitarianism still gives rise to some form of the sadistic-repugnant conclusion.
On a broader point, I suspect that the repugnance of repugnant conclusions usually stems from confusion or disagreement about what “a life worth living” means. However, as in your article, entertaining this conclusion still seems useful in order to sharpen our intuitions about which lives are actually worth living.
Are any ways of making content easier to filter (for example, tags) planned?
I am rather new to the community, and there have been multiple occasions where I randomly stumbled upon old articles I hadn’t read that dealt with topics I was interested in and had previously made an effort to find articles about. This seems rather inefficient.
“to prove this argument I would have to present general information which may be regarded as having informational hazard”
Is there any way to assess the credibility of statements like this (or whether this is actually an argument worth considering in a given specific context)?
It seems like you could use this as a general purpose argument for almost everything.
I am not sure whether your usage of economies of scale already covers this, but it seems to make sense to highlight that what matters is the marginal difference the money makes for you and your adversary. If doing evil is a lot more efficient at low scales (think of distributing highly addictive drugs among vulnerable populations vs. distributing malaria nets), your adversary could already be hitting diminishing returns while your marginal returns increase, and the lottery might still not be worth it.
Are you talking about the individual level, or the mean? My estimate would be that for the median individual, the effect will have faded out after at most 6 months. However, the mean might be influenced by the tails quite strongly.
Thinking about it for a bit longer, a mean effect of 12 years does seem quite implausible, though. In the limiting case, where only the tails matter, this would be equivalent to convincing around 25% of the initially influenced students to stop eating pork for the rest of their lives.
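The arithmetic behind the limiting case can be sketched as follows. The 48 remaining years are simply what the 12-year and 25% figures imply when divided, not a sourced life expectancy, and the function name is made up for illustration.

```python
def implied_lifelong_fraction(mean_effect_years, remaining_years):
    """Share of people who must change for life if everyone else has zero effect."""
    return mean_effect_years / remaining_years


if __name__ == "__main__":
    # Hypothetical: students have 48 years of consumption left on average.
    print(implied_lifelong_fraction(12, 48))
```

A longer assumed remaining lifetime would lower the required share somewhat, but it stays in the same range.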
The upper bound for my 90% confidence interval for the mean seems to be around 3 years, while the lower bound is at 3 months. The probability mass within the interval is mostly centered to the left.
The claim does not seem to be exactly that there is a 10% chance of an animal advocacy video affecting consumption decisions after 12 years for a given individual.
I’d interpret it as: there is a 5% chance that the mean duration of reduction, conditional on the participant reporting a change in their behaviour based on the video, is higher than 12 years.
This could, for example, also be achieved by having a very long-term impact on very few participants. This interpretation seems a lot more plausible, although I am not certain at all whether that claim is correct. Long-term follow-up data would certainly be very helpful.
At this point, I think that to analyze the $1bn case correctly, you’d have to subtract everyone’s opportunity cost in the calculation of the Shapley value (if you want to use it here). This way, the example should yield what we expect.
I might do a more general writeup about Shapley values, their advantages, disadvantages and when it makes sense to use them, if I find the time to read a bit more about the topic first.
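Here is a minimal sketch of one way to read "subtracting opportunity cost": reduce the value of each coalition by the summed opportunity costs of its members, then compute Shapley values as usual. The two-player game and its numbers are hypothetical and are not the $1bn example from the post.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical game: A and B create 10 units of value only if both take part.
OPPORTUNITY_COST = {"A": 2.0, "B": 6.0}


def gross_value(coalition):
    return 10.0 if coalition == frozenset({"A", "B"}) else 0.0


def net_value(coalition):
    # Each member gives up what they would have achieved elsewhere.
    return gross_value(coalition) - sum(OPPORTUNITY_COST[p] for p in coalition)


if __name__ == "__main__":
    print(shapley_values(["A", "B"], gross_value))
    print(shapley_values(["A", "B"], net_value))
```

Under this reading each player's net Shapley value is the gross one minus their own opportunity cost, so a player whose alternative was worth more than their share ends up with a negative value.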
I think it might be best to just report confidence intervals for your final estimates (Guesstimate should give you those). Then everyone can combine your estimates with their own priors on the effectiveness of interventions in general and thereby potentially correct for the high levels of uncertainty (at least in a crude way, by estimating the variance from the confidence intervals).
The variance of X can be defined as E[X^2]-E[X]^2, which should not be hard to implement in Guesstimate. However, I am not sure whether having the variance leads to more accurate updating than having a confidence interval. Optimally you’d have the full distribution, but I am not sure whether anyone will actually do the maths to update from there. (But they could get it roughly from your Guesstimate model.)
I might comment more on some details and the moral assumptions, if I find the time for it soon.
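Both routes to a variance can be sketched in a few lines of plain Python (this is not Guesstimate syntax). The first function applies E[X^2]-E[X]^2 to a list of samples. The second is the crude interval-based estimate and assumes the estimate is roughly normal, so that a 90% interval spans about 2 × 1.645 standard deviations. The function names and example numbers are made up for illustration.

```python
import statistics


def variance_from_moments(samples):
    """Variance computed as E[X^2] - E[X]^2 over a list of samples."""
    n = len(samples)
    mean = sum(samples) / n
    mean_of_squares = sum(x * x for x in samples) / n
    return mean_of_squares - mean ** 2


def variance_from_interval(lower, upper, z=1.645):
    """Crude variance from a central 90% interval, assuming normality."""
    standard_deviation = (upper - lower) / (2 * z)
    return standard_deviation ** 2


if __name__ == "__main__":
    samples = [1.0, 2.0, 3.0, 4.0]
    print(variance_from_moments(samples), statistics.pvariance(samples))
    print(variance_from_interval(-1.645, 1.645))
```

For skewed estimates, which cost-effectiveness models often produce, the normality assumption behind the interval-based version will be poor, so it is only a rough correction.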