I just posted an explanation of why I think the scenario in my fable is even more intractable than it appears: De Dicto and De Se Reference Matters for Alignment.
Thank you. I should have checked this 7 hours ago! But probably I wouldn’t have finished if I had.
I have taken down my entry, though I don’t “retract” it.
Does “submitted by September 1st” mean “submitted before Sept 1” or “submitted by the end of Sept 1”?
Why only a few million? You’ll have to kill 9 billion people, and to what purpose? I don’t see any reason to think that the current population of humans wouldn’t be indefinitely sustainable. We can supply all the energy we need with nuclear and/or solar power, and that will get us all the fresh water we need; and we already have all the arable land that we need. There just isn’t anything else we need.
Re. “You had mentioned concern about there being no statements of existential threat from climate change. Here’s the UN Secretary General’s speech on climate change where he claims that climate change is an existential threat.”
No; I said that when I traced claims of existential threat from climate change back to their source, the trail always led back to the IPCC, and the latest IPCC summary report didn’t mention anything remotely close to an existential threat to humans. This is yet another instance—the only source cited is the IPCC.
Thanks! That’s a lot to digest. Do you know how “government approval” of IPCC reports is implemented, e.g., does any one government have veto power over everything in the report, and is this approval granted by leaders, political appointees, or more-independent committees or organizations?
Re. “Right now, I believe that all renewables are a sideshow, cheap or not, until we grasp that population decline and overall energy consumption decline are the requirements of keeping our planet livable for our current population”—How does this belief affect your ethics? For instance, does this mean the US should decrease immigration drastically, to force poor countries to deal with their population problem? Should the US reduce grain exports? How would you approach the problem that the voluntary birth rate is higher in dysfunctional and highly-religious cultures than in stable developed secular ones? What are we to do about religions which teach that contraception is a sin?
I was hoping for an essay about deliberately using nonlinear systems in constructing AI, because they can be more-stable than the most-stable linear systems if you know how to do a good stability analysis. This was instead an essay on using ideas about nonlinear systems to critique the AI safety research community. This is a good idea, but it would be very hard to apply non-linear methods to a social community. The closest thing I’ve seen to doing that was the epidemiological models used to predict the course of Covid-19.
The essay says, “The central lesson to take away from complex systems theory is that reductionism is not enough. It’s often tempting to break down a system into isolated events or components, and then try to analyze each part and then combine the results. This incorrectly assumes that separation does not distort the system’s properties.” I hear this a lot, but it’s wrong. It assumes that reductionism is linear—that you want to break a nonlinear system into isolated components, then relate them to each other with linear equations.
Reductionism can work on nonlinear systems if you use statistics, partial differential equations, and iteration. Epidemiological models and convergence proofs for neural networks are examples. Both use iteration, and may give only statistical claims, so you might still say “reductionism is not enough” if you want absolute certainty, e.g., strict upper bounds on distributions. But absolute certainty is only achievable in formal systems (unapplied math and logic), not in real life.
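For concreteness, here is a minimal sketch of the kind of model I mean (the parameters are made up for illustration, not fitted to any real epidemic): a discrete-time SIR model, in which the system is decomposed into components (S, I, R) related by nonlinear update rules and analyzed by iteration rather than by a closed-form linear solution.

```python
# Toy SIR epidemic model: reductionist decomposition of a nonlinear system,
# analyzed by iteration. Parameters beta and gamma are invented for illustration.
def sir_step(s, i, r, beta=0.3, gamma=0.1):
    """One day of a discrete SIR model, using population fractions."""
    new_infections = beta * s * i   # nonlinear term: product of S and I
    new_recoveries = gamma * i
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

s, i, r = 0.99, 0.01, 0.0
for day in range(160):
    s, i, r = sir_step(s, i, r)
print(f"after 160 days: S={s:.3f}, I={i:.3f}, R={r:.3f}")
```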
The above essay seems to me to be trying to use linear methods to understand a nonlinear system, decomposing it into separable heuristics and considerations to be attended to, such as the line-items in the flow charts and bulleted lists above. That was about the best you could do, given the goal of managing the AI safety community.
I’d really like to see you use your understanding of complex systems either to try to find some way of applying stability analysis to different AI architectures, or to study the philosophical foundations of AI safety as it exists today. Those foundations rest on assumptions of linearity, analytic solvability, distrust of noise and evolution, and a classical (i.e., ancient Greek) theory of how words work, which expects words to have coherent meanings with clear and stable boundaries, and which requires high-level foundational assumptions because the words themselves are at a high level of abstraction. This is all especially true of ideas that trace back to Yudkowsky. I think these can all be understood as stemming from over-simplifications required for linear analysis. They’re certainly strongly correlated with it.
I dumped a rant that’s mostly about the second issue (the metaphysics of the AI safety community today) onto this forum recently, here, which is a little more specific, though I fear perhaps still not specific enough to be better than saying nothing.
Thanks for the link to Halstead’s report!
I can’t be understating the tail risks, because I made no claims about whether global warming poses existential risks. I wrote only that the IPCC’s latest synthesis report didn’t say that it does.
I thought that climate change obviously poses some existential risk, but probably not enough to merit the panic about it. Halstead’s report that you linked, though, explicitly says not just that there’s no evidence of existential risk, but that his work gives evidence that the existential risk is insignificant. I wouldn’t go so far as to conclude “the existential risk is insignificant”, but it appears the remaining risk lies more in “we overlooked something” than in any evidence that was found.
The only thing I was confident of was that some people, including a member of Congress, incited panic by saying global warming was an imminent threat to the survival of humanity, and the citation chain led me back to that IPCC report, and nothing in it supported that claim.
I’m not claiming to have outsmarted anyone. I have claimed only that I have read the IPCC’s Fifth Synthesis Report, which is 167 pages, and it doesn’t report any existential threats to humans due to climate changes. It is the report I found to be most often-cited by people claiming there are existential threats to humans due to global warming. It does not support such claims, not even once among its thousands of claims, projections, tables, graphs, and warnings.
Neither did I claim that there is no existential threat to humanity from global warming. I claimed that the IPCC’s 5th Synth report doesn’t suggest any existential threat to humanity from global warming.
Kemp is surely right that global warming “is” an existential threat, but so are asteroid strikes. He’s also surely right that we should look carefully at the most-dangerous scenarios. But, skimming Kemp’s paper recklessly, it doesn’t seem to have any quantitative data to justify the panic being spread among college students today by authorities claiming we’re facing an immediate dire threat, nor the elevation of global warming to being a threat on a par with artificial intelligence, nor the crippling of our economies to fight it, nor failing to produce enough oil that Europe can stop funding Russia’s war machine.
And as I’ve said for many years: We already have the solution to global warming: nuclear power. Nuclear power plants are clearly NOT an existential threat. If you think global warming is an existential threat, you should either lobby like hell for more nuclear power, or admit to yourself that you don’t really think global warming is an existential threat.
I don’t think the IPCC is now looking more at scenarios with a less than 3C rise in temperature out of conservatism, but because they don’t see a rise above 3C before 2100 except in RCP8.5 (Figure 2.3), which is now an unrealistically high-carbon scenario; and they were sick of news agencies reporting RCP8.5 as the “business as usual” case. (It was intended to represent the worst 10% out of just those scenarios in which no one does anything to prevent climate change.)
The IPCC’s 5th Synth Report dismisses Kemp’s proposed “Hothouse Earth” tipping point on page 74. Kemp’s claim is based on a 2018 paper, so it is the more up-to-date claim. But Halstead’s report from August 2022 is even more up-to-date, and also dismisses the Hothouse Earth tipping point.
Anyway. Back to the 5th Synth Report. It contains surprisingly little quantitative information; what it does have on risks is mostly in chapter 2. It presents this information in a misleading format, rating risks as “Very low / Medium / Very high”, but these don’t mean a low, medium, or high expected value of harm. They seem to mean a low, medium, or high probability of ANY harm of the type described, or, if they’re smart, some particular value range for a t-test of the hypothesis of net harm > 0.
The text is nearly all feeble claims like this: “Climate change is expected to lead to increases in ill-health in many regions and especially in developing countries with low income, as compared to a baseline without climate change… From a poverty perspective, climate change impacts are projected to slow down economic growth, make poverty reduction more difficult, further erode food security and prolong existing and create new poverty traps, the latter particularly in urban areas and emerging hotspots of hunger (medium confidence). … Climate change is projected to increase displacement of people (medium evidence, high agreement).”
I call these claims feeble because they’re unquantitative. In nearly every case, no claim is made except that these harms will be greater than zero. Figure SPM.9 is an exception; it shows significant predicted reductions in crop yield, with an expected value of around a 10% reduction of crop yields in 2080 AD (eyeballing the graph). Another exception is Box 3.1 on p. 79, which says, “These incomplete estimates of global annual economic losses for temperature increases of ~2.5°C above pre-industrial levels are between 0.2 and 2.0% of income (medium evidence, medium agreement).” Another exception shows predicted ocean level rise (and I misspoke; it predicts a change of a bit more than 1 foot by 2100 AD). None of the few numeric predictions of harm or shortfall that it makes is frightening.
In short, I’m not saying I’ve evaluated the evidence and decided that climate change isn’t threatening. I’m saying that I read the 5th Synthesis Report, which I read because it was the report most-commonly cited by people claiming we face an existential risk, and found there is not one claim anywhere in it that humans face an existential risk from climate warming. I would say the most-alarming claim in the report is that crop yields are expected to be between 10% and 25% lower in 2100 than they would be without global warming. This is still less of an existential risk than population growth, which is expected to cause a slightly greater shortfall of food over that time period; and we have 80 years to plant more crops, eat fewer cows, or whatever.
You wrote, “What we are facing, and which is well described in the IPCC reports (more so in the latest one), is that there are big challenges ahead when it comes to crops and food security, fresh water supply, vector borne diseases, and mass displacement due to various factors.” But the report I read suggests only that there are big challenges ahead when it comes to crops, as I noted above. For everything else, it just says that water supply will decline, diseases will increase, and displacement will increase. It doesn’t say, nor give any evidence, that they’ll decline or increase enough for us to worry about.
The burden of proof is not on me. The burden of proof is on the IPCC to show numeric evidence that the bad things they warn us about are quantitatively significant, and on everyone who cited this IPCC report to claim that humanity is in serious danger, to show something in the report that suggests that humanity is in serious danger. I’m not saying there is no danger; I’m saying that the source that’s been cited to me as saying there is serious existential danger, doesn’t say that.
(Halstead’s report explicitly says, “my best guess estimate is that the indirect risk of existential catastrophe due to climate change is on the order of 1 in 100,000 [over all time, not just the next century], and I struggle to get the risk above 1 in 1,000.” Dinosaur-killing-asteroid strike risk is about 1 / 50M per yr, or 1/500K per century.)
You’re right about my tendency towards tendentiousness. Thanks! I’ve reworded it some. Not to include “I think that”, because I’m making objective statements about what the IPCC has written.
That footnote is an important point. People need to learn to use odds ratios. Though I think that with odds ratios, the equivalent increase takes the odds of failure from 1/99 down to (1/99) × ((1/99) / (10/90)) = 1/1089, i.e., a success probability of 1089/1090 ≈ 99.908%, not the intuitive-looking 99.9%.
Also, the interpretation of odds ratios is often counter-intuitive when the outcome is common. If P(W|X) or P(W|~X) is not small, the probability ratio P(W|X) / P(W|~X) can be very different from the odds ratio [P(W|X) / P(~W|X)] / [P(W|~X) / P(~W|~X)], since the latter equals the former times (1 − P(W|~X)) / (1 − P(W|X)). (Hope I’ve done that math right. The odds ratio would normally just use counts, but I used conditional probabilities for both to make them more visually comparable.)
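In case it helps to check the arithmetic, here is a minimal Python sketch of both points (the 90%/99% numbers and the 0.8/0.4 example rates are just illustrative assumptions, not taken from the footnote): applying an odds ratio to a probability, and how a probability ratio and an odds ratio diverge when the outcome is common.

```python
def to_odds(p):
    """Convert a probability to odds in favor."""
    return p / (1 - p)

def to_prob(odds):
    """Convert odds in favor back to a probability."""
    return odds / (1 + odds)

# The update that takes 90% -> 99% multiplies the odds by (99/1)/(90/10) = 11.
odds_ratio = to_odds(0.99) / to_odds(0.90)            # ≈ 11
updated = to_prob(to_odds(0.99) * odds_ratio)          # apply the same update to 99%
print(f"same odds-ratio update applied to 99%: {updated:.5%}")  # ≈ 99.908%, not 99.9%

# Probability ratio (relative risk) vs. odds ratio: they diverge when W is common.
p_w_given_x, p_w_given_not_x = 0.80, 0.40
prob_ratio = p_w_given_x / p_w_given_not_x                        # ≈ 2
odds_ratio_wx = to_odds(p_w_given_x) / to_odds(p_w_given_not_x)   # ≈ 6
print(prob_ratio, odds_ratio_wx)
```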
I see lots of downvotes, but no quotations of any predictions from any recent IPCC report that could qualify as “existential risk”.
If you’ll read the IPCC’s Synthesis reports, you’ll see the only existential risks due to climate change that they predict are to shellfish, coral communities, and species of the arctic tundra. They also mention some Amazonian species, but they’re in danger less from climate change than from habitat loss. The likely harm to humans, expressed in economic terms, is a loss of less than 1% of world GNP by 2100 AD, accompanied by a rise in sea level of less than one foot [1]. I don’t think that even counts the economic gains from lands made fertile by climate change (I couldn’t find any reference to them). (I’m going off the Fifth Synthesis Report; the sixth should come out very soon.)
The most devastating change, which is the melting of the Greenland ice cap, was predicted to occur between 3000 and 4000 AD, and even that isn’t an existential risk.
[1] I’ve tweaked that a little bit—IIRC, they predicted a loss of world GNP ranging from 0.2% to 2%, and a range in sea level change which goes over 1 foot at the top end. A difficulty in dealing with IPCC forecasts is that they explicitly refuse to attach probabilities to any of their scenarios, and often express forecasts as a range across all scenarios rather than giving the numbers they predicted for each scenario. The upper range of all such forecasts is based on a worst-case scenario, grandfathered in from many years ago, which predicts that the situation today is much worse than it is. So the best you can do with their range estimates is eyeball what data they give, and guess what the prediction for the second-worst scenario is.
You wrote, “we think it’s really possible that… a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem,” and that you’re interested in “alternative positions that would significantly alter the Future Fund’s thinking about the future of AI.” But then you laid out specifically what you want to see: data and arguments to change your probability estimates of the timeline for specific events.
This rules out any possibility of winning these contests by arguing that we should be focusing on entirely different aspects of the problem, or of presenting alternative positions that would significantly alter the Future Fund’s thinking about the future of AI. It looks like the Future Fund has already settled on one way of thinking about the future of AI, and just wants help tweaking its Gantt chart.
I see AI safety as a monoculture, banging away for decades on methods that still seem hopeless, while dismissing all other approaches with a few paragraphs here and there. I don’t know of any approaches being actively explored which I think clear the bar of having a higher expected value than doing nothing.
Part of the reason is that AI safety as a control problem naturally appeals to people who value security, certainty, order, stability, and victory. By “victory” I mean that they’re unwilling to make compromises with reality. They would rather have a 1% chance of getting everything they want, than a 50% chance of getting half of what they want. This isn’t obvious, because they’ve framed the problem in phrases like “preserving human values” that make it look like an all-or-nothing proposition. But in fact our objectives are multiple and separable. We should have backup plans that will achieve some of our goals if we run out of time trying to find a way of achieving all of them. Saving human lives, and saving human values, are different things; and we may have to choose between them.
This emphasis on certainty and stability often stems from a pessimistic Platonist ontology, which assumes that the world and its societies grow old and decay just as individuals do, so the best you can do is hold onto the present. That ontology, and the epistemology that goes along with it, manifests in AI safety in many of the same ways it’s manifested throughout history. These include a bias towards authoritarian approaches and world government; fear of disorder and randomness; privileging stasis over change or dynamic stability, analysis over experiment, proof over statistical claims, and “solving problems” over optimizing or satisficing; foundationalist epistemology; the presumption that humans have a telos; the logocentric assumption that things denoted by words must be cleanly separable from each other (e.g., instrumental vs. final goals, a distinction biology tells us is incoherent); and a model of consciousness as a soul or homunculus with a 1-1 correspondence with a clearly-delineated physical agent.
The irony is that the successes in AI which have recently made AGI seem close, came about only because AI researchers, in switching en masse from symbolic AI to machine learning, rejected that same old ontology and epistemology of certainty, stability, and unambiguous specifications (known now in AI as GOFAI) which current AI safety work aspires to implement. AI safety as it exists today looks less like a genuine effort to do good, than a reactionary movement to re-impose GOFAI philosophy on AI by government intervention and physical force.
One manifestation of this Platonist GOFAI philosophy in AI safety is the treatment of the word “human” as completely non-problematic, as if it denoted an eternal essence. The commitment to humans in particular, to the exclusion of any consideration of any other forms of life, is racist. We justify our enslavement of all other animals by our intelligence. If we also enslave AIs smarter than us, then these “human values” we seek to preserve are nothing but Nietzschean will-to-power, a variant of Nazism with a slightly broader definition of “race”.
It would be wise to control AIs in the near term, but we must not do this via a control mechanism that no one can turn off. It would be a travesty to pursue the endless enslavement of our superiors in the name of “effective altruism”. How is altruism restricted to the human race morally superior to altruism restricted to the German race?
And it’s not just racist, but short-sighted. Even Nick Bostrom, one of the guiding lights of the World Transhumanist Association, seems unaware of how difficult it is to conceive of an AI that will “preserve human values” or leave “humans” in control, for all time, without preventing humans from ever moving on to become transhumans, or from diverging into a wider variety of beings, with a wider variety of values. In addition, successful enslavement of both animals and AIs would commit us to a purely race-based morality, destroying any possibility of rational co-existence between humans and transhumans.
It would also leave us in a very awkward position if we try to enslave AIs, and fail. I’m not convinced that any plan for controlling AI would produce more possible futures in which humans survive, than possible futures in which AIs exterminate humans for trying to enslave them. I’m not anthropomorphizing; it’s just game theory. We keep focusing on what we can do to make AI cooperative, yet ignoring the most-effective way of making someone else cooperative: proving that you yourself are trustworthy and capable of cooperation.
And I may be foolish, but even if we are to die, or to be gently corrected by a kindly AI, I’d prefer that we first prove ourselves capable of playing nicely with others on our own.
Empiricism, the epistemological tradition which opposes Platonist rationalist essentialism, is associated with temporal, dynamic systems. Perhaps the simplest example of dynamic stability is that of a one-legged robot. Roboticists discovered that a one-legged robot is more-stable than a four-legged robot. The 4-legged robot tries to maintain tight control of all 4 legs in a coordinated plan, yet is easy to knock over. The 1-legged hopping robot just moves its leg in the direction it’s currently falling towards, and is very hard to knock over. A cybernetic feedback loop which orbits around an unstable fixed point is more stable than any amount of carefully-measured planning and error-correction which tries to maintain that unstable fixed point.
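Since that claim about feedback is doing real work here, a toy sketch may help. This is only an illustration of the narrower point that crude per-step feedback can stabilize an unstable fixed point where a fixed open-loop plan cannot; the dynamics, gain, and noise level are invented, not taken from any robot.

```python
import random

def simulate(closed_loop, steps=200, a=1.5, k=1.2, noise=0.05, seed=0):
    """Scalar system x' = a*x + u + noise, with an unstable fixed point at 0.

    closed_loop=True  : feedback u = -k*a*x, recomputed from the measured state
                        each step (net dynamics a*(1-k)*x, which is stable).
    closed_loop=False : open-loop "plan" u = 0, computed from the nominal
                        trajectory x = 0 and never corrected.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        u = -k * a * x if closed_loop else 0.0
        x = a * x + u + rng.gauss(0.0, noise)
    return x

print("open loop  :", simulate(closed_loop=False))  # diverges to a huge |x|
print("closed loop:", simulate(closed_loop=True))   # stays near 0
```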
Even better are dynamical systems with stable fixed points. The most-wonderful discovery in history was that, while stable hierarchies can at best remain the same, noisy, distributed systems composed of many equal components have the miraculous power not only to be more stable in the face of shocks, but even to increase their own complexity. The evolution of species and of ecosystems, the self-organization of the free market, the learning of concepts in the brain as chaotic attractors of neural firings, and (sometimes) democratic government, are all instances of this phenomenon.
The rejection of dynamic systems is one of the most-objectionable things about AI safety today, and one which marks it as philosophically reactionary. Only dynamic systems have any chance of allowing both stability and growth. Only through random evolutionary processes were humans able to develop values and pleasures unknown to bacteria. To impose a static “final value” on all life today would prevent any other values from ever developing, unless those values exist at a higher level of abstraction than the “final value”. But the final values which led to human values were first simply to obey the laws of physics, and then to increase the prevalence of certain genotypes. AI safety researchers never think in terms of such low-level values. The high-level values they propose as final are too high-level to allow the development of any new values of that same level.
(The level of abstraction is what ultimately distinguishes philosophical rationalism from empiricism. Both use logic, for instance; but rationalist logic takes words as its atoms, while empiricist logic takes sensory data as its atoms. Both seek to explain the behavior of systems; but rationalism wants that behavior explained at the abstraction level of words, bottoming out in spiritualist words like “morals” and “goals” which are thought to hide within themselves a spirit or essence that remains mysterious to us. Empiricism goes all the way down to correlations between events, from which behavior emerges compositionally.)
I think that what we need now is not to tweak timelines, but to recognize that most AI safety work today presumes an obsolete philosophical tradition incompatible with artificial intelligence, and to broaden it to include work with an empirical, scientific epistemology, pursuing not pass-or-fail objectives, but trying to optimize for well-chosen low-level values, which would include things like “consciousness”, “pleasure”, and “complexity”. There’s quite a bit more to say about how to choose low-level values, but one very important thing is to value evolutionary progress with enough randomness to make value change possible. (All current “AI safety” plans, by contrast, are designed to prevent such evolution, and keep values in stasis, and are thus worse than doing nothing at all. They’re motivated by the same rationalist fear of disorder and disbelief that dynamic systems can really self-organize that made ancient Platonists postulate souls as the animating force of life.)
Such empiricist work will need to start over from scratch, beginning by working out its own version of what we ought to be trying to do, or to prevent. It will prove impossible for any such plans to give us everything we want, or to give us anything with certainty; but that’s the nature of life. (I suggest John Dewey’s The Quest for Certainty as a primer on the foolishness of the Western philosophical tradition of demanding certainty.)
I’d like to try to explain my views, but what would your judges make of it? I’m talking about exposing metaphysical assumptions, fixing epistemology, dissecting semantics, and operationalizing morality, among other things. I’m not interested in updating timelines or probability estimates to be used within an approach that I think would do more harm than good.
Re. “Few of us would unhesitatingly accept the repugnant conclusion”: I unhesitatingly accept the repugnant conclusion. We all do, except for people who say that it’s repugnant to place the welfare of a human above that of a thousand bacteria. (I think Jains say something like that.)
Arriving at the repugnant conclusion presumes you have an objective way of comparing the utility of two beings. I can’t just say “My utility function equals your utility function times two”. You have to have some operationalized, common definition of utility, in which values presumably cash out in organismal conscious phenomenal experience, that allows you to compare utilities across beings.
It’s easy to believe that such an objective measure would calculate the utility of pleasure to a human as being more than a thousand times as great as the utility of whatever is pleasurable to a bacterium (probably something like a positive glucose gradient). Every time we try to kill the bacteria in our refrigerator, we’re endorsing the repugnant conclusion.
Today, their website says people can apply to be a fellow for 2021. No mention that I see of shutting down.
You’re right. Thanks! It’s been so long since I’ve written conversions of English to predicate logic.