You wrote, “we think it’s really possible that… a bunch of this AI stuff is basically right, but we should be focusing on entirely different aspects of the problem,” and that you’re interested in “alternative positions that would significantly alter the Future Fund’s thinking about the future of AI.” But then you laid out specifically what you want to see: data and arguments to change your probability estimates of the timeline for specific events.
This rules out any possibility of winning these contests by arguing that we should be focusing on entirely different aspects of the problem, or of presenting alternative positions that would significantly alter the Future Fund’s thinking about the future of AI. It looks like the Future Fund has already settled on one way of thinking about the future of AI, and just wants help tweaking its Gantt chart.
I see AI safety as a monoculture, banging away for decades on methods that still seem hopeless, while dismissing all other approaches with a few paragraphs here and there. I don’t know of any approaches being actively explored which I think clear the bar of having a higher expected value than doing nothing.
Part of the reason is that AI safety as a control problem naturally appeals to people who value security, certainty, order, stability, and victory. By “victory” I mean that they’re unwilling to make compromises with reality. They would rather have a 1% chance of getting everything they want than a 50% chance of getting half of what they want. This isn’t obvious, because they’ve framed the problem in phrases like “preserving human values” that make it look like an all-or-nothing proposition. But in fact our objectives are multiple and separable. We should have backup plans that will achieve some of our goals if we run out of time trying to find a way of achieving all of them. Saving human lives and saving human values are different things, and we may have to choose between them.
This emphasis on certainty and stability often stems from a pessimistic Platonist ontology, which assumes that the world and its societies grow old and decay just as individuals do, so the best you can do is hold onto the present. That ontology, and the epistemology that goes along with it, manifests in AI safety in many of the same ways it’s manifested throughout history. These include a bias towards authoritarian approaches and world government; fear of disorder and randomness; privileging stasis over change or dynamic stability, analysis over experiment, proof over statistical claims, and “solving problems” over optimizing or satisficing; foundationalist epistemology; the presumption that humans have a telos; the logocentric assumption that things denoted by words must be cleanly separable from each other (e.g., instrumental vs. final goals, a distinction biology tells us is incoherent); and a model of consciousness as a soul or homunculus with a 1-1 correspondence with a clearly-delineated physical agent.
The irony is that the recent successes in AI which have made AGI seem close came about only because AI researchers, in switching en masse from symbolic AI (now known as GOFAI, “good old-fashioned AI”) to machine learning, rejected that same old ontology and epistemology of certainty, stability, and unambiguous specifications which current AI safety work aspires to implement. AI safety as it exists today looks less like a genuine effort to do good than like a reactionary movement to re-impose GOFAI philosophy on AI by government intervention and physical force.
One manifestation of this Platonist GOFAI philosophy in AI safety is the treatment of the word “human” as completely non-problematic, as if it denoted an eternal essence. The commitment to humans in particular, to the exclusion of any consideration of other forms of life, is racist. We justify our enslavement of all other animals by our intelligence. If we also enslave AIs smarter than us, then these “human values” we seek to preserve are nothing but Nietzschean will-to-power, a variant of Nazism with a slightly broader definition of “race”.
It would be wise to control AIs in the near term, but we must not do this via a control mechanism that no one can turn off. It would be a travesty to pursue the endless enslavement of our superiors in the name of “effective altruism”. How is altruism restricted to the human race morally superior to altruism restricted to the German race?
And it’s not just racist, but short-sighted. Even Nick Bostrom, one of the guiding lights of the World Transhumanist Association, seems unaware of how difficult it is to conceive of an AI that will “preserve human values” or leave “humans” in control, for all time, without preventing humans from ever moving on to become transhumans, or from diverging into a wider variety of beings, with a wider variety of values. In addition, successful enslavement of both animals and AIs would commit us to a purely race-based morality, destroying any possibility of rational co-existence between humans and transhumans.
It would also leave us in a very awkward position if we try to enslave AIs, and fail. I’m not convinced that any plan for controlling AI would produce more possible futures in which humans survive than possible futures in which AIs exterminate humans for trying to enslave them. I’m not anthropomorphizing; it’s just game theory. We keep focusing on what we can do to make AI cooperative, yet ignore the most effective way of making someone else cooperative: proving that you yourself are trustworthy and capable of cooperation.
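To make the game-theoretic structure of that claim concrete, here is a minimal sketch of an iterated prisoner’s dilemma in Python. It is purely illustrative (the payoffs are standard textbook values, and nothing in it is specific to AI): a player who demonstrably reciprocates cooperation does far better than a player who tries to dominate.

```python
# A minimal iterated prisoner's dilemma. Illustrative only: it shows that
# demonstrated trustworthiness (reciprocating cooperation) earns more than
# attempted domination. Payoffs are the standard textbook values.

PAYOFF = {  # (my move, their move) -> my payoff; C = cooperate, D = defect
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then mirror the opponent's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    """Try to dominate: never cooperate."""
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    history_a, history_b = [], []   # each entry: (my move, their move)
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a)
        move_b = strategy_b(history_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append((move_a, move_b))
        history_b.append((move_b, move_a))
    return score_a, score_b

print("tit-for-tat vs tit-for-tat: ", play(tit_for_tat, tit_for_tat))
print("tit-for-tat vs always-defect:", play(tit_for_tat, always_defect))
```

Over 200 rounds, two reciprocators earn 600 points each, while the would-be dominator earns only 204 against a partner it cannot bully. Proving you are willing and able to cooperate is what elicits cooperation; attempted domination earns only defection in return.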
And I may be foolish, but even if we are to die, or to be gently corrected by a kindly AI, I’d prefer that we first prove ourselves capable of playing nicely with others on our own.
Empiricism, the epistemological tradition which opposes Platonist rationalist essentialism, is associated with temporal, dynamic systems. Perhaps the simplest example of dynamic stability is that of a one-legged robot. Roboticists discovered that a one-legged robot is more stable than a four-legged one. The four-legged robot tries to maintain tight control of all four legs in a coordinated plan, yet is easy to knock over. The one-legged hopping robot just moves its leg in the direction it’s currently falling, and is very hard to knock over. A cybernetic feedback loop which orbits around an unstable fixed point is more stable than any amount of carefully measured planning and error-correction which tries to maintain that unstable fixed point.
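Here is a toy numerical sketch of that last point (mine, not from any robotics paper): an inverted pendulum balanced upright sits at an unstable fixed point. With no feedback it falls over from the slightest tilt; a crude feedback law that corrects in proportion to how far and how fast it is currently falling keeps it near the fixed point indefinitely. The gains and constants are arbitrary illustrations.

```python
import math

# A toy inverted pendulum: theta = 0 is the unstable fixed point ("upright").
# Open-loop (no feedback), a tiny initial tilt grows until the pendulum falls.
# A crude feedback law that pushes against the current fall keeps it upright.
# All constants below are illustrative, not measured from any real robot.

G_OVER_L = 9.81              # gravity / pendulum length (1 m pendulum)
DT, STEPS = 0.001, 10_000    # 10 simulated seconds

def simulate(feedback):
    theta, omega = 0.01, 0.0          # small initial tilt (radians), at rest
    max_tilt = abs(theta)
    for _ in range(STEPS):
        u = -(40.0 * theta + 10.0 * omega) if feedback else 0.0  # PD-style correction
        alpha = G_OVER_L * math.sin(theta) + u                   # angular acceleration
        omega += alpha * DT
        theta += omega * DT
        max_tilt = max(max_tilt, abs(theta))
        if abs(theta) > math.pi / 2:                             # fallen past horizontal
            return "fell over"
    return f"still upright, max tilt ~{math.degrees(max_tilt):.2f} deg"

print("open-loop plan:", simulate(feedback=False))
print("feedback loop: ", simulate(feedback=True))
```

The open-loop run falls within a couple of simulated seconds; the feedback run never strays more than a fraction of a degree. The stability comes from continuously responding to where the system actually is, not from a precise plan of where it ought to be.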
Even better are dynamical systems with stable fixed points. The most wonderful discovery in history was that, while stable hierarchies can at best remain the same, noisy, distributed systems composed of many equal components have the miraculous power not only to be more stable in the face of shocks, but even to increase their own complexity. The evolution of species and of ecosystems, the self-organization of the free market, the learning of concepts in the brain as chaotic attractors of neural firings, and (sometimes) democratic government are all instances of this phenomenon.
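As a crude illustration, here is a toy Python simulation of a noisy, distributed system: a population of bit-strings under mutation and selection, with the count of 1-bits standing in (very roughly) for “complexity”. Every hundred generations a shock scrambles half the population; the system absorbs the shocks and keeps improving anyway. The fitness function and parameters are arbitrary stand-ins, not a model of evolution or of markets.

```python
import random

# A toy noisy, distributed system: bit-strings under mutation and selection.
# Fitness (count of 1-bits) is a crude stand-in for "complexity". Periodic
# shocks scramble half the population; the system recovers and keeps climbing.

random.seed(0)
POP, LENGTH, GENS, MUT = 100, 60, 400, 0.01

def fitness(genome):
    return sum(genome)

def mutate(genome):
    return [(b ^ 1) if random.random() < MUT else b for b in genome]

population = [[0] * LENGTH for _ in range(POP)]     # start maximally simple

for gen in range(1, GENS + 1):
    if gen % 100 == 0:                              # shock: scramble half the population
        for i in random.sample(range(POP), POP // 2):
            population[i] = [random.randint(0, 1) for _ in range(LENGTH)]
    # selection: keep the better half, refill with mutated copies of survivors
    population.sort(key=fitness, reverse=True)
    survivors = population[: POP // 2]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(POP - POP // 2)]
    if gen % 50 == 0:
        print(f"gen {gen:3d}  best fitness {fitness(population[0])}/{LENGTH}")
```

The point is only structural: no component is in charge, the noise is what supplies the novelty, and the redundancy of many equal components is what makes the shocks survivable.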
The rejection of dynamic systems is one of the most objectionable things about AI safety today, and one which marks it as philosophically reactionary. Only dynamic systems have any chance of allowing both stability and growth. Only through random evolutionary processes were humans able to develop values and pleasures unknown to bacteria. To impose a static “final value” on all life today would prevent any other values from ever developing, unless those values exist at a higher level of abstraction than the “final value”. But the final values which led to human values were first simply to obey the laws of physics, and then to increase the prevalence of certain genotypes. AI safety researchers never think in terms of such low-level values. The high-level values they propose as final are too high-level to allow the development of any new values at that same level.
(The level of abstraction is what ultimately distinguishes philosophical rationalism from empiricism. Both use logic, for instance; but rationalist logic takes words as its atoms, while empiricist logic takes sensory data as its atoms. Both seek to explain the behavior of systems; but rationalism wants that behavior explained at the abstraction level of words, bottoming out in spiritualist words like “morals” and “goals” which are thought to hide within themselves a spirit or essence that remains mysterious to us. Empiricism goes all the way down to correlations between events, from which behavior emerges compositionally.)
I think that what we need now is not to tweak timelines, but to recognize that most AI safety work today presumes an obsolete philosophical tradition incompatible with artificial intelligence, and to broaden it to include work with an empirical, scientific epistemology: work that pursues not pass-or-fail objectives, but optimization of well-chosen low-level values, such as “consciousness”, “pleasure”, and “complexity”. There’s quite a bit more to say about how to choose low-level values, but one very important thing is to value evolutionary progress with enough randomness to make value change possible. (All current “AI safety” plans, by contrast, are designed to prevent such evolution and keep values in stasis, and are thus worse than doing nothing at all. They’re motivated by the same rationalist fear of disorder, and the same disbelief that dynamic systems can really self-organize, that made ancient Platonists postulate souls as the animating force of life.)
Such empiricist work will need to start over from scratch, beginning by working out its own version of what we ought to be trying to do, or to prevent. It will prove impossible for any such plans to give us everything we want, or to give us anything with certainty; but that’s the nature of life. (I suggest John Dewey’s The Quest for Certainty as a primer on the foolishness of the Western philosophical tradition of demanding certainty.)
I’d like to try to explain my views, but what would your judges make of them? I’m talking about exposing metaphysical assumptions, fixing epistemology, dissecting semantics, and operationalizing morality, among other things. I’m not interested in updating timelines or probability estimates to be used within an approach that I think would do more harm than good.