What’s your response to Peter Hurford’s arguments in his article Why I’m Skeptical Of Unproven Causes...?
That post mixes a bunch of different assertions together; let me try to distill a few of them out and answer them in turn:
One of Peter’s first (implicit) points is that AI alignment is a speculative cause. I tend to disagree.
Imagine it’s 1942. The Manhattan Project is well under way, Leo Szilard has shown that it’s possible to get a neutron chain reaction, and physicists are hard at work figuring out how to make an atom bomb. You suggest that this might be a fine time to start working on nuclear containment, so that, once humans are done bombing the everloving breath out of each other, they can harness nuclear energy for fun and profit. In this scenario, would nuclear containment be a “speculative cause”?
There are currently thousands of person-hours and billions of dollars going towards increasing AI capabilities every year. To call AI alignment a “speculative cause” in an environment such as this one seems fairly silly to me. In what sense is it speculative to work on improving the safety of the tools that other people are currently building as fast as they can? Now, I suppose you could argue that either (a) AI will never work or (b) it will be safe by default, but both those arguments seem pretty flimsy to me.
You might argue that it’s a bit weird for people to claim that the most effective place to put charitable dollars is towards some field of scientific study. Aren’t charitable dollars supposed to go to starving children? Isn’t the NSF supposed to handle scientific funding? And I’d like to agree, but society has kinda been dropping the ball on this one.
If we had strong reason to believe that humans could build strangelets, and society were pouring billions of dollars and thousands of human-years into making strangelets, and almost no money or effort was going towards strangelet containment, and it looked like humanity was likely to create a strangelet sometime in the next hundred years, then yeah, I’d say that “strangelet safety” would be an extremely worthy cause.
How worthy? Hard to say. I agree with Peter that it’s hard to figure out how to trade off “safety of potentially-very-highly-impactful technology that is currently under furious development” against “children are dying of malaria”, but the only way I know how to trade those things off is to do my best to run the numbers, and my back-of-the-envelope calculations currently say that AI alignment is further behind than the globe is poor.
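To make the shape of that comparison concrete, here is a minimal back-of-the-envelope sketch. Every number in it is a made-up placeholder, not anyone’s actual estimate; the point is only that the comparison bottoms out in explicit, arguable inputs.

```python
# Toy back-of-the-envelope comparison of two interventions.
# Every number below is a made-up placeholder, NOT a real estimate;
# the point is only to show the structure of the comparison.

cost_per_life_poverty = 5_000        # dollars per life saved (placeholder)

funding_increment = 10_000_000       # extra alignment funding, dollars (placeholder)
delta_p_good_outcome = 1e-7          # hypothetical shift in odds of a good long-term outcome
lives_at_stake = 1e10                # hypothetical number of lives riding on that outcome

expected_lives_saved = delta_p_good_outcome * lives_at_stake
cost_per_life_alignment = funding_increment / expected_lives_saved

print(f"Poverty:   ~${cost_per_life_poverty:,.0f} per life saved (placeholder)")
print(f"Alignment: ~${cost_per_life_alignment:,.0f} per expected life saved (placeholder)")
# Whatever the output says is driven entirely by the placeholder inputs;
# the value of the exercise is making those inputs explicit and arguable.
```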
Now that the EA movement is starting to look more seriously into high-impact interventions on the frontiers of science & mathematics, we’re going to need to come up with more sophisticated ways to assess the impacts and tradeoffs. I agree it’s hard, but I don’t think throwing out everything that doesn’t visibly pay off in the extremely short term is the answer.
Alternatively, you could argue that MIRI’s approach is unlikely to work. That’s one of Peter’s explicit arguments: it’s very hard to find interventions that reliably affect the future far in advance, especially when there aren’t hard objective metrics. I have three disagreements with Peter on this point.
First, I think he picks the wrong reference class: yes, humans have a really hard time generating big social shifts on purpose. But that doesn’t necessarily mean humans have a really hard time generating math—in fact, humans have a surprisingly good track record when it comes to generating math!
Humans actually seem to be pretty good at putting theoretical foundations underneath various fields when they try, and various people have demonstrably succeeded at this task (Church & Turing did this for computing, Shannon did this for information theory, Kolmogorov did a fair bit of this for probability theory, etc.). This suggests to me that humans are much better at producing technical progress in an unexplored field than they are at generating social outcomes in a complex economic environment. (I’d be interested in any attempt to quantitatively evaluate this claim.)
Second, I agree in general that any one individual team isn’t all that likely to solve the AI alignment problem on their own. But the correct response to that isn’t “stop funding AI alignment teams”—it’s “fund more AI alignment teams”! If you’re trying to ensure that nuclear power can be harnessed for the betterment of humankind, and you assign low odds to any particular research group solving the containment problem, then the answer isn’t “don’t fund any containment groups at all,” the answer is “you’d better fund a few different containment groups, then!”
Third, I object to the whole “there’s no feedback” claim. Did Kolmogorov have tight feedback when he was developing an early formalization of probability theory? It seems to me like the answer is “yes”—figuring out what was & wasn’t a mathematical model of the properties he was trying to capture served as a very tight feedback loop (mathematical theorems tend to be unambiguous), and indeed, it was sufficiently good feedback that Kolmogorov was successful in putting formal foundations underneath probability theory. We’re trying to do something similar with various other confusing aspects of good reasoning (such as logical uncertainty), and you’re welcome to raise concerns about whether we need to understand good reasoning under logical uncertainty in order to build an aligned AI, but saying that there’s “no feedback loop” seems to just misunderstand the approach.
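For concreteness, the formal foundations in question fit in a few lines. Here is the standard modern statement of Kolmogorov’s axioms (a textbook formulation, not a quotation from Kolmogorov):

```latex
% A probability space (\Omega, \mathcal{F}, P): \mathcal{F} is a sigma-algebra
% of subsets of \Omega, and P : \mathcal{F} \to [0,1] satisfies:
\begin{align*}
  & P(E) \ge 0 && \text{for every } E \in \mathcal{F},\\
  & P(\Omega) = 1,\\
  & P\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} P(E_i)
    && \text{for pairwise disjoint } E_1, E_2, \ldots \in \mathcal{F}.
\end{align*}
```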
Great article. My thoughts:
The smallpox vaccine was the first ever vaccine… a highly unproven cause. This site says it saved over half a billion lives. If there had been an EA movement when Edward Jenner was alive hundreds of years ago, would it have sensibly advised Jenner to work on a different project because the idea of vaccines was an unproven one?
Note that most of the top lifesavers on ScienceHeros.com did research work, which is an inherently unprovable cause, but managed to save many more lives than a person donating to GiveWell’s top charities can expect to save. Of course, scientific research can also backfire and cost lives. So one response to this might be to say: “scientific research is an unproven cause that’s hard to know the sign of, so we should ignore scientific research in favor of proven causes”. But to me this sounds like a head-in-the-sand approach. Scientific research is going to be by far the most significant bit affecting the future of life on Earth. I would rather see the EA movement try to develop tools to get better at predicting science impacts, or at least save money to nudge science when it’s more clear what impacts it might have.
I regret talking mainly about what is “unproven” when I really meant to talk about what (a) has tight feedback loops and (b) is approached experimentally. See the clarification in http://lesswrong.com/lw/ic0/where_ive_changed_my_mind_on_my_approach_to/
I think MIRI can fit this description in some ways (I’m particularly excited about the AI Impacts blog), but it doesn’t in other ways.
What do you think of the stability under self-modification example in this essay?
I haven’t taken the time to fully understand MIRI’s work. But my reading is that MIRI’s work is incremental without being empirical—like most people working in math & theoretical computer science, they are using proofs to advance their knowledge rather than randomized controlled trials. So this might meet the “tight feedback loops” criterion without meeting the “approached experimentally” criterion.
BTW, you might be interested in this comment of mine about important questions for which it’s hard to gather relevant experimental data.
Here are some related guesses of mine if anyone is interested:
The importance of the far future is so high that there’s nothing to do but bite the bullet and do the best we can to improve it.
MIRI represents a promising approach to improving the far future, but it shouldn’t be the only approach we investigate. For example, I would like to see an organization that attempted to forecast a broad variety of societal and technological trends, predict how they’ll interact, and try to identify the best spots to apply leverage.
The first thing to do is to improve our competency at predicting the future in general. The organization I describe could evolve out of a hedge fund that learned to generate superior returns through long-term trading, for instance. The approach to picking stocks that Charlie Munger, Warren Buffett’s partner, describes in Poor Charlie’s Almanack sounds like the sort of thing that might work for predicting other aspects of how the future will unfold. Munger reads a ton of books and uses a broad variety of mental frameworks to try to understand the assets he evaluates (more of a fox than a hedgehog).
(Interesting to note that the GiveWell founders are ex-employees of Bridgewater, one of the world’s top hedge funds.)
A meta-level approach to predictions: push for the legalization of prediction markets that would let us aggregate the views of many people and financially incentivize them to forecast accurately. There are likely problems with this approach, though, e.g. markets for unwanted events creating financial incentives for speculators to cause those unwanted events.
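For concreteness, here is a minimal sketch of how such a market can aggregate views: the logarithmic market scoring rule (LMSR), the automated market maker Robin Hanson proposed for subsidized prediction markets. It’s a generic textbook construction with made-up parameters, not a description of any existing exchange.

```python
import math

# Minimal sketch of a logarithmic market scoring rule (LMSR) market maker.
# Generic illustration only; parameters and trades below are made up.

class LMSRMarket:
    def __init__(self, n_outcomes, b=100.0):
        self.b = b                   # liquidity parameter: higher = deeper market
        self.q = [0.0] * n_outcomes  # outstanding shares of each outcome

    def _cost(self, q):
        # Cost function C(q) = b * ln(sum_i exp(q_i / b))
        return self.b * math.log(sum(math.exp(qi / self.b) for qi in q))

    def prices(self):
        # Instantaneous prices double as the market's implied probabilities.
        z = sum(math.exp(qi / self.b) for qi in self.q)
        return [math.exp(qi / self.b) / z for qi in self.q]

    def buy(self, outcome, shares):
        # A trader pays the change in the cost function to buy `shares` of `outcome`.
        new_q = list(self.q)
        new_q[outcome] += shares
        payment = self._cost(new_q) - self._cost(self.q)
        self.q = new_q
        return payment

# A trader who thinks outcome 0 is underpriced buys it, moving its price up.
market = LMSRMarket(n_outcomes=2)
print(market.prices())    # [0.5, 0.5] before any trades
print(market.buy(0, 50))  # cost of buying 50 shares of outcome 0
print(market.prices())    # outcome 0's implied probability rises
```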
When thinking about the far future, the best we may be able to do is identify specific key parameters that we think will have a positive impact on the future and then use experimental approaches with tight feedback loops to measure whether we are nudging those parameters in the right direction. For example, maybe we think a world with fewer belligerent people is one that’s more likely to survive existential threats. We write a bot that uses sentiment analysis to measure the level of belligerence in online discussion. We observe that the legalization of marijuana in a particular US state causes a noticeable drop in the level of belligerence of people talking online. We sponsor campaigns to legalize marijuana in other states and notice more drops. Etc. (This isn’t a serious suggestion, since legalizing marijuana in the US would just make countries like Iran and Russia look even more belligerent by comparison; it’s only meant as an illustration.)
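As a toy sketch of what the measurement half of that loop might look like (the keyword lexicon and sample comments are invented for illustration; a real bot would use a proper sentiment or toxicity model and real data):

```python
# Toy "belligerence meter" for online discussion.
# The lexicon and the sample comments are made up for illustration only.

BELLIGERENT_WORDS = {"destroy", "crush", "enemy", "war", "hate", "attack"}

def belligerence_score(text: str) -> float:
    """Fraction of words in `text` drawn from the belligerent-word lexicon."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,!?") in BELLIGERENT_WORDS)
    return hits / len(words)

def mean_belligerence(comments: list[str]) -> float:
    return sum(belligerence_score(c) for c in comments) / len(comments)

# Hypothetical before/after samples from the same forum:
before = ["we should crush our enemy and attack now", "I hate this policy"]
after = ["let's relax and talk it over", "this policy seems fine to me"]

print(mean_belligerence(before))  # higher
print(mean_belligerence(after))   # lower
# The hard part is attributing any change to the intervention rather than
# to everything else happening at the same time.
```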
(Cast in these terms, MIRI is valuable if “a taller stack of quality AI safety papers leads to a world that’s more likely to survive AGI threats”.)
But maybe we think that even the truth value of a statement like “a world with fewer belligerent people is more likely to survive existential threats” is essentially a coin toss once you look far enough out. In that case, the best we can do might be to try to attain wealth and positions of power as a movement, while improving our prediction capabilities so they are at least a little better than everyone else’s. Maybe we’d be able to see Bad Stuff on the horizon before others were paying much attention and direct resources to avert it.
It might also be wise to develop a core competency in “averting disasters on the horizon”, whatever that might look like… e.g. practice actually nudging society to see which strategies work effectively. The broad ability to nudge society is one that can be developed through experiment and tight feedback loops, and could be effective for lots of different things.
Related: Robin Hanson and Charles Twardy AMA on the SciCast tech forecasting project. Some correlates of forecasting success.
financial incentives for speculators to cause those unwanted events.

I’ve never understood this argument. There has always been a latent incentive to off CEOs or destroy infrastructure and trade on the resulting stock price swings. In practice this is very difficult to pull off. Prediction markets would be under more scrutiny and thus harder to game in this manner.
To take a step back, this objection is yet another example of one that gets trotted out against prediction markets all the time but which has been addressed in the white papers on the topic.