Nitpick: doesn’t the argument you made also assume that there’ll be a big discontinuity right before AGI? That seems necessary for the premise about “extremely novel software” (rather than “incrementally novel software”) to hold.
I do think that AGI will be developed by methods that are relatively novel. Like, I’ll be quite surprised if all of the core ideas are >6 years old when we first achieve AGI, and I’ll be more surprised still if all of the core ideas are >12 years old.
(Though at least some of the surprise does come from the fact that my median AGI timeline is short, and that I don’t expect us to build AGI by just throwing more compute and data at GPT-n.)
Separately and with more confidence, I’m expecting discontinuities in the cognitive abilities of AGI. If AGI is par-human at heart surgery and physics, I predict that this will be because of “click” moments where many things suddenly fall into place at once, and new approaches and heuristics (both on the part of humans and on the part of the AI systems we build), not just because of a completely smooth, incremental, and low-impact-at-each-step improvement to the knowledge and thought-habits of GPT-3.
“Superhuman AI isn’t just GPT-3 but thinking faster and remembering more things” (for example) matters for things like interpretability: even if we succeed shockingly well at finding ways to reasonably thoroughly understand what GPT-3’s brain is doing moment-to-moment, those techniques are less likely to work for understanding what the first AGI’s brain is doing moment-to-moment, insofar as the first AGI is working in very new sorts of ways and doing very new sorts of things.
I’m happy to add more points like these to the stew so they can be talked about. “Your list of reasons for thinking AGI risk is high didn’t explicitly mention X” is a process we can continue indefinitely if we want to, since there are always more background assumptions someone can bring up that they disagree with. (E.g., I also didn’t explicitly mention “intelligence is a property of matter rather than of souls imparted into particular animal species by God”, “AGI isn’t thousands of years in the future”, “most random goals would produce bad outcomes if optimized by a superintelligence”...)
Which specific assumptions should be included depends on the conversational context. I think it makes more sense to say “ah, I personally disagree with [X], which I want to flag as a potential conversational direction since your comment didn’t mention [X] by name”, as opposed to speaking as though there’s an objectively correct level of granularity.
Like, the original thing I said was responding to a claim in the OP that no EA can rationally have a super high belief in AGI risk:
For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect.
The challenge the OP was asking me to meet was to point at a missing model piece (or a disagreement, where the other side isn’t obviously just being stupid) that can cause a reasonable person to have extreme p(AGI doom), given other background views the OP isn’t calling obviously stupid. (E.g., the OP didn’t say that it’s obviously stupid for anyone to have a confident belief that AGI will be a particular software project built at a particular time and place.)
The OP didn’t issue a challenge to list all of the relevant background views (relative to some level of granularity or relative to some person-with-alternative-views, which does need to be specified if there’s to be any objective answer), so I didn’t try to explicitly write out obvious popularly held beliefs like “AGI is more powerful than PowerPoint”. I’m happy to do that if someone wants to shift the conversation there, but hopefully it’s obvious why I didn’t do that originally.
Fair! Sorry for the slow reply; I missed the comment notification earlier.
I could have been clearer in what I was trying to point at with my comment. I didn’t mean to fault you for not meeting an (unmade) challenge to list all your assumptions—I agree that would be unreasonable.
Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption—enough that the argument, even combined with other, largely uncontroversial assumptions, doesn’t make it “quite easy to reach extremely dire forecasts about AGI.” (Though I was thinking more about 90%+ forecasts.)
(That assumption—i.e. the main claims in the 3rd paragraph of your response—seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it and researchers doing prosaic AI safety work.)