Thanks, this is helpful! I’m in the middle of writing some posts laying out my reasoning… but it looks like it’ll take a few more weeks at least, given how long it’s taken so far.
Funnily enough, all three of the sources of skepticism you mention are topics I've either already written about or am in the process of writing about. This is probably a coincidence. Here are my answers to 1, 2, and 3, or rather teasers of answers:
1. I agree, it could. But it also could not. I think a non-agent AGI would also be a big deal; in fact I think there are multiple potential AI-induced points of no return. (For example, a non-agent AGI could be retrained to be an agent, or could be a component of a larger agenty system, or could be used to research agenty systems faster, or could create a vulnerable world that ends quickly or goes insane.) I’m also working on a post arguing that the millions of years of evolution don’t mean shit and that while humans aren’t blank slates they might as well be for purposes of AI forecasting. :)
2. My model for predicting AI timelines (which I'm writing a post about) is similar to Ajeya's. I don't think it's fair to describe it as an extrapolation of current trends; rather, it constructs a reasonable prior over how much compute should be needed to get to AGI, then updates on the fact that the compute we've used so far hasn't been enough, and derives timelines by projecting how the price of compute will fall. (So yes, we are extrapolating compute price trends, but those seem fairly solid to extrapolate, given the many decades across which they've held fairly steady, and given that we only need to extrapolate them for a few more years to get a non-trivial probability.)
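To make the shape of that update concrete, here's a minimal toy sketch in that spirit. Every number in it (the prior's center and spread, the compute used so far, the growth rate) is an illustrative assumption of mine, not a figure from Ajeya's report or from this discussion:

```python
# Toy version of a compute-prior timeline model. All parameters below are
# illustrative assumptions, not estimates from the report or this thread.
import numpy as np

rng = np.random.default_rng(0)

# Prior over training compute needed for AGI, normal in log10(FLOP) space.
# Assumption: centered near 10^35 FLOP with wide uncertainty.
prior_log_flop = rng.normal(loc=35.0, scale=4.0, size=100_000)

# Update on the observation that compute spent so far hasn't sufficed.
# Assumption: the largest training run to date used ~10^24 FLOP.
observed_log_flop = 24.0
posterior = prior_log_flop[prior_log_flop > observed_log_flop]

# Project available compute forward by extrapolating the price trend.
# Assumption: the largest affordable run grows ~0.5 orders of magnitude/year.
growth_oom_per_year = 0.5
years = np.arange(2021, 2061)
available_log_flop = observed_log_flop + growth_oom_per_year * (years - 2020)

# P(AGI by year y) = posterior mass at or below the compute available then.
p_agi_by_year = [(posterior <= a).mean() for a in available_log_flop]

for y, p in zip(years[::10], p_agi_by_year[::10]):
    print(f"P(AGI by {y}) ≈ {p:.2f}")
```

The point is just the structure: a wide prior over required compute, truncated by the runs that haven't sufficed, intersected with an extrapolated compute-supply curve.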
3. Yes, this is something that's been discussed at length. There are lots of ways things could go wrong. For example, the people who build AGI will be thinking they can use it for something; otherwise they wouldn't have built it. By default it will be out in the world doing things; if we want it locked in a box under study (for a period of time too long for it to simply wait out), we need to do lots of AI risk awareness-raising. Alternatively, AI might be good enough at persuasion to convince some of the relevant people that it is trustworthy when it isn't. This is probably easier than it sounds, given how much popular media is suffused with "But humans are actually the bad guys, keeping sentient robots as slaves!" memes. (Also, there probably won't be just one team of people and one AI; it could be dozens of AIs talking to thousands or millions of people each, with competitive pressure to give them looser and looser restrictions so they can go faster and make more money.) As for whether we'd shut it off after we catch it doing dangerous things—well, it wouldn't do them if it thought we'd notice and shut it off. This effectively limits what it can do to further its goals, but not enough, I think.
I guess a few quick responses to each, although I haven’t read through your links yet.
I think agenty systems in general can still be very limited in how competent they are, due to the same data/training bottlenecks, even if you integrate a non-agential AGI into the system.
I did see Ajeya's post and read Rohin's summary. I think there might not be any one most reasonable prior for the compute necessary for AGI (or for whether hitting some level of compute is enough at all, even given enough data or sufficiently complex training environments), since any such prior has to make strong and basically unjustified assumptions about whether current approaches (or the next approaches we come up with) can scale to AGI. Still, this doesn't mean AGI timelines aren't short; it might just mean you should do a sensitivity analysis over different priors when you're deciding whether to support or do certain work. And, of course, they did do such a sensitivity analysis for the timeline question.
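For concreteness, here's a hedged sketch of what such a sensitivity analysis could look like, reusing the toy model from above; the alternative priors are again purely illustrative assumptions of mine:

```python
# Sketch of a prior sensitivity analysis: rerun the same toy timeline model
# under several different compute priors and compare the headline outputs.
# All prior parameters here are purely illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def p_agi_by(year, prior_mean, prior_sd, observed=24.0, growth=0.5, n=100_000):
    """P(AGI by `year`) under a normal prior over log10(FLOP) required."""
    prior = rng.normal(prior_mean, prior_sd, size=n)
    posterior = prior[prior > observed]             # compute so far wasn't enough
    available = observed + growth * (year - 2020)   # extrapolated compute trend
    return (posterior <= available).mean()

# Vary the prior and see how much the headline number moves.
for mean, sd in [(30, 3), (35, 4), (40, 5)]:
    print(f"prior N({mean}, {sd}): P(AGI by 2040) ≈ {p_agi_by(2040, mean, sd):.2f}")
```

If the headline probability swings wildly across defensible priors, that's a reason to lean on whatever conclusions survive all of them rather than on any single point estimate.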
In response to this specifically: "As for whether we'd shut it off after we catch it doing dangerous things—well, it wouldn't do them if it thought we'd notice and shut it off. This effectively limits what it can do to further its goals, but not enough, I think." In what other ways do you expect it would go very badly? Is it mostly unknown unknowns?
Well, I look forward to talking more sometime! No rush, let me know if and when you are interested.
On point no. 3 in particular, here are some relevant parables (a bit lengthy, but also fun to read!): https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message
https://www.lesswrong.com/posts/bTW87r8BrN3ySrHda/starwink-by-alicorn
https://www.gregegan.net/MISC/CRYSTAL/Crystal.html (I especially recommend this last one; it's less relevant to our discussion, but it's a better story and raises some important ethical issues.)