I mostly agree with Rohin's answer, and I'm pretty skeptical overall of AI safety as a cause area, although I have deep uncertainty about this and might hedge by supporting s-risk-focused work.
Are you primarily interested in these trades with people who already prioritize AI safety?
On 3, do you mean you'd start putting x% after the first 5 years?
I think it's plausible you could find people who are undecided between AI safety with short timelines and other cause areas, or between short and long timelines, and pay them enough to work on short-timelines AI safety, since they could address their uncertainty with donations outside of (short-timeline) AI safety. I've worked as a deep learning researcher/engineer to earn to give for animal welfare, and I have considered either working in AI safety with a focus on worst-case scenarios (CLR, CRS) or continuing to earn to give for animal welfare. I think technical AI safety would be more interesting and motivating than my past work in deep learning, and perhaps more interesting day-to-day than my current plans, but less motivating in the long run because of my skepticism. I was preparing to apply to CLR's internship for this past summer, but got an internship offer from Charity Entrepreneurship first and decided to go with that. I know one person who did something similar but went with AI safety instead.
It might be too expensive to pay people interested in earning to give enough that working in (short-timeline) AI safety becomes their way to earn to give, if AI safety isn't already one of their top priorities. Also, they don't even have to be EAs; you could find people who would just find the work interesting (e.g. people with graduate degrees in related subjects) but are worried about it not paying enough. You could take out loans to do this, but that kind of defies common sense and sounds pretty crazy to me.
(FWIW, my own price to work on AI safety (short or long timelines) is probably too high now, and, of course, there's the question of whether I'm a good fit anyway.)
Sorry for the delayed reply. I'm primarily interested in making these trades with people who have a similar worldview to me, because this increases the chance that, as a result of the trade, they will start working on the things I think are most valuable. I'd be happy to talk with other people too, except that if there's too much inferential distance to cross, the conversation would be more for fun than for impact. That said, maybe I'm modelling this wrong.
Yes, for no. 3 I meant after the first 5 years. Good catch.
It sounds like you might be a good fit for this sort of thing! Want to have a call to chat sometime? I'm also interested in doing no. 2 with you...
I think I'd need to read more before we could have a very productive conversation. If you want to point me to some writing that you found most persuasive for short timelines (or you could write a post laying out your reasoning, if you haven't already; this could prompt more useful community discussion, too), that would be helpful. I don't want to commit to anything yet, though. I'm also not that well-read on AI safety in general.
I guess a few sources of skepticism I have now are:
1. Training an agent to be generally competent in interactions with humans and our systems (even virtually, and not just in conversation) could be too slow or require more complex simulated data than is feasible. Maybe a new version of GPT will be an AGI but not an agent, and that might come soon; while that could still be very impactful, it might not pose an existential risk. Animals, as RL agents, have had millions of years of evolution to build strong priors, fitted to real-world environments, into each individual.
2. I'm just skeptical about trying to extrapolate current trends to AGI.
3. On AI risk more generally, I'm skeptical that an AI could acquire and keep enough resources to be very dangerous without the backing of people who have access to those resources. It would have to deceive us at least until it's too late for us to cut its access, e.g. by cutting the power or internet, which we can do physically, including by bombing (and I haven't heard of such a scenario that wasn't far-fetched). If we do catch it doing something dangerous, we will cut access. It would need access to powerful weapons to protect its access to resources, or to do much harm before we could cut it off. This seems kind of obvious, though, so I imagine there are some responses from the AI safety community.
Thanks, this is helpful! I'm in the middle of writing some posts laying out my reasoning... but it looks like it'll take a few more weeks at least, given how long it's taken so far.
Funnily enough, all three of the sources of skepticism you mention are things I've either already written about or am in the process of writing about. This is probably a coincidence. Here are my answers to 1, 2, and 3, or more like teasers of answers:
1. I agree, it could. But it also could not. I think a non-agent AGI would also be a big deal; in fact, I think there are multiple potential AI-induced points of no return. (For example, a non-agent AGI could be retrained to be an agent, or could be a component of a larger agenty system, or could be used to research agenty systems faster, or could create a vulnerable world that ends quickly or goes insane.) I'm also working on a post arguing that the millions of years of evolution don't mean shit and that while humans aren't blank slates, they might as well be for purposes of AI forecasting. :)
2. My model for predicting AI timelines (which I am working on a post for) is similar to Ajeya's. I don't think it's fair to describe it as an extrapolation of current trends; rather, it constructs a reasonable prior over how much compute should be needed to get to AGI, and then we update on the fact that the amount of compute we have so far hasn't been enough, and make our timelines by projecting how the price of compute will drop. (So yeah, we are extrapolating compute price trends, but those seem fairly solid to extrapolate, given the many decades across which they've held fairly steady, and given that we only need to extrapolate them for a few more years to get a non-trivial probability.) There's a toy sketch of this basic structure at the end of this comment.
3. Yes, this is something that's been discussed at length. There are lots of ways things could go wrong. For example, the people who build AGI will be thinking that they can use it for something, otherwise they wouldn't have built it. By default it will be out in the world doing things; if we want it to be locked in a box under study (for a long period of time that it can't just wait patiently through), we need to do lots of AI risk awareness-raising. Alternatively, AI might be good enough at persuasion to convince some of the relevant people that it is trustworthy when it isn't. This is probably easier than it sounds, given how much popular media is suffused with "But humans are actually the bad guys, keeping sentient robots as slaves!" memes. (Also because there probably will be more than one team of people and one AI; it could be dozens of AIs talking to thousands or millions of people each, with competitive pressure to give them looser and looser restrictions so they can go faster and make more money or whatever.) As for whether we'd shut it off after we catch it doing dangerous things: well, it wouldn't do them if it thought we'd notice and shut it off. This effectively limits what it can do to further its goals, but not enough, I think.
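To make the shape of that timelines model concrete, here's a minimal Python sketch of the update-then-extrapolate structure. All the numbers (the prior over required compute, the compute-so-far figure, the growth rate) are made-up placeholders for illustration, not Ajeya's actual estimates or mine:

```python
import numpy as np

# Toy version of the update-then-extrapolate structure described above.
# All numbers are made-up placeholders, not anyone's actual estimates.
rng = np.random.default_rng(0)

# Prior over training compute needed for AGI, in log10(FLOP) (lognormal in FLOP).
log10_flop_needed = rng.normal(loc=35.0, scale=3.0, size=100_000)

# Update on the observation that the compute spent so far hasn't been enough.
log10_flop_so_far = 24.0  # assumed largest training run to date, in log10(FLOP)
posterior = log10_flop_needed[log10_flop_needed > log10_flop_so_far]

# Extrapolate affordable compute: assume it grows ~3x per year (placeholder).
growth_per_year_log10 = 0.5
years = np.arange(1, 31)
log10_flop_affordable = log10_flop_so_far + years * growth_per_year_log10

# P(AGI within t years) ~= P(required compute <= affordable compute at year t).
p_by_year = [(posterior <= c).mean() for c in log10_flop_affordable]
for t, p in zip(years[::5], p_by_year[::5]):
    print(f"within {int(t):2d} years: P(enough compute) ~ {p:.2f}")
```

The real report has more moving parts (multiple anchors, forecasts of spending and algorithmic progress), but this is the basic skeleton I have in mind.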
I guess a few quick responses to each, although I haven't read through your links yet.
1. I think agenty systems in general can still be very limited in how competent they are, due to the same data/training bottlenecks, even if you integrate a non-agential AGI into the system.
2. I did see Ajeya's post and read Rohin's summary. I think there might not be any one most reasonable prior for the compute necessary for AGI (or for whether hitting some level of compute is enough, even given enough data or sufficiently complex training environments), since this requires making strong and basically unjustified assumptions about whether current approaches (or the next approaches we will come up with) can scale to AGI. Still, this doesn't mean AGI timelines aren't short; it might just mean you should do a sensitivity analysis on different priors when you're thinking of supporting or doing certain work. And, of course, they did do such a sensitivity analysis for the timeline question. (The sketch at the end of this comment illustrates what I mean.)
3. In response to this specifically: "As for whether we'd shut it off after we catch it doing dangerous things: well, it wouldn't do them if it thought we'd notice and shut it off. This effectively limits what it can do to further its goals, but not enough, I think." What other kinds of ways do you expect things would go very badly? Is it mostly unknown unknowns?
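And here's the kind of sensitivity analysis I have in mind for no. 2: rerun the same toy projection under several different priors over the compute needed and see how much the conclusion moves. The priors and other numbers below are hypothetical, chosen only to show the mechanics:

```python
import numpy as np

# Sensitivity check: same toy projection as above, under several hypothetical
# priors over log10(FLOP) needed for AGI. All numbers are made up.
rng = np.random.default_rng(1)
log10_flop_so_far = 24.0      # assumed compute spent so far, in log10(FLOP)
growth_per_year_log10 = 0.5   # assumed growth in affordable compute per year
horizon = 15                  # years ahead to check

priors = {
    "optimistic (median 1e30 FLOP)":  (30.0, 3.0),
    "middling (median 1e35 FLOP)":    (35.0, 3.0),
    "pessimistic (median 1e42 FLOP)": (42.0, 4.0),
}

for name, (mu, sigma) in priors.items():
    samples = rng.normal(mu, sigma, size=100_000)
    posterior = samples[samples > log10_flop_so_far]  # update: not enough yet
    affordable = log10_flop_so_far + horizon * growth_per_year_log10
    p = (posterior <= affordable).mean()
    print(f"{name}: P(enough compute within {horizon} yr) ~ {p:.2f}")
```

If the answer moves a lot depending on which prior you pick, that's a reason to be cautious about acting on any one of them; if it doesn't, the choice of prior matters less than it first seems.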
Well, I look forward to talking more sometime! No rush; let me know if and when you're interested.
On point no. 3 in particular, here are some relevant parables (a bit lengthy, but also fun to read!): https://www.lesswrong.com/posts/5wMcKNAwB6X4mp9og/that-alien-message
https://www.lesswrong.com/posts/bTW87r8BrN3ySrHda/starwink-by-alicorn
https://www.gregegan.net/MISC/CRYSTAL/Crystal.html (I especially recommend this last one; it's less relevant to our discussion, but it's a better story and raises some important ethical issues.)