Link-commenting my Twitter thread of immediate reactions to and summary of the paper. Some light editing for readability. I'd be interested in feedback on whether this kind of content, slightly odd for a forum comment, is helpful or interesting to people.
Overall take: this is a well-done survey, but all surveys of this sort have big caveats. I think this survey is as good as it is reasonable to expect a survey of AI researchers to be. But there is still likely bias due to who chooses to respond, and it's unclear how much we should be deferring to this group. It would be good to see an attempt to correct for response bias (e.g. weighting). Appendix D implies it would likely only have small effects, though, except widening the distributions, because women were more uncertain and less likely to respond.
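For concreteness, here's a minimal sketch of the sort of weighting I mean (post-stratification), on made-up groups, shares, and forecasts rather than the survey's actual data:

```python
# Minimal sketch of post-stratification weighting; all numbers are hypothetical.
import numpy as np
import pandas as pd

# Hypothetical respondent-level data: a group label and an HLMI-by-year forecast.
df = pd.DataFrame({
    "group": ["academia", "academia", "industry", "industry", "industry"],
    "hlmi_year": [2060, 2045, 2040, 2035, 2100],
})

# Assumed share of each group among everyone invited to take the survey.
population_share = {"academia": 0.6, "industry": 0.4}
sample_share = df["group"].value_counts(normalize=True)

# Weight = how over- or under-represented each group is among respondents.
df["weight"] = df["group"].map(population_share) / df["group"].map(sample_share)

print("unweighted mean forecast:", df["hlmi_year"].mean())
print("weighted mean forecast:  ", np.average(df["hlmi_year"], weights=df["weight"]))
```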
Timelines
Question wording matters a lot when asking about the time until AI can do all tasks/jobs: a 60-year difference in the median due to a small change in framing, which the researchers can't explain. This will allow cherry-picking to support different arguments. In particular, timeline predictions are extremely non-robust. HLMI = High-Level Machine Intelligence; FAOL = Full Automation of Labor. The two medians are 60 years apart in time to occurrence.
Other bits of question wording matter, but nowhere near as much (see below). Annoyingly, there are no CIs for the median times, so it's hard to assess how much is noise. I'd guess not much, given the sample size.
This might just be uncertainty, though. Any single year is a bad summary when uncertainty is this high. The ranges below are where the distribution aggregated across researchers places 50% of the mass.
I find it very amusing that AI researchers think the hardest task to get AI to do (of all the ones they asked about) is… Being an AI researcher. Glad they’re confident in their own job security.
Note that time to being “feasible” is defined quite loosely. It would still cost millions of dollars (if not more) to implement and only be available to top labs. Annoyingly, it means that the predictions can only be falsified as too long, not too short.
The aggregation is making a strong assumption about the shape of respondents' distributions. I'm suspicious of any extrapolation or interpolation based on it. A sensitivity analysis would be nice here. Also, why not a three-parameter distribution so it can be fit exactly?
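To illustrate the point (with made-up numbers, not necessarily the paper's exact pipeline): a two-parameter gamma fitted to one respondent's three elicited quantiles generally can't hit all three, whereas a third parameter (e.g. a location shift) would generically allow an exact fit.

```python
# Sketch only: least-squares fit of a two-parameter gamma to three elicited
# quantiles from a hypothetical respondent. The leftover misfit is exactly
# the shape assumption I'm suspicious of.
import numpy as np
from scipy import stats, optimize

probs = np.array([0.10, 0.50, 0.90])   # elicited probabilities
years = np.array([10.0, 25.0, 80.0])   # hypothetical "years from now" answers

def quantile_residuals(log_params):
    shape, scale = np.exp(log_params)  # log-parameterise to keep both positive
    return stats.gamma.ppf(probs, shape, scale=scale) - years

fit = optimize.least_squares(quantile_residuals, x0=np.log([2.0, 10.0]))
shape, scale = np.exp(fit.x)

print("target quantiles:", years)
print("fitted quantiles:", np.round(stats.gamma.ppf(probs, shape, scale=scale), 1))
```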
Time to ~AGI in the 2016 and 2022 surveys is very similar, but there's a big change in the median times for 2023. Remember the previous caveat about no CIs and wide distributions, though.
Some recommendations on quoting timelines from this survey:
Don’t use just the HLMI or FAOL questions.
Use intervals not medians.
Be clear it’s an expert survey, and might be biased. Being an AI researcher selects for thinking AI is promising!
Outcomes of AGI
I don’t have much to say on the probabilities of different outcomes. I note they’re aggregating with means/medians. These reduce the weight on very low or very high probabilities a lot (relative to the geometric mean of odds, which I think is better). So these are probably closer to 50% than they should be.
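To make that concrete, here's a toy comparison on made-up answers (not the survey's data): the geometric mean of odds lets near-zero answers pull the aggregate down much more than the arithmetic mean or median do.

```python
# Toy comparison of aggregation methods on hypothetical probability answers.
import numpy as np

p = np.array([0.001, 0.01, 0.05, 0.05, 0.10, 0.10, 0.50])

odds = p / (1 - p)
geo_mean_odds = np.exp(np.mean(np.log(odds)))
geo_mean_prob = geo_mean_odds / (1 + geo_mean_odds)   # back to a probability

print(f"arithmetic mean:        {p.mean():.3f}")      # ~0.12
print(f"median:                 {np.median(p):.3f}")  # 0.05
print(f"geometric mean of odds: {geo_mean_prob:.3f}") # ~0.04, furthest from 0.5
```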
Headline result below! Probability of very bad outcomes, conditional on high-level machine intelligence existing. The median respondent is unchanged at 5-10%. I'd guess this is heavily affected by rounding and by people putting 5% for "small chance, don't know". An upper bound on the truth for AI researchers' median, IMO.
There are lots of demographic breakdowns, mostly uninteresting IMO. They didn't ask or otherwise assess how much work respondents had done on AI safety. It would have been interesting to see that split, and also to use it to assess response bias.
Thanks for citing the survey here, and thank you Joshua for your analysis.
Your post doesn't seem out of place to me here; at the very least, I can't see any harm in posting it. (If someone is more interested in other discussions, they can read the first two lines and then skip it.) The only question would be whether this is worth YOUR time, and I am confident you are able to judge that (and you apparently did, and found it worth your time).
Since you've already delved so deeply into the material, and since I don't see myself doing the same, here's a question for you (or whoever else feels inclined to answer):
Was there a significant share of experts who thought that HLMI and/or FAOL are downright impossible (at least with anything resembling our current approaches)? I do hear/read doubts like that sometimes. If so, how were these experts included in the mean, since you can't include infinity with non-zero probability without the whole aggregate going to infinity? (If they even used a mean. "Aggregate Forecast" is not very clear; if they used the median or something similar, the second question can be ignored.)