In general, I’m skeptical of surveys like this—I participated in a similar one a few years ago that didn’t have super useful results, though I think it was kind of useful for clarifying my own thinking. But that’s pretty outside-viewy. Let me take a stab at making that general skepticism concrete—trying to elucidate why people might struggle to answer, slash why the questions you’re asking won’t yield super useful answers.
I expect that the ‘right’ answer depends on carefully enumerating and considering a bunch of different plausible scenarios, and what you’ll get instead is either uncertainty or vague intuitive guesses. If you mostly want vague intuitive guesses, great! I would guess you’d get more clarity from trying to elicit people’s particular models / expected trajectories.
My rough experience is that people working in AI governance mostly think about particular trajectories/dynamics of AI progress that they consider especially plausible/important/tractable, so they might only have insight into particular configurations of variables you consider. Or their insight might be at a more granular level, weighing e.g. the impact of AI development in particular corporate labs.
Skimming your survey, the answer that feels right to me is often that the effect depends a lot on circumstances. For example, fast takeoff worlds where fast takeoff is anticipated look extremely different from fast takeoff worlds where it comes as a surprise.
Yes, I’m having a tough time explaining the purpose of the model, which has led to very long, convoluted descriptions. I am not predicting or attempting to predict any of these conditions. I understand your skepticism and share it. I believe overall that it can be a waste of time to concentrate on forecasting issues that are very difficult or impossible to measure. That is certainly not the purpose of this.
The point of this is simply to construct broad categories of scenarios that are (approximately) plausible and impactful (broadly; a lot of these should be marked no effect). The output does not say what is or is not going to happen, or what is certainly best or worst. It will be a narrative showing all options, which will have mixed values (the process of combining values changes these all up regardless). The likelihood survey values are much more valuable, but the best assessment of impact is important too (though admittedly much less clear).
For example, all the values for individual conditions (e.g., paradigm) will be calculated with every other, but the output is not “fast takeoff scenario is 80% likely” or “greatly decrease x.” The output will be mixed potential scenario elements, e.g., “fast takeoff” (unlikely, but high impact) and “new paradigm” (likely, but moderate impact), which will be just one of the many possible outputs. Thousands of these pairs will then be clustered, and we’ll use the clusters to develop scenarios.
For the likelihood questions this is clearer, I think: it multiplies (or adds), depending on the variable, to highlight how one condition is affected by the other. Ideally, and this is the plan depending on how this goes, we will have a workshop or roundtable to go through each one of these pairs (e.g., fast takeoff and distribution is a value pair) and request expert judgment on how one may affect the other.
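To make the pairing step concrete, here is a minimal sketch of how value pairs might be enumerated and grouped. Everything specific in it is an assumption for illustration, not the actual model: the condition names and scores are made up, likelihoods are multiplied, impacts are averaged, and a simple threshold bucketing stands in for whatever clustering the real process uses.

```python
from itertools import combinations

# Hypothetical survey averages per option: (likelihood 0-1, impact 0-1).
# These numbers are invented for illustration; they are not survey results.
conditions = {
    "takeoff": {"fast": (0.2, 0.9), "moderate": (0.5, 0.6), "slow": (0.3, 0.3)},
    "paradigm": {"new paradigm": (0.6, 0.5), "prosaic AI": (0.4, 0.6)},
    "distribution": {"concentrated": (0.5, 0.8), "distributed": (0.5, 0.4)},
}

def value_pairs(conditions):
    """Pair every option of one condition with every option of each other condition."""
    pairs = []
    for (cond_a, opts_a), (cond_b, opts_b) in combinations(conditions.items(), 2):
        for name_a, (like_a, imp_a) in opts_a.items():
            for name_b, (like_b, imp_b) in opts_b.items():
                pairs.append({
                    "pair": (f"{cond_a}: {name_a}", f"{cond_b}: {name_b}"),
                    # Assumption: likelihoods combine by multiplication.
                    "likelihood": like_a * like_b,
                    # Assumption: impacts combine by averaging.
                    "impact": (imp_a + imp_b) / 2,
                })
    return pairs

def bucket(pair, like_cut=0.2, impact_cut=0.6):
    """Coarse label for a pair; a stand-in for a real clustering step."""
    like = "likely" if pair["likelihood"] >= like_cut else "unlikely"
    imp = "high impact" if pair["impact"] >= impact_cut else "moderate impact"
    return f"{like}, {imp}"

pairs = value_pairs(conditions)
groups = {}
for p in pairs:
    groups.setdefault(bucket(p), []).append(p["pair"])

for label, members in groups.items():
    print(label, len(members))
```

The output here is a set of labeled groups of pairs rather than a forecast, which matches the stated intent: the groups are raw material for writing scenarios, not probabilities of any one scenario.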
This is somewhat imprecise by design, but I believe an AI researcher’s view on whether deep learning will lead to AGI, or on whether prosaic AI is potentially more or less destabilizing, is much more trustworthy than a random guess.
I have realized, though, that in future iterations (if there are any) I most certainly will not ask likelihood questions. That tends to get folks thinking about probability, which would require more precise questions. And the impact is just a tough one. But the combination is important. Other projects we’ve done with this have been for climate change and Arctic politics, which were also quite vague, yet valuable in the end.
It looks like I just submitted another long, convoluted description lol. I get carried away attempting to explain the issue.
In any event, what I’m requesting is the best estimates from knowledgeable people to form groups for the model, which will be used to paint the range of hopefully quite unique combinations of scenarios and test the GMA method. Who knows, it may provide important insights or a new tool for the community to use.
If you have any suggestions on how to frame or explain this better (now or in the future), please let me know.
A quick point I forgot to make (or fully understand), re the fast takeoff comment at the end: agreed. The original had both fast takeoff controlled and fast takeoff uncontrolled, as well as CAIS fast, mod, and slow, totaling about 6 choices. It got butchered. Way too many choices to rank.
So, I dropped it down to 4. I was told to go to 3, but I thought the “anticipated or unanticipated” points you make are quite valid and key, especially for moderate (equivalent to Christiano-style relatively fast takeoff), which is why there are two options: moderate uncontrolled, a complete surprise in capability jumps, and moderate controlled, which suggests a competitive, anticipated race dynamic, perhaps due to conflict and competition. So fast, unfortunately, was left to include both anticipated and unanticipated. I hope to break that out further, but I’ll likely be confined to the literature for that.