Yeah, I’m super worried about this development. I think s-risks (at least those that are at all AI-related) mostly result from conflicts between AIs. So the slower the takeoff, the greater the risk of us ending up in a multipolar takeoff, the greater the risk of AI conflict, and the greater the risk of extreme suffering. Admittedly, just because the takeoff looks slower now doesn’t automatically mean that any self-improvement phase where AI goes from human-range intelligence to obvious superintelligence will be any less quick, but I’d still take it as a weak indication that that might be the case.
There’s also the consideration that alignment still seems really hard to me. I’ve watched the space for 8 years, and alignment hasn’t come to look any less hard to me over that time span. If something is about to be solved, or is easier than expected, 8 years should be enough to see some development from confusion to clarity, from overwhelm to research agendas, etc. Quite the contrary: there is much more resignation going around now, so alignment is probably a bigger problem than the average PhD thesis. (Then again, the resignation may be the result of shortening timelines rather than anything about the problem itself.)
I’m not quite sure in which direction that consideration pushes, but I think it makes s-risks a bit less likely again? Ideally we’d just solve alignment, but failing that, it could mean (1) we’re unrepresentatively dumb and the problem is easy, or (2) the problem is hard. Option 1 would suck because then alignment is in the hands of all the random AIs that won’t want to align their successors with human values. S-risk then depends on how many AIs there are and how slow the last stretch to superintelligence will be. Option 2 is a bit better because the AIs won’t solve it either, so the future is in the hands of AIs who don’t mind value drift or who failed to anticipate it. That sounds bad at first, but I think it’s a bit of a lesser evil, because there’s no reason why that would happen bunched up for several AIs at one point in time. So the takeoff would have to be very slow for it to still be multipolar.
So yeah, a bunch more worry here, but at least the factors are not all pushing in the same direction.
I feel quite agnostic about that… Whatever the concrete probabilities p_2020 and p_2023 are, I think p_2020 < p_2023 for the question of whether GPT-5 will still be in the broadly human band of IQ 120–180 or so.
Yes, that makes a lot of sense! Maybe people who don’t trust others so easily (without just being biased in the other direction in general, i.e. those who actually make better predictions than me over a similar sample) have stronger theory of mind, so that the gap between how well they know themselves and how well they know others is smaller. Or perhaps something weird is going on where they are more on autopilot themselves but have developed great surface-level heuristics that they apply to understand both others and themselves. That seems less likely though, because those surface-level heuristics would have to be insanely good to make up for a lack of mechanistic insight.
I should mention that tasks that seemed way too hard for me have routinely taken me only a few days to complete in the past. So for that reason I’m also wary of overupdating on my feeling that alignment is hard. That said, the answer is yes, but a public forum is not a comfy place for that sort of reflection. xD
Thanks so much for all your vulnerability and openness here—I think all the kinds of emotionally complex things you’re talking about have real and important effects on individual and group epistemics and I’m really glad we can do some talking about them.
Awww! Thank you for organizing this!