Some AI questions/takes I’ve been thinking about:
1. I hear people confidently predicting that we’re likely to get catastrophic alignment failures, even if things go well up to ~GPT-7 or so. But if we get to GPT-7, I assume we could more or less ask it, “Would taking this next step have a large chance of failing?” Basically, I’m not sure it’s possible for an incredibly smart organization to “sleepwalk into oblivion”. Likewise, I’d expect trade and arms races to get a lot nicer/safer if we could make it a few levels deeper without catastrophe. (Note: This is one reason I like advanced forecasting tech.)
2. I get the impression that lots of EAs are kind of assuming that, if alignment issues don’t kill us quickly, 1-2 AI companies/orgs will create decisive strategic advantages, in predictable ways, and basically control the world shortly afterwards. I think this is a possibility, but would flag that right now, probably 99.9% of the world’s power doesn’t want this to happen (basically, anyone who’s not at the top of OpenAI/Anthropic/the next main lab). It seems to me like these groups would have to be incredibly incompetent to just let one org predictably control the world within 2-20 years. This means both that I find this scenario unlikely and that almost every single person in the world should be an ally in helping EAs make sure these scenarios don’t happen.
3. Related to #2, I still get the impression that it’s far easier to make a case for “Let’s not let one organization, commercial or government, get a complete monopoly on global power using AI” than for “AI alignment issues are likely to kill us all.” And a lot of the solutions to the former also seem like they should help with the latter.
How do you know it tells the truth or its best knowledge of the truth without solving the “eliciting latent knowledge” problem?
Depends on what assurance you need. If GPT-7 reliably provides true results in most/all settings you can find, that’s good evidence.
If GPT-7 is really Machiavellian, and is conspiring against you to make GPT-8, then it’s already too late for you, but it’s also a weird situation. If GPT-7 were seriously conspiring against you, I assume it wouldn’t need to wait until GPT-8 to take action.