The assumption is that more powerful models won't simply be more accurate versions of weaker models: they will show emergent abilities. Many problems that GPT-4 can solve GPT-3 cannot, even though the two models share a similar lineage.
Safety issues only show up once a model is powerful enough to exhibit them, and they may not be anything that theory predicted. The Waluigi effect and hallucinations were not predicted in advance by any theory from AI safety research groups, yet they seem to account for the majority of the issues with models at the current level of capability.