We might be deliberately aiming, but we have to get it right on the first try (with transformative AGI)! And so far none of our techniques are leading to anything close to perfect alignment even for relatively weak systems (see ref. to “29%” in OP!)
Actually, I would agree that it’s misaligned at the start of training, but what’s missing initially are the capabilities that make that misalignment dangerous.
Right. And that’s where the whole problem lies! If we can’t meaningfully align today’s weak AI systems, what hope do we have for aligning much more powerful ones!? It’s not acceptable for early systems to be misaligned, precisely because of what that implies for the alignment of more powerful systems and our collective existential security. If OpenAI want to say “it’s ok GPT-4 is nowhere close to being perfectly aligned, because we definitely definitely will do better for GPT-5”, are you really going to trust them? They really tried to make GPT-4 as aligned as possible (for 6 months). And failed. And still released it anyway.