I mostly think of alignment as being about avoiding deception or catastrophic misgeneralization outside of testing settings.
In general, I also believe that AI companies have a massive incentive to align their systems with user intent. You can't profit if you are dead.