Sounds like a good plan! I think there are other challenges with alignment that this wouldn’t solve (e.g. inner misalignment), but this would help. If you haven’t seen it, you might be interested in https://arxiv.org/abs/2008.02275
Thank you, this is super helpful! I appreciate it.
Yes, good point: if inner misalignment emerged in an ML system, then any data source used for training would be ignored by the system anyway.
Depends on whether you think alignment is a problem for the humanities or for engineering.