Some abstractions that feel like they do real work on AI Alignment (compared to CIRL stuff):
Inner optimization
Intent alignment vs. impact alignment
Natural abstraction hypothesis
Coherent Extrapolated Volition
Instrumental convergence
Acausal trade
None of these are paradigms, but all of them feel like they substantially reduce the problem, in a way that doesn't feel true for CIRL. Based on your last paragraph, though, it's possible I have a skewed perception of actual CIRL stuff, so it's plausible we are just talking about different things.
Huh. I’d put assistance games above all of those things (except inner optimization but that’s again downstream of the paradigm difference; inner optimization is much less of a thing when you aren’t getting intelligence through a giant search over programs). Probably not worth getting into this disagreement though.