Prioritization Consequences of “Formally Stating the AI Alignment Problem”
I’ve just completed a comprehensive presentation of my current thinking on AI alignment. In it I give a rigorous formal statement of the AI alignment problem, one that makes clear the philosophical assumptions we take on when we say we want to create AI aligned with human values. Along the way I show the limitations of a decision-theory-based approach to AI alignment, then the limitations of an axiological approach, and arrive at a noematological approach (an account given in terms of noemata, the objects of conscious phenomena) that better encompasses everything necessarily meant by “align AI with human values,” given that we will likely need to align non-rational agents.
I have yet to fully flesh out the prioritization consequences of this view for what work we should pursue to achieve AI alignment, but I have some initial thoughts I’d like to share and get feedback on. Without further ado, here are the thoughts I believe deserve deeper discussion:
1. Non-decision-theory-based research into AI alignment is underexplored.
   1. Noematological research into AI alignment has been almost completely ignored up to now.
   2. There are a handful of examples where people consider how humans align themselves with their goals and use this to propose techniques for AI alignment, but these are preliminary.
   3. How non-decision-theory-based approaches could contribute to solving AI alignment was previously poorly understood, but there is now a framework in which to work.
   4. That framework is still poorly explored, though, so it’s not yet clear exactly how to turn insights from how humans and animals align their behavior and thought into specific, testable approaches to AI alignment.
2. Existing decision-theory-based research, like MIRI’s, is well funded and well attended to relative to non-decision-theory-based research.
   1. Non-decision-theory-based research into AI alignment is a neglected area ripe to benefit from additional funding and attention.
   2. Decision-theory-based research is, in absolute terms, still an area where more work can be done, and so remains underfunded and underattended relative to the amount of work it could carry.
   3. AI alignment research with simplifying assumptions different from those of decision theory is underexplored (compare 2 above).
3. Noematological AI alignment research may be a “dangerous attractor”.
   1. It draws ideas from disciplines that place less emphasis on rigor, and so may attract distraction to the field from people who are unprepared for the level of care AI safety research demands.
   2. It may give crackpots more of an angle to access the field, where a heavy decision theory focus helps weed them out.
Comments on these statements are welcome here. If you have feedback on the original work, I recommend leaving it as a comment there or on the LW post about it.