Great, thank you for the response.
On (3) — I feel AI safety as it’s pursued today is a bit disconnected from other fields such as neuroscience, embodiment, and phenomenology. I.e. the terms used in AI safety don’t try to connect to the semantic webs of affective neuroscience, embodied existence, or qualia. I tend to take this as a warning sign: all disciplines ultimately refer to different aspects of the same reality, and all conversations about reality should ultimately connect. If they aren’t connecting, we should look for a synthesis such that they do.
That’s a little abstract; a concrete example would be the paper “Dissecting components of reward: ‘liking’, ‘wanting’, and learning” (Berridge et al. 2009), which describes experimental methods and results showing that ‘liking’, ‘wanting’, and ‘learning’ can be partially isolated from each other and triggered separately. I.e. a set of fairly rigorous studies on mice demonstrating they can like without wanting, want without liking, etc. This and related results from affective neuroscience would seem to challenge some preference-based frames within AI alignment, but it feels like there’s no ‘place to put’ this knowledge within the field. Affective neuroscience can discover things, but there’s no mechanism by which these discoveries will update AI alignment ontologies.
It’s a little hard to find the words to describe why this is a problem; perhaps it’s that not being richly connected to other fields runs the risk of ‘ghettoizing’ results, as many social sciences have ‘ghettoized’ themselves.
One of the reasons I’ve been excited to see your trajectory is that I’ve gotten the feeling that your work would connect more easily to other fields than the median approach in AI safety.
Thanks, that makes sense.
I’ve been aware of those kinds of issues; what I’m hoping is that we can get a framework to include these subtleties automatically (e.g. by having the AI learn them from observations or from human-published papers) without having to put it all in by hand ourselves.