I really liked this post and the model you’ve introduced!
With regards to your pseudomaths, a minor suggestion could be to read your product notation as how agentive our actor is. This would let us take negative impact (i.e., harmful processes) into account by multiplying the product notation by a second factor that captures the sign of the action. The change in impact would then be proportional to the product of these two terms.
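In symbols (very roughly, and with placeholders of my own rather than your actual notation):

$$\Delta \text{Impact} \;\propto\; s \cdot A, \qquad A = \prod_i c_i, \qquad s \in [-1, 1],$$

where $A$ stands in for your existing product notation (read here as how agentive the actor is) and $s$ carries the sign of the action, so harmful processes come out with negative impact.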
I’m happy to hear that it’s useful for you. :-)
Could you clarify what you mean by agentive? The way I see it, at any of the levels from ‘Values’ to ‘Actions’, a person’s position on the corrigibility scale could be so low as to be negative. But that’s not an elegant or satisfactory way of modelling it (i.e. different ways of adjusting poorly to evidence could still lead to divergent results, from an extremely negative Unilateralist’s Curse scenario to just sheer mediocrity).
By agentive I sort of meant “how effectively an agent is able to execute actions in accordance with their goals and values”—which seems to be independent of their values/how aligned they are with doing the most good.
I think this is a different scenario from the agent causing harm due to negative corrigibility (though I agree with your point about how this could be taken into account within your model).
It seems possible, however, that you could incorporate their values/alignment into corrigibility, depending on one’s meta-ethical stance.
Ah, in this model, I see ‘effectiveness in executing actions according to values’ as a result of lots of directed iteration of improving understanding at lower construal levels over time (this reminds me of the OODA loop that Romeo mentions above; I’ll also look into the ‘levels of analysis’ now). In my view, that doesn’t require an extra factor.
Under which meta-ethical stance do you think this wouldn’t fit into the model? I’m curious to hear your thoughts and see where it fails to work.
Ah okay—I think I understand you, but this is entering areas where I become more confused and have little knowledge.
I’m also a bit lost as to what I meant by my latter point, so will think about it some more if possible.