I think you can get closer to dissolving this problem by considering why you’re assigning credit. Often, we’re assigning some kind of finite financial reward.
Imagine that a group of n people have all jointly created $1 of value in the world, and that if any one of them did not participate, there would be $0 of value. Clearly, we can’t give $1 to all of them, because then we would be paying $n to reward an event that only created $1 of value, which is inefficient. If, however, only the first guy (i=1) is an “agent” that responds to incentives, while the others (2<=i<=n) are “environment” whose behaviour is unresponsive to incentives, then it is fine to give the first guy a reward of $1.
This is how you can ground the idea that agents who cooperate should share their praise (something like a Shapley value approach), whereas rival agents who don’t buy into your reward scheme should be left out of the Shapley calculation.
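To make that concrete, here is a rough sketch of the calculation for n = 4. The function names (`shapley_values`, `unanimity_value`, `restrict_to_agents`) are made up for illustration, and treating the "environment" as people who always participate is my own modelling assumption, not the only way to formalise it.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering of the players."""
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            after = value(frozenset(coalition))
            totals[p] += Fraction(after - before)
    return {p: total / len(orderings) for p, total in totals.items()}


def unanimity_value(required):
    """$1 is created only if every required person participates."""
    required = frozenset(required)
    return lambda coalition: 1 if required <= coalition else 0


def restrict_to_agents(full_value, environment):
    """Assumption: environment members always participate, so only
    the incentive-responsive agents are players in the game."""
    environment = frozenset(environment)
    return lambda coalition: full_value(coalition | environment)


everyone = [1, 2, 3, 4]
full_game = unanimity_value(everyone)

# Case 1: all four respond to incentives, so they split the $1.
all_agents = shapley_values(everyone, full_game)

# Case 2: only person 1 is an agent; 2..4 are environment.
one_agent = shapley_values([1], restrict_to_agents(full_game, [2, 3, 4]))

print(all_agents)  # each of the four gets 1/4
print(one_agent)   # person 1 gets the whole $1
```

Either way the payouts sum to exactly $1, so you never pay $n for $1 of value. The only thing that changes is who counts as a player.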