I’m curious: what do you think would count as a current ML model ‘intentionally’ doing something? It’s not clear to me that any currently deployed ML models can be said to have goals.
To give a bit more context on what I’m confused about: the model that gets deployed is the one that does best at minimising the loss function during training. Isn’t Russell’s claim that a good strategy for minimising the loss function is to change users’ preferences? If so, whether or not the model is ‘intentionally’ radicalising people seems beside the point.
(I find talk about the goals of AI systems pretty confusing, so I could easily be misunderstanding, or wrong about something)
Yeah, I agree this is unclear. But, staying away from the word ‘intention’ entirely, I think we can & should still ask: what is the best explanation for why this model is the one that minimizes the loss function during training? Does that explanation involve this argument about changing user preferences, or not?
One concrete experiment that could feed into this: if it were the case that feeding users extreme political content did not cause their views to become more predictable, would training select a model that didn’t feed people as much extreme political content? I’d guess training would select the same model anyway, because extreme political content gets clicks in the short term too. (But I might be wrong.)
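To make that experiment a bit more concrete, here’s a minimal toy sketch in Python. All of the numbers and the user model are assumptions I made up for illustration (a two-arm epsilon-greedy recommender, with a ‘predictability’ bonus that can be switched on or off); it isn’t meant as a faithful model of a real recommender system, just a way to check whether a short-term click edge is enough on its own:

```python
# Toy simulation (hypothetical setup, not anything Russell specifies): does a
# myopic click-maximising recommender still favour extreme content when that
# content has no effect on how predictable the user becomes?
import numpy as np

rng = np.random.default_rng(0)

def run(predictability_effect: bool, steps: int = 20_000) -> dict:
    """Epsilon-greedy bandit choosing between 'moderate' and 'extreme' content.

    Assumed user model: extreme content gets more clicks in the short term
    (0.15 vs 0.10 base click rate). If `predictability_effect` is True,
    repeated extreme content also makes the user more predictable, raising the
    long-run click rate further; if False, it has no such effect.
    """
    q = np.zeros(2)          # running click-rate estimates: [moderate, extreme]
    n = np.zeros(2)          # pull counts per arm
    radicalisation = 0.0     # latent 'predictability' state of the user
    eps = 0.1
    for _ in range(steps):
        arm = rng.integers(2) if rng.random() < eps else int(np.argmax(q))
        base = [0.10, 0.15][arm]
        bonus = 0.05 * radicalisation if (predictability_effect and arm == 1) else 0.0
        click = rng.random() < base + bonus
        n[arm] += 1
        q[arm] += (click - q[arm]) / n[arm]   # incremental mean update
        if arm == 1 and predictability_effect:
            radicalisation = min(1.0, radicalisation + 1e-3)
    return {"click_rate_estimates": q.round(3), "share_extreme": n[1] / n.sum()}

print("with predictability effect:   ", run(True))
print("without predictability effect:", run(False))
# Expected outcome under these assumed numbers: the policy ends up mostly
# recommending extreme content in both runs, because the short-term click
# edge alone is enough for it to win.
```

If something like this holds up under less contrived assumptions, it would support the guess above: the ‘changing user preferences’ story wouldn’t be doing the explanatory work, the short-term click advantage would be.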