Recently, there’s been significant interest within the EA community in investigating the short-term social and political risks of AI systems. I’d like to recommend this video, ‘Is The YouTube Algorithm Radicalizing You? It’s Complicated.’ (and Jordan Harrod’s channel as a whole), as a starting point for understanding the empirical evidence on these issues.
From reading the summary in this post, it doesn’t look like the YouTube video discussed bears on the question of whether the algorithm is radicalizing people ‘intentionally,’ which I take to be the interesting part of Russell’s claim.
I’m curious what you think would count as a current ML model ‘intentionally’ doing something? It’s not clear to me that any currently deployed ML models can be said to have goals.
To give a bit more context on what I’m confused about: the model that gets deployed is the one that does best at minimising the loss function during training. Isn’t Russell’s claim that a good strategy for minimising the loss function is to change users’ preferences? If so, whether or not the model is ‘intentionally’ radicalising people is beside the point (I’ve tried to spell out what I mean a bit more formally below).
(I find talk about the goals of AI systems pretty confusing, so I could easily be misunderstanding, or wrong about something)
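Spelling that out (in my own notation, and simplifying a lot, so treat this as a sketch): write $\theta_t$ for the user’s preference state at time $t$, $a_t$ for the content the recommender shows, $y_t$ for the user’s response (a click, watch time, etc.), and $\hat{y}_t$ for the model’s prediction of that response. The deployed model is then, roughly, the policy $\pi$ that minimises something like

$$L(\pi) = \mathbb{E}\left[\sum_{t=1}^{T} \ell\big(\hat{y}_t, y_t\big)\right], \qquad y_t \sim p(\,\cdot \mid \theta_t, a_t), \qquad \theta_{t+1} = f(\theta_t, a_t).$$

If recommendations also move the user’s preferences (the $f$ term), then the policy with the lowest $L(\pi)$ can be one that steers $\theta_t$ toward states where $y_t$ is easy to predict, and nothing in that selection process involves anything that looks like an ‘intention’.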
Yeah, I agree this is unclear. But, staying away from the word ‘intention’ entirely, I think we can & should still ask: what is the best explanation for why this model is the one that minimizes the loss function during training? Does that explanation involve this argument about changing user preferences, or not?
One concrete experiment that could feed into this: if it were the case that feeding users extreme political content did not cause their views to become more predictable, would training select a model that didn’t feed people as much extreme political content? I’d guess training would select the same model anyway, because extreme political content gets clicks in the short-term too. (But I might be wrong.)
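To make that slightly more concrete, here’s a toy simulation of the comparison I have in mind. Everything in it is invented for illustration (the click probabilities, the ‘radicalisation’ dynamics, and the two candidate policies), and ‘training’ here is just picking whichever fixed policy earns more clicks, as a crude stand-in for whatever real training actually selects for:

```python
import random

T = 50          # recommendations per simulated user
N_USERS = 5000  # simulated users per evaluation

def run_user(p_extreme, shaping_on):
    """Simulate one user; return the total clicks the recommender earns.

    p_extreme  -- probability the policy shows extreme content at each step
    shaping_on -- whether extreme content makes the user's views more
                  extreme (and hence their future clicks more predictable)
    """
    r = 0.0      # how 'radicalised' the user currently is, in [0, 1]
    clicks = 0
    for _ in range(T):
        show_extreme = random.random() < p_extreme
        if show_extreme:
            p_click = 0.55 + 0.4 * r   # short-term edge, larger if radicalised
            if shaping_on:
                r = min(1.0, r + 0.1)  # the preference-shaping channel
        else:
            p_click = 0.5              # moderate content: coin-flip clicks
        clicks += random.random() < p_click
    return clicks

def average_clicks(p_extreme, shaping_on):
    return sum(run_user(p_extreme, shaping_on) for _ in range(N_USERS)) / N_USERS

if __name__ == "__main__":
    random.seed(0)
    for shaping_on in (True, False):
        extreme_heavy = average_clicks(p_extreme=0.9, shaping_on=shaping_on)
        moderate_heavy = average_clicks(p_extreme=0.1, shaping_on=shaping_on)
        winner = "extreme-heavy" if extreme_heavy > moderate_heavy else "moderate-heavy"
        print(f"preference shaping {'on ' if shaping_on else 'off'}: "
              f"extreme-heavy gets {extreme_heavy:.1f} clicks/user, "
              f"moderate-heavy gets {moderate_heavy:.1f}; "
              f"'training' selects the {winner} policy")
```

In this toy setup the extreme-heavy policy comes out ahead whether or not the preference-shaping channel is switched on, which is exactly the worry: the short-term click advantage on its own might be enough to explain the behaviour that gets selected.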