In Human Compatible, Stuart Russell makes an argument that I have heard him make repeatedly (I believe on the 80K podcast and the FLI conversation with Steven Pinker). He suggests a pretty bold and surprising claim:
[C]onsider how content-selection algorithms function on social media… Typically, such algorithms are designed to maximize click-through, that is, the probability that the user clicks on presented items. The solution is simply to present items that the user likes to click on, right? Wrong. The solution is to change the user’s preferences so that they become more predictable. A more predictable user can be fed items that they are likely to click on, thereby generating more revenue. People with more extreme political views tend to be more predictable in which items they will click on… Like any rational entity, the algorithm learns how to modify the state of its environment—in this case, the user’s mind—in order to maximize its own reward. The consequences include the resurgence of fascism, the dissolution of the social contract that underpins democracies around the world, and potentially the end of the European Union and NATO. Not bad for a few lines of code, even if it had a helping hand from some humans. Now imagine what a really intelligent algorithm would be able to do.
I don’t doubt that in principle this can and must happen in a sufficiently sophisticated system. What I’m surprised by is the claim that it is happening now. In particular, I would think that modifying human behavior to make people more predictable is pretty hard to do, so that any gains in predictive accuracy for algorithms available today would be swamped by (a) noise and (b) the gains from presenting the content that someone is more likely to click on given their present preferences.
To be clear, I also don’t doubt that there might be pieces of information algorithms can show people to make their behavior more predictable. Introducing someone to a new YouTube channel they have not encountered might make them more likely to click its follow-up videos, so that an algorithm has an incentive to introduce people to channels that lead predictably to their wanting to watch a number of other videos. But this is not the same as changing preferences. He seems to be claiming, or at least very heavily implying, that the algorithms change what people want, holding the environment (including information) constant.
Is there evidence for this (especially empirical evidence)? If so, where could I find it?
Facebook has at least experimented with using deep reinforcement learning to adjust its notifications, according to https://arxiv.org/pdf/1811.00260.pdf. Depending on which exact features they used for the state space (i.e., whether those features are causally connected to preferences), the trained agent would at least in principle have an incentive to change users' preferences.
The fact that they use DQN rather than a bandit algorithm suggests that what they are doing involves at least some short-term planning, but the paper does not analyze the experiments in much detail, so it is unclear whether they could have used a myopic bandit algorithm instead. Either way, seeing this made me update quite a bit toward being more concerned about the effect of recommender systems on preferences.
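To make the bandit-vs-DQN distinction concrete, here is a toy sketch (all states, actions, and click rates are invented for illustration; none of this comes from the paper). A myopic bandit, which corresponds to a discount factor of zero, scores each item only by its immediate click probability, while a DQN-style agent with a nonzero discount also credits an item for the user state it induces:

```python
# Toy MDP with two user states and two items. Invented numbers.
# Item 0 ("neutral"): slightly better immediate clicks, user stays moderate.
# Item 1 ("extreme"): slightly worse immediate clicks, but moves the user
#                     to a state where clicks are higher and more predictable.
click = {                       # P(click | state, item)
    ("moderate", 0): 0.50, ("moderate", 1): 0.45,
    ("extreme", 0): 0.40,  ("extreme", 1): 0.80,
}
next_state = {                  # deterministic transitions, for simplicity
    ("moderate", 0): "moderate", ("moderate", 1): "extreme",
    ("extreme", 0): "extreme",   ("extreme", 1): "extreme",
}

def value_of_policy(item, gamma, horizon=50):
    """Discounted expected clicks from always showing `item`, starting moderate."""
    state, total = "moderate", 0.0
    for t in range(horizon):
        total += gamma ** t * click[(state, item)]
        state = next_state[(state, item)]
    return total

# Myopic bandit (gamma = 0): prefers the neutral item.
assert value_of_policy(0, 0.0) > value_of_policy(1, 0.0)
# Farsighted agent (gamma = 0.9): prefers the extreme item,
# precisely because it changes the user's state.
assert value_of_policy(1, 0.9) > value_of_policy(0, 0.9)
```

The point is only that a nonzero discount is what creates the incentive to modify the user's state at all; whether the deployed system's state features actually track anything preference-like is the open question.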
Is The YouTube Algorithm Radicalizing You? It’s Complicated.
From reading the summary in this post, it doesn’t look like the YouTube video discussed bears on the question of whether the algorithm is radicalizing people ‘intentionally,’ which I take to be the interesting part of Russell’s claim.
I’m curious what you think would count as a current ML model ‘intentionally’ doing something? It’s not clear to me that any currently deployed ML models can be said to have goals.
To give a bit more context on what I'm confused about: the model that gets deployed is the one that does best at minimising the loss function during training. Isn't Russell's claim that a good strategy for minimising the loss function is to change users' preferences? Then, whether or not the model is 'intentionally' radicalising people is beside the point.
(I find talk about the goals of AI systems pretty confusing, so I could easily be misunderstanding, or wrong about something)
Yeah, I agree this is unclear. But, staying away from the word ‘intention’ entirely, I think we can & should still ask: what is the best explanation for why this model is the one that minimizes the loss function during training? Does that explanation involve this argument about changing user preferences, or not?
One concrete experiment that could feed into this: if it were the case that feeding users extreme political content did not cause their views to become more predictable, would training select a model that didn’t feed people as much extreme political content? I’d guess training would select the same model anyway, because extreme political content gets clicks in the short-term too. (But I might be wrong.)
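The proposed experiment can be sketched as a toy simulation (all click rates are invented). We compare which of two fixed policies a click-maximizing training process would select, in a world where extreme content shifts preferences versus one where it doesn't:

```python
# Toy version of the experiment above. Invented numbers throughout.
def total_clicks(extreme_policy, preference_shift, rounds=100):
    """Expected total clicks from always showing extreme (or neutral) items."""
    # Assumption: extreme content out-clicks neutral even in the short term.
    p_click = 0.55 if extreme_policy else 0.50
    clicks = 0.0
    for _ in range(rounds):
        clicks += p_click
        if extreme_policy and preference_shift:
            # User drifts toward more predictable clicking, up to a cap.
            p_click = min(0.80, p_click + 0.01)
    return clicks

# The extreme policy wins in BOTH worlds, with and without preference shift,
# so selection of that policy doesn't by itself show preference change
# is the mechanism.
for shift in (True, False):
    assert total_clicks(True, shift) > total_clicks(False, shift)
```

On these made-up numbers, training selects the extreme-content policy whether or not the preference-shift channel exists, which is exactly why the counterfactual version of the experiment would be informative.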
There’s a lot of anecdotal evidence that news organizations effectively change users’ preferences. The fundamental story is quite similar. It’s not clear how intentional this is, but there seem to be many cases of people becoming more extreme after watching/reading the news (now that I think about it, this seems like a major factor in most of these situations).
I vaguely recall Matt Taibbi complaining about this in the book Hate Inc.
https://www.amazon.com/Hate-Inc-Todays-Despise-Another/dp/B0854P6WHH/ref=sr_1_3?dchild=1&keywords=Matt+Taibbi&qid=1618282776&sr=8-3
Here are a few related links:
https://nymag.com/intelligencer/2019/04/i-gathered-stories-of-people-transformed-by-fox-news.html
https://www.salon.com/2018/11/23/can-we-save-loved-ones-from-fox-news-i-dont-know-if-its-too-late-or-not/
If it turns out that news channels change preferences, it seems like a small leap to suggest that recommender algorithms that steer people onto news programs also change their preferences. Of course, one should have evidence of the magnitude and so on.
Thanks. I’m aware of this sort of argument, though I think most of what’s out there relies on anecdotes, and it’s unclear exactly what the effect is (since there is likely some level of confounding here).
I guess there are still two things holding me up here. (1) It’s not clear that the media is changing preferences or just offering [mis/dis]information. (2) I’m not sure it’s a small leap. News channels’ effects on preferences likely involve prolonged exposure, not a one-time sitting. For an algorithm to expose someone in a prolonged way, it has to either repeatedly recommend videos or recommend one video that leads to their watching many, many videos. The latter strikes me as unlikely; again, behavior is malleable but not that malleable. In the former case, I would think the direct effect on the reward function of all of those individual videos recommended and clicked on has to be way larger than the effect on the person’s behavior after seeing the videos. If my reasoning were wrong, I would find that quite scary, because it would be evidence of substantially greater vulnerability to current algorithms than I previously thought.
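A back-of-envelope version of the magnitude claim in (2), with invented numbers: over a prolonged session, compare the reward from the clicks themselves against the extra clicks attributable to a small per-video preference shift.

```python
# Back-of-envelope sketch. All numbers are assumptions for illustration.
n_videos = 50            # videos recommended during "prolonged exposure"
base_ctr = 0.50          # baseline click probability per recommendation
shift_per_video = 0.001  # assumed tiny increase in future CTR per video seen

# Reward from the clicks themselves:
direct_reward = n_videos * base_ctr
# Extra clicks from the cumulative preference shift (shift from video t
# affects all later recommendations):
shift_reward = sum(t * shift_per_video for t in range(n_videos))

# Under these assumptions the direct term dominates by more than 10x.
assert direct_reward > 10 * shift_reward
```

Of course, everything turns on `shift_per_video`: if the per-video effect were an order of magnitude larger, the two terms would be comparable, which is roughly the scary scenario described above.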
(1) The line between changing preferences and providing information seems thin to me. When groups are divided about abortion, for example, which bucket does that fall into?
It feels fairly clear to me that the media amplifies political differences, as I’m not sure how else these differences could be propagated to the extent they are (direct friends/family is another option, but that wouldn’t explain quick and correlated changes within political parties).
(2) The specific issue of prolonged exposure doesn’t seem hard to believe. People spend lots of time on YouTube. I’ve definitely gotten lots of recommendations to the same clusters of videos, and there are only so many clusters out there.
All that said, my story above is fairly different from Stuart’s. I think his is more like: “these algorithms are a fundamentally new force with novel mechanisms of preference change.” My claim is that media sources naturally change the preferences of individuals, so of course if algorithms have control over directing people to media sources, this will be influential in preference modification. Here “preference modification” basically means: “I didn’t used to be an intense anarcho-capitalist, but then I watched a bunch of these videos, and now I identify strongly with the movement.”
However, the question of how much news organizations actively optimize for preference modification in order to increase engagement, whether intentionally or not, is murkier.