As far as I can tell, this is all the evidence given in this post that there is in fact a problem. Two of the four links are news articles, which I ignore on the principle that news articles are roughly uncorrelated with the truth. (On radicalization I’ve seen specific arguments against the claim.) One seems to be a paper studying what users believe about the Facebook algorithm (I don’t see any connection to “harm to relationships”; if anything, the paper talks about how people use Facebook to maintain relationships). The last one is a paper whose abstract does in fact talk about phones reducing cognitive capacity, but (a) most papers are garbage, (b) beware the man of one study, and (c) why blame recommender systems for that, when it could just as easily be (say) email that’s the problem?
Overall I feel pretty unconvinced that there even is a major problem with recommender systems. (I’m not convinced that there isn’t a problem either.)
You could argue that since recommender systems have huge scale, any changes you make will be impactful, regardless of whether there is a problem or not. However, if there isn’t a clear problem that you are trying to fix, I think you are going to have huge sign uncertainty on the impact of any given change, so the EV seems pretty low.
----
The main argument of this post seems to be that this cause area would have spillover effects into AGI alignment, so maybe I’m being unfair by focusing on whether or not there’s a problem. But if that’s your primary motivation, I think you should just do whatever seems best to address AGI alignment, which I expect won’t be to work on recommender systems. (Note that the skills needed for recommender alignment are also needed for some flavors of AGI alignment research, so personal fit won’t usually change the calculus much.)
----
Before you point me to Tristan Harris, I’ve engaged with (some of) those arguments too, see my thoughts here.
> Have you considered developing these comments into a proper EA Forum post?

Unfortunately I don’t really have the time to do this well, and I think it would be a pretty bad post if I wrote the version that would take ~2 hours of effort or less.
The next Alignment Newsletter will include two articles on recommender systems that mostly disagree with the “recommender systems are driving polarization” position; you might be interested in those. (In fact, I did this shallow dive because I wanted to make sure I wasn’t neglecting arguments pointing in the opposite direction.)
EDIT: To be clear, I’d be excited for someone else to develop this into a post. The majority of my relevant thoughts are in the comments I already wrote, which anyone should feel free to use :)
Thanks for pointing out that the evidence for specific problems with recommender systems is quite weak and speculative; I’ve come around to this view in the last year, and in retrospect I should have labelled my uncertainty here better and featured it less prominently in the article since it’s not really a crux of the cause prioritization analysis, as you noticed. Will update the post with this in mind.
> If there isn’t a clear problem you’re going to have huge sign uncertainty on the impact of any given change
This is closer to a crux. I think there are a number of concrete changes, like optimizing for the user’s deliberative retrospective judgment, developing natural language interfaces, or exposing recommender system internals for researchers to study, which are likely to be hugely positive across most worlds, including ones where there’s no “problem” attributable to recommender systems per se. Positive both in direct effects and in flow-through effects from learning what kinds of human-AI interaction protocols lead to good outcomes.
From your Alignment Forum comment,
> The core feature of AI alignment is that the AI system deliberately and intentionally does things, and creates plans in new situations that you hadn’t seen before, which is not the case with recommender systems.
This seems like the real crux. I’m not sure how exactly you define “deliberately and intentionally”, but recommenders trained with RL (a small but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective. Moreover, they are deployed in a dynamic world and so routinely encounter new situations (unlike the toy environments more commonly used for AI alignment research).
> I think there are a number of concrete changes like optimizing for the user’s deliberative retrospective judgment, developing natural language interfaces or exposing recommender systems internals for researchers to study, which are likely to be hugely positive across most worlds including ones where there’s no “problem” attributable to recommender systems per se.
Some illustrative hypotheticals of how these could go poorly:
- To optimize for deliberative retrospective judgment, you collect thousands of examples of such judgments (the most that is financially feasible), train a reward model on these examples, and use that as your RL reward signal. Unfortunately this wasn’t enough data, and your reward model places high reward on very negative things it hasn’t seen training data on (e.g. perhaps it strongly recommends posts encouraging people to commit suicide if they want to, because it thinks encouraging people to do things they want is good).
- Same situation, except the problem is that the examples you collected weren’t representative of everyone who uses the recommender system, and so now the recommender system is nearly unusable for some people (e.g. it pushes away from “mindless fun”, hurting the people who wanted mindless fun).
- Same situation, except people are really bad at deliberative retrospective judgments. E.g. they take out everything that was “unvirtuous fun”, and due to the lack of fun people stop using the thing altogether. (Whether this is good or bad depends on whether the technology is net positive or net negative, but I tend to think this would be bad. Anyone I know who isn’t hyper-focused on productivity, i.e. most of the people in the world, seems to either like or be neutral about these technologies.)
- You create a natural language interface. People use it to search for evidence that the outgroup is terrible (not deliberately; they think “wow, X is so bad, they do Y, I bet I could find tons of examples of that”, and then they do, never seeking evidence in the other direction). Polarization increases dramatically, much more so than with the previous recommendation algorithm.
- You expose the internals of recommender systems. Lots of people find gender biases and so on, and the PR is terrible. The company is forced to ditch its recommender system and replace it with nothing (since any algorithm will be biased according to some metric; see the impossibility theorems). Everyone suffers.
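The first failure mode above, a reward model misgeneralizing from too little data, can be shown with a toy sketch (everything here is hypothetical and purely illustrative, not any real system): a linear reward model fit only on low-harm examples never learns that harm should be penalized, so it assigns high reward to a harmful out-of-distribution item.

```python
import numpy as np

# Hypothetical toy: each item has two features,
#   feature 0 = "matches what the user says they want"
#   feature 1 = "harm risk"
# Deliberative-judgment labels were only collected on low-harm items,
# so harm never varies in the training data.
X_train = np.array([
    [0.2, 0.0],   # mildly wanted, harmless
    [0.5, 0.1],
    [0.8, 0.0],   # strongly wanted, harmless
    [0.9, 0.1],
])
y_train = X_train[:, 0]  # judged reward happened to track "wantedness"

# Least-squares fit of a linear reward model: reward ≈ w · features
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Out-of-distribution item: strongly "wanted" but high harm risk.
# The fitted model puts essentially zero weight on harm, so it
# assigns this item high reward.
ood_item = np.array([0.9, 0.9])
print(w @ ood_item)
```

The point of the sketch is only that nothing in the training data tells the model harm is bad, so an RL policy optimizing this reward would happily surface the harmful item.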
I’m not saying that it’s impossible to do positive things. I’m more saying:
- If you aren’t trying to solve a specific problem, it’s really hard, and it doesn’t seem obviously high-EV, especially given the sign uncertainty.
- It’s not clear why you should do better than the people at the companies: why would altruism give you an advantage? If there’s a problem in the form of a deviation between a company’s incentives and what is actually good, one with actual consequences in the world, then I can see why altruism has an advantage, but in the absence of such a problem I don’t see why altruists should expect to do better.
> recommenders trained with RL (a small, but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective.
How do you know that? In most applications of RL I know of, it seems better to model the trained system as repeating things that worked well in the past. Only the largest uses of RL (AlphaZero, OpenAI Five, AlphaStar) seem like they might be exceptions.
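A rough illustration of the “repeating things that worked well in the past” view (a hypothetical toy, not any production recommender): an epsilon-greedy bandit keeps no model of the world and constructs no plans; its learned behavior is just to re-show whichever item has the best historical average reward.

```python
import random

random.seed(0)

# Hypothetical items with unknown-to-the-agent click rates.
true_click_rate = {"item_a": 0.1, "item_b": 0.6, "item_c": 0.3}
totals = {k: 0.0 for k in true_click_rate}
counts = {k: 0 for k in true_click_rate}

def recommend(eps=0.1):
    # Occasionally explore at random; otherwise exploit by replaying
    # the item with the best historical average reward.
    if random.random() < eps or all(c == 0 for c in counts.values()):
        return random.choice(list(true_click_rate))
    return max(totals, key=lambda k: totals[k] / max(counts[k], 1))

for _ in range(5000):
    item = recommend()
    reward = 1.0 if random.random() < true_click_rate[item] else 0.0
    totals[item] += reward
    counts[item] += 1

# The learned "policy" is just "item_b worked before, show item_b again";
# no novel multi-step plan is ever constructed.
best = max(totals, key=lambda k: totals[k] / max(counts[k], 1))
print(best)
```

This is of course a deliberately simple agent; the question in this thread is how much of that picture carries over to large RL-trained recommenders.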
I’m curious whether approaches like those I describe here (end of the article; building on this, which uses mini-publics) for determining recommender system policy help address the concerns in your first 3 bullets. I should probably do a write-up or modification specifically for the EA audience (this one is for a policy audience), but it ideally gets some of the point across regarding how to do “deliberative retrospective judgment” in a way that is more likely to avoid problematic outcomes. (I will also be publishing an expanded version with much more sourcing.)
These approaches could help! I don’t have strong reason to believe that they will, nor do I have strong reason to believe that they won’t, and I also don’t have strong reason to believe that the existing system is particularly problematic. I am just generally very uncertain and am mostly saying that other people should also be uncertain (or should explain why they are more confident).
Re: deliberative retrospective judgments as a solution: I assume you are going to be predicting what the deliberative retrospective judgment would be in most cases (otherwise it would be far too expensive), and it is unclear how easy it will be to make these sorts of predictions. Bullet points 1 and 2 were possibilities where the prediction was hard; I didn’t see on a quick skim why you think they wouldn’t happen. I agree “bridging divides” probably avoids bullet point 3, but I could easily tell different just-so stories where “bridging divides” is a bad choice (e.g. current affairs / news / politics almost always leads to divides, and so is no longer recommended; the population becomes extremely ignorant as a result, worsening political dynamics).