Thanks for pointing out that the evidence for specific problems with recommender systems is quite weak and speculative; I’ve come around to this view in the last year, and in retrospect I should have labelled my uncertainty here better and featured it less prominently in the article since it’s not really a crux of the cause prioritization analysis, as you noticed. Will update the post with this in mind.
“If there isn’t a clear problem you’re going to have huge sign uncertainty on the impact of any given change”
This is closer to a crux. I think there are a number of concrete changes, like optimizing for the user’s deliberative retrospective judgment, developing natural language interfaces, or exposing recommender system internals for researchers to study, which are likely to be hugely positive across most worlds, including ones where there’s no “problem” attributable to recommender systems per se. They would be positive both in direct effects and in flow-through effects, from learning what kinds of human-AI interaction protocols lead to good outcomes.
From your Alignment Forum comment,
“The core feature of AI alignment is that the AI system deliberately and intentionally does things, and creates plans in new situations that you hadn’t seen before, which is not the case with recommender systems.”
This seems like the real crux. I’m not sure how exactly you define “deliberately and intentionally”, but recommenders trained with RL (a small but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective. Moreover, they are deployed in a dynamic world and so encounter new situations habitually (unlike the toy environments more commonly used for AI alignment research).
“I think there are a number of concrete changes like optimizing for the user’s deliberative retrospective judgment, developing natural language interfaces or exposing recommender systems internals for researchers to study, which are likely to be hugely positive across most worlds including ones where there’s no “problem” attributable to recommender systems per se.”
Some illustrative hypotheticals of how these could go poorly:
- To optimize for deliberative retrospective judgment, you collect thousands of examples of such judgments, the most that is financially feasible. You train a reward model on these examples and use it as your RL reward signal. Unfortunately this wasn’t enough data, and your reward model places high reward on very negative things it has seen no training data on (e.g. perhaps it strongly recommends posts encouraging people to commit suicide if they want to, because it thinks encouraging people to do things they want is good).
- Same situation, except the problem is that the examples you collected weren’t representative of everyone who uses the recommender system, and so the recommender system is now nearly unusable for the people left out (e.g. the recommender system pushes away from “mindless fun”, hurting the people who wanted mindless fun).
- Same situation, except people are really bad at deliberative retrospective judgments. E.g. they take out everything that was “unvirtuous fun”, and due to the lack of fun people stop using the thing altogether. (Whether this is good or bad depends on whether the technology is net positive or net negative, but I tend to think it would be bad. Anyone I know who isn’t hyper-focused on productivity, i.e. most people in the world, seems to either like or be neutral about these technologies.)
- You create a natural language interface. People use it to search for evidence that the outgroup is terrible (not deliberately; they think “wow, X is so bad, they do Y, I bet I could find tons of examples of that”, and then they do, never seeking evidence in the other direction). Polarization increases dramatically, much more so than with the previous recommendation algorithm.
- You expose the internals of recommender systems. Lots of people find gender biases and so on, and the PR is terrible. The company is forced to ditch its recommender system and instead have nothing (since any algorithm will be biased according to some metric; see the impossibility theorems). Everyone suffers.
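The first failure mode above (a reward model confidently mis-scoring content unlike anything in its training data) can be sketched with a toy example. Everything here is a hypothetical illustration under made-up assumptions, not anyone’s actual system: the 1-D “feature” and the labels are invented for the sketch.

```python
# Toy illustration (hypothetical): a reward model fit on a small set of
# deliberative-judgment labels can assign high reward to content far
# outside anything humans actually judged.

def fit_reward_model(examples):
    """examples: list of (feature, judged_reward) pairs. Returns a
    predictor that extrapolates linearly (1-D least squares)."""
    n = len(examples)
    xs = [x for x, _ in examples]
    ys = [y for _, y in examples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
            sum((x - mean_x) ** 2 for x in xs)
    return lambda x: mean_y + slope * (x - mean_x)

# Made-up feature: "how strongly the post encourages the user to act on a
# stated wish". Human judgments were only collected in a mild range.
training_data = [(0.0, 0.1), (0.2, 0.3), (0.4, 0.5), (0.6, 0.7)]
reward = fit_reward_model(training_data)

# In-distribution: a sensible score.
print(reward(0.3))

# Out of distribution: the model happily extrapolates, scoring extreme
# "encourage whatever the user says they want" content far above any
# reward a human judgment ever endorsed.
print(reward(5.0))
```

The point of the sketch is that nothing in the fitting procedure knows where the training data ends, so the learned reward keeps climbing into regions no human ever evaluated; a real reward model is vastly more complex, but faces the same extrapolation problem.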
I’m not saying that it’s impossible to do positive things. I’m more saying:
- If you aren’t trying to solve a specific problem, it’s really hard, and doesn’t seem obviously high-EV, especially due to sign uncertainty.
- It’s not clear why you should do better than the people at the companies; why does altruism matter? If there’s a problem, in the form of a deviation between a company’s incentives and what is actually good that has actual consequences in the world, then I can see why altruism has an advantage, but in the absence of such a problem I don’t see why altruists should expect to do better.
“recommenders trained with RL (a small but increasing fraction) are definitely capable of generating and executing complex novel sequences of actions towards an objective.”
How do you know that? In most cases of RL I know of, it seems better to model them as repeating things that worked well in the past. Only the largest uses of RL (AlphaZero, OpenAI Five, AlphaStar) seem like they might be exceptions.
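The “repeating things that worked well in the past” model of most deployed RL can be made concrete with a bandit-style sketch. This is a hypothetical toy, not any production recommender: the items and click rates are invented, and the agent tracks nothing but per-item average past reward.

```python
import random

# Toy sketch (hypothetical): an epsilon-greedy bandit "recommender" that
# tracks the average past reward of each item and mostly replays the
# historical winner -- no planning over novel action sequences.

random.seed(0)

TRUE_CLICK_RATES = {"item_a": 0.2, "item_b": 0.5, "item_c": 0.8}
counts = {item: 0 for item in TRUE_CLICK_RATES}
avg_reward = {item: 0.0 for item in TRUE_CLICK_RATES}

def choose(epsilon=0.1):
    # With probability epsilon, explore; otherwise exploit what has
    # worked best so far.
    if random.random() < epsilon:
        return random.choice(list(TRUE_CLICK_RATES))
    return max(avg_reward, key=avg_reward.get)

for _ in range(5000):
    item = choose()
    clicked = 1.0 if random.random() < TRUE_CLICK_RATES[item] else 0.0
    counts[item] += 1
    avg_reward[item] += (clicked - avg_reward[item]) / counts[item]

# The learned "policy" is just: recommend whatever had the best past
# average -- a lookup over history, not a plan.
best = max(avg_reward, key=avg_reward.get)
print(best, {k: round(v, 2) for k, v in avg_reward.items()})
```

On this view, the interesting question is how much of deployed recommender RL is closer to this (exploitation of logged statistics) versus the sequence-level planning seen in systems like AlphaZero.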
I’m curious whether approaches like those I describe here (at the end of the article, building on this, which uses mini-publics) for determining recommender system policy help address the concerns in your first 3 bullets. I should probably do a write-up or modification specifically for the EA audience (this one is aimed at a policy audience), but ideally it gets some of the point across re: how to do “deliberative retrospective judgment” in a way that is more likely to avoid problematic outcomes. (I will also be publishing an expanded version with much more sourcing.)
These approaches could help! I don’t have strong reason to believe that they will, nor do I have strong reason to believe that they won’t, and I also don’t have strong reason to believe that the existing system is particularly problematic. I am just generally very uncertain and am mostly saying that other people should also be uncertain (or should explain why they are more confident).
Re: deliberative retrospective judgments as a solution: I assume you are going to be predicting what the deliberative retrospective judgment would be in most cases (otherwise it would be far too expensive); it is unclear how easy these sorts of predictions will be. Bullet points 1 and 2 were possibilities where the prediction was hard; on a quick skim I didn’t see why you think they wouldn’t happen. I agree “bridging divides” probably avoids bullet point 3, but I could easily tell different just-so stories where “bridging divides” is a bad choice (e.g. current affairs / news / politics almost always creates divides, and so is no longer recommended; the population becomes extremely ignorant as a result, worsening political dynamics).