Do you not worry about moral uncertainty? Unless you’re certain about consequentialism, surely you should put some weight on avoiding killing even if it maximises impartial welfare?
You’re welcome! N=1 though, so might be worth seeing what other people think too.
For what it’s worth, although I do think we are clueless about the long-run (and so overall) consequences of our actions, the example you’ve given isn’t intuitively compelling to me. My intuition wants to say that it’s quite possible that the cat vs dog decision ends up being irrelevant for the far future / ends up being washed out.
Sorry, I know that’s probably not what you want to hear! Maybe different people have different intuitions.
I don’t think OpenAI’s near-term ability to make money (e.g. because of the quality of its models) is particularly relevant to its current valuation. It’s possible it won’t be in the lead in the future, but I think OpenAI investors are betting on worlds where OpenAI does clearly “win”, and the stickiness of its customers in other worlds doesn’t really affect the valuation much.
So I don’t agree that working on this would be useful compared with things that contribute to safety more directly.
How much do you think customers having zero friction to switch away from OpenAI would reduce its valuation? I think it wouldn’t change it much: less than 10%.
(Also note that OpenAI’s competitors are incentivised to make switching cheap, e.g. Anthropic’s API is very similar to OpenAI’s for this reason.)
I think investors want to invest in OpenAI so badly almost entirely because it’s a bet on OpenAI having better models in the future, not because of sticky customers. So it seems that the effect of this on OpenAI’s cost of capital would be very small?
Interesting exercise, thanks! The link to view the questions doesn’t work though. It says:
The form AI Grantmaking Priorities Survey is no longer accepting responses.
Try contacting the owner of the form if you think that this is a mistake.
Interesting!
I think my worry is people who don’t think they need advice about what the future should look like. When I imagine them making the bad decision despite having lots of time to consult superintelligent AIs, I imagine them just not being that interested in making the “right” decision? And therefore their advisors not being proactive in telling them things that are only relevant for making the “right” decision.
That is, assuming the AIs are intent aligned, they’ll only help you in the ways you want to be helped:
Thoughtful people might realise the importance of getting the decision right, and might ask “please help me to get this decision right” in a way that ends up with the advisors pointing out that AI welfare matters and the decision makers will want to take that into account.
But unthoughtful or hubristic people might not ask for help in that way. They might just ask for help in implementing their existing ideas, and not be interested in making the “right” decision or in what they would endorse on reflection.
I do hope that people won’t be so thoughtless as to impose their vision of the future without seeking advice, but I’m not confident.
I agree that the text an LLM outputs shouldn’t be thought of as communicating with the LLM “behind the mask” itself.
But I don’t agree that it’s impossible in principle to say anything about the welfare of a sentient AI. Could we not develop some guesses about AI welfare by getting a much better understanding of animal welfare? (For example, we might learn much more about when brains are suffering, and this could be suggestive of what to look for in artificial neural nets.)
It’s also not completely clear to me what the relationship is between the sentient being “behind the mask” and the “role-played character”, especially if we imagine conscious, situationally aware future models. Right now, it’s certainly useful to see the text output by an LLM as simulating a character, which has nothing to do with the reality of the LLM itself, but could that be related to the LLM not being conscious of itself? I feel confused.
Also, even if it was impossible in principle to evaluate the welfare of a sentient AI, you might still want to act differently in some circumstances:
Some ethical views see creating suffering as worse than creating the same amount of pleasure.
Empirically, in animals, it seems to me that the total amount of suffering is probably more than the total amount of pleasure. So we might worry that this could also be the case for ML models.
Why does “lock-in” seem so unlikely to you?
One story:
- Assume AI welfare matters.
- Aligned AI concentrates power in a small group of humans.
- AI technology allows them to dictate aspects of the future / cause some “lock-in” if they want. That’s because:
  - These humans control the AI systems that have all the hard power in the world.
  - Those AI systems will retain all the hard power indefinitely; their wishes cannot be subverted.
  - Those AI systems will continue to obey whatever instructions they are given indefinitely.
- Those humans decide to dictate some or all of what the future looks like, and lots of AIs end up suffering in this future because their welfare isn’t considered by the decision makers.
(Also, the decision makers could pick a future which isn’t very good in other ways.)
You could imagine AI welfare work now improving things by putting AI welfare on the radar of those people, so they’re more likely to take AI welfare into account when making decisions.
I’d be interested in which step of this story seems implausible to you—is it about AI technology making “lock in” possible?
Good question! I share that intuition that preventing harm is a really good thing to do, and I find striking the right balance between self-sacrifice and pursuing my own interests difficult.
“I think if you argue that that leads to anything close to a normal life you are being disingenuous”
I think this is probably wrong for most people. If people make themselves unhappy by forcing themselves to make sacrifices they don’t want to make, I think most of them will be much less productive. And I think that most people actually need a fairly normal social life etc. to avoid that. I believe this because I’ve seen and heard stories of people burning out from trying to work too hard, and I’ve come close myself.
I think the best way to have a large impact probably looks like working as hard as you sustainably can (for most people, I think this is working hard in a normal 9-5 work week or less), and spending enough time thinking seriously about the best strategy for you to make the biggest difference. It might also involve donating money, but again I think it’s a good use of money to spend some on what makes you happy, to prevent resentment and burnout.
I think misaligned AI values should be expected to be worse than human values, because it’s not clear that misaligned AI systems would care about, e.g., their own welfare.
Inasmuch as we expect misaligned AI systems to be conscious (or whatever we need to care about them) and also to be good at looking after their own interests, I agree that it’s not clear from a total utilitarian perspective that the outcome would be bad.
But the “values” of a misaligned AI system could be pretty arbitrary, so I don’t think we should expect that.
“This is a true, counterfactual match, and we will only receive the equivalent amount to what we can raise.”
What will happen to the money counterfactually? Presumably it will be donated to other things the match funder thinks are roughly as good as GWWC?
Is this a problem? Seems fine to me, because the meaning is often clear, as in two of your examples, and I think it adds value in those contexts. And if it’s not clear, doesn’t seem like a big loss compared to a counterfactual of having none of these types of vote available.
I think that trying to get safe concrete demonstrations of risk by doing research seems well worth pursuing (I don’t think you were saying it’s not).
Do you have any thoughts on how people should decide between working on groups at CEA and running a group on the ground themselves?
I imagine a lot of people considering applying could be asking themselves that question, and it doesn’t seem obvious to me how to decide.
To be fair, I think I’m partly making wrong assumptions about what exactly you’re arguing for here.
On a slightly closer read, you don’t actually argue in this piece that it’s as high as 90% - I assumed that because I think you’ve argued for that previously, and I think that’s what “high” p(doom) normally means.
Relatedly, I also think that your arguments for “p(doom|AGI)” being high aren’t convincing to people who don’t share your intuitions, and it looks like you’re relying on those (imo weak) arguments when actually you don’t need to.
I think you come across as over-confident, not alarmist, and I think it hurts how you come across quite a lot. (We’ve talked a bit about the object level before.) I’d agree with John’s suggested approach.
Makes sense. To be clear, I think global health is very important, and I think it’s a great thing to devote one’s life to! I don’t think it should be underestimated how big a difference you can make improving the world now, and I admire people who focus on making that happen. It just happens that I’m concerned the future might be an even higher-priority thing that many people could be in a good position to address.
Thanks Vasco! :)
I agree that thinking about other moral theories is useful for working out what utilitarianism would actually recommend.
That’s an interesting point re increasing the total amount of killing; I hadn’t considered that! But I was actually picking up on your comment, which seemed to say something more general: that you wouldn’t intrinsically take into account whether an option involved (you) killing people; you’d just look at the consequences (and killing can lead to worse consequences, including in indirect ways, of course). But it sounds like maybe your response to that is that you’re not worried about moral uncertainty / you’re sure about utilitarianism / you don’t have any reason to avoid killing people, other than the (normally very significant) utilitarian reasons not to kill?