AI alignment research, with a focus on moral progress and CEV.
Babel
Thank you for this critique!
Just want to highlight one thing: comments on this post are sometimes a bit harsh, but please don’t take this to mean we’re unwelcoming or defensive (although there may be a real tendency to over-argue in our own defense). The style of discussion on the forum is sometimes just like this :)
Are people encouraged to share this opportunity with non-EA friends and in non-EA circles? If so, maybe consider making this clear in the post?
Glad to hear that you found this useful!
Do you know of any companies that are hiring HRI designers?
Sorry, I know nothing about the HRI space :(
Hi Martyna, maybe this post and its comments can interest you.
Also, something else that comes to mind: Andrew Critch thinks that working on Human-Robot Interaction may be very useful for AI Safety. Note that he isn’t solely talking about robots, but also about human-machine interaction in general (that’s how I interpret it; I may well be wrong):
HRI research is concerned with designing and optimizing patterns of interaction between humans and machines—usually actual physical robots, but not always.
Not sure whether other AI Safety researchers would agree on the importance of HRI, and not sure if your current career path is very relevant to this. Anyway, just sharing something that might be useful :)
Thanks for the post, it’s really exciting!
One very minor point:
In China, tofu is a symbol of poverty—a relic from when ordinary people couldn’t afford meat. As such, ordering tofu for guests is often seen as cheap and disrespectful.
I agree that this is somewhat true, but stating it like this seems a bit unfair. Ordering tofu for guests seems fine to me; it only gets problematic when you order way too much of it, in the same way that ordering nothing but rice for guests is extremely disrespectful. (Conflict of Interest: I’m a tofu lover!)
Anyway, I really like your idea! Good luck :)
Thanks for the suggestion, but I’m currently in college, so it’s impossible for me to move :)
Great points! I agree that the longtermist community needs to better internalize the anti-speciesist belief that we claim to hold, and explicitly include non-humans in our considerations.
On your specific argument that longtermist work doesn’t affect non-humans:
X-risks aren’t the sole focus of longtermism. IMO work in the S-risk space takes non-humans (including digital minds) much more seriously, to the extent that human welfare is mentioned much less often than non-human welfare.
I think X-risk work does affect non-humans. Linch’s comment mentions one possible way, though I think we need to weigh the upsides and downsides more carefully. Another thing I want to add is that a misaligned AI could be a much more powerful actor than other earth-originating intelligent species, and may have a large influence on non-humans even after human extinction.
I think we need to thoroughly investigate the influence of our longtermist interventions on non-humans. This topic is highly neglected relative to its importance.
From a consequentialist perspective, I think what matters more is how these options affect your psychology and epistemics (in particular, whether doing this will increase or decrease your speciesist bias, and whether doing this makes you uncomfortable), instead of the amount of suffering they directly produce or reduce. After all, your major impact on the world is from your words and actions, not what you eat.
That being said, I think non-consequentialist views deserve some consideration too, if only due to moral uncertainty. I’m less certain about what their implications are though, especially when taking into account things like WAS.
A few minor notes on your points:
In terms of monetary cost, I think the cost of buying vitamin supplements is approximately cancelled out by the cost of buying meat.
At least where I live, vitamin supplements can be super cheap if you go for the pharmaceutical products instead of those health products wrapped up in fancy packages. I’m taking 5 kinds of supplements simultaneously, and in total they cost me no more than (the RMB equivalent of) several dollars per month.
Also, I wouldn’t eat any meat out of the house, so you can assume that the impact of my eating on my friends is irrelevant.
It might be hard to hide that from your friends if you eat meat when you’re alone. People mindlessly say things they aren’t supposed to say all the time. Also, when your friends ask about your eating habits you’ll have to lie, which might be a bad thing even for consequentialists.
Currently, EA resources are not gained gradually year by year; instead, they’re gained in big leaps (think of Open Phil and FTX). Therefore it might not make sense to accumulate resources for several years and give them out all at once.
In fact, there is a call for megaprojects in EA, which echoes your points 1 and 3 (though these megaprojects are not expected to be funded by accumulating resources over the years, but by directly deploying existing resources). I’m not sure I understand your second point though.
Thanks for the reply, your points make sense! There is certainly a question of degree to each of the concerns I raised in my comment, so arguments both for and against them should be taken into account. (To be clear, I wasn’t raising my points to dismiss your approach; instead, they’re things that I think need to be taken care of if we’re to take such an approach.)
I have to say I’m not sure why the most influential time being in the future wouldn’t imply investing for that time though—I’d be interested to hear your reasoning.
Caveat: I haven’t spent much time thinking about this problem of investing vs direct work, so please don’t take my views too seriously. I should have made this clear in my original comment, my bad.
My first consideration is that we need to distinguish between “this century is more important than any given century in the future” and “this century is more important than all centuries in the future combined”. The latter argues strongly against investing for the future; but the former doesn’t seem to, as by investing now (patient philanthropy, movement building, etc.) you can potentially benefit many centuries to come.
The second consideration is that there are many more relevant factors than “how important this century is”. The needs of the EA movement are one (and a particularly important consideration for movement building), and personal fit is another, among others.
Interesting idea, thanks for doing this! I agree it’s good to have more approachable cause prioritization models, but there’re also associated risks to be careful about:
A widely used model that is not frequently updated could do a lot of damage by spreading outdated views. Unlike large collections of articles, a simple model in a graphic form can be spread really fast, and once it’s spread out on the Internet it can’t be taken back.
A model made by a few individuals or some central organisation may run the risk of deviating from the views of the majority of EAs; instead, a more “democratic” way of making the model (I’m not too sure what this would mean exactly) might be favored.
Views in EA are really diverse, so one single model likely cannot capture all of them.
Also, I think the decision-tree-style framework used here has some inherent drawbacks:
It’s unclear what “yes” and “no” means.
e.g. What does it mean to agree that “humans have special status”? This can be referring to many different positions (see below for examples, and the toy sketch after this list) which probably lead to vastly different conclusions.
a. humans have two times higher moral weight than non-humans
b. all animals are morally weighted by their neuron count (or some non-linear function of neuron count)
c. human utility always trumps non-human utility
for another example, see alexrjl’s comment.
Yes-or-no answers usually don’t serve as necessary and sufficient conditions.
e.g. I think “most influential time in future” is neither necessary nor sufficient for prioritizing “investing for the future”.
e.g. I don’t think the combined condition “suffering-focused OR adding people is neutral OR future pessimism” serves as anything close to a necessary condition to prioritizing “improving quality of future”.
A more powerful framework than decision trees might be favored, though I’m not sure what a better alternative would be. One might want to look at ML models for candidates, but one thing to note is that there’s likely a tradeoff between expressiveness and interpretability.
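To make the first drawback concrete, here is a toy sketch (my own illustration, with made-up welfare numbers and rough neuron counts) of how the three readings of “humans have special status” above can rank the same two hypothetical interventions differently:

```python
# Toy illustration only: made-up interventions, rough neuron counts.
# Each reading of "humans have special status" is encoded as a moral weight,
# and the same two interventions get ranked under each reading.

HUMAN_NEURONS = 86e9      # rough figure for a human brain
CHICKEN_NEURONS = 0.22e9  # rough figure for a chicken brain

def moral_weight(position, species):
    if position == "a":   # humans have twice the moral weight of non-humans
        return 2.0 if species == "human" else 1.0
    if position == "b":   # weight every animal by neuron count
        return HUMAN_NEURONS if species == "human" else CHICKEN_NEURONS
    if position == "c":   # human utility always trumps non-human utility
        return 1.0 if species == "human" else 0.0  # crude stand-in for lexical priority
    raise ValueError(position)

# (species, number helped, welfare gain per individual) -- hypothetical numbers
interventions = {
    "A (helps humans)":   ("human",   1_000,     1.0),
    "B (helps chickens)": ("chicken", 1_000_000, 0.1),
}

for position in ("a", "b", "c"):
    scores = {name: moral_weight(position, species) * n * gain
              for name, (species, n, gain) in interventions.items()}
    print(f"position {position} prefers {max(scores, key=scores.get)}")
# Output: position a prefers B, while positions b and c prefer A --
# the same "yes" to "humans have special status" points in different directions.
```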
And lastly:
In addition, some foundational assumptions common to EA are made, including a consequentialist view of ethics in which wellbeing is what has intrinsic value.
I think there have been some discussions going on about EA decoupling from consequentialism, which I consider worthwhile. It might be good to include non-consequentialist considerations too.
While, to my knowledge, an artificial neural network has not been used to distinguish between large numbers of species (the most I found was fourteen, by Ruff et al., 2021)
Here is one study distinguishing between 24 species using bioacoustic data. I stumbled upon this study totally by coincidence, and I don’t know if there’re other studies larger in scale.
The study was carried out by the bioacoustics lab at MSR. It seems like some of their other projects might also be relevant to what we’re discussing here (low confidence, just speculating).
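For a concrete picture of what a many-species classifier involves, here is a minimal sketch (my own, with placeholder data; not the method used by either study) of a small neural network trained to label audio clips by species from precomputed spectrogram features:

```python
# Minimal sketch with placeholder data -- not the pipeline of either study above.
# A real pipeline would extract spectrogram features from labeled recordings;
# here random arrays stand in for those features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

n_species = 24                    # as in the study linked above
n_clips, n_features = 2400, 128   # hypothetical dataset size and feature dimension

rng = np.random.default_rng(0)
X = rng.normal(size=(n_clips, n_features))    # placeholder spectrogram features
y = rng.integers(0, n_species, size=n_clips)  # placeholder species labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))  # near chance on random data
```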
Maybe it would be better to mention less about “do good with your money” and instead more about “do good with your time”? (to counter the misconception that EA is all about E2G)
Also, agreed that the message should be short and simple.
Closely related, and also important, is the question of “which world gets precluded”. Different possibilities include:
By reducing extinction risk from a (hypothetical) scenario in which Earth explodes and falls into pieces, we preclude a world in which there’s no life (and therefore no powerful agent) on what previously was Earth.
By reducing extinction risk from pandemics, we preclude a world in which there’s no human on Earth, but possibly other intelligent species that have evolved to fill the niche previously occupied by humans.
By reducing extinction risk from unaligned AGI, we preclude a world in which there’s no human on Earth, but powerful unaligned AGIs, which may then begin their conquest of the broader universe.
By reducing non-extinction existential risks (e.g. disappointing futures), we preclude a world in which humans continue to exist, but fail to realize their potential in doing good.
How would the four precluded worlds rank if we sorted them in decreasing order of badness? I’m highly unsure here too, but I would guess something like 4 > 2 > 3 > 1 (where the numbers refer to the scenarios above, and “>” means “worse than”).
After writing this down, I’m seeing a possible response to the argument above:
If we observe that Alice and Bob had, in the past, made similar decisions under equivalent circumstances, then we can infer that:
There’s an above-baseline likelihood that Alice and Bob have similar source codes, and
There’s an above-baseline likelihood that Alice and Bob have correlated sources of randomness.
(where the “baseline” refers to our prior)
However:
It still rests on the non-trivial metaphysical claim that different “free wills” (i.e. different sources of randomness) could be correlated.
The extent to which we update our prior (on the likelihood of correlated inputs) might be small, especially if we consider it unlikely that inputs could be correlated. This may lead to superrational considerations carrying much less weight in our decision-making.
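To gesture at why the update might be small, here is a hedged illustration with made-up numbers: if identical source code already explains most of the similarity in past decisions, observing that similarity barely moves a low prior on correlated inputs.

```python
# Made-up numbers for illustration only.
p_correlated = 0.01          # prior that the sources of randomness are correlated
p_similar_if_corr = 0.9      # P(similar past decisions | correlated inputs)
p_similar_if_uncorr = 0.8    # P(similar past decisions | uncorrelated inputs,
                             #   but identical source codes)

posterior = (p_correlated * p_similar_if_corr) / (
    p_correlated * p_similar_if_corr + (1 - p_correlated) * p_similar_if_uncorr
)
print(f"posterior P(correlated inputs) = {posterior:.4f}")  # ~0.0112, barely above the 0.01 prior
```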
One doubt on superrationality:
(I guess similar discussions must have happened elsewhere, but I can’t find them. I am new to decision theory and superrationality, so my thinking may very well be wrong.)
First I present an inaccurate summary of what I want to say, to give a rough idea:
The claim that “if I choose to do X, then my identical counterpart will also do X” seems to imply (though not necessarily; see the example for details) that there is no free will. But if we indeed assume determinism, then no decision theory is practically meaningful.
Then I shall elaborate with an example:
Two AIs with identical source codes, Alice and Bob, are engaging in a prisoner’s dilemma.
Let’s first assume they have no “free will”, i.e. their programs are completely deterministic.
Suppose that Alice defects; then Bob also defects, due to their identical source code.
Now, we can vaguely imagine a world in which Alice had cooperated, and then Bob would also cooperate, resulting in a better outcome.
But that vaguely imagined world is not coherent, as it’s simply impossible, given the way her source code was written, that Alice would have cooperated.
Therefore, it’s practically meaningless to say “It would be better for Alice to cooperate”.
What if we assume they have free will, i.e. they each have a source of randomness, feeding random numbers into their programs as input?
If the two sources of randomness are completely independent, then decisions of Alice and Bob are also independent. Therefore, to Alice, an input that leads her to defect is always better than an input that leads her to cooperate—under both CDT and EDT.
If, on the other hand, the two sources are somehow correlated, then it might indeed be better for Alice to receive an input that leads her to cooperate (see the toy simulation after this example). This is the only case in which superrationality is practically meaningful, but here the assumption of correlation is quite a strong claim and IMO dubious:
Our initial assumption on Alice and Bob is only that they have identical source codes. Conditional on Alice and Bob having identical source codes, it seems rather unlikely that their inputs would also be somehow correlated.
In the human case: conditional on my counterpart and I having highly similar brain circuits (and therefore way of thinking), it seems unreasonable to assert that our “free will” (parts of our thinking that aren’t deterministically explainable by brain circuits) will also be highly correlated.
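Here is the toy simulation promised above (my own sketch, with hypothetical payoff numbers): it compares Alice’s expected payoff conditional on her random input leading her to cooperate vs. defect, first with independent inputs and then with perfectly correlated ones.

```python
# Toy sketch with hypothetical payoffs: Alice and Bob share the same decision
# procedure; only the correlation between their random inputs varies.
import random

def decide(rand_value, p_cooperate=0.5):
    """Shared 'source code': cooperate iff the random input falls below p_cooperate."""
    return "C" if rand_value < p_cooperate else "D"

# Alice's payoffs in a standard prisoner's dilemma (hypothetical numbers).
ALICE_PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def payoff_by_own_action(correlated, trials=100_000, seed=0):
    """Average payoff Alice gets, grouped by what her own input made her do."""
    rng = random.Random(seed)
    totals, counts = {"C": 0.0, "D": 0.0}, {"C": 0, "D": 0}
    for _ in range(trials):
        r_alice = rng.random()
        r_bob = r_alice if correlated else rng.random()  # shared vs. independent input
        a, b = decide(r_alice), decide(r_bob)
        totals[a] += ALICE_PAYOFF[(a, b)]
        counts[a] += 1
    return {action: totals[action] / counts[action] for action in ("C", "D")}

print("independent inputs:", payoff_by_own_action(correlated=False))  # D beats C (~3.0 vs ~1.5)
print("correlated inputs: ", payoff_by_own_action(correlated=True))   # C beats D (3.0 vs 1.0)
```

Only in the correlated case does an input that leads to cooperation come out ahead, which is exactly the case where superrationality gets its grip.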
Thanks for the answers, they all make sense, and I upvoted all of them :)
So for a brief summary:
The action that I described in the question is far from optimal under an EV framework (CarlShulman & Brian_Tomasik), and
Even if it is optimal, a utilitarian may still have ethical reasons to reject it, if he or she:
endorses some kind of non-traditional utilitarianism, most notably SFE (TimothyChan); or
considers the uncertainty involved to be moral (instead of factual) uncertainty (Brian_Tomasik).
Building conscious AI (in the form of brain emulations or other architectures) could possibly help us create a large number of valuable artificial beings. Wildly speculative indulgence: being able to simulate humans and their descendants could be a great way to make the human species more robust to most existing existential risks (if it is easy to create artificial humans that can live in simulations, then humanity could become much more resilient)
That would pose a huge risk of creating astronomical suffering too. For example, if someone decided to run a conscious simulation of natural history on Earth, that would be a nightmare for those who work on reducing s-risks.
Thanks for the detailed answer!
I agree that the second- and third-order effects of e.g. donating to super-effective animal advocacy charities are, more likely than not, larger than those of e.g. volunteering at local animal shelters. (though that may depend on the exact charity you’re donating to?)
However, it’s likely that some other action has even larger second- and third-order effects than donating to top charities—after all, most (though not all) of these charities are optimizing for first-order effects, rather than the second- and third-order ones.
Therefore, it’s not obviously justifiable to simply ignore second- and third-order effects in our analysis.