Wei Dai
AI doing philosophy = AI generating hands?
The universe can probably support a lot more sentient life if we convert everything that we can into computronium (optimized computing substrate) and use it to run digital/artificial/simulated lives, instead of just colonizing the universe with biological humans. To conclude that such a future doesn’t have much more potential value than your 2010 world, we would have to assign zero value to such non-biological lives, or value each of them much less than a biological human, or make other very questionable assumptions. The Newberry 2021 paper that Vasco Grilo linked to has a section about this:
If a significant fraction of humanity’s morally-relevant successors were instantiated digitally, rather than biologically, this would have truly staggering implications for the expected size of the future. As noted earlier, Bostrom (2014) estimates that 10^35 human lives could be created over the entire future, given known physical limits, and that 10^58 human lives could be created if we allow for the possibility of digital persons. While these figures were not intended to indicate a simple scaling law, they do imply that digital persons can in principle be far, far more resource efficient than biological life. Bostrom’s estimate of the number of digital lives is also conservative, in that it assumes all such lives will be emulations of human minds; it is by no means clear that whole-brain emulation represents the upper limit of what could be achieved. For a simple example, one can readily imagine digital persons that are similar to whole-brain emulations, but engineered so as to minimise waste energy, thereby increasing resource efficiency.
I think, as a matter of verifiable fact, that if people solve the technical problems of AI alignment, they will use AIs to maximize their own economic consumption, rather than pursue broad utilitarian goals like “maximize the amount of pleasure in the universe”.
If you extrapolate this out to after technological maturity, say 1 million years from now, what does selfish “economic consumption” look like? I tend to think that people’s selfish desires will be fairly easily satiated once everyone is much, much richer, and that the more “scalable” “moral” values would dominate resource consumption at that point, but it might just be my imagination failing me.
I think mundane economic forces are simply much more impactful.
Why do “mundane economic forces” cause resources to be consumed towards selfish ends? I think economic forces select for agents who want to and are good at accumulating resources, but will probably leave quite a bit of freedom in how those resources are ultimately used once the current cosmic/technological gold rush is over. It’s also possible that our future civilization uses up much of the cosmic endowment through wasteful competition, leaving little or nothing to consume in the end. Is that your main concern?
(By “wasteful competition” I mean things like military conflict, costly signaling, races of various kinds that accumulate a lot of unnecessary risks/costs. It seems possible that you categorize these under “selfishness” whereas I see them more as “strategic errors”.)
To be sure, ensuring AI development proceeds ethically is a valuable aim, but I claim this goal is *not* the same thing as “AI alignment”, in the sense of getting AIs to try to do what people want.
There was at least one early definition of “AI alignment” to mean something much broader:
The “alignment problem for advanced agents” or “AI alignment” is the overarching research topic of how to develop sufficiently advanced machine intelligences such that running them produces good outcomes in the real world.
I’ve argued that we should keep using this broader definition, in part for historical reasons, and in part so that AI labs (and others, such as EAs) can more easily keep in mind that their ethical obligations/opportunities go beyond making sure that AI does what people want. But it seems that I’ve lost that argument so it’s good to periodically remind people to think more broadly about their obligations/opportunities. (You don’t say this explicitly, but I’m guessing it’s part of your aim in writing this post?)
(Recently I’ve been using “AI safety” and “AI x-safety” interchangeably when I want to refer to the “overarching” project of making the AI transition go well, but I’m open to being convinced that we should come up with another term for this.)
That said, I think I’m less worried than you about “selfishness” in particular and more worried about moral/philosophical/strategic errors in general. The way most people form their morality is scary to me, and personally I would push humanity to be more philosophically competent/inclined before pushing it to be less selfish.
Not exactly what you’re asking for, but you could use it as a reference for all of the significant risks that different people have brought up, to select which ones you want to further research and address in your response post.
Tell me more about these “luxurious AI safety retreats”? I haven’t been to an AI safety workshop in several years, and wonder if something has changed. From searching the web, I found this:
and this:
I was there for an AI workshop earlier this year in Spring and stayed for 2 or 3 days, so let me tell you about the ‘luxury’ of the ‘EA castle’: it’s a big, empty, cold, stone box, with an awkward layout. (People kept getting lost trying to find the bathroom or a specific room.) Most of the furnishings were gone. Much of the layout you can see in Google Maps was nonfunctional, and several wings were off-limits or defunct, so in practice it was maybe a quarter of the size you’d expect from the Google Maps overview. There were clearly extensive needs for repair and remodeling of a lot of ancient construction, and most of the gardens are abandoned as too expensive to maintain. It is, as a real estate agent might say, very ‘historical’ and a ‘good fixer-upper’.
And not much visible evidence of luxury.
I thought about this and wrote down some life events/decisions that probably contributed to becoming who I am today.
Immigrating to the US at age 10 knowing no English. Social skills deteriorated while learning language, which along with lack of cultural knowledge made it hard to make friends during teenage and college years, which gave me a lot of free time that I filled by reading fiction and non-fiction, programming, and developing intellectual interests.
Was heavily indoctrinated with Communist propaganda while in China, but leaving meant I then had no viable moral/philosophical/political foundations. Parents were too busy building careers as new immigrants and didn’t try to teach me values/traditions. So I had a lot of questions that I didn’t have ready answers to, which perhaps contributed to my intense interest in philosophy (ETA: and economics and game theory).
Had an initial career in cryptography, but found it a struggle to compete with other researchers on purely math/technical skills. Realized that my comparative advantage was in more conceptual work. Crypto also taught me to be skeptical of my own and other people’s ideas.
Had a bad initial experience with academic research (received nonsensical peer review when submitting a paper to a conference) so avoided going that route. Tried various ways to become financially independent, and managed to “retire” in my late 20s to do independent research as a hobby.
A lot of these can’t really be imitated by others (e.g., I can’t recommend people avoid making friends in order to have more free time for intellectual interests). But here is some practical advice I can think of:
I think humanity really needs to make faster philosophical progress, so try your hand at that even if you think of yourself as more of a technical person. Same may be true for solving social/coordination problems. (But see next item.)
Somehow develop a healthy dose of self-skepticism so that you don’t end up wasting people’s time and attention arguing for ideas that aren’t actually very good.
It may be worth keeping an eye out for opportunities to “get rich quick” so you can do self-supported independent research. (Which allows you to research topics that don’t have legible justifications or are otherwise hard to get funding for, and pivot quickly as the landscape and your comparative advantage both change over time.)
ETA: Oh, here’s a recent LW post where I talked about how I arrived at my current set of research interests, which may also be of interest to you.
The main thing is that the clean distinction between attackers and defenders in the theory of the offense-defense balance does not exist in practice. All attackers are also defenders and vice-versa.
I notice that this doesn’t seem to apply to the scenario/conversation you started this post with. If a crazy person wants to destroy the world with an AI-created bioweapon, he’s not also a defender.
Another scenario I worry about is AIs enabling value lock-in: AIs/humans/groups with locked-in values would then have an offensive advantage in manipulating the values of other people (i.e., those who are not yet willing to lock in their values), while not having to be defenders themselves.
Why aren’t there more people like him, and what is he doing or planning to do about that?
It seems like you’re basically saying “evolution gave us reason, which some of us used to arrive at impartiality”, which doesn’t seem very different from my thinking which I alluded to in my opening comment (except that I used “philosophy” instead of “reason”). Does that seem fair, or am I rounding you off too much, or otherwise missing your point?
I agree that was too strong or over simplified. Do you think there are other evolutionary perspectives from which impartiality is less surprising?
Thanks, I didn’t know some of this history.
The Altman you need to distrust & assume bad faith of & need to be paranoid about stealing your power is also usually an Altman who never gave you any power in the first place! I’m still kinda baffled by it, personally.
Two explanations come to my mind:
Past Sam Altman didn’t trust his future self, and wanted to use the OpenAI governance structure to constrain himself.
His status game / reward gradient changed (at least subjectively from his perspective). At the time it was higher status to give EA more power / appear more safety-conscious, and now it’s higher status to take it back / race faster for AGI. (I note there was internal OpenAI discussion about wanting to disassociate from EA after the FTX debacle.)
Both of these reasons probably played some causal role in what happened, but may well have been subconscious considerations. (Also entirely possible that he changed his mind in part for what we’d consider fair reasons.)
So, what could the EA faction of the board have done? …Not much, really. They only ever had the power that Altman gave them in the first place.
Some ideas for what they could have done:
Reasoned about why Altman gave them power in the first place. Maybe come up with hypotheses 1 and 2 above (or others) earlier in the course of events. Try to test these hypotheses when possible and use them to inform decision making.
If they thought 1 was likely, they could have talked to Sam about it explicitly at an early date, asked for more power or failsafes, gotten more/better experts (at corporate politics) to advise them, monitored Sam more closely, and developed preparations/plans for the possible future fight. Asked Sam to publicly talk about how he didn’t trust himself, so that the public would be more sympathetic to the board when the time came.
If 2 seemed likely, tried to manage Altman’s status (or reward in general) gradient better. For example, gave prominent speeches / op-eds highlighting AI x-risk and OpenAI’s commitment to safety. Asked/forced Sam to frequently do the same thing. Managed risk better so that FTX didn’t happen.
Not backed Sam in the first place, so they could criticize/constrain him from the outside (e.g. by painting him/OpenAI as insufficiently safety-focused and pushing harder for government regulations). Or made it an explicit and public condition of backing him that EA (including the board members) were allowed to criticize and try to constrain OpenAI, and frequently reminded the public of this condition, in part by actually doing this.
Made it OpenAI policy that past and present employees are allowed/encouraged to publicly criticize OpenAI, so that for example the public would be aware of why the previous employee exodus (to Anthropic) happened.
It seems that EA tried to “play politics” with Sam Altman and OpenAI, by doing things like backing him with EA money and credibility (in exchange for a board seat) without first having high justifiable trust in him, generally refraining from publicly (or even privately, from what I can gather) criticizing Sam and OpenAI, Helen Toner apologizing to Sam/OpenAI for expressing even mild criticism in an academic paper, and springing a surprise attack or counterattack on Sam by firing him without giving any warning or chance to justify himself.
I wonder how much of this course of action was intended / carefully considered, and whether/what parts people still endorse in retrospect. Or more generally, what lessons are people drawing from this whole episode?
I’m personally unsure whether to update in the direction of “play politics harder/better” or “play politics less and be principled more” or maybe “generally be more principled but play politics better when you have to”? Or even “EA had a pretty weak hand throughout and played it as well as can be reasonably expected”? (It sucks that insiders who can best answer these questions are choosing or committed to not talking.)
a lot of LessWrong writing refers to ‘status’, but they never clearly define what it is or where the evidence and literature for it is
Two citations that come to mind are Geoffrey Miller’s Virtue Signaling and Will Storr’s The Status Game (maybe also Robin Hanson’s book although its contents are not as fresh in my mind), but I agree that it’s not very scientific or well studied (unless there’s a body of literature on it that I’m unfamiliar with), which is something I’d like to see change.
Maybe it’s instead a kind of non-reductionist sense of existing and having impact, which I do buy, but then things like ‘ideas’, ‘values’, and ‘beliefs’ should also exist in this non-reductionist way and be as important for considering human action as ‘status’ is.
Well sure, I agree with this. I probably wouldn’t have made my suggestion if EAs talked about status roughly as much as ideas, values, or beliefs.
Which to me is dangerously close to saying “if something talks about status, it’s evidence it’s real. If they don’t talk about it, then they’re self-deceiving in a Hansonian sense, and this is evidence for status” which sets off a lot of epistemological red flags for me
It seems right that you’re wary about this, but on reflection I think the main reason I think status is real is not because people talk or don’t talk about it, but because I see human behavior that seems hard to explain without invoking such a concept. For example, why are humans moral but our moralities vary so much across different communities? Why do people sometimes abandon or fail to act according to their beliefs/values without epistemic or philosophical reasons to do so? Why do communities sometimes collectively become very extreme in their beliefs/values, again without apparent epistemic or philosophical justification?
One, be more skeptical when someone says they are committed to impartially do the most good, and keep in mind that even if they’re totally sincere, that commitment may well not hold when their local status game changes, or if their status gradient starts diverging from actual effective altruism. Two, form a more explicit and detailed model of how status considerations + philosophy + other relevant factors drive the course of EA and other social/ethical movements, test this model empirically, basically do science on this and use it to make predictions and inform decisions in the future. (Maybe one or both of these could have helped avoid some of the mistakes/backlashes EA has suffered.)
One tricky consideration here is that people don’t like to explicitly think about status, because it’s generally better for one’s status to appear to do everything for its own sake, and any explicit talk about status kind of ruins that appearance. Maybe this can be mitigated somehow, for example by keeping some distance between the people thinking explicitly about status and EA in general. Or maybe, for the long term epistemic health of the planet, we can somehow make it generally high status to reason explicitly about status?
From an evolution / selfish gene’s perspective, the reason I or any human has morality is so we can win (or at least not lose) our local virtue/status game. Given this, it actually seems pretty wild that anyone (or more than a handful of outliers) tries to be impartial. (I don’t have a good explanation of how this came about. I guess it has something to do with philosophy, which I also don’t understand the nature of.)
BTW, I wonder if EAs should take the status game view of morality more seriously, e.g., when thinking about how to expand the social movement, and predicting the future course of EA itself.
What is a plausible source of x-risk that is 10% per century for the rest of time? It seems pretty likely to me that not long after reaching technological maturity, future civilization would reduce x-risk per century to a much lower level, because you could build a surveillance/defense system against all known x-risks, and not have to worry about new technology coming along and surprising you.
It seems that to get a constant 10% per century risk, you’d need some kind of existential threat for which there is no defense (maybe vacuum collapse), or for which the defense is so costly that the public goods problem prevents it from being built (e.g., no single star system can afford it on their own). But the likelihood of such a threat existing in our universe doesn’t seem that high to me (maybe 20%?), which I think upper bounds the long-term x-risk.
Curious how your model differs from this.
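To make the arithmetic behind this concrete, here is a minimal back-of-envelope sketch in Python. The 10%/century risk, the 10 centuries to maturity, and the ~20% chance of an indefensible threat are just illustrative placeholders for the rough figures above, not estimates:

```python
# Compare long-run survival probabilities under two toy models:
# (a) a constant 10%/century existential risk forever, vs.
# (b) 10%/century until technological maturity, after which risk drops to ~0
#     unless an indefensible threat (e.g. vacuum collapse) exists, assumed
#     here to be ~20% likely.
# All numbers are illustrative placeholders, not estimates.

def survival_constant(risk_per_century: float, centuries: int) -> float:
    """P(survival) if the same per-century risk applies in every century."""
    return (1 - risk_per_century) ** centuries

def survival_with_maturity(risk_per_century: float,
                           centuries_to_maturity: int,
                           p_indefensible_threat: float,
                           centuries: int) -> float:
    """P(survival) if risk falls to ~0 after maturity, except in the fraction
    of worlds (p_indefensible_threat) where the original risk persists."""
    pre = (1 - risk_per_century) ** min(centuries, centuries_to_maturity)
    post_centuries = max(0, centuries - centuries_to_maturity)
    post = (p_indefensible_threat * (1 - risk_per_century) ** post_centuries
            + (1 - p_indefensible_threat))
    return pre * post

for n in (10, 100, 1000):
    print(f"{n:>5} centuries: constant={survival_constant(0.10, n):.3g}, "
          f"maturity model={survival_with_maturity(0.10, 10, 0.20, n):.3g}")
```

In the first model the survival probability goes to zero; in the second, conditional on reaching maturity, the probability of eventual extinction asymptotes to roughly p_indefensible_threat rather than 1, which is the sense in which the ~20% figure bounds the long-term risk.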
I’m confused about how it’s possible to know whether someone is making substantive progress on metaphilosophy; I’d be curious if you have pointers.
I guess it’s the same as for any other philosophical topic: either use your own philosophical reasoning/judgement to decide how good the person’s ideas/arguments are, and/or defer to other people’s judgements. The fact that there is currently no methodology for doing this that is less subjective and informal is a major reason for me to be interested in metaphilosophy, since if we solve metaphilosophy, that will hopefully give us a better methodology for judging all philosophical ideas, assuming the correct solution to metaphilosophy isn’t philosophical anti-realism (i.e., that philosophical questions don’t have right or wrong answers), or something like that.
How should we deal with the possibility/risk of AIs inherently disfavoring all the D’s that Vitalik wants to accelerate? See my Twitter thread replying to his essay for more details.