Wei Dai
The source code was available, but if someone wanted to claim compliance with the NIST standard (in order to sell their product to the federal government, for example), they had to use the pre-compiled executable version.
I guess there’s a possibility that someone could verify the executable by setting up an exact duplicate of the build environment and re-compiling from source. I don’t remember how much I looked into that possibility, and whether it was infeasible or just inconvenient. (Might have been the former; I seem to recall the linker randomizing some addresses in the binary.) I do know that I never documented a process to recreate the executable and nobody asked.
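For what it's worth, the core of such a reproducibility check is simple; here is a minimal sketch (the build command and file names are placeholders, not the actual project's), which only gives a useful answer if the toolchain is deterministic:

```python
import hashlib
import subprocess

def sha256_of(path: str) -> str:
    """Return the SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder build command; the real toolchain and targets would differ.
subprocess.run(["make", "module.dll"], check=True)

# Compare the freshly built binary against the officially distributed one.
# Any non-determinism in the build (e.g., a linker that randomizes
# addresses) makes the digests differ even when the source is identical.
if sha256_of("module.dll") == sha256_of("official/module.dll"):
    print("binaries match: executable reproduces from source")
else:
    print("binaries differ: can't confirm the executable from source alone")
```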
It’s not clear to me why human vs. AIs would make war more likely to occur than in the human vs. human case, if by assumption the main difference here is that one side is more rational.
We have more empirical evidence that we can look at when it comes to human-human wars, making it easier to have well-calibrated beliefs about chances of winning. When it comes to human-AI wars, we’re more likely to have wildly irrational beliefs.
This is just one reason war could occur, though. Perhaps a more likely reason is that there won't be a way to maintain the peace that both sides can be convinced will work, and that is sufficiently cheap that its cost doesn't eat up all of the gains from avoiding war. For example, how would the human faction know that if it agrees to peace, the AI faction won't fully dispossess the humans at some future date when it's even more powerful? Even if AIs are able to come up with some workable mechanisms, how would the humans know that they're not just a trick?
Without credible assurances (which seem hard to come by), I think that if humans do agree to peace, the most likely outcome is that they get dispossessed in the not-too-distant future, either gradually (for example, getting scammed/persuaded/blackmailed/stolen from in various ways) or all at once. I think society as a whole won't have a strong incentive to protect humans because they'll be almost pure consumers (not producing much relative to what they consume), and such classes of people have often been killed or dispossessed in human history (e.g., landlords after communist takeovers).
I don’t think this follows. Humans presumably also had empathy in e.g. 1500, back when war was more common, so how could it explain our current relative peace?
I mainly mean that without empathy/altruism, we’d probably have even more wars, both now and then.
To the extent that changing human nature explains our current relatively peaceful era, this position seems to require that you believe human nature is fundamentally quite plastic and can be warped over time pretty easily due to cultural changes.
Well, yes, I’m also pretty scared of this. See this post where I talked about something similar. I guess overall I’m still inclined to push for a future where “AI alignment” and “human safety” are both solved, instead of settling for one in which neither is (which I’m tempted to summarize your position as, but I’m not sure if I’m being fair).
What are some failure modes of such an agency for Paul and others to look out for? (I shared one anecdote with him about how a NIST standard for "crypto modules" made my open-source cryptography library less secure: it contained a requirement whose side effect was that the library could only be certified as standard-compliant if it was distributed in executable form, forcing people to trust me not to have inserted a backdoor into the binary, and NIST didn't budge when we tried to get an exception to this requirement.)
I’ve looked into the game theory of war literature a bit, and my impression is that economists are still pretty confused about war. As you mention, the simplest model predicts that rational agents should prefer negotiated settlements to war, and it seems unsettled what actually causes wars among humans. (People have proposed more complex models incorporating more elements of reality, but AFAIK there isn’t a consensus as to which model gives the best explanation of why wars occur.) I think it makes sense to be aware of this literature and its ideas, but there’s not a strong argument for deferring to it over one’s own ideas or intuitions.
My own thinking is that war between AIs and humans could happen in many ways. One simple (easy-to-understand) way is that agents will generally refuse a settlement worse than what they think they could obtain on their own (by going to war), so human irrationality could cause a war if, e.g., the AI faction thinks it will win with 99% probability while humans think they could win with 50% probability, and each side therefore demands more of the lightcone (or resources in general) than the other side is willing to grant.
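As a toy illustration of that arithmetic (my own sketch of a simple bargaining-model-of-war setup, with made-up numbers): each side will accept a peaceful share no smaller than its own expected payoff from fighting, so mutually optimistic win estimates can leave no split that satisfies both.

```python
def bargaining_range_exists(p_a: float, p_b: float, cost: float) -> bool:
    """Check whether a negotiated settlement can satisfy both sides.

    p_a: side A's own estimate of its probability of winning a war
    p_b: side B's own estimate of its probability of winning a war
    cost: fraction of the total prize destroyed by fighting

    Each side will accept a peaceful share no smaller than its expected
    payoff from war, i.e. (win probability) * (1 - cost). A settlement
    is feasible only if those minimum demands fit inside the whole pie.
    """
    min_demand_a = p_a * (1 - cost)
    min_demand_b = p_b * (1 - cost)
    return min_demand_a + min_demand_b <= 1.0

# Consistent beliefs: minimum demands of 0.891 + 0.009 leave room for a deal.
print(bargaining_range_exists(p_a=0.99, p_b=0.01, cost=0.1))  # True

# Mutual optimism (the example above): 0.891 + 0.45 > 1, so no split
# satisfies both sides and war becomes the "rational" outcome.
print(bargaining_range_exists(p_a=0.99, p_b=0.50, cost=0.1))  # False
```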
To take this one step further, I would say that given that many deviations from the simplest game theoretic model do predict war, war among consequentialist agents may well be the default in some sense. Also, given that humans often do (or did) go to war with each other, our shared values (i.e. the extent to which we do have empathy/altruism for others) must contribute to the current relative peace in some way.
I followed the instructions here.
I was curious why given Will’s own moral uncertainty (in this interview he mentioned having only 3% credence in utilitarianism) he wasn’t concerned about SBF’s high confidence in utilitarianism, but didn’t hear the topic addressed. Maybe @William_MacAskill could comment on it here?
One guess is that apparently many young people in EA are “gung ho” on utilitarianism (mentioned by Spencer in this episode), so perhaps Will just thought that SBF isn’t unusual in that regard? One lesson could be that such youthful over-enthusiasm is more dangerous than it seems, and EA should do more to warn people about the dangers of too much moral certainty and overconfidence in general.
Made a transcript with Microsoft Word.
Some suggestions for you to consider:
Target a different (non-EA) audience.
Do not say anything or cite any data that could be interpreted or misinterpreted as racist (keeping in mind that some people will be highly motivated to interpret them in this way).
Tailor your message to what you can say/cite. For example, perhaps frame the cause as one of pure justice/fairness (as opposed to consequentialist altruism), e.g., it’s simply unfair that some people can not afford genetic enhancement while others can. (Added: But please think this through carefully to prevent undesirable side effects, e.g., making some people want to ban genetic enhancement altogether.)
You may need to start a new identity in order to successfully do the above.
Then I think for practical decision-making purposes we should apply a heavy discount to world A) — in that world, what everyone else would eventually want isn’t all that close to what I would eventually want. Moreover what me-of-tomorrow would eventually want probably isn’t all that close to what me-of-today would eventually want. So it’s much much less likely that the world we end up with even if we save it is close to the ideal one by my lights. Moreover, even though these worlds possibly differ significantly, I don’t feel like from my present position I have that much reason to be opinionated between them; it’s unclear that I’d greatly prefer imperfect worlds according to the extrapolated volition of some future-me, relative to the imperfect worlds according to the extrapolated volition of someone else I think is pretty reasonable.
You seem to be assuming that people’s extrapolated views in world A will be completely uncorrelated with their current views/culture/background, which seems a strange assumption to make.
People’s extrapolated views could be (in part) selfish or partial, which is an additional reason that extrapolated views of you at different times may be closer than that of strangers.
People’s extrapolated views not converging doesn’t directly imply “it’s much much less likely that the world we end up with even if we save it is close to the ideal one by my lights” because everyone could still get close to what they want through trade/compromise, or you (and/or others with extrapolated views similar to yours) could end up controlling most of the future by winning the relevant competitions.
It’s not clear that applying a heavy discount to world A makes sense, regardless of the above, because we’re dealing with “logical risk” which seems tricky in terms of decision theory.
Thanks, lots of interesting articles in this list that I missed despite my interest in this area.
One suggestion I have is to add some studies of failed attempts at building/reforming institutions, otherwise one might get a skewed view of the topic. (Unfortunately I don’t have specific readings to suggest.)
A related topic you don't mention here (maybe due to a lack of writings on it?) is whether humanity should pause AI development and have a long (or even short!) reflection about what it wants to do next, e.g., resume AI development or do something else, like subsidizing intelligence enhancement (e.g., embryo selection) for everyone who wants it, so that more people can meaningfully participate in deciding the fate of our world. (I note that many topics on this reading list are impossible for most humans to fully understand, perhaps even with AI assistance.)
I claim that this area outscores regular AI safety on importance while being significantly more neglected
This neglect is itself perhaps one of the most important puzzles of our time. With AGI very plausibly just a few years away, why aren’t more people throwing money or time/effort at this cluster of problems just out of self interest? Why isn’t there more intellectual/academic interest in these topics, many of which seem so intrinsically interesting to me?
We have to make judgment calls about how to structure our reflection strategy. Making those judgment calls already gets us in the business of forming convictions. So, if we are qualified to do that (in “pre-reflection mode,” setting up our reflection procedure), why can’t we also form other convictions similarly early?
I’m very confused/uncertain about many philosophical topics that seem highly relevant to morality/axiology, such as the nature of consciousness and whether there is such a thing as “measure” or “reality fluid” (and if so what is it based on). How can it be right or safe to form moral convictions under such confusion/uncertainty?
It seems quite plausible that in the future I’ll have access to intelligence-enhancing technologies that will enable me to think of many new moral/philosophical arguments and counterarguments, and/or to better understand existing ones. I’m reluctant to form any convictions until that happens (or the hope of it ever happening becomes very low).
Also I’m not sure how I would form object-level moral convictions even if I wanted to. No matter what I decide today, why wouldn’t I change my mind if I later hear a persuasive argument against it? The only thing I can think of is to hard-code something to prevent my mind being changed about a specific idea, or to prevent me from hearing or thinking arguments against a specific idea, but that seems like a dangerous hack that could mess up my entire belief system.
Therefore, it seems reasonable/defensible to think of oneself as better positioned to form convictions about object-level morality (in places where we deem it safe enough).
Do you have any candidates for where you deem it safe enough to form object-level moral convictions?
I put the full report here so you don’t have to wait for them to email it to you.
Anyone with thoughts on what went wrong with EA’s involvement in OpenAI? It’s probably too late to apply any lessons to OpenAI itself, but maybe not too late elsewhere (e.g., Anthropic)?
While drafting this post, I wrote down and then deleted an example of "avoiding/deflecting questions about risk." The person I asked such a question is probably already trying to push their organization to take risks more seriously, and probably had their own political considerations for not answering, so I don't want to single them out for criticism. I also don't want to damage my relationship with this person, or make them want to engage less with me or people like me in the future.
Trying to enforce good risk management via social rewards/punishments might be pretty difficult for reasons like these.
My main altruistic endeavor involves thinking and writing about ideas that seem important and neglected. Here is a list of the specific risks that I'm trying to manage/mitigate in the course of doing this. What other risks am I overlooking or not paying enough attention to, and what additional mitigations should I be doing?
Risk: Being wrong or overconfident, distracting people or harming the world with bad ideas.
Mitigation: Think twice about my ideas/arguments. Look for counterarguments/risks/downsides. Try to maintain appropriate uncertainties and convey them in my writings.
Risk: The idea isn't bad, but some people take it too seriously or too far.
Mitigation: Convey my uncertainties. Monitor subsequent discussions and try to argue against people taking my ideas too seriously or too far.
Risk: Causing differential intellectual progress in an undesirable direction, e.g., speeding up AI capabilities relative to AI safety, or spreading ideas that are more useful for doing harm than doing good.
Mitigation: Check ideas/topics for this risk. Self-censor ideas or switch research topics if the risk seems high.
Risk: Being first to talk about some idea, but not developing/pursuing it as vigorously as someone else might if they were first, thereby causing a net delay in intellectual or social progress.
Mitigation: Not sure what to do about this one. So far not doing anything except to think about it.
Risk: PR/political risks, e.g., talking about something that damages my reputation or relationships, and in the worst case harms people/causes/ideas associated with me.
Mitigation: Keep this in mind and talk more diplomatically or self-censor when appropriate.
@Will Aldred I forgot to mention that I do have the same concern about "safety by eating marginal probability" on AI philosophical competence as on AI alignment, namely that progress on solving problems lower on the difficulty scale might fool people into a false sense of security. Concretely, today AIs are so philosophically incompetent that (almost) nobody trusts them to do philosophy, but if they seemingly got better without actually improving enough relative to appearances, a lot more people might trust them, and it could be hard to convince those people not to.
Thanks for the comment. I agree that what you describe is a hard part of the overall problem. I have a partial plan, which is to solve (probably using analytic methods) metaphilosophy for both analytic and non-analytic philosophy, and then use that knowledge to determine what to do next. I mean today the debate between the two philosophical traditions is pretty hopeless, since nobody even understands what people are really doing when they do analytic or non-analytic philosophy. Maybe the situation will improve automatically when metaphilosophy has been solved, or at least we’ll have a better knowledge base for deciding what to do next.
If we can’t solve metaphilosophy in time though (before AI takeoff), I’m not sure what the solution is. I guess AI developers use their taste in philosophy to determine how to filter the dataset, and everyone else hopes for the best?
Just talking more about this problem would be a start. It would attract more attention and potentially resources to the topic, and make people who are trying to solve it feel more appreciated and less lonely. I’m just constantly confused why I’m the only person who frequently talks about it in public, given how obvious and serious the problem seems to me. It was more understandable before ChatGPT put AI on everyone’s radar, but now it’s just totally baffling. And I appreciate you writing this comment. My posts on the topic usually get voted up, but with few supporting comments, making me unsure who actually agrees with me that this is an important problem to work on.
If you’re a grant maker, you can decide to fund research in this area, and make some public statements to that effect.
It might be useful to think in terms of an "AI philosophical competence difficulty scale", similar to Sammy Martin's AI alignment difficulty scale and "safety by eating marginal probability". I tend to focus on the higher end of that scale, where we need to achieve a good explicit understanding of metaphilosophy, because I think solving that problem is the only way to reduce risk to a minimum, and it also fits my inclination for philosophical problems. But someone more oriented towards ML research could look for problems elsewhere on the difficulty scale, for example fine-tuning an LLM to do better philosophical reasoning, to see how far that can go. Another idea is to fine-tune an LLM for pure persuasion, and see if that can be used to create an AI that deemphasizes persuasion techniques that don't constitute valid reasoning (by subtracting the differences in model weights somehow; see the rough sketch after this list).
Some professional philosopher(s) may actually be starting a new org to do research in this area, so watch out for that news and check how you can contribute. Again providing funding will probably be an option.
Think about social aspects of the problem. What would it take for most people or politicians to take the AI philosophical competence problem seriously? Or AI lab leaders? What can be done if they never do?
Think about how to evaluate (purported) progress in the area. Are there clever ways to make benchmarks that can motivate people to work on the problem (and not be easily Goodharted against)?
Just to reemphasize: talk more about the problem, or prod your favorite philosopher or AI safety person to talk more about it. Again, it's totally baffling the degree to which nobody talks about this. I don't think I've even once heard a professional philosopher publicly express concern that AI might be relatively incompetent at philosophy, even as some opine freely on other aspects of AI. There are certainly obstacles to people working on the problem, like your reasons 1-3, but for now the bottleneck could just as well be the lack of social proof that the problem is worth working on.
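On the weight-subtraction idea mentioned above, here is a minimal sketch (my own, not something I've seen implemented) of what it might look like, in the spirit of "task arithmetic": the checkpoint names and the scaling factor alpha are assumptions, and whether any of this would actually de-emphasize invalid persuasion is an open empirical question.

```python
import torch

def subtract_task_vector(base_weights, persuasion_weights, alpha=1.0):
    """Sketch of the 'subtract the weight difference' idea: take the
    difference between a persuasion-fine-tuned model and its base model
    (a 'task vector') and subtract a scaled copy of it from the base,
    hoping to de-emphasize persuasion that isn't valid reasoning.

    All arguments and the return value are state dicts (name -> tensor).
    alpha is a made-up scaling knob, not a value anyone has validated.
    """
    edited = {}
    for name, w_base in base_weights.items():
        task_vector = persuasion_weights[name] - w_base
        edited[name] = w_base - alpha * task_vector
    return edited

# Hypothetical checkpoint files, assumed to contain plain state dicts.
base = torch.load("base_model.pt")
persuader = torch.load("persuasion_finetuned.pt")
edited = subtract_task_vector(base, persuader, alpha=0.5)
torch.save(edited, "deemphasized_persuasion.pt")
```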
Why only “slightly” more control? It’s surprising to see you say this without giving any reasons or linking to some arguments, as this degree of alignment difficulty seems like a very unusual position that I’ve never seen anyone argue for before.