My impression is that the strategic upshots of this are directionally correct, but maybe not a huge deal? I’m not sure if you agree with that.
Sorry, I didn’t mean mislabelled in terms of having the labels the wrong way around. I meant that the points you describe aren’t necessarily the ends of the spectrum—for instance, worse than just losing all alignment knowledge is losing all the alignment knowledge while keeping all of the knowledge about how to build highly effective AI.
At least that’s what I had in mind at the time of writing my comment. I’m now wondering if it would actually be better to keep the capabilities knowledge, because it makes it easier to do meaningful alignment work as you do the rerun. It’s plausible that this is actually more important than the more explicitly “alignment” knowledge. (Assuming that compute will be the bottleneck.)
You’re discussing catastrophes that are big enough to set the world back by at least 100 years. But I’m wondering if a smaller threshold might be appropriate. Setting the world back by even 10 years could be enough to mean re-running a lot of the time of perils; and we might think that catastrophes of that magnitude are more likely. (This is my current view.)
With the smaller setbacks you probably have to get more granular in terms of asking “in precisely which ways is this setting us back?”, rather than just analysing it in the abstract. But that can just be faced.
Why do you think alignment gets solved before reasonably good global governance? It feels to me pretty up in the air which target we should be aiming to hit first. (Hitting either would help us with the other. I do think that we likely want to get important use out of AI systems before we establish good global governance; but that we might want to then do the governance thing to establish enough slack to take the potentially harder parts of the alignment challenge slowly.)
On section 4, where you ask about retaining alignment knowledge:
It feels kind of like you’re mislabelling the ends of the spectrum?
My guess is that rather than asking “how much alignment knowledge is lost?”, you should be asking about the differential between how much AI knowledge is lost and how much alignment knowledge is lost.
I’m not sure that’s quite right either, but it feels a little bit closer?
For much of the article, you talk about post-AGI catastrophe. But when you first introduce the idea in section 2.1, you say:
the period from now until we reach robust existential security (say, stable aligned superintelligence plus reasonably good global governance)
It seems to me like this is a much higher bar than reaching AGI—and one for which the arguments that we could still be exposed to subsequent catastrophes seem much weaker. Did you mean to just say AGI here?
Human Dignity: a review
Yeah roughly the thought is “assuming concentrated power, it matters what the key powerful actors will do” (the liberal democracy comment was an aside saying that I think we should be conditioning on concentrated power).
And then for making educated guesses about what the key powerful actors will do, it seems especially important to me what their attitudes will be at a meta-level: how they prefer to work out what to do, etc.
I might have thought that some of the most important factors would be things like:
How likely is leadership to pursue intelligence enhancement, given technological opportunity?
How likely is leadership to pursue wisdom enhancement, given technological opportunity?
(Roughly because: either power is broadly distributed, in which case your comments about liberal democracy don’t seem to have so much bite; or it’s not, in which case it’s really the values of leadership that matter.) But I’m not sure you really touch on these. Interested if you have thoughts.
Thanks AJ!
My impression is that although your essay frames this as a deep disagreement, in fact you’re reacting to something that we’re not saying. I basically agree with the heart of the content here—that there are serious failure modes to be scared of if attempting to orient to the long term, and that something like loop-preservation is (along with the various more prosaic welfare goods we discussed) essential for the health of even a strict longtermist society.
However, I think that what we wrote may have been compatible with the view that you have such a negative reaction to, and at minimum I wish that we’d spent some more words exploring this kind of dynamic. So I appreciate your response.
That makes sense!
(I’m curious how much you’ve invested in giving them detailed prompts about what information to assess in applying particular tags, or even more structured workflows, vs just taking smart models and seeing if they can one-shot it; but I don’t really need to know any of this.)
If you want independent criteria-based judgements, it might realistically be a good option to have the judgements made by an LLM—with the benefit of having the classification instantly (as a bonus you could publish the prompt used, so the judgements would be easier for people to audit).
Ok, thanks; I think it’s fair to call me on this (I realise the question of what Thiel actually thinks is not super interesting to me, compared to “does this critique contain inspiration for things to be aware of that I wasn’t previously really tracking”; but I get that most people probably aren’t orienting similarly, and I was kind of assuming that they were when I suggested this was why it was getting sympathy).
I do think, though, that there’s a more nuanced point here than “trying too hard to do good can result in harm”. It’s more like “over-claiming about how to do good can result in harm”. For a caricature to make the point cleanly: suppose EA really just promoted bednets, and basically told everyone that what it meant to be good was to give more money to bednets. I think it’s easy to see how this, if it gained a lot of memetic influence (bednet cults; big bednet, etc.), could end up being destructive (even if bednets are great).
I think that EA is at least conceivably vulnerable to more subtle versions of the same mistake. And that that is worth being vigilant against. (Note this is only really a mistake that comes up for ideas that are so self-recommending that they lead to something like strategic movement-building around the ideas.)
I think that the theology is largely a distraction from the reason this is attracting sympathy, which I’d guess to be more like:
If you have some ideas which are pretty good, or even very good, but they present as though they’re the answer needed for everything, and they’re not, that could be quite destructive (and potentially very-net-bad, even while the ideas were originally obviously-good)
This is at least a plausible failure mode for EA, and correspondingly worth some attention/wariness
This kind of concern hasn’t gotten much airtime before (and is perhaps easier to express and understand as a serious possibility with some of the language-that-I-interpret-metaphorically).
Embedded Altruism [slides]
is that you feel that moral statements are not as evidently subjective as, say, ‘Vanilla ice-cream is the best flavor’, but not as objective as, say, ‘An electron has a negative charge’, as living in some space of in-betweenness with respect to those two extremes
I think that’s roughly right. I think that they are unlikely to be more objective than “blue is a more natural concept than grue”, but that there’s a good chance that they’re about the same as that (and my gut take is that that’s pretty far towards the electron end of the spectrum; but perhaps I’m confused).
I’d say again, an electron doesn’t care for what a human or any other creature thinks about its electric charge.
Yeah, but I think that e.g. facts about economics are in some sense contingent on the thinking of people, but are not contingent on what particular people think, and I think that something similar could be true of morality.
I, on the contrary, don’t feel like there could be ‘moral experts’
The cleanest example I might give is that if I had a message from my near-future self saying “hey I’ve thought really hard about this issue and I really think X is right, sorry I don’t have time to unpack all of that”, I’d be pretty inclined to defer. I wonder if you feel differently?
I don’t think that moral philosophers in our society are necessarily hitting the bar I would like for “moral expert”. I also don’t think that people who are genuinely experts in morality would necessarily act according to moral values. (I’m not sure that these points are very important.)
See my response to Manuel—I don’t think this is “proving moral realism”, but I do think it would be pointing at something deeper and closer-to-objective than “happen to have the same opinions”.
I’m not sure what exactly “true” means here.
Here are some senses in which it would make morality feel “more objective” rather than “more subjective”:
I can have the experience of having a view, and then hearing an argument, and updating. My stance towards my previous view then feels more like “oh, I was mistaken” (like if I’d made a mathematical error) rather than “oh, my view changed” (like getting myself to like the taste of avocado when I didn’t use to).
There can exist “moral experts”, whom we would want to consult on matters of morality. Broadly, we should expect our future views to update towards those of smart careful thinkers who’ve engaged with the questions a lot.
It’s possible that the norms various civilizations converge on represent something like “the optimal(/efficient?/robust?) way for society to self-organize”.
I don’t think this is exactly “independent of human or alien minds”, but it also very much doesn’t feel “purely subjective”.
I don’t really believe there’s anything more deeply metaphysical than that going on with morality[1], but I do think that there’s a lot that’s important in the above bullets, and that moral realist positions often feel vibewise “more correct” than antirealist positions (in terms of what they imply for real-world actions), even though the antirealist positions feel technically “more correct”.
[1] I guess: there’s also some possibility of getting more convergence for acausal reasons rather than just evolution towards efficiency. I do think this is real, but it mostly feels like a distraction here so I’ll ignore it.
Locally, I think that often there will be some cluster of less controversial common values like “caring about the flourishing of society” which can be used to derive something like locally-objective conclusions about moral questions (like whether X is wrong).
Globally, an operationalization of morality being objective might be something like “among civilizations of evolved beings in the multiverse, there’s a decently big attractor state of moral norms that a lot of the civilizations eventually converge on”.
There is a surprising amount of normative judgment in here for a fact check. Are you looking just for disagreements that people held roughly the beliefs you later outline (I think you overstate things but are directionally correct in describing how beliefs differed from yours), or also disagreements about whether they were bad beliefs?
For flavour: as I ask that question, I’m particularly (but not only) thinking of the reports you cite, where you seem to be casting them as “OP really throwing its weight behind these beliefs”, and I perceived them more as “earnest attempts by people at OP to figure out what was legit, and put their reasoning in public to let others engage”. I certainly didn’t just agree with them at the time, but I thought it was a good step forwards for collective epistemics to be able to have conversations at that level of granularity. Was it confounding that they were working at a big funder? Yeah, kinda—but that seemed second order compared to it just being great that anyone at all was pushing the conversation forwards in this way, even if there were a bunch of aspects of them I wasn’t on board with. I’m not sure if this is the kind of disagreement you’re looking for. (Maybe it’s just that I was on board with more of them than you were, and so I saw them as flawed-but-helpful rather than unhelpful? Then we get to the general question of what standards “bad” should be judged by, given our lack of access to ground truth.)