LessWrong dev & admin as of July 5th, 2022.
RobertM
I think that works for many groups, and many subfields/related causes, but not for “effective altruism”.
To unpack this a bit, I think that “AI safety” or “animal welfare” movements could quite possibly get much bigger much more quickly than an “effective altruism” movement that is “commitment to using reason and evidence to do the most good we can”.
I agree! That’s why I’m surprised by the initial claim in the article, which seems to be saying that we’re more likely to be a smaller group if we become ideologically committed to certain object-level conclusions, and a larger group if we instead stay focused on having good epistemics and seeing where that takes us. It seems like the two should be flipped?
Importantly, switching to Signal has no comparable costs. The only cost I can think of is that the UX (User Experience) might be slightly better for Facebook Messenger than Signal.
Have you conducted a user survey to this effect? I personally find Signal’s UX to be substantially worse than Messenger’s (for the relevant use-cases), and strongly expect that most people who’ve used both would have similar feelings.
I also think this significantly overstates the potential risk reduction: incautious users are the ones least likely to switch to Signal, so the gains are mostly limited to users who are already more careful by nature.
Yet 80k and EAG (through the job fairs) actively recruit for non-safety roles at OpenAI and DeepMind, and there are lots of EAs who have worked at them, and if anything they're looked upon more favorably for doing so.
I think this is no longer true (at least for the 80k jobs board), as of a couple months(?) ago. The OpenAI roles are all security/abuse focused, and the DeepMind roles are all alignment/security focused.
I can’t speak to what teams were recruiting at the most recent EAG but would be curious to hear from someone who has that info.
I don’t know if there’s any explicit documentation, but if you can read code you can check out the relevant logic here. (From digging into how Posts.getParameters works, it looks like you can use any parameter that’s used as a term in the default Posts view, but I haven’t actually tried it and can’t guarantee that it works.) It is a bit gnarly, though.

Karma minimums are already supported, but karma maximums don’t seem to be—the default Posts view constructs a selector that assumes you’re looking for posts with karma greater than or equal to whatever’s passed in, when you’d want the reverse operation.
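For concreteness, here’s a rough sketch of the kind of selector construction I mean. The names used (`karmaThreshold`, `maxKarma`, `baseScore`) are illustrative stand-ins rather than the actual ForumMagnum code, so treat this as a gloss on the behavior, not a reference:

```typescript
// Illustrative sketch only: names and shapes are guesses at the general pattern,
// not the real ForumMagnum implementation.
interface PostsViewTerms {
  karmaThreshold?: number; // supported today, interpreted as a minimum
  maxKarma?: number;       // hypothetical term a karma-maximum filter would need
}

// Builds a Mongo-style selector clause for karma.
function buildKarmaSelector(terms: PostsViewTerms): Record<string, unknown> {
  const karmaClause: Record<string, number> = {};
  if (terms.karmaThreshold !== undefined) {
    // Existing behavior: "greater than or equal to" the passed-in value.
    karmaClause["$gte"] = terms.karmaThreshold;
  }
  if (terms.maxKarma !== undefined) {
    // The missing piece: the reverse comparison, for a karma maximum.
    karmaClause["$lte"] = terms.maxKarma;
  }
  return Object.keys(karmaClause).length > 0 ? { baseScore: karmaClause } : {};
}

// Example: posts with karma between 10 and 100 inclusive.
// buildKarmaSelector({ karmaThreshold: 10, maxKarma: 100 })
// => { baseScore: { $gte: 10, $lte: 100 } }
```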
Your expected life hours lost become (remaining life hours) * P(nuke in your location | nuke in Ukraine), if Ukraine is hit and you choose to stay in your location afterwards. While the multiplier does depend on P(nuke in Ukraine), P(nuke in your location | nuke in Ukraine) is still more important, since your location is what determines whether it swings you over the decision threshold or not.
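To make the comparison explicit, here is the same calculation written out (a sketch; the relocation cost term C is my own placeholder, not something specified in the thread):

```latex
% Expected life hours lost from staying, conditional on a nuke in Ukraine,
% as stated above:
\[
\mathbb{E}[\text{hours lost} \mid \text{stay}]
  = H_{\text{remaining}} \cdot P(\text{nuke in your location} \mid \text{nuke in Ukraine})
\]
% Staying only makes sense if that expected loss stays below the
% (hypothetical) fixed cost C of relocating, measured in life hours:
\[
H_{\text{remaining}} \cdot P(\text{nuke in your location} \mid \text{nuke in Ukraine}) < C
\]
```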
Yes, it does rely on that simplified assumption. I think I’m unlikely to get more than 1 additional bit of information via further warnings after a nuke in Ukraine (if that), so staying doesn’t seem worth the risk, but if you think you get legible warning signs >84% of the time (or whatever 1 - P(nuke in Ukraine) is), then it seems worth waiting.
ETA: to clarify, my general position is that while I’m open to the possibility that there’ll be further signals which convey more bits of information about which world you’re in than the initial “nuke in Ukraine” signal, I expect those extra bits won’t do me much good because in most of those worlds events will move fast enough that I won’t be able to usefully respond. If you have a lot of weight on “escalation, if any, will be slow”, then your calculation will look different.
If an alignment-minded person is currently doing capabilities work under the assumption that they’d be replaced by an equally (or more) capable researcher less concerned about alignment, I think that’s badly mistaken. The number of people actually pushing the frontier forward is not all that large. Researchers at that level are not fungible; the differences between the first-best and second-best available candidates for roles like that are often quite large. The framing of an arms race is mistaken; the prize for “winning” is that you die sooner. Dying later is better. If you’re in a position like that I’d be happy to talk to you, or arrange for you to talk to another member of the Lightcone team.
Absent evidence, I do not significantly credit the possibility that Google (or an equivalent) will try to make life difficult for people who successfully convince the marginal capabilities researcher to switch tracks. I agree that historical examples of vaguely similar things exist, but the ones I’m familiar with don’t seem analogous, and we do in fact have fairly strong evidence about the kinds of antics that various megacorps get up to, which seem to be strongly predicted by their internal culture.
I think it requires either a disagreement in definitions, or very pessimistic views about how tractable certain scientific problems will prove to be, to think that the “transformative” bit will take long enough to impact the discount rate by more than a few percent (total). But yes, it will be non-zero.
Let’s taboo the word “care”. I expect the average longtermist thinks that deaths from famines and floods are about as bad as the average non-longtermist EA. Problems do not become “less bad” simply because other problems exist.
Having different priorities, stemming from different beliefs about e.g. what things matter and how effectively we can address them, is orthogonal to relative evaluations of how bad any individual problem is.
I agree that the definitional point would be uninteresting, except that I think the commonsense interpretation bundles a bunch of connotations which are wrong (and negative). In context, people receiving this message will have systematically incorrect beliefs about longtermism and those who use it as a framework for prioritization. This is plainly obvious if you e.g. go read pretty much any Twitter thread where people who are hearing about it for the first time (or were otherwise introduced to it in an adversarial context) are debating the subject.
It’s not clear to me that the variance of “being a technical researcher” is actually lower than “being a social coordinator”. Historically, quite a lot of capabilities advancements have come out of efforts that were initially intended to be alignment-focused.
Edited to add: I do think it’s probably harder to have a justified inside-view model of whether one’s efforts are directionally positive or negative when attempting to “buy time”, as opposed to “doing technical research”, if one actually makes a real effort in both cases.
There’s obviously substantial disagreement here, but consider the most recent salient example (and arguably the entire surrounding context of OpenAI as an institution).
“I don’t currently have much sympathy for someone who’s highly confident that AI takeover would or would not happen (that is, for anyone who thinks the odds of AI takeover … are under 10% or over 90%).”
I find this difficult to square with the fact that:
Absent highly specific victory conditions, the default (P = 1 - ε) outcome is takeover.
Of the three possibilities you list, interpretability seems like the only one that’s actually seen any traction, but:
there hasn’t actually been very much progress beyond toy problems
it’s not clear why we should expect it to generalize to future paradigms
we have no idea how to use any “interpretation” to actually get to a better endpoint
interpretability, by itself, is insufficient to avoid takeover, since you lose as soon as any other player in the game messes up even once
The other potential hopes you enumerate require people in the world to attempt to make a specific thing happen. For most of them, not only is practically nobody working on making any of those specific things happen, but many people are actively working in the opposite direction. In particular, with respect to the “Limited AI” hope, the leading AI labs are pushing quite hard on generality, rather than on narrow functionality. This has obviously paid off in terms of capability gains over “narrow” approaches. Being able to imagine a world where something else is happening does not tell us how to get to that world.
I can imagine having an “all things considered” estimation (integrating model uncertainty, other people’s predictions, etc) of under 90% for failure. But I don’t understand writing off the epistemic position of someone who has an “inside view” estimation of >90% failure, especially given the enormous variation of probability distributions that people have over timelines (which I agree are an important, though not overwhelming, factor when it comes to estimating chances of failure). Indeed, an “extreme” inside view estimation conditional on short timelines seems much less strange to me than a “moderate” one. (The only way a “moderate” estimation makes sense to me is if it’s downstream of predicting the odds of success for a specific research agenda, such as in John Wentworth’s The Plan − 2022 Update, and I’m not even sure one could justifiably give a specific research agenda 50% odds of success nearly a decade out as the person who came up with it, let alone anyone looking in from the outside.)
My guess is that he meant the sequences convey the kind of more foundational epistemology which helps people derive better models on subjects like AI Alignment by themselves, though all of the sequences in The Machine in the Ghost and Mere Goodness have direct object-level relevance.
Excepting Ngo’s AGI safety from first principles, I don’t especially like most of those resources as introductions exactly because they offer readers very little opportunity to test or build on their beliefs. Also, I think most of them are substantially wrong. (Concrete Problems in AI Safety seems fine, but is also skipping a lot of steps. I haven’t read Unsolved Problems in ML Safety.)
If your headline claim is that someone has a “fairly poor track record of correctness”, then I think “using a representative set of examples” to make your case is the bare-minimum necessary for that to be taken seriously, not an isolated demand for rigor.
It’s possible I’ve flipped the sign on what you’re saying, but if I haven’t, I’m pretty sure most EAs are not moral realists, so I don’t know where you got the impression that it’s an underlying assumption of any serious EA efforts.
If I did flip the sign, then I don’t think it’s true that moral realism is “too unquestioned”. At this point it might be more fair to say that too much time & ink has been spilled on what’s frankly a pretty trivial question that only sees as much engagement as it does because people get caught up in arguing about definitions of words (and, of course, because some other people are deeply confused).
There’s definitely no censorship of the topic on LessWrong. Obviously I don’t know for sure why discussion is sparse, but my guess is that people mostly (and, in my opinion, correctly) don’t think it’s a particularly interesting or fruitful topic to discuss on LessWrong, or that the degree to which it’s an interesting subject is significantly outweighed by mindkilling effects.
Edit: with respect to the rest of the comment, I disagree that rationalists are especially interested in object-level discussion of the subjects, but they probably are much more likely to disapprove of the idea that discussion of the subject should be verboten.

I think the framing where Bostrom’s apology is a subject which has to be deliberately ignored is mistaken. Your prior for whether something sees active discussion on LessWrong is that it doesn’t, because most things don’t, unless there’s a specific reason you’d expect it to be of interest to the users there. I admit I haven’t seen a compelling argument for there being a teachable moment here, except the obvious “don’t do something like that in the first place”, and perhaps “have a few people read over your apology with a critical eye before posting it” (assuming that didn’t in fact happen). I’m sure you could find a way to tie those in to the practice of rationality, but it’s a bit of a stretch.
Yes, I agree that there’s a non-trivial divide in attitude. I don’t think the difference in volume of discussion is surprising, though; a similar pattern showed up in the response to FTX. From a quick search and look at the tag, there were on the order of 10 top-level posts on the subject on LW. There are 151 posts under the FTX collapse tag on the EA forum, and possibly more untagged.
I don’t think that the top-level comment is particularly responsive to the post, except insofar as it might have taken the title as a call to action (and then ignored the rest of the post). It’s also quite vague. But I agree that a ban seems like an unusually harsh response, absent additional context which supports the “provocation” interpretation.
I’m aware that this is not exactly the central thrust of the piece, but I’d be interested if you could expand on why we might expect the former to be a smaller group than the latter.
I agree that a “commitment to using reason and evidence to do the most good we can” is a much better target to aim for than “dedicated to a particular set of conclusions about the world”. However, my sense is that historically there have been many large and rapidly growing groups of people that fit the second description, and not very many of the first. I think this was true for mechanistic reasons related to how humans work rather than being accidents of history, and think that recent technological advances may even have exaggerated the effects.