Now I want to see how much I like honey-drenched fat
David Johnston
Longtermism and shorttermism can disagree on nuclear war to stop advanced AI
I have children, and I would precommit to enduring the pain without hesitation, but I don’t know what I would do in the middle of experiencing the pain. If pain is sufficiently intense, “I” am not in charge any more, and I don’t know very well how whatever part of me is in charge at that point would act.
I have the complete opposite intuition: equal levels of pain are harder to endure for equal time if you have the option to make them stop. Obviously I don’t disagree that pain for a long time is worse than pain for a short time.
This intuition is driven by experiences like: the same level of exercise fatigue is a lot easier to endure if giving up would cause me to lose face. In general, exercise fatigue is more distracting than pain from injuries (my reference points are a broken finger and a cup of boiling water spilled in my crotch; the latter was about as distractingly painful as a whole bunch of not especially notable bike races).
Thinking a bit more: the boiling water actually was more intense for a few seconds, but after that it was comparable to bike racing. But also, all I wanted to do was run around shouting obscenities, and given that I was doing exactly that, I don’t recall the sense of being in conflict with myself, which is one of the things I find hard to deal with about pain.
I don’t know that this scales to very intense pain. The only pain experience I’ve had notable enough to recall years later was when I ran 70km without having done very much running to train for it. It hurt a lot, and I don’t have any involuntary pain experiences that compare to it (running plus lack of preparation was important here; I’ve done 400km bike rides with no especially notable pain). This was voluntary in the sense that I could have stopped and called someone to pick me up, but that would have disqualified my team.
One prediction I’d make is that holding my hand in an ice bucket with only myself for company would be much harder than doing it with other people where I’d be ashamed to be the first to pull it out. I don’t just mean I’d act differently—I mean I think I would actually experience substantially less psychological tension.
Conditional on AGI being developed by 2070, what is the probability that humanity will suffer an existential catastrophe due to loss of control over an AGI system?
Requesting a few clarifications:
I think of existential catastrophes as things like near-term extinction rather than things like “the future is substantially worse than it could have been”. Alternatively, I tend to think that existential catastrophe means a future that’s much worse than technological stagnation, rather than one that’s much worse than it would have been with more aligned AI. What do you think?
Are we considering “loss of control over an AGI system” as a loss of control over a somewhat monolithic thing with a well-defined control interface, or is losing control over an ecosystem of AGIs also of interest here?
I think journalists are often imprecise and I wouldn’t read too much into the particular synonym of “said” that was chosen.
How much can we learn from other people’s guesses?
Does it make more sense to think about all probability distributions that offer a probability of 50% for rain tomorrow? If we say this set represents our epistemic state, then we’re saying something like “the probability of rain tomorrow is 50%, and we withhold judgement about rain on any other day”.
I think this question—whether it’s better to take 1/n probabilities (or maximum entropy distributions or whatever) or to adopt some “deep uncertainty” strategy—does not have an obvious answer.
Perhaps I’m just unclear on what it would even mean to be in a situation where you “can’t” put a probability estimate on things that does as well as or better than pure 1/n ignorance.
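To make the contrast concrete (my own illustration, not anything from the discussion above): the “withhold judgement” representation is the whole set of distributions

$$\mathcal{P}_{50} = \{\, P : P(\text{rain tomorrow}) = 0.5 \,\},$$

while the 1/n or maximum entropy move picks a single representative of that set, e.g. the member that assigns 50% to rain tomorrow and is uniform (maximally noncommittal) about every other day. The question above is essentially whether acting on that single representative can ever do worse than reasoning with the whole set $\mathcal{P}_{50}$.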
Suppose you think you might come up with new hypotheses in the future which will cause you to reevaluate how the existing evidence supports your current hypotheses. In this case, probabilistically modelling the phenomenon doesn’t necessarily get you the right “value of further investigation” (because you’re not modelling the hypothesis you haven’t thought of yet), but you might still be well advised to hold off acting and investigate further. Collecting more data might even be what leads you to think of the new hypothesis, producing a “non-Bayesian update”. That said, I think you could separately estimate the probability of a revision of this type; I sketch what I mean a little further down.
Similarly, you might discover a new outcome that’s important that you’d previously neglected to include in your models.
One more thing: because probability is difficult to work with, even if it is in principle compatible with adaptive plans, in practice it might tend to steer people away from them.
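To sketch the earlier point about unmodelled hypotheses a little more formally (this is my own toy formulation; the symbols are not from anything above): the quantity you actually care about is something like

$$V_{\text{total}} \approx V_{\text{Bayes}} + q \cdot G,$$

where $V_{\text{Bayes}}$ is the expected value of further investigation computed within your current hypothesis set, $q$ is your separately estimated probability that investigating prompts a hypothesis you haven’t yet conceived of, and $G$ is a rough guess at the value of the resulting revision. The first term is what a standard value-of-information calculation gives you; the second is the part that calculation misses, which is why it can be worth holding off and collecting more data even when the within-model calculation says otherwise.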
Fair enough: she mentioned Yudkowsky before making this claim, and I had him in mind when evaluating it (incidentally, I wouldn’t mind picking a better name for the group of people who do a lot of advocacy about AI X-risk, if you have any suggestions).
I skimmed from 37:00 to the end. It wasn’t anything groundbreaking. There was one incorrect claim (“AI safetyists encourage work at AGI companies”); I think her apparent moral framework, which puts disproportionate weight on negative impacts on marginalised groups, is not good; and overall she comes across as someone who has just begun thinking about AGI x-risk and so seems a bit naive on some issues. However, “bad on purpose to make you click” is very unfair.
But also: she says that hyping AGI encourages races to build AGI. I think this is true! Large language models at today’s level of capability—or even somewhat higher than this—are clearly not a “winner takes all” game; it’s easy to switch to a different model that suits your needs better and I expect the most widely used systems to be the ones that work the best for what people want them to do. While it makes sense that companies will compete to bring better products to market faster, it would be unusual to call this activity an “arms race”. Talking about arms races makes more sense if you expect that AI systems of the future will offer advantages much more decisive than typical “first mover” advantages, and this expectation is driven by somewhat speculative AGI discourse.
She also questions whether AI safetyists should be trusted to improve the circumstances of everyone vs their own (perhaps idiosyncratic) priorities. I think this is also a legitimate concern! MIRI were at some point apparently aiming to 1) build an AGI and 2) use this AGI to stop anyone else building an AGI (Section A, point 6). If they were successful, that would put them in a position of extraordinary power. Are they well qualified to do that? I’m doubtful (though I don’t worry about it too much because I don’t think they’ll succeed).
I think it’s quite sensible that people hoping to have a positive impact in biosecurity should become well-informed first. However, I don’t think this means that radical positions that would ban a lot of research are necessarily wrong, even if they are more often supported by people with less detailed knowledge of the field. I’m not accusing you of saying this; I just want to separate the two issues.
Many professionals in this space are scared and stressed. Adding to that isn’t necessarily building trust and needed allies. The professionals in this space are good people – no reputable virologist is trying to do research that intentionally releases or contributes to a pandemic. Biosafety professionals spend their life working to prevent lab leaks. If I’m being honest, many professionals in and around the biosecurity field don’t think incredibly highly of recent (the past few years) journalistic efforts and calls for total research bans.
Many people calling for complete bans think that scientists are unreliable on this (because they want to continue to do their work, and may not be experts in risk), and the fact that said scientists do not like calls for complete bans doesn’t establish that anyone making such a call is wrong to do so.
As a case in point regarding the unreliability of involved scientists: your reference number 6 repeatedly states that there is “no evidence for a laboratory origin of SARS-CoV-2”, while citing arguments around the location of initial cases and the phylogeny of SARS-CoV-2 as evidence for zoonotic emergence. However, a survey of BSL-3 facilities in China found that 53% of the associated coronavirus-related Nature publications between 2017 and 2019 were produced by Wuhan-based labs, and it is extremely implausible that Wuhan bears 50% of the risk for novel zoonotic virus emergence in all of China! (It’s possible that the authors of that survey erred; they do seem ideologically committed to the lab leak theory.) Furthermore, I have to the best of my ability evaluated arguments about the presence of the furin cleavage site in the SARS-CoV-2 genome, and my conclusion is that it is around 5 times as likely to be present in the lab origin scenario (accounting for the fact that the WIV is an author on a proposal to insert such sites into SARS-like coronaviruses; also, I consider anything from 1.1 to 20 times as likely to be a plausible estimate). One can debate the relative strength of different pieces of evidence, and many have, but the claim that there is evidence on one side and none on the other is not plausible in my view, and I at least don’t trust that anyone making such a claim is able to capably adjudicate questions about the risks of certain kinds of pathogen research.
(not that it’s especially relevant, but I currently think the case for zoonosis is slightly stronger than the case for a lab leak; I just don’t think you can credibly claim that there’s no evidence that supports the lab leak theory)
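To spell out what a Bayes factor in that range actually does to the lab leak question (illustrative arithmetic only; the prior odds here are made up for the sake of the example, not a claim from the comment above): in odds form,

$$\frac{P(\text{lab}\mid E)}{P(\text{zoonosis}\mid E)} = \frac{P(E\mid \text{lab})}{P(E\mid \text{zoonosis})} \times \frac{P(\text{lab})}{P(\text{zoonosis})}.$$

With prior odds of, say, 1:10 against a lab origin, a Bayes factor of 5 from the furin cleavage site moves you to 5:10, i.e. about 33%; the 1.1 and 20 ends of the range move you to roughly 10% and 67% respectively. The point is just that a modest Bayes factor is very different from “no evidence”.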
A little bit of proof reading:
“stating confidentiality” should be “confidently”
“You don’t likely don’t know more than professionals” should be “You likely don’t know”
I do worry about it. Some additional worries I have are: 1) if AI is transformative and confers strong first mover advantages, then a private company leading the AGI race could quickly become as powerful as a totalitarian government, and 2) if the owners of AI depend far less on support from people for their power than today’s powerful organisations do, they might be generally less benevolent than those organisations.
I think they do? Nate at least says he’s optimistic about finding a solution given more time.
I’m not sold on how well calibrated their predictions of catastrophe are, but I think they have contributed a large number of novel & important ideas to the field.
The main point I took from the video was that Abigail is kinda asking the question: “How can a movement that wants to change the world be so apolitical?” This is also a criticism I have of many EA structures and people.
I think it’s surprising that EA is so apolitical, but I’m not convinced it’s wrong to make some effort to avoid issues that are politically hot. Three reasons to avoid such things: 1) they’re often not the areas where the most impact can be had, even ignoring the constraints imposed by them being hot political topics; 2) being hot political topics makes it even harder to make significant progress on these issues; and 3) if EAs routinely took strong stands on such things, I’m confident it would lead to significant fragmentation of the community.
EA does take some political stances, although they’re often not on standard hot topics: they’re strongly in favour of animal rights and animal welfare, and were involved in lobbying for a very substantial piece of legislation recently introduced in Europe. Also, a reasonable number of EAs are becoming substantially more “political” on the question of how quickly the frontier of AI capabilities should be advanced.
Is the reason you don’t go back and forth about whether ELK will work in the narrow sense Paul is aiming for a) that you’re seeking areas of disagreement, and you both agree it is difficult, or b) that you both agree it is likely to work in that sense?
My intuition for why “actions that have effects in the real world” might promote deception is that maybe the “no causation without manipulation” idea is roughly correct. In this case, a self-supervised learner won’t develop the right kind of model of its training process, but the fine-tuned learner might.
I think “no causation without manipulation” must be substantially wrong. If it were entirely correct, I think one would have to say that pretraining ought not to help achieve high performance on a standard RLHF objective, which is obviously false. It still seems plausible to me that a) the self-supervised learner learns a lot about the world it’s predicting, including a lot of “causal” stuff, and b) there are still some gaps in its model regarding its own role in this world, which can be filled in with the right kind of fine-tuning.
Maybe this falls apart if I try to make it all more precise—these are initial thoughts, not the outcomes of trying to build a clear theory of the situation.
When I read your transcripts and Rob is interviewing, I like to read Rob’s questions at twice the speed of the interviewees’ responses. Can you accommodate that with your audio version?