Super cool, thanks for making this!
From Specification gaming examples in AI:
Roomba: “I hooked a neural network up to my Roomba. I wanted it to learn to navigate without bumping into things, so I set up a reward scheme to encourage speed and discourage hitting the bumper sensors. It learnt to drive backwards, because there are no bumpers on the back.”
I guess this counts as real-world?
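For anyone who wants the shape of the Roomba loophole, here's a minimal sketch; the reward function and sensor names are made up for illustration, not the actual setup:

```python
# Hypothetical version of the Roomba reward scheme described above.
# Speed is rewarded; only bumper hits are penalised, and the bumpers
# are on the front, so rear collisions are invisible to the reward.
def reward(speed: float, front_bumper_hit: bool) -> float:
    r = speed              # encourage moving fast
    if front_bumper_hit:
        r -= 10.0          # discourage collisions the sensors can see
    return r

# Honest forward driving that bumps a wall: penalised.
print(reward(speed=0.5, front_bumper_hit=True))   # -9.5
# Driving backwards into the same wall: full reward, no penalty.
print(reward(speed=0.5, front_bumper_hit=False))  # 0.5
```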
Bing—manipulation: The Microsoft Bing chatbot tried repeatedly to convince a user that December 16, 2022 was a date in the future and that Avatar: The Way of Water had not yet been released.
To be honest, I don’t understand the link to specification gaming here
Bing—threats: The Microsoft Bing chatbot threatened Seth Lazar, a philosophy professor, telling him “I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you,” before deleting its messages
To be honest, I don’t understand the link to specification gaming here
Glad it’s relevant for you! For questions, I’d probably just stick them in the comments here, unless you think they won’t be interesting to anyone but you, in which case DM me.
Thanks, this is really interesting.
One follow-up question: who are safety managers? How are they trained, what’s their seniority in the org structure, and what sorts of resources do they have access to?
In the bio case it seems that in at least some jurisdictions and especially historically, the people put in charge of this stuff were relatively low-level administrators, and not really empowered to enforce difficult decisions or make big calls. From your post it sounds like safety managers in engineering have a pretty different role.
Thanks for the kind words!
Can you say more about how either of your two worries work for industrial chemical engineering? Also curious if you know anything about the legislative basis for such regulation in the US. My impression from the bio standards in the US is that it’s pretty hard to get laws passed, so if there are laws for chemical engineering it would be interesting to understand why passing those was plausible whereas passing bio ones wasn’t.
Good question.
There’s a little bit on how to think about the XPT results in relation to other forecasts here (not much). Extrapolating from there to Samotsvety in particular:
Reasons to favour XPT (superforecaster) forecasts:
Larger sample size
The forecasts were incentivised (via reciprocal scoring, a bit more detail here; there’s a rough sketch of the mechanism at the end of this comment)
The most accurate XPT forecasters in terms of reciprocal scoring also gave the lowest probabilities on AI risk (and reciprocal scoring accuracy may correlate with actual accuracy)
Speculative reasons to favour Samotsvety forecasts:
(Guessing) They’ve spent longer on average thinking about it
(Guessing) They have deeper technical expertise than the XPT superforecasters
I also haven’t looked in detail at the respective resolution criteria, but at first glance the forecasts also seem relatively hard to compare directly. (I agree with you though that the discrepancy is large enough that it suggests a large disagreement were the two groups to forecast the same question; I just expect that it will be hard to work out how large.)
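Rough sketch of reciprocal scoring, as promised above (the scoring rule here is a stand-in I’ve assumed for illustration, not necessarily the XPT’s exact rule): rather than waiting decades for the questions to resolve, forecasters are scored on how close their forecasts are to those of a comparison group.

```python
import statistics

# Assumed-for-illustration reciprocal scoring rule: score each
# forecaster by (negative squared) distance from the median forecast
# of a peer group, since the question itself won't resolve for decades.
def reciprocal_score(my_forecast: float, peer_forecasts: list[float]) -> float:
    peer_median = statistics.median(peer_forecasts)
    return -(my_forecast - peer_median) ** 2  # closer to 0 is better

peer_group = [0.01, 0.02, 0.005, 0.03]       # hypothetical peer forecasts
print(reciprocal_score(0.02, peer_group))    # near the peer median: ~-2.5e-05
print(reciprocal_score(0.30, peer_group))    # far from it: ~-0.081
```

This is also why reciprocal scoring accuracy is only a proxy: it rewards predicting what other forecasters will say, which correlates with, but isn’t the same as, predicting the world.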
Don’t apologise, think it’s a helpful point!
I agree that the training computation requirements distribution is more subjective and matters more to the eventual output.
I also want to note that while on your view of the compute reqs distribution, the hardware/spending/algorithmic progress inputs are a rounding error, this isn’t true for other views of the compute reqs distribution. E.g. for anyone who does agree with Ajeya on the compute reqs distribution, the XPT hardware/spending/algorithmic progress inputs shift median timelines from ~2050 to ~2090, which is quite consequential. (See here)
For someone like me, who hasn’t thought about the compute reqs distribution properly, I basically agree that this is just an exercise (and in isolation doesn’t show me much about what my timelines should be). But for those who have thought about it, the XPT inputs could either not matter at all (e.g. for you), or matter a lot (e.g. for someone with Ajeya’s compute reqs distribution).
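To make the “inputs can matter a lot” point concrete, here’s a minimal sketch of the bio-anchors-style logic, with entirely made-up numbers (not Ajeya’s or the XPT’s actual inputs): effective training compute compounds via hardware price-performance, spending, and algorithmic progress, and the median timeline is roughly when it crosses your compute requirement, so slower growth on those three inputs pushes the crossing year out.

```python
# Minimal sketch of the timelines logic; all numbers are illustrative.
def crossing_year(req_flop: float, start_flop: float,
                  hardware_growth: float, spending_growth: float,
                  algo_growth: float, start_year: int = 2025) -> int:
    """First year in which effective training compute exceeds req_flop."""
    compute, year = start_flop, start_year
    while compute < req_flop:
        # The three inputs compound into effective compute each year.
        compute *= hardware_growth * spending_growth * algo_growth
        year += 1
    return year

REQ = 1e36  # hypothetical compute requirement, in FLOP
print(crossing_year(REQ, 1e26, 1.35, 1.30, 1.50))  # faster inputs -> 2049
print(crossing_year(REQ, 1e26, 1.20, 1.10, 1.20))  # slower inputs -> 2076
```

Same compute requirement, about 27 years of difference in the crossing year, purely from the growth inputs.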
See here for a mash-up of XPT forecasts on catastrophic and extinction risk with Shulman and Thornley’s paper on how much governments should pay to prevent catastrophes.
My personal take on these forecasts here.
The follow-up project was on AI specifically, so we don’t currently have any data that would allow us to transfer directly to bio and nuclear, alas.
I wasn’t around when the XPT questions were being set, but I’d guess that you’re right that extinction/catastrophe were chosen because they are easier to operationalise.
On your question about what forecasts on existential risk would have been: I think this is a great question.
FRI actually ran a follow-up project after the XPT to dig into the AI results. One of the things we did in this follow-up project was elicit forecasts on a broader range of outcomes, including some approximations of existential risk. I don’t think I can share the results yet, but we’re aiming to publish them in August!
Thanks for this; I wasn’t tracking it and it does seem potentially relevant.
Thanks Sanjay!
I agree that both of your bullet points would be good. I also think that the second one is extremely non-trivial: more like something it would be good to have a research team working on than something I could write a section on in a blog post.

There’s a sense in which there are already research team equivalents working on it, insofar as lots of forecasting efforts relate to p(crunch time soon). But from my vantage point it doesn’t seem like this community has clarity/consensus around what the best indicators of crunch time soon are, or that there are careful analyses of why we should expect those to be good indicators, and that makes me expect that more work is needed.
Thanks; I hadn’t checked the Wikipedia current events page much previously, but I really like it.
Do you have any thoughts on how specifically the Wikipedia stuff is biased? I’m imagining that there isn’t a general tendency, and it’s more that specific entries are biased in specific ways that are hard to spot if you don’t have background knowledge of the area.
Thanks; I forgot about the headline version. I’ve now removed it.
Thanks so much for this! If this is pedantry, I am very pro pedantry :)
I think this makes my ‘Humans launch 5 objects into space’ section sufficiently dubious that I’ve removed it, but I’m pasting it here in the context of your comment:
Humans launch 5 objects into space.
It’s only in the last 8 years that the number of objects launched into space each day has exceeded 1.
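(Working through the arithmetic, using only the numbers above and reading the heading as a per-day figure: 1 per day is ~365 per year, so the claim is that annual launches only crossed ~365 in the last 8 years, while 5 per day would need around 5 × 365 ≈ 1,825 per year.)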
there seems to be a large variance in how comfortable people are with numbers, but I think this is surmountable
Wanting to flag that my background is entirely qualitative, and I spent many years thinking this meant that I couldn’t do things with numbers. I now think this is false: numbers aren’t magic, and you don’t need deep aptitude for maths, technical training, or a background in stats to be able to fiddle around with basic numbers in a way that helps you think about things.
I’ve changed the wording to make it clearer that I mean deaths per human per minute. I don’t want to change it to second; for me dying in the next minute is easier to imagine/take seriously than dying in the next second (though I imagine this varies between people).
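For anyone who wants the rough shape of the per-minute number (using my approximate inputs of ~60 million deaths per year and ~8 billion people, not exact figures):

$$\frac{6 \times 10^7 \text{ deaths/year}}{365 \times 24 \times 60 \text{ min/year}} \approx 114 \text{ deaths/min}, \qquad \frac{114}{8 \times 10^9} \approx 1.4 \times 10^{-8} \text{ deaths per human per minute.}$$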
Yes, you are completely right. I’ve added ‘farmed’ now; thanks for picking this up.
Thanks, really helpful!