I’m an AI Program Officer at Longview Philanthropy, though all views I express here are my own.
Before Longview I was a researcher at Open Philanthropy and a Charity Entrepreneurship incubatee.
I think there’s a good case that AI safety has had a pretty good counterfactual effect on a number of productive areas, but obviously that depends on a lot of details and there’s plenty of room for debate.
I think a stronger line of critique would be that early and mid-stage AI safety efforts/thinking made the frontier race start earlier, go faster, and become more intense (e.g. roles in getting key frontier leaders obsessed, introducing DeepMind cofounders, boosting OpenAI’s founding, etc.). I haven’t interrogated that history enough to know where to come down, but it’s a plausible way that the whole of AI safety has been net-negative. (This claim doesn’t really detract from the future impact of AI safety, though, if the cat’s out of the bag.)
I was excited by ForecastBench and FutureEval both projecting that LLMs would reach superforecaster parity by June 2027. But I didn’t realise access to human crowd forecasts might be driving a lot of performance. If it is, that is massively disappointing.
The top LLM performers in ForecastBench have access to the crowd forecast (and it’s not clear to me if FutureEval hides crowd forecasts—Metaculus did for the Quarterly Cup in 2025 but I couldn’t find info about FutureEval). Skimming the literature with Claude, it seems like most studies either deliberately provide crowd forecasts or don’t prevent searching for it, and those that hide it tend to have significantly worse results (still interesting, but less exciting).
To me, the potential wonder of LLM superforecasting is being able to get excellent guesses at any question I might come up with. If I need to already have a human crowd or market forecast for the guess to be any good, then the kind of LLM superforecasting being projected is about 10% as useful to me. I still expect ‘true’ parity eventually, but it becomes a story of general timelines rather than empirical projection.
I don’t know the field well, and I’m probably misunderstanding something. I’m posting this to find out if I’m wrong. If I’m right, then it’s worth dampening the expectations of anyone else who was imagining having an instant team of supers at their beck and call in ~14 months’ time.
“very obviously their direct experience with thinking and working with existing AIs would be worth > $1M pa if evaluated anonymously based on understanding SOTA AIs, and likely >$10s M pa if they worked on capabilities.”
“Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value.”
fwiw both of these claims strike me as close to nonsense, so I don’t think this is a helpful reaction.
Edit: I didn’t see how old this post was! It came up on my feed somehow and I’d assumed it was recent.
Thanks for this. I’ve only skimmed the report, and don’t have expertise in the area. So the below is lightly held.
Section 4.2.3 talks about negative wellbeing effects. I think these are a serious downside risk, but other than noting that severe harms are indeed faced by guest workers, the report’s response is based on a single paper (Clemens 2018) and the idea that the intervention could improve things via surveys and ratings. Most of the benefits/harms considered in the rest of the report are financial (it appears, on a skim).
I think the risk of facilitating severe harms against individuals (participating in the ‘repugnant transaction’) is very unsettling, and would be my main reason not to donate to such a charity. If I were a prospective donor I would want to see deeper exploration and red teaming of this worry.
I’d also note that this issue has characteristics that EA/AIM is likely to systematically underrate:
harms that are hard to quantify and compare against financial benefits, and
harms that may be wrong in a deontological sense to inflict, and not properly appreciated or respected by an all things considered cost-effectiveness analysis.
I thought this was great! Seems like a very accessible way to spread this info.
Two minor notes—when you perch you get a little stuck where you can’t move or unperch; I only got free by clicking ‘socialize’. And in the caged scenario, I shared the space with 2 other chickens, not 5–10 like the info says. Making the surroundings of the cage depict other chickens would also be more intense than the existing pattern.
If you or others were going to extend it, I imagine gamifying it could be interesting. E.g. you gain points by performing the natural behaviors, and points allow you to unlock the more elaborate natural behaviors. And then having some mechanic where you have to choose the deleterious behaviors (attacking other chickens, pulling own feathers). Maybe some stress bar that increases as a function of the space you have. Performing natural behaviors brings the stress down, and this is manageable in the kinder scenarios. But in cages, the stress increases rapidly and the natural behaviors aren’t available, so you’re forced to do the deleterious ones.
The main benefit of doing this gamification would be to increase the chance people get interested, or that some streamer gives it a go.
Nice work!
Thanks for this. Post-hoc theorizing:
‘doing good better’ calls to mind comparisons to the reader’s typical ideas about doing good. It implicitly criticizes those examples, which is a negative experience for the reader and could cause defensiveness.
‘Do the most good’ makes the reader attempt to imagine what that could be, which is a positive and interesting question, and doesn’t immediately challenge the reader’s typical ideas about doing good.
It wouldn’t have been obvious to me before the fact whether the above effects would be outweighed by worries about reactions to ‘the most good’ or what have you, so I appreciate you gathering empirical evidence here.
“Given leadership literature is rife with stories of rejected individuals going on to become great leaders”
The selection effect can be very misleading here — in that literature you usually don’t hear from all the individuals who were selected and failed, nor those who were rejected correctly and would have failed, and so on. Lots of advice from the start-up/business sector is super sus for this exact reason.
I would wait for METR’s actual evaluation — ’30 hours’ is just based on claims of continued effort, not actual successful performance on carefully measured tasks.
I think it’s possible to gain the efficiency of using LLM assistance without sacrificing style/tone — it just requires taste and more careful prompting/context, which seems worth it for a job ad. Maybe it works for their intended audience, but puts me off.
What can I read to understand the current and near-term state of drone warfare, especially (semi-)autonomous systems?
I’m looking for an overview of the developments in recent years, and what near-term systems are looking like. I’ve been enjoying Paul Scharre’s ‘Army of None’, but given it was published in 2018 it’s well behind the curve. Thanks!
I don’t know. My guess is that they give very slim odds to the Trump admin caring about carbon neutrality, and think the benefit of including a mention in their submission is close to zero (other than demonstrating resolve in their principles to others).
On the minus side, such a mention risks a reaction with significant cost to their AI safety/security asks. So overall, I can see them thinking that including a mention does not make sense for their strategy. I’m not endorsing that calculus, just conjecturing.
Object-level aside, I suspect they’re aware their audience is the hypersensitive-to-appearances Trump admin, and framing things accordingly. Even basic, common sense points regarding climate change could have a significant cost to the doc’s reception.
One might distinguish de jure openness (“We let everyone in!”) from de facto openness (“We attract X subgroups, and repel Y!”). The homogeneity and narrowness of the recent conference might suggest the former approach has not been successful at achieving intellectual openness.
The point wasn’t to motivate intuitions on the broader issue, but to demonstrate that exclusionary beliefs could be a coherent concept. I agree your version is better for motivating broader intuitions.
Thanks. Given Alice has committed no crime, and everything else about her is ‘normal’, I think organizers would need to point to her belief to justify uninviting or banning her. That would suggest that an individual’s beliefs can (in at least one case) justify restricting their participation, on the basis of how that belief concerns other (prospective) attendees.
Yes I’m not saying anyone was—this is a thought experiment to see if exclusionary beliefs can be a coherent concept. We can stipulate that Alice has this sincere belief, but no history of such attacks (she’s never met a Bob), and hasn’t made any specific threats against Bob. It’s just a belief - a subjective attitude about the world. If Bob does not attend due to knowing about Alice’s belief, is that reasonable in your view?
As a light thought experiment, what if Alice’s belief X was “People called Bob are secret evil aliens who I should always try to physically attack and maim if I get the opportunity” ?
Bob would understandably be put off by this belief, and have a pretty valid reason to not attend an event if he knew someone who believed it were present. Does it seem reasonable that Bob would ask that Alice (or people who hold the attack-secret-alien-Bobs belief) not be invited as speakers? Is that a heckler’s veto, and contrary to free expression and intellectual enquiry? Is Bob’s decision not to attend just a matter of his own feelings?
If answers to the above questions are ‘no’, it suggests it’s possible for a belief to be an ‘exclusionary belief’, on your terms.
That’s reasonable. My point is that it’s much less clear, and open to contestation, whether Hanania’s article says the opposite of what the headline claims; but given the example is ~retracted anyway, my point is not important.
I think Henry’s skeptical that the AI safety community made a counterfactual difference in getting interpretability started earlier or growing faster, not questioning interpretability’s prospects for reducing x-risk.