I’m an AI Program Officer at Longview Philanthropy, though all views I express here are my own.
Before Longview I was a researcher at Open Philanthropy and a Charity Entrepreneurship incubatee.
Somewhat meta point on epistemic modesty, calling it out here because it is a pattern that has deeply frustrated me about EA/rationalism for as long as I have known them:
(making a quick take rather than commenting due to an app.operation_not_allowed error—I’m responding to @Linch’s quick take on war crimes)
I guess these are just EA/rationalist norms, but an approach that glosses major positions as so quickly dismissible strikes me as insufficiently epistemically modest. I would expect such a treatment to fail to properly consider alternative answers or intuitions to the author’s own, especially the strongest versions of those answers (e.g. modern just war positions); to miss the most sophisticated counterpoints (e.g. your ‘oldest and clearest form’ gambit may just be bracketing out the counterexamples that don’t fit your definition, like genocide or sexual violence); and to reinvent the wheel, e.g. the view seems to be exactly this, from 2013:
“A final rationale for the perfidy prohibition is to preserve the possibility of a return to peace. To prevent the degradation of trust and the bad faith between warring parties that would impede negotiation of peace terms. An effective perfidy prohibition preserves the good faith upon which ceasefires, armistices and conclusions of hostilities rely.”
I think deep engagement with the range of serious views on the topic is required to make your post “the best modern articulation of these ancient ideas”. I don’t think the quick take seems on a good track for that.
I think Henry’s skeptical that the AI safety community made a counterfactual difference in getting interpretability started earlier or growing faster, not questioning interpretability’s prospects for reducing x-risk.
I think there’s a good case for AI safety having a pretty good counterfactual effect on a bunch of productive areas, but obviously that depends on a lot of details and there’s plenty of room for debate.
I think a stronger line of critique could be that early-mid AI safety efforts/thinking made the frontier race start earlier, go faster, and be more intense (e.g. roles in getting key frontier leaders obsessed, introducing Deepmind cofounders, boosting OpenAI’s founding, etc). I haven’t interrogated that history enough to know where to come down, but it’s a plausible way that the whole of AI safety has been net-negative. (This claim doesn’t really detract from the future impact of AI safety, though, if the cat’s out of the bag.)
I was excited by ForecastBench and FutureEval both projecting that LLMs would reach superforecaster parity by June 2027. But I didn’t realise access to human crowd forecasts might be driving a lot of performance. If it is, that is massively disappointing.
The top LLM performers in ForecastBench have access to the crowd forecast (and it’s not clear to me if FutureEval hides crowd forecasts—Metaculus did for the Quarterly Cup in 2025 but I couldn’t find info about FutureEval). Skimming the literature with Claude, it seems like most studies either deliberately provide crowd forecasts or don’t prevent searching for it, and those that hide it tend to have significantly worse results (still interesting, but less exciting).
To me, the potential wonder of LLM superforecasting is being able to get excellent guesses at any question I might come up with. If I need to already have a human crowd or market forecast for the guess to be any good, then the kind of LLM superforecasting being projected is about 10% as useful to me. I still expect ‘true’ parity eventually, but it becomes a story of general timelines rather than empirical projection.
I don’t know the field well, and I’m probably misunderstanding something. I’m posting this in the hope of finding out I’m wrong. If I’m right, then it’s worth dampening the expectations of anyone else who was imagining having an instant team of supers at their beck and call in ~14 months’ time.
“very obviously their direct experience with thinking and working with existing AIs would be worth > $1M pa if evaluated anonymously based on understanding SOTA AIs, and likely >$10s M pa if they worked on capabilities.”
“Y&S endorsing some effort as good would likely have something between billions $ to tens of billions $ value.”
fwiw both of these claims strike me as close to nonsense, so I don’t think this is a helpful reaction.
Edit: I didn’t see how old this post was! It came up on my feed somehow and I’d assumed it was recent.
Thanks for this. I’ve only skimmed the report, and don’t have expertise in the area. So the below is lightly held.
Section 4.2.3 talks about negative wellbeing effects. I think these are a serious downside risk, but other than noting that severe harms are indeed faced by guest workers, the report’s response is based on a single paper (Clemens 2018) and the idea that the intervention could improve things via surveys and ratings. Most of the benefits/harms considered in the rest of the report appear (on a skim) to be financial.
I think the risk of facilitating severe harms against individuals (participating in the ‘repugnant transaction’) is very unsettling, and would be my main reason not to donate to such a charity. If I were a prospective donor I would want to see deeper exploration and red teaming of this worry.
I’d also note that this issue has characteristics that EA/AIM is likely to systematically underrate:
harms that are hard to quantify and compare against financial benefits, and
harms that may be wrong in a deontological sense to inflict, and not properly appreciated or respected by an all things considered cost-effectiveness analysis.
I thought this was great! Seems like a very accessible way to spread this info.
Two minor notes: when you perch you get a little stuck where you can’t move or unperch; I only got free by clicking socialize. And in the caged scenario, I shared the space with 2 other chickens, not 5-10 like the info says. Depicting other chickens in the surroundings of the cage, rather than the existing pattern, would also make it more intense.
If you or others were going to extend it, I’d imagine gamifying it might be interesting. E.g. you gain points by performing the natural behaviors, and points allow you to unlock the more elaborate natural behaviors. And then having some mechanic where you have to choose the deleterious behaviors (attacking other chickens, pulling own feathers). Maybe some stress bar that increases as a function of the space you have. Performing natural behaviors brings the stress down, and this is manageable in the kinder scenarios. But in cages, the stress increases rapidly and the natural behaviors aren’t available, so you’re forced to do the deleterious ones.
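To make that mechanic concrete, here’s a rough sketch of how the stress/points loop could work. Everything here (the Scenario/Chicken names, the numbers, the behavior lists) is invented for illustration and isn’t based on the existing simulation:

```python
# Hypothetical sketch of the stress mechanic described above; names and
# numbers are illustrative only, not taken from the existing simulation.
from dataclasses import dataclass, field

@dataclass
class Scenario:
    name: str
    space_m2: float                                          # usable space per chicken
    natural_behaviors: list = field(default_factory=list)    # outlets available here

@dataclass
class Chicken:
    stress: float = 0.0
    points: int = 0

    def tick(self, scenario: Scenario) -> None:
        # Stress accumulates faster the less space there is.
        self.stress += 1.0 / max(scenario.space_m2, 0.05)

    def act(self, behavior: str, scenario: Scenario) -> None:
        if behavior in scenario.natural_behaviors:
            # Natural behaviors relieve stress and earn points toward unlocks.
            self.stress = max(0.0, self.stress - 2.0)
            self.points += 1
        else:
            # With no natural outlet available, only deleterious behaviors
            # remain: no points, and only partial relief.
            self.stress = max(0.0, self.stress - 0.5)

# In a free-range scenario stress stays manageable; in a cage it climbs
# until the deleterious behaviors are the only option left.
free_range = Scenario("free range", space_m2=4.0,
                      natural_behaviors=["dust bathe", "perch", "forage"])
cage = Scenario("battery cage", space_m2=0.05, natural_behaviors=[])
```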
The main benefit of doing this gamification would be to increase the chance people get interested, or that some streamer gives it a go.
Nice work!
Thanks for this. Post-hoc theorizing:
‘doing good better’ calls to mind comparisons to the reader’s typical ideas about doing good. It implicitly criticizes those examples, which is a negative experience for the reader and could cause defensiveness.
‘Do the most good’ makes the reader attempt to imagine what that could be, which is a positive and interesting question, and doesn’t immediately challenge the reader’s typical ideas about doing good.
It wouldn’t have been obvious to me before the fact whether the above stuff would be outweighed by worries about reactions to ‘the most good’ or what have you, so I appreciate you gathering empirical evidence here.
“Given leadership literature is rife with stories of rejected individuals going on to become great leaders”
The selection effect can be very misleading here — in that literature you usually don’t hear from all the individuals who were selected and failed, nor those who were rejected correctly and would have failed, and so on. Lots of advice from the start-up/business sector is super sus for this exact reason.
I would wait for METR’s actual evaluation — ’30 hours’ is just based on claims of continued effort, not actual successful performance on carefully measured tasks.
I think it’s possible to gain the efficiency of using LLM assistance without sacrificing style/tone — it just requires taste and more careful prompting/context, which seems worth it for a job ad. Maybe it works for their intended audience, but puts me off.
What can I read to understand the current and near-term state of drone warfare, especially (semi-)autonomous systems?
I’m looking for an overview of the developments in recent years, and what near-term systems are looking like. I’ve been enjoying Paul Scharre’s ‘Army of None’, but given it was published in 2018 it’s well behind the curve. Thanks!
I don’t know. My guess is that they give very slim odds to the Trump admin caring about carbon neutrality, and think the benefit of including a mention in their submission is close to zero (other than demonstrating resolve in their principles to others).
On the minus side, such a mention risks a reaction with significant cost to their AI safety/security asks. So overall, I can see them thinking that including a mention does not make sense for their strategy. I’m not endorsing that calculus, just conjecturing.
Object level aside, I suspect they’re aware their audience is the hypersensitive-to-appearances Trump admin, and framing things accordingly. Even basic, common sense points regarding climate change could have a significant cost to the doc’s reception.
One might distinguish de jure openness (“We let everyone in!”) from de facto openness (“We attract X subgroups, and repel Y!”). The homogeneity and narrowness of the recent conference might suggest the former approach has not been successful at achieving intellectual openness.
The point wasn’t to motivate intuitions on the broader issue, but to demonstrate that exclusionary beliefs could be a coherent concept. I agree your version is better for motivating broader intuitions.
Thanks. Given Alice has committed no crime, and everything else about her is ‘normal’, I think organizers would need to point to her belief to justify uninviting or banning her. That would suggest that an individual’s beliefs can (in at least one case) justify restricting their participation, on the basis of how that belief concerns other (prospective) attendees.
Yes I’m not saying anyone was—this is a thought experiment to see if exclusionary beliefs can be a coherent concept. We can stipulate that Alice has this sincere belief, but no history of such attacks (she’s never met a Bob), and hasn’t made any specific threats against Bob. It’s just a belief - a subjective attitude about the world. If Bob does not attend due to knowing about Alice’s belief, is that reasonable in your view?
I’m not trying to be unkind, and I apologise if I was. I’ll take this down if you ask here or via DM. I overreacted to what is a quick take because I think it was emblematic of a bad pattern—but that is unfair and disproportionate of me.
My main thing here is to push for better intermediate thinking. Like the standard EA/rat approach is so often based on dismissing mainstream or non-EA views, and then acting like their individual opinion is clearly superior, often reinventing current or past views that have had lots of non-EA examination. I want EA thinking to be better, and a lot of the time it would be improved by people reading more before opining, and not thinking the views of EA are so special.
We just have very different experiences then.
Do you mean critique someone on epistemic immodesty grounds? This is probably true but can you point me to the examples you have in mind? (I may indeed be doing this too much and seeing the examples would help)