I’m currently researching forecasting and epistemics as part of the Quantified Uncertainty Research Institute.
Ozzie Gooen
I had my Claude system do some brainstorming work on this.
https://www.longtermwiki.com/knowledge-base/models/intervention-models/anthropic-pledge-enforcement/
It generated some more specific interventions here.
I’ve been experimenting recently with a longtermist wiki, written fully with LLMs.
Some key decisions/properties:
1. Fully LLM-generated, heavily relying on Claude Code.
2. Somewhat opinionated. It tries to represent something of a median longtermist/EA view, with a focus on the implications of AI. All pages are rated for “importance”.
3. Claude estimates a lot of percentages and letter grades for things. If you see a percentage or grade with no citation, it might well be a guess by Claude.
4. An emphasis on numeric estimates, models, and diagrams. I had it generate many models related to different topics; some are better than others. I might later take the best ones and convert them to Squiggle models or similar (see the rough sketch after this list for the kind of estimate I mean).
5. Still early & experimental. Right now this sits somewhere between an official wiki and a personal project. I expect things will become more stable over time; for now, expect pages to change locations, terminology to sometimes be inconsistent, etc.
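To give a sense of what I mean by numeric models in item 4, here's a minimal sketch in Python rather than Squiggle. All of the numbers are made up for illustration and aren't taken from the wiki:

```python
# Minimal sketch of a Squiggle-style numeric estimate, done with Monte Carlo sampling.
# The quantity and all parameters here are hypothetical, purely to illustrate the format.
import random

def sample_cost_per_page():
    base = random.lognormvariate(1.5, 0.5)        # uncertain baseline cost of a basic page ($)
    polish_multiplier = random.uniform(1.0, 5.0)  # extra effort to make it a nicer page
    return base * polish_multiplier

samples = sorted(sample_cost_per_page() for _ in range(10_000))
print(f"median ≈ ${samples[5_000]:.0f}, 90th percentile ≈ ${samples[9_000]:.0f}")
```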
I overall think this space is pretty exciting right now, but it definitely brings challenges and requires cleverness.
https://www.longtermwiki.com/
https://www.longtermwiki.com/knowledge-base/responses/epistemic-tools/tools/longterm-wiki/
Recently I’ve been working on some pages about Anthropic’s and the OpenAI Foundation’s potential for impact.
For example, see:
https://www.longtermwiki.com/knowledge-base/organizations/funders/anthropic-investors/
https://www.longtermwiki.com/knowledge-base/organizations/funders/openai-foundation/
There’s also a bunch of information on specific aspects of AI Safety, different EA organizations, and a lot more stuff.
It costs about $3-6 to add a basic page, and maybe $10-$30 to do a nicer page. I could easily picture wanting even higher-quality pages later on. I’m happy to accept requests to add pages for organizations/projects/topics/etc. that people here might be interested in!
Also looking for other kinds of feedback!
I should also flag that one way to use it is through another LLM. Like, ask your local language model to help go through the wiki content for you and summarize the parts of interest.
I plan to write a larger announcement of this on the Forum later.
Quickly:
1. I think there’s probably good work to be done here!
2. I think the link you meant to include was https://www.longtermwiki.com/knowledge-base/organizations/funders/giving-pledge/
3. To be clear, I’m not directly writing this wiki. I’m using Claude Code with a bunch of scripts and stuff to put it together. So I definitely recommend being a bit paranoid when it comes to specifics!
That said, I think it normally does a decent job (and I’m looking to improve it!). On the 36%: that seems to have come from this article, which has a bit more detail and basically reaffirms the point.
https://ips-dc.org/report-giving-pledge-at-15/
Also, to give Anthropic credit, I want to flag that a bunch of the employee donations are legally binding. Anthropic had a matching program which led to a good amount of money in Donor Advised Funds. https://www.longtermwiki.com/knowledge-base/organizations/funders/anthropic-investors/ (Note that this is also LLM-generated, so it’s meant as a rough guess.)
Deceased Pledger pledge fulfillment: We calculate pledge fulfillment for deceased Pledgers as the amount of a Pledger’s charitable giving (either during their lifetime or through bequests from their estate) divided by the sum of their final net worth plus their charitable giving.
22 U.S. Pledgers have died, including 14 of the original 2010 signers. These 22 people were worth a combined $43.4 billion when they died.
Only one of the 22 deceased Pledgers — Chuck Feeney — gave his entire $8 billion fortune away before he died.
8 of the 22 deceased Pledgers fulfilled their pledges, giving away 50 percent or more of their wealth at death, either while they were living or in their estates.
The remaining 13 deceased Pledgers gave away less than 50 percent of their wealth, either while they were living or in their estates — although some of their estates are still being resolved.
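To spell out that definition as a formula (my restatement, not wording from the report; the Feeney numbers below are rough):

$$\text{fulfillment} = \frac{\text{charitable giving (lifetime + bequests)}}{\text{final net worth} + \text{charitable giving}}$$

So Chuck Feeney, who gave away roughly his entire $8 billion fortune and died with approximately nothing left, scores about 8/(0+8) = 100%. A hypothetical Pledger who gave $1 billion and died worth $9 billion would score 1/(9+1) = 10%.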
A bit sad to find out that Open Philanthropy’s (now Coefficient Giving) GCR Cause Prioritization team is no more.
I heard it was removed/restructured mid-2025. Seems like most of the people were distributed to other parts of the org. I don’t think there were public announcements of this, though it is quite possible I missed something.
I imagine there must have been a bunch of other major changes around Coefficient that aren’t yet well understood externally. This caught me a bit off guard.
There don’t seem to be many active online artifacts about this team, but I found this hiring post from early 2024, and this previous AMA.
I’ve known and respected people on both sides of this, and have been frustrated by some of the back-and-forth on this.
On the side of the authors, I find these pieces interesting but very angsty. There’s clearly some bad blood here. It reminds me a lot of meat eaters who seem to attack vegans out of irritation more than deliberate logic. [1]
On the other, I’ve seen some attacks on this group on LessWrong that seemed over-the-top to me.
Sometimes grudges motivate authors to be incredibly productive, so maybe some of this can be useful.
Judging from the votes, it seems like others find these discussions useful, but as of now I find it difficult to take much from them.
[1] I think there are many reasonable meat eaters out there, but there are also many who are angry/irrational about it.
Interesting analysis!
One hypothesis: animal advocacy is a frequent “second favorite” cause area. Many longtermists prefer animal work to global health, but when it comes to their own donations and career choices, they choose longtermism. This resembles voting dynamics where some candidates do well in ranked-choice but poorly in first-past-the-post.
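To make the voting analogy concrete, here's a minimal sketch (hypothetical vote shares, not survey data) showing how a cause that is most people's second choice can lose under first-past-the-post but win under ranked-choice once lower-ranked options are eliminated and votes transfer:

```python
# Toy comparison of first-past-the-post vs. instant-runoff (ranked-choice) voting.
# Vote shares are made up purely to illustrate the "frequent second favorite" dynamic.
from collections import Counter

# Each ballot: (preference ordering over cause areas, % of voters with that ordering)
ballots = [
    (("longtermism", "animals", "global_health"), 40),
    (("animals", "global_health", "longtermism"), 32),
    (("global_health", "animals", "longtermism"), 28),
]

def first_past_the_post(ballots):
    tally = Counter()
    for ranking, share in ballots:
        tally[ranking[0]] += share  # only first choices count
    return tally.most_common(1)[0][0]

def instant_runoff(ballots):
    remaining = {c for ranking, _ in ballots for c in ranking}
    while len(remaining) > 1:
        tally = Counter({c: 0 for c in remaining})
        for ranking, share in ballots:
            top = next(c for c in ranking if c in remaining)  # highest surviving preference
            tally[top] += share
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()):  # majority reached
            return leader
        remaining.remove(min(tally, key=tally.get))  # eliminate the weakest option
    return remaining.pop()

print(first_past_the_post(ballots))  # longtermism wins on a 40% plurality
print(instant_runoff(ballots))       # animals wins after global_health's votes transfer
```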
Larks makes a good point—AI risk is also underfunded relative to survey preferences. The bigger anomaly is global health’s overallocation.
My very quick guess is that this is largely founder effects, i.e., GiveWell’s decade-long head start in building donor pipelines and mainstream legibility while focusing on global health.
I find this pretty exciting. Would love to see FAR-UVC become more popular, and I think this seems like a smart move to help do that. Thanks for organizing and financing!
I’m not a marketing expert, but naively these headlines don’t look great to me.
“Veganuary champion quits to run meat-eating campaign”
“Former Veganuary champion quits to run meat-eating campaign—saying vegan dogma is ‘damaging’ to goal of reducing animal suffering”
I’d naively expect most readers to just read the headlines, and basically assume, “I guess there’s more reasons why meat is fine to eat.”
I tried asking Claude (note that it does have my own custom system prompt, which might bias it) if this campaign seemed like a good idea in the first place, and it was pretty skeptical. I’m curious if the FarmKind team did, and what their/your prompt was for this.
I appreciate this write-up, but overall I feel pretty uncomfortable about this work. To me the issue was less that the team didn’t properly discuss things with other stakeholders, and more that the team did a risky and seemingly poor intervention.
Quick things:
1. There are some neat actions happening, but often they are behind-the-scenes. Politics tends to be secretive.
2. The work I know about mostly focuses on AI safety and biosafety. There’s some related work trying to limit authoritarianism in the US.
3. The funding landscape seems more challenging/complex than with other things.
I think I’d like to see more work on a wider scope of interventions to do good via politics. But I also appreciate that there are important limitations/challenges here now.
Good points!
>Would love to see something like this for charity ranking (if it isn’t already somewhere on the site).
I could definitely see this being done in the future.
>Don’t you need a philosophy axioms layer between outputs and outcomes?
I’m nervous that this can get overwhelming quickly. I like the idea of starting with things that are clearly decision-relevant to the particular audience the website has, then expanding from there. I’m open to ideas on better / more scalable approaches!
>”governance” being a subcomponent when it’s arguably more important/ can control literally everything else at the top level seems wrong.
Thanks! I’ll keep that in mind. I’d flag that this is an extremely high-level diagram, meant more to be broad and elegant than to flag which nodes are most important. Many critical things are “just subcomponents”. I’d like to make further diagrams on many of the different smaller nodes.
I made this simple high-level diagram of critical longtermist “root factors”, “ultimate scenarios”, and “ultimate outcomes”, focusing on the impact of AI during the TAI transition.
This involved some adjustments to standard longtermist language.
“Accident Risk” → “AI Takeover”
“Misuse Risk” → “Human-Caused Catastrophe”
“Systemic Risk” → This is split into a few modules, focusing on “Long-term Lock-in”, which I assume is the main threat.
You can read and interact with it here, where there are (AI-generated) descriptions and pages for things.
Curious to get any feedback!
I’d love it if there could eventually be one or a few well-accepted and high-quality assortments like this. Right now some of the common longtermist concepts seem fairly unorganized and messy to me.
---
Reservations:
This is an early draft. There are definitely parts I find inelegant. I’ve played with the final nodes instead being things like “Pre-transition Catastrophe Risk” and “Post-Transition Expected Value”, for instance. I didn’t include a node for “Pre-transition value”; I think it can be added later, but that would involve some complexity that didn’t seem worth it at this stage. The lines between nodes were mostly generated by Claude and could use more work.
This also heavily caters to the preferences and biases of the longtermist community, specifically some of the AI safety crowd.
Sure thing!
1. I plan to update it with new model releases. Some of this should be pretty easy—I plan to keep Sonnet up to date, and will keep an eye on other new models.
2. I plan to at least maintain it. I expect to spend maybe a third of this year on it. I’m looking forward to seeing what usage and the response are like, and will gauge things accordingly. I think it can be pretty useful as a tool, even without a full-time-equivalent improving it. (That said, if anyone wants to help fund us, that would make this much easier!)
3. I’ve definitely thought about this and can prioritize it. There’s a very high ceiling for how good background research can be, for either a post or for all claims/ideas in a post (much harder!). A simple version could be straightforward, though it wouldn’t be much better than just asking Claude to do a straightforward search.
Opinion Fuzzing: A Proposal for Reducing & Exploring Variance in LLM Judgments Via Sampling
I’m looking now at the Fact Check. It did verify most of the claims it investigated on your post as correct, but not all (almost no post passes every check, especially as the error rate is significant).
It seems like with chickens/shrimp it got a bit confused between the numbers killed and the numbers alive at any one time, or something like that.
In the case of ICAWs, it looks like it did a short search via Perplexity and didn’t find anything interesting. The official sources claim they don’t use aggressive tactics, but a smart agent would have realized it needed to search more. Getting this one right would have involved a few more searches—meaning increased costs. There’s definitely some tinkering/improvement to do here.
Thanks! I wouldn’t take its takes too seriously, as it has limited context and seems to make a bunch of mistakes. It’s more a thing to use to help flag potential issues (at this stage), knowing there’s a false positive rate.
Thanks for the feedback!
I did a quick look at this. I largely agree there were some incorrect checks.
It seems like these specific issues were mostly from the Fallacy Check? That one is definitely too aggressive (in addition to having limited context); I’ll work on tuning it down. Note that you can choose which evaluators to run on each post, so going forward you might want to just skip that one for now.
Sounds good, thanks! When you get a chance to try it, let me know if you have any feedback!
Announcing RoastMyPost: LLMs Eval Blog Posts and More
I can also vouch for this. I think that the bay area climate is pretty great. Personally I dislike weather above 80 degrees F, so the Bay is roughly in my ideal range.
I’ve lived in SF and Berkeley and haven’t found either to be particularly cloudy. I think that it really matters where you are in SF.
I don’t mean to sound too negative on this—I did just say “a bit sad” on that one specific point.
Do I think that Coefficient is doing worse or better overall? They’ve been making a bunch of changes, and I don’t feel like I have a good handle on the details. They’ve also been expanding a fair bit. I’d naively assume that a huge amount of work is going on behind the scenes to hire and grow, and that this is putting Coefficient in a better place on average.
I would expect this (the GCR prio team change) to be some evidence that specific ambitious approaches to GCR prioritization are more limited now. I think there are a bunch of large projects that could be done in this area that would probably take a team to do well, and right now it’s not clear who else could do such projects.
Bigger-picture, I personally think GCR prioritization/strategy is under-investigated, but I respect that others have different priorities.