I definitely sympathize, though I’d phrase things differently.
As I’ve noted before, I think much of the cause is simply that community incentives come largely from the funding. Right now, we only have a few funders, and those funders are much more focused on AI Safety specifics than they are on things like rationality/epistemics/morality. I think these people are generally convinced on specific AI Safety topics and unconvinced by a lot of the more exploratory / foundational work.
For example, this is fairly clear at OP. Their team focused on “EA” is formally called “GCR Capacity Building.” The obvious goal is to “get people into GCR jobs.”
You mention a frustration about 80k. But 80k is getting a huge amount of their funding from OP, so it makes sense to me that they’re doing the sorts of things that OP would like.
Personally, I’d like to see more donations come from community members, to be aimed at community things. I feel that the EA scene has really failed here, but I’m hopeful there could be changes.
I don’t mean to bash OP / SFF / others. I think they’re doing reasonable things given their worldviews, and overall I think their impact is very positive. I’m just pointing out that they represent about all of the main funding we have, and that they just aren’t focused on the EA things some community members care about.
Right now, I think that EA is in a very weak position. There just aren’t that many people willing to put in time or money to push forward the key EA programs and mission, other than using it as a way to get somewhat narrow GCR goals.
Or, in your terms, I think that almost no one is actually funding the “Soul” of EA, including the proverbial EA community.
I’d be interested to hear your disagreements with the Marxist-leaning influences. Could you give a few examples?
This is a very long + complex topic.
To me, much of it is a deeper issue. I see EA as coming from academic movements such as the Enlightenment, Empiricism, Analytic Philosophy, and Humanism, while I see Marxist-leaning clusters as drawing more on influences like Romanticism, Continental Philosophy, and Postmodernism. These are two clusters that have had a few-hundred-year argument with each other. I’m sure you can find more with further searches and LLM prompts.
This is the first post in a series on uniting these two movements. We are stronger together, and I hope to demonstrate that each movement contains immense power to help the other. I see myself as a radical feminist and an Effective Altruist and I view those identities as symbiotic rather than contradictory.
Quickly—I think that all smart and truth-seeking people have a lot to learn from each other. My quick impression is that the radical feminist academic community has a mix of good and bad work, as is true with similar movements. I personally admire some of the combination of radicalism and scholarship, but I have disagreements with a lot of the Marxist-leaning influences that seem prevalent.
At the same time, it’s not clear to me what it would even mean to “unite” exactly. I imagine both communities would feel anxious about some of this.
I was reading a blog post about EA by the Guerrilla Foundation, which contained the quote:
[EA] provides wealth owners with a saviour narrative and a ‘veil of impartiality’ that might hinder deeper scrutiny into the origins of philanthropic money, and stifle personal transformation and solidarity.
And how do EAs respond to this?
I can’t respond for “EAs in total” but I can respond for myself.
For this specific point, I find it a very vague and early hypothesis. A much more concrete and precise claim might be,
”Donors that give to EA causes do so at the expense of greater altruism. We should generally expect that in empirical settings, donors that think they have some sort of ‘veil of impartiality’ fail to do much investigation, and thus wind up donating to worse causes.”
This sounds interesting to me, but it seems like an empirical question, and I’d really want some data or something before making big decisions with it. I could easily see the opposite being true, like,
”Donors who give to causes they think are highly effective will think of themselves as people who care about effectiveness, and then would be more likely to do research and prioritization in the future.”
Basically, this seems to me a lot like a just-so story at this stage.
“starting new public ambitious projects is much less fun if there are a bunch of people on a forum who are out to get you”
To be clear, I assume that the phrase “are out to get you” is just you referring to people giving regular EA Forum critique?
The phrase sounds to me like this is an intentional, long-term effort from some actors to take one down, and they just so happen to use critique as a way of doing that.
As I’m sure many would imagine, I think I disagree.
There are almost no examples of criticism clearly mattering (e.g. getting someone to significantly improve their project)
There’s a lot here I take issue with:
1. I’m not sure where the line is between “criticism” and “critique” or “feedback.” Would any judgements about a project that aren’t positive count as “criticism”? We don’t have specific examples, so I don’t know what you’re referring to.
2. This jumps from “criticism matters” to “criticism clearly matters” (which is more easily defensible, but less important), to “criticism clearly mattering (e.g. getting someone to significantly improve their project)”, which is one of several ways that criticism could matter, clearly or otherwise. The latter seems like an incredibly specific claim that misses much of the discussion/benefits of criticism/critique/feedback.

I’d rate this post decently high on the “provocative to clarity” measure, as in it’s fairly provocative while also being short. This isn’t something I take issue with, but I just wouldn’t spend too much attention/effort on it, given this. But I would be a bit curious what a much longer and more detailed version of this post would be like.
AGI by 2028 is more likely than not
Sorry—my post comes from the worldview/expectation that, at some point, AI+software will be a major thing. I was flagging that, in that view, software should become much better.
The question of whether AI+software will be important soon is a background assumption, but a distinct topic. If you are very skeptical of it, then my post wouldn’t be relevant to you.

Some quick points on that topic, however:
1. I think there’s a decent coalition of researchers and programmers who believe that AI+software will be a major deal very soon (if not already). Companies are investing substantially in it (e.g., Anthropic, OpenAI, Microsoft).
2. I’ve found AI programming tools to be a major help, and so have many other programmers I’ve spoken to.
3. I see the current tools as still very experimental and new; very much proofs of concept. I expect it to take a while for their abilities to ramp up and scale. So the fact that the economic impact so far is limited doesn’t surprise me.
4. I’m not very set on extremely short timelines. But I think that 10-30 years would still be fairly soon, and it’s much more likely that big changes will happen on this time frame.
There’s a famous quote, “It’s easier to imagine the end of the world than the end of capitalism,” attributed to both Fredric Jameson and Slavoj Žižek.
I continue to be impressed by how little the public is able to imagine the creation of great software.
LLMs seem to be bringing down the costs of software. The immediate conclusion that some people jump to is “software engineers will be fired.”
I think the impacts on the labor market are very uncertain. But I’d consider it near-certain that software overall will get much better.
This means, “Imagine everything useful about software/web applications, then multiply that by 100x or more.”
The economics of software companies today are heavily tied to the price of software: software engineering is simply very expensive right now. Even a fairly simple web application with over 100k users can easily cost $1M-$10M/yr in development. And much of the market cap of companies like Meta and Microsoft rests on their moat of expensive software.
There’s a long history of enthusiastic and optimistic programmers in Silicon Valley. I think that the last 5 years or so have seemed unusually cynical and hopeless for true believers in software (outside of AI).
But if software genuinely became 100x cheaper (and we didn’t quickly get to a TAI), I’d expect a Renaissance. A time for incredible change and experimentation. A wave of new VC funding and entrepreneurial enthusiasm.
The result would probably feature some pretty bad things (as is always true with software and capitalism), but I’d expect some great things as well.
LLMs seem more like low-level tools to me than direct human interfaces.
Current models suffer from hallucinations, sycophancy, and numerous errors, but can be extremely useful when integrated into systems with redundancy and verification.
We’re in a strange stage now where LLMs are powerful enough to be useful, but too expensive/slow to have rich scaffolding and redundancy. So we bring this error-prone low-level tool straight to the user, for the moment, while waiting for the technology to improve.
Using today’s LLM interfaces feels like writing SQL commands directly instead of using a polished web application. It’s functional if that’s all you have, but it’s probably temporary.
Imagine what might happen if/when LLMs are 1000x faster and cheaper.
Then, answering a question might involve:
Running ~100 parallel LLM calls with various models and prompts
Using aggregation layers to compare responses and resolve contradictions
Identifying subtasks and handling them with specialized LLM batches and other software
Big picture, I think researchers might focus less on making sure any one LLM call is great, and more that these broader setups can work effectively.
(I realize this has some similarities to Mixture of Experts)
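To make that concrete, here’s a minimal Python sketch of the fan-out-and-aggregate pattern described above. The `call_model` function is a hypothetical placeholder for whatever LLM client one actually uses, the model names are made up, and majority voting is only the simplest possible aggregation layer (a real system might use further LLM calls to reconcile contradictions).

```python
import asyncio
from collections import Counter

async def call_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; replace with your own client.
    await asyncio.sleep(0)  # simulate network I/O
    return f"answer from {model}"

async def answer_question(question: str) -> str:
    models = ["model-a", "model-b", "model-c"]  # placeholder model names
    prompts = [question, f"Think step by step, then answer: {question}"]

    # Fan out: one call per (model, prompt) pair, all run concurrently.
    tasks = [call_model(m, p) for m in models for p in prompts]
    answers = await asyncio.gather(*tasks)

    # Aggregation layer: a naive majority vote over the raw answers.
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer

print(asyncio.run(answer_question("What will US GDP be in 2026?")))
```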
I’ve spent some time in the last few months outlining a few epistemics/AI/EA projects I think could be useful.
Link here.

I’m not sure how to best write about these on the EA Forum / LessWrong. They feel too technical and speculative to gain much visibility.
But I’m happy for people interested in the area to see them. Like with all things, I’m eager for feedback.
Here’s a brief summary of them, written by Claude.
---

1. AI-Assisted Auditing
A system where AI agents audit humans or AI systems, particularly for organizations involved in AI development. This could provide transparency about data usage, ensure legal compliance, flag dangerous procedures, and detect corruption while maintaining necessary privacy.
2. Consistency Evaluations for Estimation AI Agents
A testing framework that evaluates AI forecasting systems by measuring several types of consistency rather than just accuracy, enabling better comparison and improvement of prediction models. It’s suggested to start with simple test sets and progress to adversarial testing methods that can identify subtle inconsistencies across domains. (A toy sketch of one such check appears after this list of projects.)
3. AI for Epistemic Impact Estimation
An AI tool that quantifies the value of information based on how it improves beliefs for specific AIs. It’s suggested to begin with narrow domains and metrics, then expand to comprehensive tools that can guide research prioritization, value information contributions, and optimize information-seeking strategies.
4. Multi-AI-Critic Document Comments & Analysis
A system similar to “Google Docs comments” but with specialized AI agents that analyze documents for logical errors, provide enrichment, and offer suggestions. This could feature a repository of different optional open-source agents for specific tasks like spot-checking arguments, flagging logical errors, and providing information enrichment.
5. Rapid Prediction Games for RL
Specialized environments where AI agents trade or compete on predictions through market mechanisms, distinguishing between Information Producers and Consumers. The system aims to both evaluate AI capabilities and provide a framework for training better forecasting agents through rapid feedback cycles.
6. Analytics on Private AI Data
A project where government or researcher AI agents get access to private logs/data from AI companies to analyze questions like: How often did LLMs lie or misrepresent information? Did LLMs show bias toward encouraging user trust? Did LLMs employ questionable tactics for user retention? This addresses the limitation that researchers currently lack access to actual use logs.
7. Prediction Market Key Analytics Database
A comprehensive analytics system for prediction markets that tracks question value, difficulty, correlation with other questions, and forecaster performance metrics. This would help identify which questions are most valuable to specific stakeholders and how questions relate to real-world variables.
8. LLM Resolver Agents
A system for resolving forecasting questions using AI agents with built-in desiderata including: triggering experiments at specific future points, deterministic randomness methods, specified LLM usage, verifiability, auditability, proper caching/storing, and sensitivity analysis.
9. AI-Organized Information Hubs
A platform optimized for AI readers and writers where systems, experts, and organizations can contribute information that is scored and filtered for usefulness. Features would include privacy levels, payment proportional to information value, and integration of multiple file types.
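As a toy illustration of project 2 (consistency evaluations), here’s a small Python sketch of two simple checks one could run against a forecasting agent. The `forecast` function is a hypothetical placeholder for an LLM-backed forecaster, and real evaluations would use many question pairs and richer consistency properties.

```python
def forecast(question: str) -> float:
    # Hypothetical placeholder for an LLM-backed forecasting agent.
    # Returns a probability in [0, 1].
    return 0.4 if "not" in question else 0.6

def negation_consistent(q: str, negated_q: str, tol: float = 0.05) -> bool:
    # A coherent forecaster's P(A) and P(not A) should sum to roughly 1.
    return abs(forecast(q) + forecast(negated_q) - 1.0) <= tol

def monotone_consistent(narrow_q: str, broad_q: str) -> bool:
    # An event should never be judged more likely than a broader event containing it.
    return forecast(narrow_q) <= forecast(broad_q)

checks = {
    "negation": negation_consistent("Will X happen by 2030?", "Will X not happen by 2030?"),
    "monotonicity": monotone_consistent("Will X happen by 2028?", "Will X happen by 2030?"),
}
print(checks)  # e.g. {'negation': True, 'monotonicity': True}
```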
“AIs doing Forecasting”[1] has become a major part of the EA/AI/Epistemics discussion recently.
I think a logical extension of this is to expand the focus from forecasting to evaluation.
Forecasting typically asks questions like, “What will the GDP of the US be in 2026?”
Evaluation tackles partially-speculative assessments, such as:
“How much economic benefit did project X create?”
“How useful is blog post X?”
I’d hope that “evaluation” could function as “forecasting with extra steps.” The forecasting discipline excels at finding the best epistemic procedures for uncovering truth[2]. We want to maintain these procedures while applying them to more speculative questions.
Evaluation brings several additional considerations:
We need to identify which evaluations to run from a vast space of useful and practical options.
Evaluations often disrupt the social order, requiring skillful management.
Determining the best ways to “resolve” evaluations presents greater challenges than resolving forecast questions.
I’ve been interested in this area for 5+ years but struggled to draw attention to it—partly because it seems abstract, and partly because much of the necessary technology wasn’t quite ready.
We’re now at an exciting point where creating LLM apps for both forecasting and evaluation is becoming incredibly affordable. This might be a good time to spotlight this area.
There’s a curious gap now where we can, in theory, envision a world with sophisticated AI evaluation infrastructure, yet discussion of this remains limited. Fortunately, researchers and enthusiasts can fill this gap, one sentence at a time.
[1] As opposed to [Forecasting About AI], which is also common here.
[2] Or at least, do as good a job as we can.
In ~2014, one major topic among effective altruists was “how to live for cheap.”
There wasn’t much funding, so it was understood that a major task for doing good work was finding a way to live with little money.
Money gradually increased, peaking with FTX in 2022.
Now I think it might be time to bring back some of the discussions about living cheaply.
Arguably, around FTX, it was better. EA and FTX both had strong brands for a while. And there were worlds in which the risk of failure was low.
I think it’s generally quite tough to get this aspect right though. I believe that traditionally, charities are reluctant to get their brands associated with large companies, due to the risks/downsides. We don’t often see partnerships between companies and charities (or, say, highly-ideological groups); I think one reason why is that it’s rarely in the interests of both parties.

Typically companies want to tie their brands to the very top charities, if any. But now EA has a reputational challenge, so I’d expect that few companies/orgs want to touch “EA” as a thing.
Arguably, influencers are often a safer option. Note that EA groups like GiveWell and 80k are already doing partnerships with influencers; there’s a decent variety of smart YouTube channels and podcasts that carry advertisements for 80k/GiveWell. I feel pretty good about much of this.

Arguably, influencers are crafted in large part to be safe bets. As in, they’re very incentivized to not go crazy, and they have limited risks to worry about (given they represent very small operations).
I just had Claude do three attempts at what a version of the “Voice in the Room” chart would look like as an app, targeting AI Policy. The app is clearly broken, but I think it can act as an interesting experiment.
Here, the influencing parties are laid out in concentric rings, with lines connecting related organizations. There’s also a lot of other information here.
I agree.
I didn’t mean to suggest your post suggested otherwise—I was just focusing on another part of this topic.
I mainly agree.
I previously was addressing Michael’s more limited point, “I don’t think government competence is what’s holding us back from having good AI regulations, it’s government willingness.”
All that said, separately, I think that “increasing government competence” is often a good bet, as it just comes with a long list of benefits.
But if one believes that AI will happen soon, and that a major bottleneck is “getting the broad public to trust the US government more, with the purpose of then encouraging AI reform”, that seems like a dubious strategy.
(Potential research project, curious to get feedback)
I’ve been thinking a lot about how to do quantitative LLM evaluations of the value of various (mostly-EA) projects.

We’d have LLMs give their best guesses at the value of various projects/outputs. These guesses would be mediocre at first, but would help us figure out how promising this area is and where we might want to take it.
The first idea that comes to mind is “Estimate the value in terms of [dollars, from a certain EA funder] as a [probability distribution]”. But this quickly becomes a mess. I think this couples a few key uncertainties into one value. This is probably too hard for early experiments.
A more elegant example would be “relative value functions”. This is theoretically nicer, but the infrastructure would be more expensive. It helps split up some of the key uncertainties, but would require a lot of technical investment.
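For a rough sense of what a relative value function could look like in code, here’s a tiny sketch, with all names and numbers invented for illustration: each query returns a sampled ratio of value between two items, rather than an absolute dollar figure.

```python
import random

def relative_value(item_a: str, item_b: str) -> float:
    # Hypothetical: one sampled draw of value(item_a) / value(item_b).
    # The median values and noise level are made up for illustration.
    medians = {"post_a": 1.0, "intervention_b": 5.0, "post_c": 0.5}
    noise = random.lognormvariate(0, 0.5)  # uncertainty in the ratio
    return (medians[item_a] / medians[item_b]) * noise

# Summarize the implied distribution over the ratio of two items' values.
samples = sorted(relative_value("intervention_b", "post_a") for _ in range(1000))
print("median ratio:", samples[500], "90% interval:", (samples[50], samples[950]))
```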
One option that might be interesting is asking for a simple rank order. “Just order these projects in terms of the expected value.” We can definitely score rank orders, even though doing so is a bit inelegant.
So one experiment I’m imagining is:
We come up with a list of interesting EA outputs. Say, a combination of blog posts, research articles, interventions, etc. From this, we form a list of maybe 20 to 100 elements. These become public.
We then ask people to compete to rank these. A submission would be [an ordering of all the elements] and an optional [document defending their ordering].
We feed all of the entries in (2) into an LLM evaluation system. This would come with a lengthy predefined prompt. It would take in all of the provided orderings and all the provided defenses. It then outputs its own ordering.
We then score all of the entries in (2), based on how well they match the result of (3). (A sketch of one possible scoring approach appears after this list.)
The winner gets a cash prize. Ideally, all submissions would become public.
This is similar to this previous competition we did.
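Here’s a rough sketch of how the scoring step could work, using Kendall’s tau rank correlation between a submission and the LLM’s reference ordering. The project names below are made up for illustration, and tau is only one of several reasonable rank-scoring rules.

```python
from scipy.stats import kendalltau

# Hypothetical reference ordering produced by the LLM evaluation system (best first).
llm_ordering = ["post_a", "intervention_b", "post_c", "article_d", "post_e"]

# A hypothetical participant submission over the same items.
submission = ["intervention_b", "post_a", "post_c", "post_e", "article_d"]

def score_submission(submission: list[str], reference: list[str]) -> float:
    # Convert both orderings to ranks over the shared item set, then correlate.
    ref_rank = {item: i for i, item in enumerate(reference)}
    sub_rank = {item: i for i, item in enumerate(submission)}
    tau, _p_value = kendalltau(
        [ref_rank[item] for item in reference],
        [sub_rank[item] for item in reference],
    )
    return tau  # 1.0 = identical ordering, -1.0 = fully reversed

print(f"Kendall's tau: {score_submission(submission, llm_ordering):.2f}")
```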
Questions:
1. “How would you choose which projects/items to analyze?”
One option could be to begin with a mix of well-regarded posts on the EA Forum. Maybe we keep things to a limited domain for now (just X-risk), but cover a spectrum of different amounts of karma.

2. “Wouldn’t the LLM do a poor job? Why not humans?”
Having human judges at the end of this would add a lot of cost. It could easily make the project 2x as expensive. Also, I think it’s good for us to learn how to use LLMs for evaluating these competitions, as that has more long-term potential.

3. “The resulting lists would be poor quality”
I think the results would be interesting, for a few reasons. I’d expect the results to be better than what many individuals would come up with. I also think it’s really important that we start somewhere. It’s very easy to delay things until we have something perfect, and then for that to never happen.
Thanks for the responses!
SB-1047 was adequately competently written (AFAICT). If we get more regulations at a similar level of competence, that would be reasonable.
Agreed
Getting regulators on board with what people want seems to me to be the best path to getting regulations in place.
I don’t see it as either/or. I agree that pushing for regulations is a bigger priority than AI in government. Right now the former is getting dramatically more EA resources and I’d expect that to continue. But I think the latter are getting almost none, and that doesn’t seem right to me.
Suppose it turned out Microsoft Office was dangerous. Surely the fact that Office is so embedded in government procedures would make it less likely to get banned?
I worry we’re getting into a distant hypothetical. I’d equate this to, “Given that the government is using Microsoft Office, are they likely to try to make sure that future versions of Microsoft Office are better? Especially in a reckless way?”
Naively I’d expect a government that uses Microsoft Office to be one with a better understanding of the upsides and downsides of Microsoft Office.
I’d expect that most AI systems the Government would use would be fairly harmless (in terms of the main risks we care about). Like, things a few years old (and thus tested a lot in industry), with less computing power than would be ideal, etc.
Related, I think that the US military has done good work to make high-reliability software, due to their need for it. (Though this is a complex discussion, as they obviously do a mix of things.)
Happy to see development and funding in this field.
I would flag the obvious issue that a very small proportion of wild animals live in cities, given that cities take up a small proportion of the world. But I do know that there have been investigations into rats, which exist in great numbers in cities.
The website for this project shows a fox—but I presume that this was chosen because it’s a sympathetic animal—not because foxes in cities represent a great deal of suffering.
I understand that tradeoffs need to be made to work with different funding sources and circumstances. But I’m of course curious what the broader story is here.