While not a study per se, I found the Huberman Lab podcast episode ‘How Smartphones & Social Media Impact Mental Health’ very informative. (It’s two and a half hours long, mostly about children and teenagers, and references the study(ies) it draws from, IIRC.)
Will Aldred
For previous work, I point you to @NunoSempere’s ‘Shallow evaluations of longtermist organizations,’ if you haven’t seen it already. (While Nuño didn’t focus on AI safety orgs specifically, I thought the post was excellent, and I imagine that the evaluation methods/approaches used can be learned from and applied to AI safety orgs.)
I hope in the future there will be multiple GV-scale funders for AI GCR work, with different strengths, strategies, and comparative advantages
(Fwiw, the Metaculus crowd prediction on the question ‘Will there be another donor on the scale of 2020 Good Ventures in the Effective Altruist space in 2026?’ currently sits at 43%.)
Epistemic status: strong opinions, lightly held
I remember a time when an org was criticized, and a board member commented defending the org. But the board member was factually wrong about at least one claim, and the org then needed to walk back wrong information. It would have been clearer and less embarrassing for everyone if they’d all waited a day or two to get on the same page and write a response with the correct facts.
I guess it depends on the specifics of the situation, but, to me, the case described, of a board member making one or two incorrect claims (in a comment that presumably also had a bunch of accurate and helpful content) that they needed to walk back sounds… not that bad? Like, it seems only marginally worse than their comment being fully accurate the first time round, and far better than them never writing a comment at all. (I guess the exception to this is if the incorrect claims had legal ramifications that couldn’t be undone. But I don’t think that’s true of the case you refer to?)
A downside is that if an organization isn’t prioritizing back-and-forth with the community, of course there will be more mystery and more speculations that are inaccurate but go uncorrected. That’s frustrating, but it’s a standard way that many organizations operate, both in EA and in other spaces.
I don’t think the fact that this is a standard way for orgs to act in the wider world says much about whether this should be the way EA orgs act. In the wider world, an org’s purpose is to make money for its shareholders: the org has no ‘teammates’ outside of itself; no-one really expects the org to try hard to communicate what it is doing (outside of communicating well being tied to profit); no-one really expects the org to care about negative externalities. Moreover, withholding information can often give an org a competitive advantage over rivals.
Within the EA community, however, there is a shared sense that we are all on the same team (I hope): there is a reasonable expectation for cooperation; there is a reasonable expectation that orgs will take into account externalities on the community when deciding how to act. For example, if communicating some aspect of EA org X’s strategy would take half a day of staff time, I would hope that the relevant decision-maker at org X takes into account not only the cost and benefit to org X of whether or not to communicate, but also the cost/benefit to the wider community. If half a day of staff time helps others in the community better understand org X’s thinking,[1] such that, in expectation, more than half a day of (quality-adjusted) productive time is saved (through, e.g., community members making better decisions about what to work on), then I would hope that org X chooses to communicate.
When I see public comments about the inner workings of an organization by people who don’t work there, I often also hear other people who know more about the org privately say “That’s not true.” But they have other things to do with their workday than write a correction to a comment on the Forum or LessWrong, get it checked by their org’s communications staff, and then follow whatever discussion comes from it.
I would personally feel a lot better about a community where employees aren’t policed by their org on what they can and cannot say. (This point has been debated before—see saulius and Habryka vs. the Rethink Priorities leadership.) I think such policing leads to chilling effects that make everyone in the community less sane and less able to form accurate models of the world. Going back to your example, if there was no requirement on someone to get their EAF/LW comment checked by their org’s communications staff, then that would significantly lower the time/effort barrier to publishing such comments, and then the whole argument around such comments being too time-consuming to publish becomes much weaker.
All this to say: I think you’re directionally correct with your closing bullet points. I think it’s good to remind people of alternative hypotheses. However, I push back on the notion that we must just accept the current situation (in which at least one major EA org has very little back-and-forth with the community)[2]. I believe that with better norms, we wouldn’t have to put as much weight on bullets 2 and 3, and we’d all be stronger for it.
Open Phil has seemingly moved away from funding ‘frontier of weirdness’-type projects and cause areas; I therefore think a hole has opened up that EAIF is well-placed to fill. In particular, I think an FHI 2.0 of some sort (perhaps starting small and scaling up if it’s going well) could be hugely valuable, and that finding a leader for this new org could fit in with your ‘running specific application rounds to fund people to work on [particularly valuable projects].’
My sense is that an FHI 2.0 grant would align well with EAIF’s scope. Quoting from your announcement post for your new scope:
Examples of projects that I (Caleb) would be excited for this fund [EAIF] to support
A program that puts particularly thoughtful researchers who want to investigate speculative but potentially important considerations (like acausal trade and ethics of digital minds) in the same physical space and gives them stipends—ideally with mentorship and potentially an emphasis on collaboration.
…
Foundational research into big, if true, areas that aren’t currently receiving much attention (e.g. post-AGI governance, ECL, wild animal suffering, suffering of current AI systems).
Having said this, I imagine that you saw Habryka’s ‘FHI of the West’ proposal from six months ago. The fact that that has not already been funded, and that talk around it has died down, makes me wonder if you have already ruled out funding such a project. (If so, I’d be curious as to why, though of course no obligation on you to explain yourself.)
Thanks for clarifying!
Be useful for research on how to produce intent-aligned systems
Just checking: Do you believe this because you see the intent alignment problem as being in the class of “complex questions which ultimately have empirical answers, where it’s out of reach to test them empirically, but one may get better predictions from finding clear frameworks for thinking about them,” alongside, say, high energy physics?
For a variety of reasons it seems pretty unlikely to me that we manage to robustly solve alignment of superintelligent AIs while pointed in “wrong” directions; that sort of philosophical unsophistication is why I’m pessimistic on our odds of success.
This is an aside, but I’d be very interested to hear you expand on your reasons, if you have time. (I’m currently on a journey of trying to better understand how alignment relates to philosophical competence; see thread here.)
(Possibly worth clarifying up front: by “alignment,” do you mean “intent alignment,” as defined by Christiano, or do you mean something broader?)
Hmm, interesting.
I’m realizing now that I might be more confused about this topic than I thought I was, so to backtrack for just a minute: it sounds like you see weak philosophical competence as being part of intent alignment, is that correct? If so, are you using “intent alignment” in the same way as in the Christiano definition? My understanding was that intent alignment means “the AI is trying to do what present-me wants it to do.” To me, therefore, this business of the AI being able to recognize whether its actions would be approved by idealized-me (or just better-informed-me) falls outside the definition of intent alignment.
(Looking through that Christiano post again, I see a couple of statements that seem to support what I’ve just said,[1] but also one that arguably goes the other way.[2])
Now, addressing your most recent comment:
Okay, just to make sure that I’ve understood you, you are defining weak philosophical competence as “competence at reasoning about complex questions [in any domain] which ultimately have empirical answers, where it’s out of reach to test them empirically, but where one may get better predictions from finding clear frameworks for thinking about them,” right? Would you agree that the “important” part of weak philosophical competence is whether the system would do things an informed version of you, or humans at large, would ultimately regard as terrible (as opposed to how competent the system is at high energy physics, consciousness science, etc.)?
If a system is competent at reasoning about complex questions across a bunch of domains, then I think I’m on board with seeing that as evidence that the system is competent at the important part of weak philosophical competence, assuming that it’s already intent aligned.[3] However, I’m struggling to see why this would help with intent alignment itself, according to the Christiano definition. (If one includes weak philosophical competence within one’s definition of intent alignment—as I think you are doing(?)—then I can see why it helps. However, I think this would be a non-standard usage of “intent alignment.” I also don’t think that most folks working on AI alignment see weak philosophical competence as part of alignment. (My last point is based mostly on my experience talking to AI alignment researchers, but also on seeing leaders of the field write things like this.))
A couple of closing thoughts:
I already thought that strong philosophical competence was extremely neglected, but I now also think that weak philosophical competence is very neglected. It seems to me that if weak philosophical competence is not solved at the same time as intent alignment (in the Christiano sense),[4] then things could go badly, fast. (Perhaps this is why you want to include weak philosophical competence within the intent alignment problem?)
The important part of weak philosophical competence seems closely related to Wei Dai’s “human safety problems”.
(Of course, no obligation on you to spend your time replying to me, but I’d greatly appreciate it if you do!)
- ^
They could [...] be wrong [about; sic] what H wants at a particular moment in time.
They may not know everything about the world, and so fail to recognize that an action has a particular bad side effect.
They may not know everything about H’s preferences, and so fail to recognize that a particular side effect is bad.
…
I don’t have a strong view about whether “alignment” should refer to this problem or to something different. I do think that some term needs to refer to this problem, to separate it from other problems like “understanding what humans want,” “solving philosophy,” etc.
(“Understanding what humans want” sounds quite a lot like weak philosophical competence, as defined earlier in this thread, while “solving philosophy” sounds a lot like strong philosophical competence.)
- ^
An aligned AI would also be trying to do what H wants with respect to clarifying H’s preferences.
(It’s unclear whether this just refers to clarifying present-H’s preferences, or if it extends to making present-H’s preferences be closer to idealized-H’s.)
- ^
If the system is not intent aligned, then I think this would still be evidence that the system understands what an informed version of me would ultimately regard as terrible vs. not terrible. But, in this case, I don’t think the system will use what it understands to try to do the non-terrible things.
- ^
Insofar as a solved vs. not solved framing even makes sense. Karnofsky (2022; fn. 4) argues against this framing.
Thanks for expanding! This is the first time I’ve seen this strong vs. weak distinction used—seems like a useful ontology.[1]
Minor: When I read your definition of weak philosophical competence,[2] high energy physics and consciousness science came to mind as fields that fit the definition (given present technology levels). However, this seems outside the spirit of “weak philosophical competence”: an AI that’s superhuman in the aforementioned fields could still fail big time with respect to “would this AI do something an informed version of me / those humans would ultimately regard as terrible?” Nonetheless, I’ve not been able to think up a better ontology myself (in my 5 mins of trying), and I don’t expect this definitional matter will cause problems in practice.
- ^
For the benefit of any readers: Strong philosophical competence is importantly different to weak philosophical competence, as defined.
The latter feeds into intent alignment, while the former is an additional problem beyond intent alignment. [Edit: I now think this is not so clear-cut. See the ensuing thread for more.]
- ^
“Let weak philosophical competence mean competence at reasoning about complex questions which ultimately have empirical answers, where it’s out of reach to test them empirically, but one may get better predictions from finding clear frameworks for thinking about them.”
By the time systems approach strong superintelligence, they are likely to have philosophical competence in some sense.
It’s interesting to me that you think this; I’d be very keen to hear your reasoning (or for you to point me to any existing writings that fit your view).
For what it’s worth, I’m at maybe 30 or 40% that superintelligence will be philosophically competent by default (i.e., without its developers trying hard to differentially imbue it with this competence), conditional on successful intent alignment, where I’m roughly defining “philosophically competent” as “wouldn’t cause existential catastrophe through philosophical incompetence.” I believe this mostly because I find @Wei Dai’s writings compelling, and partly because of some thinking I’ve done myself on the matter. OpenAI’s o1 announcement post, for example, indicates that o1—the current #1 LLM, by most measures—performs far better in domains that have clear right/wrong answers (e.g., calculus and chemistry) than in domains where this is not the case (e.g., free-response writing[1]).[2] Philosophy, being interminable debate, is perhaps the ultimate “no clear right/wrong answers” domain (to non-realists, at least): for this reason, plus a few others (which are largely covered in Dai’s writings), I’m struggling to see why AIs wouldn’t be differentially bad at philosophy in the lead-up to superintelligence.
Also, for what it’s worth, the current community prediction on the Metaculus question “Five years after AGI, will AI philosophical competence be solved?” is down at 27%.[3] (Although, given how out of distribution this question is with respect to most Metaculus questions, the community prediction here should be taken with a lump of salt.)
(It’s possible that your “in some sense” qualifier is what’s driving our apparent disagreement, and that we don’t actually disagree by much.)
- ^
- ^
On this, AI Explained (8:01–8:34) says:
And there is another hurdle that would follow, if you agree with this analysis [of why o1’s capabilities are what they are, across the board]: It’s not just a lack of training data. What about domains that have plenty of training data, but no clearly correct or incorrect answers? Then you would have no way of sifting through all of those chains of thought, and fine-tuning on the correct ones. Compared to the original GPT-4o in domains with correct and incorrect answers, you can see the performance boost. With harder-to-distinguish correct or incorrect answers: much less of a boost [in performance]. In fact, a regress in personal writing.
- ^
Note: Metaculus forecasters—for the most part—think that superintelligence will come within five years of AGI. (See here my previous commentary on this, which goes into more detail.)
Imagine if we could say, “When Metaculus predicts something with 80% certainty, it happens between X and Y% of the time,” or “On average, Metaculus forecasts are off by X%”.
Fyi, the Metaculus track record (specifically, the “Community Prediction calibration” part) lets us do this already. When Metaculus predicts something with 80% certainty, for example, it happens around 82% of the time.
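For readers who want to see what a calibration figure like this means mechanically, here is a minimal sketch. It is not Metaculus's actual method or code: the function name `calibration_by_bucket`, the 10-point bucket width, and the handful of resolved questions are all made up for illustration. It groups resolved forecasts by their stated probability and reports how often the event happened in each group.

```python
from collections import defaultdict


def calibration_by_bucket(resolved, bucket_size=10):
    """Return {bucket_floor: (observed_frequency, question_count)}.

    `resolved` is a list of (forecast_percent, happened) pairs, where
    forecast_percent is an integer from 0 to 100 and happened is a bool.
    """
    buckets = defaultdict(list)
    for percent, happened in resolved:
        # Integer arithmetic avoids floating-point surprises at bucket edges.
        buckets[(percent // bucket_size) * bucket_size].append(happened)
    return {
        floor: (sum(outcomes) / len(outcomes), len(outcomes))
        for floor, outcomes in sorted(buckets.items())
    }


# Hypothetical resolved questions, NOT real Metaculus data.
example = [
    (80, True), (82, True), (85, True), (80, False), (88, True),
    (20, False), (25, True), (22, False), (21, False),
]
table = calibration_by_bucket(example)
for floor, (frequency, count) in table.items():
    print(f"{floor}-{floor + 9}% forecasts: happened {frequency:.0%} of the time (n={count})")
```

A forecaster is well calibrated when the observed frequency in each bucket sits close to the forecasts in that bucket, which is the kind of comparison the 80% vs. 82% figure above expresses.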
Building on the above: the folks behind Intelligence Rising actually published a paper earlier this month, titled ‘Strategic Insights from Simulation Gaming of AI Race Dynamics’. I’ve not read it myself, but it might address some of your wonderings, @yanni. Here’s the abstract:
We present insights from ‘Intelligence Rising’, a scenario exploration exercise about possible AI futures. Drawing on the experiences of facilitators who have overseen 43 games over a four-year period, we illuminate recurring patterns, strategies, and decision-making processes observed during gameplay. Our analysis reveals key strategic considerations about AI development trajectories in this simulated environment, including: the destabilising effects of AI races, the crucial role of international cooperation in mitigating catastrophic risks, the challenges of aligning corporate and national interests, and the potential for rapid, transformative change in AI capabilities. We highlight places where we believe the game has been effective in exposing participants to the complexities and uncertainties inherent in AI governance. Key recurring gameplay themes include the emergence of international agreements, challenges to the robustness of such agreements, the critical role of cybersecurity in AI development, and the potential for unexpected crises to dramatically alter AI trajectories. By documenting these insights, we aim to provide valuable foresight for policymakers, industry leaders, and researchers navigating the complex landscape of AI development and governance.
[emphasis added]
(Just pointing out that previous discussion of this paper on this forum can be found here.)
I think your argument could go through if the person being praised was Holden, or Will MacAskill, or some other big name in EA. However, Michael seems pretty under the radar given the size of his contributions, so I don’t think your concerns check out in this case (and in fact this case might even align with your point about recognizing unsung contributors).
+1! I especially appreciate how Michael often writes very detailed responses as part of extended back-and-forth-type comment threads. These contributions aren’t rewarded so much karma-wise, but I think they’re extremely valuable: I personally owe a good deal of what I understand about welfare, moral uncertainty, infinite ethics and theories of consciousness to comments I’ve read by Michael.
Animal welfare getting so little[1] EA funding, at present, relative to global health, seems to be an artefact of Open Phil’s ‘worldview diversification,’ which imo is a lacklustre framework for decision-making, both in theory and (especially) in practice: see, e.g., Sempere (2022).
Cost-effectiveness analyses I’ve seen indicate that animal welfare interventions, like cage-free campaigns, are really excellent uses of money—orders of magnitude more effective than leading global health interventions.
Though not central to my argument, there’s also the meat-eater problem, which I think is under-discussed.
Responding to the bottom part of your second footnote:
To me, it seems pretty important for Forum Team members (especially the interim lead!) to be communicating with Forum users. I therefore think it’s a mistake for you to assign zero value to your posts and comments, relative to your other work.
How much value to assign to one of your posts or comments? I would crudely model this as:
(Size of your post’s or comment’s contribution)/(Size of all Forum contributions in a year) x (Forum’s total value per year)
You’ll have better-informed figures/estimates than I do, but I’d guess that the size of all Forum contributions, measured in karma,[1] in a year, is around 100,000, and that the value of the Forum per year is around $10M.[2] A thoughtful comment might get 10 karma, on average, and a thoughtful post might get 50.
I’d therefore roughly value a comment from you at (10 karma)/(100,000 karma) x $10M = $1000, and a post from you at $5000.
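The BOTEC above can be written out as a short sketch so that anyone can swap in their own figures. The function name `contribution_value` is hypothetical, and the default inputs are just the rough guesses stated above (100,000 karma per year and $10M of Forum value per year), not measured values.

```python
def contribution_value(karma, total_karma_per_year=100_000,
                       forum_value_per_year=10_000_000):
    """Crude dollar value of one contribution.

    Computes (share of the year's karma) x (Forum's yearly value).
    The defaults are the guesses from the comment, not real data.
    """
    # Multiply before dividing so these round inputs stay exact.
    return karma * forum_value_per_year / total_karma_per_year


comment_value = contribution_value(10)  # a thoughtful comment, ~10 karma
post_value = contribution_value(50)     # a thoughtful post, ~50 karma
print(f"comment: ${comment_value:,.0f}, post: ${post_value:,.0f}")
```

Changing either default shows how sensitive the result is: halving the assumed Forum value, for instance, halves every estimate.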
(My model may well be off in some way; I invite readers to improve on it.)
- ^
I don’t take karma to be a perfect measure of value by any means—see, e.g., ‘Karma overrates some topics’—but I think it’s a reasonable-enough measure for carrying out this BOTEC.
- ^
Why $10M? Mostly I’m working off the value of an ‘EA project’ as estimated in cell C4 of this spreadsheet by Nuño. (This was the accompanying post.)
+1. While I applaud the authors for doing this work at all, and share their hopes regarding automated forecasting, by my lights the opening paragraphs massively overstate their bot’s ability.
The moderators have reviewed the decision to ban @dstudiocode after users appealed the decision. Tl;dr: We are revoking the ban, and are instead rate-limiting dstudiocode and warning them to avoid posting content that could be perceived as advocating for major harm or illegal activities. The rate limit is due to dstudiocode’s pattern of engagement on the Forum, not simply because of their most recent post; for more on this, see the “third consideration” listed below.
More details:
Three moderators,[1] none of whom was involved in the original decision to ban dstudiocode, discussed this case.
The first consideration was “Does the cited norm make sense?” For reference, the norm cited in the original ban decision was “Materials advocating major harm or illegal activities, or materials that may be easily perceived as such” (under “What we discourage (and may delete or edit out)” in our “Guide to norms on the Forum”). The panel of three unanimously agreed that having some kind of Forum norm in this vein makes sense.
The second consideration was “Does the post that triggered the ban actually break the cited norm?” For reference, the post ended with the question “should murdering a meat eater be considered ‘ethical’?” (Since the post was rejected by moderators, users cannot see it.[2] We regret the confusion caused by us not making this point clearer in the original ban message.)
There was disagreement amongst the moderators involved in the appeal process about whether or not the given post breaks the norm cited above. I personally think that the post is acceptable since it does not constitute a call to action. The other two moderators see the post as breaking the norm; they see the fact that it is “just” a philosophical question as not changing the assessment.[3] (Note: The “meat-eater problem” has been discussed elsewhere on the Forum. Unlike the post in question, in the eyes of the given two moderators, these posts did not break the “advocating for major harm or illegal activities” norm because they framed the question as about whether to donate to save the life of a meat-eating person, rather than as about actively murdering people.)
Amongst the two appeals-panelist moderators who see the post as norm-breaking, there was disagreement about whether the correct response would be a temporary ban or just a warning.
The third consideration was around dstudiocode’s other actions and general standing on the Forum. dstudiocode currently sits at −38 karma following 8 posts and 30 comments. This indicates that their contributions to the discourse have generally not been helpful.[4] Accordingly, all three moderators agreed that we should be more willing to (temporarily) ban dstudiocode for a potential norm violation.
dstudiocode has also tried posting very similar, low-quality (by our lights) content multiple times. The post that triggered the ban was similar to, though more “intense” than, this other post of theirs from five months ago. Additionally, they tried posting similar content through an alt account just before their ban. When a Forum team member asked them about their alt, they appeared to lie.[5] All three moderators agreed that this repeated posting of very similar, low-quality content warrants at least a rate limit (i.e., a cap on how much the user in question can post or comment).[6] (For context, eight months ago, dstudiocode published five posts in an eight-day span, all of which were low quality, in our view. We would like to avoid a repeat of that situation: a rate limit or a ban are the tools we could employ to this end.) Lying about their alt also makes us worried that the user is trying to skirt the rules.
Overall, the appeals panel is revoking dstudiocode’s ban, and is replacing the ban with a warning (instructing them to avoid posting content that could be perceived as advocating for major harm or illegal activities) and a rate limit. dstudiocode will be limited to at most one comment every three days and one post per week for the next three weeks, i.e., until when their original ban would have ended. Moderators will be keeping an eye on their posting, and will remove their posting rights entirely if they continue to publish content that we consider sufficiently low quality or norm-bending.
We would like to thank @richard_ngo and @Neel Nanda for appealing the original decision, as well as @Jason and @dirk for contributing to the discussion. We apologize that the original ban notice was rushed, and failed to lay out all the factors that went into the decision.[7] (Reasoning along the lines of the “third consideration” given above went into the original decision, but we failed to communicate that.)
If anyone has questions or concerns about how we have handled the appeals process, feel free to comment below or reach out.
- ^
Technically, two moderators and one moderation advisor. (I write “three moderators” in the main text because that makes referring to them, as I do throughout the text, less cumbersome.)
- ^
The three of us discussed whether or not to quote the full version of the post that triggered the ban in this moderator comment, to allow users to see exactly what is being ruled on. By split decision (with me as the dissenting minority), we have decided not to do so: in general, we will probably avoid republishing content that is objectionable enough to get taken down in the first place.
- ^
I’m not certain, but my guess is that the disagreement here is related to the high vs. low decoupling spectrum (where high decouplers, like myself, are fine with entertaining philosophical questions like these, whereas low decouplers tend to see such questions as crossing a line).
- ^
We don’t see karma as a perfect measure of a user’s value by any means, but we do consider a user’s total karma being negative to be a strong signal that something is awry.
Looking through dstudiocode’s post and comment history, I do think that they are trying to engage in good faith (as opposed to being a troll, say). However, the EA Forum exists for a particular purpose, and has particular standards in place to serve that purpose, and this means that the Forum is not necessarily a good place for everyone who is trying to contribute. (For what it’s worth, I feel a missing mood in writing this.)
- ^
In response to our request that they stop publishing similar content from multiple accounts, they said: “Posted from multiple accounts? I feel it is possible that the same post may have been created because maybe the topic is popular?” However, we are >99% confident, based on our usual checks for multiple account use, that the other account that tried to publish this similar content is an alt controlled by them. (They did subsequently stop trying to publish from other accounts.)
- ^
We do not have an official policy on rate limits, at present, although we have used rate limits on occasion. We aim to improve our process here. In short, rate limits may be a more appropriate intervention than bans are for users who aren’t clearly breaking norms, but who are nonetheless posting low-quality content or repeatedly testing the edges of the norms.
- ^
Notwithstanding the notice we published, which was a mistake, I am not sure if the ban decision itself was a mistake. It turns out that different moderators have different views on the post in question, and I think the difference between the original decision to ban and the present decision to instead warn and rate limit can mostly be chalked up to reasonable disagreement between different moderators. (We are choosing to override the original decision since we spent significantly longer on the review, and we therefore have more confidence in the review decision being “correct”. We put substantial effort into the review because established users, in their appeal, made some points that we felt deserved to be taken seriously. However, this level of effort would not be tenable for most “regular” moderation calls—i.e., those involving unestablished or not-in-great-standing users, like dstudiocode—given the tradeoffs we face.)
Yeah, I find myself very confused by this state of affairs. Hundreds of people are being funneled through the AI safety community-building pipeline, but there’s little funding for them to work on things once they come out the other side.[1]
As well as being suboptimal from the viewpoint of preventing existential catastrophe, this also just seems kind of common-sense unethical. Like, all these people (most of whom are bright-eyed youngsters) are being told that they can contribute, if only they skill up, and then they later find out that that’s not the case.
These community-building graduates can, of course, try going the non-philanthropic route—i.e., apply to AGI companies or government institutes. But there are major gaps in what those organizations are working on, in my view, and they also can’t absorb so many people.