Interested in AI safety talent search and development.
Peter
This seems interesting. Are there ways you think these ideas could be incorporated into LLM training pipelines, or experiments we could run to test their advantages and potential limits versus RLHF and other conventional alignment strategies? Also, do you think using developmental constraints and then applying techniques like RLHF could be more effective than either alone?
I’d like to see more rigorous engagement with big questions like where value comes from, what makes a good future, how we know, and how this affects cause prioritization. I think it’s generally assumed “consciousness is where value comes from, so maximize it in some way.” Yet some of the people who have studied consciousness most closely from a phenomenological perspective seem to not think that (e.g. zen masters, Tibetan lamas, other contemplatives, etc), let alone scale it to cosmic levels. Why? Is third person philosophical analysis alone missing something?
The experiences of these people add up to millions of years of contemplation across thousands of years. If we accept this as a sort of “long reflection” what does that mean? If we don’t, what do we envision differently and why? And are we really going to be able to do serious sustained reflection if/once we have everything we think we want within our grasp due to strong AI?
These are the kinds of things I’m currently thinking through most in my spare time and writing my thoughts up on.
For 2, what’s “easiest to build and maintain” is determined by human efforts to build new technologies, cultural norms, and forms of governance.
For 11, there isn’t necessarily a clear consensus on what “exceptional” means or how to measure it, and ideas about what it is are often not reliably predictive. Furthermore, organizations are extremely risk averse in hiring, and there are understandable reasons for this: they’re thinking about how best to fill a specific role with someone on whom they will take a costly bet. But this is rather different from thinking about how to make the most impactful use of each applicant’s talent. So I wouldn’t be surprised if even many talented people cannot find roles indefinitely, for a variety of reasons: 1) the right orgs don’t exist yet, 2) funder market lag, 3) difficulty finding opportunities to prove their competence in the first place (doing well on work tests is a positive sign, but it’s often not enough for hiring managers to hire on that alone), etc.
On top of that, there’s a bit of a hype cycle for different things within causes like AI safety (there was an interp phase, followed by a model evals phase, etc.). Someone who didn’t fit ideas of what was needed in the interpretability phase may have ended up a much better fit for model evals work when it started catching on, or for finding some new area to develop.

For 12, I think it’s a mistake to bound everyone’s potential here. There are certainly some people who live far more selflessly, and people who come much closer to that through their own efforts. Foreclosing that possibility is pretty different from accepting where one currently is and doing the best one can each day.
Would be curious to hear more. I’m interested in doing more independent projects in the near future but am not sure how they’d be feasible.
What do you think is causing the ball to be dropped?
Yes, what you are scaling matters just as much as the fact that you are scaling. So now developers are scaling RL post-training and pretraining using higher quality synthetic data pipelines. If the point is just that training on average internet text provides diminishing returns in many real-world use cases, then that seems defensible; that certainly doesn’t seem to be the main recipe any company is using to push the frontier right now. But people often seem to mistake this for something stronger, like “all training is now facing insurmountable barriers to continued real world gains” or “scaling laws are slowing down across the board” or “it didn’t produce significant gains on meaningful tasks, so scaling is done.” I mentioned SWE-Bench because it seems to show significant real-world utility improvements rather than a trivial decrease in prediction loss. I also don’t think it’s clear that there is such an absolute separation here: to model the data you have to model the world in some sense. If you continue feeding multimodal LLM agents the right data in the right way, they continue improving on real-world tasks.
Shouldn’t we be able to point to some objective benchmark if GPT-4.5 was really off trend? It got 10x the SWE-Bench score of GPT-4, which seems like solid evidence that additional pretraining continued to produce the same magnitude of improvements as previous scale-ups. And if there were now even more efficient ways to improve capabilities, like RL post-training on smaller o-series models, why would you expect OpenAI not to focus their efforts there instead? RL was producing gains and hadn’t been scaled as much as self-supervised pretraining, so it was obvious where to invest marginal dollars. GPT-5 is better and faster than 4.5. That doesn’t mean pretraining suddenly stopped working or fell off trend from scaling laws, though.
Maybe or maybe not—people also thought we would run out of training data years ago. But that has been pushed back and maybe won’t really matter given improvements in synthetic data, multimodal learning, and algorithmic efficiency.
It seems more likely that RL does actually allow LLMs to learn new skills.
RL + LLMs is still pretty new, but we already have clear signs that it exhibits scaling laws with the right setup, just like self-supervised pretraining. This time the curves appear to be sigmoidal, probably saturating per policy, goal, or environment the models are trained on. It has been about a year since o1-preview, and maybe this was being worked on to some degree for about a year before that.
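For concreteness, here is a minimal sketch of what fitting such a per-environment sigmoidal curve could look like. The logistic functional form, the compute values, and the pass rates are all assumptions for illustration, not a published fit:

```python
# Minimal sketch (hypothetical numbers): fit a sigmoidal scaling curve, assuming
# per-environment pass rate saturates as a logistic function of log RL compute.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_compute, ceiling, slope, midpoint):
    """Pass rate as a logistic function of log10(RL compute)."""
    return ceiling / (1.0 + np.exp(-slope * (log_compute - midpoint)))

# Hypothetical (log-compute, pass-rate) observations for ONE task environment.
log_compute = np.array([20.0, 21.0, 22.0, 23.0, 24.0])  # illustrative log10 FLOPs
pass_rate = np.array([0.05, 0.15, 0.45, 0.70, 0.78])    # made-up eval scores

params, _ = curve_fit(sigmoid, log_compute, pass_rate, p0=[0.8, 1.0, 22.0])
ceiling, slope, midpoint = params
print(f"estimated ceiling={ceiling:.2f}, slope={slope:.2f}, midpoint={midpoint:.1f}")
```

The interesting parameters would be the estimated ceiling and where the midpoint sits relative to current compute, which is what distinguishes “saturating per environment” from an unbounded power law.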
The Grok chart contains no numbers, which is strange enough that I don’t think you can conclude much from it beyond “we used more RL than last time.” It also seems likely that they aren’t yet as efficient as OpenAI and DeepMind, who have been in the RL game much longer with projects like AlphaZero and AlphaStar.
Humanity AI Commits $500 million to AI and Democracy Protection, AI x Security, and more
I think this is an interesting vision for reinvigorating things, and I do sometimes feel that “principles first” has been conflated with just “classic EA causes.”
To me, “PR speak” =/= clear effective communication. I think the lack of a clear, coherent message is most of what bothers people, especially during and after a crisis. Without that, it’s hard to talk to different people and meet them where they’re at. It’s not clear to me what the takeaways were or if anyone learned anything.
I feel like “figuring out how to choose leaders and build institutions effectively” is really neglected and it’s kind of shocking there doesn’t seem to be much focus here. A lingering question for me has been “Why can’t we be more effective in who we trust?” and the usual objections sort of just seem like “it’s hard.” But so is AI safety, biorisk, post-AGI prep, etc… so that doesn’t seem super satisfying.
Ok, thanks for the details. Off the top of my head I can think of multiple people interested in AI safety who probably fit these (though the descriptions could still be operationalized more concretely) and fall into categories such as: founders/cofounders, several years of experience in operations analytics and management, several years of experience in consulting, multiple years of experience in events and community building/management. Some want to stay in Europe, some have families, but overall I don’t recall them being super constrained.
What does “strong non-technical founder-level operator talent” actually mean, concretely? I feel like I see lots of strong people struggle to find any role in the space.
Would be curious to hear your thoughts on this as one strategy for eliciting better plans.
Do you have ideas about how we could get better plans?
Contest for Better AGI Safety Plans
I’ve actually been working on how to get better AI safety plans, so I would be keen to chat with anyone who is interested in this. I think the best plan so far (covering the alignment side) is probably Google’s [2504.01849] An Approach to Technical AGI Safety and Security. On the governance side, one of the most detailed is probably AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions.
I’ve been thinking about coup risks more lately so would actually be pretty keen to collaborate or give feedback on any early stuff. There isn’t much work on this (for example, none at RAND as far as I can tell).
I think EAs have frequently suffered from a lack of expertise, which causes pain in areas like politics. Almost every EA and AI safety person was way off on the magnitude of change a Trump win would create: gutting USAID easily dwarfs all of EA global health by orders of magnitude. Basically no one took this seriously as a possibility, or at least I don’t know of anyone. And it’s not like you’d normally be incentivized to plan for abrupt major changes to a longstanding status quo in the first place.
Oversimplification of neglectedness has definitely been an unfortunate meme for a while. Sometimes things are too neglected to make progress on, or don’t make sense for your skill set, or are neglected for a reason, or are just less impactful. To a lesser extent, I think there has also been some misuse/misunderstanding of counterfactual thinking where Shapley values would be more appropriate. Or being overly optimistic that “our few-week fellowship can very likely change someone’s entrenched career path” when participants haven’t strongly shown that as their reason for participating.

Definitely agree we have a problem with deference/not figuring things out. It’s hard, and there’s lots of imposter syndrome where people think they aren’t good enough to do this or to try to do it. I think sometimes people get early negative feedback and over-update, dropping projects before they’ve tested things enough to see results. I would definitely like to see more rigorous impact evaluation in the space; at one point I wanted to start an independent org that did this. It seems surprisingly underprioritized. There’s a meme that EAs like to think and research and just need to do more things, but I think that’s a bit of a false dichotomy: on net, more research + iteration is valuable and amplifies your effectiveness by making sure you’re prioritizing the right things in the right ways.
Another way deference has negative effects: established orgs, including frontier AI companies, act as whirlpools that suck up all the talent and offer more “legitimacy,” but I think they’re often not the highest impact thing you could do. Often there is something that would be impactful but won’t happen if you don’t do it. Or would happen worse, or way later. People also underestimate how much the org they work at will change how they think, what they think about, and what they want to do or are willing to give up. But finding alternatives can be tough: how many people really want to keep working as independent contractors with no benefits and no coworkers indefinitely? It’s adverse selection against impact. Sure, this level of competition might weed out some worse ideas, but it also weeds out good ones.
This makes me wonder if there could be good setups for evaluating AI systems as groups. You could have separate agent swarms in different sandboxes competing on metrics of safety and performance. The one that does better gets amplified. The agents may then have some incentive to enforce positive social norms for their group against things like sandbagging or deception. When deployed they might have not only individual IDs but group or clan IDs that tie them to each other and continue this dynamic.
Maybe there is some mechanism where membership gets shuffled around sometimes, the way alleles get reshuffled during recombination. Or traits of the systems could be shuffled, though that seems less clearly desirable. There are already algorithms that imitate genetic recombination, but this would be somewhat different. You could also potentially combine social group membership systems and trait recombination systems; a rough sketch of what one selection step might look like is below. Given the level of influence over the AIs, it might be somewhat closer to selective breeding in certain respects, but not entirely.
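Purely to make the shape of this concrete, here is a rough Python sketch. Every name, metric, weighting, and selection rule is invented, and `evaluate_clan` is a stand-in for real sandboxed evals:

```python
# Rough sketch only: "clans" of agents compete on combined safety + performance,
# better clans get amplified (cloned), and a fraction of membership is shuffled
# between clans, loosely analogous to recombination.
import random
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Agent:
    agent_id: str
    clan_id: str

@dataclass
class Clan:
    clan_id: str
    members: List[Agent] = field(default_factory=list)

def evaluate_clan(clan: Clan) -> Tuple[float, float]:
    """Stand-in for running a clan in its own sandbox and scoring it."""
    safety = random.random()       # e.g. inverse rate of flagged deception/sandbagging
    performance = random.random()  # e.g. task success rate
    return safety, performance

def clan_score(clan: Clan) -> float:
    safety, performance = evaluate_clan(clan)
    return 0.5 * safety + 0.5 * performance  # arbitrary weighting

def step(clans: List[Clan], shuffle_frac: float = 0.1) -> List[Clan]:
    # Amplify the better-scoring half of the clans by cloning them.
    ranked = sorted(clans, key=clan_score, reverse=True)
    top = ranked[: max(1, len(ranked) // 2)]
    survivors: List[Clan] = []
    for clan in top:
        survivors.append(clan)
        clone_id = clan.clan_id + "_copy"
        survivors.append(Clan(clone_id, [Agent(a.agent_id + "_copy", clone_id)
                                         for a in clan.members]))

    # Shuffle a fraction of members between surviving clans ("allele"-style reshuffling).
    for clan in survivors:
        for agent in list(clan.members):
            if random.random() < shuffle_frac:
                new_clan = random.choice(survivors)
                clan.members.remove(agent)
                agent.clan_id = new_clan.clan_id
                new_clan.members.append(agent)
    return survivors

# Example: four starting clans of three agents each, one selection step.
clans = [Clan(f"clan{i}", [Agent(f"a{i}{j}", f"clan{i}") for j in range(3)])
         for i in range(4)]
clans = step(clans)
```

The point of the group ID persisting through deployment would be that an agent’s evaluation fate stays coupled to its clan’s behavior, which is what creates the incentive to police norms within the group.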
Seems important to check whether the people actually hired fit those experience requirements or have more experience. If the roles are very competitive, the experience of those hired could be much higher than what is listed.