Interested in AI safety talent search and development.
Peter
This makes me wonder if there could be good setups for evaluating AI systems as groups. You could have separate agent swarms in different sandboxes competing on metrics of safety and performance. The one that does better gets amplified. The agents may then have some incentive to enforce positive social norms for their group against things like sandbagging or deception. When deployed they might have not only individual IDs but group or clan IDs that tie them to each other and continue this dynamic.
Maybe there is some mechanism where membership gets shuffled around occasionally, the way alleles get shuffled through genetic recombination. Or traits of the systems could be recombined, though that seems less clearly desirable. There are already algorithms that imitate genetic recombination, but this would be somewhat different. You could also potentially combine social group membership systems with trait recombination systems. Given the level of influence we’d have over the AIs, it might be somewhat closer to selective breeding in certain respects, but not entirely.
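To make the shape of this concrete, here is a toy sketch of a group-level selection loop with occasional membership shuffling. Everything in it (the group sizes, the scoring rule, the shuffle rate, the "skill"/"deception" metrics) is invented purely for illustration; it is not a real evaluation protocol.

```python
import random

# Toy sketch of group-level selection with occasional membership shuffling.
# All numbers and metrics here are made up for illustration.

NUM_GROUPS = 4
GROUP_SIZE = 5
SHUFFLE_RATE = 0.2  # chance per round that a group swaps a member with another group


def make_agent(group_id, i):
    # "skill" and "deception" are stand-ins for whatever the sandbox actually measures.
    return {"id": f"{group_id}-{i}", "skill": random.random(), "deception": random.random()}


def evaluate_group(group):
    # Stand-in for a sandboxed eval: reward aggregate performance, but penalize the
    # whole group for its worst deceiver, so members have a reason to police it.
    return sum(a["skill"] for a in group) - max(a["deception"] for a in group)


groups = [[make_agent(g, i) for i in range(GROUP_SIZE)] for g in range(NUM_GROUPS)]

for round_num in range(10):
    scores = [evaluate_group(g) for g in groups]
    best = max(range(NUM_GROUPS), key=scores.__getitem__)
    worst = min(range(NUM_GROUPS), key=scores.__getitem__)
    # Amplify the best group by copying it over the worst one.
    groups[worst] = [dict(a) for a in groups[best]]
    # Occasionally shuffle membership between groups, loosely like recombination.
    for group in groups:
        if random.random() < SHUFFLE_RATE:
            other = random.choice(groups)
            i, j = random.randrange(GROUP_SIZE), random.randrange(GROUP_SIZE)
            group[i], other[j] = other[j], group[i]
```

The group-level penalty for the worst deceiver is the part doing the work here: it is what would give members an incentive to enforce norms against sandbagging or deception.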
I’m not sure it’s the case that the policies have mostly been worked out but just not implemented. Figuring out technical AI governance solutions seems like a big part of what is needed.
That’s a really broad question though. If you asked something more specific, like which system unlocked the most real-world value in coding, people would probably say the jump to a more recent model like o3-mini or Gemini 2.5.
You could similarly argue the jump from infant to toddler is much more profound in terms of general capabilities than the jump from college student to PhD, but the latter is more relevant in terms of unlocking new research tasks that can be done.
States and corporations rely on humans. They have no incentive to get rid of them. AGI would mean you don’t have to rely on humans. So AGIs or people using AGIs might not care about humans anymore or even see them as an obstacle.
States and corporations aren’t that monolithic; they are full of competing factions and people who often fail to coordinate or behave rationally. AGI will probably be much better at strategizing and coordinating.
States and corporations are constrained by the balance of power with other states/corporations/actors. Superhuman AIs might not have this problem if they exceed all of humanity combined, or they might think they have more in common with each other than with humans.
Hmm, maybe it could still be good to try things in case timelines are a bit longer or an unexpected opportunity arises? For example, what if you thought it was 2 years but it’s actually 3-5?
So it seems like you’re saying there are at least two conditions: 1) someone with enough resources would have to want to release a frontier model with open weights, maybe Meta or a very large coalition of the open-source community if distributed training continues to scale, and 2) it would need at least enough dangerous-capability mitigations, like unlearning and tamper-resistant weights or cloud inference monitoring, or be far enough behind the frontier that governments don’t try to stop it. Does that seem right? What do you think is the likely price range for AGI?
I’m not sure the government is moving fast enough, or is interested in trying to lock down the labs too much, given that it might slow them down more than it increases their lead, or that they don’t fully buy into the risk arguments for now. I’m not sure what the key factors to watch here are. I expected reasoning systems next year, but even open-weight ones around o1-preview level were released this year, just a few weeks afterward, indicating that multiple parties are pursuing similar lines of AI research somewhat independently.
This is a thoughtful post, so it’s unfortunate it hasn’t gotten much engagement here. Do you have cruxes around the extent to which centralization is favorable or feasible? It seems like small models that could be run on a phone or laptop (~50GB) are becoming quite capable, and decentralized training runs work for 10-billion-parameter models, which are close to that size range. I don’t know its exact size, but Gemini Flash 2.0 seems much better than I would have expected a model of that size to be in 2024.
Do you think there’s a way to tell the former group apart from people who are closer to your experience (hearing earlier would be beneficial)?
Interesting. People probably aren’t at peak productivity, or even working at all, for some part of those hours, so you could probably cut the effective hours by about a quarter. This narrows the gap between what GPT2030 can achieve in a day and what all humans can achieve together.
Assuming 9 billion people work 8 hours a day, that’s ~8.22 million years of work per day. But given slowdowns in productivity throughout the day, we might want to round that down to ~6 million years.
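Spelling out the arithmetic (the population and hours figures are just the round assumptions above):

```python
# Rough arithmetic behind the figures above; all inputs are round assumptions.
people = 9e9               # assumed working population
hours_per_day = 8
hours_per_year = 24 * 365  # 8,760

work_years_per_day = people * hours_per_day / hours_per_year
print(f"~{work_years_per_day / 1e6:.2f} million work-years per day")  # ~8.22

# Discounting by roughly a quarter for time spent below peak productivity:
print(f"~{work_years_per_day * 0.75 / 1e6:.2f} million after the discount")  # ~6.16
```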
Additionally, GPT2030 might be more effective than even the best human workers at their peak hours. If it’s 3x as good as a PhD student at learning, which it might be because of better retention and connections, it would be learning more than all PhD students in the world every day. The quality of its work might be 100x or 1000x better, which is difficult to compare abstractly. In some tasks, like clearing rubble, more work time might translate fairly directly into outcomes.
With things like scientific breakthroughs, more time might not result in proportionally more breakthroughs. From that perspective, GPT2030 might still end up doing more work than all of humanity, since huge breakthroughs are uncommon.
This is a pretty interesting idea. I wonder if what we perceive as clumps of ‘dark matter’ might be or contain silent civilizations shrouded from interference.
Maybe there is some kind of defense dominant technology or strategy that we don’t yet comprehend.
Interesting post. I particularly appreciated the part about how Szilard’s silence didn’t really affect Germany’s technological development. This was recently mentioned in Leopold Aschenbrenner’s manifesto as an analogy for why secrecy is important, but I guess it wasn’t that simple. I wonder how many other analogies in there and elsewhere don’t quite hold. Could be a useful analysis if anyone has the background or is interested.
Huh, I had no idea this existed.
I think it’s good to critically interrogate this kind of analysis. I don’t want to discourage that. But as someone who publicly expressed skepticism about Flynn’s chances, I think there are several differences that mean it warrants closer consideration. The polls are much closer for this race, Biden is well known and experienced at winning campaigns, and the differences between the candidates in this race seem much larger. Based on that it at least seems a lot more reasonable to think Biden could win and that it will be a close race worth spending some effort on.
Interesting. Are there any examples of what we might consider relatively small policy changes that received huge amounts of coverage? Like something people normally wouldn’t care about. Maybe these would be informative to look at compared to more hot-button issues like abortion that tend to get a lot of coverage. I’m also curious whether any big issues somehow got less attention than expected, and how that looks for pass/fail margins compared to other states where they got more attention. There are probably some ways to estimate this that are better than others.
I see.
I was interpreting it as “a referendum increases the likelihood of the policy existing later.” My question is about the assumptions that lead to this view and the idea that it might be more effective to run a campaign for a policy ballot initiative once and never again. Is this estimate of the referendum effect only for the exact same policy (maybe an education tax but the percent is slightly higher or lower) or similar policies (a fee or a subsidy or voucher or something even more different)? How similar do they have to be? What is the most different policy that existed later that you think would still count?
“Something relevant to EAs that I don’t focus on in the paper is how to think about the effect of campaigning for a policy, given that I focus on the effect of passing one conditional on its being proposed. It turns out there’s a method (Cellini et al. 2010) for backing this out if we assume that the effect of passing a referendum on whether the policy is in place later is the same on your first try as on your Nth try. Using this method yields an estimate of the effect of running a successful campaign on later policy of around 60% (Appendix Figure D20).”
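To make the flavor of that assumption concrete, here is a toy simulation I put together (my own illustration with made-up numbers, not the actual Cellini et al. 2010 estimator): a naive comparison of first-vote winners vs. losers is attenuated because losers often pass later, and the constant-effect assumption is what lets you rescale it.

```python
import random

# Toy Monte Carlo of the flavor of that back-out (my own illustration with
# made-up numbers, NOT the actual Cellini et al. 2010 estimator).
# Assumption: passing has the same effect on "policy in place later"
# whether it happens on the first try or a later try.

random.seed(0)
N = 200_000
PASS_PROB = 0.5        # chance the first vote passes (arbitrary)
RETRY_PASS_PROB = 0.4  # chance a first-time loser eventually passes later (arbitrary)
TRUE_EFFECT = 0.5      # effect of ever passing on the policy being in place later

first_winners, first_losers = [], []
for _ in range(N):
    passed_first = random.random() < PASS_PROB
    ever_passed = passed_first or (random.random() < RETRY_PASS_PROB)
    in_place_later = random.random() < (TRUE_EFFECT if ever_passed else 0.0)
    (first_winners if passed_first else first_losers).append(in_place_later)

naive_diff = sum(first_winners) / len(first_winners) - sum(first_losers) / len(first_losers)
# First-vote losers often pass later, so the naive winner-vs-loser comparison
# is attenuated; the constant-effect assumption lets you rescale it.
recovered = naive_diff / (1 - RETRY_PASS_PROB)
print(f"naive first-vote difference: {naive_diff:.2f}")  # ~0.30
print(f"rescaled estimate:           {recovered:.2f}")   # ~0.50
```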
I’d be curious to hear about potential plans to address any of these, especially talent development and building the AI safety and governance pipeline.
Very interesting.
1. Did you notice an effect of how large/ambitious the ballot initiative was? I remember previous research suggesting consecutive piecemeal initiatives were more successful at creating larger change than singular large ballot initiatives.
2. Do you know how much the results vary by state?
3. How different do ballot initiatives need to be for the huge first advocacy effect to take place? Does this work as long as the policies are not identical or is it more of a cause specific function or something in between? Does it have a smooth gradient or is it discontinuous after some tipping point?
This is an inspiring amount of research. I really appreciate it and am enjoying reading it.
That’s a good point. Although 1) if people leave a company to go to one that prioritizes AI safety, then there are fewer workers at all the other companies who feel as strongly, so a union is less likely to improve safety there; 2) it’s common for workers to take action to improve safety conditions for themselves, and much less common for them to take action on issues that don’t directly affect their own work, such as air pollution or carbon pollution; and 3) if safety-inclined people become tagged as wanting to just generally slow down the company, then hiring teams will likely start filtering out many of the most safety-minded people.
I’ve been thinking about coup risks more lately so would actually be pretty keen to collaborate or give feedback on any early stuff. There isn’t much work on this (for example, none at RAND as far as I can tell).
I think EAs have frequently suffered from a lack of expertise, which causes pain in areas like politics. Almost every EA and AI safety person was way off on the magnitude of change a Trump win would create: gutting USAID dwarfs all of EA global health by orders of magnitude. Basically no one took this seriously as a possibility, or at least I don’t know of anyone. And it’s not like you’d normally be incentivized to plan for abrupt major changes to a longstanding status quo in the first place.
Oversimplification of neglectedness has definitely been an unfortunate meme for a while. Sometimes things are too neglected to make progress on, don’t make sense for your skillset, are neglected for a reason, or are just less impactful. To a lesser extent, I think there has been some misuse/misunderstanding of counterfactual thinking as well, rather than something like Shapley values. There is also being overly optimistic, e.g. “our few-week fellowship can very likely change someone’s entrenched career path,” when participants haven’t strongly shown that as their purpose for participating.
Definitely agree we have a problem with deference/not figuring things out. It’s hard, and there’s lots of imposter syndrome where people think they aren’t good enough to do this or even to try. I think sometimes people get early negative feedback and over-update, dropping projects before they’ve tested things enough to see results. I would definitely like to see more rigorous impact evaluation in the space; at one point I wanted to start an independent org that did this. It seems surprisingly underprioritized. There’s a meme that EAs like to think and research and need to just do more things, but I think that’s a bit of a false dichotomy: on net, more research plus iteration is valuable and amplifies your effectiveness by making sure you’re prioritizing the right things in the right ways.
Another way deference has negative effects is that established orgs, including frontier AI companies, act as whirlpools that suck up all the talent and offer more “legitimacy,” but I think they’re often not the highest-impact thing you could do. Often there is something that would be impactful but won’t happen if you don’t do it, or would happen worse, or way later. People also underestimate how much the org they work at will change how they think, what they think about, and what they want to do or are willing to give up. But finding alternatives can be tough: how many people really want to keep working as independent contractors with no benefits and no coworkers indefinitely? That creates adverse selection against impact. Sure, this level of competition might weed out some worse ideas, but it also weeds out good ones.