Joe is a former Reliability Engineer who pivoted careers in an effort to make AI go well. He’s taught courses in AI governance at BlueDot Impact and consulted with the Center for AI Risk Management and Alignment. He now works on the communications team at the Machine Intelligence Research Institute.
Joe Rogero
Red vs blue: The parable of the feud within a feud
You appear to be arguing against a position I do not actually hold. I support many forms of alignment work. I expect most avenues to fail yet still be worth exploring, and I expect many of them to advance capabilities very little or not at all. Much of interpretability seems like this. Even granting that some forms of interpretability may be used to train smarter models, the deeper understanding of those models still (probably) seems worth the price.
A subset of work labeled “alignment” or “safety” seems fundamentally flawed and at best a waste of resources. It might be aimed at the wrong things, like Elon Musk’s plan to make AI that “seeks truth”; or it might be falsely labeled “alignment” while actually doing something else, like attempts to get AI to refuse to talk about [insert politicized issue here]. Here I am deliberately picking obvious examples as an existence proof of this category; one may reasonably disagree as to what other forms of research qualify.
I do think there is a further kind of research that simply advances capabilities while doing little for alignment, e.g. attempts to automate AI research itself. In such cases the tradeoff seems overwhelmingly lopsided against proceeding. The faster you build it, the sooner you die.
Alignment and capabilities do not cancel out, any more than medicine cancels poison. It is still a bad idea to drink a cup containing mostly poison.
I of course cannot speak for the reasoning of PauseAI leadership, and can only make an informed guess as to their stance on the tradeoffs involved.
The broader context, as I understand it, was that many forms of technical safety work either (a) are capabilities work masquerading as safety or (b) are too dependent on the labs for compute and model access, and thus vulnerable to industry capture. Without wholly endorsing this frame, I do think it is a reasonable concern, and it accurately describes at least some work labeled as safety.
Reflections on PauseCon 2026
A lesson in courage from science camp
A conversation on concentration of power
Announcing my retirement to a life of entirely failing to desperately seek renewed meaning
I agree that doing things takes time. If someone does not have the slack in their life to do anything other than scrape by, I don’t recommend they force themselves. (I do recommend they call a representative about stopping the AI race. That takes mere minutes.) It’s not healthy to try to shoulder the world’s burdens when one’s knees are already buckling. This post is for everyone else.
If any org I’ve worked for meaningfully “controlled the narrative”, the world would look very different than it does. The narrative, such as it is, does not look very controlled to me.
I have seen many good people make changes happen simply by doing good work on their own time. Does this require slack, runway, and no small amount of luck? Sure. Do good and competent people have less reach than a sane and functional society would afford them? Probably. But one does have to actually take shots on goal in order to score, even when most of them miss, and that’s no less true for sounding vaguely like something out of a self-help book.
If you truly believe that impressing some gatekeeping organization is necessary to doing good work, then by all means set out to impress them. Sometimes it’s indeed necessary; for instance, I don’t see an international halt to AI development arriving without someone getting the U.S. government on board.
But I’ve taken direction from a high school dropout. The credentials bar is lower than you think.
We do not live by course alone
I felt like most of the counterarguments that I see in the wild (e.g. from people on Twitter, who are mostly much more informed about AI than the audience of this book) were left unaddressed. I have no idea whether the authors’ prioritization of counterarguments was right for that audience, and I do think it would be handy to have a version of this book somewhat more appropriate for AI twitter people.
PSA: The online resources do indeed contain quite a few counter-counterarguments that didn’t fit into the book. (Buck probably knows this already; some readers might not.)
So You Want to Work at a Frontier AI Lab
What We Can Do to Prevent Extinction by AI
The primary benefit I’m imagining is a single well-placed whistleblower positioned to publicly sound the alarm on a particularly obvious and immediate threat, perhaps related to CBRN capabilities. A better answer requires a longer post, which is in the works but may take a while.
Honestly, this writeup did update me somewhat in favor of having at least a few competent, safety-conscious people working at major labs, if only so the safety movement retains some visibility into what’s going on inside them if/when secrecy grows. The marginal extra researcher going to Anthropic, though? Probably not.
E.g. Ajeya’s median estimate is 99% automation of fully-remote jobs in roughly 6-8 years, 5+ years earlier than her 2023 estimate.
This seems more extreme than the linked comment suggests? I can’t find anything in the comment justifying “99% automation of fully-remote jobs”.
Frankly, I think we get ASI and everyone dies before we get anything like 99% automation of current remote jobs, due to bureaucratic inertia and slow adoption. Automation of AI research comes first on the jagged frontier. I don’t think Ajeya disagrees?
Cost, Not Sacrifice
Registrations Open for 2024 NYC Secular Solstice & Megameetup
It’s often in the nature of thought experiments to reduce complicated things to simple choices. In reality, humans rarely know enough to do an explicit expected value (EV) calculation about a decision correctly. EV can still serve as a guiding ideal: “this seems like a poor trade of EV” is a red flag in the same way that “oh, I notice I could be Dutch booked by this set of preferences” is a sign there may be a flaw in our thinking somewhere.
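For concreteness, here’s a minimal sketch of what being Dutch booked looks like, using hypothetical items and a made-up trading fee: an agent with cyclic preferences will happily pay for every trade around the cycle and end up exactly where it started, only poorer.

```python
# Hypothetical money pump: the agent prefers B to A, C to B, and A to C.
# Each "upgrade" costs a small fee, so walking the cycle loses money forever.

def dutch_book(preferences, start, fee=1, rounds=6):
    """Trade the agent up its preference cycle, charging `fee` per trade."""
    holding, paid = start, 0
    for _ in range(rounds):
        holding = preferences[holding]  # the item it prefers to its current one
        paid += fee                     # it pays for the privilege
    return holding, paid

cycle = {"A": "B", "B": "C", "C": "A"}  # cyclic (intransitive) preferences
final, total = dutch_book(cycle, "A")
print(f"Ends holding {final} after paying {total}")  # holding A again, 6 poorer
```

Real cases are subtler, but the red flag has the same shape: a set of preferences that lets someone route you through trades you’d each individually accept, at a guaranteed net loss.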
As Ande T. points out, choosing red in a world where you know that most people choose red is not really selfish. Even altruists are not called upon by their values to die for no gain. (I don’t claim that’s the world we live in, of course.)