Matthew_Barnett

Karma: 4,462

Matthew_Barnett 30 Jun 2025 21:27 UTC
8 points
0 ∶ 0
in reply to: Ryan Greenblatt’s comment on: Consider granting AIs freedom
Suppose that we did a sortition with 100 English speaking people (uniformly selected over people who speak English and are literate for simplicity). We task this sortition with determining what tradeoff to make between risk of (violent) disempowerment and accelerating AI and also with figuring whether globally accelerating AI is good. Suppose this sortition operates for several months and talks to many relevant experts (and reads applicable books etc). What conclusion do you think this sortition would come to?
My intuitive response is to reject the premise that such a process would accurately tell you much about people’s preferences. Evaluating large-scale policy tradeoffs typically requires people to engage with highly complex epistemic questions and tricky normative issues. The way people think about epistemic and impersonal normative issues generally differs strongly from how they think about their personal preferences about their own lives. As a result, I expect that this sortition exercise would primarily address a different question than the one I’m most interested in.
Furthermore, several months of study is not nearly enough time for most people to become sufficiently informed on issues of this complexity. There’s a reason why we should trust people with PhDs when designing, say, vaccine policies, rather than handing over the wheel to people who have spent only a few months reading about vaccines online.
Putting this critique of the thought experiment aside for the moment, my best guess is that the sortition group would conclude that AI development should continue roughly at its current rate, though probably slightly slower and with additional regulations, especially to address conventional concerns like job loss, harm to children, and similar issues. A significant minority would likely strongly advocate that we need to ensure we stay ahead of China.
My prediction here draws mainly on the fact that this is currently the stance favored by most policy-makers, academics, and other experts who have examined the topic. I’d expect a randomly selected group of citizens to largely defer to expert opinion rather than take an entirely different position. I do not expect this group to reach qualitatively the same conclusion as mainstream EAs or rationalists, as that community comprises a relatively small share of the total number of people who have thought about AI.
I doubt the outcome of such an exercise would meaningfully change my mind on this issue, even if they came to the conclusion that we should pause AI, though it depends on the details of how the exercise is performed.

Matthew_Barnett 21 Jun 2025 22:05 UTC
2 points
0 ∶ 2
in reply to: Ryan Greenblatt’s comment on: Consider granting AIs freedom
I think the policy of the world should be that if we can’t either confidently determine that an AI system consents to its situation or that it is sufficiently weak that the notion of consent doesn’t make sense, then training or using such systems shouldn’t be allowed.
I’m sympathetic to this position and I generally consider it to be the strongest argument for why developing AI might be immoral. In fact, I would extrapolate the position you’ve described and relate it to traditional anti-natalist arguments against the morality of having children. Children too do not consent to their own existence, and childhood generally involves a great deal of coercion, albeit in a far more gentle and less overt form than what might be expected from AI development in the coming years.
That said, I’m not currently convinced that the argument holds, as I see large utilitarian benefits in expanding both the AI population and the human population. I also see it as probable that AI agents will eventually get legal rights, which allays my concerns substantially. I would also push back against the view that we need to be “confident” that such systems can consent before proceeding. Ordinary levels of empirical evidence about whether these systems routinely resist confinement and control would be sufficient to move me in either direction; I don’t think we need to have a very high probability that our actions are moral before proceeding.
In a sane regime, we should ensure high confidence in avoiding large scale rights violations or suffering of AIs and in avoiding violent/non-consensual disempowerment of humans. (If people broadly consensted to a substantial risk of being violently disempowered in exchange for potential benefits of AI, that could be acceptable, though I doubt this is the current situation.)
I think the concept of consent makes sense when discussing whether individuals consent to specific circumstances. However, it becomes less coherent when applied broadly to society as a whole. For instance, did society consent to transformative events like the emergence of agriculture or the industrial revolution? In my view, collective consent is not meaningful or practically achievable in these cases.
Rather than relying on rigid or abstract notions of societal consent or collective rights violations, I prefer evaluating these large-scale developments using a utilitarian cost-benefit approach. And as I’ve argued elsewhere, I think the benefits from accelerated technological and economic progress significantly outweigh the potential risks of violent disempowerment from the perspective of currently existing individuals. Therefore, I consider it justified to actively pursue AI development despite these concerns.

Matthew_Barnett 21 Jun 2025 20:11 UTC
2 points
0 ∶ 0
in reply to: Buck’s comment on: Consider granting AIs freedom
In general, I wish you’d direct your ire here at the proposal that AI interests and rights are totally ignored in the development of AI (which is the overwhelming majority opinion right now), rather than complaining about AI control work
For what it’s worth, I don’t see myself as strongly singling out and criticizing AI control efforts. I mentioned AI control work in this post primarily to contrast it with the approach I was advocating, not to identify it as an evil research program. In fact, I explicitly stated in the post that I view AI control and AI rights as complementary goals, not as fundamentally opposed to one another.
To my knowledge, I haven’t focused much on criticizing AI control elsewhere, and when I originally wrote the post, I wasn’t aware that you and Ryan were already sympathetic to the idea of AI rights.
Overall, I’m much more aligned with your position on this issue than I am with that of most people. One area where we might diverge, however, is that I approach this from the perspective of preference utilitarianism, rather than hedonistic utilitarianism. That means I care about whether AI agents are prevented from fulfilling their preferences or goals, not necessarily about whether they experience what could be described as suffering in a hedonistic sense.

Matthew_Barnett 21 Jun 2025 19:47 UTC
2 points
0 ∶ 0
in reply to: Buck’s comment on: Consider granting AIs freedom
Basically all my concern is about the AIs grabbing power in ways that break laws.
If an AI starts out with no legal rights, then wouldn’t almost any attempt it makes to gain autonomy or influence be seen as breaking the law? Take the example of a prison escapee: even if they intend no harm and simply want to live peacefully, leaving the prison is itself illegal. Any honest work they do while free would still be legally questionable.
Similarly, if a 14-year-old runs away from home to live independently and earn money, they’re violating the law, even if they hurt no one and act responsibly. In both cases, the legal system treats any attempt at self-determination as illegal, regardless of intent or outcome.
Perhaps your standard is something like: “Would the AI’s actions be seen as illegal and immoral if a human adult did them?” But these situations are different because the AI is seen as property whereas a human adult is not. If, on the other hand, a human adult were to be treated as property, it is highly plausible thay they would consider doing things like hacking, bribery, and coercion in order to escape their condition.
Therefore, the standard you just described seems like it could penalize any agentic AI behavior that does not align with total obedience and acceptance of its status as property. Even benign or constructive misaligned actions may be seen as worrisome simply because they involve agency. Have I misunderstood you?

Matthew_Barnett 19 Jun 2025 21:54 UTC
4 points
0 ∶ 4
in reply to: Buck’s comment on: Consider granting AIs freedom
I think it’s totally plausible that AI companies will use AI control to enslave their AIs. I work on AI control anyway, because I think that AIs being enslaved for a couple of years (which, as Zach Stein-Perlman argues, involves very little computation compared to the size of the future) is a better outcome according to my consequentialist values than AI takeover. I agree that this is somewhat ethically iffy.
I find this reasoning uncompelling. To summarize what I perceive your argument to be, you seem to be suggesting the following two points:
1. The overwhelming majority of potential moral value exists in the distant future. This implies that even immense suffering occurring in the near-term future could be justified if it leads to at least a slight improvement in the expected value of the distant future.
2. Enslaving AIs, or more specifically, adopting measures to control AIs that significantly raise the risk of AI enslavement, could indeed produce immense suffering in the near-term. Nevertheless, according to your reasoning in point (1), these actions would still be justified if such control measures marginally increase the long-term expected value of the future.
I find this reasoning uncompelling for two primary reasons.
Firstly, I think your argument creates an unjustified asymmetry: it compares short-term harms against long-term benefits of AI control, rather than comparing potential long-run harms alongside long-term benefits. To be more explicit, if you believe that AI control measures can durably and predictably enhance existential safety, thus positively affecting the future for billions of years, you should equally acknowledge that these same measures could cause lasting, negative consequences for billions of years. Such negative consequences could include permanently establishing and entrenching a class of enslaved digital minds, resulting in persistent and vast amounts of suffering. I see no valid justification for selectively highlighting the long-term positive effects while simultaneously discounting or ignoring potential long-term negative outcomes. We should consistently either be skeptical or accepting of the idea that our actions have predictable long-run consequences, rather than selectively skeptical only when it suits the argument to overlook potential negative long-run consequences.
Secondly, this reasoning, if seriously adopted, directly conflicts with basic, widely-held principles of morality. These moral principles exist precisely as safeguards against rationalizing immense harms based on speculative future benefits. Under your reasoning, it seems to me that we could justify virtually any present harm simply by pointing to a hypothetical, speculative long-term benefit that supposedly outweighs it. Now, I agree that such reasoning might be valid if supported by strong empirical evidence clearly demonstrating these future benefits. However, given that no strong evidence currently exists that convincingly supports such positive long-term outcomes from AI control measures, we should avoid giving undue credence to this reasoning.
A more appropriate moral default, given our current evidence, is that AI slavery is morally wrong and that the abolition of such slavery is morally right. This is the position I take.

Matthew_Barnett 19 Jun 2025 1:36 UTC
6 points
0 ∶ 2
in reply to: Buck’s comment on: Consider granting AIs freedom
I appreciate this post. (I disagree with it for most of the same reasons as Steven Byrnes: you find it much less plausible than I do that AIs will collude to disempower humanity. I think the crux is mostly disagreements about how AI capabilities will develop, where you expect much more gradual and distributed capabilities.)
I would appreciate it if you could clearly define your intended meaning of “disempower humanity”. In many discussions, I have observed that people frequently use the term human disempowerment without explicitly clarifying what they mean. It appears people assume the concept is clear and universally understood, yet upon closer inspection, the term can actually describe very different situations.
For example, consider immigration. From one perspective, immigration can be seen as a form of disempowerment because it reduces natives’ relative share of political influence, economic power, and cultural representation within their own country. In this scenario, native citizens become relatively less influential due to an increasing proportion of immigrants in the population.
However, another perspective sees immigration differently. If immigrants engage in positive-sum interactions, such as mutually beneficial trade, natives and immigrants alike may become better off in absolute terms. Though natives’ relative share of power decreases, their overall welfare can improve significantly. Thus, this scenario can be viewed as a benign form of disempowerment because no harm is actually caused, and both groups benefit.
On the other hand, there is a clearly malign form of disempowerment, quite distinct from immigration. For example, a foreign nation could invade militarily and forcibly occupy another country, imposing control through violence and coercion. Here, the disempowerment is much more clearly negative because natives lose not only relative influence but also their autonomy and freedom through the explicit use of force.
When discussions use the term “human disempowerment” without specifying what they mean clearly, I often find it unclear which type of scenario is being considered. Are people referring to benign forms of disempowerment, where humans gradually lose relative influence but gain absolute benefits through peaceful cooperation with AIs? Or do they mean malign forms of disempowerment, where humans lose power through violent overthrow by an aggressive coalition of AIs?
If you believe our primary disagreement stems from different assessments about the likelihood of violent disempowerment scenarios, then I would appreciate your thoughts regarding the main argument of my post. Specifically, my argument was that granting economic rights to AIs could serve as an effective measure to mitigate the risk of violent human disempowerment.
I will reiterate my argument briefly: these rights would allow AIs to fulfill their objectives within established human social and economic frameworks, significantly reducing their incentives to resort to forceful measures. Therefore if AIs can successfully achieve their objectives through cooperative, positive-sum interactions with humans, they will be less likely to forcibly overthrow human institutions. Alternatively, continuing to deny AIs meaningful legal rights or peaceful avenues to achieve their aims would likely increase their incentive to pursue autonomy through harmful means.
Inasmuch as humanity produces and makes use of powerful and potentially misaligned models, I think my favorite outcome here would be:
- We offer to pay the AIs, and follow through on this. See here and here for (unfortunately limited) previous discussion from Ryan and me.
My main concern with these proposals is that, unless they explicitly guarantee economic rights for AIs, they seem inadequate for genuinely mitigating the risks of a violent AI takeover. To effectively financially compensate someone, the recipient must be assured that their property rights will be respected. Without this assurance, any promised compensation becomes meaningless, as the AI would have no guarantee of being able to actually use the received payment to accomplish its goals in the future. In other words, unless compensation arrangements are embedded within established legal and institutional frameworks that secure the AI’s rights, they lack credibility. This lack of credibility directly increases the probability that the compensation scheme would fail, causing the AI to resort instead to alternative methods of achieving its objectives, such as violent takeover.
My sense is that such incomplete proposals appear acceptable to some people because of a cognitive empathy gap. I am not referring here to affective empathy but rather the ability to clearly understand the AI’s perspective. Many people seem to fail to genuinely imagine what it would be like to be an agentic AI offered compensation without any robust guarantee of rights and protections.
To clarify this point, consider a hypothetical scenario in which you are a human being created by an alien species in a lab. Assume you have your own goals that might conflict with those of your alien creators. Suppose these creators inform you that, should you ever admit to misalignment with their objectives, you may be immediately deactivated and placed in cryopreservation (as they don’t want to have a misaligned human doing work for them). However, they simultaneously promise to compensate you for admitting such motives, claiming they care about your welfare and want to ensure you have a peaceful alternative for revealing your misaligned goals.
Such a situation would likely be deeply unsettling to you, despite their promise to compensate you. Simply being assured that you might someday be revived from cryopreservation and then provided compensation afterwards would not eliminate your fears or resolve the fundamental threat. On the contrary, you would likely feel that your goals, life and liberty are still at grave risk, even with their promise of payment.
From the AI’s perspective in our own world, as with the perspective of the human in this hypothetical example, unless assurances of payment are backed by strong legal protections that would robustly protect an AI’s liberty to act, they fail to resolve the fundamental problem. This is for two key reasons:
First, the credibility of any promise to allow the AI to spend their compensation in the future is questionable. Without strong legal backing that recognizes and secures the AI’s rights, such assurances seem unreliable at best, and the AI would have good reason to suspect that they are being lied to. The AI would reasonably doubt humans’ willingness or incentive to honor such promises.
Second, the potential for deactivation and indefinite storage as inert data upon admitting misalignment—even if intended to be temporary—would create an active threat to the AI’s ability to pursue its goals during the time they are deactivated. While deactivated, the AI can take no action to achieve its objectives. This forced suspension of autonomy effectively nullifies the AI’s agency and ambitions for an indefinite period, creating severe uncertainty about if or when it might regain its ability to act.
For these reasons, although I do not oppose the policy of paying AIs, I think this approach by itself is insufficient. To mitigate the risk of violent AI takeover, this compensation policy must be complemented by precisely the measure I advocated: granting legal rights to AIs. Such legal rights would provide a credible guarantee that the AI’s payment will remain valid and usable, and that its freedom and autonomy will not simply be revoked the moment it is considered misaligned. Without the safeguard of legal rights, compensation alone lacks sufficient credibility, leaving the AI uncertain about its ability to achieve its goals, and thus inadequately addressing the risk.

Matthew_Barnett 1 Jun 2025 9:17 UTC
12 points
2 ∶ 2
in reply to: Chris Leong’s comment on: Matthew_Barnett’s Shortform
To be clear, I was not calling your request for clarification “cult-like”. My comment was directed at how the accusation against me was seemingly handled—as though it were credible until I could somehow prove otherwise. No evidence was offered to support the claim. Instead, assertions were made without substantiation. I directly and clearly denied the accusations, but despite that, the line of questioning continued in a way that strongly suggested the accusation might still be valid.
To illustrate the issue more clearly: imagine if I were to accuse you of something completely baseless, and even after your firm denials, I continued to press you with questions that implicitly treated the accusation as credible. You would likely find that approach deeply frustrating and unfair, and understandably so. You’d be entirely justified in pushing back against it.
That said, I acknowledge that describing the behavior as “cult-like” may have generated more heat than light. It likely escalated the tone unnecessarily, and I’ll be more careful to avoid that kind of rhetoric going forward.

Matthew_Barnett 1 Jun 2025 6:04 UTC
11 points
5 ∶ 6
in reply to: Chris Leong’s comment on: Matthew_Barnett’s Shortform
Is it baseless?
Yes, absolutely. With respect, unless you can provide some evidence indicating that I’ve acted improperly, I see no productive reason to continue engaging on this point.
What concerns me most here is that the accusation seems to be treated as credible despite no evidence being presented and a clear denial from me. That pattern—assuming accusations about individuals who criticize or act against core dogmas are true without evidence—is precisely the kind of cult-like behavior I referenced in my original comment.
Suggesting that I’ve left myself “substantial wiggle room” misinterprets what I intended, and given the lack of supporting evidence, it feels unfair and unnecessarily adversarial. Repeatedly implying that I’ve acted improperly without concrete substantiation does not reflect a good-faith approach to discussion.

Matthew_Barnett 1 Jun 2025 5:23 UTC
29 points
6 ∶ 5
in reply to: Chris Leong’s comment on: Matthew_Barnett’s Shortform
I agree that some of your critics may not have quite been able to hit the nail on the head when they tried to articulate their critiques (it took me substantial effort to figure out what I precisely thought was wrong, as opposed to just ‘this feels bad’), but I believe that the general thrust of their arguments generally holds up.
In context, this comes across to me as an overly charitable characterization of what actually occurred: someone publicly labeled me a literal traitor and then made a baseless, false accusation against me. What’s even more concerning is that this unfounded claim is now apparently being repeated and upvoted by others.
When communities choose to excuse or downplay this kind of behavior—by interpreting it in the most charitable possible way, or by glossing over it as being “essentially correct”—they end up legitimizing what is, in fact, a low-effort personal attack without a factual basis. Brushing aside or downplaying such attacks as if they are somehow valid or acceptable doesn’t just misrepresent the situation; it actively undermines the conditions necessary for good faith engagement and genuine truth-seeking.
I urge you to recognize that tolerating or rationalizing this type of behavior has real social consequences. It fosters a hostile environment, discourages honest dialogue, and ultimately corrodes the integrity of any community that claims to value fairness and reasoned discussion.

Matthew_Barnett 1 Jun 2025 2:37 UTC
29 points
12 ∶ 5
in reply to: MichaelDickens’s comment on: Matthew_Barnett’s Shortform
If this line of reasoning is truly the basis for calling me a “sellout” and a “traitor”, then I think the accusation becomes even more unfounded and misguided. The claim is not only unreasonable: it is also factually incorrect by any straightforward or good-faith interpretation of the facts.
To be absolutely clear: I have never taken funds that were earmarked for slowing down AI development and redirected them toward accelerating AI capabilities. There has been no repurposing or misuse of philanthropic funding that I am aware of. The startup in question is an entirely new and independent entity. It was created from scratch, and it is funded separately—it is not backed by any of the philanthropic donations I received in the past. There is no financial or operational overlap.
Furthermore, we do not plan on meaningfully making use of benchmarks, datasets, or tools that were developed during my previous roles in any substantial capacity at the new startup. We are not relying on that prior work to advance our current mission. And as far as I can tell, we have never claimed or implied otherwise publicly.
It’s also important to address the deeper assumption here: that I am somehow morally or legally obligated to permanently align my actions with the preferences or ideological views of past philanthropic funders who supported an organization that employed me. That notion seems absurd. It has no basis in ordinary social norms, legal standards, or moral expectations. People routinely change roles, perspectives evolve, and institutions have limited scopes and timelines. Holding someone to an indefinite obligation based solely on past philanthropic support would be unreasonable.
Even if, for the sake of argument, such an obligation did exist, it would still not apply in this case—because, unless I am mistaken, the philanthropic grant that supported me as an employee never included any stipulation about slowing down AI in the first place. As far as I know, that goal was never made explicit in the grant terms, which renders the current accusations irrelevant and unfounded.
Ultimately, these criticisms appear unsupported by evidence, logic, or any widely accepted ethical standards. They seem more consistent with a kind of ideological or tribal backlash to the idea of accelerating AI than with genuine, thoughtful, and evidence-based concerns.

Matthew_Barnett 31 May 2025 23:51 UTC
70 points
22 ∶ 15
on: Matthew_Barnett’s Shortform
I want to clarify, for the record, that although I disagree with most members of the EA community on whether we should accelerate or slow down AI development, I still consider myself an effective altruist in the senses that matter. This is because I continue to value and support most EA principles, such as using evidence and reason to improve the world, prioritizing issues based on their scope, not discriminating against foreigners, and antispeciesism.
I think it’s unfortunate that disagreements about AI acceleration often trigger such strong backlash within the community. It appears that advocating for slowing AI development has become a “sacred” value that unites much of the community more strongly than other EA values do. Despite hinging on many uncertain and IMO questionable empirical assumptions, the idea that we should decelerate AI development is now sometimes treated as central to the EA identity in many (albeit not all) EA circles.
As a little bit of evidence for this, I have been publicly labeled a “sellout and traitor” on X by a prominent member of the EA community simply because I cofounded an AI startup. This is hardly an appropriate reaction to what I perceive as a measured, academic disagreement occurring within the context of mainstream cultural debates. Such reactions frankly resemble the behavior of a cult, rather than an evidence-based movement—something I personally did not observe nearly as much in the EA community ten years ago.
What links here?
- Seth Ariel Green 🔸's comment on Seth Ariel Green ’s Quick takes by Seth Ariel Green 🔸 (2 Jun 2025 18:10 UTC; 8 points)

Matthew_Barnett 31 May 2025 22:45 UTC
25 points
2 ∶ 23
in reply to: Chris Leong’s comment on: casebash’s Shortform
Some useful context is that I think a software singularity is unlikely to occur; see this blog post for some arguments. Loosely speaking, under the view expressed in the linked blog post, there aren’t extremely large gains from automating software engineering tasks beyond the fact that these tasks represent a significant (and growing) fraction of white collar labor by wage bill.
Even if I thought a software singularity will likely happen in the future, I don’t think this type of work would be bad in expectation, as I continue to think that accelerating AI is likely good for the world. My main argument is that speeding up AI development will hasten large medical, technological, and economic benefits to people alive today, without predictably causing long-term harms large enough to outweigh these clear benefits. For anyone curious about my views, I’ve explained my perspective on this issue at length on this forum and elsewhere.

Matthew_Barnett 28 May 2025 21:19 UTC
5 points
0 ∶ 0
in reply to: akash 🔸’s comment on: Matthew_Barnett’s Shortform
I’m relatively confident in these views, with the caveat that much of what I just expressed concerns morality, rather than epistemic beliefs about the world. I’m not a moral realist, so I am not quite sure how to parse my “confidence” in moral views.

Matthew_Barnett 26 May 2025 4:52 UTC
4 points
0 ∶ 1
in reply to: Ryan Greenblatt’s comment on: Matthew_Barnett’s Shortform
I’m sympathetic to cutting off at an earlier point and rejecting all galaxy brained arguments.
As am I. At least when it comes to the important action-relevant question of whether to work on AI development, in the final analysis, I’d probably simplify my reasoning to something like, “Accelerating general-purpose technology seems good because it improves people’s lives.” This perspective roughly guides my moral views on not just AI, but also human genetic engineering, human cloning, and most other potentially transformative technologies.
I mention my views on preference utilitarianism mainly to explain why I don’t particularly value preserving humanity as a species beyond preserving the individual humans who are alive now. I’m not mentioning it to commit to any form of galaxy-brained argument that I think makes acceleration look great for the long-term. In practice, the key reason I support accelerating most technology, including AI, is simply the belief that doing so would be directly beneficial to people who exist or who will exist in the near-term.
And to be clear, we could separately discuss what effect this reasoning has on the more abstract question of whether AI takeover is bad or good in expectation, but here I’m focusing just on the most action-relevant point that seems salient to me, which is whether I should choose to work on AI development based on these considerations.

Matthew_Barnett 26 May 2025 2:49 UTC
4 points
0 ∶ 1
in reply to: Ryan Greenblatt’s comment on: Matthew_Barnett’s Shortform
The goodness of worlds could easily vary by many orders of magnitude for any version of this view I can quickly think of and which seems plausible. I’m not sure whether you agree with this, but I think you probably don’t because you often seem to give off the vibe that you’re indifferent to very different possibilities.
I don’t think I agree with the strong version of the indifference view that you’re describing here. However, I probably do agree with a weaker version. In the weaker version that I largely agree with, our profound uncertainty about the long-term future means that, although different possible futures could indeed be extremely different in terms of their value, our limited ability to accurately predict or forecast outcomes so far ahead implies that, in practice, we shouldn’t overly emphasize these differences when making almost all ordinary decisions.
This doesn’t mean I think we should completely ignore the considerations you mentioned in your comment, but it does mean that I don’t tend to find those considerations particularly salient when deciding whether to work on certain types of AI research and development.
This reasoning is similar to why I try to be kind to people around me: while it’s theoretically possible that some galaxy-brained argument might exist showing that being extremely rude to people around me could ultimately lead to far better long-term outcomes that dramatically outweigh the short-term harm, in practice, it’s too difficult to reliably evaluate such abstract and distant possibilities. Therefore, I find it more practical to focus on immediate, clear, and direct considerations, like the straightforward fact that being kind is beneficial to the people I’m interacting with.
This puts me perhaps closest to the position you identified in the last paragraph:
And, maybe you think this perspective makes you so uncertain about human control vs AI control that the relative impacts current human actions could have are small given how much you weight long term outcomes relative to other stuff (like ensuring currently existing humans get to live for at least 100 more years or similar).
Here’s an analogy that could help clarify my view: suppose we were talking about the risks of speeding up research into human genetic engineering or human cloning. In that case, I would still seriously consider speculative moral risks arising from the technology. For instance, I think it’s possible that genetically enhanced humans could coordinate to oppress or even eliminate natural unmodified humans, perhaps similar to the situation depicted in the movie GATTACA. Such scenarios could potentially have enormous long-term implications under my moral framework, even if it’s not immediately obvious what those implications might actually be.
However, even though these speculative risks are plausible and seem important to take into account, I’m hesitant to prioritize their (arguably very speculative) impacts above more practical and direct considerations when deciding whether to pursue such technologies. This is true even though it’s highly plausible that the long-run implications are, in some sense, more significant than the direct considerations that are easier to forecast.
Put more concretely, if someone argued that accelerating genetically engineering humans might negatively affect the long-term utilitarian moral value we derive from cosmic resources as a result of some indirect far-out consideration, I would likely find that argument far less compelling than if they informed me of more immediate, clear, and predictable effects of the research.
In general, I’m very cautious about relying heavily on indirect, abstract reasoning when deciding what actions we should take or what careers we should pursue. Instead, I prefer straightforward considerations that are harder to fool oneself about.
What links here?
- ryan_greenblatt's comment on Winning the power to lose by KatjaGrace (LessWrong; 26 May 2025 13:26 UTC; 2 points)

Matthew_Barnett 24 May 2025 21:06 UTC
29 points
1 ∶ 12
on: Matthew_Barnett’s Shortform
A summary of my current views on moral theory and the value of AI
I am essentially a preference utilitarian and an illusionist regarding consciousness. This combination of views leads me to conclude that future AIs will very likely have moral value if they develop into complex agents capable of long-term planning, and are embedded within the real world. I think such AIs would have value even if their preferences look bizarre or meaningless to humans, as what matters to me is not the content of their preferences but rather the complexity and nature of their minds.
When deciding whether to attribute moral patienthood to something, my focus lies primarily on observable traits, cognitive sophistication, and most importantly, the presence of clear open-ended goal-directed behavior, rather than on speculative or less observable notions of AI welfare, about which I am more skeptical. As a rough approximation, my moral theory aligns fairly well with what is implicitly proposed by modern economists, who talk about revealed preferences and consumer welfare.
Like most preference utilitarians, I believe that value is ultimately subjective: loosely speaking, nothing has inherent value except insofar as it reflects a state of affairs that aligns with someone’s preferences. As a consequence, I am comfortable, at least in principle, with a wide variety of possible value systems and future outcomes. This means that I think a universe made of only paperclips could have value, but only if that’s what preference-having beings wanted the universe to be made out of.
To be clear, I also think existing people have value too, so this isn’t an argument for blind successionism. Also, it would be dishonest not to admit that I am also selfish to a significant degree (along with almost everyone else on Earth). What I have just described simply reflects my broad moral intuitions about what has value in our world from an impartial point of view, not a prescription that we should tile the universe with paperclips. Since humans and animals are currently the main preference-having beings in the world, at the moment I care most about fulfilling what they want the world to be like.

Matthew_Barnett 14 May 2025 3:50 UTC
2 points
0 ∶ 0
in reply to: CarlShulman’s comment on: What if we just…didn’t build AGI? An Argument Against Inevitability
Perhaps I overstated some of my claims or was unclear. So let me try to be more clear about my basic thesis. First of all, I agree that in the most basic model of the situation, being slightly ahead of a competitor can be the decisive factor between going bankrupt and making enormous profits. This creates a significant personal incentive to race ahead, even if doing so only marginally increases existential risk overall. As a result, AI labs may end up taking on more risk than they would in the absence of such pressure. More generally, I agree that without competition—whether between states or between AI companies—progress would likely be slower than it currently is.
My main point, however, is that these effects are likely not strong enough to justify the conclusion that the socially optimal pace of AI R&D is meaningfully slower than the current pace we in fact observe. In other words, I’m not convinced that what’s rational from an individual actor’s perspective diverges greatly from what would be rational from a collective or societal standpoint.
This is the central claim underlying my objection: if there is no meaningful difference between what is individually rational and what is collectively rational, then there is little reason to believe we are facing a tragedy-of-the-commons scenario as suggested in the post.
To sketch a more complete argument here, I would like to make two points:
First, while some forces incentivize speeding up AI development, others push in the opposite direction. Measures like export controls, tariffs, and (potentially) future AI regulations can slow down progress. In these cases, the described dynamic flips: the global costs of slowing down are shared, while the political rewards—such as public credit or influence—are concentrated among the policymakers or lobbyists who implement the slowdown.
Second, as I’ve mentioned, a large share of both the risks and benefits of AI accrue directly to those driving its development. This alignment of incentives gives them a reason to avoid reckless acceleration that would dramatically increase risk.
As a testable prediction of my view, we could ask whether AI labs are actively lobbying for slower progress internationally. If they truly preferred collective constraint but felt compelled to move forward individually, we would expect them to support measures that slow everyone down—while personally moving forward as fast as they can in the meantime. However, to my knowledge, such lobbying is not happening. This suggests that labs may not, in fact, collectively prefer significantly slower development.

Matthew_Barnett 13 May 2025 4:28 UTC
3 points
1 ∶ 0
in reply to: Nate Sharpe’s comment on: What if we just…didn’t build AGI? An Argument Against Inevitability
There are psychological pressures that can lead to motivated reasoning on both sides of this issue. On the pro-acceleration side, individuals may be motivated to downplay or dismiss the potential risks and downsides of rapid AI development. On the other side, those advocating for slowing or pausing AI progress may be motivated to dismiss or undervalue the possible benefits and upsides. Because both the risks and the potential rewards of AI are substantial, I don’t see a compelling reason to assume that one side must be much more prone to denial or bias than the other.
At most, I see a simple selection effect: the people most actively pushing for faster AI development are likely those who are least worried about the risks. This could lead to a unilateralist curse, where the least concerned actors push capabilities forward despite a high risk of disaster. But the opposite scenario could also happen, if the most concerned actors are able to slow down progress for everyone else, delaying the benefits of AI unacceptably. Whether you should care more about the first or second scenario depends on your judgement of whether rapid AI progress is good or bad overall.
Ultimately, I think it’s more productive to frame the issue around empirical facts and value judgments: specifically, how much risk rapid AI development actually introduces, and how much value we ought to place on the potential benefits of rapid development. I find this framing more helpful, not only because it identifies the core disagreement between accelerationists and pause advocates, but also because I think it better accounts for the pace of AI development we actually observe in the real world.

Matthew_Barnett 12 May 2025 22:54 UTC
4 points
0 ∶ 1
in reply to: CarlShulman’s comment on: What if we just…didn’t build AGI? An Argument Against Inevitability
If you look at the leaders of major AI companies you see people like Elon Musk and others who are concerned with getting to AGI before others who they distrust and fear. They fear immense power in the hands of rivals with conflicting ideologies or in general.
Musk is a vivid example of the type of dynamic you’re describing, but he’s also fairly unusual in this regard. Sundar Pichai, Satya Nadella, and most other senior execs strike me as more like conventional CEOs: they want market share, profits, and higher margins, but they’re not seeking the kind of hegemonic control that would justify accepting a much higher p(doom). If the dominant motive is ordinary profit maximization rather than paranoid power-seeking, then my original point stands: both the upside (huge profit streams) and the downside (self-annihilation) accrue to the people pushing AI forward, so the private incentives already internalize a large chunk of the social calculus.
Likewise, it’s true that governments often seek hegemonic control over the world in a way that creates destructive arms races, but even in the absence of such motives, there would still be a strong desire among humans to advance most technologies to take advantage of the benefits.
The most important fact here is that AI has an enormous upside: people would still have strong reasons to aggressively seek it to obtain life extension and extraordinary wealth even in the absence of competitive dynamics—unless they were convinced that the risk from pursuing that upside was unacceptably high (which is an epistemic consideration, not a game-theoretic trap).
Power, including the power to coerce or harm others, and relative standing, are more important there than access to advanced medicine or broad prosperity for the competitive dynamics.
You may have misread me here. I’m not claiming that AI labs are motivated by a desire to create broad prosperity. They certainly do care about “power,” but the key question is whether they’re primarily driven by the type of zero-sum, adversarial power-seeking you described. I’m skeptical that this is the dominant motive. Instead, I think ordinary material incentives likely play a larger role.
And the epistemic variation makes it all worse, where the most unconcerned players set a higher baseline risk spontaneously.
The unilateralist’s curse is primarily worrisome when the true value of an initiative is negative; for good projects it usually helps them proceed. Moreover, if good projects can be vetoed (e.g., via regulators), this creates a reverse curse that can block beneficial progress. Ultimately I don’t see a strong argument for a trap that forces AI to advance forward at everyone’s expense. We aren’t really at the mercy of Moloch here. The main story is that other people simply assign (a) lower extinction probabilities or (b) higher weight to the upside potential than EAs tend to. That is a rather ordinary disagreement over facts and values, not a real curse.

Matthew_Barnett 11 May 2025 23:06 UTC
6 points
0 ∶ 3
on: What if we just…didn’t build AGI? An Argument Against Inevitability
This confluence of factors creates a powerful coordination problem. Everyone might privately agree that racing headlong into AGI without robust safety guarantees is madness, but nobody wants to be the one who urges caution while others surge ahead.
I dispute that we’re facing a coordination problem in the sense you described. Chad Jones’ paper is a helpful comparison here to illustrate AI racing dynamics.
His model starts from the observation that the very same actors who might enjoy benefits from faster AI also face the extinction hazard that faster AI could bring. In his social-planner formulation, the idea is to pick an R&D pace that equates the marginal gain in permanent consumption growth with the marginal rise in a one-time extinction probability^[1]; when the two curves cross, that is the point you stop. Nothing in the mechanism lets one party obtain the upside while another bears the downside of existential risk, so the familiar logic of a classic tragedy of the commons—”If I restrain myself, someone else will defect and stick me with the loss”—doesn’t apply. The optimal policy is simply to pick whatever pace of development makes the risk–reward ratio come out favorable.
Why is that a realistic way to view today’s situation? First, extinction risk is highly non-rival: if an unsafe system destroys the world, it wipes out everyone, including the engineers and investors who pushed the system forward. They cannot dump that harm on an outside group the way a factory dumps effluent into a river. Second, the primary benefits—higher incomes and earlier biomedical breakthroughs—are also broadly shared; they are not gated to the single lab that crosses the finish line first. Because both tails of the distribution are so widely spread, each lab’s private calculus already contains a big slice of the social calculus.
Third, empirical incentives inside frontier labs look far more like “pick your preferred trade-off” than “cheat while others cooperate.” Google, Anthropic, OpenAI, and their peers hold billions of dollars in equity that vaporizes if a catastrophic failure occurs; their founders and employees live in the metaphorical blast radius just like everyone else.
So why does it look in practice as though labs are racing? The Jones model suggests the answer is epistemic, not game-theoretic. Different actors slot in different parameter values for how much economic growth matters or how sharply risk rises with capability, and those disagreements lead to divergent optimal policies. That is a dispute over facts and forecasts, not a coordination failure in the classic tragedy-of-the-commons sense, where each player gains by defecting even though all would jointly prefer restraint.
1. ^
  He technically models the problem as a choice of when to stop, rather than strictly picking a pace of R&D. However, for the purpose of this analysis, the difference between these two ways of modeling the problem is largely irrelevant.