evhub

Karma: 1,761

Evan Hubinger (he/him/his) (evanjhub@gmail.com)

Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”


evhub Dec 21, 2022, 5:40 PM
11 points
3 ∶ 0
on: “How Vacuum Decay Would Destroy the Universe” (PBS Space Time, 2021)
At a sufficiently sophisticated technological level, vacuum decay actually becomes worthwhile, as it increases the total amount of available free energy. The problem is ensuring any sort of civilizational continuity before and after the vacuum decay—though, like any other physical process, vacuum decay shouldn’t destroy information, so theoretically if you understood the mechanics well enough you should be able to engineer whatever outcome you wanted on the other side.

evhub Dec 20, 2022, 11:10 PM
59 points
19 ∶ 0
on: A Case for Voluntary Abortion Reduction
In my opinion, the best solution here is incentivizing people to voluntarily have more children, e.g. via child tax credits, maternity/paternity leave, etc. If you don’t think fetuses are moral patients, then the pro-natalist, longtermist, total utilitarian view doesn’t distinguish between having an abortion and just choosing not to have a child, so I don’t really see the reason to focus on abortion specifically in that case.

Discovering Language Model Behaviors with Model-Written Evaluations

evhub Dec 20, 2022, 8:09 PM

25 points

0 comments

evhub Nov 29, 2022, 2:23 AM
2 points
1 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

I don’t really know what other purpose the “we must be very clear” here serves besides trying to indicate that you think it’s very important that EA projects a unified front here.

I am absolutely intending to communicate that I think it would be good for people to say that they think fraud is bad. But that doesn’t mean that I think we should condemn people who disagree about whether saying so is good. Rather, discussion about whether it’s a good idea for people to condemn fraud seems great to me, and my post was an attempt to provide my (short, abbreviated) take on that question.

evhub Nov 29, 2022, 1:57 AM
2 points
1 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

in a form that implies strong moral censure to anyone who argues the opposite

I don’t think this and didn’t say it. If you have any quotes from the post that you think say this, I’d be happy to edit it to be more clear, but from my perspective it feels like you’re inventing a straw man to be mad at rather than actually engaging with what I said.

You also said that we should do so independently of the facts of the FTX case, which feels weird to me, because I sure think the details of the case are very relevant to what ethical lines I want to draw in the future.

I think that, for the most part, you should be drawing your ethical boundaries in a way that is logically prior to learning about these sorts of facts. Otherwise it’s very hard to cooperate with you, for example.

The section you quote here reads to me as a rhetorical question.

It isn’t intended as a rhetorical question. I am being quite sincere there, though rereading it, I see how you could be confused. I just edited that section to the following:

In that spirit, I think it’s worth us carefully confronting the moral question here: is fraud in the service of raising money for effective causes wrong? This is a thorny moral question that is worth nuanced discussion, and I don’t claim to have all the answers.

Nevertheless, I think fraud in the service of effective altruism is basically unacceptable—and that’s as someone who is about as hardcore of a total utilitarian as it is possible to be.

evhub Nov 29, 2022, 1:30 AM
2 points
1 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

think what we owe the world is both reflection about where our actual lines are (and how the ones that we did indeed have might have contributed to this situation), as well as honest and precise statements about what kinds of things we might actually consider doing in the future.

I actually state in the post that I agree with this. From my post:

In that spirit, I think it’s worth us carefully confronting the moral question here: is fraud in the service of raising money for effective causes wrong?

Perhaps that is not as clear as you would like, but like I said it was a short post. And that sentence is pretty clearly saying that I think it’s worthwhile for us to try to carefully confront the moral question of what is okay and what is not—which the post then attempts to start the discussion on by providing some of what I think.

evhub Nov 29, 2022, 1:26 AM
4 points
2 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

I mean, indeed the combination of “fraud is a vague, poorly defined category” together with a strong condemnation of said “fraud”, without much explicit guidance on what kind of thing you are talking about, is what I am objecting to in your post.

I guess I don’t really think this is a problem. We’re perfectly comfortable with statements like “murder is wrong” while also understanding that “but killing Hitler would be okay.” I don’t mean to say that talking about the edge cases isn’t ever helpful—in fact, I think it can be quite useful to try to be clear about what’s happening on the edges in certain cases, since it can sometimes be quite relevant. But I don’t see that as a reason to object to someone saying “murder is wrong.”

To be clear, if your criticism is “the post doesn’t say much beyond the obvious,” I think that’s basically correct—it was a short post and wasn’t intended to accomplish much more than basic common knowledge building around this sort of fraud being bad even when done with ostensibly altruistic motivations. And I agree that further posts discussing more clearly how to think about various edge cases would be a valuable contribution to the ongoing discussion (though I don’t personally plan to write such a post because I think I have more valuable things to do with my time).

However, if your criticism is “your post says edge case B is bad but edge case B is actually good,” I think that’s a pretty silly criticism that seems like it just doesn’t really understand or engage with the inherent fuzziness of conceptual categories.

evhub Nov 29, 2022, 12:09 AM
4 points
2 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation
That sounds like a fully generalized defense against all counterarguments, and I don’t think is how discourse usually works.

It’s clearly not fully general because it only applies to excluding edge cases that don’t satisfy the reasons I explicitly state in the post.

If you say “proposition A is true about category B, for reasons X, Y, Z” and someone else is like “but here is an argument C for why proposition A is not true about category B”, then of course you don’t get to be like, “oh, well, I of course meant the subset of category B where argument C doesn’t hold”.

Sure, but that’s not what happened. There are some pretty big disanalogies between the scenarios you’re describing and what actually happened:
1. The question is about what activities belong to the vague, poorly defined category of “fraud,” not about the truth of some clearly stated “proposition A.” When someone says “category A has property X,” for any vague category A—which is basically all categories of things—there will always be edge cases where it’s not clear.
2. You’re not presenting some new “argument C” for why fraud is good actually. You’re just saying there are edge cases where my arguments don’t apply. Which is obviously correct! But there are always edge cases for all categories, so that objection is effectively just an objection to the use of categories at all.
3. Furthermore, in this case, I pretty clearly laid out exactly why I thought fraud was bad. Which gives you a lot of evidence to figure out what class of things I was centrally pointing to when using “fraud” as a general category, and it’s pretty clear based on those reasons that the examples you’re providing don’t fit into that category.

evhub Nov 28, 2022, 10:55 PM
4 points
2 ∶ 2
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation
Adding on to my other reply: from my perspective, I think that if I say “category A is bad because X, Y, Z” and you’re like “but edge case B!” and edge case B doesn’t satisfy X, Y, or Z, then clearly I’m not including it in category A.

evhub Nov 28, 2022, 10:46 PM
12 points
3 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation
I think you’re wrong about how most people would interpret the post. I predict that if readers were polled on whether or not the post agreed with “lying to Nazis is wrong” the results would be heavily in favor of “no, the post does not agree with that.” If you actually had a poll that showed the opposite I would definitely update.

evhub Nov 28, 2022, 10:13 PM
10 points
2 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

My guess is most readers are more interested in the condemnation part though, given the overwhelming support that posts like this have received, which have basically no content besides condemnation (and IMO with even bigger problems on being inaccurate about where to draw ethical lines).

I think my post is quite clear about what sort of fraud I am talking about. If you look at the reasons that I give in my post for why fraud is wrong, they clearly don’t apply to any of the examples of justifiable lying that you’ve provided here (lying to Nazis, doing the least fraudulent thing in a catch-22, lying by accident, etc.).

In particular, if we take the lying to Nazis example and see what the reasons I provide say:

When we, as humans, consider whether or not it makes sense to break the rules for our own benefit, we are running on corrupted hardware: we are very good at justifying to ourselves that seizing money and power for our own benefit is really for the good of everyone. If I found myself in a situation where it seemed to me like seizing power for myself was net good, I would worry that in fact I was fooling myself, and even if I was pretty sure I wasn’t fooling myself, I would still worry that I was falling prey to the unilateralist’s curse if it wasn’t very clearly a good idea to others as well.

This clearly doesn’t apply to lying to Nazis, since it’s not a situation where money and power are being seized for oneself.

Additionally, if you’re familiar with decision theory, you’ll know that credibly pre-committing to follow certain principles—such as never engaging in fraud—is extremely advantageous, as it makes clear to other agents that you are a trustworthy actor who can be relied upon. In my opinion, I think such strategies of credible pre-commitments are extremely important for cooperation and coordination.

I think the fact that you would lie to a Nazi makes you more trustworthy for coordination and cooperation, not less.

Furthermore, I will point out, if FTX did engage in fraud here, it was clearly in fact not a good idea in this case: I think the lasting consequences to EA—and the damage caused by FTX to all of their customers and employees—will likely outweigh the altruistic funding already provided by FTX to effective causes.

And in the case of lying to Nazis, the consequences are clearly positive.

evhub Nov 15, 2022, 9:01 PM
17 points
5 ∶ 0
in reply to: David M’s comment on: We must be very clear: fraud in the service of effective altruism is unacceptable
The portion you quote is included at the very end as an additional point about how even if you don’t buy my primary arguments that fraud in general is bad, in this case it was empirically bad. It is not my primary reason for thinking fraud is bad here, and I think the post is quite clear about that.

evhub Nov 12, 2022, 6:45 AM
31 points
14 ∶ 0
on: IMPCO, don’t injure yourself by returning FTXFF money for services you already provided
I agree with this post from a moral perspective, though one thing it does not touch on is the legal question. My guess is that, in the same way that a court probably wouldn’t try to claw back money from a utility company/janitor/etc., FTXFF beneficiaries are also probably safe, but IANAL, so maybe somebody who knows more there could comment.

evhub Nov 11, 2022, 8:40 PM
26 points
9 ∶ 0
in reply to: David M’s comment on: We must be very clear: fraud in the service of effective altruism is unacceptable
That’s a pretty wild misreading of my post. The main thesis of the post is that we should unequivocally condemn fraud. I do not think that the reason that fraud is bad is because of PR reasons, nor do I say that in the post—if you read what I wrote about why I think it’s wrong to commit fraud at the end, what I say is that you should have a general policy against ever committing fraud, regardless of the PR consequences one way or another.

We must be very clear: fraud in the service of effective altruism is unacceptable

evhub Nov 10, 2022, 11:31 PM

713 points

86 comments, 3 min read

Long-Term Future Fund: December 2021 grant recommendations

abergal Aug 18, 2022, 8:50 PM

68 points

19 comments, 15 min read

evhub Apr 19, 2022, 10:23 PM
3 points
0 ∶ 0
in reply to: AdamGleave’s comment on: Free-spending EA might be a big problem for optics and epistemics

Anecdotally it seems like many of the world’s most successful companies do try to make frugality part of their culture, e.g. it’s one of Amazon’s leadership principles.

Google, by contrast, is notoriously the opposite—for example emphasizing just trying lots of crazy, big, ambitious, expensive bets (e.g. their “10x” philosophy). Also see how Google talked about frugality in 2011.

evhub Apr 13, 2022, 9:22 PM
202 points
1 ∶ 0
on: Free-spending EA might be a big problem for optics and epistemics
One thing that bugged me when I first got involved with EA was the extent to which the community seemed hesitant to spend lots of money on stuff like retreats, student groups, dinners, compensation, etc., despite the cost-benefit analysis seeming to favor doing so pretty strongly. From my perspective, this felt like some evidence that many EAs didn’t take their stated ideals as seriously as I had hoped, e.g. that many people might just be trying to act in the way that they think an altruistic person should rather than really carefully thinking through what an altruistic person should actually do.

This is in direct contrast to the point you make that spending money like this might make people think we take our ideals less seriously—at least in my experience, had I witnessed an EA community that was more willing to spend money on projects like this, I would have been more rather than less convinced that EA was the real deal. I don’t currently have any strong beliefs about which of these reactions is more likely/concerning, but I think it’s at least worth pointing out that there is definitely an effect in the opposite direction to the one that you point out as well.

Long-Term Future Fund: July 2021 grant recommendations

abergal Jan 18, 2022, 8:49 AM

75 points

7 comments, 17 min read

evhub Oct 1, 2021, 2:11 AM
8 points
0 ∶ 0
in reply to: So-Low Growth’s comment on: You can talk to EA Funds before applying
Academic projects are definitely the sort of thing we fund all the time. I don’t know if the sort of research you’re doing is longtermist-related, but if you have an explanation of why you think your research would be valuable from a longtermist perspective, we’d love to hear it.
