evhub

Karma: 1,777

Evan Hubinger (he/him/his) (evanjhub@gmail.com)

Head of Alignment Stress-Testing at Anthropic. My posts and comments are my own and do not represent Anthropic’s positions, policies, strategies, or opinions.

Previously: MIRI, OpenAI

See: “Why I’m joining Anthropic”

Selected work:

evhub Nov 29, 2022, 1:57 AM
2 points
1 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

in a form that implies strong moral censure to anyone who argues the opposite

I don’t think this and didn’t say it. If you have any quotes from the post that you think say this, I’d be happy to edit it to be more clear, but from my perspective it feels like you’re inventing a straw man to be mad at rather than actually engaging with what I said.

You also said that we should do so independently of the facts of the FTX case, which feels weird to me, because I sure think the details of the case are very relevant to what ethical lines I want to draw in the future.

I think that, for the most part, you should be drawing your ethical boundaries in a way that is logically prior to learning about these sorts of facts. Otherwise it’s very hard to cooperate with you, for example.

The section you quote here reads to me as a rhetorical question.

It isn’t intended as a rhetorical question. I am being quite sincere there, though rereading it, I see how you could be confused. I just edited that section to the following:

In that spirit, I think it’s worth us carefully confronting the moral question here: is fraud in the service of raising money for effective causes wrong? This is a thorny moral question that is worth nuanced discussion, and I don’t claim to have all the answers.

Nevertheless, I think fraud in the service of effective altruism is basically unacceptable—and that’s as someone who is about as hardcore of a total utilitarian as it is possible to be.

evhub Nov 29, 2022, 1:30 AM
2 points
1 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

think what we owe the world is both reflection about where our actual lines are (and how the ones that we did indeed have might have contributed to this situation), as well as honest and precise statements about what kinds of things we might actually consider doing in the future.

I actually state in the post that I agree with this. From my post:

In that spirit, I think it’s worth us carefully confronting the moral question here: is fraud in the service of raising money for effective causes wrong?

Perhaps that is not as clear as you would like, but like I said it was a short post. And that sentence is pretty clearly saying that I think it’s worthwhile for us to try to carefully confront the moral question of what is okay and what is not—which the post then attempts to start the discussion on by providing some of what I think.

evhub Nov 29, 2022, 1:26 AM
4 points
2 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

I mean, indeed the combination of “fraud is a vague, poorly defined category” together with a strong condemnation of said “fraud”, without much explicit guidance on what kind of thing you are talking about, is what I am objecting to in your post.

I guess I don’t really think this is a problem. We’re perfectly comfortable with statements like “murder is wrong” while also understanding that “but killing Hitler would be okay.” I don’t mean to say that talking about the edge cases isn’t ever helpful—in fact, I think it can be quite useful to try to be clear about what’s happening on the edges in certain cases, since it can sometimes be quite relevant. But I don’t see that as a reason to object to someone saying “murder is wrong.”

To be clear, if your criticism is “the post doesn’t say much beyond the obvious,” I think that’s basically correct—it was a short post and wasn’t intended to accomplish much more than basic common knowledge building around this sort of fraud being bad even when done with ostensibly altruistic motivations. And I agree that further posts discussing more clearly how to think about various edge cases would be a valuable contribution to the ongoing discussion (though I don’t personally plan to write such a post because I think I have more valuable things to do with my time).

However, if your criticism is “your post says edge case B is bad but edge case B is actually good,” I think that’s a pretty silly criticism that seems like it just doesn’t really understand or engage with the inherent fuzziness of conceptual categories.

evhub Nov 29, 2022, 12:09 AM
4 points
2 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation
That sounds like a fully generalized defense against all counterarguments, and I don’t think that is how discourse usually works.

It’s clearly not fully general because it only applies to excluding edge cases that don’t satisfy the reasons I explicitly state in the post.

If you say “proposition A is true about category B, for reasons X, Y, Z” and someone else is like “but here is an argument C for why proposition A is not true about category B”, then of course you don’t get to be like, “oh, well, I of course meant the subset of category B where argument C doesn’t hold”.

Sure, but that’s not what happened. There are some pretty big disanalogies between the scenarios you’re describing and what actually happened:
1. The question is about what activities belong to the vague, poorly defined category of “fraud,” not about the truth of some clearly stated “proposition A.” When someone says “category A has property X,” for any vague category A—which is basically all categories of things—there will always be edge cases where it’s not clear.
2. You’re not presenting some new “argument C” for why fraud is good actually. You’re just saying there are edge cases where my arguments don’t apply. Which is obviously correct! But that’s just because there are always edge cases for all categories, so it’s effectively just an objection to the use of categories at all.
3. Furthermore, in this case, I pretty clearly laid out exactly why I thought fraud was bad. That gives you a lot of evidence to figure out what class of things I was centrally pointing to when using “fraud” as a general category, and based on those reasons it’s pretty clear that the examples you’re providing don’t fit into that category.

evhub Nov 28, 2022, 10:55 PM
4 points
2 ∶ 2
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation
Adding on to my other reply: from my perspective, I think that if I say “category A is bad because X, Y, Z” and you’re like “but edge case B!” and edge case B doesn’t satisfy X, Y, or Z, then clearly I’m not including it in category A.

evhub Nov 28, 2022, 10:46 PM
12 points
3 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation
I think you’re wrong about how most people would interpret the post. I predict that if readers were polled on whether or not the post agreed with “lying to Nazis is wrong” the results would be heavily in favor of “no, the post does not agree with that.” If you actually had a poll that showed the opposite I would definitely update.

evhub Nov 28, 2022, 10:13 PM
10 points
2 ∶ 1
in reply to: Habryka [Deactivated]’s comment on: Rethink Priorities’ Leadership Statement on the FTX situation

My guess is most readers are more interested in the condemnation part though, given the overwhelming support that posts like this have received, which have basically no content besides condemnation (and IMO with even bigger problems on being inaccurate about where to draw ethical lines).

I think my post is quite clear about what sort of fraud I am talking about. If you look at the reasons that I give in my post for why fraud is wrong, they clearly don’t apply to any of the examples of justifiable lying that you’ve provided here (lying to Nazis, doing the least fraudulent thing in a catch-22, lying by accident, etc.).

In particular, if we take the lying to Nazis example and see what the reasons I provide say:

When we, as humans, consider whether or not it makes sense to break the rules for our own benefit, we are running on corrupted hardware: we are very good at justifying to ourselves that seizing money and power for our own benefit is really for the good of everyone. If I found myself in a situation where it seemed to me like seizing power for myself was net good, I would worry that in fact I was fooling myself, and even if I was pretty sure I wasn’t fooling myself, I would still worry that I was falling prey to the unilateralist’s curse if it wasn’t very clearly a good idea to others as well.

This clearly doesn’t apply to lying to Nazis, since it’s not a situation where money and power are being seized for oneself.

Additionally, if you’re familiar with decision theory, you’ll know that credibly pre-committing to follow certain principles, such as never engaging in fraud, is extremely advantageous, as it makes clear to other agents that you are a trustworthy actor who can be relied upon. In my opinion, such strategies of credible pre-commitment are extremely important for cooperation and coordination.

I think the fact that you would lie to a Nazi makes you more trustworthy for coordination and cooperation, not less.

Furthermore, I will point out, if FTX did engage in fraud here, it was clearly in fact not a good idea in this case: I think the lasting consequences to EA—and the damage caused by FTX to all of their customers and employees—will likely outweigh the altruistic funding already provided by FTX to effective causes.

And in the case of lying to Nazis, the consequences are clearly positive.

evhub Nov 15, 2022, 9:01 PM
17 points
5 ∶ 0
in reply to: David M’s comment on: We must be very clear: fraud in the service of effective altruism is unacceptable
The portion you quote is included at the very end as an additional point about how even if you don’t buy my primary arguments that fraud in general is bad, in this case it was empirically bad. It is not my primary reason for thinking fraud is bad here, and I think the post is quite clear about that.

evhub Nov 12, 2022, 6:45 AM
31 points
14 ∶ 0
on: IMPCO, don’t injure yourself by returning FTXFF money for services you already provided
I agree with this post from a moral perspective, though one thing it does not touch on is the legal question. My guess is that, in the same way that a court probably wouldn’t try to claw back money from a utility company/janitor/etc., FTXFF beneficiaries are also probably safe, but IANAL, so maybe somebody who knows more there could comment.

evhub Nov 11, 2022, 8:40 PM
26 points
9 ∶ 0
in reply to: David M’s comment on: We must be very clear: fraud in the service of effective altruism is unacceptable
That’s a pretty wild misreading of my post. The main thesis of the post is that we should unequivocally condemn fraud. I do not think that fraud is bad for PR reasons, nor do I say that in the post: if you read what I wrote at the end about why I think it’s wrong to commit fraud, what I say is that you should have a general policy against ever committing fraud, regardless of the PR consequences one way or another.

evhub Apr 19, 2022, 10:23 PM
3 points
0 ∶ 0
in reply to: AdamGleave’s comment on: Free-spending EA might be a big problem for optics and epistemics

Anecdotally it seems like many of the world’s most successful companies do try to make frugality part of their culture, e.g. it’s one of Amazon’s leadership principles.

Google, by contrast, is notoriously the opposite, for example in its emphasis on just trying lots of crazy, big, ambitious, expensive bets (e.g. its “10x” philosophy). Also see how Google talked about frugality in 2011.

evhub Apr 13, 2022, 9:22 PM
202 points
1 ∶ 0
on: Free-spending EA might be a big problem for optics and epistemics
One thing that bugged me when I first got involved with EA was the extent to which the community seemed hesitant to spend lots of money on stuff like retreats, student groups, dinners, compensation, etc., despite the cost-benefit analysis seeming to favor doing so pretty strongly. From my perspective, this felt like some evidence that many EAs didn’t take their stated ideals as seriously as I had hoped, e.g. that many people might just be trying to act in the way that they think an altruistic person should rather than really carefully thinking through what an altruistic person should actually do.

This is in direct contrast to the point you make that spending money like this might make people think we take our ideals less seriously—at least in my experience, had I witnessed an EA community that was more willing to spend money on projects like this, I would have been more rather than less convinced that EA was the real deal. I don’t currently have any strong beliefs about which of these reactions is more likely/concerning, but I think it’s at least worth pointing out that there is definitely an effect in the opposite direction to the one that you point out as well.
What links here?
- Benjamin_Todd's comment on Free-spending EA might be a big problem for optics and epistemics by George Rosenfeld (Apr 15, 2022, 10:32 AM; 184 points)

evhub Oct 1, 2021, 2:11 AM
8 points
0 ∶ 0
in reply to: So-Low Growth’s comment on: You can talk to EA Funds before applying
Academic projects are definitely the sort of thing we fund all the time. I don’t know if the sort of research you’re doing is longtermist-related, but if you have an explanation of why you think your research would be valuable from a longtermist perspective, we’d love to hear it.

evhub Sep 30, 2021, 8:13 PM
10 points
0 ∶ 0
on: You can talk to EA Funds before applying
Since it was brought up to me, I also want to clarify that EA Funds can fund essentially anyone, including:
- people who have a separate job but want to spend extra time doing an EA project,
- people who don’t have a Bachelor’s degree or any other sort of academic credentials,
- kids who are in high school but are excited about EA and want to do something,
- fledgling organizations,
- etc.

evhub Jun 30, 2021, 8:49 AM
4 points
0 ∶ 0
in reply to: Anon-biosec-account’s comment on: You can now apply to EA Funds anytime! (LTFF & EAIF only)
I’m one of the grant evaluators for the LTFF and I don’t think I would have any qualms with funding a project 6-12 months in advance.

evhub Apr 24, 2021, 11:45 PM
13 points
0 ∶ 0
in reply to: Wei Dai’s comment on: Concerns with ACE’s Recent Behavior
To be clear, I agree with a lot of the points that you’re making—the point of sketching out that model was just to show the sort of thing I’m doing; I wasn’t actually trying to argue for a specific conclusion. The actual correct strategy for figuring out the right policy here, in my opinion, is to carefully weigh all the different considerations like the ones you’re mentioning, which—at the risk of crossing object and meta levels—I suspect to be difficult to do in a low-bandwidth online setting like this.

Maybe it’ll still be helpful to just give my take using this conversation as an example. In this situation, I expect that:
- My models here are complicated enough that I don’t expect to be able to convey them here to a point where you’d understand them without a lot of effort.
- I could properly convey them in a higher-bandwidth conversation with you (e.g. offline, not text), which I’d be willing to have if you wanted.
- To the extent that we try to do so online, I think there are systematic biases in the format which will lead to beliefs (of at least the readers) being systematically pushed in incorrect directions—as an example, I expect arguments/positions that use simple, universalizing arguments (e.g. Bayesian reasoning says we should do this, therefore we should do it) to lose out to arguments that involve summing up a bunch of pros and cons and then concluding that the result is above or below some threshold (which in my opinion is what most actual true arguments look like).

evhub Apr 24, 2021, 9:17 PM
0 points
0 ∶ 0
in reply to: Wei Dai’s comment on: Concerns with ACE’s Recent Behavior
I think you’re imagining that I’m doing something much more exotic here than I am. I’m basically just advocating for cooperating on what I see as a prisoner’s-dilemma-style game (I’m sure you can also cast it as a stag hunt or make some really complex game-theoretic model to capture all the nuances; I’m not trying to do that here, my point is just to explain the sort of thing that I’m doing).

Consider:

A and B can each choose:
- (public): publicly argue against the other
- (private): privately discuss the right thing to do
And they each have utility functions such that
- A = public; B = private:
  - u_A = 3
  - u_B = 0
  - Why: A is able to argue publicly that A is better than B and therefore gets a bunch of resources, but this costs resources and overall some of their shared values are destroyed due to public argument not directing resources very effectively.
- A = private; B = public:
  - u_A = 0
  - u_B = 3
  - Why: ditto except the reverse.
- A = public; B = public:
  - u_A = 1
  - u_B = 1
  - Why: Both A and B argue publicly that they’re better than each other, which consumes a bunch of resources and leads to a suboptimal allocation.
- A = private; B = private:
  - u_A = 2
  - u_B = 2
  - Why: Neither A nor B argues publicly that they’re better than the other, which consumes fewer resources and allows for a better overall resource allocation.
Then, I’m saying that in this sort of situation you should play (private) rather than (public), and that therefore we shouldn’t punish people for playing (private), since punishing people for playing (private) forces us into the Nash equilibrium where people always play (public), destroying overall welfare.
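The payoffs above can be checked mechanically. The sketch below is a minimal illustration, assuming only the four payoff pairs listed above, a single one-shot round, and pure strategies; the names `PAYOFFS`, `pure_nash_equilibria`, and `welfare` are hypothetical helpers introduced here. It shows that (public) is each player's best response whatever the other does, so (public, public) is the only pure-strategy Nash equilibrium, even though (private, private) gives the highest total welfare.

```python
from itertools import product

# Payoffs (u_A, u_B) taken from the list above, keyed by (A's move, B's move).
PAYOFFS = {
    ("public", "private"): (3, 0),
    ("private", "public"): (0, 3),
    ("public", "public"): (1, 1),
    ("private", "private"): (2, 2),
}
MOVES = ("public", "private")


def best_responses_for_a(b_move):
    """Moves for A that maximize u_A, holding B's move fixed."""
    best = max(PAYOFFS[(a, b_move)][0] for a in MOVES)
    return {a for a in MOVES if PAYOFFS[(a, b_move)][0] == best}


def best_responses_for_b(a_move):
    """Moves for B that maximize u_B, holding A's move fixed."""
    best = max(PAYOFFS[(a_move, b)][1] for b in MOVES)
    return {b for b in MOVES if PAYOFFS[(a_move, b)][1] == best}


def pure_nash_equilibria():
    """Profiles where each player's move is a best response to the other's."""
    return [
        (a, b)
        for a, b in product(MOVES, MOVES)
        if a in best_responses_for_a(b) and b in best_responses_for_b(a)
    ]


def welfare(profile):
    """Total utility u_A + u_B for a given (A's move, B's move) profile."""
    return sum(PAYOFFS[profile])


if __name__ == "__main__":
    for profile in product(MOVES, MOVES):
        print(profile, PAYOFFS[profile], "total:", welfare(profile))
    print("Nash equilibria:", pure_nash_equilibria())
```

This only models the one-shot game. The argument about not punishing (private) play is about which norms get sustained across repeated interactions, which the sketch does not attempt to capture.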

evhub Apr 24, 2021, 7:38 PM
2 points
0 ∶ 0
in reply to: Wei Dai’s comment on: Concerns with ACE’s Recent Behavior

For example would you really not have thought worse of MIRI (Singularity Institute at the time) if it had labeled Holden Karnofsky’s public criticism “hostile” and refused to respond to it, citing that its time could be better spent elsewhere?

To be clear, I think that ACE calling the OP “hostile” is a pretty reasonable thing to judge them for. My objection is only to judging them for the part where they don’t want to respond any further. So as for the example, I definitely would have thought worse of MIRI if they had labeled Holden’s criticisms as “hostile”—but not just for not responding. Perhaps a better example here would be MIRI still not having responded to Paul’s arguments for slow takeoff—imo, I think Paul’s arguments should update you, but MIRI not having responded shouldn’t.

Would you update in a positive direction if an organization does effectively respond to public criticism?

I think you should update on all the object-level information that you have, but not update on the meta-level information coming from an inference like “because they chose not to say something here, that implies they don’t have anything good to say.”

Do you update on the existence of the criticism itself, before knowing whether or how the organization has chosen to respond?

Yes.

evhub Apr 22, 2021, 7:08 PM
5 points
0 ∶ 0
in reply to: Owen Cotton-Barratt’s comment on: Concerns with ACE’s Recent Behavior
That’s a great point; I agree with that.

evhub Apr 22, 2021, 7:07 AM
3 points
0 ∶ 0
in reply to: Habryka [Deactivated]’s comment on: Concerns with ACE’s Recent Behavior
I disagree, obviously, though I suspect that little will be gained by hashing it out more here. To be clear, I have certainly thought about this sort of issue in great detail as well.