People seem surprised and bewildered when AI folks defect away from AI safety towards capabilities. People trust that as AI companies grow, those gaining power and money from shares will not be adversely influenced by that power and money.
fwiw I don't actually know many examples of this, and the ones I hear cited often seem uncompelling to me. E.g.:
Greg Brockman doesn't seem like a true believer in OpenAI's nonprofit mission who got corrupted but rather someone who went into it wanting to make a profit
Mechanize's founders don't seem like EAs who got corrupted by AI money but rather EAs with unusual moral and empirical views which result in them thinking that the best course of action is the exact opposite of what most EAs think
(Counterexamples appreciated, though!)
Hmm, I think if smart EA/Rat types get "corrupted" in general, they'll present as thoughtful people with reasons that are hard to dismiss quickly when questioned by EAs. I get the vague sense that your evidence bar for "corruption" is going to be too high to be useful in most worlds where there's a lot of corruption.
(That's not to say that EAs/Rats/etc. who join labs/start wildly profitable companies speeding up AI progress have been "corrupted"; I just think that if they were, it would present pretty similarly to how it has, and it's hard to get lots of easy-to-share evidence.)
I don't disagree, it's more that this feels a bit like privileging the hypothesis? I think the modal reason I've heard from people who did capabilities work and now regret it is something like "I knew I was misaligned with leadership but I thought leaving would be even worse."
If, for some reason, Anthropic asked me how to prevent people from regretting working for them, I would focus much more on "have a thing for people to do once they realize their colleague is corrupt" instead of "have a more nuanced way of telling if their colleague is corrupt."
I think he would include a lot of people who work at Anthropic, for example, on pre-training, some of whom went through MATS or something.
Thanks! I only know a handful of people in this category, but for what it's worth, it again feels like people who were predisposed to thinking that working on pretraining would be okay rather than them being "corrupted."
E.g., I recently talked to someone who told me that their main takeaway from a safety fellowship was realizing that they didn't fit in because they actually weren't worried about existential risk in the same way that the other attendees were.
In the AI frame I remember reading about 3 situations on the forum (one of which was Mechanize). I also consider this to a lesser extent around animal sentience arguments from those deep in the animal welfare world.
The most pertinent example for me would be Anthropic's top leadership ditching their solid safety plan with clear red lines for a vague and practically useless one, and the justifications by @Holden Karnofsky (whose wife owns the company), which felt strange to me. He usually makes such compelling arguments, and that one seemed less so. I'm not the most rational person, but Habryka's arguments against the safety plan change on LessWrong were compelling to me.
I'm not saying we shouldn't argue the object point, but just that we should consider people's incentives and weight the opinions of those with power/money conflicts of interest somewhat less heavily than those without.
+1, "it would be very easy for me to ignore the possibility that nematodes might be conscious" is a major impediment to thinking clearly about animal sentience (including for me).
Dude, it's basically the whole of Anthropic! And the fact that EAs (mostly) can't see this is worrying. OP worries about Anthropic's money being a corrupting influence, but their whole company is far, far worse than FTX, because of the existential risk it's subjecting the entire world to.