In practice, I think utilitarians should mostly adopt a skillful combination of virtue ethics, deontic rules, and explicit calculations.
What I think the FTX case does provide some evidence for is that some fraction of smart EAs exposed to utilitarianism are prone to try relying on explicit act utilitarianism, despite the warnings.
I think part of the story here is a weird status dynamic where...
1. I would basically trust some people to try the explicit, direct utilitarian thing: e.g. I think it is fine for Derek Parfit or Toby Ord.
2. This creates a weird correlation where the better you are on some combination of (smartness/understanding of ethics/power in modelling the world), the more you can try to be actually guided by consequences.
3. This can make being a ‘hardcore’ consequentialist … sort of cool and “what the top people do”
4. … which is a setup where people can start to Goodhart/signal on it.
Yeah, I think it’s a severe problem that if you are good at decision theory you can in fact validly grab big old chunks of deontology directly out of consequentialism, including lots of the cautionary parts; or, to put it perhaps a bit more sharply, a coherent superintelligence with a nice utility function does not in fact need deontology. And if you tell that to a certain kind of person, they will in fact decide that they’d be cooler if they were superintelligences, so they must be really skillful at deriving deontology from decision theory, and therefore they can discard the deontology and just do what the decision theory says. I’m not sure how to handle this; I think that the concept of “cognitohazard” gets vastly overplayed around here, but there are still true facts that cause a certain kind of person to predictably get their brain stuck on them, and this could plausibly be one of them. It’s also too important a fact (e.g. to alignment) for “keep it completely secret” to be a plausible option either.