You keep saying that classical utilitarianism combined with short timelines condones crime, but I don’t think this is the case at all.
The standard utilitarian argument for adhering to various commonsense moral norms, such as norms against lying, stealing and killing, is that violating these norms would have disastrous consequences (much worse than you naively think), damaging your reputation and, in turn, your future ability to do good in the world. A moral perspective, such as the total view, for which the value at stake is much higher than previously believed, doesn’t increase the utilitarian incentives for breaking such moral norms. Although the goodness you can realize by violating these norms is now much greater, the reputational costs are correspondingly large. As Hubinger reminds us in a recent post, “credibly pre-committing to follow certain principles—such as never engaging in fraud—is extremely advantageous, as it makes clear to other agents that you are a trustworthy actor who can be relied upon.” Thinking that you have a license to disregard these principles because the long-term future has astronomical value fails to appreciate that endangering your perceived trustworthiness will seriously compromise your ability to protect that valuable future.
But short timelines mean the chances of getting caught are a lot smaller, because the world might end in 20-30 years; the reputational costs only matter if there is enough time left for them to bite. As we get closer to AGI, the chances of getting caught will be smaller still.
FWIW, people with the same utilitarian views will obviously disagree about what counts as positive EV, or as a good action, even under the same set of premises and beliefs.
But even then, I think the chance that you end up condoning fraud to fund AI safety is very high under the combination of pure (or extreme) classical total utilitarianism + longtermism + short AI timelines, even if some people who share these beliefs would not condone fraud.