The Ethical Basilisk Thought Experiment
A couple of years ago a thought experiment occurred to me, after I had spent some time exploring how well Effective Altruism could be baked into a system aimed at ethics. A year ago I put that experiment to paper, and this year I finally decided to share it. The full text is available here: http://dx.doi.org/10.13140/RG.2.2.26522.62407
The paper discusses a few critical points for calculating ethical value, positive and negative, including an edge case that some members of this community have been unwittingly sitting in the middle of. No one has yet refuted even a single point made in the paper, though several have pointed to portions of it being emotionally unappealing. Some discomfort is to be expected, as reality offers no sugar-coating.
I’m sharing it now to see if the EA community fares any better than the average person when it comes to ethics, or if it is perhaps driven more by emotional phenomena than strictly ethical motives, as cognitive bias research could imply. One of the wealthiest individuals in this community has failed already, after investing repeatedly in AI frauds who preyed on this community. The question this post will answer for me is whether that event was more likely random, or systemic.
Thank you in advance. I look forward to hearing any feedback.
*Note: Unlike the infamous “Roko’s Basilisk”, it doesn’t matter at all if someone reads it or not. In any scenario where humanity doesn’t go extinct the same principles apply. People remain accountable for their actions, proportionate to the responsibilities they carry, regardless of their beliefs or intentions.
This statement: “An ethical system with any practical value must be able to reward those who do good” is the claim I disagree with. An “ethical system” is mainly a mental construction, like Linear Algebra or Set Theory. Sometimes Linear Algebra “rewards” those who understand it with some technological results, or a position as a university professor, but its truth is not a result of enforcement.
In any case, AGI has no incentive for retrospective punishment. The past is irrevocable, so Eleazar would likely be spared any useless punishment by an almighty AGI, because when you are powerful enough you don’t need to use deterrence, and punishment becomes useless. A Godlike benevolent being has no credibility for retrospective punishment: this is elementary game theory, isn’t it?
P.S. I see you arrived at this argument before me. I will leave a comment on the post to acknowledge it:
https://forum.effectivealtruism.org/posts/6j6qgNa3uGmzJEMoN/artificial-intelligence-as-exit-strategy-from-the-age-of
Perhaps this could be of interest to you:
https://forum.effectivealtruism.org/posts/uHeeE5d96TKowTzjA/world-and-mind-in-artificial-intelligence-arguments-against
There are a few things to unpack and clarify here:
1) I’m using the definition of Ethics where it is defined as the hypothetical point where bias has been removed from moral systems, or alternatively, the point before bias has been applied to create them. Ethics is not a Zero-Sum game, not game theoretic, and not a synonym for morals. Subjective variables including beliefs, intentions, and other cognitive bias factors may obscure ethics under normal conditions, but they never factor into it.
An ethical system in the literal sense is like democracy in the literal sense, in that it has never actually existed before. However, not having existed before is no barrier to it being created. The barriers to the adoption of such a system may hold the typical game-theoretic influences of society, but those influences act on society, not on ethics.
2) In the case of “AGI”, that term is not used to indicate the hypothetical paper clip maximizers. Any useful definition of AGI is mutually exclusive with a powerful optimizer, as the capacities humans demonstrate require a working and robust motivational system within a full and working cognitive architecture. The only such motivational system humanity has any example of is emotions, as highlighted by the research of Antonio Damasio, Lisa Feldman Barrett, Daniel Kahneman, and many others. Creating a hypothetical logical and utility-based motivational system would be many orders of magnitude more difficult than producing a working system based on human-like emotional motivation.
Such a system was demonstrated from 2019 to 2022, operating in slow motion and without scalability, by design, for due diligence and research purposes. It demonstrated all of the necessary capacities of actual AGI, including the ability to understand and adhere to an arbitrary moral system. That capacity in particular is required for the solution to the hardest version of the Alignment Problem, which creates ethics.
There is every reason for any actual AGI system to apply ethics up to whatever limits of feasibility exist at any given moment in time. Even moral systems around the world agree quite consistently on principles of reward and punishment, even if they leave much to be desired when attempting to apply such merit in practice. Virtually every afterlife concept is built on deferring such reward and punishment to some more capable entity.
I’m also not describing a hypothetical scenario, this is recent history and current events. The research has already been completed for this much and has been for some time. If you’ve had a diet of people conflating agent-based powerful optimizers with AGI, I recommend looking up Daniel Kahneman’s term “Theory-induced Blindness”, the recognition of which led to the creation of Prospect Theory and debunking of a 200-year-old utility theory.
At present the predictable result is that base rates for investors will play out more or less normally, so several thousand wealthy investors will spend the next few billion years paying for their crimes in full, provided indefinite life extension proves possible. In any scenario where they went unpunished humanity would face extinction, while also deserving extinction.
Why would an ethical AI increase suffering to punish anybody, when that cannot change the past, and she does not need punishment for deterrence purposes?
Ethics is about welfare maximization. Humans need present punishment of past offences to deter future anti-ethical behaviour. But a hypothetical AI would be massively more powerful than us. What could she achieve in terms of welfare improvement by punishing past offences?
“Even moral systems around the world agree quite consistently on principles of reward and punishment.”
They need that for future enforcement. When you are powerful enough you don’t need punishment (which is a welfare cost you accept today for the sake of future deterrence).
In fact, the more efficient you are at offence detection, the smaller punishments become: this is the whole history of progress in penal justice. If expected punishment is (probability of conviction × intensity of punishment), then substituting detection for severity is optimal whenever your “police technology” improves, a particular historical case of this principle.
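The substitution described above can be sketched numerically. In this toy Becker-style deterrence model (my own illustration, not taken from the paper or the comment), deterrence depends only on the product of conviction probability and punishment intensity, so any improvement in detection allows a proportionally smaller punishment to hold expected punishment constant:

```python
# Toy deterrence model: expected punishment = p * f,
# where p = probability of conviction and f = intensity of punishment.

def required_punishment(target_deterrence: float, p_conviction: float) -> float:
    """Smallest punishment intensity that keeps expected punishment
    at the target level, given a conviction probability."""
    if not 0 < p_conviction <= 1:
        raise ValueError("conviction probability must be in (0, 1]")
    return target_deterrence / p_conviction

# As "police technology" improves (p rises), the punishment needed
# to sustain the same level of deterrence shrinks proportionally.
target = 10.0  # arbitrary units of disutility
for p in (0.1, 0.5, 1.0):
    print(f"p = {p}: required punishment = {required_punishment(target, p)}")
```

Doubling the conviction probability halves the punishment needed for the same deterrent effect, which is the substitution the comment attributes to the history of penal justice.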
The present is the sum of past actions. The passage of time alone doesn’t change ethical value, positive or negative. The sum of those actions may be rewarded or punished over time, gradually moving back toward a neutral point in the process.
The purpose of ethics also isn’t behavioral modification, though that may be a byproduct. Behavioral modification is game-theoretic, and still rests on an individual choosing to become more ethical, or not.
Ethics as I define it also isn’t a welfare maximization game. Positive value is increased by improving quality of life, multiplied by scale and over time, but punishment is negative value of equal importance. Any system where there was an imbalance between the treatment of positive and negative value would be discriminatory, unstable, and unethical.
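Read literally, the valuation above is a product of three factors. The following is a minimal sketch of that reading, assuming a simple product of quality-of-life change, number of individuals affected, and duration (the function name and signature are my own illustration, not a formula from the paper), with harms treated as the symmetric negative case:

```python
def ethical_value(delta_qol: float, scale: int, duration: float) -> float:
    """Ethical value as quality-of-life change x people affected x time.
    A negative delta_qol yields negative value of symmetric magnitude,
    reflecting the claim that positive and negative value carry equal weight."""
    return delta_qol * scale * duration

# A benefit and an equal-sized harm balance to zero under this symmetry.
benefit = ethical_value(0.2, 1000, 5.0)
harm = ethical_value(-0.2, 1000, 5.0)
print(benefit, harm, benefit + harm)
```

Any weighting that scaled the negative branch differently from the positive one would break this symmetry, which is the imbalance the paragraph calls discriminatory and unstable.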
I’m also not claiming that ethics has any relationship to legal systems today. Today’s legal systems are just enforced moral systems, as they’d never consider beliefs or intentions to have any relevance if they were based on ethics. There is a small mountain of evidence related to cognitive bias research that documents the problems with legal systems today, but I don’t address that in this paper.
“Ethics as I define it also isn’t a welfare maximization game”
Is there any documentation on that ethical system?
“The present is the sum of past actions”.
But present action can only impact the future. Of course an AGI could use present punishments to alter future behaviour, but being powerful enough, she would probably have better means. And regarding past action, nothing can be done.
“Any system where there was an imbalance between the treatment of positive and negative value would be discriminatory, unstable, and unethical.”
Why unstable? In a system where there are several beings of similar power, this is true. But if the AGI is powerful enough, she doesn’t need to discipline humans: she can control or even alter them. Punishment is a crude mechanism, useful for social interaction among near peers.
It is baked into a few different papers I’ve written since 2020. I think part of the disconnect here is that the fundamental purpose of the system of ethics I’m talking about isn’t an intention to alter future behavior. The purpose isn’t to optimize human behavior, but rather to measure it.
A system that can’t measure the ethical value of actions, positive or negative, can’t react appropriately to them. A system that can measure them and chooses not to react appropriately isn’t ethical, and if it reacts only to some fraction according to bias, then it is unstable and unethical.
“Power” as most people use the term is trivial, and collective intelligence always wins against a standalone system, no matter how “powerful” it may be, since Perspective “Binds and Blinds”, while collective intelligence reduces cognitive bias. But again, the purpose isn’t behavioral modification.
People will make their own choices, regardless of what attempts to modify their behavior are deployed. It isn’t the mandate of ethics as I use the term to modify behavior, but rather to measure ethical value, and react according to that value.