Thanks so much for writing this! I’m curating it. I think it’s useful for more people to understand these “hopes” — which are also a helpful framework for understanding what people are working on (and what it might be useful to work on).
In case people skip straight to the comments, here’s a very brief summary of the post:
The post outlines three high-level hopes for making AI safe:
Digital neuroscience: the ability to read or otherwise understand the digital brains/thoughts of AI systems to know their aims and change them if necessary.
Limited AI: designing AI systems to be limited in certain ways, such as being myopic or narrow, in order to make them safer.
AI checks and balances: using AI systems to critique or supervise other AI systems.
Other hopes are outlined elsewhere.
The post elaborates on these hopes and on how they might work or fail (with thoughts on how likely that is).
The post also explains (or links to) some background, including:
There is a risk of AI systems aiming to defeat all of humanity and succeeding.
Countermeasures to prevent this risk may be difficult due to challenges in AI safety research (e.g.)
An analogy for humans trying to understand if an AI is “safe”: an 8-year-old trying to decide between adult job candidates who may be manipulative.[1]
Some other things I like about the post (not exhaustive or in any particular order):
Lots of links that help readers explore more if they want to
A summary at the top, and clear sections
The post is quite measured; it doesn’t try to sell its readers on any of the specific hopes, and it notes their limitations
Here’s ChatGPT’s summary of the analogy:
1. In the analogy, an 8-year-old is trying to hire an adult CEO for their $1 trillion company, but has no way of knowing whether the candidates are motivated by genuine concern for the child’s well-being (saints), just want to make the child happy in the short term (sycophants), or have their own agendas (schemers).
2. This analogy represents the challenge of ensuring that advanced AI systems are aligned with human interests, as it may be difficult to test their safety and intentions.
3. The 8-year-old is analogous to a human trying to train a powerful deep learning model, and the hiring process is analogous to the process of training the model, which implicitly searches through a large space of possible models and picks out one that performs well.
4. The CEO candidates are analogous to different AI systems, and the 8-year-old’s inability to accurately assess their motivations is analogous to the difficulty of determining the true aims of AI systems.
5. If the 8-year-old hires a sycophant or schemer, their company and wealth will be at risk, just as if an AI system is not aligned with human interests, it could pose a risk to humanity.