Disclosing a conflict of interest demonstrates explicit awareness of potential bias. It’s often done precisely so that the reader weighs the content on its own merits. Your comment suggests that you have (perhaps) not done so, since it ignores the points the author actually argued. If you see any evidence of bias in the takes in the article/post, can you be more specific? That way, the author gets an honest chance to defend his viewpoint.
Natural selection will tend to promote AIs that disempower human beings. For example, we currently have chatbots that can help us solve problems. But AI developers are working to give these chatbots the ability to access the internet and online banking, and even control the actions of physical robots. While society would be better off if AIs make human workers more productive, competitive pressure pushes towards AI systems that automate human labor. Self-preservation and power seeking behaviors would also give AIs an evolutionary advantage, even to the detriment of humanity.
In this vein, is there anything to the idea of focusing more on aligning incentives than AI itself? Meaning, is it more useful to alter selection pressures (which behaviors are rewarded outside of training) vs trying to induce “useful mutations” (alignment of specific AIs)? I have no idea how well this would work in practice, but it seems less fragile. One half-baked idea: heavily tax direct AI labor, but not indirect AI labor (i.e. make it cheaper to get AIs to help humans be more productive than to do it without human involvement)
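To gesture at how that half-baked tax idea could bite, here is a tiny cost comparison (all wages, AI costs, productivity figures, and tax rates below are invented purely for illustration, not taken from anywhere): a firm compares replacing a worker outright with a taxed “direct” AI against pairing the worker with an untaxed assisting AI.

```python
# Invented numbers only: which option is cheaper per unit of output for a firm,
# a taxed fully-automated AI or an untaxed AI-assisted human?

def direct_ai_cost_per_unit(ai_cost: float, tax_rate: float) -> float:
    """AI alone produces 1 unit/hour (assumption); its hourly cost is taxed."""
    return ai_cost * (1 + tax_rate)

def augmented_human_cost_per_unit(wage: float, ai_cost: float, productivity: float) -> float:
    """Human + assisting AI produce `productivity` units/hour; no tax applies."""
    return (wage + ai_cost) / productivity

for tax in (0.0, 0.5, 2.0):
    direct = direct_ai_cost_per_unit(ai_cost=10.0, tax_rate=tax)
    augmented = augmented_human_cost_per_unit(wage=30.0, ai_cost=10.0, productivity=3.0)
    choice = "automate" if direct < augmented else "augment the human"
    print(f"direct-AI tax {tax:.0%}: ${direct:.2f} vs ${augmented:.2f} per unit -> {choice}")
```

With these made-up numbers, automation wins when untaxed, but even a moderate tax flips the firm toward augmentation. That is the kind of selection-pressure shift I have in mind, though actual elasticities would obviously decide whether it works in practice.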
A single really convincing demonstration of something like deceptive alignment could make a big difference to the case for standards and monitoring (next section).
This struck me as a particularly good example of a small improvement having a meaningful impact. On a personal note, seeing a convincing demonstration of the deceptive alignment example you describe would immediately move me to the hit-the-emergency-brakes/burn-it-all-down camp. I imagine that many would react in a similar way, which could put a lot of pressure on AI labs to collectively implement some strict (not just for show) standards.
The idea of the intention-action gap is really interesting. I would imagine that the personal utility lost by closing this gap is also a significant factor. Meaning: if I recognize that this AI is sentient, what can I no longer do with (or to) it? If the sacrifice is too inconvenient, we might not be in such a hurry to act on our intuitions and thereby concede that they are right.
1. whether AI systems are developed,
2. whether they rebel,
3. whether they beat us,
4. whether, having defeated us, they decide to kill/enslave us all.
I’m curious what 3 (defeat) might look like without 4 happening?
For these reasons I do not believe the EA movement should focus too much or too exclusively on LLMs or similar models as candidates for an AGI precursor, or put too much of a focus on short time horizons. We should pursue a diverse range of strategies for mitigating AI risk, and devote significant resources towards longer time horizons.
Do you think that most strategies that are potentially useful given short timelines remain so as timelines lengthen? (i.e. where the effectiveness of the strategy is timeline-independent)
Which assumption carries the largest penalty if incorrect? (anticipating and planning for shorter timelines and being wrong vs. anticipating and planning for longer timelines and being wrong)
Instead of trying to refute Alice from general principles, I think Bob should instead point to concrete reasons for optimism (for example, Bob could say “for reasons A, B, and C it is likely that we can coordinate on not building AGI for the next 40 years and solve alignment in the meantime”).
As an aside to the main point of your post, I think Bob arrived at his position by default. I suspect that part of it comes from the fact that the bulk of human experience deals with natural systems. These natural systems are often robust and could be described as default-success. Take human interaction, for instance: we assume that any stranger we meet is not a sociopath, because they rarely are. This system is robust and default-success because anti-social behavior is maladaptive. Because AI is so easy for our brains to place in the category of humans, we might by extension put it in the “natural system” box. With that comes the assumption that its behavior reverts to default-success. Have you ever been irritated at your computer because it freezes? That irrational response can be traced to us being angry that the computer doesn’t follow the rules of behavior that come with the (human) box we erroneously placed it in.
I’ve been thinking about this specific idea:
Intuitively, I think it makes sense that data should be the limiting factor of AI growth. A human with an IQ of 150 growing up in the rainforest will be very good at identifying plants, but won’t all of a sudden discover quantum physics. Similarly, an AI trained on only images of trees, even with compute 100 times more than we have now, will not be able to make progress in quantum physics.
It seems to me that you’re making the point that extreme out-of-distribution domains are unreachable through generalization (at least not rapidly). But consider that humans actually did go from only identifying plants to making progress in quantum physics. How did this happen?
1. Humans didn’t do it all of a sudden. It was only possible in a stepwise fashion spanning generations, and it required building on past knowledge (the way to climb ten steps up a ladder is simply to climb one step at a time, ten times over).
2. Population growth meant that more people were working on discovering new knowledge.
3. Humans had to (as you point out) gather new information, not present in our rainforest training set, in order to reach new insights.
4. Humans often had to test their insights to gain practical knowledge (which you also point out with respect to theoretical vs. experimental physics).
If we assume that generating high-quality synthetic data does not yield new knowledge outside the learned domain, then avoiding the data ceiling necessarily requires gathering new information that humans have not gathered yet. As long as humans are the ones required to gather that new information, it’s reasonable to assume that sustained exponential improvement is unlikely, since human information-gathering speed would not increase in tandem. Okay, let’s remove the human bottleneck. In that case, an exponentially improving AI would have to find a way to gather information from the outside world at exponentially increasing speeds (and test insights/theories at those speeds as well). Can you think of any way this would be possible? Otherwise, I find it hard not to reach the same conclusion as you.
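To make that last point concrete, here is a toy model (the scaling exponent and growth rates are purely my own illustrative assumptions, not anything from your post): if capability scales as a power of accumulated real-world data, a fixed human-paced gathering rate yields slow, saturating gains, while large sustained capability growth demands that data gathering itself grow exponentially.

```python
# Toy sketch under assumed numbers: capability ~ data ** ALPHA, with ALPHA < 1
# standing in for diminishing returns to accumulated real-world observations.
ALPHA = 0.3

def capability(data: float) -> float:
    return data ** ALPHA

# Scenario 1: data gathered at a constant, human-limited rate (100 units/period).
fixed_rate = [capability(100.0 * t) for t in range(1, 51)]

# Scenario 2: the data-gathering rate itself doubles every period
# (i.e. the AI somehow keeps accelerating its own experiments/observations).
data, rate = 0.0, 100.0
doubling_rate = []
for _ in range(50):
    data += rate
    rate *= 2
    doubling_rate.append(capability(data))

print(f"capability after 50 periods, human-paced gathering: {fixed_rate[-1]:.1f}")
print(f"capability after 50 periods, exponential gathering: {doubling_rate[-1]:.1f}")
```

Under these made-up numbers the human-paced scenario crawls to roughly 13 while the exponential-gathering scenario blows past 100,000, which is just the quantitative version of your point: without exponentially faster interaction with the outside world, the data ceiling bites.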
Depends on what level and type of advancement we’re talking about. I think interactiveness opens far more doors in VR than, say, improved graphics. Something that immediately came to mind was the ability to simulate surgery without the high stakes. If you could tailor the experience to specific patients, you would get an opportunity to discover unexpected complications that might arise with that specific procedure.
With higher levels of immersion, your example of exploring the space station would be really interesting. I’m not sure of the benefit to humanity, but things like walking on the moon in VR would be mindblowing. As you imply, it might also give us some valuable perspective that carries over to real life, expands our horizons, and makes us less petty.
It’s interesting how OpenAI basically concedes that it’s a fruitless effort further down in the very same post:
Because the upsides are so tremendous, the cost to build it decreases each year, the number of actors building it is rapidly increasing, and it’s inherently part of the technological path we are on, stopping it would require something like a global surveillance regime, and even that isn’t guaranteed to work.
It’s not hard to imagine compute eventually becoming cheap and fast enough to train GPT4+ models on high-end consumer computers. How does one limit homebrewed training runs without limiting capabilities that are also used for non-training purposes?
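For a sense of scale, here is a very rough back-of-envelope (every constant below is an order-of-magnitude assumption on my part, not a figure from OpenAI or anyone else):

```python
# Hedged back-of-envelope: how long would a frontier-scale training run take on a
# single high-end consumer GPU, and how does that shrink if price-performance
# keeps doubling? All constants below are rough assumptions for illustration.
TRAIN_FLOP = 2e25        # assumed total compute for a GPT-4-class training run
GPU_FLOPS = 2e14         # assumed sustained throughput of one consumer card (FLOP/s)
DOUBLING_YEARS = 2.5     # assumed price-performance doubling time for consumer hardware
SECONDS_PER_YEAR = 3.15e7

def training_years(years_from_now: float) -> float:
    """Wall-clock years of training on one card bought `years_from_now` years ahead."""
    effective_flops = GPU_FLOPS * 2 ** (years_from_now / DOUBLING_YEARS)
    return TRAIN_FLOP / effective_flops / SECONDS_PER_YEAR

for horizon in (0, 10, 20, 30, 40):
    print(f"card bought in {horizon:>2} years: ~{training_years(horizon):,.2f} years per run")
```

Under these made-up numbers a single card goes from thousands of years per run today to well under a year a few decades out, which is why the “homebrewed training run” question seems hard to dodge with hardware controls alone.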
1. I like the idea of concrete (publicly stated) pre-defined measures, since it lowers the risk of moving safety standards/targets. It would be a substantial improvement over what we have today, especially if there’s coordination between top labs.
2. The graph shows jumps where y increases at a greater rate than x. Has this ever happened before? What we’ve seen so far looks more like a mirrored L: first we move along the x-axis, and only later (and to a smaller degree) along the y-axis.
3. The line between the red and blue areas should be heavily blurred/striped. This might seem like an aesthetic nitpick, but we can’t map the edges of what we’ve never seen. Our current perceptions come from human minds that are innately tuned to empathize with and predict human behavior, which unwittingly leads to thinking along the lines of: “If I were an AI and thought like a psychopathic human, what would I do?” We don’t do this explicitly, but it’s what we’re actually doing. The real danger lies in the unknown unknowns, which cannot be plotted on a graph a priori. At the moment, we’re assuming that dangers/capabilities progress in a “logical order”, i.e. the order in which humans gain abilities and learn things. If that order is scrambled, so are the warning signs.