A priori, I’d expect a randomly formulated AI regulation to be about 50% likely to be an improvement on the status quo, since the status quo wasn’t selected for being good for alignment.
I don’t agree.
It’s true that the status quo wasn’t selected for being good for alignment directly, but it was still selected for things that are arguably highly related to alignment. Our laws are the end result of literally thousands of years of experimentation, tweaking, and innovation in the face of risks. In that time, numerous technologies and threats have arisen, prompting us to change our laws and norms to adapt.
To believe that a literal random change to the status quo has a 50% chance of being beneficial, you’d likely have to believe that AI is so radically outside the ordinary reference class of risks that it is truly nothing whatsoever like we have ever witnessed or come across before. And while I can see a case for AI being highly unusual, I don’t think I’d be willing to go that far.
I don’t see good arguments supporting this point. I tend to think the opposite—building a coalition to pass a regulation now makes it easier to pass other regulations later.
Building a coalition now makes it easier to pass other similar regulations later, but it doesn’t necessarily make it easier to switch to an entirely different regulatory regime.
Laws and their associated bureaucracies tend to entrench themselves. Suppose that as a result of neo-luddite sentiment, the people hired to oversee AI risks in the government concern themselves only with risks to employment, ignoring what we’d consider to be more pressing concerns. I think it would be quite a lot harder to fire all of them and replace them with people who care relatively more about extinction, than to simply hire right-minded people in the first place.
If an alignment scheme doesn’t work with arbitrary data restrictions, it’s probably not good enough in principle. Even all the data on the internet is “arbitrarily restricted” relative to all the data that exists or could exist. If my alignment scheme fails with public domain data only, why shouldn’t it also fail with all the data on the internet?
My weak guess is that there’s a kind of bias at play in AI risk thinking in general, where any force that isn’t zero is taken to be arbitrarily intense. Like, if there is pressure for agents to exist, there will arbitrarily quickly be arbitrarily agentic things. If there is a feedback loop, it will be arbitrarily strong. Here, if stalling AI can’t be forever, then it’s essentially zero time. If a regulation won’t obstruct every dangerous project, then it is worthless. Any finite economic disincentive for dangerous AI is nothing in the face of the omnipotent economic incentives for AI. I think this is a bad mental habit: things in the real world often come down to actual finite quantities.
Likewise, I think actual quantities of data here might matter a lot. I’m not confident at all that arbitrarily restricting 98% of the supply of data won’t make the difference between successful and unsuccessful alignment, relative to allowing the full supply of data. I do lean towards thinking it won’t make that difference, but my confidence is low, and I think it might very easily come down to the specific details of what’s being allowed and what’s being restricted.
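As a purely illustrative sketch of why the actual quantities might matter (my own toy calculation, not something from this thread): one could plug a 98% data reduction into the published Chinchilla parametric fit for pretraining loss, L(N, D) = E + A/N^α + B/D^β. The fitted constants below are from that paper, but the choice of model size and token count, and any link between pretraining loss and alignment success, are assumptions for illustration only.

```python
# Chinchilla parametric fit: L(N, D) = E + A/N**alpha + B/D**beta
# Constants are the published fitted values; everything else is a toy choice.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params, n_tokens):
    """Predicted pretraining loss under the Chinchilla parametric fit."""
    return E + A / n_params**alpha + B / n_tokens**beta

N = 70e9               # hypothetical 70B-parameter model
D_full = 1.4e12        # hypothetical 1.4T-token dataset
D_cut = 0.02 * D_full  # same dataset with 98% of the data restricted

print(f"full data: {loss(N, D_full):.2f}")   # ≈ 1.94
print(f"2% of data: {loss(N, D_cut):.2f}")   # ≈ 2.26
```

Under this fit the cut is a real but finite hit to capability, which is exactly the point: whether that gap makes or breaks a given alignment scheme is an empirical question about specific quantities, not something settled by the direction of the effect alone.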
On the other hand, I’m quite convinced that, abstractly, it is highly implausible that arbitrarily limiting what data researchers have access to will be positive for alignment. This consideration leaves me on the side of caution, and inclines me to say we should probably not put in place such arbitrary restrictions.
I think a better argument for your conclusion is that incentivizing researchers to move away from big-data approaches might make AI research more accessible and harder to control. Legal restrictions also favor open source, which has the same effect. We don’t want cutting-edge AI to become something that exists on the margins, like using BitTorrent for piracy.
I think that’s also true. It’s a good point that I didn’t think to put into the post.
Our laws are the end result of literally thousands of years of experimentation

The distribution of legal cases involving technology over the past 1000 years is very different from the distribution of legal cases involving technology over the past 10 years. “Law isn’t keeping up with tech” is a common observation nowadays.
a literal random change to the status quo
How about we revise to “random viable legislation” or something like that? Any legislation pushed by artists will be in the same reference class as the “thousands of years of experimentation” you mention (except more recent, and thus better adapted to current reality).
AI is so radically outside the ordinary reference class of risks that it is truly nothing whatsoever like we have ever witnessed or come across before
Either AI will be transformative, in which case this is more or less true, or it won’t be transformative, in which case the regulations matter a lot less.
Suppose that as a result of neo-luddite sentiment, the people hired to oversee AI risks in the government concern themselves only with risks to employment, ignoring what we’d consider to be more pressing concerns.
If we’re involved in current efforts, maybe some of the people hired to oversee AI risks will be EAs. Or maybe we can convert some “neo-luddites” to our point of view.
simply hire right-minded people in the first place
Sounds to me like you’re letting the perfect be the enemy of the good. We don’t have perfect control over what legislation gets passed, including this particular legislation. Odds are decent that the artist lobby succeeds even with our opposition, or that current legislative momentum is better aligned with humanity’s future than any legislative momentum which occurs later. We have to think about the impact of our efforts on the margin, as opposed to thinking of a “President Matthew Barnett” scenario.
On the other hand, I’m quite convinced that, abstractly, it is highly implausible that arbitrarily limiting what data researchers have access to will be positive for alignment.
It could push researchers towards more robust schemes which work with less data.
I want a world where the only way for a company like OpenAI to make ChatGPT commercially useful is to pioneer alignment techniques that will actually work in principle. Throwing data & compute at ChatGPT until it seems aligned, the way OpenAI is doing, seems like a path to ruin.
As an intuition pump, it seems possible to me that a solution for adversarial examples would make GPT work well even when trained on less data. So by making it easy to train GPT on lots of data, we may be letting OpenAI neglect adversarial examples. We want an “alignment overhang” where our alignment techniques are so good that they work even with a small dataset, and become even better when used with a large dataset. (I guess this argument doesn’t work in the specific case of safety problems which only appear with a large dataset, but I’m not sure if there’s anything like that.)
Another note: I’ve had the experience of sharing alignment ideas with OpenAI staff. They responded by saying “what we’re doing seems good enough” / not trying my idea (to my knowledge). Now they’re running into problems which I believe the ideas I shared might’ve solved. I wish they’d focus more on finding a solid approach, and less on throwing data at techniques I view as subpar.