One simple model for this is: labs build aligned models if the amount of pressure on them to use sufficiently reliable alignment techniques is greater than the inconvenience associated with using those techniques.
Here are various sources of pressure:
Lab leadership
Employees of the lab
Investors
Regulators
Customers
In practice, all of these sources of pressure play a role when companies spend resources on, eg, improving animal welfare standards, reducing environmental costs, or DEI (diversity, equity, and inclusion).
And here are various sources of inconvenience that could be associated with particular techniques, even assuming those techniques are in principle competitive (in both the performance-competitive and training-competitive senses).
Perhaps they require using substantially different algorithms or technologies, even if these aren’t fundamentally worse. As a dumb example, imagine that building an aligned AGI requires writing your training code in some language that is much less bug-prone than Python, eg Haskell. It’s not fundamentally harder to do ML in Haskell than in Python, but all the ML libraries are in Python, so in practice the switch would require a whole lot of annoying work that an org would be extremely reluctant to do (see the sketch after this list).
Perhaps they require more complicated processes with more moving parts.
Perhaps they require the org to do things that are different from the things it’s good at doing. For example, I get the sense that ML researchers are averse to interacting with human labellers (because it is pretty annoying) and so underutilize techniques that involve eg having humans in the loop. Organizations that will be at the cutting edge of AI research will probably have organizational structures that are optimized for the core competencies related to their work. I expect these core competencies to include ML research, distributed systems engineering (for training gargantuan models), fundraising (because these projects will likely be extremely capital intensive), perhaps interfacing with regulators, and various work related to commercializing these large models. I think it’s plausible that alignment will require organizational capacities quite different from these.
Perhaps they require the org to have capable and independent red teams whose concerns are taken seriously.
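To give the Haskell example a bit of texture, here’s a toy, purely hypothetical sketch of hand-rolled ML in Haskell: a gradient-descent fit of 1-D linear regression. Nothing in it is conceptually hard, but the gradient, the optimizer, and the data handling all have to be written from scratch, whereas in Python an autodiff library would hand you most of this for free. That gap, multiplied across an entire training stack, is the kind of inconvenience I mean.

```haskell
-- Hypothetical sketch: hand-rolled gradient descent for 1-D linear
-- regression in pure Haskell, with no ML libraries.

type Params = (Double, Double)  -- (weight, bias)

-- Mean-squared-error loss over a dataset of (x, y) pairs.
loss :: Params -> [(Double, Double)] -> Double
loss (w, b) xs =
  sum [ (w * x + b - y) ^ (2 :: Int) | (x, y) <- xs ] / fromIntegral (length xs)

-- Hand-derived gradient of the loss with respect to (w, b); in Python,
-- an autodiff framework would compute this for you.
grad :: Params -> [(Double, Double)] -> Params
grad (w, b) xs =
  ( 2 * sum [ err x y * x | (x, y) <- xs ] / n
  , 2 * sum [ err x y     | (x, y) <- xs ] / n )
  where
    err x y = w * x + b - y
    n       = fromIntegral (length xs)

-- One gradient-descent update with learning rate lr.
step :: Double -> [(Double, Double)] -> Params -> Params
step lr xs (w, b) =
  let (gw, gb) = grad (w, b) xs in (w - lr * gw, b - lr * gb)

main :: IO ()
main = do
  let dat     = [(1, 2), (2, 4), (3, 6)]              -- data from y = 2x
      trained = iterate (step 0.05 dat) (0, 0) !! 500
  print trained            -- converges toward (2.0, 0.0)
  print (loss trained dat) -- loss approaches 0
```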
And so when I imagine labs failing to use excellent alignment strategies that had already been developed, the failure looks different depending on how much inconvenience was involved:
“They just didn’t care”: The amount of pressure on them to use these techniques was extremely low. I’d be kind of surprised by this failure: I feel like if it really came down to it, and especially if EA were willing to spend a substantial fraction of its total resources on affecting some small number of decisions, basically all existing labs could be persuaded to do fairly easy things for the sake of reducing AI x-risk.
“They cared somewhat, but it was too inconvenient to use them”. I think that a lot of the point of applied alignment research is reducing the probability of failures like this.
“The techniques were not competitive”: In this case, even large amounts of pressure might not suffice (though presumably, sufficiently large amounts of pressure could cause the whole world to use these techniques even if they weren’t that competitive).
Thanks for the response! I found the second set of bullet points especially interesting/novel.