No I don’t think we’ve met! In 2016 I was a professional physicist living in Boston. I’m not sure if I would have even known what “EA” stood for in 2016. :-)
“It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.”
I agree. But maybe I would have said “less hard” rather than “easier” to better convey a certain mood :-P
“It does seem like within technical AI safety research the best work seems to shift away from Agent Foundations type of work and towards neural-nets-specific work.”
I’m not sure what your model is here.
Maybe a useful framing is “alignment tax”: if it’s possible to make an AI that can do some task X unsafely with a certain amount of time/money/testing/research/compute/whatever, then how much extra time/money/etc. would it take to make an AI that can do task X safely? That’s the alignment tax.
The goal is for the alignment tax to be as close as possible to 0%. (It’s never going to be exactly 0%.)
In the fast-takeoff unipolar case, we want a low alignment tax because some organizations will be paying the alignment tax and others won’t, and we want one of the former to win the race, not one of the latter.
In the slow-takeoff multipolar case, we want a low alignment tax because we’re asking organizations to make tradeoffs for safety, and if that’s a very big ask, we’re less likely to succeed. If the alignment tax is 1%, we might actually succeed. Remember that there are many reasons organizations are incentivized to make safe AIs, not least because they want the AIs to stay under their control and do the things they want them to do, not to mention legal risks, reputation risks, employees who care about their children, etc. etc. So if all we’re asking is for them to spend 1% more training time, maybe they all will. If instead we’re asking them all to spend 100× more compute plus an extra 3 years of pre-deployment test protocols, well, that’s much less promising.
So either way, we want a low alignment tax.
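To put made-up numbers on that (purely illustrative, not estimates of anything), here’s a minimal sketch of the tax as a ratio of costs, using the two hypothetical scenarios from the previous paragraph:

```python
# Toy illustration of the "alignment tax" (all numbers are hypothetical).
# tax = (cost of doing task X safely) / (cost of doing it unsafely) - 1

def alignment_tax(unsafe_cost: float, safe_cost: float) -> float:
    """Return the alignment tax as a fraction of the unsafe cost."""
    return safe_cost / unsafe_cost - 1.0

# Scenario A: safety adds 1% more training time -- an easy ask.
print(f"{alignment_tax(unsafe_cost=100, safe_cost=101):.0%}")     # 1%

# Scenario B: safety needs 100x the compute -- a much harder ask.
print(f"{alignment_tax(unsafe_cost=100, safe_cost=10_000):.0%}")  # 9900%
```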
OK, now let’s get back to what you wrote.
I think maybe your model is:
“If Agent Foundations research pans out at all, it would pan out by discovering a high-alignment-tax method of making AGI”
(You can correct me if I’m misunderstanding.)
If we accept that premise, then I can see where you’re coming from. This would be almost definitely useless in a multipolar slow-takeoff world, and merely “probably useless” in a unipolar fast-takeoff world. (In the latter case, there’s at least a prayer of a chance that the safe actors will be so far ahead of the unsafe actors that the former can pay the tax and win the race anyway.)
But I’m not sure that I believe the premise. Or at least I’m pretty unsure. I am not myself an Agent Foundations researcher, but I don’t imagine that Agent Foundations researchers would agree that a high-alignment-tax AGI is the best outcome they’re hoping for from their research.
Oh, hmmm, the other possibility is that you’re mentally lumping together “multipolar slow-takeoff AGI” with “prosaic AGI” and with “short timelines”. These are indeed often lumped together, even if they’re different things. Anyway, I would certainly agree that both “prosaic AGI” and “short timelines” would make Agent Foundations research less promising compared to neural-net-specific work.