Thanks for your answer. (Just to check, I think you are a different Steve Byrnes than the one I met at Stanford EA in 2016 or so?)
I do want to emphasize that I don't doubt that technical AI safety work is one of the top priorities. It does seem like, within technical AI safety research, the best work is shifting away from Agent Foundations-type work and towards neural-net-specific work. It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.
No, I don't think we've met! In 2016 I was a professional physicist living in Boston. I'm not sure if I would have even known what "EA" stood for in 2016. :-)
It also seems like the technical problem does get easier in expectation if you have more than one shot. By contrast, I claim, many of the Moloch-style problems get harder.
I agree. But maybe I would have said "less hard" rather than "easier" to better convey a certain mood :-P
It does seem like, within technical AI safety research, the best work is shifting away from Agent Foundations-type work and towards neural-net-specific work.
I'm not sure what your model is here.
Maybe a useful framing is "alignment tax": if it's possible to make an AI that can do some task X unsafely with a certain amount of time/money/testing/research/compute/whatever, then how much extra time/money/etc. would it take to make an AI that can do task X safely? That's the alignment tax.
The goal is for the alignment tax to be as close as possible to 0%. (It's never going to be exactly 0%.)
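To make that concrete, here's a minimal sketch of the alignment tax as a simple ratio. The function and numbers are my own illustration, not anyone's actual estimate; the units could be dollars, GPU-hours, researcher-years, or anything else.

```python
# Minimal sketch: the alignment tax as the extra cost of doing task X safely,
# expressed as a fraction of the cost of doing it unsafely.

def alignment_tax(cost_unsafe: float, cost_safe: float) -> float:
    """Fractional extra cost of the safe version relative to the unsafe one."""
    return (cost_safe - cost_unsafe) / cost_unsafe

# Made-up numbers: if the unsafe version costs 100 units and the safe
# version costs 101, the alignment tax is 1%.
print(alignment_tax(100, 101))  # 0.01, i.e. a 1% alignment tax
```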
In the fast-takeoff unipolar case, we want a low alignment tax because some organizations will be paying the alignment tax and others won't, and we want one of the former to win the race, not one of the latter.
In the slow-takeoff multipolar case, we want a low alignment tax because we're asking organizations to make tradeoffs for safety, and if that's a very big ask, we're less likely to succeed. If the alignment tax is 1%, we might actually succeed. Remember that there are many reasons organizations are incentivized to make safe AIs, not least because they want the AIs to stay under their control and do the things they want them to do, not to mention legal risks, reputation risks, employees who care about their children, etc. etc. So if all we're asking is for them to spend 1% more training time, maybe they all will. If instead we're asking them all to spend 100× more compute plus an extra 3 years of pre-deployment test protocols, well, that's much less promising.
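Just to illustrate how lopsided those two asks are, here's a back-of-the-envelope sketch; the 10,000 GPU-hour baseline is purely hypothetical.

```python
# Back-of-the-envelope comparison of the two asks above, using a
# hypothetical baseline of 10,000 GPU-hours for the unsafe version.
baseline_gpu_hours = 10_000  # made-up baseline, not a real estimate

extra_small_ask = 0.01 * baseline_gpu_hours       # "1% more training time"
extra_large_ask = (100 - 1) * baseline_gpu_hours  # "100x more compute" = 99x extra
                                                  # (before even counting the extra
                                                  # 3 years of test protocols)

print(extra_small_ask)                    # 100.0 extra GPU-hours
print(extra_large_ask)                    # 990000.0 extra GPU-hours
print(extra_large_ask / extra_small_ask)  # 9900.0 -- the big ask is ~10,000x larger
```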
So either way, we want a low alignment tax.
OK, now let's get back to what you wrote.
I think maybe your model is:
"If Agent Foundations research pans out at all, it would pan out by discovering a high-alignment-tax method of making AGI."
(You can correct me if I'm misunderstanding.)
If we accept that premise, then I can see where you're coming from. This would be almost definitely useless in a multipolar slow-takeoff world, and merely "probably useless" in a unipolar fast-takeoff world. (In the latter case, there's at least a prayer of a chance that the safe actors will be so far ahead of the unsafe actors that the former can pay the tax and win the race anyway.)
But I'm not sure that I believe the premise. Or at least I'm pretty unsure. I am not myself an Agent Foundations researcher, but I don't imagine that Agent Foundations researchers would agree with the premise that high-alignment-tax AGI is the best that they're hoping for in their research.
Oh, hmmm, the other possibility is that you're mentally lumping together "multipolar slow-takeoff AGI" with "prosaic AGI" and with "short timelines". These are indeed often lumped together, even if they're different things. Anyway, I would certainly agree that both "prosaic AGI" and "short timelines" would make Agent Foundations research less promising compared to neural-net-specific work.