Really glad to see this published. :)
Silly question, I hope to engage more later:
Doesn’t it stand for Iterated Distillation and Amplification? Or what’s the D doing there?
Hey Max, thanks for your comment :)
Yeah, that’s a bit confusing. I think technically, yes, IDA is Iterated Distillation and Amplification, and Iterated Amplification is just IA. However, IIRC many people referred to Paul Christiano’s research agenda as IDA even though his sequence is called Iterated Amplification, so I stuck to the abbreviation that I saw more often while also sticking to the ‘official’ name. (I also buried a comment on this in footnote 6.)
I think lately, I’ve mostly seen people refer to the agenda and ideas as Iterated Amplification. (And I also think the amplification is the more relevant part.)
I’m glad “distillation” is emphasized in the acronym as well, because I think it resolves an important question about competitiveness. My initial impression, from the pitch of IA as “solve arbitrarily hard problems with aligned AIs by using human-endorsed decompositions,” was that this wouldn’t work because explicitly decomposing tasks this way in deployment sounds too slow. But distillation in theory solves that problem, because the decomposition from the training phase becomes implicit. (Of course, it raises safety risks too, because we need to check that compressing this process into a “fast” policy didn’t compromise the safety properties that motivated decomposition during training in the first place.)