I’m glad “distillation” is emphasized as well in the acronym, because I think it resolves an important question about competitiveness. My initial impression, from the pitch of IA as “solve arbitrarily hard problems with aligned AIs by using human-endorsed decompositions,” was that this wouldn’t work because explicitly decomposing tasks this way in deployment sounds too slow. But distillation in theory solves that problem, because the decomposition from the training phase becomes implicit. (Of course, it raises safety risks too, because we need to check that compressing this process into a “fast” policy didn’t compromise the safety properties that motivated decomposition during training in the first place.)