I noticed that “Will humans build goal-directed agents?” was changed from a required reading for Week 2 to an optional reading. I don’t disagree with this choice, as I didn’t find the post very convincing, though I was rather fond of your post “AGI safety from first principles: Goals and Agency”. However, now all the required readings for Week 2 essentially take for granted that AGI will have large-scale goals. Before I participated in AGI Safety Fundamentals in the first round this year, I never considered the possibility that AGI could be non-goal-directed. I thought that since AI involves an objective function, we could directly conclude that a superintelligence would have the goal of optimizing the environment accordingly in a goal-directed fashion, especially since this seems to be an assumption underlying popular introductions such as those by Wait But Why and Yudkowsky. It was only after reading “Goals and Agency” as part of the program that I realized that goal-directed AGI wasn’t a logical necessity. It might be helpful to draw out this consideration in the readings or the “key ideas” section. Do you think the question of whether AGI will be goal-directed is important for participants to consider?
Overall though I think this revised curriculum looks really good!
This is a great point, and I do think it’s an important question for participants to consider; I should switch the last reading for something covering this. The bottleneck is just finding a satisfactory reading—I’m not totally happy with any of the posts covering this, but maybe AGI safety from first principles is the closest to what I want.
Actually, Joe Carlsmith does it better in “Is power-seeking AI an existential risk?”, so I’ve swapped that in instead.