I’m quite tempted to create a course for conceptual AI alignment, especially since agent foundations has been removed from the latest version of the BlueDot Impact course[1].
If I did this, I would probably run it as follows:
a) Each week would have two sessions. One to discuss the readings and another for people to bounce their takes off others in the cohort. I expect that people trying to learn conceptual alignment would benefit from having extra time to discuss their ideas with informed participants.
b) The course would be less introductory, though without assuming knowledge of AGISF. AGISF already serves as a general introduction for those who need it, and making progress on conceptual alignment is less of a numbers game, so it would likely make sense to focus on people further along the pipeline rather than trying to expand the top of the funnel. In terms of the rough target audience, I imagine people who have been browsing Less Wrong or hanging around the AI safety community for years; or maybe someone who found out about it more recently and has been seriously reading up on it for the last couple of months. For this reason, I would want to assume that people already know why we’re worried about AI Safety and basic ideas like inner/outer alignment and instrumental convergence.[2]
c) I’d probably follow AGISF in picking one question to focus on every week. I also like how it contextualises each reading.
Figuring out what to include seems like it’d be a massive challenge, but I agree that one of the best ways to do this would be to just create a curriculum, send it around to people and then additionally collect feedback from people who have gone through the course.
Anyway, I’d love to hear if anyone has any thoughts on what such a course should look like.
(The closest current course is the Key Phenomena in AI Safety course that PIBBSS ran, but that course assumes people are more technical—in the broader sense where technical includes maths, physics, comp sci, etc.—and is less introductory).
This is quite a reasonable decision. Shorter timelines make agent foundations work less pressing. Additionally, I imagine that most people who complete AGISF would not gain that much value from covering a week on agent foundations, at least not this early in their alignment journeys. Having a week where a substantial part of the cohort feels “why was I taught this?” is not a very good experience for them. Though it wouldn’t be too hard to create a document containing the assumed knowledge.
Thanks for engaging!
Two sessions. One to discuss the readings and another for people to bounce their takes off others in the cohort.
Sounds like a fun experiment! I’ve found that purely open discussion sometimes leads to less valuable conversation, so in both cases I’d focus on a few specific discussion prompts / on trying to help people come to a conclusion on some question. I linked to something about learning activities in the main post, which I think helps with session design. As with anything though, I think trying it out is the only way to know for sure, so feel free to ignore me.
without assuming knowledge of AGISF
I’d be keen to hear specifically what the pre-requisite knowledge is—just in order to inform people if they ‘know enough’ to take your course. Maybe it’s weeks 1-3 of the alignment course? Agree with your assessment that further courses can be more specific, though.
I agree that one of the best ways to do this would be to just create a curriculum, send it around to people and then additionally collect feedback from people who have gone through the course
Sounds right! I would encourage you to front-load some of the work before creating a curriculum though. Without knowing how expert you are in agent foundations yourself, I’d suggest taking steps to ensure your first stab is close enough that giving feedback seems valuable to the people you ask, so it’s not a huge lift to get from first draft to final product, and so there are no nasty surprises from people who would have done it completely differently.
E.g. what if you ask 3-5 experts what they think the most important part of agent foundations is, and maybe try to conduct 30-minute interviews with them to solicit the story they would tell in a curriculum? You can also ask them for their top recommended resources, and why they recommend them. That would be a strong start, I think.
I’ve found that purely open discussion sometimes leads to less valuable conversation, so in both cases I’d focus on a few specific discussion prompts / on trying to help people come to a conclusion on some question
That’s useful feedback. Maybe it’d be best to take some time at the end of the first session of the week to figure out what questions to discuss in the second session? This would also allow people to look things up before the discussion and take some time for reflection.
I’d be keen to hear specifically what the pre-requisite knowledge is—just in order to inform people if they ‘know enough’ to take your course. Maybe it’s weeks 1-3 of the alignment course?
Thoughts on prerequisites off the top of my head:
Week 0: Even though it is a theory course, it would likely be useful to have some basic understanding of machine learning, although this would vary depending on the exact content of the course. It might or might not make sense to run a week 0 depending on most people’s backgrounds.
Week 1 & 2: I’d assume that the participants have at least a basic understanding of inner vs outer alignment, deceptive alignment, instrumental convergence, orthogonality thesis, why we’re concerned about powerful optimisers, value lock-in, recursive self-improvement, slow vs. fast take-off, superintelligence, transformative AI, wireheading, though I could quite easily create a document that defines all of these terms. The purpose of this course also wouldn’t be to reiterate the basic AI safety argument, although it might cover debates such as the validity of counting arguments for mesa-optimisers or whether RLHF means that we should expect outer alignment to be solved by default.
E.g. what if you ask 3-5 experts what they think the most important part of agent foundations is, and maybe try to conduct 30-minute interviews with them to solicit the story they would tell in a curriculum? You can also ask them for their top recommended resources, and why they recommend them. That would be a strong start, I think.
That’s a great suggestion. I would still be tempted to create a draft curriculum though, even if just at the level of “week 1 focuses on question X and includes readings on topics A, B and C”. I could also draw heavily from the previous agent foundations week and other past versions of AISF, Alignment 201, Key Phenomena in AI Safety, the MATS AI Safety Strategy Curriculum, MIRI’s Research Guide, John Wentworth’s alignment training program and the highlighted AI Safety Sequences on Less Wrong (in addition to possibly including some material from the AI Safety Bootcamp or Advanced Fellowship that I ran).
I’d want to first ask them what they would like to see included, without them being anchored on my draft, and then show them my draft and ask for more specific feedback. Expert time is valuable, so I’d want to get the most out of it, and it’s easier to critique a specific artifact.
Week 0: Even though it is a theory course, it would likely be useful to have some basic understanding of machine learning, although this would vary depending on the exact content of the course. It might or might not make sense to run a week 0 depending on most people’s backgrounds.
I would recommend having a week 0 with some ML and RL basics.
I did a day-0 ML and RL speed run at the start of two of my AI Safety workshops at the EA Hotel in 2019. Were you there for that? It might have been recorded, but I have no idea where the recording ended up, although obviously some things have happened since then.
Week 1 & 2: I’d assume that the participants have at least a basic understanding of inner vs outer alignment, deceptive alignment, instrumental convergence, orthogonality thesis, why we’re concerned about powerful optimisers, value lock-in, recursive self-improvement, slow vs. fast take-off, superintelligence, transformative AI, wireheading, though I could quite easily create a document that defines all of these terms.
Seems very worth creating. Depending on their background, some people will have an understanding of these concepts without knowing the terminology. A document explaining each term, with a “read more” link to some useful post, would be great: both so people can tell whether they have the prerequisites, and to help anyone who almost has them find the one blog post that they (specifically) should read to be able to follow the course.
I was there for an AI Safety workshop, though I can’t remember the content. Do you know what you included?