I found that just open discussion sometimes leads to less valuable discussion, so in both cases I’d focus on a few specific discussion prompts / trying to help people come to a conclusion on some question
That’s useful feedback. Maybe it’d be best to take some time at the end of the first session of the week to figure out what questions to discuss in the second session? This would also allow people to look things up before the discussion and take some time for reflection.
I’d be keen to hear specifically what the pre-requisite knowledge is—just in order to inform people if they ‘know enough’ to take your course. Maybe it’s weeks 1-3 of the alignment course?
Thoughts on prerequisites off the top of my head: Week 0: Even though it is a theory course, it would likely be useful to have some basic understanding of machine learning, although this would vary depending on the exact content of the course. It might or might not make sense to run a week 0 depending on most people’s backgrounds. Week 1 & 2: I’d assume that the participants have at least a basic understanding of inner vs outer alignment, deceptive alignment, instrumental convergence, orthogonality thesis, why we’re concerned about powerful optimisers, value lock-in, recursive self-improvement, slow vs. fast take-off, superintelligence, transformative AI, wireheading, though I could quite easily create a document that defines all of these terms. The purpose of this course also wouldn’t be to reiterate the basic AI safety argument, although it might cover debates such as the validity of counting arguments for mesa-optimisers or whether RLHF means that we should expect outer alignment to be solved by default.
I.e. what if you ask 3-5 experts what they think the most important part of agent foundations is, and maybe try to conduct 30 min interviews with them to solicit the story they would tell in a curriculum? You can also ask them their top recommended resources, and why they recommend it. That would be a strong start, I think.
That’s a great suggestion. I would still be tempted to create a draft curriculum though, even just at the level of week 1 focuses on question x and includes readings on topics a, b and c. I could also lift heavily from the previous agent foundations week and other past versions of AISF, alignment 201, key phenomenon in AI Safety, MATS AI Safety Strategy Curriculum, MIRI’s Research Guide, John Wentworth’s alignment training program + the highlighted AI Safety Sequences on Less Wrong (in addition to possibly including some material from the AI Safety Bootcamp or Advanced Fellowship that I ran).
I’d want to first ask them what they would like to see included without them being anchored on my draft, then I’d show them my draft and ask for more specific feedback. Expert time is valuable, so I’d want to get the most out of their time and it is easier to critique a specific artifact.
Week 0: Even though it is a theory course, it would likely be useful to have some basic understanding of machine learning, although this would vary depending on the exact content of the course. It might or might not make sense to run a week 0 depending on most people’s backgrounds.
I would reccomend having a week 0 with some ML and RL basics.
I did a day 0 ML and RL speed run, at the start of two of my AI Safety workshops at EA hotel in 2019. Where you there for that? It might have been recorded, but I have no idea where it might have ended up. Although obviously some things have happened since then.
Week 1 & 2: I’d assume that the participants have at least a basic understanding of inner vs outer alignment, deceptive alignment, instrumental convergence, orthogonality thesis, why we’re concerned about powerful optimisers, value lock-in, recursive self-improvement, slow vs. fast take-off, superintelligence, transformative AI, wireheading, though I could quite easily create a document that defines all of these terms.
Seems very worth creating. Depending on peoples background some people will have an understanding of these with out knowing the terminology. A document explaining each term, and a “read more” link to some useful post would be great. Both for people to know if they have the pre-requisite, and to help anyone who almost have the prerequisite to find that one blogpost they (them specifically) should read to be able to follow the course.
That’s useful feedback. Maybe it’d be best to take some time at the end of the first session of the week to figure out what questions to discuss in the second session? This would also allow people to look things up before the discussion and take some time for reflection.
Thoughts on prerequisites off the top of my head:
Week 0: Even though it is a theory course, it would likely be useful to have some basic understanding of machine learning, although this would vary depending on the exact content of the course. It might or might not make sense to run a week 0 depending on most people’s backgrounds.
Week 1 & 2: I’d assume that the participants have at least a basic understanding of inner vs outer alignment, deceptive alignment, instrumental convergence, orthogonality thesis, why we’re concerned about powerful optimisers, value lock-in, recursive self-improvement, slow vs. fast take-off, superintelligence, transformative AI, wireheading, though I could quite easily create a document that defines all of these terms. The purpose of this course also wouldn’t be to reiterate the basic AI safety argument, although it might cover debates such as the validity of counting arguments for mesa-optimisers or whether RLHF means that we should expect outer alignment to be solved by default.
That’s a great suggestion. I would still be tempted to create a draft curriculum though, even just at the level of week 1 focuses on question x and includes readings on topics a, b and c. I could also lift heavily from the previous agent foundations week and other past versions of AISF, alignment 201, key phenomenon in AI Safety, MATS AI Safety Strategy Curriculum, MIRI’s Research Guide, John Wentworth’s alignment training program + the highlighted AI Safety Sequences on Less Wrong (in addition to possibly including some material from the AI Safety Bootcamp or Advanced Fellowship that I ran).
I’d want to first ask them what they would like to see included without them being anchored on my draft, then I’d show them my draft and ask for more specific feedback. Expert time is valuable, so I’d want to get the most out of their time and it is easier to critique a specific artifact.
I would reccomend having a week 0 with some ML and RL basics.
I did a day 0 ML and RL speed run, at the start of two of my AI Safety workshops at EA hotel in 2019. Where you there for that? It might have been recorded, but I have no idea where it might have ended up. Although obviously some things have happened since then.
Seems very worth creating. Depending on peoples background some people will have an understanding of these with out knowing the terminology. A document explaining each term, and a “read more” link to some useful post would be great. Both for people to know if they have the pre-requisite, and to help anyone who almost have the prerequisite to find that one blogpost they (them specifically) should read to be able to follow the course.
I was there for an AI Safety workshop, I can’t remember the content though. Do you know what you included?