I agree that my book list is incomplete, and it was aimed more at topics that the OP brought up.
For each of the additional topics you mentioned, it doesn’t seem like Yudkowsky’s Sequences are the best introduction. E.g., for decision theory I got more out of reading a random MIRI paper trying to formalize FDT. For AI x-risk in particular, it would surprise me if you would recommend the sequences rather than some newer introduction.
I have read all of the books/content you link above
Is this literally true? In particular, have you read David’s Sling?
E.g., for decision theory I got more out of reading a random MIRI paper trying to formalize FDT.
Yeah, I think the best TDT/FDT/LDT material in particular is probably the MIRI papers. The original TDT paper is quite good, and I consider it kind of part of the sequences, since it was written around the same time and in a pretty similar style.
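For anyone who hasn’t read those papers, the core disagreement they formalize already shows up in Newcomb’s problem. Here is a toy expected-value sketch (the predictor accuracy and payoffs are made-up illustrative numbers, not taken from any of the papers):

```python
# Toy Newcomb's problem: a predictor fills an opaque box with $1M
# only if it predicts you will one-box. CDT treats the box contents
# as already fixed; FDT treats the prediction as correlated with
# your decision procedure, and so with your choice.

ACCURACY = 0.99  # assumed predictor accuracy (illustrative, made up)

def expected_value_cdt(one_box: bool, p_box_full: float) -> float:
    """CDT: box contents are causally independent of the current choice,
    so two-boxing always adds the transparent box's $1,000."""
    big = p_box_full * 1_000_000
    return big if one_box else big + 1_000

def expected_value_fdt(one_box: bool) -> float:
    """FDT: the predictor runs (a model of) your decision procedure,
    so deciding to one-box makes the box full with prob = ACCURACY."""
    if one_box:
        return ACCURACY * 1_000_000
    return (1 - ACCURACY) * 1_000_000 + 1_000

if __name__ == "__main__":
    # CDT two-boxes no matter its prior over the box contents...
    for p in (0.0, 0.5, 1.0):
        assert expected_value_cdt(False, p) > expected_value_cdt(True, p)
    # ...while FDT one-boxes and is richer in expectation.
    print(expected_value_fdt(True), ">", expected_value_fdt(False))
```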
For AI x-risk in particular, it would surprise me if you would recommend the sequences rather than some newer introduction.
Nope, still think the sequences are by far the best (and indeed most alignment conversations I have with new people who showed up in the last 5 years tend to consist of me summarizing sequences posts, which has gotten pretty annoying after a while). There is of course useful additional stuff, but if someone wanted to start working on AI Alignment, the sequences still seem by far the best large thing to read (there are individual articles that do individual things best, but there isn’t really anything else textbook-shaped).
What are the core pieces about AI risk in the sequences? Looking through the list, I don’t see any sequence about AI risk. Yudkowsky’s account on the Alignment Forum doesn’t have anything older than six years, i.e., nothing from the sequences era.
Personally, I’d point to Joe Carlsmith’s report, Richard Ngo’s writeups, Ajeya Cotra’s writeup, some of Holden Karnofsky’s writing, Concrete Problems in AI Safety, and Unsolved Problems in ML Safety as the best introductions to the topic.
The primary purpose of the sequences was to communicate the generators behind AI risk and to teach the tools necessary (according to Eliezer) to make progress on it, so references to AI risk are all over the place, and it’s the second most central theme of the essays.
Later essays in the sequences tend to have more references to AI risk than earlier ones. Here is a somewhat random selection of ones that seemed crucial when looking over the list, though this is very unlikely to be comprehensive:
Ghosts in the Machine
Optimization and the Intelligence Explosion
Belief in Intelligence
The Hidden Complexity of Wishes
That Alien Message (I think this one is particularly good)
Dreams of AI Design
Raised in Technophilia
Value is Fragile
There are lots more. Indeed, in the latter half of the sequences, roughly every second or third essay is quite straightforwardly about AI Alignment.
My guess is that he meant the sequences convey the kind of foundational epistemology that helps people derive better models on subjects like AI Alignment by themselves, though all of the sequences in The Machine in the Ghost and Mere Goodness have direct object-level relevance.
Excepting Ngo’s AGI safety from first principles, I don’t especially like most of those resources as introductions, precisely because they offer readers very little opportunity to test or build on their beliefs. Also, I think most of them are substantially wrong. (Concrete Problems in AI Safety seems fine, but it also skips a lot of steps. I haven’t read Unsolved Problems in ML Safety.)