I’m a Research Fellow at Forethought; before that, I ran the non-engineering side of the EA Forum (this platform), ran the EA Newsletter, and worked on some other content-related tasks at CEA. [More about the Forum/CEA Online job.]
...
Some of my favorites among my own posts:
I finished my undergraduate studies with a double major in mathematics and comparative literature in 2021. I was a research fellow at Rethink Priorities in the summer of 2021 and was then hired by the Events Team at CEA. I later switched to the Online Team. In the past, I’ve also done some (math) research and worked at Canada/USA Mathcamp.
Some links I think people should see more frequently:
Notes on some of my AI-related confusions[1]
It’s hard for me to get a sense of things like “how quickly are we moving towards the kind of AI that I’m really worried about?” I think this stems partly from (1) a conflation of different types of “crazy powerful AI”, and (2) the way that benchmarks and other measures of “AI progress” decouple from actual progress towards the relevant things. Trying to represent these things graphically helps me orient/think.
First, it seems useful to distinguish the breadth or generality of state-of-the-art AI models from how capable they are on some relevant set of skills. Once I separate these out, I can plot roughly where some definitions of “crazy powerful AI” apparently lie on these axes:
(I think there are too many definitions of “AGI” at this point. Many people would make that area much narrower, but possibly in different ways.)
Visualizing things this way also makes it easier for me[2] to ask: Where do various threat models kick in? Where do we get “transformative” effects? (Where does “TAI” lie?)
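To make the skeleton of that picture concrete, here’s a quick mock-up in code (matplotlib). Every label and position in it is a placeholder I made up for illustration, not a claim about where real systems or definitions actually sit:

```python
# Purely illustrative mock-up of the two axes described above.
# All labels and positions are made-up placeholders, not claims about real systems.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 5))
ax.set_xlabel("Ability on some relevant set of capabilities")
ax.set_ylabel("Breadth / generality of the system")
ax.set_xlim(0, 10)
ax.set_ylim(0, 10)

# Hypothetical placements of a few "crazy powerful AI" definitions
ax.annotate("narrow superhuman systems\n(e.g. some R&D skills)", xy=(8.5, 2.0), ha="center")
ax.annotate("one of many\n'AGI' definitions", xy=(6.5, 7.0), ha="center")
ax.annotate("superintelligence", xy=(8.8, 9.3), ha="center")

# A guess at roughly where current frontier models sit (again, a placeholder)
ax.scatter([3.5], [4.5], marker="x", color="black")
ax.annotate("current frontier models?", xy=(3.5, 4.1), ha="center", va="top")

plt.tight_layout()
plt.show()
```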
Another question that I keep thinking about is something like: “what are key narrow (sets of) capabilities such that the risks from models grow ~linearly as they improve on those capabilities?” Or maybe “What is the narrowest set of capabilities for which we capture basically all the relevant info by turning the axes above into something like ‘average ability on that set’ and ‘coverage of those abilities’, and then plotting how risk changes as we move the frontier?”
The most plausible candidates for such a set of abilities might be:
Everything necessary for AI R&D[3]
Long-horizon planning and technical skills?
If I try the former, how does risk from different AI systems change?
And we could try drawing some curves that represent our guesses about how the risk changes as we make progress on a narrow set of AI capabilities on the x-axis. This is very hard; I worry that companies focus on benchmarks in ways that make them less meaningful, so I don’t want to put performance on a specific benchmark on the x-axis. But we could try placing some fuzzier “true” milestones along the way, asking what the shape of the curve would be in reference to those, and then trying to approximate how far along we are with respect to them by using a combination of metrics and other measures. (Of course, it’s also really difficult to develop a reasonable/useful sense for how far apart those milestones are on the most appropriate measure of progress — or how close partial completion of these milestones is to full completion.)
Here’s a sketch:
Overall, I’m really unsure which milestones I should pay attention to here, and how risk changes as we move through them.
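For what it’s worth, here’s a toy code version of the kind of figure I’m gesturing at above. The milestone names, their positions, and both curve shapes are invented placeholders; the only point is the structure (a non-benchmark progress axis, fuzzy milestones as markers, and a few candidate risk curves):

```python
# Toy risk-vs-progress curves, with fuzzy "true" milestones marked on the x-axis.
# Milestone names, positions, and curve shapes are all invented for illustration.
import numpy as np
import matplotlib.pyplot as plt

progress = np.linspace(0, 10, 200)  # abstract progress on the narrow capability set

# Two hypothetical shapes for how risk might scale with that progress
risk_gradual = 1 / (1 + np.exp(-(progress - 6)))  # smooth ramp-up (guess A)
risk_late = np.where(progress < 7, 0.05 * progress,
                     0.35 + 0.6 * (progress - 7) / 3)  # mostly flat, then sharp (guess B)

milestones = {
    "automates some ML-engineering tasks": 3.0,
    "matches experts on multi-day R&D tasks": 6.0,
    "runs the AI R&D loop autonomously": 8.5,
}

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(progress, risk_gradual, label="guess A: gradual ramp-up")
ax.plot(progress, risk_late, label="guess B: mostly flat, then sharp")
for label, x in milestones.items():
    ax.axvline(x, linestyle="--", linewidth=0.8, color="gray")
    ax.text(x, 1.02, label, rotation=30, fontsize=7, ha="left", va="bottom")

ax.set_xlabel("Progress on the narrow capability set (not a specific benchmark)")
ax.set_ylabel("Risk (schematic)")
ax.legend(loc="upper left")
plt.tight_layout()
plt.show()
```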
It could make sense to pay attention to real-world impacts of (future) AI systems instead of their ~intrinsic qualities, but real-world impacts seem harder to find robust precursors to, depend on many non-AI factors, and are harder to interpret, since interpreting them involves untangling many different cruxes or worldviews. (Paying attention to both intrinsic qualities and real-world impacts seems useful and important, though.)
All of this also complicates how I relate to questions like “Is AI progress speeding up or slowing down?” (If I ignore all of these confusions and just try to evaluate progress intuitively / holistically, it doesn’t seem to be slowing down in relevant ways.[4])
Thoughts/suggestions/comments on any of this are very welcome (although I may not respond, at least not quickly).
Some content related to at least some of the above (non-exhaustive):
METR’s RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts
AI Impacts page on HLAI, especially “Human-level” is superhuman
List of some definitions of advanced AI systems
John Wentworth distinguishing between “early transformative AI” and “superintelligence”
Holden Karnofsky’s recent piece for the Carnegie Endowment for International Peace: AI Has Been Surprising for Years
Writing on (issues with) benchmarks/evals: Kelsey Piper in Vox, Anthropic’s “challenges in evaluating AI systems”, Epoch in 2023 on how well compute predicts benchmark performance (pretty well on average, harder individually)
Recent post that I appreciated, which outlined timelines via some milestones and then outlined a picture of how the world might change in the background
Not representing “Forethought views” here! (I don’t know what Forethought folks think of all of this.)
Written/drawn very quickly.
This diagram also makes me wonder how much pushing the bottom right corner further to the right (on some relevant capabilities) would help, especially as an alternative to pushing up or diagonally, given that sub-human general models don’t seem that safety-favoring, but some narrow-but-relevant superhuman capabilities could help us deal with the risks of more general, human-level+ systems.
Could a narrower set work? E.g. to what extent do we just care about ML engineering?
although I’m somewhat surprised at the lack of apparent real-world effects