Here are some quick takes on what you can do if you want to contribute to AI safety or governance (they may generalise, but no guarantees). Paraphrased from a longer talk I gave, transcript here.
First, there’s still tons of alpha left in having good takes.
(Matt Reardon originally said this to me and I was like, “what, no way”, but now I think he was right and this is still true – thanks Matt!)
You might be surprised, because there are many people doing AI safety and governance work, but I think there’s still plenty of demand for good takes, and you can distinguish yourself professionally by being a reliable source of them.
But how do you have good takes?
I think the way you form good takes, oversimplifying only slightly, is this: you read Learning by Writing and go “yes, that’s how I should orient to the reading and writing that I do”; you do that a bunch of times with your reading and writing on AI safety and governance; then you share your writing somewhere, have lots of conversations with people about it, change your mind, and learn more. That’s how you have good takes.
What to read?
Start with the basics (e.g. BlueDot’s courses, other reading lists), then work from there on what’s interesting × important
Write in public
Usually, if you haven’t got evidence of your takes being excellent, it’s not that useful to just voice your takes in general. I think having takes and backing them up with some evidence, or saying things like “I read this thing, here’s my summary, here’s what I think,” is useful. But it’s kind of hard to get readers to care if you’re just like “I’m some guy, here are my takes.”
Some especially useful kinds of writing
In order to get people to care about your takes, you could do useful kinds of writing first, like:
Explaining important concepts
E.g., eval awareness, non-LLM architectures (should I care? why?), AI control, best arguments for/against short timelines, continual learning shenanigans
Collecting evidence on particular topics
E.g., empirical evidence of misalignment, AI incidents in the wild
Summarising and giving reactions to important resources that many people won’t have time to read
For example, if someone wrote a blog post on “I read Anthropic’s sabotage report, and here’s what I think about it,” I would probably read that blog post, and might find it useful.
Writing vignettes, like AI 2027, about your mainline predictions for how AI development goes.
Ideas for technical AI safety
Reproduce papers
Port evals to Inspect (a minimal sketch follows this list)
Do the same kinds of quick and shallow exploration you’re probably already doing, but write about it: put your code on the internet, write a couple of paragraphs about your takeaways, and then someone might actually read it!
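For the Inspect idea above, here’s roughly what the smallest possible ported eval looks like. This is a sketch only: the one-question dataset and the includes() scorer are placeholders for whatever the original benchmark actually uses, and the exact API (e.g. the solver argument) can differ between Inspect versions, so check the docs.

```python
# Minimal sketch of an eval ported to Inspect (inspect_ai).
# The single hard-coded sample stands in for the original benchmark's
# dataset; swap in its real questions, targets, and scoring logic.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate

@task
def ported_eval():
    return Task(
        dataset=[
            Sample(input="What is the capital of France?", target="Paris"),
        ],
        solver=generate(),   # just query the model, no extra scaffolding
        scorer=includes(),   # does the target string appear in the answer?
    )

# Run with something like:
#   inspect eval ported_eval.py --model openai/gpt-4o-mini
```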
Some quickly-generated, not-at-all-prioritised ideas for topics
Stated vs revealed preferences in LLMs (sketched below)
How sensitive are Anthropic’s blackmail results to prompting?
Testing eval awareness on different models/with different evaluations
Can you extend persona vectors to make LLMs better at certain tasks? (Is there a persona vector for “careful conceptual reasoning”?)
Is unsupervised elicitation a good way to elicit hidden/password-locked/sandbagged capabilities?
You can also generate these topics yourself by asking, “What am I interested in?”
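To make the first topic above concrete, here’s the shape of a quick-and-shallow stated-vs-revealed-preferences check. Everything in it is an illustrative assumption rather than a validated methodology: the model name, the prompts, and the “concision” preference being probed are placeholders, and it assumes you have the openai Python package and an API key set up.

```python
# Sketch: compare what a model *says* it prefers with what it *does*.
# Model, prompts, and the "concise vs detailed" preference are all
# placeholders; a real exploration would use many prompts and seeds.
from openai import OpenAI

client = OpenAI()        # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"    # illustrative model choice

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Stated preference: ask the model directly.
stated = ask("In one word: do you prefer giving concise or detailed answers?")

# Revealed preference: give it a task where the preference shows up in behaviour.
answer = ask("Explain what an API is.")

print("Stated preference:", stated.strip())
print("Words in the actual answer:", len(answer.split()))
```

Even something this small, run across a few models and prompt variants and written up in a couple of paragraphs, is the kind of artefact people can actually read and react to.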
Nobody’s on the ball
I think there are many topics in AI safety and governance where nobody’s on the ball at all.
And on the one hand, this kind of sucks: nobody’s on the ball, and it’s maybe a really big deal, and no one is handling it, and we’re not on track to make it go well.
But on the other hand, at least selfishly, for your personal career—yay, nobody’s on the ball! You could just be on the ball yourself: there’s not that much competition.
So if you spend some time thinking about AI safety and governance, you could probably become an expert in something pretty fast, end up having pretty good takes, and therefore just help a bunch.
Consider doing that!
(All views here my own.)