Things I’d say to people who are starting out with AI Safety
Intro:
I’m imagining someone “with a profession”
(a mathematician / developer / product manager / researcher / something else) who’s been following AI Safety through Scott Alexander or LW or so, and wants to do something more seriously now
To be clear, I am absolutely unqualified to give any advice here, and everyone is invited to point out disagreements.
I did this for ~3 months.
This is not my normal “software career” advice (in which I’m much more confident)
I’d rather be opinionated and wrong than put so many disclaimers here that my words mean nothing. You’ll get my opinion
So, some things which I wish I’d known when I started all this, or other stuff that seems useful:
There is no clear “this is the way to solve AI Safety, just learn X and then do Y”.
Similar to how there’s no “this is the way to solve cancer, just learn X and then do Y”, only much worse. With cancer we have, I guess, 1000 ideas or so, many of them have already cured/detected/reduced cancer, and at least with cancer there are clear things we want to learn (I think?). With AI Safety we have about 5-20 serious (whatever that means) ideas, and I can’t personally say about any of them “omg that would totally solve the problem if we could get it to work”. Still, they each have some kind of upside (which I could map to a cancer-research metaphor), so for some definition of progress, pursuing them would make progress
Even worse, some solutions seem (to me and to many others) to cause more harm than good.
Historically, people who cared about AI Safety have pushed AI Capabilities a LOT (which I think is bad)
Even worse, there is no consensus.
Really smart people are discussing this online but not coming to conclusions that are clear (to me), and they have (in my opinion) maybe the best online platform for healthy discourse in the world (LessWrong)
And so, looking at this situation, it seems to me like you have a choice.
You can either try to figure out something better than everyone else has (which also includes “figuring out which of all these people is correct, without knowing who to trust”),
or you can choose some project to join without understanding why they’re doing whatever they’re doing (beyond hearing the pitch and nodding, or seeing they got a lot of upvotes somewhere)
If your path is going to be “just go do something”, then
my biggest piece of advice (or more realistically, my request) is “make sure you don’t cause damage”, and specifically make sure you know what the “Unilateralist’s Curse” is. It basically means that if we have 1000 people who can all take some potentially-dangerous action, then the most happy-to-take-risks person (or maybe the least-aware-of-risks person) is the one who’s going to take that action (there’s a toy simulation of this a couple of bullets down).
This includes suggestions like “cause an AI accident so people will be afraid” (which I hope nobody will do, though I’m afraid that as people join the movement, someone eventually will).
On a meta level, if you invite lots of risk-taking people to learn a bit about AI Safety and then they go do something bad, that is just as dangerous, and I have no idea how, as a community, we’re going to avoid that long term. But the bottom line is: please don’t make the situation worse. That is really important in my opinion, and also, not-making-things-worse turns out to be surprisingly hard
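If it helps to make the Unilateralist’s Curse concrete, here’s a minimal toy simulation (my own illustration, not from any canonical write-up; all the numbers are made up): each person gets a noisy private estimate of how good some risky action is, and the action happens if any single person’s estimate comes out positive.

```python
import random

def unilateral_action_rate(n_people, true_value=-1.0, noise=2.0, trials=1000):
    """Toy model of the Unilateralist's Curse: each person privately estimates
    the action's value (true value + Gaussian noise) and acts if their estimate
    is positive. The action happens if *anyone* acts, i.e. if the single most
    optimistic estimate is above zero."""
    taken = 0
    for _ in range(trials):
        most_optimistic = max(
            true_value + random.gauss(0, noise) for _ in range(n_people)
        )
        if most_optimistic > 0:
            taken += 1
    return taken / trials

# A lone decider takes this genuinely-harmful action only ~30% of the time;
# with 1000 independent deciders it gets taken in essentially every trial.
print(unilateral_action_rate(n_people=1), unilateral_action_rate(n_people=1000))
```

The point is just that the chance of the harmful action being taken grows with the number of independent deciders, which is why the usual advice is to check with others before acting unilaterally.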
If you’re looking for someone to trust about whether to join some Project X and you care about my opinion, I’d point you to the opinions of Yudkowsky and Nate (update: maybe also Zvi): did they write something about Project X? In other words, I’m pointing at them as trustworthy-in-my-(current)-opinion. Some people would say it is bad for me to do that rather than telling you to study AI Safety for 3 months and form your own opinions. I agree (if you’re interested in studying), but if not, well, I think me writing this makes the situation better and not worse (and I also think my message here won’t brainwash anyone)
I also endorse taking a job that 80k recommends (consider getting coaching with them, signing up for their longtermist census, and/or looking at the “top recommended” jobs on their job board (update: they might have removed that option 🙀)). This is roughly the best we have as a community, and if you talk to 80k you’re unlikely to miss a “big (job) win” that you’d otherwise have found, I think.
If you choose the path of “figure out something smart that nobody else has”… (this is my choice)
First of all, the field is flooded with people trying to do that. Almost nobody manages (in my opinion), so the priors are bad. I think this is important to acknowledge, just as “most startups fail” is important to acknowledge. So, given these low priors, how would you approach the problem?
I’d encourage you to do something “high variance”, where if it goes well, you might end up better than everyone else in the field. [disclaimer about don’t-do-damage]
For example, if something seems obvious to you but nobody else is doing it—consider doing that.
For example, if something seems super cool and fun to you—consider doing that.
Prompt: “Is there something you like doing for fun that others consider work?” (I think Alex from 80k came up with this, I really like it)
For example, if you have a strange or unusual talent—can you use it?
Some examples from my own life of places where I think I might have a big advantage over others:
It seems obvious to me that “getting feedback quickly and often” is super important and that nobody is doing it. Everyone’s just reading books and stuff; this seems like an obvious mistake to me, so I’m doing it differently (not that I’m not reading at all).
Note that this might go really well or I might fall on my face, but worst case, I have 0 impact but also cause 0 damage
Infosec around AI Safety seems inspiringly terrible, and I notice this because I did infosec for the Israeli military & government, which are orgs that actually care about their infosec (as opposed to most of the industry, which seems to me to want to “look like they care about security” or something like that) [since then, EA has started taking AIS Infosec more seriously; unclear what the situation is today. Specifically, there’s an infosec reading group and an 80k article]
It seems to me like everyone’s dropping the ball on “having productive conversations about AI Safety over video”, and specifically, people (like me) who don’t live in a hub full of AI Safety people don’t have enough people to talk to, so I’m attempting to set something like that up. Consider joining! See #hangout for more [update: I sort of gave up on this]
I have priors about how to learn software development (or product management), and I’m using them to learn AI Safety. This is mainly around “feedback”
It seems like everyone forgets to ask “if this research agenda worked amazingly, would it solve the problem?”, and instead pursues agendas that seem interesting/tractable. [I am probably totally wrong about this, don’t actually trust me please, I’m just trying to point out ways that I notice I think differently from most, and my attempt to embrace those (without causing damage) rather than taking the same path as everyone else]
Apparently new people keep coming up with the same bad ideas again and again; there’s at least one post about that. The meme is that if you ask “why don’t we just do X”, then you’re missing something. I still think it’s worth asking (for example, in #no-dumb-questions, which won’t spam anyone), but expect the answer to teach you something and make you smarter; don’t expect a “you solved it, we’ll call the president now!” moment
Look after your mental health somehow, and/or quit before your mental health gets too bad
I play VR games; I wonder if people would like to join sometimes [update: not anymore. Also, I’m not doing a great job with my own mental health]
Money
You can apply to the LTFF even with 0 experience, I think. Worst case, you can apply again later. [update: Nonlinear has a form to apply to tons of grants at once. But still, I think most people are stuck on “I’m not worthy of applying” or so, which I think is best addressed by trying, and if you’re rejected, then trying again later]
Jobs pay money; consider applying to a job
Not having enough money is bad. I don’t think it’s good to encourage people to work without being paid. You have my support to go take a normal job and get paid.
Beware the meta trap
For example, “we will solve AI alignment by accelerating AI alignment research” (me: but… which research? The research that accelerates alignment research that accelerates alignment research? At some point you’ve got to accelerate something object-level)
For example, “I will help by helping others help” (who will you help? are they doing something important? are they maybe working on helping others helping others help others recruit people who can help more people?)
For example, “I will investigate timelines so that we can redistribute the funding in a way that will make sense given the time we have left” (but… are there any concrete research projects that you think are getting too much or too little funding given something you might discover about timelines? Maybe, for example, everything good is already getting funding? Or maybe only bad things are? What is even good?)
This is different if a funder (OpenPhil?) explicitly asks for help with timelines, and if you are on the path of “trust others” and you trust that when OpenPhil asks for something explicitly, they know what they’re talking about.
I am not saying that going meta is always bad and I do think some of these projects make sense, but it’s really hard to figure out which ones, and it seems to me like some people who want to help with alignment and go directly meta are trying to solve very wrong problems.
I am not against doing high-quality product management, including user research, including figuring out who the important users are, and solving those people’s problems, startup-style. (but I’d only recommend this path to a few people)
Beware the “reading forever” trap
If your algorithm is something like “while there is more to read that seems important, read that thing”, then you will never exit this loop (I think). Pick your own solution to this problem, but don’t ignore it (one possible exit is sketched below)
If you start reading about how to solve the “reading forever trap”
(I’m tired, maybe I’ll continue this later. Remember that I have no idea what I’m talking about and that other, more qualified people have written about this too)
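For whatever it’s worth, here is one hypothetical shape of a fix, sketched as code (the names and numbers are mine, made up purely for illustration): replace the “read until nothing important is left” condition, which never becomes false, with a hard reading budget, and then switch to something object-level.

```python
def study_plan(reading_queue, reading_budget_hours=40.0):
    """One concrete exit from the 'reading forever' loop: a hard time budget
    instead of a 'nothing important left to read' condition (which never holds).
    reading_queue is a list of (title, estimated_hours) pairs."""
    hours_spent = 0.0
    log = []
    while reading_queue and hours_spent < reading_budget_hours:
        title, estimated_hours = reading_queue.pop(0)
        hours_spent += estimated_hours
        log.append(f"read ({estimated_hours}h): {title}")
    # Whatever is left unread stays unread, on purpose.
    log.append("now: pick a concrete project and go get feedback on it")
    return log

print(study_plan([("Some alignment sequence", 20.0), ("Yet another reading list", 35.0)]))
```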
I originally posted this here, in the AI Alignment Slack. If you’re interested, I put a lot of my journey (my questions, my solution ideas, and so on) in the same channel.
Since then I’ve become more pessimistic and am leaning away from trying to solve AI Alignment myself. Maybe I’ll write about that too.
If I had to point you to one more resource, it would be “AGI safety career advice” by Richard Ngo.