Re: whether OpenAI could create a role that isn't truly safety-focused: there have been, and continue to be, OpenAI safety-ish roles that we don't list because we lack confidence they're safety-focused.
For the alignment role in question, I think the team description at the top of the post provides important context for the role's responsibilities:
OpenAI's Alignment Science research teams are working on technical approaches to ensure that AI systems reliably follow human intent even as their capabilities scale beyond human ability to directly supervise them.
With the above in mind, the role responsibilities seem fine to me. I think this is all pretty tricky, but in general, I've been moving toward looking at this in terms of the teams:
Alignment Science: Per the above team description, I'm excited for people to work there. On the question of what evidence would shift me: this would change if the research they release doesn't match the team description.
Preparedness: I continue to think it's good for people to work on this team, as per the description: "This team … is tasked with identifying, tracking, and preparing for catastrophic risks related to frontier AI models."
Safety Systems: I think roles here depend on what they address. The problems listed in their team description include problems I definitely want people working on (detecting unknown classes of harm, red-teaming to discover novel failure cases, sharing learnings across the industry, etc.), but it's possible that we should be more restrictive in which roles we list from this team.
I don't feel confident giving a probability here, but I do think there's a crux here around me not expecting the above team descriptions to be straightforward lies. It's possible that the teams will have limited resources to achieve their goals, and with the Safety Systems team in particular, I think there's an extra risk of safety work blending into product work. However, my impression is that the teams will continue to work on their stated goals.
I do think it's worthwhile to think of some evidence that would shift me against listing roles from a team:
If a team doesn't publish relevant safety research within something like a year.
If a team's stated goal is updated to have less safety focus.
Other notes:
We're actually in the process of updating the AI company article.
The top-level disclaimer: Yeah, I think this needs updating to something more concrete. We put it up while "everything was happening" but I've neglected to change it, which is my mistake; I'll probably prioritize fixing it over the next few days.
Thanks for diving into the implicit endorsement point. I acknowledge this could be a problem (and if so, I want to avoid it or at least mitigate it), so I'm going to think about what to do here.
Thanks.
Fwiw, while writing the above, I did also think "hmm, I should also have some cruxes for what would update me towards 'these jobs are more real than I currently think.'" I'm mulling that over and will write up some thoughts soon.
It sounds like you basically trust their statements about their roles. I appreciate you stating your position clearly, but I do think this position doesn't make sense:
we already have evidence of them failing to uphold commitments they've made in clear-cut ways. (i.e. I'd count their superalignment compute promises as basically a straightforward lie, and if not a "lie", it at least clearly demonstrates that their written words don't count for much. This seems straightforwardly relevant to the specific topic of "what does a given job at OpenAI entail?", in addition to being evidence about their overall relationship with existential safety)
we've similarly seen OpenAI change its stated policies, such as removing restrictions on military use. Or, initially being a nonprofit and converting into "for-profit managed by nonprofit" (where the "managed by nonprofit board" part turned out to be pretty ineffectual). (Not sure if I endorse this; mulling over Habryka's comment.)
Surely this at least updates you downward on how trustworthy their statements are? How many times do they have to "say things that turned out not to be true" before you stop taking them at face value? And why is that "more times than they have already"?
Separate from straightforward lies, and/or altering of policy to the point where any statements they make seem very unreliable, there are plenty of degrees of freedom in "what counts as alignment." They are already defining alignment in a way that is pretty much synonymous with short-term capabilities. I think the plan of "iterate on 'alignment' with near-term systems as best you can to learn and prepare" is not necessarily a crazy plan. There are people I respect who endorse it, who previously defended it as an OpenAI approach, although notably most of those people have now left OpenAI (sometimes still working on similar plans at other orgs).
But, it's very hard to tell the difference from the outside between:
"iterating on near-term systems, contributing to AI race dynamics in the process, in a way that has a decent chance of teaching you skills that will be relevant for aligning superintelligences"
"iterating on near-term systems, in a way that you think/hope will teach you skills for navigating superintelligence… but you're wrong about how much you're learning, and whether it's net positive"
"iterating on near-term systems, and calling it alignment because it makes for better PR, but not even really believing that it's particularly necessary to navigate superintelligence"
When recommending jobs at organizations that are potentially causing great harm, I think 80k has a responsibility to actually form good opinions on whether the job makes sense, independent of what the organization says it's about.
You don't just need to model whether OpenAI is intentionally lying; you also need to model whether they are phrasing things ambiguously, and whether they are self-deceiving about whether these roles are legitimate alignment work, or valuable enough work to outweigh the risks. And you need to model that they might just be wrong and incompetent at long-term alignment development (or: "insufficiently competent to outweigh risks and downsides"), even if their hearts were in the right place.
I am very worried that this isn't already something you have explicit models about.
As I've discussed in the comments on a related post, I don't think OpenAI meaningfully changed any of its stated policies with regard to military usage. I don't think OpenAI ever really promised anyone they wouldn't work with militaries, and framing this as violating a past promise weakens the ability to hold them accountable for promises they actually made.
What OpenAI did was to allow more users to use their product. It's similar to LessWrong allowing crawlers or jurisdictions that we previously blocked to now access the site. I certainly wouldn't consider myself to have violated some promise by allowing crawlers or companies to access LessWrong that I had previously blocked. (For a closer analogy: let's say we were currently blocking AI companies from crawling LW for training purposes, and I then changed my mind and allowed them to do that. I would not consider myself to have broken any kind of promise or policy.)
Mmm, nod. I will look into the actual history here more, but that sounds plausible. (I've edited the previous comment a bit for now.)
The arguments you give all sound like reasons OpenAI safety positions could be beneficial. But I find them completely swamped by all the evidence that they won't be, especially given how much evidence OpenAI has hidden via NDAs.
But let's assume we're in a world where certain people could do meaningful safety work at OpenAI. What are the chances those people need 80k to tell them about it? OpenAI is the biggest, most publicized AI company in the world; if Alice only finds out about OpenAI jobs via 80k, that's prima facie evidence she won't make a contribution to safety.
What could the listing do? Maybe Bob has heard of OAI but is on the fence about applying. An 80k job posting might push him over the edge to applying or accepting. The main way I see that happening is via a halo effect from 80k: the mere existence of the posting implies that the job is aligned with EA/80k's values.
I don't think there's a way to remove that implication with any amount of disclaimers. The job is still on the board. If anything, disclaimers make the best-case scenarios seem even better, because why else would you host such a dangerous position?
So let me ask: what do you see as the upside of highlighting OAI safety jobs on the job board? Not of the job itself, but of the posting. Who is the person who would do good work in that role, and for whom the 80k job board posting is instrumental in getting them into it?
Update: We've changed the language in our top-level disclaimers: example. Thanks again for flagging! We're now thinking about how best to minimize the possibility of implying endorsement.
Following up on my other comment:
To try to be a bit more helpful rather than just complaining and arguing: when I model your current worldview, and try to imagine a disclaimer that helps a bit more with my concerns but seems like it might work for you given your current views, here's a stab. Changes bolded.
OpenAI is a frontier AI research and product company, with teams working on alignment, policy, and security. We recommend specific opportunities at OpenAI that we think may be high impact. We recommend applicants pay attention to the details of individual roles at OpenAI, and form their own judgment about whether the role is net positive. We do not necessarily recommend working at other positions at OpenAI.
(It's not my main crux, but "frontier" felt both like a more up-to-date term for what OpenAI does, and also feels more specifically like it's making a claim about the product rather than generally awarding status to the company the way "leading" does.)
Thanks. This still seems pretty insufficient to me, but it's at least an improvement and I appreciate you making some changes here.
I can't find the disclaimer. Not saying it isn't there. But it should be obvious from just skimming the page, since that is what most people will do.