I won’t argue more for removing infosec roles at the moment. As noted in the post, I think this is at least a reasonable position to hold. I (weakly) disagree, but for reasons that don’t seem worth getting into here.
The things I’d argue here:
Safetywashing is actually pretty bad, for the world’s epistemics and for EA and AI safety’s collective epistemics. I think it also warps the epistemics of the people taking the job, so while they might be getting some career experience… they’re also likely getting a distorted view of what AI safety is, and becoming worse researchers than they would otherwise.
As previously stated – it’s not that I don’t think anyone should take these jobs, but I think the sort of person who should take them is someone who has a higher degree of context and skill than I expect the 80k job board to filter for.
Even if you disagree with those points, you should have some kind of crux for “what would distinguish an ‘impactful AI safety job?’” vs a fake safety-washed role. It should be at least possible for OpenAI to make a role so clearly fake that you notice and stop listing it.
If you’re set on continuing to list OpenAI Alignment roles, I think the current disclaimer is really inadequate and misleading. (Partly because of the object-level content in Working at an AI company, which I think wrongly characterizes OpenAI, and partly because what disclaimers you do have are deep in that post.) On the top-level job ad, there’s no indication that applicants should be skeptical about OpenAI.
Re: cruxes for safetywashing
You’d presumably agree, OpenAI couldn’t just call any old job ‘Alignment Science’ and have it automatically count as worth listing on your site.
Companies at least sometimes lie, and they often use obfuscating language to mislead. OpenAI’s track record is such that we know that they do lie and mislead. So, IMO, your prior here should be moderately high.
Maybe you only think it’s, like, 10%? (or less? IMO less than 10% feels pretty strained to me). But, at what credence would you stop listing it on the job board? And what evidence would increase your odds?
We are seeking Researchers to help design and implement experiments for alignment research. Responsibilities may include:
Writing performant and clean code for ML training
Independently running and analyzing ML experiments to diagnose problems and understand which changes are real improvements
Writing clean non-ML code, for example when building interfaces to let workers interact with our models or pipelines for managing human data
Collaborating closely with a small team to balance the need for flexibility and iteration speed in research with the need for stability and reliability in a complex long-lived project
Understanding our high-level research roadmap to help plan and prioritize future experiments
Designing novel approaches for using LLMs in alignment research
Want to use your engineering skills to push the frontiers of what state-of-the-art language models can accomplish
Possess a strong curiosity about aligning and understanding ML models, and are motivated to use your career to address this challenge
Enjoy fast-paced, collaborative, and cutting-edge research environments
Have experience implementing ML algorithms (e.g., PyTorch)
Can develop data visualization or data collection interfaces (e.g., JavaScript, Python)
Want to ensure that powerful AI systems stay under human control.
Is this an alignment research role, or a capabilities role that pays token lip service to alignment? My guess (60%) based on my knowledge of OpenAI is it’s more like the latter.
It says “Designing novel approaches for using LLMs in alignment research”, but that’s only useful if you think OpenAI uses the phrase “alignment research” to mean something important. We know they eventually coined the term “superalignment”, distinguished from most of what they’d been calling “alignment” work (where “superalignment” is closer to what was originally meant by the term).
If OpenAI was creating jobs that weren’t really helpful at all, but labeling them “alignment” anyway, how would you know?
Re: On whether OpenAI could make a role that feels insufficiently truly safety-focused: there have been and continue to be OpenAI safety-ish roles that we don’t list because we lack confidence they’re safety-focused.
For the alignment role in question, I think the team description given at the top of the post gives important context for the role’s responsibilities:
OpenAI’s Alignment Science research teams are working on technical approaches to ensure that AI systems reliably follow human intent even as their capabilities scale beyond human ability to directly supervise them.
With the above in mind, the role responsibilities seem fine to me. I think this is all pretty tricky, but in general, I’ve been moving toward looking at this in terms of the teams:
Alignment Science: Per the above team description, I’m excited for people to work there – though, concerning the question of what evidence would shift me, this would change if the research they release doesn’t match the team description.
Preparedness: I continue to think it’s good for people to work on this team, as per the description: “This team … is tasked with identifying, tracking, and preparing for catastrophic risks related to frontier AI models.”
Safety Systems: I think roles here depend on what they address. I think the problems listed in their team description include problems I definitely want people working on (detecting unknown classes of harm, red-teaming to discover novel failure cases, sharing learning across industry, etc), but it’s possible that we should be more restrictive in which roles we list from this team.
I don’t feel confident giving a probability here, but I do think there’s a crux here around me not expecting the above team descriptions to be straightforward lies. It’s possible that the teams will have limited resources to achieve their goals, and with the Safety Systems team in particular, I think there’s an extra risk of safety work blending into product work. However, my impression is that the teams will continue to work on their stated goals.
I do think it’s worthwhile to think of some evidence that would shift me against listing roles from a team:
If a team doesn’t publish relevant safety research within something like a year.
If a team’s stated goal is updated to have less safety focus.
Other notes:
We’re actually in the process of updating the AI company article.
The top-level disclaimer: Yeah, I think this needs updating to something more concrete. We put it up while ‘everything was happening’ but I’ve neglected to change it, which is my mistake, and I will probably prioritize fixing it over the next few days.
Thanks for diving into the implicit endorsement point. I acknowledge this could be a problem (and if so, I want to avoid it or at least mitigate it), so I’m going to think about what to do here.
Fwiw while writing the above, I did also think “hmm, I should also have some cruxes for ‘what would update me towards “these jobs are more real than I currently think.”’” I’m mulling that over and will write up some thoughts soon.
It sounds like you basically trust their statements about their roles. I appreciate you stating your position clearly, but, I do think this position doesn’t make sense:
we already have evidence of them failing to uphold commitments they’ve made in clear-cut ways. (i.e. I’d count their superalignment compute promises as basically a straightforward lie, and if not a “lie”, it at least clearly demonstrates that their written words don’t count for much. This seems straightforwardly relevant to the specific topic of “what does a given job at OpenAI entail?”, in addition to being evidence about their overall relationship with existential safety.)
we’ve similarly seen OpenAI change its stated policies, such as removing restrictions on military use. Or, initially being a nonprofit and converting into “for-profit-managed by non-profit” (where the “managed by nonprofit board” part turned out to be pretty ineffectual) (not sure if I endorse this, mulling over Habryka’s comment)
Surely, this at least updates you downward on how trustworthy their statements are? How many times do they have to “say things that turned out not to be true” before you stop taking them at face value? And why is that “more times than they have already”?
Separate from straightforward lies, and/or altering of policy to the point where any statements they make seem very unreliable, there are plenty of degrees of freedom in “what counts as alignment.” They are already defining alignment in a way that is pretty much synonymous with short-term capabilities. I think the plan of “iterate on ‘alignment’ with nearterm systems as best you can to learn and prepare” is not necessarily a crazy plan. There are people I respect who endorse it, who previously defended it as an OpenAI approach, although notably most of those people have now left OpenAI (sometimes still working on similar plans at other orgs).
But, it’s very hard to tell the difference from the outside between:
“iterating on nearterm systems, contributing to AI race dynamics in the process, in a way that has a decent chance of teaching you skills that will be relevant for aligning superintelligences”
“iterating on nearterm systems, in a way that you think/hope will teach you skills for navigating superintelligence… but, you’re wrong about how much you’re learning, and whether it’s net positive”
“iterating on nearterm systems, and calling it alignment because it makes for better PR, but not even really believing that it’s particularly necessary to navigate superintelligence.”
When recommending jobs for organizations that are potentially causing great harm, I think 80k has a responsibility to actually form good opinions on whether the job makes sense, independent of what the organization says it’s about.
You don’t just need to model whether OpenAI is intentionally lying, you also need to model whether they are phrasing things ambiguously, and you need to model whether they are self-deceiving about whether these roles are legitimate alignment work, or valuable enough work to outweigh the risks. And, you need to model that they might just be wrong and incompetent at longterm alignment development (or: “insufficiently competent to outweigh risks and downsides”), even if their heart were in the right place.
I am very worried that this isn’t already something you have explicit models about.
As I’ve discussed in the comments on a related post, I don’t think OpenAI meaningfully changed any of its stated policies with regards to military usage. I don’t think OpenAI really ever promised anyone they wouldn’t work with militaries, and framing this as violating a past promise weakens the ability to hold them accountable for promises they actually made.
What OpenAI did was to allow more users to use their product. It’s similar to LessWrong allowing crawlers or jurisdictions that we previously blocked to now access the site. I certainly wouldn’t consider myself to have violated some promise by allowing crawlers or companies to access LessWrong that I had previously blocked (or for a closer analogy, let’s say we were currently blocking AI companies from crawling LW for training purposes, and I then change my mind and do allow them to do that, I would not consider myself to have broken any kind of promise or policy).
The arguments you give all sound like reasons OpenAI safety positions could be beneficial. But I find them completely swamped by all the evidence that they won’t be, especially given how much evidence OpenAI has hidden via NDAs.
But let’s assume we’re in a world where certain people could do meaningful safety work at OpenAI. What are the chances those people need 80k to tell them about it? OpenAI is the biggest, most publicized AI company in the world; if Alice only finds out about OpenAI jobs via 80k, that’s prima facie evidence she won’t make a contribution to safety.
What could the listing do? Maybe Bob has heard of OAI but is on the fence about applying. An 80k job posting might push him over the edge to applying or accepting. The main way I see that happening is via a halo effect from 80k. The mere existence of the posting implies that the job is aligned with EA/80k’s values.
I don’t think there’s a way to remove that implication with any amount of disclaimers. The job is still on the board. If anything, disclaimers make the best-case scenarios seem even better, because why else would you host such a dangerous position?
So let me ask: what do you see as the upside to highlighting OAI safety jobs on the job board? Not of the job itself, but the posting. Who is it that would do good work in that role, and the 80k job board posting is instrumental in them entering it?
Update: We’ve changed the language in our top-level disclaimers: example. Thanks again for flagging! We’re now thinking about how to best minimize the possibility of implying endorsement.
To try to be a bit more helpful rather than just complaining and arguing: when I model your current worldview, and try to imagine a disclaimer that helps a bit more with my concerns but seems like it might work for you given your current views, here’s a stab. Changes bolded.
OpenAI is a frontier AI research and product company, with teams working on alignment, policy, and security. We recommend specific opportunities at OpenAI that we think may be high impact. We recommend applicants pay attention to the details of individual roles at OpenAI, and form their own judgment about whether the role is net positive. We do not necessarily recommend working at other positions at OpenAI.
(it’s not my main crux, but “frontier” felt both like a more up-to-date term for what OpenAI does, and also feels more specifically like it’s making a claim about the product than generally awarding status to the company the way “leading” does)
Nod, thanks for the reply.
Taking the current Alignment Science researcher role as an example:
Thanks.
Mmm, nod. I will look into the actual history here more, but, sounds plausible. (edited the previous comment a bit for now)
Following up my other comment:
Thanks. This still seems pretty insufficient to me, but, it’s at least an improvement and I appreciate you making some changes here.
I can’t find the disclaimer. Not saying it isn’t there. But it should be obvious from just skimming the page, since that is what most people will do.