Fwiw while writing the above, I did also think “hmm, I should also have some cruxes for ‘what would update me towards “these jobs are more real than I currently think”?’” I’m mulling that over and will write up some thoughts soon.
It sounds like you basically trust their statements about their roles. I appreciate you stating your position clearly, but I do think this position doesn’t make sense:
we already have evidence of them failing to uphold commitments they’ve made in clear-cut ways. (e.g. I’d count their superalignment compute promises as basically a straightforward lie, and if not a “lie”, it at least clearly demonstrates that their written words don’t count for much. This seems straightforwardly relevant to the specific topic of “what does a given job at OpenAI entail?”, in addition to being evidence about their overall relationship with existential safety)
we’ve similarly seen OpenAI change its stated policies, such as removing restrictions on military use. Or, initially being a nonprofit and converting into “for-profit managed by nonprofit” (where the “managed by nonprofit board” part turned out to be pretty ineffectual). (Not sure if I endorse this; mulling over Habryka’s comment.)
Surely this at least updates you downward on how trustworthy their statements are? How many times do they have to “say things that turned out not to be true” before you stop taking them at face value? And why is that “more times than they have already”?
Separate from straightforward lies, and/or altering of policy to the point where any statements they make seem very unreliable, there are plenty of degrees of freedom in “what counts as alignment.” They are already defining alignment in a way that is pretty much synonymous with short-term capabilities. I think the plan of “iterate on ‘alignment’ with nearterm systems as best you can to learn and prepare” is not necessarily a crazy plan. There are people I respect who endorse it, and who previously defended it as an OpenAI approach, although notably most of those people have now left OpenAI (sometimes still working on similar plans at other orgs).
But, it’s very hard to tell the difference from the outside between:
“iterating on nearterm systems, contributing to AI race dynamics in the process, in a way that has a decent chance of teaching you skills that will be relevant for aligning superintelligences”
“iterating on nearterm systems, in a way that you think/hope will teach you skills for navigating superintelligence… but, you’re wrong about how much you’re learning, and whether it’s net positive”
“iterating on nearterm systems, and calling it alignment because it makes for better PR, but not even really believing that it’s particularly necessary to navigate superintelligence.”
When recommending jobs at organizations that are potentially causing great harm, I think 80k has a responsibility to actually form good opinions on whether the job makes sense, independent of what the organization says it’s about.
You don’t just need to model whether OpenAI is intentionally lying; you also need to model whether they are phrasing things ambiguously, and whether they are self-deceiving about whether these roles are legitimate alignment work, or valuable enough work to outweigh the risks. And you need to model that they might just be wrong and incompetent at longterm alignment development (or: “insufficiently competent to outweigh risks and downsides”), even if their hearts were in the right place.
I am very worried that this isn’t already something you have explicit models about.
As I’ve discussed in the comments on a related post, I don’t think OpenAI meaningfully changed any of its stated policies with regards to military usage. I don’t think OpenAI really ever promised anyone they wouldn’t work with militaries, and framing this as violating a past promise weakens the ability to hold them accountable for promises they actually made.
What OpenAI did was to allow more users to use their product. It’s similar to LessWrong allowing crawlers or jurisdictions that we previously blocked to now access the site. I certainly wouldn’t consider myself to have violated some promise by allowing crawlers or companies to access LessWrong that I had previously blocked (or for a closer analogy, let’s say we were currently blocking AI companies from crawling LW for training purposes, and I then change my mind and do allow them to do that, I would not consider myself to have broken any kind of promise or policy).
Thanks.
Mmm, nod. I will look into the actual history here more, but it sounds plausible. (Edited the previous comment a bit for now.)