This is a quick post to talk a little bit about what I’m planning to focus on in the near and medium-term future, and to highlight that I’m currently hiring for a joint executive and research assistant position. You can read more about the role and apply here! If you’re potentially interested, hopefully the comments below can help you figure out whether you’d enjoy the role.
Recent advances in AI, combined with economic modelling (e.g. here), suggest that we might well face explosive AI-driven growth in technological capability in the next decade, where what would have been centuries of technological and intellectual progress on a business-as-usual trajectory occur over the course of just months or years.
Most effort to date, from those worried by an intelligence explosion, has been on ensuring that AI systems are aligned: that they do what their designers intend them to do, at least closely enough that they don’t cause catastrophically bad outcomes.
But even if we make sufficient progress on alignment, humanity will have to make, or fail to make, many hard-to-reverse decisions with important and long-lasting consequences. I call these decisions Grand Challenges. Over the course of an explosion in technological capability, we will have to address many Grand Challenges in a short space of time including, potentially: what rights to give digital beings; how to govern the development of many new weapons of mass destruction; who gets control over an automated military; how to deal with fast-reproducing human or AI citizens; how to maintain good reasoning and decision-making even despite powerful persuasion technology and greatly-improved ability to ideologically indoctrinate others; and how to govern the race for space resources.
As a comparison, imagine that explosive growth had occurred in Europe in the 11th century, so that all the intellectual and technological advances that took a thousand years in our actual history occurred over the course of just a few years. It’s hard to see how decision-making would go well under those conditions.
The governance of explosive growth seems to me to be of comparable importance as AI alignment, not dramatically less tractable, and is currently much more neglected. The marginal cost-effectiveness of work in this area therefore seems to be even higher than marginal work on AI alignment. It is, however, still very pre-paradigmatic: it’s hard to know what’s most important in this area, what things would be desirable to push on, or even what good research looks like.
I’ll talk more about all this in my EAG: Bay Area talk, “New Frontiers in Effective Altruism.” I’m far from the only person to highlight these issues, though. For example, Holden Karnofsky has an excellent blog post on issues beyond misalignment; Lukas Finnveden has a great post on similar themes here and an extensive and in-depth series on potential projects here. More generally, I think there’s a lot of excitement about work in this broad area that isn’t yet being represented in places like the Forum. I’d be keen for more people to start learning about and thinking about these issues.
Over the last year, I’ve done a little bit of exploratory research into some of these areas; over the next six months, I plan to continue this in a focused way, with an eye toward making this a multi-year focus. In particular, I’m interested in the rights of digital beings, governance of space resources, and, above all, in the “meta” challenge of ensuring that we have good deliberative processes through the period of explosive growth. (One can think of work on the meta challenge as fleshing out somewhat realistic proposals that could take us in the direction of the “long reflection”.) By working on good deliberative processes, we could thereby improve decision-making on all the Grand Challenges we will face. This work could help with AI safety, too: if we can guarantee power-sharing after the development of superintelligence, that decreases the incentive for competitors to race and cut corners on safety.
I’m not sure yet what output this would ultimately lead to, if I decide to continue work on this beyond the next six months. Plausibly there could be many possible books, policy papers, or research institutes on these issues, and I’d be excited to help make happen whichever of these seem highest-impact after further investigation.
Beyond this work, I’ll continue to provide support for individuals and organisations in EA (such as via fundraising, advice, advocacy and passing on opportunities) in an 80/20 way; most likely, I’ll just literally allocate 20% of my time to this, and spend the remaining 80% on the ethics and governance issues I list above. I expect not to be very involved with organisational decision-making (for example by being on boards of EA organisations) in the medium term, in order to stay focused and play to my comparative advantage.
I’m looking for a joint research and executive assistant to help with the work outlined above. The role involves research tasks such as providing feedback on drafts, conducting literature reviews and small research projects, as well as administrative tasks like processing emails, scheduling, and travel booking. The position could also grow into a more senior role, depending on experience and performance.
Example projects that a research assistant could help with include:
A literature review on the drivers of moral progress.
A “literature review” focused on reading through LessWrong, the EA Forum, and other blogs, and finding the best work there related to the fragility of value thesis.
Case studies on: what exactly happened to result in the creation of the UN, and what determined the precise nature of the UN Charter? What can we learn from it? Similarly for the Kyoto Protocol, the Nuclear Non-Proliferation Treaty, and the Montreal Protocol.
Short original research projects, such as:
Figuring out what a good operationalisation of transformative AI would be, for the purpose of creating an early tripwire to alert the world to an imminent intelligence explosion (a toy sketch of what such a tripwire could look like follows this list).
Taking some particular neglected Grand Challenge, and fleshing out the reasons why this Grand Challenge might or might not be a big deal.
Supposing that the US wanted to make an agreement to share power and respect other countries’ sovereignty in the event that it develops superintelligence, figuring out how we could legibly guarantee future compliance with that agreement, such that the commitment is credible to other countries.
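To give a concrete flavour of the tripwire project idea above, here is a minimal, purely illustrative sketch in Python, assuming (hypothetically) that transformative AI were operationalised as sustained growth of some measured capability or economic index above a threshold rate. The specific index, the 30%/year threshold, and the two-year window are placeholder assumptions for illustration only, not anything proposed in this post or the role description.

```python
# A deliberately toy sketch of one possible "tripwire" operationalisation of
# transformative AI: it fires when the annualised growth rate of some measured
# capability/economic index stays above a threshold over a trailing window.
# The index, the 30%/year threshold, and the 2-year window are hypothetical
# placeholders, not proposals from the post.

from dataclasses import dataclass


@dataclass
class Observation:
    year: float   # time of the measurement
    index: float  # value of the chosen (hypothetical) capability or output index


def annualised_growth(a: Observation, b: Observation) -> float:
    """Annualised growth rate of the index between two observations."""
    return (b.index / a.index) ** (1.0 / (b.year - a.year)) - 1.0


def tripwire_fires(history: list[Observation],
                   threshold: float = 0.30,    # placeholder: >30%/year growth
                   window_years: float = 2.0   # placeholder: sustained ~2 years
                   ) -> bool:
    """Return True if growth over the trailing window exceeds the threshold."""
    if len(history) < 2:
        return False
    latest = history[-1]
    in_window = [o for o in history if latest.year - o.year <= window_years]
    # Require the window to be (almost) fully covered by observations.
    if len(in_window) < 2 or latest.year - in_window[0].year < 0.9 * window_years:
        return False
    return annualised_growth(in_window[0], latest) > threshold


# Example: ~3%/year growth for five years, then a jump to ~40%/year.
history = [Observation(2024 + t, 100 * 1.03 ** t) for t in range(5)]
base = history[-1].index
history += [Observation(2029 + t, base * 1.40 ** (t + 1)) for t in range(3)]
print(tripwire_fires(history))  # True once fast growth has persisted ~2 years
```

Any serious operationalisation would of course need to defend the choice of index, threshold, and window, and to handle measurement noise and gaming; the sketch is only meant to show the kind of object such an operationalisation would be.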
The deadline for applications is February the 11th. If this seems interesting, please apply!
As someone who is a) skeptical of X-risk from AI, but b) thinks there is a non-negligible (even if relatively low, maybe 3-4%) chance we’ll see 100 years of progress in 15 years at some point in the next 50 years, I’m glad you’re looking at this.
Thanks! Didn’t know you’re sceptical of AI x-risk. I wonder if there’s a correlation between being a philosopher and having low AI x-risk estimates; it seems that way anecdotally.
Yeah. I actually work on it right now (governance/forecasting, not technical stuff obviously) because it’s the job that I managed to get when I really needed a job (and it’s interesting), but I remain personally skeptical. Though it is hard to tell the difference in such a speculative context between 1 in 1000 (which probably means it is actually worth working on in expectation, at least if you expect X-risk to drop dramatically if AI is negotiated successfully and have totalist sympathies in population ethics) and 1 in 1 million* (which might look worth working on in expectation if taken literally, but is probably really a signal that it might be way lower for all you know). I don’t have anything terribly interesting to say about why I’m skeptical: just boring stuff about how prediction is hard, and your prior should be low on a very specific future path, and social epistemology worries about bubbles and ideas that pattern-match to religious/apocalyptic thinking, combined with a general feeling that the AI risk stuff I have read is not rigorous enough to overcome my low prior.
‘I wonder if there’s a correlation between being a philosopher and having low AI x-risk estimates; it seems that way anecdotally.’
I hadn’t heard that suggested before. But you will have a much better idea of the distribution of opinion than me. My guess would be that the divide will be LW/rationalist versus not. “Low” is also ambiguous of course: compared to MIRI people, or even someone like Christiano, you or Joe Carlsmith probably have “low” estimates, but they are likely a lot higher than those of AI X-risk “skeptics” outside EA.
*Seems too low to me, but I am of course biased.
Christiano says ~22% (“but you should treat these numbers as having 0.5 significant figures”) without a time-bound; and Carlsmith says “>10%” (see bottom of abstract) by 2070. So no big difference there.
Fair point. Carlsmith said less originally.
Hi Will,
What is especially interesting here is your focus on an all-hazards approach to Grand Challenges. Improved governance has the potential to influence all cause areas, including both long-term and short-term causes, x-risks, and s-risks.
Here at the Odyssean Institute, we’re developing a novel approach to these deep questions of governing Grand Challenges. We’re currently running our first horizon scan on tipping points in global catastrophic risk, and we will use this as the first step of a longer-term process that will include Decision Making under Deep Uncertainty (developed at RAND) and a deliberative democratic jury or assembly. In our White Paper on the Odyssean Process, we outlined how combining these approaches could help counter short-termist thinking in policy formulation around GCRs. We’re happy to see you and OpenAI taking a keen interest in this flourishing area of deliberative democratic governance!
We are highly encouraged by the fact that you see it as “of comparable importance as AI alignment, not dramatically less tractable, and is currently much more neglected. The marginal cost-effectiveness of work in this area therefore seems to be even higher than marginal work on AI alignment.” Despite this, the work remains neglected even within EA, and thus would benefit from greater focus and support so that more resources can be allocated to it. We’d welcome a chance to discuss this in more depth with you and others interested in supporting it.
FWIW many people are already very interested in capability evaluations related to AI acceleration of AI R&D.
For instance, at the UK AI Safety Institute, the Loss of Control team is interested in these evaluations.
Some quotes:
Introducing the AI Safety Institute:

Loss of control: As advanced AI systems become increasingly capable, autonomous, and goal-directed, there may be a risk that human overseers are no longer capable of effectively constraining the system’s behaviour. Such capabilities may emerge unexpectedly and pose problems should safeguards fail to constrain system behaviour. Evaluations will seek to avoid such accidents by characterising relevant abilities, such as the ability to deceive human operators, autonomously replicate, or adapt to human attempts to intervene. Evaluations may also aim to track the ability to leverage AI systems to create more powerful systems, which may lead to rapid advancements in a relatively short amount of time.
Jobs:

Build and lead a team focused on evaluating capabilities that are precursors to extreme harms from loss of control, with a current focus on autonomous replication and adaptation, and uncontrolled self-improvement.
Thanks so much for those links, I hadn’t seen them!
(So much AI-related stuff coming out every day, it’s so hard to keep on top of everything!)
METR (‘Model Evaluation & Threat Research’) might also be worth mentioning. I wonder if there’s a list of capability evaluation projects somewhere.
Thanks for the update, Will!
As you are framing the choice between work on alignment and work on grand challenges/non-alignment work needed under transformative AI, I am curious how you think about pause efforts as a third class of work. Is this something you have thoughts on?
Perhaps at the core there is a theme here that comes up a lot which goes a bit like: Clearly there is a strong incentive to ‘work on’ any imminent and unavoidable challenge whose resolution could require or result in “hard-to-reverse decisions with important and long-lasting consequences”. Current x-risks have been established as sort of the ‘most obvious’ such challenges (in the sense that making wrong decisions potentially results in extinction, which obviously counts as ‘hard-to-reverse’ and the consequences of which are ‘long-lasting’). But can we think of any other such challenges or any other category of such challenges? I don’t know of any others that I’ve found anywhere near as convincing as the x-risk case, but I suppose that’s why the example project on case studies could be important?
Another thought I had is kind of: Why might people who have been concerned about x-risk from misaligned AI pivot to asking about these other challenges? (I’m not saying Will counts as ‘pivoting’ but just generally asking the question). I think one question I have in mind is: Is it because we have already reached a point of small (and diminishing) returns from putting today’s resources into the narrower goal of reducing x-risk from misaligned AI?