Five neglected work areas that could reduce AI risk

Crossposted to LessWrong here

TL;DR: We identify five areas of work that should be further investigated:

  1. Helping information aggregators scale during advanced AI development,

  2. Improving internal AI deployment in policy organizations,

  3. Researching the institutional design for evaluating alignment plans,

  4. Planning for how to evaluate automated alignment researchers, and

  5. Creating further education and information material.

Summary

We outline and discuss five areas of work that seem neglected. We review and discuss these areas only shallowly; our proposals should be seen as recommendations to investigate these questions further and to figure out whether they are actually worth pursuing and, if so, how to pursue them.

For each of the five work areas, we discuss what it is, what one could do, why it is important, and what our key uncertainties are.

  1. Help Information Aggregators and Reviewers Scale during Takeoff

    1. Effective information provision and compilation are vital for informed societal choices. As advanced AI systems evolve, current information channels and curators, such as journalists, social media platforms, think tanks, and governments, will likely struggle to keep up. Identifying ways to support them or introducing new organizations to monitor global events seems important.

  2. Consult and Research Responsible AI Deployment in Governments

    1. Decision-making bodies such as governments face the increasingly important question of how to deploy advanced AI systems internally. These decisions can make societal decision-making much better, but also much worse. Research on, and advice for, policymakers can help avert various failure modes of internal AI deployment.

  3. Research Institutional Design for Evaluating Alignment Plans

    1. Many AI governance plans involve a public agency evaluating the alignment plan for an advanced AI project and then approving or rejecting the training run. Designing such an evaluation process seems hard, and default institutional designs seem likely to be insufficient, so somebody should figure out what this process should ideally look like.

  4. Plan for How to Evaluate Automated Alignment Researchers

    1. Most of the alignment work might be conducted by AI systems. OpenAI, for instance, has plans to develop automated alignment researchers. However, the success of this depends on being able to evaluate alignment work at scale, an ability we currently lack. We need to figure out how to evaluate hard-to-measure questions, e.g., whether we are overall making progress on alignment.

  5. Create and Disseminate Education and Information Material

    1. If the world becomes much more aware of the possibility of transformative AI, many decision-makers will need to quickly and deeply familiarize themselves with AI alignment and AI existential safety questions. This requires good summaries, literature reviews, education courses, etc.

Help Information Aggregators Scale during Takeoff

What is it: In a world of widespread AI deployment, many things may move much more quickly, and existing information aggregators and collectors (e.g., journalists, Twitter/X, think tanks) will likely be overwhelmed and less able to do their jobs. Figuring out how to help them, or stepping in with new organizations aimed at tracking important developments in the world, seems important.

What to do: Build AI tools for better information filtering/aggregation (a minimal sketch follows below), build expertise in relevant areas (e.g., AI hardware, ML subfields, AI safety, biosecurity), and get experience and connections by doing the versions of this that currently exist (e.g., at Epoch AI, in journalism, intelligence agencies, or AGI labs). Build AI journalism as a subfield (analogous to environmental journalism). Figure out whether existing institutions can be adapted to do this sufficiently well or whether a new institution is needed. We are not too sure what else to do here.
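
To illustrate the "AI tools for better information filtering/aggregation" direction, here is a minimal sketch of a relevance filter over a stream of news items. It is only a toy: the keyword list, weights, and threshold are hypothetical placeholders, and a real tool would more plausibly score items with an LLM or embedding model.

```python
from dataclasses import dataclass

# Hypothetical topics a monitoring tool might prioritize; purely illustrative.
PRIORITY_KEYWORDS = {
    "training run": 3.0,
    "gpu cluster": 3.0,
    "frontier model": 2.5,
    "alignment": 2.0,
    "export controls": 2.0,
    "biosecurity": 1.5,
}


@dataclass
class NewsItem:
    title: str
    body: str
    source: str


def relevance_score(item: NewsItem) -> float:
    """Crude keyword-weight score; a real tool might use an LLM or embeddings instead."""
    text = f"{item.title} {item.body}".lower()
    return sum(weight for kw, weight in PRIORITY_KEYWORDS.items() if kw in text)


def filter_items(items: list[NewsItem], threshold: float = 2.0) -> list[NewsItem]:
    """Keep only items scoring above the threshold, most relevant first."""
    scored = [(relevance_score(item), item) for item in items]
    return [item for score, item in sorted(scored, key=lambda p: p[0], reverse=True) if score >= threshold]


if __name__ == "__main__":
    items = [
        NewsItem("Lab announces large training run", "A new frontier model ...", "example.com"),
        NewsItem("Local sports results", "The home team won ...", "example.com"),
    ]
    for item in filter_items(items):
        print(item.title)
```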

Why is it impactful: This is currently extremely neglected. Some people casually talk about how deepfakes and LLM-generated content/spam could be a big deal for the information landscape, but nobody seems to be taking this seriously enough to prepare, let alone to plan for explosive growth scenarios: the current world is wholly unprepared.

Key uncertainties: It’s unclear whether we will get explosive growth before or after we face most of the misalignment risk; if it’s after, then this is probably much less important to work on. If takeoff is sufficiently slow, existing institutions will adapt.

Additional discussion: Good information is important for pretty much all decision-making, and information provision might become much worse in a world with advanced AI systems. It is already very difficult to track all of the important things happening in the world, and this will get even harder as the pace of tech R&D accelerates due to automated researchers and AI tools speeding up human researchers. Solving this problem might include building a new institute for understanding what is going on and relaying this to relevant decision-makers. This institute would broadly try to understand and keep track of what is happening in the world with regard to AI development. Some things that would be in its purview:

  • Reading current ML and AI safety research and trying to predict what is upcoming in those fields,

  • Tracking the location of GPU clusters,

  • Keeping a database of major AI developers and key x-risk-related information about them (the quality of their models, large training runs they do, internal sentiment around AI risk, etc.; see the sketch after this list),

  • Keeping tabs on the use of AI in particularly high-stakes domains like warfare and biology, and

  • Understanding how various countries are reacting to AI development and governing its risks.
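
To make the "database of major AI developers" idea a bit more concrete, here is a minimal sketch of what such a tracking schema could look like, assuming a simple Python data model. All field names and the example entry are hypothetical and purely illustrative, not real data.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingRun:
    # Rough, public-information-based estimates; fields are illustrative.
    model_name: str
    estimated_compute_flop: float
    year: int


@dataclass
class DeveloperProfile:
    """One entry in a hypothetical database of major AI developers."""
    name: str
    frontier_model_quality: str  # e.g., a coarse qualitative rating
    large_training_runs: list[TrainingRun] = field(default_factory=list)
    internal_risk_sentiment: str = "unknown"  # e.g., inferred from public statements
    safety_team_size_estimate: int | None = None


# Illustrative entry with made-up values.
example = DeveloperProfile(
    name="ExampleLab",
    frontier_model_quality="near state of the art",
    large_training_runs=[TrainingRun("example-model-1", 1e25, 2024)],
    internal_risk_sentiment="mixed, based on public interviews",
)
```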

The general idea is that things could be moving extremely fast once we get substantial research speedups, and there are currently no institutions prepared for this speedup that can help the world respond well. Such an institute would serve as a point of information aggregation and distillation, supporting AI labs, governments, and other key decision-makers by having already developed methods for keeping track of what is true and important in this wacky landscape. If the world is able to develop an international agency on AI, as suggested by Karnofsky, such an agency might absorb or closely partner with such an institute.

This institute might benefit from hiring people with experience doing complex information aggregation in difficult domains, such as people with a traditional intelligence background, finance analysts who specialize in AI-related fields, or journalists. There would also likely be a need to hire experts with an excellent understanding of various parts of the AI and AI safety research ecosystems, from chip design to fine-tuning.

Research and Consulting on AI Deployment in Governments

What is it: Decision-making bodies such as governments face the increasingly important question of how to deploy advanced AI systems internally. These decisions can make societal decision-making much better, but also much worse. Research on, and advice for, policymakers can help avert various failure modes of internal AI deployment.

What to do: Individuals could advise governments on integrating large language models (LLMs) into their operations, or research AI integration in government (e.g., if a team starts using LLMs to write its daily briefs, how does this change the frequency of false claims making their way into the brief? Do people trust LLMs to be more factual than they actually are?). We are not sure exactly what to do in this area, but we expect it to range from field experiments in the workplace to experimental testing of new AI products. While the short-term impact might not be that big, it would help immensely to build relevant knowledge, trust, and networks, creating future opportunities to feed into such processes.

Why is it impactful: We expect that the success of advanced AI development and deployment partly depends on how well-informed government decisions are. Outsourcing work to AI systems for such decisions will be a hard balancing act with many possible failure modes: one might outsource too much, too little, or the wrong work. Work to improve this process will likely be underprovided because:

  • Very few organizations work on digitization in government (a few academics, think tanks, and officials in public agencies),

  • Many may underestimate the transformative potential and risks associated with AI and may overlook critical aspects, and

  • By default, the recommendations and policies made will have a weak evidence base.

Therefore, we expect there to be a lot of potential to improve AI deployment policies for high-stakes decision making.

Key uncertainties: Tractability: Anecdotal evidence from a former consultant suggests that consulting for governments on digitization mostly fails. If digitization of government is such a hard problem, we may expect 1) governments to err on the side of not using LLMs, or 2) the tractability of advising governments to be low.

Crowdedness: Perhaps public policy researchers, researchers of government digitization, and organizational psychologists will work on this. However, those researchers may not be motivated to move from basic to applied science fast enough.

The magnitude of failure from improperly integrating AI with one’s workflow scales with the available AI capabilities: failing to integrate GPT-4 or integrating it too much likely results in a couple of percentage points difference in productivity. For future AI systems, failing to integrate them could mean losing huge amounts of potential value, and integrating them in a bad way could result in major losses.

Focusing on this problem is particularly important if government decision-making matters a lot, e.g., in short-timeline slow-takeoff worlds or if governments are needed to make the post-AGI transition go well. On the other hand, if the major hurdles to existential security are near-term technical research, this would be less important to work on.

Additional Discussion: As AIs improve at solving long-horizon tasks, there will be strong incentives to delegate more of our human workflows to them. For some domains this will be perfectly safe, but in high-stakes domains it could cause major harms from AIs making mistakes or pursuing misaligned goals. One wants to track how people in key societal roles are outsourcing tasks to AI systems. This could involve studying how they use AI and what the effects are, e.g., is there too much or too little outsourcing? Are they outsourcing the wrong things, losing human expertise in the process, or doing the kind of outsourcing where there is still meaningful human control? These questions could be studied by social scientists and, e.g., public sector consultancies. Researchers could interview people, conduct anthropological studies, study the effects of various automation tools, and share best practices.

If one understands the key failure modes, one could make the technology safer, enhance human decision-making, and avoid enfeeblement, i.e., society willingly giving up control. The research could inform the decisions of AI labs and product teams, feed into AI regulation, or simply be implemented as internal guidelines and best practices by organizations such as the public sector or the executive. To contribute to this, one could research AI workflow integration and outsourcing in the public sector, provide literature summaries, or directly consult and inform key decision-making organizations on responsible AI use.

Institutional Design for Evaluating Alignment Plans

What is it: Many AI governance plans involve a public agency evaluating the alignment plan for an advanced AI project. The agency then approves or rejects the training run proposal. Designing such an evaluation process seems hard. Current institutional designs are likely insufficient. Somebody should figure out what this process should even look like.

What to do: Study the benefits and drawbacks of existing safety/risk assessment frameworks, scope the problem of evaluating alignment plans, and build relevant connections and credibility. This is about designing an institutional process for evaluating alignment research, not doing the evaluation itself. Learn from best practices for aggregating information, evaluations, and criticisms from various stakeholders and experts. Study the existing expert-consultation processes of big institutions (e.g., the European Commission), along with their successes and pitfalls.

Why is it impactful: The value of establishing such institutions depends on whether the evaluation process can effectively distinguish between better and worse alignment plans.

Key uncertainties: Work on designing this institution may be intractable, too abstract, or too novel, especially given the strong precedent of existing risk assessment frameworks. Perhaps interested individuals should just work on improving current regulatory processes regarding AI regulation on the margin.

Additional discussion: Most governance plans for transformative Artificial Intelligence involve a common procedure: the AGI development team proposes their alignment plan to a governing body (e.g., their plan for the deployment of a newly trained model, their plan for whether to automate research, their plan for training a bigger model), which then approves the plan, requests revisions, or denies it. But what should happen procedurally in this agency before it makes such a decision?

While we might not know right now which alignment plans will be effective, we can probably already think about designing the procedure. Existing risk and safety assessment frameworks, or public consultations as done for legal reviews, seem insufficient. What can we learn from them, and what would the ideal procedure look like? We think most existing processes are either easily gameable or fail to aggregate most of the relevant information. An exploration could include: How public should the procedure be? Should open comments be invited? How should they be incorporated, and who assesses them? How should disagreements about the safety of the alignment plan be handled? What would such a back-and-forth look like? How would the final decision be made? Who holds that group accountable? How should conflicts of interest (e.g., relevant experts working at AGI labs) be handled?

Evaluating Automated Alignment Research

What is it: Much of the alignment work might be conducted by AI systems. OpenAI, for instance, has plans to develop automated alignment researchers. However, the efficacy of this hinges on having a robust framework for evaluating alignment work, which is currently lacking. This has been discussed before, but we want to signal-boost it.

What to do: We don’t really know what needs to be done here. Perhaps preparing for this role might just look like doing alignment research oneself and trying to do lots of peer review.

Why is it impactful: Evaluating certain tasks, especially hard-to-measure ones, will remain a human task, potentially causing bottlenecks. Certain types of alignment work are likely difficult to evaluate, such as conceptual and framing work, as evidenced by strong disagreement among researchers about the value of different research.

While some evaluations might be straightforward (e.g., using language models for neuron interpretability or improving self-critique strategies) or sometimes AIs could support human evaluators (by doing replication or coming up with critiques), determining actual progress towards alignment remains a hard-to-measure evaluation target.

A common response is that “evaluation may be easier than generation”. However, this doesn’t mean evaluation will be easy in absolute terms, or relative to one’s resources for doing it, or that it will depend on the same resources as generation.

This means it would be important to i) know exactly what should be done by humans and ii) figure out how this could be tracked so that firms are not cutting corners here.

Key uncertainties: Is this action relevant now? Is there any way one can feasibly prepare now?

Education, Review, and Information Material

What is it: If timelines are short and the world becomes much more aware of the possibility of transformative AI, then many key decision-makers will need to quickly familiarize themselves with alignment and AI existential safety questions and develop a really good understanding. For that, there need to be really good sources of summaries and literature reviews.

What to do: In contrast to the other work areas outlined in this post, more work already exists in this area: YouTube videos on the risks, policy reports for governments, explainers, intro talks for various audiences, and literature reviews, e.g., on AI timelines.

Such work could be expanded, improved, and professionalized. For instance, reviews on topics such as takeoff dynamics, alignment challenges, development timelines, and alignment strategies, ranging from a quick read to an in-depth analysis, are useful.

Why is it impactful: Such resources play a crucial role in shaping key decision-making processes and public opinion. Given the multiplicity of threat models, the speculativeness and inherent uncertainty of AI development, and the political incentives for simplification and polarisation, good information and education material might be even more important here than in other problem areas.

Key uncertainties: For videos, reports, and other high-quality or wide-reaching media, we usually see winner-takes-all dynamics where only the best material gets used. This should have implications for who should work on this, how they should work on it, and what should be done. Even if winner-takes-all dynamics exist, it may be unclear ex ante who the “winners” will be, so investing in many projects might still be useful.

Crowdedness: This work seems to have expanded a lot in the last six months, and it is not clear how much low-hanging fruit will remain in the future.