The Case for Non-Technical AI Safety/Alignment Growth & Funding

Disclaimer: This is a very ideas-stage, spitball suggestion and not meant to be a perfect solution. Open to (kind?) input as this is my first forum post, thanks to encouragement at the EA Forum session at EAG! Also, when discussing AI Safety adjacent fields I tend to lean towards law and policy as that is my experience, but the others matter just as much. The fields of AI Safety, AI Alignment, Longtermist AI Philosophy, etc. are all separate and deal with interlinking or overlapping themes, but there’s not space to discuss each separately. For this post I’ll use AI Safety to mean all of them, and discuss in general terms. More detailed insight into each would require a time investment I can’t yet make, but hope to one day.

Tl;dr—AI Safety, AI Alignment, and AI-related Longtermism (I’m gonna use AI Safety as a catch-all) all require strongly multidisciplinary solutions. Even purely technical solutions such as value alignment require input from policy, law, and governance to be effective, or even deployed in the real world in the first place. However, a major focus on technical solutions, projects, and community-building to the exclusion of other fields adjacent to AI Safety means that AI Safety projects which are non-technical in nature, such as legal or political research, are more difficult to launch or undertake, and the field is therefore missing out.

Introduction

The story of Homo Sapiens’ success in holding the current #1 spot on the food chain has been around 70% luck and 30% its ability to use intelligence to adapt to the world around it. Preparing for a future where humans may not be top of the intellectual food chain is important—we want to make sure any AGI/ASI that is smarter than us, more powerful, or more adaptable at least shares our values, for our own sakes. AI need not even be intelligent to be a threat. AI already controls and shapes so much of our lives and our civilisation’s infrastructure that a misaligned AI, even one that is nothing more than a fancy decision tree, could cause catastrophic social and civil collapse if it is allowed to be used unsafely.

Naturally, we are pursuing strategies to make sure that this has the lowest chance of coming to pass. We can’t do anything about luck, like that time in 536 AD when a massive volcanic eruption caused more than a year of darkened skies and famine, followed by a huge plague off the back of that famine, wiping out a substantial share of the human population. Or that time a different supervolcano around 75,000 years ago may have reduced the global human population to somewhere between 3,000 and 10,000 individuals and caused us to inbreed ourselves to victory for a few centuries. The point is, our species flirts with extinction a lot (mostly because of volcanoes apparently, where our volcano longtermists at?) but a mixture of luck and skill sees us through. As long as no-one invents a superintelligent volcano at any point in the future, it seems to me that either a really smart or really stupid AI is amongst the biggest long-term threats to humankind.

However, most of us in this field are focused on the same area. There is a large technical AI Safety community but only very small communities built around the other disciplines adjacent to the field—for example law, politics, social studies, and governance. The focus and funding are overwhelmingly targeted towards instilling morals within the code of current and future AI to change how it thinks or behaves, but barely any attention is given to the ‘next stage’. A good forum post here touches on this idea in relation to ‘Phase 1’ and ‘Phase 2’ work and the value of orgs and individuals dedicated to action rather than theory.

To use an example, it’s kind of like realising we have 50 years to escape Earth before it is destroyed, and dedicating most of the talent and funding we have to figuring out how to live on another planet whilst putting barely any talent or funding towards the problem of how to get there. Good luck with those ozone-creating terraforming tools when the asteroid arrives and we never built any rockets!

Humour aside, AI Safety is in the same position. There’s a load of research focused on how to instil morals in machines, or how to restrain machines from making bad decisions, but it is almost all focused on the theory. There’s very little investment in figuring out what we do with that theory, or how we’d metabolise and utilise success. Real life isn’t a lab environment, and in reality people don’t always make the right choice—but the easy one.

The Problem

Most of my current PhD research is focused on exploring a particular AI harm: the current AI clustermess in the field of criminal law—particularly the concepts of reliability and fairness in intelligence and evidence. I spend a lot of time looking at AI-generated evidence against people, and most of it is deeply flawed. What I find time and time again is that AI companies and organisations deploy unsafe and unstable AI systems which cause lots of damage not because they don’t know better or because there’s no technical solution—but because it’s easy.

Before entering the legal field I did my BSc in AI, so when I saw these AI systems messing up again and again over simple things I got all excited, because I knew how to fix them technically. It wasn’t until I started my research properly and talked to stakeholders that I found out they know how to fix it too. They just don’t. Why? Because deploying AI for revenue requires two things—speed and profit margin. Creating unaligned, unsafe AI is faster and cheaper than the alternative. So that’s what they do.

This worries me because I know we could discover the most perfect, amazing technical solution to AI alignment possible and create a way for AI to be morally attuned the way we need it to be, and that still wouldn’t be the finish line. The finish line is getting it implemented. Just because a technical solution is available doesn’t mean people will use it in an unregulated environment. For AI Safety and Alignment, at some point we’re going to need a way to encourage people (with varying degrees of carrot and stick) to make the right choice in longtermist terms, the same way we did with disruptive technology in the past, like nuclear weapons or motorised vehicles.

The reason is that human society, law, and regulation don’t work like STEM, and relying on a purely STEM solution to AI Safety problems is a recipe for heartbreak at best. That’s why a look around the AI Safety landscape is concerning. Take most funding opportunities, AI Safety scholarships, AI Safety fellowships, and recommended project ideas: they all require, or at least strongly prefer, purely technical projects.

Another issue is that STEM fields move in leaps: those beautiful ‘eureka!’ moments that shoot the field forward like a grasshopper. This is not how many of the other fields tightly tied to AI Safety work. Take law for example: it moves in tiny, creeping increments, much more like a river eroding its own path through rock than a leaping grasshopper. The work required for AI Safety laws or governance in 50 years’ time can’t wait 45 years to begin—it has to begin now. For the non-legal ones amongst us, a handy example is that the ‘Cartwright’ case (from the year 1569, before we started naming cases like MMA fixtures), which considered what technically defines a slave, had a significant impact on Shanley v Harvey (1763), which decided that a slave could inherit wealth, which in turn influenced Forbes v Cochrane (1824) in deciding who could even be enslaved in the first place. These cases are 194 and then 61 years apart, yet they were still shaping each other, echoing into the future. Using slavery cases is kind of a left-field move, but I’ve done it on purpose because law has dealt with the idea of sentience versus rights before, just not on the good side of history.

The point is that fields outside computer science, such as law, politics, or social science, are vital to the success of technical solutions to AI Safety, yet they function very differently, especially in terms of pace and agility, and so will require early investment. If we want to influence AI Alignment even in the year 2200, we need to start laying the interdisciplinary groundwork now.

The Solution

We need funding pots specifically for the non-technical elements of AI Safety. This would serve four purposes:

  1. To fund start-ups, NGOs, and organisations which can specialise in the legal, policy, social science, and governance elements of AI Safety—attracting and cultivating talent in these areas and producing not only aligned research but actual, current, measurable impact in the legal, political, and governance spheres, enabling technical solutions, once discovered, to be implemented more easily. I am aware some organisations already exist in this general area, such as the Legal Priorities Project and the Centre for the Governance of AI, but they are towering cacti in an otherwise barren landscape. I’d start one, but feel too ‘early career’ to do so at this stage.

  2. To allow non-technical projects to be analysed and funded better. Much like explaining the intricacies of Bayesian mathematics to a family law researcher, explaining law, politics, or governance projects to a technical grantmaker can be difficult without becoming so general that the proposal fails to show why the project is valuable. For example, research examining ‘whether reforming intellectual property and trade secret laws to address the AI Safety transparency issue would be equally effective in civil and common law jurisdictions’ might seem baffling and weird to someone outside law, but to AI Safety researchers within the legal field it’s one of the major battlegrounds. I don’t know as much about policy, politics, and the social sciences, but I’ve heard good suggestions for AI-adjacent research from researchers in those fields in the past.

  3. Grants specifically made for non-technical elements would allow better, more in-depth feedback on projects by a wider variety of subject matter experts. It would also encourage researchers from these fields to move into EA focus areas.

  4. AI Safety will inevitably require public and political will in order to make its safety methods compulsory. The STEM field is poorly equipped for this task, whereas other types of organisation are much better placed—for example, to launch legal cases, protests, political pressure groups, and more.

Additionally, the cultivation of and investment in more non-technical AI Safety community members would help future efforts, as well as help produce research beneficial to the AI Safety field. Talks on these topics at future events, and introductory publications such as books or research reports, would also be useful.

Criticism Already Received

When I spitballed this idea to some people at EAG London 2022, the response was generally good, but some concerns were aired. I’ve responded to them here.

Organisations Exist

Some would argue that organisations such as the LPP, GovAI, the Regulatory Institute, and the Alan Turing Institute are already doing a good job. I agree, actually. An amazing job. However, just like in technical research, there are many different avenues of exploration, and not all types of organisation (e.g. ones that prioritise research over direct action, and vice versa) are effective in all areas. I would say a wider ecosystem of organisations interested in AI Safety, AI Alignment, AI-specific Longtermism, etc. would not only support those fields but also feed back more usefully into technical AI research.

Opportunities Exist for Non-Technical Involvement

Related to the previous point: such opportunities probably do exist, but they are hard to find and few in number. Dedicated funding would help with that, I feel.

Non-Technical People Just Don’t Understand AI Enough to Contribute Effectively

This could potentially be true, and I’ve sometimes found it at law conferences, but it cuts both ways: software developers don’t know enough about regulation to regulate effectively, lawyers don’t know enough about social science to undertake AI research effectively, and policy researchers don’t understand law well enough to work alone on policy effectively. This is why the interdisciplinary method exists in the first place, and why an imbalance in focus does us no credit.

Conclusion

We can’t wait for the technical solutions to AI Safety, Alignment, and Longtermism before we begin thinking seriously about how they will be applied to a complex and ever-changing society. One example is the recent discussion of how international agreements not to pursue AGI/ASI for certain uses would be a good idea—however, there’s very little international law talent within the AI Safety sector to work out what such agreements would look like, how they would be put into place, how long that would take, what kind of foundation we would need to build first, and so on. We must also be ready to invest in more esoteric yet potentially important legal theory research, such as the pros and cons of AI legal personality, how an ASI could weaponise our legal system to prevent efforts to thwart it, and more.

I know general funds already exist, but my experience, and the experience of some others I’ve talked to, is that it is very difficult to get a non-technical AI Safety project or research taken seriously by grantmakers, and that rejections rarely come with any feedback to ascertain whether the problem is indeed the subject area or something else. Perhaps we’ve all just been unlucky?

In either event, this is just a spitball idea for future areas worthy of focus—not a criticism of the fantastic work technical researchers are already doing. May they continue to save us from ourselves :)