Reasons for optimism about measuring malevolence to tackle x- and s-risks

Reducing the influence of malevolent actors seems useful for reducing existential risks (x-risks) and risks of astronomical suffering (s-risks). One promising strategy for doing this is to develop manipulation-proof measures of malevolence.

I think better measures would be useful because:

  1. We could use them with various high-leverage groups, like politicians or AGI lab staff.

  2. We could use them flexibly (for information-only purposes) or with hard cutoffs.

  3. We could use them in initial selection stages, before promotions, or during reviews.

  4. We could spread them more widely via HR companies or personal genomics companies.

  5. We could use even small improvements in measurement to secure early adopters.

I think we can make progress on developing and using them because:

  1. It’s neglected, so there will be low-hanging fruit

  2. There’s historical precedent for tests and screening

  3. We can test on EA orgs

  4. Progress might be profitable

  5. The cause area has mainstream potential

So let’s get started on some concrete research!

Context

~4 years ago, David Althaus and Tobias Baumann posted about the impact potential of “Reducing long-term risks from malevolent actors”. They argued that:

Dictators who exhibited highly narcissistic, psychopathic, or sadistic traits were involved in some of the greatest catastrophes in human history. Malevolent individuals in positions of power could negatively affect humanity’s long-term trajectory by, for example, exacerbating international conflict or other broad risk factors. Malevolent humans with access to advanced technology—such as whole brain emulation or other forms of transformative AI—could cause serious existential risks and suffering risks… Further work on reducing malevolence would be valuable from many moral perspectives and constitutes a promising focus area for longtermist EAs.

I and many others were impressed with the post. It got lots of upvotes on the EA Forum, and 80,000 Hours listed it as an area that they’d be “equally excited to see some of our readers… pursue” as the problems on their list of the most pressing world problems. But I haven’t seen much progress on the topic since.

One of the main categories of interventions that Althaus and Baumann proposed was “The development of manipulation-proof measures of malevolence… [which] could be used to screen for malevolent humans in high-impact settings, such as heads of government or CEOs.” Anecdotally, I’ve encountered scepticism that this would be either tractable or particularly useful, which surprised me. I seem to be more optimistic than anyone I’ve spoken to about it, so I’m writing up some thoughts explaining my intuitions.

My research has historically been of the form: “assuming we think X is good, how do we make X happen?” This post is in a similar vein, except it’s more ‘initial braindump’ than ‘research’. It’s more focused on steelmanning the case for this work than on reaching a balanced overall assessment.

I think better measures would be useful

We could use difficult-to-game measures of malevolence with various high-leverage groups:

  • Political candidates

  • Civil servants and others involved in the policy process

  • Staff at A(G)I labs

  • Staff at organisations inspired by effective altruism.

Some of these groups might be more tractable to focus on first, e.g. EA orgs. And we could test in less risky environments first, e.g. smaller AI companies before frontier labs, or bureaucratic policy positions before public-facing political roles.

The measures could be binding or used flexibly, for information-only purposes. For example, in a hiring process, there could either be some malevolence threshold above which a candidate is rejected without question, or test(s) for malevolent traits could just be used as one piece of information in a wider process, which the hiring manager could use as they pleased. Of course, flexible uses would be easier first stepping stones, even in cases where we’d hope for binding thresholds to be used eventually.

These measures could potentially be used at various stages of selection and review:

  1. Initial selection, screening, or hiring of candidates

  2. Before a promotion or additional responsibility

  3. As part of reviews or performance evaluations.

For example, in the case of a politician, they could be:

  1. Screened for malevolent traits by the party machine, when deciding whether to let them run for election in the first place

  2. Screened before being offered a government ministerial position

  3. Screened before re-election, or in response to some specific concerning behaviour.[1]

I’m most optimistic about the earliest stages: it’s generally easier to just reject a candidate than to fire them, and there’s more of an existing culture of rigorous checks as part of application processes than as part of review processes. But we tend to have higher expectations of individuals with more responsibility (e.g. we expect more from the Prime Minister than from a Member of Parliament, and far more than from a local councillor), which might make some types of screening before additional responsibilities more promising. And various random factors might make an organisation more amenable to using screening measures in evaluations than in initial hiring at a given time.[2]

There also seem to be some opportunities for wider dissemination and use, via business-to-business HR companies that offer candidate screening services. (Or possibly via personal genomics companies like 23andMe, sperm banks, IVF clinics, etc, though the benefits seem lower and the risks higher in these genetic contexts.)

Here, the theory of change is more diffuse, but benefits of more widespread use could include:

  • Increased culture of screening against malevolence, which in turn influences the higher-leverage organisations we’re most interested in.

  • Increased awareness of, and interest in, advancing the science of malevolence more generally, which helps us develop better measures and implement the safeguards where they’re most needed.

  • Better data about malevolence, making relevant scientific advancements more feasible.

  • Increased awareness of the issues and therefore political will to implement anti-malevolence measures.

  • (Reduced frequency of highly malevolent individuals through parents selecting against embryos with high risk of malevolence.)

Change doesn’t happen overnight. I expect positive feedback loops between improved science/​tech and practical implementation. We can use better measures of malevolence to advocate that potential early adopters start introducing these measures. This initial interest might, in turn, spur further interest in developing the science, which we can then use to advocate for further implementation of screening, and so on.

I think we can make progress[3]

1. It’s neglected, so there will be low-hanging fruit

Neglected problems (and intervention types) tend to have low-hanging fruit that can be plucked. We don’t need to ‘fix’ malevolence overnight, or suddenly develop perfect, manipulation-proof tests of malevolence, in order to start making progress on tackling the problem and reducing the likelihood of catastrophic outcomes. Noisy and imperfect screening against malevolence is probably better than no screening.[4]
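As a toy illustration of the “noisy screening beats no screening” intuition, here is a minimal sketch in Python. All the numbers are hypothetical placeholders I’ve made up for illustration, not estimates: they just show how you might compare malevolent hires avoided against benign candidates wrongly rejected.

```python
# Toy illustration (all numbers are hypothetical, not estimates) of why a noisy
# screen can still beat no screen: count how many malevolent hires are avoided
# versus how many benign candidates are wrongly rejected.

def screening_outcomes(n_applicants: int, base_rate: float,
                       sensitivity: float, false_positive_rate: float) -> dict[str, float]:
    malevolent = n_applicants * base_rate
    benign = n_applicants - malevolent
    return {
        "malevolent_rejected": malevolent * sensitivity,          # harm avoided
        "malevolent_missed": malevolent * (1 - sensitivity),      # slips through anyway
        "benign_wrongly_rejected": benign * false_positive_rate,  # cost of the noise
    }

if __name__ == "__main__":
    # 1,000 applicants, 2% highly malevolent, a test that catches 70% of them
    # while wrongly flagging 5% of everyone else -- purely illustrative values.
    print(screening_outcomes(1000, 0.02, 0.70, 0.05))
    # -> {'malevolent_rejected': 14.0, 'malevolent_missed': 6.0, 'benign_wrongly_rejected': 49.0}
```

Whether a trade like that is worth it depends on how costly one malevolent hire is relative to one wrongly rejected benign candidate, which is the kind of question the cost-effectiveness analyses suggested later in this post could examine.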

2. There’s historical precedent for tests and screening

The use of security clearances in government and state positions provides precedent for binding measures to be introduced. Of course, security clearances aren’t primarily[5] testing a personality trait, so this isn’t a perfect analogy for tests of dark tetrad traits.

But there are a plethora of non-binding tests and screening measures that provide broader precedent. Some relevant analogies include:

  • The extensive use of polygraph tests (lie detectors) in US government and law enforcement, despite their unreliability (Nelson, 2015 suggests 77-95% accuracy) and several legal rulings against them.

  • Public pressure on politicians to reveal their tax returns (and other documents?).[6]

  • Businesses and other institutions using cognitive ability tests or personality tests as part of their hiring processes, e.g. Google. (I’m not sure if these are ever binding with pre-set thresholds?)

  • Reference checks and other ‘due diligence’ on candidates both for hiring and promotion.

3. We can test on EA orgs

Organisations inspired by or associated with effective altruism, longtermism, or rationalism (hereafter just ‘EA orgs’) could be the guinea pigs for tests that are developed.

It may be especially useful for EA orgs — by their own goals — to screen against malevolence:

  • Talent search organisations often offer incentives that are irrelevant to their mission, such as money, prestige, and credentials, to attract promising applicants, but in doing so they also attract ‘grifters’. Application processes designed to identify altruistic intent, scout mindset, and other desirable traits can sometimes be sidestepped and exploited by people who are willing to exaggerate or lie… which seems especially likely among malevolent individuals. Speaking as someone who runs such an organisation, I’d be pretty relieved to be able to test (reliably) for malevolence.

  • My guess is that organisations explicitly focused on having positive social impact are more vulnerable to bad PR arising from malevolent actions of (ex-)employees or participants. People hate hypocrisy.

  • It seems plausible (to me) that effective altruism and longtermism are disproportionately attractive to people high on at least some malevolent traits.[7] If EA orgs share this intuition, they might be more concerned about the risks of malevolence than other orgs.

Even beyond these reasons, I expect EA orgs to be more willing to incur some costs to help advance a project that may help reduce s-risks and x-risks. Tractability aside, it also seems genuinely pretty useful to screen against malevolence among these organisations, given that they may have non-negligible influence over the trajectory of the future.

4. Progress might be profitable

There may be profitable products to be developed that help advance the battle against malevolence. Again, these could be sold by business-to-business HR companies that offer candidate screening services (or by personal genomics companies, sperm banks, IVF clinics, etc).

Advances might come from advocacy to established companies, or from mission-driven entrepreneurship. This might open up substantial additional resources.

Of course, profitable opportunities rely on demand for screening services, plus sufficiently strong science and technology to offer tests. I’m not sure if the conditions are yet ripe for any profitable businesses in this space. But if not, then initial progress on the relevant science, tech, or advocacy might “unlock” new resources by creating new, profitable opportunities.

5. The cause area has mainstream potential

Okay, there are super controversial (and plausibly genuinely bad!) aspects to at least some intervention possibilities for reducing the influence of malevolent actors. But not all of them… and the general case for reducing the influence of malevolent actors seems very robust across worldviews and priorities. It seems good for tackling both x-risks and s-risks, as well as for generally increasing justice, integrity, and other things that most people agree are good.[8]

As well as the detailed, rigorous, academic research that is needed, I can also imagine a very wide range of different contributions to advancing this cause, making it much more accessible than, say, research relevant to cooperation and conflict between AIs:

  • There are potential advocacy projects,[9] which opens up a whole host of classic nonprofit and social movement roles. Think large volunteer bases, campaign and marketing roles, nonprofit management roles, etc.

  • As noted above, it might be profitable, opening up classic for-profit business roles.

  • Psychology seems like the most relevant academic background here, which isn’t the case for any of the current most popular longtermist cause areas. Psychology is pretty popular as a PhD choice.[10] So developing this area might just open up opportunities for different sorts of people to contribute to reducing s-risks and x-risks.

Concrete research ideas

We can summarise existing science and collect relevant insights through (systematic) literature reviews. E.g. on[11]:

I think literature reviews are usually pretty tractable even for (junior) researchers without relevant prior expertise or training. Cost-effectiveness analyses and feasibility assessments of manipulation-proof measures (at scale) could be useful. Alternatively, social psychologists could focus on developing better constructs and measures of malevolence, even if the tests are easily gameable.[12]
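To make “measures of malevolence” slightly more concrete, here is a minimal sketch (in Python) of how a simple self-report questionnaire in the style of existing dark-trait instruments might be scored. The items, subscales, and flagging threshold are all hypothetical placeholders I’ve invented for illustration, not a validated instrument:

```python
# Hypothetical sketch of scoring a self-report "dark traits" questionnaire.
# Items, subscales, and the flagging threshold are illustrative placeholders,
# not a validated instrument.

# Each item maps to (subscale, reverse_scored). Responses are on a 1-5 Likert scale.
ITEMS = {
    "q1": ("narcissism", False),
    "q2": ("narcissism", True),
    "q3": ("machiavellianism", False),
    "q4": ("machiavellianism", False),
    "q5": ("psychopathy", False),
    "q6": ("psychopathy", True),
    "q7": ("sadism", False),
    "q8": ("sadism", False),
}

FLAG_THRESHOLD = 4.0  # illustrative subscale mean above which a reviewer takes a closer look


def score_responses(responses: dict[str, int]) -> dict[str, float]:
    """Return the mean score per subscale, reverse-coding items where needed."""
    totals: dict[str, list[int]] = {}
    for item, raw in responses.items():
        subscale, reverse = ITEMS[item]
        value = 6 - raw if reverse else raw  # reverse-code on a 1-5 scale
        totals.setdefault(subscale, []).append(value)
    return {subscale: sum(vals) / len(vals) for subscale, vals in totals.items()}


def flagged_subscales(subscale_means: dict[str, float]) -> list[str]:
    """Subscales exceeding the (hypothetical) review threshold."""
    return [s for s, mean in subscale_means.items() if mean >= FLAG_THRESHOLD]


if __name__ == "__main__":
    example = {"q1": 5, "q2": 1, "q3": 4, "q4": 5, "q5": 2, "q6": 4, "q7": 1, "q8": 2}
    means = score_responses(example)
    print(means)                   # {'narcissism': 5.0, 'machiavellianism': 4.5, ...}
    print(flagged_subscales(means))
```

Note that this is exactly the ‘easily gameable’ kind of measure: a motivated candidate can simply lie on self-report items, which is why the harder (and more valuable) research problem is developing manipulation-proof measures.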

There are also a number of research projects that aren’t focused on developing manipulation-proof measures themselves, but that could help us to (1) better understand the promise of the cause area, (2) learn practical lessons about pathways to implementation, and (3) gather examples to point to that make advocacy more credible:

  • A broad investigation into the most relevant existing parallels for malevolence tests in various institutional contexts. This could be very ‘practical’, rather than ‘academic’.

  • More detailed case studies of the historical use of some of these technologies or tests.[13]

  • More detailed research (or summaries of the relevant implications of existing research) into seemingly malevolent historical leaders:

    • To what extent were they genuinely malevolent?

    • What were the effects of that?

    • How much influence did they really have on events?

    • Which factors helped or hindered them in gaining power?

    • Which actions might have reduced the harms they caused?

  • Lots of possible surveys about people’s current attitudes towards malevolence.[14]

  • Lots of possible other research or interventions relating to malevolence reduction but not difficult-to-game measures specifically, e.g. workshops on how to spot malevolent traits, whistleblowing on malevolent behaviour, political interventions, relevant AI evals.

You might be able to get funding and support to do this research, e.g. from:

Thank you to Clare Diane Harris, David Althaus, Tobias Baumann, and Lucius Caviola for comments/​suggestions on a draft of this post. All opinions and mistakes are mine.

  1. ^

    For a startup position, equivalents might be (1) hiring, (2) before being promoted to a C-Suite position, (3) during a quarterly or annual performance review.

  2. ^

    For example, they might be looking to make cuts for some reason anyway and interested in reasons to get rid of people; they might have already invested heavily into hiring processes but neglected staff review processes; they might have experienced some sort of scandal or external pressure suggesting cultural issues in the organisation that they want to crack down on.

  3. ^

    Stefan Torges has previously highlighted that malevolence reduction is more tractable than other s-risk work: it is a known problem, with known quantities, potential for buy-in from stakeholders, a decent window of opportunity, feedback signals, potential for talent absorption, and lower infohazard risk. I think those considerations are similarly important to, or more important than, the five I list here, although Stefan’s points aren’t focused on better measures of malevolence.

  4. ^

    Of course, the status quo is not ‘no screening’; it’s actually ‘indirect and disorganised screening’. The point probably holds, though.

  5. ^

    Some security clearance processes do seem to assess personality traits. For example, the Australian Criminal Intelligence Commission says that they look for honesty, trustworthiness, being impartial, being respectful, and being ethical. (Thanks to Clare Diane Harris for this point.)

  6. ^

    Of course, the pressure is sometimes refused. There’s interesting discussion on testing for presidents here.

  7. ^
  8. ^

    I can imagine it having a very compelling emotional/​narrative appeal to it, along the lines of ‘literally battling evil.’

  9. ^

    Encouraging companies like 23andMe, sperm banks, etc. to include relevant tests; more general awareness-raising to increase demand for tests of Dark Tetrad traits; potential pressure campaigns to demand such tests be carried out and used (likely at a later date).

  10. ^

    It’s somewhat relevant to improving institutional decision-making, global priorities research, and longtermist talent search. Overall, this point seems notably weaker than the previous two.

  11. ^

    Note, I haven’t actually looked up if any of these exist already. But my suggestion here is partly about distilling insights specifically for reducing malevolence anyway.

  12. ^

    Some initial progress here seems like it could be pretty easy for someone with relevant training already, and doable for others too if they’re willing to take the time to learn on the job.

  13. ^

    I’m pretty tempted to read this out of curiosity.

  14. ^

    I liked this comment from Saulius Simcikas: “I imagine that most people would support a law that candidates must take the test before elections and that this information should be made public. We can figure out if that’s true via a survey. And if it turned out that some candidate has those traits, I think that it would make people less likely to vote for that person. That can also be researched by doing surveys.”