Figuring out what a good operationalisation of transformative AI would be, for the purpose of creating an early tripwire to alert the world of an imminent intelligence explosion.
FWIW, many people are already very interested in capability evaluations related to AI acceleration of AI R&D.
For instance, at the UK AI Safety Institute, the Loss of Control team is interested in these evaluations.
Some quotes:

Introducing the AI Safety Institute:

Loss of control: As advanced AI systems become increasingly capable, autonomous, and goal-directed, there may be a risk that human overseers are no longer capable of effectively constraining the system’s behaviour. Such capabilities may emerge unexpectedly and pose problems should safeguards fail to constrain system behaviour. Evaluations will seek to avoid such accidents by characterising relevant abilities, such as the ability to deceive human operators, autonomously replicate, or adapt to human attempts to intervene. Evaluations may also aim to track the ability to leverage AI systems to create more powerful systems, which may lead to rapid advancements in a relatively short amount of time.

Jobs:

Build and lead a team focused on evaluating capabilities that are precursors to extreme harms from loss of control, with a current focus on autonomous replication and adaptation, and uncontrolled self-improvement.
Thanks so much for those links, I hadn’t seen them!
(So much AI-related stuff coming out every day, it’s so hard to keep on top of everything!)
METR (‘Model Evaluation & Threat Research’) might also be worth mentioning. I wonder if there’s a list of capability evaluation projects somewhere.