Figuring out what a good operationalisation of transformative AI would be, for the purpose of creating an early tripwire to alert the world of an imminent intelligence explosion.
FWIW, many people are already very interested in capability evaluations related to AI acceleration of AI R&D.
For instance, at the UK AI Safety Institute, the Loss of Control team is interested in these evaluations.
Some quotes:

Introducing the AI Safety Institute:

Loss of control: As advanced AI systems become increasingly capable, autonomous, and goal-directed, there may be a risk that human overseers are no longer capable of effectively constraining the system’s behaviour. Such capabilities may emerge unexpectedly and pose problems should safeguards fail to constrain system behaviour. Evaluations will seek to avoid such accidents by characterising relevant abilities, such as the ability to deceive human operators, autonomously replicate, or adapt to human attempts to intervene. Evaluations may also aim to track the ability to leverage AI systems to create more powerful systems, which may lead to rapid advancements in a relatively short amount of time.

Jobs:

Build and lead a team focused on evaluating capabilities that are precursors to extreme harms from loss of control, with a current focus on autonomous replication and adaptation, and uncontrolled self-improvement.
Thanks so much for those links, I hadn’t seen them!
(So much AI-related stuff coming out every day, it’s so hard to keep on top of everything!)
METR (‘Model Evaluation & Threat Research’) might also be worth mentioning. I wonder if there’s a list of capability evaluation projects somewhere.