Thank you for writing this post; I learnt a lot from it (including about things I didn't expect to, such as waste sites and failure modes in cryonics advocacy. Excellent stuff!).
Question for anyone to chip in on:
I'm wondering: if the "conditional pause system" the post advocates were made universal, would that imply that the alignment community needs to scale up drastically (in terms of the number of researchers) in order to do work similar to what ARC Evals is doing?
After all, someone would actually need to check whether systems at a given capability level are safe, and as the post argues, you would not want AGI labs to do it for themselves. However, if all the current top labs started throwing their cutting-edge models at ARC Evals, I imagine they would be quite overwhelmed. (And the demand for evals would only increase over time.)
I could see this being less of an issue if the evaluations only need to happen for the models that are really the most capable at a given point in time, but my worry would be that as capabilities increase, even if we test the top models rigorously, the second-tier models could still end up doing harm.
(I guess it would also depend on whether you can "reuse" some of the insights gained from evaluating the top models of a given period on the second-tier models of that same period, but I certainly don't know enough about this topic to say whether that would be feasible.)
Yes, in the longer term you would need to scale up the evaluation work that is currently going on. It doesn’t have to be done by the alignment community; there are lots of capable ML researchers and engineers who can do this (and I expect at least some would be interested in it).
> I could see this being less of an issue if the evaluations only need to happen for the models that are really the most capable at a given point in time
Yes, I think that’s what you would do.
> my worry would be that as capabilities increase, even if we test the top models rigorously, the second-tier models could still end up doing harm.
The proposal would be that if the top models are risky, then you pause. So, if you haven't paused, that means your tests concluded that the top models aren't risky. In that case, why would you expect the second-tier models to be risky?
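For concreteness, here is a minimal sketch of that decision rule in Python. Everything in it (the `Model` class, `conditional_pause`, the capability scores, the `is_risky` predicate) is a hypothetical illustration rather than any real evals API; the point is just that only the frontier models need to be evaluated, and a pause triggers if any of them fail, which is also what covers the strictly less capable second tier.

```python
# Sketch of a conditional-pause decision rule (illustrative names only).
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Model:
    name: str
    capability_score: float  # aggregate score used only to rank models


def conditional_pause(
    models: Iterable[Model],
    is_risky: Callable[[Model], bool],
    top_k: int = 3,
) -> bool:
    """Return True if development should pause.

    Only the top-k most capable models are evaluated directly; if none of
    them show dangerous capabilities, the less capable second-tier models
    are covered by the same verdict.
    """
    ranked = sorted(models, key=lambda m: m.capability_score, reverse=True)
    frontier = ranked[:top_k]
    return any(is_risky(m) for m in frontier)


# Usage: plug in whatever dangerous-capability evaluation an external
# evaluator would actually run in place of this toy predicate.
models = [
    Model("lab-A-frontier", 0.92),
    Model("lab-B-frontier", 0.90),
    Model("lab-A-previous-gen", 0.75),
]
if conditional_pause(models, is_risky=lambda m: m.capability_score > 0.95):
    print("Pause: a frontier model failed its evals.")
else:
    print("Continue: frontier models passed; second-tier models are strictly weaker.")
```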