I don’t think this depends on takeoff speeds at all, since I’d expect a conditional pause proposal to lead to a pause well before models are automating 20% of tasks (assuming no good mitigations are in place by that point).
I should have emphasized that I’m talking about cognitive AI takeoff, not economic takeoff.
I don’t have a strong view on whether ~20% of human tasks are easy and regular/streamlined enough to automate with stochastic-parrot AI tools. Be that as it may, what matters more is what happens once AIs pass the reliability threshold that makes someone a great “general assistant” across all sorts of domains. From there, I think it’s only a tiny step further to also being a great CEO. Because these capability levels sit so close together on my model, the world may still look much like ours at that point.
All of that said, it’s not like I consider it particularly likely that a system would blow past all the evals you’re talking about in a single swoop, especially since some of them will come (slightly) before the point of being a great “general assistant.” I also have significant trust that the people designing these evals will be thinking about these concerns. Still, I think it’s going to be very challenging to make sure evals organizations (or evals teams inside labs, if it’s done lab-internally) have enough political power and stay uncorrupted by pressures to be friendly towards influential lab leadership. These problems are surmountable in theory, but I think it’ll be hard, so I’m hoping the people working on this are aware of all that could go wrong. I recently wrote up some quick thoughts on safety evals here. Overall, I’m probably happy enough with a really well-thought-out “conditional pause” proposal, but I’d need to be reassured that the people who decide in favor of it can pass the Ideological Turing Test for positions like fast takeoff, or for the view that economic milestones like “20% of tasks are automated” are probably irrelevant.
Sounds like we roughly agree on actions, even if not beliefs (I’m less sold on fast / discontinuous takeoff than you are).
As a minor note, to keep incentives good, you could pay evaluators / auditors based on how much performance they are able to elicit. You could even require that models be evaluated by at least three auditors, and split up payment between them based on their relative performances (a minimal sketch of that split is below). In general it feels like there is a huge space of possibilities here that has barely been explored.
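To make the relative-performance split concrete, here is a minimal sketch, assuming (this is my own simplification, not something specified above) that each auditor reports a single “elicited capability” score and a fixed payment pool is divided in proportion to those scores. The function and auditor names are hypothetical.

```python
# Hypothetical sketch: split a fixed payment pool among auditors in
# proportion to how much capability each one elicited from the model.

def split_payment(pool: float, elicited_scores: dict[str, float]) -> dict[str, float]:
    """Return each auditor's payout, proportional to their elicited score."""
    total = sum(elicited_scores.values())
    if total == 0:
        # Degenerate case: nobody elicited anything, so split evenly.
        return {name: pool / len(elicited_scores) for name in elicited_scores}
    return {name: pool * score / total for name, score in elicited_scores.items()}

# Example with three auditors and a $300k pool; scores are made up.
payouts = split_payment(300_000, {"auditor_a": 0.62, "auditor_b": 0.55, "auditor_c": 0.48})
print(payouts)  # auditor_a receives the largest share for eliciting the most
```

One appeal of a proportional split like this is that each auditor’s payout depends on outdoing the others, so underinvesting in elicitation directly costs them money.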