It’s not clear whether scaffolding is a one-time boost or a whole series of boosts, so I guess I feel that the idea of “burning up” the overhang is naive.
BTW, it’s possible I’m not appreciating the strongest forms of the arguments against the view that advancing scaffolding is good. From my perspective, if I think of it purely as “software capabilities”, then there’s a generic argument against improving capabilities, which I don’t totally buy but which definitely gets some weight. But when you zoom in to thinking about different types of capabilities, I end up thinking scaffolding is quite robustly good. I’d be interested if you know of a zoomed-in case against scaffolding.
I don’t have a “zoomed-in case”, but if I did, I think it would be unwise to share it publicly.
That’s very surprising to me. Can you explain why publicly?
Surely, to be convincing, I’d have to go into detail about the best ways to scaffold such systems. And even though my knowledge is likely not at the level where this would have any impact, it still seems bad in principle to do things that would be bad if they were done by someone with a higher level of knowledge and competence.
Oh I see. I definitely wasn’t expecting anything that zoomed in. Rather, I was thinking maybe you had some abstract model which separated out capabilities-from-base-model from capabilities-from-scaffolding, and could explain something about the counterfactual of advancing the latter, and how it all interacted with safety.
Sticking to generalities: there are many ways of scaffolding models, many of them can be combined, and many kinds of scaffolding that don’t work at lower base-model capability levels will work at higher ones. You can basically just throw anything at humans and we’ll figure out how to make it work.
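To make the “they combine” point concrete, here’s a minimal sketch of two generic scaffolds (task decomposition and critique-and-revise) composed around a hypothetical `call_model` function. The function names and prompts are illustrative assumptions, not any particular system’s API:

```python
# Minimal sketch (not any particular framework's API) of two scaffolds that
# compose around a base model: task decomposition plus a critique-and-revise pass.

def call_model(prompt: str) -> str:
    """Hypothetical base-model call; swap in a real API client here."""
    raise NotImplementedError

def decompose(task: str) -> list[str]:
    """Scaffold 1: ask the model to split a task into short steps."""
    plan = call_model(f"Break this task into short, independent steps:\n{task}")
    return [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

def critique_and_revise(task: str, draft: str) -> str:
    """Scaffold 2: ask the model to critique a draft, then revise it."""
    critique = call_model(f"Task: {task}\nDraft: {draft}\nList concrete flaws in the draft.")
    return call_model(f"Task: {task}\nDraft: {draft}\nFlaws: {critique}\nWrite an improved answer.")

def scaffolded_answer(task: str) -> str:
    """The scaffolds compose: decompose, solve each step, merge, then self-critique."""
    steps = decompose(task)
    partials = [call_model(f"Overall task: {task}\nDo this step: {step}") for step in steps]
    draft = call_model("Combine these step results into one answer:\n" + "\n".join(partials))
    return critique_and_revise(task, draft)
```

Whether either scaffold helps at all depends on the base model being able to follow the instructions, which is the sense in which scaffolding that fails at lower capability levels can start working at higher ones.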
I agree with this. (While noting that some forms of scaffolding will work noticeably better with humans than others will, so there are still capabilities boosts to be had for organizations of humans from e.g. procedures and best practices.)
But if our plan was to align organizations of some human-like entities that we were gradually training to be smarter, I’d be very into working out how to get value out of them by putting them into organizations during the training process, as I expect we’d learn important things about organizational design in the process (and this would better position us to ensure that the eventual organizations were pretty safe).
Sure, I totally expect it to be a series of boosts, but as in most domains I expect you get diminishing returns to research effort put in. So the question is, at the margin, how much of the boost from the low-hanging fruit of scaffolding are you leaving on the table?
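To illustrate the at-the-margin question with a toy model (the functional form and numbers are assumptions, purely for illustration):

```python
# Toy illustration only: an assumed geometric form for diminishing returns to
# scaffolding effort, to make "how much is left at the margin?" concrete.

def scaffold_multiplier(effort_units: int, first_gain: float = 0.3, decay: float = 0.5) -> float:
    """Capability multiplier after `effort_units` of scaffolding work (assumed form)."""
    multiplier, gain = 1.0, first_gain
    for _ in range(effort_units):
        multiplier *= 1.0 + gain
        gain *= decay  # each successive idea contributes less than the last
    return multiplier

if __name__ == "__main__":
    for units in range(6):
        print(units, round(scaffold_multiplier(units), 3))
    # The multiplier climbs quickly at first and then flattens, so the marginal
    # boost being left unclaimed depends on how far along this curve you already are.
```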