It’s not clear whether scaffolding is a one-time boost or a whole series of boosts, so I guess I feel that the idea of “burning up” the overhang is naive.
BTW, it’s possible I’m not appreciating the strongest forms of the arguments against the view that advancing scaffolding is good. From my perspective, if I think of it purely as “software capabilities”, then there’s a generic argument against improving capabilities, which I don’t totally buy but which definitely gets some weight. But when you zoom in to thinking about different types of capabilities, I end up thinking scaffolding is quite robustly good. I’d be interested if you know of a zoomed-in case against scaffolding.
I don’t have a “zoomed-in case”, but if I did, I think it would be unwise to share it publicly.
That’s very surprising to me. Can you explain why publicly?
Surely, to be convincing, I’d have to go into detail about the best ways to scaffold such systems. And even though my knowledge is likely not at the level where this would have any impact, it still seems bad in principle to do things that would be bad if they were done by someone with a higher level of knowledge and competence.
Oh I see. I definitely wasn’t expecting anything that zoomed in. Rather, I was thinking maybe you had some abstract model which separated out capabilities-from-base-model from capabilities-from-scaffolding, and could explain something about the counterfactual of advancing the latter, and how it all interacted with safety.
Sticking to generalities: there are many ways of scaffolding models, many of them can be combined, and many kinds of scaffolding that don’t work at lower base-model capability levels will work at higher ones. You can basically just throw anything at humans and we’ll figure out how to make it work.
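To make the “they combine” point concrete, here’s a minimal sketch of two generic scaffolds (task decomposition and critique-and-revise) composed around a hypothetical `call_model` function. The function names and prompts are illustrative assumptions, not any particular system’s API:

```python
# Minimal sketch (not any particular framework's API) of two scaffolds that
# compose around a base model: task decomposition plus a critique-and-revise pass.

def call_model(prompt: str) -> str:
    """Hypothetical base-model call; swap in a real API client here."""
    raise NotImplementedError

def decompose(task: str) -> list[str]:
    """Scaffold 1: ask the model to split a task into short steps."""
    plan = call_model(f"Break this task into short, independent steps:\n{task}")
    return [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

def critique_and_revise(task: str, draft: str) -> str:
    """Scaffold 2: ask the model to critique a draft, then revise it."""
    critique = call_model(f"Task: {task}\nDraft: {draft}\nList concrete flaws in the draft.")
    return call_model(f"Task: {task}\nDraft: {draft}\nFlaws: {critique}\nWrite an improved answer.")

def scaffolded_answer(task: str) -> str:
    """The scaffolds compose: decompose, solve each step, merge, then self-critique."""
    steps = decompose(task)
    partials = [call_model(f"Overall task: {task}\nDo this step: {step}") for step in steps]
    draft = call_model("Combine these step results into one answer:\n" + "\n".join(partials))
    return critique_and_revise(task, draft)
```

Whether either scaffold helps at all depends on the base model being able to follow the instructions, which is the sense in which scaffolding that fails at lower capability levels can start working at higher ones.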
I agree with this. (While noting that some forms of scaffolding will work noticeably better with humans than others will, so there are still capabilities boosts to be had for organizations of humans from e.g. procedures and best practices.)
But if our plan was to align organizations of some human-like entities that we were gradually training to be smarter, I’d be very into working out how to get value out of them by putting them into organizations during the training process, as I expect we’d learn important things about organizational design in the process (and this would better position us to ensure that the eventual organizations were pretty safe).
Sure, I totally expect it to be a series of boosts, but as in most domains I expect you get diminishing returns to research effort put in. So the question is, at the margin, how much of the boost from the low-hanging fruit of scaffolding are you leaving on the table?
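To illustrate the at-the-margin question with a toy model (the functional form and numbers are assumptions, purely for illustration):

```python
# Toy illustration only: an assumed geometric form for diminishing returns to
# scaffolding effort, to make "how much is left at the margin?" concrete.

def scaffold_multiplier(effort_units: int, first_gain: float = 0.3, decay: float = 0.5) -> float:
    """Capability multiplier after `effort_units` of scaffolding work (assumed form)."""
    multiplier, gain = 1.0, first_gain
    for _ in range(effort_units):
        multiplier *= 1.0 + gain
        gain *= decay  # each successive idea contributes less than the last
    return multiplier

if __name__ == "__main__":
    for units in range(6):
        print(units, round(scaffold_multiplier(units), 3))
    # The multiplier climbs quickly at first and then flattens, so the marginal
    # boost being left unclaimed depends on how far along this curve you already are.
```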