Thank you for posting this, Paul. I have questions about two different aspects.
Near the beginning of your post you suggest that this is “the real thing” and that these systems “could pose an existential risk if scaled up”. I personally, and I believe other members of the community, would like to learn more about your reasoning. In particular, do you think that GPT-3 specifically could pose an existential risk (for example, if it falls into the wrong hands or is scaled up sufficiently)? If so, why, and what is a plausible mechanism by which it poses an x-risk?
On a different matter, what does aligning GPT-3 (or similar systems) mean for you concretely? What would the optimal result of your team’s work look like? (This question assumes that GPT-3 is indeed a “prosaic” AI system, and that we will not gain a fundamental understanding of intelligence by this work.)
I think that a scaled-up version of GPT-3 could be directly applied to problems like “Here’s a situation. Here’s the desired result. What action will achieve that result?” (E.g. you can already use it to get answers like “What copy will get the user to subscribe to our newsletter?”, and we can improve performance by fine-tuning on data about actual customer behavior or by combining GPT-3 with very simple search algorithms.)
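The “very simple search” idea above can be sketched as a best-of-n loop: sample candidate actions from a generative model, score each with a model fine-tuned on outcome data, and keep the highest-scoring one. The sketch below uses stand-in functions in place of real model calls; every name in it is hypothetical, not an actual GPT-3 API.

```python
from typing import List

def propose_candidates(situation: str, n: int) -> List[str]:
    # Stand-in for sampling n completions from a language model.
    return [f"candidate action {i} for: {situation}" for i in range(n)]

def outcome_score(situation: str, action: str) -> float:
    # Stand-in for a model fine-tuned on actual customer behavior,
    # e.g. the predicted probability that the action achieves the result.
    return float(len(action) % 7) / 7.0

def best_of_n(situation: str, n: int = 8) -> str:
    # Very simple search: generate candidates, keep the best-scoring one.
    candidates = propose_candidates(situation, n)
    return max(candidates, key=lambda a: outcome_score(situation, a))

print(best_of_n("get the user to subscribe to our newsletter"))
```

Even this trivial loop turns a pure text predictor into something that selects actions by predicted consequences, which is the step that starts to look like steering.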
I think that if GPT-3 were more powerful, then many people would apply it to problems like that. I’m concerned that such systems would then be much better at steering the future than humans are, and that none of them would actually be trying to help people get what they want.
A bunch of people have written about this scenario and whether/how it could be risky. I wish that I had better writing to refer people to. Here’s a post I wrote last year to try to communicate what I’m concerned about.
Thanks for the response. I believe this answers the first part, about how GPT-3 specifically could pose an x-risk.
Did you or anyone else ever write up what aligning a system like GPT-3 looks like? I have to admit it’s hard for me even to define what it means for a system like GPT-3, which is not really an agent on its own, to be (intent) aligned. How do you define or measure something like this?