Miguel comments on Apply to >50 AI safety funders in one application with the Nonlinear Network [Round Closed]

Miguel 13 Apr 2023 2:59 UTC
3 points
0 ∶ 3
Hello there,

Are you interested of funding this theory of mine that I submitted to AI alignment awards? I am able to make this work in GPT2 and now writing the results. I was able to make GPT2 shutdown itself (100% of the time) even if it’s aware of the shutdown instruction called “the Gauntlet” embedded through fine-tuning an artificially generated archetype called “the Guardian” essentially solving corrigibility, outer and inner alignment. https://twitter.com/whitehatStoic/status/1645758144537034752?t=ps-Ccu42tcScTmWg1qYuqA&s=19

Let me know if you guys are interested. I want to test it in higher parameter models like Llama and Alpaca but don’t have the means to finance the equipment.

I also found out that there is a weird setting in the temperature for GPT2 where in the range of .498 to .50 my shutdown code works really well, I still don’t know why though. But yeah I believe that there is an incentive to review what’s happening inside the transformer architecture.

Here was my original proposal: https://www.whitehatstoic.com/p/research-proposal-leveraging-jungian

I’ll post my paper for the corrigibility solution too once finished probably next week.

Looking forward to hearing from you.

Best regards,

Miguel
- Miguel 13 Apr 2023 8:18 UTC
  5 points
  0 ∶ 0
  Parent
  I have submitted an application no need to reply!
- Miguel 29 Apr 2023 2:03 UTC
  1 point
  0 ∶ 0
  Parent
  The final write-up for my project.
  
  https://www.lesswrong.com/posts/pu6D2EdJiz2mmhxfB/archetypal-transfer-learning-a-proposed-alignment-solution
- Miguel 13 Apr 2023 3:05 UTC
  1 point
  1 ∶ 0
  Parent
  Also using fine-tuning with traditional Jungian archetypes allowed GPT2 to tell stories either of depressing or motivational in nature 100%of the time. Thanks for reading!