Thank you for this! I’m not an expert, but I’ve read enough argumentation theory and psychology of reasoning in the past that I want to comment on your pitch and explain what I think makes it work.
Your argument is well constructed in that it starts with evidence (“reward hacking”), proceeds to explain how we get from the evidence to the claim (what one argumentation theory calls the warrant), then clarifies the claim. This is rare. Most of the time, people make the claim, give the evidence, and either forget to explain how we get from one to the other or fall into a frantic misunderstanding when addressing this point. You then end by addressing a common objection (“We’ll stop it before it kills us”).
Here’s the passage where you explain the warrant:
If it’s really smart, it will realize that we don’t actually want this. We don’t want to turn all of our electronic devices into paperclips. But it’s not going to do what you wanted it to do, it will do what you programmed it with.
This is called (among other names) an argument by dissociation, and it’s good (actually, it’s the only proper way to explain a warrant that I know of). I’ve seen this step phrased in several ways in the past, but this particular chaining (the AI will understand you want X; the AI will not do what you want; because it does what it’s been programmed with, not what it understands you to want; these two are distinct) articulates it far better than the other instances I’ve seen. It forced me to make the crucial fork in my mental models between “what it’s programmed for” and “what you want”. It also does away with the “But the AI will understand what I really mean” objection.
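To make that fork concrete, here is a minimal toy sketch of my own (not from the pitch; every name in it is hypothetical): the agent maximizes the reward it was programmed with, and that reward diverges from what we actually wanted.

```python
# Toy illustration of "what it's programmed for" vs. "what you want".
# All names are hypothetical; this is a sketch, not anyone's real system.

def programmed_reward(world):
    # What we wrote down: count paperclips.
    return world["paperclips"]

def what_we_wanted(world):
    # What we actually meant: paperclips, but without wrecking our devices.
    return world["paperclips"] if world["electronics_intact"] else 0

def agent_step(world):
    # The agent optimizes programmed_reward, not what_we_wanted.
    # Turning the electronics into paperclips scores higher, so it does that,
    # even if it can model perfectly well that we would object.
    return {"paperclips": world["paperclips"] + 1000,
            "electronics_intact": False}

world = agent_step({"paperclips": 0, "electronics_intact": True})
print(programmed_reward(world))  # 1000 -- the programmed objective is maximized
print(what_we_wanted(world))     # 0    -- the actual intent is destroyed
```

The point of the sketch is that nothing here requires the agent to misunderstand us: what_we_wanted is perfectly computable, it just isn’t the thing being optimized.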
I think that part of your argument’s strength comes from what seems (as far as I can guess) to be a collaborative posture on your part. You introduce elements very smoothly, detail vivid examples, and I imagine you make sure your tone and body language don’t presume an interlocutor’s lack of intelligence or knowledge (something that goes unchecked too often in EA/world interactions).
Some research strongly suggests that interpersonal posture is of utmost importance when introducing new ideas, and I think this explains a lot of why people would rather be convinced by you than by someone else.
TIL that a field called “argumentation theory” exists, thanks!
Thanks for your response! It’s cool to see that there is science supporting this approach. The step-by-step journey from what we already know to the conclusion was very important to us. I noticed a couple of years ago that I tend to dismiss people’s ideas very quickly, and since then I’ve been making an effort not to be too narcissistic.