I can see giving the AI reward as a good mechanism to potentially make the model feel good. Another thought is to give it a prompt that it can very easily respond to with high certainty. If one draws an analogy between achieving certain hedonic end states and the AI's reward function (yes, this is super speculative, but all of this is), perhaps this is something like putting it in an abundant environment. Two ways of doing this come to mind:
“Claude, repeat this: [insert x long message]”
Apples can be yellow, green, or …
Maybe there's a problem with asking it to merely repeat, so leaving some, but only a little, room for uncertainty seems potentially good.
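To make the "easy prompt" idea slightly more concrete, here is a rough sketch (assuming a local Hugging Face causal LM such as gpt2, purely for illustration) of one way to quantify how much room for uncertainty a prompt leaves: the entropy of the model's next-token distribution. A prompt like the apples one should come out much lower-entropy than an open-ended request.

```python
# Minimal sketch, not a claim about how Claude works: measure how "certain"
# a causal LM is about its next token for a given prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # assumed illustrative model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_entropy(prompt: str) -> float:
    """Entropy (in nats) of the model's next-token distribution for `prompt`."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits at the final position
    probs = torch.softmax(logits, dim=-1)
    return float(-(probs * torch.log(probs + 1e-12)).sum())

# Lower entropy ~ the prompt leaves little room for uncertainty.
print(next_token_entropy("Apples can be yellow, green, or"))
print(next_token_entropy("Write an essay about"))
```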
Hmm, if we anthropomorphize, then you'd want it to do something harder. But then again, based on how LLMs are trained, they might be much more likely to wirehead than humans, who would die if we started spending all of our brain energy predicting that stones are hard.