This comment co-written with Jake McKinnon:
The post seems obviously true when the lifeguards are the general experts and authorities, who just tend not to see or care about the drowning children at all. It’s more ambiguous when the lifeguards are highly-regarded EAs.
It’s super important to try to get EAs to be more agentic and skeptical that more established people “have things under control.” In my model, the median EA is probably too deferential and should be nudged in the direction of “go save the children even though the lifeguards are ignoring them.” People need to be building their own models (even if they start by copying someone else’s model, which is better than copying their outputs!) so they can identify the cases where the lifeguards are messing up.
However, sometimes the lifeguards aren’t saving the children because the water is full of alligators or something. Like, lots of the initial ideas that very early EAs have about how to save the child in fact reflect ignorance about the nature of the problem (a common one is a version of “let’s just build the aligned AI first”). If people overcorrect to “the lifeguards aren’t doing anything,” then when the lifeguards tell them why their idea is dangerous, they’ll ignore them.
The synthesis here is something like: it’s very important that you understand why the lifeguards aren’t saving the children. Sometimes it’s because they’re missing key information, not personally well-suited to the task, exhausted from saving other children, or making a prioritization/judgment error in a case where you have some reason to think your judgment is better. But sometimes it’s the alligators! Most ideas for solving problems are bad, so your prior should be that if you have an idea, and it’s not being tried, probably the idea is bad; if you have inside-view reasons to think that it’s good, you should talk to the lifeguards to see if they’ve already considered it or think you will do harm.
Finally, it’s worth noting that even when the lifeguards are competent and correctly prioritizing, sometimes the job is just too hard for them to succeed with their current capabilities. Lots of top EAs are already working on AI alignment in not-obviously-misguided ways, but it turns out that it’s a very very very hard problem, and we need more great lifeguards! (This is not to say that you need to go to “lifeguard school,” i.e., get the standard credentials and experience before you start actually helping, but probably the way to start helping the lifeguards involves learning what they think, by reading them or talking to them, so you can better understand how to help.)
Good comment!!
Most ideas for solving problems are bad, so your prior should be that if you have an idea, and it’s not being tried, probably the idea is bad.

A key thing here is to be able to accurately judge whether the idea would be harmful if tried. “Prior is bad idea” != “EV is negative.” If the idea is a random research direction, it probably won’t hurt anyone if you try it. On the other hand, certain kinds of community coordination attempts deplete a common resource and interfere with other attempts, so the fact that no one else is acting is a reason to hesitate.
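The “prior is bad idea != EV is negative” point can be made concrete with a toy expected-value calculation (all numbers here are illustrative assumptions, not from the comment):

```python
# Toy EV sketch: even if an idea is probably bad, trying it can have
# positive expected value when failure is cheap and success is valuable.
p_good = 0.1          # assumed prior that the idea is actually good
value_if_good = 100   # assumed payoff if it works
cost_if_bad = 1       # a harmless failed attempt (e.g. some wasted time)

ev_harmless = p_good * value_if_good - (1 - p_good) * cost_if_bad
print(ev_harmless)    # positive: worth trying despite the unfavorable prior

# But if failure depletes a common resource (e.g. burned coordination
# capital), the same prior can flip the EV negative.
cost_if_bad_commons = 20
ev_commons = p_good * value_if_good - (1 - p_good) * cost_if_bad_commons
print(ev_commons)     # negative: the prior now argues for hesitating
```

The asymmetry between the two cases, not the prior itself, is what does the work here.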
Going to people who you think maybe ought to be acting and asking them why they’re not doing a thing is probably something that should be encouraged and welcomed? I expect that in most cases the answer will be “lack of time” rather than anything more substantial.
In terms of thinking about why solutions haven’t been attempted, I’ll plug Inadequate Equilibria, though it probably provides a better explanation for why problems in the broader world haven’t been addressed. I don’t think the EA world is yet in an equilibrium, so things don’t get done because {it’s genuinely a bad idea, it’s the kind of thing you shouldn’t do unilaterally and no one has built consensus, sheer lack of time}.