I think we need a name for the following heuristic. I think of it as one of those “tribal knowledge” things that gets passed on like an oral tradition without being citeable, in the sense of being part of a literature. If you come up with a name, I’ll certainly credit you in a top-level post!
I heard it from Abram Demski at AISU′21.
Suppose you’re going to end up in either world A or world B, and you’re uncertain about which one it’s going to be. Suppose you can pull lever LA, which will be worth 100 if you end up in world A, or you can pull lever LB, which will be worth 100 if you end up in world B. The heuristic is that if you pull LA but end up in world B, you do not want to have created disvalue. In other words, your intervention conditional on the belief that you’ll end up in world A should not screw you over in timelines where you end up in world B.
This can be fully mathematized by saying: “if most of your probability mass is on ending up in world A, then obviously you’d pick a lever L such that V(L|A) is very high; just also make sure that V(L|B) >= 0, or that L creates only an acceptably small amount of disvalue in world B,” where V(L|A) is read “the value of pulling lever L if you end up in world A.”
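Here is a minimal sketch of that rule as a constrained choice, in case it helps anyone trying to name it. Everything specific in it is my own assumption for illustration: the lever names, the payoff numbers, the function `pick_lever`, and the tolerance parameter `max_disvalue` (standing in for "an acceptably small amount of disvalue").

```python
from typing import Dict, Optional

# Hypothetical payoff table: V[lever][world] is the value of pulling
# `lever` if you end up in `world`. All numbers are made up for illustration.
V: Dict[str, Dict[str, float]] = {
    "LA_reckless": {"A": 100.0, "B": -80.0},
    "LA_careful": {"A": 90.0, "B": -1.0},
    "LB": {"A": 0.0, "B": 100.0},
}


def pick_lever(
    values: Dict[str, Dict[str, float]],
    likely_world: str,
    other_world: str,
    max_disvalue: float = 0.0,
) -> Optional[str]:
    """Maximize V(L | likely_world) subject to V(L | other_world) >= -max_disvalue.

    Returns None if no lever satisfies the constraint.
    """
    # Keep only levers that do not create too much disvalue in the other world.
    safe = {
        name: payoff
        for name, payoff in values.items()
        if payoff[other_world] >= -max_disvalue
    }
    if not safe:
        return None
    # Among the safe levers, take the one that is best in the likely world.
    return max(safe, key=lambda name: safe[name][likely_world])
```

With a tolerance of 5, `pick_lever(V, "A", "B", max_disvalue=5.0)` returns `"LA_careful"` rather than `"LA_reckless"`, even though the reckless lever has the higher V(L|A). With a tolerance of 0, only `"LB"` passes the constraint in this made-up table.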