I’m generally skeptical of arguments of the form “we probably have a bias in favour of X, so we should do less X” without an underlying model that lets you understand why you should deprioritise it. It’s like when you walk with a limp in one leg and decide to start limping with your other leg to balance it out, instead of figuring out how to heal your leg in the first place (h/t Eliezer for this metaphor). Moves like that are shortsighted, and don’t take us to a greater theoretical understanding of how to walk faster.
If the reason you’re biased in favour of X (interesting) is that you don’t intuitively care about Y (impactful) enough, then the solution is to figure out how to intuitively care more about Y. This can involve trying therapy, or it can involve growing a community that helps shape your intuitions to be more in line with what you reflectively care about.
“longer-loop work is more visible, and higher status”
Well, depends. I think legible, technical, concrete, results-producing work is sometimes overweighted because it often looks impressive and more like “doing actual work”. Whereas I think backchaining is incredibly important, and working on nodes far away from our current technical frontier is almost always going to look wishy-washy and fail to produce concrete results. Unfortunately, what you call “longer-loop” work is often hard to verify, so there will be unaligned people just having fun for fun’s sake, but that’s not an indictment of the activity itself, or of work that just looks superficially similar.
As a deconfusion researcher, and as part of a notoriously un-paradigmatic field lacking a clear formalization, I feel like that regularly. Where I come from, math just looks more like research and actually doing work, instead of simply talking about stuff. And there is definitely a lot of value in formalization and its use to unveil confusing parts of what we’re investigating. -- Adam Shimi
In conclusion, we have a lot of biases in all kinds of directions, and we should be wary of Goodharting on them. But the way to do that is to learn to see and understand those biases so we can optimise more purely. The solution is not to artificially add a constant weight in the opposite direction of whatever biases we happen to notice.
This seems exactly right, and worth thinking much more about—thanks!
Oi, I didn’t expect that response. Nice of you to say. Thank you!