Interesting suggestion! Continuous or pseudo-continuous threshold raising isn’t something I considered. Here are some quick thoughts:
- Continuous scaling could make eval validity easier to maintain, because the jump between eval-train (n-1) and eval-deploy (n) is smaller.
- Continuous scaling encourages rushed training, because you want to get your model launched before it is outdated.
- Continuous scaling means giving up on models being evaluated side-by-side.