If you think of thinking as generating a set of a priori datapoints (your thoughts) and then trying to find a model that fits those data, we can use this framing to classify some overthinking failure modes. These classes may overlap somewhat.
You overfit your model to the datapoints because you underestimate their variance (regularization failure).
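As a toy illustration, here's a minimal numpy sketch (the polynomial degree, noise level, and ridge penalty are all arbitrary choices for illustration, not anything from the original argument): an overly flexible model fit to a few noisy points chases the noise, while a penalty that encodes "these datapoints are noisy" keeps it honest.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few noisy observations of a simple underlying trend ("thoughts" about a thing)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# Design matrix for a degree-9 polynomial: flexible enough to chase noise
X = np.vander(x, 10)

# Unregularized least squares: treats every datapoint as near-exact
w_overfit = np.linalg.lstsq(X, y, rcond=None)[0]

# Ridge regression: the penalty encodes an expectation of noise in the data
lam = 1e-2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)

x_dense = np.linspace(0, 1, 200)
X_dense = np.vander(x_dense, 10)
# The unregularized fit typically oscillates wildly between training points;
# the ridge fit stays close to the underlying sine.
print("max |overfit prediction|:", np.abs(X_dense @ w_overfit).max())
print("max |ridge prediction|:  ", np.abs(X_dense @ w_ridge).max())
```

Underestimating the variance of your thoughts is like setting lam to zero: the model contorts itself to honor every datapoint exactly.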
Your datapoints may not correspond well to the real-world thing you're trying to optimize, so by underestimating their bias you may make your model less generalizable outside the training distribution (distribution mismatch).
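Again as a hedged sketch (the exponential ground truth and the train/deploy ranges are invented assumptions, purely illustrative): a model fit to datapoints drawn from a narrow slice of reality can look excellent in-sample and fail badly where it is actually used.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Training distribution": the narrow range your thoughts actually cover
x_train = rng.uniform(0.0, 1.0, 200)
# The real relationship is nonlinear, but looks almost linear up close
y_train = np.exp(x_train) + rng.normal(scale=0.05, size=x_train.size)

# Fit a line: a biased model of the true curve
A = np.column_stack([x_train, np.ones_like(x_train)])
slope, intercept = np.linalg.lstsq(A, y_train, rcond=None)[0]

def mse(x, y):
    return np.mean((slope * x + intercept - y) ** 2)

# "Deployment": the thing-in-itself, sampled outside the range you thought about
x_real = rng.uniform(2.0, 3.0, 200)
y_real = np.exp(x_real)

print("error where you gathered thoughts:", mse(x_train, y_train))  # small
print("error out in the world:          ", mse(x_real, y_real))     # large
```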
If you over-update on each new datapoint because you underestimate the breadth of the landscape (your a priori datapoints about a thing may cover a very limited distribution compared to the thing-in-itself), you may prematurely descend into a local optimum (greediness failure).
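One more toy sketch (the loss landscape, noise scale, and step size are all invented for illustration): fully trusting each noisy gradient sample commits you to whichever basin you happen to start in, even when a much deeper optimum exists elsewhere in the landscape.

```python
import numpy as np

rng = np.random.default_rng(2)

def loss(theta):
    # Two basins: a shallow local minimum near theta = -0.4,
    # a much deeper one near theta = 3.7
    return np.sin(3 * theta) + 0.1 * (theta - 4) ** 2

def grad(theta):
    # Exact derivative of the loss above
    return 3 * np.cos(3 * theta) + 0.2 * (theta - 4)

theta = 0.0
for _ in range(100):
    # Each "thought" is a noisy sample of the gradient
    g = grad(theta) + rng.normal(scale=0.5)
    # Fully trusting each sample drives you greedily downhill,
    # into the nearest basin rather than the best one
    theta -= 0.1 * g

print("settled at theta =", theta, "with loss =", loss(theta))
# The global minimum of this landscape sits near theta = 3.7,
# far from where greedy descent from theta = 0 ends up.
```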