Ah, well explained, thank you. Yes, I now agree that you can, in theory, improve towards a limit without that limit being a local maximum. That said, I’m unsure whether the procedure could end up being equivalent in practice to local maximisation of a modified goal function (say, one that penalises going above “reward + 1” with an exponential cost). Maybe something to think about going forward.
Thanks for answering the questions, best of luck with the endeavour!