As an analogy: consider a variant of rock paper scissors where you get to see your opponent’s move in advance—but it’s encrypted with RSA. In some sense this game is much harder than proving Fermat’s last theorem, since playing optimally requires breaking the encryption scheme. But if you train a policy and find that it wins 33% of the time at encrypted rock paper scissors, it’s not super meaningful or interesting to say that the task is super hard, and in the relevant intuitive sense it’s an easier task than proving Fermat’s last theorem.
As an analogy: consider a variant of rock paper scissors where you get to see your opponent’s move in advance—but it’s encrypted with RSA. In some sense this game is much harder than proving Fermat’s last theorem, since playing optimally requires breaking the encryption scheme. But if you train a policy and find that it wins 33% of the time at encrypted rock paper scissors, it’s not super meaningful or interesting to say that the task is super hard, and in the relevant intuitive sense it’s an easier task than proving Fermat’s last theorem.