One of the major limitations of using existing LLMs is their unreliability. No important processes can currently be trusted to LLMs, because we have very little understanding of how they work, limited knowledge of the limits of their capabilities, and a poor understanding of how and when they fail.
I don’t disagree with this, but I think it’s very likely to stop being true in practice as the technology is commercialized. It won’t be perfect, but the current generation of tweaks already pushes reliability to at least three or four nines (99.9–99.99%) in non-adversarial settings, which seems sufficient for many applications and provides a foundation for further work on making it even more reliable. Moreover, the success or failure of business applications will show whether this holds within the coming year, well before we reach GPT-5+.