One thing always puzzles me about provable AI. If we are able to prove that an AI will do X and only X after unlimitedly many generations of self-improvement, it is still not clear how to choose the right X.
For example, we could be sure that a paperclip maximizer will still make paperclips after a billion generations.
So my question is: what are we proving about provable AI?
As Tsvi mentioned, and as Luke has talked about before, we’re not really researching “provable AI”. (I’m not even quite sure what that term would mean.) We are trying to push towards AI systems where the way they reason is principled and understandable. We suspect that that will involve having a good understanding ourselves of how the system performs its reasoning, and when we study different types of reasoning systems we sometimes build models of systems that are trying to prove things as part of how they reason; but that’s very different from trying to make an AI that is “provably X” for some value of X. I personally doubt AGI teams will be able to literally prove anything substantial about how well the system will work in practice, though I expect that they will be able to get some decent statistical guarantees.
There are some big difficulties related to the problem of choosing the right objective to optimize, but currently, that’s not where my biggest concerns are. I’m much more concerned with scenarios where AI scientists figure out how to build misaligned AGI systems well before they figure out how to build aligned AGI systems, as that would be a dangerous regime. My top priority is making it the case that the first AGI designs humanity develops are the kinds of system it’s technologically possible to align with operator intentions in practice. (I’ll write more on this subject later.)
Thanks! Could you link to where you will write about this subject later?
I’m not exactly sure what venue it will show up in, but it will very likely be mentioned on the MIRI blog (or perhaps just posted there directly). intelligence.org/blog.