So, as I understand it, what MIRI is doing now is thinking about theoretical issues and strategies and writing papers about them, in the hope that the theory you develop can be made use of by others?
Does MIRI ever think of:
Developing AI yourselves at some point?
Creating a goal-alignment/safety framework to be used by people developing AGI? (Where e.g. reinforcement learners or other AI components can be “plugged in”, but in some sense are abstracted away.)
Also (feel free to skip this part of the question if it is too big/demanding):
Personally, I have a goal of advancing the field of computer-assisted proofs by making them more automated and by making the process of constructing them more user-friendly. The system would be made available through a website where people can construct proofs and see the proofs, but the components of the system would also be made available for use elsewhere. One of the goals would be to make it possible and practical to construct claims that read as natural language and are built from natural-language components, but that also have an unambiguous logical notation (probably in Martin-Löf type theory). The hope would be that this could be used for rigorous proofs about self-improving AI, and that the technologies/code base developed and the vocabulary/definitions/claims/proofs in the system could be of use for a goal-alignment/safety framework.
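As a rough illustration of the kind of pairing I have in mind, here is a minimal, hypothetical sketch in Lean 4 syntax (standing in for a Martin-Löf-style system); the names isEven and four_is_even are invented purely for this example:

    -- Formal vocabulary entry: the underlying, unambiguous definition.
    def isEven (n : Nat) : Prop := ∃ k, n = 2 * k

    -- A claim whose intended surface reading is "four is even",
    -- but whose meaning is fixed entirely by the definition above.
    theorem four_is_even : isEven 4 := ⟨2, rfl⟩

The idea would be for the website to show the reader something close to “four is even”, while the claim and proof that the system actually checks stay fully formal.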
(Anyone reading this who is interested in hearing more can get in touch with me, and/or take a look at this document:
https://docs.google.com/document/d/1GTTFO7RgEAJxy8HRUprCIKZYpmF4KJiVAGRHXF_Sa70/edit)
If I got across what it is that I’m hoping to make, does it sound like this could be useful to the field of AI safety / goal alignment? Or are you unsure? Or does it seem like my understanding of what the field needs is flawed to some degree, and that my efforts would in all probability be better spent elsewhere?
Kinda. The current approach is more like “Pretend you’re trying to solve a much easier version of the problem, e.g. where you have a ton of computing power and you’re trying to maximize diamond instead of hard-to-describe values. What parts of the problem would you still not know how to solve? Try to figure out how to solve those first.”
(1) If we manage to (a) generate a theory of advanced agents under many simplifying assumptions, and then (b) generate a theory of bounded rational agents under far fewer simplifying assumptions, and then (c) figure out how to make highly reliable practical generally intelligent systems, all before anyone else gets remotely close to AGI, then we might consider teching up towards designing AI systems ourselves. I currently find this scenario unlikely.
(2) We’re currently far enough away from knowing what the actual architectures will look like that I don’t think it’s useful to try to build AI components intended for use in an actual AGI at this juncture.
(3) I think that making theorem provers easier to use is an important task and a worthy goal. I’m not optimistic about attempts to merge natural language with Martin-Löf type theory. If you’re interested in improving theorem-proving tools in ways that might make it easier to design safe reflective systems in the future, I’d point you more towards trying to implement (e.g.) Marcello’s Waterfall in a dependently typed language (which may well involve occasionally patching the language, at this stage).
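To give a flavour of what a dependently typed setting buys you here (this is only a toy sketch in Lean 4 syntax, not Marcello’s Waterfall itself, and every name in it is invented for illustration), the basic move is to represent an object-level proof system inside the prover and then verify meta-level facts about it:

    -- Object-level formulas: a truth constant and conjunction.
    inductive Form where
      | tt   : Form
      | conj : Form → Form → Form

    -- A derivability relation for the toy object-level system.
    inductive Deriv : Form → Prop where
      | ax   : Deriv Form.tt
      | andI (a b : Form) (da : Deriv a) (db : Deriv b) : Deriv (Form.conj a b)

    -- Intended meaning of each formula as a proposition.
    def eval : Form → Prop
      | Form.tt       => True
      | Form.conj a b => eval a ∧ eval b

    -- Meta-level soundness: anything derivable is true under eval.
    theorem soundness (f : Form) (h : Deriv f) : eval f := by
      induction h with
      | ax => exact True.intro
      -- the last two binders are the induction hypotheses for the subderivations
      | andI _ _ _ _ iha ihb => exact ⟨iha, ihb⟩

Statements like soundness become first-class objects in the language, which is the kind of guarantee you would eventually want to state and check about pieces of a reflective system.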