The ideal MIRI researcher is someone who’s able to think about thorny philosophical problems and break off parts of them to formalize mathematically. In the case of logical uncertainty, researchers started by thinking about the initially vague problem of reasoning well about uncertain mathematical statements, turned some of these thoughts into formal desiderata and algorithms (producing intermediate possibility and impossibility results), and eventually found a way to satisfy many of these desiderata at once. We’d like to do a lot more of this kind of work in the future.
Probably the main difference between MIRI research and typical AI research is that we focus on problems of the form “if we had capability X, how would we achieve outcome Y?” rather than “how can we build a practical system achieving outcome Y?”. We focus less on computational tractability and more on the philosophical question of how we would build a system to achieve Y in principle, given e.g. unlimited computing resources or access to extremely powerful machine learning systems. I don’t think we have much special knowledge that others don’t have (or vice versa), given that most relevant AI research is public; it’s more that we have a different research focus that will lead us to ask different questions. Of course, our different research focus is motivated by our philosophy about AI, and we have significant philosophical differences with most AI researchers (which isn’t actually saying much given how much philosophical diversity there is in the field of AI).
Work in the field of AI can inform us about what approaches are most promising (e.g., the theoretical questions in the “Alignment for Advanced Machine Learning Systems” agenda are of more interest if variants of deep learning are sufficient to achieve AGI), and can directly provide useful theoretical tools (e.g., in the field of statistical learning theory). Typically, we will want to get a high-level view of what the field is doing and otherwise focus mainly on the more theoretical work relevant to our research interests.
We definitely need some way of dealing with the fact that we don’t know which AI paradigm(s) will be the foundation of the first AGI systems. One strategy is to come up with abstractions that work across AI paradigms; we can ask the question “if we had access to extremely powerful reinforcement learning systems, how would we use them to safely achieve some concrete objective in the world?” without knowing how these reinforcement learning systems work internally. A second strategy is to prioritize work related to types of AI systems that seem more promising (deep learning seems more promising than symbolic GOFAI at the moment, for example). A third strategy is to do what people sometimes do when coming up with new AI paradigms: think about how good reasoning works, formalize some of these aspects, and design algorithms performing good reasoning according to these desiderata. In thinking about AI alignment, we apply all three of these strategies.
Here’s a third resolution. Consider a utility function that is a weighted sum of:
1. how close a region’s population level is to the “ideal” population level for that region (i.e., not underpopulated or overpopulated)
2. the average utility of individuals in this region (not of observer-moments in this region)
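The two parts above can be sketched in code. This is a minimal illustration, not anything specified in the original: the weights, the particular penalty for deviating from the ideal population level, and all names are made-up assumptions (the text only says “a weighted sum” of the two parts).

```python
# Illustrative sketch of the proposed utility function. The weights and
# the specific closeness penalty are made-up assumptions.

def region_utility(lifespans, ideal_population, w_pop=1.0, w_avg=1.0):
    """Weighted sum of:
    1. closeness of the population level to the "ideal" level
    2. average utility of individuals (here, utility = length of life)
    """
    population = len(lifespans)
    closeness = -abs(population - ideal_population)  # 0 when exactly ideal
    avg_utility = sum(lifespans) / population
    return w_pop * closeness + w_avg * avg_utility
```

When the population sits at its ideal level, part 1 contributes nothing and only the average-utility term matters; that is the regime the malaria examples implicitly operate in.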
AMF is replacing lots of short (therefore low-utility) lives with fewer long (therefore higher-utility) lives, without affecting the population level much. The effect of this could be summarized as “35 DALYs”, as in “we increased the average lifespan by (35 DALYs) / (total population)”.
(Warning: made-up numbers follow.) Suppose we make someone live for 40 years instead of 5 years by curing their malaria. This reduces the fertility rate; let’s say one fewer 35-year life happens as a result. This has no effect on the average population level over time (part 1): the cured person is around for 35 extra years, while the prevented life would have lasted 35 years. We’ve replaced a 5-year life plus a 35-year life with a single 40-year life. If average lives in the region are 35 years long (and we’re pretending that life utility = length of life), then most of the effect on part 2 of the utility function comes from preventing a worse-than-average life (the 5-year one) from happening.
Suppose instead that we extend someone’s life from 40 years to 75 years (a gain of 35 DALYs). This also reduces the fertility rate; let’s pretend it prevents one 35-year life from happening. So we’re replacing a 40-year life plus a 35-year life with a single 75-year life. From the perspective of part 2 of the utility function, this is exactly as good as curing a case of malaria: each intervention adds 35 life-years and prevents one 35-year life. So it seems like you can evaluate life-extending interventions in DALYs pretty naively and things work out (both 35-DALY improvements are equally good under the utility function).
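A quick numerical check of this claim, again with made-up numbers. Both interventions are applied to the same hypothetical baseline region, holding part 1 fixed (each adds 35 life-years and prevents exactly one 35-year life, so headcount and person-years change identically):

```python
def average_lifespan(lifespans):
    return sum(lifespans) / len(lifespans)

# Made-up baseline region: mostly 35-year lives, plus the malaria victim
# (5 years), the person whose life could be extended (40 years), and the
# marginal future lives that either intervention would prevent.
baseline = [35] * 996 + [5, 40, 35, 35]

# Intervention A: cure malaria (5 -> 40 years); one 35-year life is
# prevented via the reduced fertility rate.
world_a = [35] * 996 + [40, 40, 35]

# Intervention B: extend a 40-year life to 75 years; again one 35-year
# life is prevented.
world_b = [35] * 996 + [5, 75, 35]

gain_a = average_lifespan(world_a) - average_lifespan(baseline)
gain_b = average_lifespan(world_b) - average_lifespan(baseline)
# The gains are identical: each world has the same total life-years and
# the same headcount, so part 2 of the utility function can't tell the
# two 35-DALY interventions apart.
```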