That’s a good question: we don’t have a practical AGI to poke at, so why do we expect that we can do work today that’s likely to be relevant many years down the line?
I’ll answer in part with an analogy: Say you went back in time and dropped by to visit Kolmogorov back when he was trying to formalize probability theory, and you asked “working without concrete feedback, how are you planning to increase the chance that your probability theory will be relevant to people trying to reason probabilistically in the future?” It seems like the best response is for him to sort of cock his head and say “well, uh, I’m still trying to formalize what I mean by ‘chance’ and ‘probability’ and so on; once we’ve got those things ironed out, then we can chat.”
Similarly, we’re still trying to formalize the theory of advanced agents: right now, if you handed me unlimited computing power, I wouldn’t know how to program it to reliably and “intelligently” pursue a known goal, even a very simple goal, such as “produce as much diamond as possible.” There are parts of the problem of designing highly reliable advanced agents that we don’t understand even in principle yet. We don’t even know how to brute force the solution yet. We’re still trying to formalize the problems :-)
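To make that concrete, here’s a rough sketch (entirely my own illustration; the names like `predicted_outcome` and `amount_of_diamond` are made up) of the shape a “brute force” diamond maximizer would have to take. Even with unlimited computing power, the enumeration step is trivial; the pieces we don’t know how to write down are the world model and the goal predicate, which is roughly the sense in which the problem isn’t yet formalized.

```python
# A minimal sketch (illustrative only, not anything from an actual paper) of a
# "brute-force" diamond maximizer, assuming unlimited computing power.  The
# enumeration is easy; the unsolved parts are the pieces marked
# NotImplementedError below.

from itertools import product

ACTIONS = ["a0", "a1"]   # hypothetical primitive actions
HORIZON = 10             # hypothetical planning horizon

def predicted_outcome(action_sequence):
    """A predictive model of the world given the agent's actions.
    We don't know how to specify this formally, even given unlimited compute."""
    raise NotImplementedError("no formal world model")

def amount_of_diamond(outcome):
    """How much diamond a predicted outcome contains.  Pinning down 'diamond'
    in terms of whatever ontology the world model uses is itself an open problem."""
    raise NotImplementedError("no formal goal predicate")

def best_plan():
    # Unlimited computing power lets us enumerate every action sequence and
    # pick the one whose predicted outcome contains the most diamond...
    return max(
        product(ACTIONS, repeat=HORIZON),
        key=lambda plan: amount_of_diamond(predicted_outcome(plan)),
    )
```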
Also, note that working on theory doesn’t mean you can’t get feedback: we make various mathematical models that attempt to capture part of the problem, we investigate their behavior, we see which parts of the problems they do and don’t capture, and so on. (For example: Stuart Armstrong came up with a formal definition of a utility-indifferent agent; Benja responded by identifying a way Stuart’s agent succumbs to blackmail. I think this counts as pretty concrete feedback: it doesn’t get that much more concrete than “your idea provably doesn’t work”!)
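For readers who haven’t seen it, here’s a loose sketch of the flavor of a utility-indifference construction (my own paraphrase for illustration, not Armstrong’s actual definition and not the exact variant Benja analyzed): when a shutdown button is pressed, the agent’s utility switches to a shutdown utility plus a compensating term chosen so that, in expectation, the agent does equally well either way, and so (the hope goes) has no incentive to prevent or cause the button press.

```python
# A toy sketch of a utility-indifference-style construction.  Everything here
# (the toy utilities, the "button", the compensation term) is illustrative only.

def U_N(outcome):
    """Stand-in for the agent's normal utility, e.g. diamonds produced."""
    return outcome.get("diamonds", 0.0)

def U_S(outcome):
    """Stand-in for the shutdown utility, e.g. reward for halting cleanly."""
    return 1.0 if outcome.get("halted") else 0.0

def indifferent_utility(outcome, button_pressed, expected_U_N, expected_U_S):
    """Modified utility: behaves like U_N normally; if the button is pressed,
    switches to U_S plus a compensation term (the difference between the agent's
    current expectations of U_N and U_S), so that pressing the button neither
    raises nor lowers the agent's expected utility."""
    if not button_pressed:
        return U_N(outcome)
    return U_S(outcome) + (expected_U_N - expected_U_S)
```

The compensation term is what makes the agent “indifferent”; discovering that an agent defined along these lines can still be manipulated (e.g., via blackmail) is the kind of concrete negative result the paragraph above is pointing at.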
As for relevance, there are definitely paths where this sort of work wouldn’t end up being relevant (jumping straight to whole-brain emulation, jumping straight to nanotech, etc.), but I currently don’t think those scenarios are all that likely. Other scenarios where this work wouldn’t end up mattering include (a) we needed the theory, but didn’t complete it in time, (b) it turns out you can build a safe AGI without understanding why it works, even in theory, and (c) someone else got to the theory first. I’m trying to avoid (a), (b) doesn’t seem likely enough to bet the universe on, and I’d count (c) as a win :-)