I was in the process of writing a comment trying to debunk this. My counterexample didn’t work, so now I’m convinced this is a pretty good post. This is a nice way of thinking about ITN quantitatively.
The counterexample I was trying to make might still be interesting for some people to read as an illustration of this phenomenon. Here it is:
Scale “all humans” trying to solve “all problems” down to “a single high school student” trying to solve “math problems”. Then tractability (measured as % of the problem solved per % increase in resources) for this person on different kinds of math problems is as follows:
A very large arithmetic question like “find 123456789123456789^2 by hand” requires ~10 hours to solve
A median international math olympiad question probably requires ~100 hours of studying to solve
A median research question requires an undergraduate degree (~2000 hours) and then specialized studying (~1000 hours) to solve
A really tough research question takes a decade of work (~20,000 hours) to solve
A way-ahead-of-its-time research question (think, maybe, developing ML theory results before there were even computers) I could see taking 100,000+ hours of work
Here tractability varies by 4 orders of magnitude (10-100,000 hours) if you include all kinds of math problems. If you exclude very easy or very hard things (as Thomas was describing), you end up with 2 orders of magnitude (~1,000-100,000 hours).
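As a quick sanity check on those ranges, here is a tiny sketch. It assumes tractability scales as one over the hours required (so the spread in hours is the spread in tractability) and just plugs in the illustrative hour figures from the list:

```python
import math

# Illustrative hour estimates from the list above (the median research
# question is taken as ~2,000 undergrad + ~1,000 specialized = ~3,000 hours).
hours_all = [10, 100, 3_000, 20_000, 100_000]
hours_mid = [1_000, 100_000]  # the ~1,000-100,000 range once extremes are dropped

def orders_of_magnitude(xs):
    """Spread between the easiest and hardest problem, in powers of ten."""
    return math.log10(max(xs) / min(xs))

print(orders_of_magnitude(hours_all))  # 4.0 -> ~4 orders of magnitude
print(orders_of_magnitude(hours_mid))  # 2.0 -> ~2 orders of magnitude
```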
We’ve had several researchers working on technical AI alignment for multiple years, and there’s still no consensus on a solution, although some might think some systems are less risky than others and that we’ve made progress on those. Say 20 researchers working 20 hours a week, 50 weeks a year, for 5 years. That’s 20 * 20 * 50 * 5 = 100,000 hours of work. I think the number of researchers is much larger now. This also excludes a lot of the background studying, which would be duplicated across researchers.
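A small sketch of that person-hours estimate (the baseline figures are the ones above; the 300-researcher case is just a hypothetical to show how the total scales with a larger field):

```python
# Rough person-hours estimate from the figures above. The 300-researcher case
# is a hypothetical illustration of a much larger field, not an actual count.
def research_hours(researchers, hours_per_week=20, weeks_per_year=50, years=5):
    return researchers * hours_per_week * weeks_per_year * years

print(research_hours(20))   # 100_000 person-hours, matching the estimate above
print(research_hours(300))  # 1_500_000 person-hours for a hypothetical larger field
```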
Maybe AI alignment is not “one problem”, and it’s not exactly rigorously posed yet (it’s pre-paradigmatic), but those are also reasons to think it’s especially hard. Technical AI alignment has required building a new field of research, not just using existing tools.
(posting this so ideas from our chat can be public)
Ex ante, the tractability range is narrower than 2 orders of magnitude unless you have really strong evidence. Say you’re a high school student presented with a problem of unknown difficulty, and you’ve already spent 100 hours on it without success. What’s the probability that you solve it in the next doubling, i.e., the next 100 hours? (A rough numerical sketch follows the list below.)
Obviously less than 100%
Probably more than 1%, even if it looks really hard—you might find some trick that solves it!
And you have to have a pretty strong indication that it’s hard (e.g. using concepts you’ve tried and failed to understand) to even put your probability below 3%.
There can be evidence that it’s really hard (<0.1%), maybe for problems like “compute tan(10^123) to 9 decimal places” or “solve this problem that John Conway failed to solve”. This means you’ve updated away from your ignorance prior (which spans many orders of magnitude) and now know the true structure of the problem, or something.
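For what it’s worth, here is a minimal numerical sketch of that intuition. It assumes a log-uniform ignorance prior over the total hours T the problem needs, with the 10-100,000 hour bounds borrowed from the math-problem list above purely as an illustration (they’re my assumption, not something argued for here):

```python
import math

# Log-uniform "ignorance prior" over the total hours T a problem needs.
# The [10, 100_000] bounds are an illustrative assumption borrowed from the
# earlier math-problem list.
LO, HI = 10.0, 100_000.0

def p_solve_next_doubling(spent, lo=LO, hi=HI):
    """P(T <= 2*spent | T > spent) under a log-uniform prior on [lo, hi]."""
    floor = max(spent, lo)
    ceiling = min(2 * spent, hi)
    return (math.log(ceiling) - math.log(floor)) / (math.log(hi) - math.log(floor))

print(p_solve_next_doubling(100))     # ~0.10: well above the 1% floor
print(p_solve_next_doubling(10_000))  # ~0.30: the remaining range is narrower
```

Under this (admittedly crude) prior, the chance of solving in the next doubling after 100 fruitless hours is around 10%, which is why pushing the estimate below 1-3% seems to require genuinely strong evidence about the problem’s structure.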
If I’ve spent 100 hours on a (math?) problem without success as a high school student and can’t get hints or learn new material (or have already tried those and failed), then I don’t think assigning less than 1% to solving it in the next 100 hours is unreasonable. I’d probably already have exhausted all the tools I know of by then. Of course, this depends on the person.
The time and resources you (or others) spent on a problem without success (or substantial progress) are evidence for its intractability.
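A small sketch of that Bayesian point, reusing the same log-uniform prior as the snippet above (the 10-100,000 hour bounds remain an illustrative assumption): the longer you’ve worked without success, the more of the remaining probability mass sits on the “very hard” end.

```python
import math

def p_harder_than(threshold, hours_spent, lo=10.0, hi=100_000.0):
    """P(total hours T > threshold | T > hours_spent), log-uniform prior on [lo, hi]."""
    floor = max(hours_spent, lo)
    return (math.log(hi) - math.log(threshold)) / (math.log(hi) - math.log(floor))

# Probability the problem needs more than 10,000 hours in total:
print(p_harder_than(10_000, hours_spent=0))      # prior: 0.25
print(p_harder_than(10_000, hours_spent=100))    # after 100 fruitless hours: ~0.33
print(p_harder_than(10_000, hours_spent=1_000))  # after 1,000 fruitless hours: 0.50
```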
I was going to come back to this and write a comment saying whether I agree or disagree and why, but I keep flipping back and forth.
I now think there are some classes of problems for which I could easily get under 1%, and some for which I couldn’t, and this partially depends on whether I can learn new material (if I can, I think I’d need to exhaust every promising-looking paper). The question is which is the better reference class for real problems.
You could argue that not learning new material is the better model, because we can’t get external help in real life. But on the other hand, the large action space of real life feels to me more like a situation in which we can learn new material: the intuition that the high school student will just “get stuck” seems less strong when, say, an entire academic subfield is working on alignment.