I suspect it’s worth forming an explicit model of how much work “should” be understandable by what kinds of parties at what stage in scientific research.
To summarize my own take:
It seems to me that research moves down a pathway from (1) “totally inarticulate glimmer in the mind of a single researcher” to (2) “half-verbal intuition one can share with a few officemates, or others with very similar prejudices” to (3) “thingy that many in a field bother to read, and most find somewhat interesting, but that there’s still no agreement about the value of” to (4) “clear, explicitly statable work whose value is universally recognized within its field”. (At each stage, a good chunk of work falls away as a mirage.)
In “The Structure of Scientific Revolutions”, Thomas Kuhn argues that fields begin in a “preparadigm” state in which nobody’s work gets past (3). (He gives a bunch of historical examples that seem to meet this pattern.)
Kuhn’s claim seems right to me, and AI Safety work seems to me to be in a “preparadigm” state in that there is no work past stage (3) now. (Paul’s work is perhaps closest, but there are still important unknowns and disagreements about foundations, whether it’ll work out, etc.)
It seems to me one needs epistemic humility more in a preparadigm state, because, in such states, the correct perspective is in an important sense just not discovered yet. One has guesses, but the guesses cannot yet be held in common as established knowledge.
It also seems to me that the work of getting from (3) to (4) (or from (1) or (2) to (3), for that matter) is hard, that moving along this spectrum requires technical research (it basically is a core research activity), and one shouldn’t be surprised if it sometimes takes years—even in cases where the research is good. (This seems to me to also be true in e.g. math departments, but to be extra hard in preparadigm fields.)
(Disclaimer: I’m on the MIRI board, and I worked at MIRI from 2008-2012, but I’m speaking only for myself here.)
Relatedly, it seems to me that in general, preparadigm fields probably develop faster if:
Different research approaches can compete freely for researchers (e.g., if researchers have secure, institution-independent funding, and can work on whatever approach pleases them). (The reason: there is a strong relationship between what problems can grab a researcher’s interest, and what problems may go somewhere. Also, researchers are exactly the people who have leisure to form a detailed view of the field and what may work. cf also the role of play in research progress.)
The researchers themselves feel secure, and do not need to optimize their work for “what others will evaluate as useful enough to keep paying me”. (Since such evaluations are unreliable in preparadigm fields, and since one wants to maximize the odds that the right approach is tried. This security may well increase the amount of non-productivity in the median case, but it should also increase the usefulness of the tails. And the tails are where most of the value is.)
Different research approaches somehow do not need to compete for funding, PR, etc., except via researchers’ choices as to where to engage. There are no organized attempts to use social pressure or similar to override researchers’ intuitions as to where it may be fruitful to engage (nor to override research institutions’ choices of what programs to enable, except via the researchers’ interests). (Funders’ intuitions seem less likely to be detailed than are the intuitions of the researcher-on-that-specific-problem; attempts to be clear/explainable/respectable are less likely to pull in good directions.)
The pool of researchers includes varied good folks with intuitions formed in multiple fields (e.g., folks trained in physics; other folks trained in math; other folks trained in AI; some unusually bright folks just out of undergrad with less-developed disciplinary prejudices), to reduce the odds of monoculture.
(Disclaimer: I’m on the MIRI board, and I worked at MIRI from 2008-2012, but I’m speaking only for myself here.)
I generally agree with both of these comments. I think they’re valuable points which express more clearly than I did some of what I was getting at with wanting a variety of approaches and thinking I should have some epistemic humility.
One point where I think I disagree:
attempts to be clear/explainable/respectable are less likely to pull in good directions.
I don’t want to defend pulls towards being respectable, and I’m not sure about pulls towards being explainable, but I think that attempts to be clear are extremely valuable and likely to improve work. I think that clarity is a useful thing to achieve, as it helps others to recognise the value in what you’re doing and build on the ideas where appropriate (I imagine that you agree with this part).
I also think that putting a decent fraction of total effort into aiming for clarity is likely to improve research directions. This is based on research experience -- I think that putting work into trying to explain things very clearly is hard and often a bit aversive (because it can take you from an internal sense of “I understand all of this” to a realisation that actually you don’t). But I also think it’s useful for making progress purely internally, and that getting a crisper idea of the foundations can allow for better work building on this (or a realisation that this set of foundations isn’t quite going to work).
Not sure how much this is a response to you, but:
In considering whether incentives toward clarity (e.g., via being able to explain one’s work to potential funders) are likely to pull in good or bad directions, I think it’s important to distinguish between two different motions that might be used as a researcher (or research institution) responds to those incentives.
Motion A: Taking the research they were already doing, and putting a decent fraction of effort into figuring out how to explain it, figuring out how to get it onto firm foundations, etc.
Motion B: Choosing which research to do by thinking about which things will be easy to explain clearly afterward.
It seems to me that “attempts to be clear” in the sense of Motion A are indeed likely to be helpful, and are worth putting a significant fraction of one’s effort into. I agree also that they can be aversive and that this aversiveness (all else equal) may tend to cause underinvestment in them.
Motion B, however, strikes me as more of a mixed bag. There is merit in choosing which research to do by thinking about what will be explainable to other researchers, such that other researchers can build on it. But there is also merit to sometimes attempting research on the things that feel most valuable/tractable/central to a given researcher, without too much shame if it then takes years to get their research direction to be “clear”.
As a loose analogy, one might ask whether “incentives to not fail” have a good or bad effect on achievement. And it seems like a mixed bag. The good part (analogous to Motion A) is that, once one has chosen to devote hours/etc. to a project, it is good to try to get that project to succeed. The more mixed part (analogous to Motion B) is that “incentives to not fail” sometimes cause people to refrain from attempting ambitious projects at all. (Of course, it sometimes is worth not trying a particular project because its success-odds are too low — Motion B is not always wrong.)
I agree with all this. I read your original “attempts to be clear” as Motion A (which I was taking a stance in favour of), and your original “attempts to be explainable” as Motion B (which I wasn’t sure about).
Gotcha. Your phrasing distinction makes sense; I’ll adopt it. I agree now that I shouldn’t have included “clarity” in my sentence about “attempts to be clear/explainable/respectable”.
The thing that confused me is that it is hard to incentivize clarity but not explainability; the easiest observable is just “does the person’s research make sense to me?”, which one can then choose how to interpret, and how to incentivize.
It’s easy enough to invest in clarity / Motion A without investing in explainability / Motion B, though. My random personal guess is that MIRI invests about half of their total research effort into clarity (from what I see people doing around the office), but I’m not sure (and I could ask the researchers easily enough). Do you have a suspicion about whether MIRI over- or under-invests in Motion A?
My suspicion is that MIRI significantly underinvests/misinvests in Motion A, although of course this is a bit hard to assess from outside.
I think that they’re not that good at clearly explaining their thoughts, but that this is a learnable (and to some extent teachable) skill, and I’m not sure their researchers have put significant effort into trying to learn it.
I suspect that they don’t put enough time into trying to clearly explain the foundations for what they’re doing, relative to trying to clearly explain their new results (though I’m less confident about this, because so much is unobserved).
I think they also sometimes indulge in a motion where they write to try to persuade the reader that what they’re doing is the correct approach and helpful on the problem at hand, rather than trying to give the reader the best picture of the ways in which their work might or might not actually be applicable. I think at a first pass this is trying to substitute for Motion B, but it actively pushes against Motion A.
I’d like to see explanations which trend more towards:
Clearly separating out the motivation for the formalisation from the parts using the formalisation. Then these can be assessed separately. (I think they’ve got better at this recently.)
Putting their cards on the table and giving their true justification for different assumptions. In some cases this might be “slightly incoherent intuition”. If that’s what they have, that’s what they should write. This would make it easier for other people to evaluate, and to work out which bits to dive in on and try to shore up.
I went to a MIRI workshop on decision theory last year. I came away with an understanding of a lot of points of how MIRI approaches these things that I’d have a very hard time writing up. In particular, at the end of the workshop I promised to write up the “Pi-maximising agent” idea and how it plays into MIRI’s thinking. I can describe this at a party fairly easily, but I get completely lost trying to turn it into a writeup. I don’t remember other things quite as well (e.g. “playing chicken with the Universe”) but they have the same feel. An awful lot of what MIRI knows seems to me folklore like this.
This is interesting and interacts with my comment in reply to Anna on clarity of communication. I think I’d like to see them write up more such folklore as carefully as possible; I’m not optimistic about attempts to outsource such write-ups.
I agree that this makes sense in the “ideal” world, where potential donors have better mental models of this sort of research pathway; as a potential donor, I have found this sort of thinking useful myself.
From an organizational perspective, I think MIRI should put more effort into producing visible explanations of their work (well, depending on their strategy to get funding). As worries about AI risk become more widely known, there will be a larger pool of potential donations to research in the area. MIRI risks becoming out-competed by others who are better at explaining how their work decreases risk from advanced AI (I think this concern applies both to talent and money, but here I’m specifically talking about money).
High-touch, extremely large donors will probably get better explanations, reports on progress, etc. from organizations, but the pool of potential $ from donors who just read what’s available online may be very large, and very influenced by clear explanations of the work. This pool of donors is also more subject to network effects, cultural norms, and memes. Given that MIRI is running public fundraisers to close funding gaps, it seems that they do rely on these sorts of donors for essential funding. Ideally, they’d just have a bunch of unrestricted funding to keep them secure forever (including allaying the risk of potential geopolitical crises and macroeconomic downturns).