But I think the best intervention, in this case, is probably just to push the ideas “outside views are often given too much weight” or “heavy reliance on outside views shouldn’t be seen as praiseworthy” or “the correct way to integrate outside views with more inside-view reasoning is X.” Tabooing the term itself somehow feels a little roundabout to me, like a linguistic solution to a methodological disagreement.
On the contrary; tabooing the term is more helpful, I think. I’ve tried to explain why in the post. I’m not against the things “outside view” has come to mean; I’m just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post!
I’m pretty confident that the average intellectual doesn’t pay enough attention to “outside views”—and I think that, absent positive reinforcement from people in your community, it actually does take some degree of discipline to take outside views sufficiently seriously.
By what definition of “outside view?” There is some evidence that in some circumstances people don’t take reference class forecasting seriously enough; that’s what the original term “outside view” meant. What evidence is there that the things on the Big List O’ Things People Describe as Outside View are systematically underrated by the average intellectual?
On the contrary; tabooing the term is more helpful, I think. I’ve tried to explain why in the post. I’m not against the things “outside view” has come to mean; I’m just against them being conflated with / associated with each other, which is what the term does. If my point was simply that the first Big List was overrated and the second Big List was underrated, I would have written a very different post!
My initial comment was focused on your point about conflation, because I think this point bears on the linguistic question more strongly than the other points do. I haven’t personally found conflation to be a large issue. (Recognizing, again, that our experiences may differ.) If I agreed with the point about conflation, though, then I would think it might be worth tabooing the term “outside view.”
By what definition of “outside view?”
By “taking an outside view on X” I basically mean “engaging in statistical or reference-class-based reasoning.” I think it might also be best defined negatively: “reasoning that doesn’t substantially involve logical deduction or causal models of the phenomenon in question.”[1]
I think most of the examples in your list fit these definitions.
Epistemic deference is a kind of statistical/reference-class-based reasoning, for example, which doesn’t involve applying any sort of causal model of the phenomenon in question. The logic is “Ah, I should update downward on this claim, since experts in domain X disagree with it and I think that experts in domain X will typically be right.”
Same for anti-weirdness: The idea is that weird claims are typically wrong.
I’d say that trend extrapolation also fits: You’re not doing logical reasoning or relying on a causal model of the relevant phenomenon. You’re just extrapolating a trend forward, largely based on the assumption that long-running trends don’t typically end abruptly.
“Foxy aggregation,” admittedly, does seem like a different thing to me: It arguably fits the negative definition, depending on how you generate your weights, but doesn’t seem to fit the statistical/reference-class one. It also feels like more of a meta-level thing. So I wouldn’t personally use the term “outside view” to talk about foxy aggregation. (I also don’t think I’ve personally heard people use the term “outside view” to talk about foxy aggregation, although I obviously believe you have.)
There is some evidence that in some circumstances people don’t take reference class forecasting seriously enough; that’s what the original term “outside view” meant. What evidence is there that the things on the Big List O’ Things People Describe as Outside View are systematically underrated by the average intellectual?
A condensed answer is: (a) I think most public intellectuals barely use any of the items on this list (with the exception of the anti-weirdness heuristic); (b) I think some of the things on this list are often useful; (c) I think that intellectuals and people more generally are very often bad at reasoning causally/logically about complex social phenomena; (d) I expect intellectuals to often have a bias against outside-view-style reasoning, since it often feels somewhat unsatisfying/unnatural and doesn’t allow them to display impressive-seeming domain-knowledge, interesting models of the world, or logical reasoning skills; and (e) I do still think Tetlock’s evidence is at least somewhat relevant to most things on the list, in part because I think they actually are somewhat related to each other, although questions of external validity obviously grow more serious the further you move from the precise sorts of questions asked in his tournaments and the precise styles of outside-view reasoning displayed by participants. [2]
There’s also, of course, a bit of symmetry here. One could also ask: “What evidence is there that the things on the Big List O’ Things People Describe as Outside View are systematically overrated by the average intellectual?” :)
These definitions of course aren’t perfect, and other people sometimes use the term more broadly than I do, but, again, some amount of fuzziness seems OK to me. Most concepts have fuzzy boundaries and are hard to define precisely.
On the Tetlock evidence: I think one thing his studies suggest, which I expect to generalize pretty well to many different contexts, is that people who are trying to make predictions about complex phenomena (especially complex social phenomena) often do very poorly when they don’t incorporate outside views into their reasoning processes. (You can correct me if this seems wrong, since you’ve thought about Tetlock’s work far more than I have.) So, on my understanding, Tetlock’s work suggests that outside-view-heavy reasoning processes would often substitute for reasoning processes that lead to poor predictions anyway. At least for most people, then, outside-view-heavy reasoning processes don’t actually need to be very reliable to constitute improvements, and they need to be pretty bad to, on average, lead to worse predictions.
Another small comment here: I think Tetlock’s work also counts, in a somewhat broad way, against the “reference class tennis” objection to reference-class-based forecasting. On its face, the objection also applies to the use of reference classes in standard forecasting tournaments. There are always a ton of different reference classes someone could use to forecast any given political event. Forecasters need to rely on some sort of intuition, or some sort of fuzzy reasoning, to decide on which reference classes to take seriously; it’s a priori plausible that people would be just consistently very bad at this, given the number of degrees of freedom here and the absence of clear principles for making one’s selections. But this issue doesn’t actually seem to be that huge in the context of the sorts of questions Tetlock asked his participants. (You can again correct me if I’m wrong.) The degrees-of-freedom problem might be far larger in other contexts, but the fact that the issue is manageable in Tetlockian contexts presumably counts as at least a little bit of positive evidence.
As I said in the post, I’m a fan of reference classes. I feel like you think I’m not? I am! I’m also a fan of analogies. And I love trend extrapolation. I admit I’m not a fan of the anti-weirdness heuristic, but even it has its uses. In general most of what you are saying in this thread is stuff I agree with, which makes me wonder if we are talking past each other. (Example 1: Your second small comment about reference class tennis. Example 2: Your first small comment, if we interpret instances of “outside view” as meaning “reference classes” in the strict sense, though not if we use the broader definition you favor. Example 3: Your points a, b, c, and e. Point d, again, depends on what you mean by ‘outside view,’ and also on what counts as ‘often.’)
My problem is with the term “outside view.” (And “inside view” too!) I don’t think you’ve done much to argue in favor of it in this thread. You have said that in your experience it doesn’t seem harmful; fair enough, point taken. In mine it does. You’ve also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by “reference class forecasting” you mean the stuff Tetlock’s studies are about, then it really shouldn’t include the anti-weirdness heuristic, but it seems like you are saying it does?) I found myself repeatedly thinking “but what does he mean by outside view? I agree or don’t agree depending on what he means...” even though you had defined it earlier. You’ve said that you think the practices you call “outside view” are underrated and deserve positive reinforcement; I totally agree that some of them are, but I maintain that some of them are overrated, and would like to discuss each of them on a case-by-case basis instead of lumping them all together under one name. Of course you are free to use whatever terms you like, but I intend to continue to ask people to be more precise when I hear “outside view” or “inside view.” :)
It’s definitely entirely plausible that I’ve misunderstood your views.
My interpretation of the post was something like this:
There is a bag of things that people in the EA community tend to describe as “outside views.” Many of the things in this bag are over-rated or mis-used by members of the EA community, leading to bad beliefs.
One reason for this over-use or mis-use is that the term “outside view” has developed an extremely positive connotation within the community. People are applauded for saying that they’re relying on “outside views” (“outside view” has become “an applause light”), and so they will rely on items in the bag to an extent that is epistemically unjustified.
The things in the bag are also pretty different from each other — and not everyone who uses the term “outside view” agrees about exactly what belongs in the bag. This conflation/ambiguity can lead to miscommunication.
More importantly, when it comes to the usefulness of the different items in the bag, some have more evidential support than others. Using the term “outside view” to refer to everything in the bag might therefore lead people to overrate certain items that actually have weak evidential support.
To sum up, tabooing the term “outside view” might solve two problems. First, it might reduce miscommunication. Second, more importantly, it might cause people to stop overrating some of the reasoning processes that they currently characterize as involving “outside views.” The mechanisms by which tabooing the term can help to solve the second problem are: (a) it takes away an “applause light,” whose existence incentivizes excessive use of these reasoning processes, and (b) it allows people to more easily recognize that some of these reasoning processes don’t actually have much empirical support.
I’m curious if this feels roughly right, or feels pretty off.
Part of the reason I interpreted your post this way: The quote you kicked the post off with suggested to me that your primary preoccupation was over-use or mis-use of the tools people called “outside views,” including more conventional reference-class forecasting. It seemed like the quote was giving an example of someone who’s refusing to engage in causal reasoning, evaluate object-level arguments, etc., based on the idea that outside views are just strictly dominant in the context of AI forecasting. It seemed like this would have been an issue even if the person was doing totally orthodox reference-class forecasting and there was no ambiguity about what they were doing.[1]
I don’t think that you’re generally opposed to the items in the “outside view” bag or anything like that. I also don’t assume that you disagree with most of the points I listed in my last comment, for why I think intellectuals probably on average underrate the items in the bag. I just listed all of them because you asked for an explanation for my view, I suppose with some implication that you might disagree with it.
You’ve also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by “reference class forecasting” you mean the stuff Tetlock’s studies are about, then it really shouldn’t include the anti-weirdness heuristic, but it seems like you are saying it does?)
I think it’s probably not worth digging deeper on the definitions I gave, since I definitely don’t think they’re close to perfect. But just a clarification here, on the anti-weirdness heuristic: I’m thinking of the reference class as “weird-sounding claims.”
Suppose someone approaches you on the street and hands you a flyer claiming: “The US government has figured out a way to use entangled particles to help treat cancer, but political elites are hoarding the particles.” You quickly form a belief that the flyer’s claim is almost certainly false, by thinking to yourself: “This is a really weird-sounding claim, and I figure that virtually all really weird-sounding claims that appear in random flyers are wrong.”
In this case, you’re not doing any deductive reasoning about the claim itself or relying on any causal models that directly bear on the claim. (Although you could.) For example, you’re not thinking to yourself: “Well, I know about quantum mechanics, and I know entangled particles couldn’t be useful for treating cancer for reason X.” Or: “I understand economic incentives, or understand social dynamics around secret-keeping, so I know it’s unlikely this information would be kept secret.” You’re just picking a reference class — weird-sounding claims made on random flyers — and justifying your belief that way.
I think it’s possible that Tetlock’s studies don’t bear very strongly on the usefulness of this reference class, since I imagine participants in his studies almost never used it. (“The claim ‘there will be a coup in Venezuela in the next five years’ sounds really weird to me, and most claims that sound weird to me aren’t true, so it’s probably not true!”) But I think the anti-weirdness heuristic does fit with the definitions I gave, as well as the definition you give that characterizes the term’s “original meaning.” I also do think that Tetlock’s studies remain at least somewhat relevant when judging the potential usefulness of the heuristic.
I initially engaged on the miscommunication point, though, since this is the concern that would most strongly make me want to taboo the term. I’d rather address the applause light problem, if it is a problem, by trying to get people in the EA community to stop applauding, and the evidence problem, if it is a problem, by trying to directly make people in the EA community more aware of the limits of the evidence.
Wow, that’s an impressive amount of charitable reading + attempting-to-ITT you did just there, my hat goes off to you sir!
I think that summary of my view is roughly correct. I think it overemphasizes the applause light aspect compared to other things I was complaining about; in particular, there was my second point in the “this expansion of meaning is bad” section, about how people seem to think that it is important to have an outside view and an inside view (but only an inside view if you feel like you are an expert), which is, IMO, totally not the lesson one should draw from Tetlock’s studies etc., especially not with the modern, expanded definition of these terms. And while I am mostly complaining about what’s happened to “outside view,” I think similar things apply to “inside view,” and thus I recommend tabooing it also.
In general, the taboo solution feels right to me; when I imagine re-doing various conversations I’ve had, except without that phrase, and people instead using more specific terms, I feel like things would just be better. I shudder at the prospect of having a discussion about “Outside view vs inside view: which is better? Which is overrated and which is underrated?” (and I’ve worried that this thread may be tending in that direction) but I would really look forward to having a discussion about “let’s look at Daniel’s list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate.”
Now I’ll try to say what I think your position is:
1. If people were using “outside view” without explaining more specifically what they mean, that would be bad and it should be tabooed, but you don’t see that in your experience.
2. If the things in the first Big List were indeed super diverse and disconnected from the evidence in Tetlock’s studies etc., then there would indeed be no good reason to bundle them together under one term. But in fact this isn’t the case; most of the things on the list are special cases of reference-class / statistical reasoning, which is what Tetlock’s studies are about. So rather than taboo “outside view” we should continue to use the term but mildly prune the list.
3. There may be a general bias in this community towards using the things on the first Big List, but (a) in your opinion the opposite seems more true, and (b) at any rate even if this is true the right response is to argue for that directly rather than advocating the tabooing of the term.
I shudder at the prospect of having a discussion about “Outside view vs inside view: which is better? Which is overrated and which is underrated?” (and I’ve worried that this thread may be tending in that direction) but I would really look forward to having a discussion about “let’s look at Daniel’s list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate.”
I also shudder a bit at that prospect.
I am sometimes happy making pretty broad and sloppy statements. For example: “People making political predictions typically don’t make enough use of ‘outside view’ perspectives” feels fine to me, as a claim, despite some ambiguity around the edges. (Which perspectives should they use? How exactly should they use them? Etc.)
But if you want to dig in deep, for example when evaluating the rationality of a particular prediction, you should definitely shift toward making more specific and precise statements. For example, if someone has based their own AI timelines on Katja’s expert survey, and they wanted to defend their view by simply invoking the principle “outside views are better than inside views,” I think this would probably be a horrible conversation. A good conversation would focus specifically on the conditions under which it makes sense to defer heavily to experts, whether those conditions apply in this particular case, etc. Some general Tetlock stuff might come into the conversation, like: “Tetlock’s work suggests it’s easy to trip yourself up if you try to use your own detailed/causal model of the world to make predictions, so you shouldn’t be so confident that your own ‘inside view’ prediction will be very good either.” But mostly you should be more specific.
Now I’ll try to say what I think your position is:
If people were using “outside view” without explaining more specifically what they mean, that would be bad and it should be tabooed, but you don’t see that in your experience
If the things in the first Big List were indeed super diverse and disconnected from the evidence in Tetlock’s studies etc., then there would indeed be no good reason to bundle them together under one term. But in fact this isn’t the case; most of the things on the list are special cases of reference-class / statistical reasoning, which is what Tetlock’s studies are about. So rather than taboo “outside view” we should continue to use the term but mildly prune the list.
There may be a general bias in this community towards using the things on the first Big List, but (a) in your opinion the opposite seems more true, and (b) at any rate even if this is true the right response is to argue for that directly rather than advocating the tabooing of the term.
How does that sound?
I’d say that sounds basically right!
The only thing is that I don’t necessarily agree with 3a.
I think some parts of the community lean too much on things in the bag (the example you give at the top of the post is an extreme example). I also think that some parts of the community lean too little on things in the bag, in part because (in my view) they’re overconfident in their own abilities to reason causally/deductively in certain domains. I’m not sure which is overall more problematic, at the moment, in part because I’m not sure how people actually should be integrating different considerations in domains like AI forecasting.
There also seem to be biases that cut in both directions. I think the ‘baseline bias’ is pretty strongly toward causal/deductive reasoning, since it’s more impressive-seeming, can suggest that you have something uniquely valuable to bring to the table (if you can draw on lots of specific knowledge or ideas that it’s rare to possess), is probably typically more interesting and emotionally satisfying, and doesn’t as strongly force you to confront or admit the limits of your predictive powers. The EA community has definitely introduced an (unusual?) bias in the opposite direction, by giving a lot of social credit to people who show certain signs of ‘epistemic virtue.’ I guess the pro-causal/deductive bias often feels more salient to me, but I don’t really want to make any confident claim here that it actually is more powerful.
As a last thought here (no need to respond), I thought it might be useful to give one example of a concrete case where: (a) Tetlock’s work seems relevant, and I find the terms “inside view” and “outside view” natural to use, even though the case is relatively different from the ones Tetlock has studied; and (b) I think many people in the community have tended to underweight an “outside view.”
A few years ago, I pretty frequently encountered the claim that recently developed AI systems exhibited roughly “insect-level intelligence.” This claim was typically used to support an argument for short timelines, since the claim was also made that we now had roughly insect-level compute. If insect-level intelligence has arrived around the same time as insect-level compute, then, it seems to follow, we shouldn’t be at all surprised if we get ‘human-level intelligence’ at roughly the point where we get human-level compute. And human-level compute might be achieved pretty soon.
For a couple of reasons, I think some people updated their timelines too strongly in response to this argument. First, it seemed like there are probably a lot of opportunities to make mistakes when constructing the argument: it’s not clear how “insect-level intelligence” or “human-level intelligence” should be conceptualised, it’s not clear how best to map AI behaviour onto insect behaviour, etc. The argument also hadn’t yet been vetted closely or expressed very precisely, which seemed to increase the possibility of not-yet-appreciated issues.
Second, we know that there are previous examples of smart people looking at AI behaviour and forming the impression that it suggests “insect-level intelligence.” For example, in Nick Bostrom’s paper “How Long Before Superintelligence?” (1998) he suggested that “approximately insect-level intelligence” was achieved sometime in the 70s, as a result of insect-level computing power being achieved in the 70s. In Moravec’s book Mind Children (1990), he also suggested that both insect-level intelligence and insect-level compute had recently been achieved. Rodney Brooks also had a whole research program, in the 90s, that was based around going from “insect-level intelligence” to “human-level intelligence.”
I think many people didn’t give enough weight to the reference class “instances of smart people looking at AI systems and forming the impression that they exhibit insect-level intelligence” and gave too much weight to the more deductive/model-y argument that had been constructed.
This case is obviously pretty different from the sorts of cases that Tetlock’s studies focused on, but I do still feel like the studies have some relevance. I think Tetlock’s work should, in a pretty broad way, make people more suspicious of their own ability to perform linear/model-heavy reasoning about complex phenomena without getting tripped up or fooling themselves. It should also make people somewhat more inclined to take reference classes seriously, even when the reference classes are fairly different from the sorts of reference classes good forecasters used in Tetlock’s studies. I do also think that the terms “inside view” and “outside view” apply relatively neatly in this case, and are nice bits of shorthand, although, admittedly, it’s far from necessary to use them.
This is the sort of case I have in the back of my mind.
(There are also, of course, cases that point in the opposite direction, where many people seemingly gave too much weight to something they classified as an “outside view.” Early under-reaction to COVID is arguably one example.)
In retrospect we know that the AI project couldn’t possibly have succeeded at that stage. The hardware was simply not powerful enough. It seems that at least about 100 Tops is required for human-like performance, and possibly as much as 10^17 ops is needed. The computers in the seventies had a computing power comparable to that of insects. They also achieved approximately insect-level intelligence.
I would have guessed this is just a funny quip, in the sense that (i) it sure sounds like it’s just a throw-away quip: no evidence is presented for those AI systems being competent at anything (he moves on to other topics in the next sentence), and “approximately insect-level” seems appropriate as a generic and punchy stand-in for “pretty dumb”; (ii) in the document he is basically just thinking about AI performance on complex tasks and trying to make the point that you shouldn’t be surprised by subhuman performance on those tasks, which doesn’t depend much on the literal comparison to insects; (iii) the actual algorithms described in the section (neural nets and genetic algorithms) wouldn’t plausibly have achieved insect-level performance in the 70s, since those algorithms in fact do require large training processes (and were in fact used in the 70s to train much tinier neural networks).
(Of course you could also just ask Nick.)
I also think it’s worth noting that the prediction in that section looks reasonably good in hindsight. It was written right at the beginning of resurgent interest in neural networks (right before Yann LeCun’s paper on MNIST with neural networks). The hypothesis “computers were too small in the past so that’s why they were lame” looks like it was a great call, and Nick’s tentative optimism about particular compute-heavy directions looks good. I think overall this is a significantly better take than mainstream opinions in AI. I don’t think this literally affects your point, but it is relevant if the implicit claim is “And people talking about insect comparisons were led astray by these comparisons.”
I suspect you are more broadly underestimating the extent to which people used “insect-level intelligence” as a generic stand-in for “pretty dumb,” though I haven’t looked at the discussion in Mind Children and Moravec may be making a stronger claim. I’d be more inclined to tread carefully if some historical people had tried to actually compare the behavior of their AI system to the behavior of an insect and found it comparable, as in posts like this one (it’s not clear to me how such an evaluation would have suggested insect-level robotics in the 90s or even today; I think the best that can be said is that today it seems compatible with insect-level robotics in simulation). I’ve seen Moravec use the phrase “insect-level intelligence” to refer to the particular behaviors of “following pheromone trails” or “flying towards lights,” so I might also read him as referring to those behaviors in particular. (It’s possible he is underestimating the total extent of insect intelligence, e.g. discounting the complex motor control performed by insects, though I haven’t seen him do that explicitly and it would be a bit off brand.)
ETA: While I don’t think 1990s robotics could plausibly be described as “insect-level,” I actually do think that the linked post on bee vision could plausibly have been written in the 90s and concluded that computer vision was bee-level, it’s just a very hard comparison to make and the performance of the bees in the formal task is fairly unimpressive.
I suspect you are more broadly underestimating the extent to which people used “insect-level intelligence” as a generic stand-in for “pretty dumb,” though I haven’t looked at the discussion in Mind Children and Moravec may be making a stronger claim.
I think that’s good push-back and a fair suggestion: I’m not sure how seriously the statement in Nick’s paper was meant to be taken. I hadn’t considered that it might be almost entirely a quip. (I may ask him about this.)
Moravec’s discussion in Mind Children is similarly brief: He presents a graph of the computing power of different animals’ brains and states that “lab computers are roughly equal in power to the nervous systems of insects.” He also characterizes current AI behaviors as “insectlike” and writes: “I believe that robots with human intelligence will be common within fifty years. By comparison, the best of today’s machines have minds more like those of insects than humans. Yet this performance itself represents a giant leap forward in just a few decades.” I don’t think he’s just being quippy, but there’s also no suggestion that he means anything very rigorous/specific by his suggestion.
Rodney Brooks, I think, did mean for his comparisons to insect intelligence to be taken very seriously. The idea of his “nouvelle AI program” was to create AI systems that match insect intelligence, then use that as a jumping-off point for trying to produce human-like intelligence. I think walking and obstacle navigation, with several legs, was used as the main dimension of comparison. The Brooks case is a little different, though, since (IIRC) he only claimed that his robots exhibited important aspects of insect intelligence or fell just short of insect intelligence, rather than directly claiming that they actually matched insect intelligence. On the other hand, he apparently felt he had gotten close enough to transition to the stage of the project that was meant to go from insect-level stuff to human-level stuff.
A plausible reaction to these cases, then, might be:
OK, Rodney Brooks did make a similar comparison, and was a major figure at the time, but his stuff was pretty transparently flawed. Moravec’s and Bostrom’s comments were at best fairly off-hand, suggesting casual impressions more than they suggest outcomes of rigorous analysis. The more recent “insect-level intelligence” claim is pretty different, since it’s built on top of much more detailed analysis than anything Moravec/Bostrom did, and it’s less obviously flawed than Brooks’ analysis. The likelihood that it reflects an erroneous impression is, therefore, a lot lower. The previous cases shouldn’t actually do much to raise our suspicion levels.
I think there’s something to this reaction, particularly if there’s now more rigorous work being done to operationalize and test the “insect-level intelligence” claim. I hadn’t yet seen the recent post you linked to, which, at first glance, seems like a good and clear piece of work. The more rigorous work is done to flesh out the argument, the less I’m inclined to treat the Bostrom/Moravec/Brooks cases as part of an epistemically relevant reference class.
My impression a few years ago was that the claim wasn’t yet backed by any really clear/careful analysis. At least, the version that filtered down to me seemed to be substantially based on fuzzy analogies between RL agent behavior and insect behavior, without anyone yet knowing much about insect behavior. (Although maybe this was a misimpression.) So I probably do stand by the reference class being relevant back then.
Overall, to sum up, my position here is something like: “The Bostrom/Moravec/Brooks cases do suggest that it might be easy to see roughly insect-level intelligence, if that’s what you expect to see and you’re relying on fuzzy impressions, paying special attention to stuff AI systems can already do, or not really operationalizing your claims. This should make us more suspicious of modern claims that we’ve recently achieved ‘insect-level intelligence,’ unless they’re accompanied by transparent and pretty obviously robust reasoning. Insofar as this work is being done, though, the Bostrom/Moravec/Brooks cases become weaker grounds for suspicion.”
I do think my main impression of insect <-> simulated robot parity comes from very fuzzy evaluations of insect motor control vs simulated robot motor control (rather than from any careful analysis, of which I’m a bit more skeptical though I do think it’s a relevant indicator that we are at least trying to actually figure out the answer here in a way that wasn’t true historically). And I do have only a passing knowledge of insect behavior, from watching youtube videos and reading some book chapters about insect learning. So I don’t think it’s unfair to put it in the same reference class as Rodney Brooks’ evaluations to the extent that his was intended as a serious evaluation.
By “taking an outside view on X” I basically mean “engaging in statistical or reference-class-based reasoning.” I think it might also be best defined negatively: “reasoning that doesn’t substantially involve logical deduction or causal models of the phenomenon in question.”[1]
I think most of the examples in your list fit these definitions.
Epistemic deference is a kind of statistical/reference-class-based reasoning, for example, which doesn’t involve applying any sort of causal model of the phenomenon in question. The logic is “Ah, I should update downward on this claim, since experts in domain X disagree with it and I think that experts in domain X will typically be right.”
Same for anti-weirdness: The idea is that weird claims are typically wrong.
I’d say that trend extrapolation also fits: You’re not doing logical reasoning or relying on a causal model of the relevant phenomenon. You’re just extrapolating a trend forward, largely based on the assumption that long-running trends don’t typically end abruptly.
“Foxy aggregation,” admittedly, does seem like a different thing to me: It arguably fits the negative definition, depending on how you generate your weights, but doesn’t seem to fit the statistical/reference-class one. It also feels like more of a meta-level thing. So I wouldn’t personally use the term “outside view” to talk about foxy aggregation. (I also don’t think I’ve personally heard people use the term “outside view” to talk about foxy aggregation, although I obviously believe you have.)
A condensed answer is: (a) I think most public intellectuals barely use any of the items on this list (with the exception of the anti-weirdness heuristic); (b) I think some of the things on this list are often useful; (c) I think that intellectuals and people more generally are very often bad at reasoning causally/logically about complex social phenomena; (d) I expect intellectuals to often have a bias against outside-view-style reasoning, since it often feels somewhat unsatisfying/unnatural and doesn’t allow them to display impressive-seeming domain-knowledge, interesting models of the world, or logical reasoning skills; and (e) I do still think Tetlock’s evidence is at least somewhat relevant to most things on the list, in part because I think they actually are somewhat related to each other, although questions of external validity obviously grow more serious the further you move from the precise sorts of questions asked in his tournaments and the precise styles of outside-view reasoning displayed by participants. [2]
There’s also, of course, a bit of symmetry here. One could also ask: “What evidence is there that the things on the Big List O’ Things People Describe as Outside View are systematically overrated by the average intellectual?” :)
These definitions of course aren’t perfect, and other people sometimes use the term more broadly than I do, but, again, some amount of fuzziness seems OK to me. Most concepts have fuzzy boundaries and are hard to define precisely.
On the Tetlock evidence: I think one thing his studies suggest, which I expect to generalize pretty well to many different contexts, is that people who are trying to make predictions about complex phenomena (especially complex social phenomena) often do very poorly when they don’t incorporate outside views into their reasoning processes. (You can correct me if this seems wrong, since you’ve thought about Tetlock’s work far more than I have.) So, on my understanding, Tetlock’s work suggests that outside-view-heavy reasoning processes would often substitute for reasoning processes that lead to poor predictions anyways. At least for most people, then, outside-view-heavy reasoning processes don’t actually need to be very reliable to constitute improvements, and they need to be pretty bad to, on average, lead to worse predictions.
Another small comment here: I think Tetlock’s work also counts, in a somewhat broad way, against the “reference class tennis” objection to reference-class-based forecasting. On its face, the objection also applies to the use of reference classes in standard forecasting tournaments. There are always a ton of different reference classes someone could use to forecast any given political event. Forecasters need to rely on some sort of intuition, or some sort of fuzzy reasoning, to decide on which reference classes to take seriously; it’s a priori plausible that people would be just consistently very bad at this, given the number of degrees of freedom here and the absence of clear principles for making one’s selections. But this issue doesn’t actually seem to be that huge in the context of the sorts of questions Tetlock asked his participants. (You can again correct me if I’m wrong.) The degrees-of-freedom problem might be far larger in other contexts, but the fact that the issue is manageable in Tetlockian contexts presumably counts as at least a little bit of positive evidence.
As I said in the post, I’m a fan of reference classes. I feel like you think I’m not? I am! I’m also a fan of analogies. And I love trend extrapolation. I admit I’m not a fan of the anti-weirdness heuristic, but even it has its uses. In general most of what you are saying in this thread is stuff I agree with, which makes me wonder if we are talking past each other. (Example 1: Your second small comment about reference class tennis. Example 2: Your first small comment, if we interpret instances of “outside view” as meaning “reference classes” in the strict sense, though not if we use the broader definition you favor. Example 3: Your points a, b, c, and e. Point d, again, depends on what you mean by ‘outside view,’ and also on what counts as ‘often.’)
My problem is with the term “Outside view.” (And “inside view” too!) I don’t think you’ve done much to argue in favor of it in this thread. You have said that in your experience it doesn’t seem harmful; fair enough, point taken. In mine it does. You’ve also given two rough definitions of the term, which seem quite different to me, and also quite fuzzy. (e.g. if by “reference class forecasting” you mean the stuff Tetlock’s studies are about, then it really shouldn’t include the anti-weirdness heuristic, but it seems like you are saying it does?) I found myself repeatedly thinking “but what does he mean by outside view? I agree or don’t agree depending on what he means...” even though you had defined it earlier. You’ve said that you think the practices you call “outside view” are underrated and deserve positive reinforcement; I totally agree that some of them are, but I maintain that some of them are overrated, and would like to discuss each of them on a case by case basis instead of lumping them all together under one name. Of course you are free to use whatever terms you like, but I intend to continue to ask people to be more precise when I hear “outside view” or “inside view.” :)
It’s definitely entirely plausible that I’ve misunderstood your views.
My interpretation of the post was something like this:
I’m curious if this feels roughly right, or feels pretty off.
Part of the reason I interpreted your post this way: The quote you kicked the post off with suggested to me that your primary preoccupation was over-use or misuse of the tools people called “outside views,” including more conventional reference-class forecasting. It seemed like the quote was giving an example of someone who’s refusing to engage in causal reasoning, evaluate object-level arguments, etc., based on the idea that outside views are just strictly dominant in the context of AI forecasting. It seemed like this would have been an issue even if the person was doing totally orthodox reference-class forecasting and there was no ambiguity about what they were doing.[1]
I don’t think that you’re generally opposed to the items in the “outside view” bag or anything like that. I also don’t assume that you disagree with most of the points I listed in my last comment, for why I think intellectuals probably on average underrate the items in the bag. I just listed all of them because you asked for an explanation for my view, I suppose with some implication that you might disagree with it.
I think it’s probably not worth digging deeper on the definitions I gave, since I definitely don’t think they’re close to perfect. But just a clarification here, on the anti-weirdness heuristic: I’m thinking of the reference class as “weird-sounding claims.”
Suppose someone approaches you on the street and hands you a flyer claiming: “The US government has figured out a way to use entangled particles to help treat cancer, but political elites are hoarding the particles.” You quickly form a belief that the flyer’s claim is almost certainly false, by thinking to yourself: “This is a really weird-sounding claim, and I figure that virtually all really weird-sounding claims that appear in random flyers are wrong.”
In this case, you’re not doing any deductive reasoning about the claim itself or relying on any causal models that directly bear on the claim. (Although you could.) For example, you’re not thinking to yourself: “Well, I know about quantum mechanics, and I know entangled particles couldn’t be useful for treating cancer for reason X.” Or: “I understand economic incentives, or understand social dynamics around secret-keeping, so I know it’s unlikely this information would be kept secret.” You’re just picking a reference class — weird-sounding claims made on random flyers — and justifying your belief that way.
I think it’s possible that Tetlock’s studies don’t bear very strongly on the usefulness of this reference class, since I imagine participants in his studies almost never used it. (“The claim ‘there will be a coup in Venezuela in the next five years’ sounds really weird to me, and most claims that sound weird to me aren’t true, so it’s probably not true!”) But I think the anti-weirdness heuristic does fit with the definitions I gave, as well as the definition you give that characterizes the term’s “original meaning.” I also do think that Tetlock’s studies remain at least somewhat relevant when judging the potential usefulness of the heuristic.
I initially engaged on the miscommunication point, though, since this is the concern that would most strongly make me want to taboo the term. I’d rather address the applause light problem, if it is a problem, by trying to get people in the EA community to stop applauding, and the evidence problem, if it is a problem, by trying to just directly make people in the EA community more aware of the limits of evidence.
Wow, that’s an impressive amount of charitable reading + attempting-to-ITT you did just there, my hat goes off to you sir!
I think that summary of my view is roughly correct. I think it over-emphasizes the applause light aspect compared to other things I was complaining about; in particular, there was my second point in the “this expansion of meaning is bad” section, about how people seem to think that it is important to have an outside view and an inside view (but only an inside view if you feel like you are an expert) which is, IMO, totally not the lesson one should draw from Tetlock’s studies etc., especially not with the modern, expanded definition of these terms. Also, while I am mostly complaining about what’s happened to “outside view,” I think similar things apply to “inside view,” and thus I recommend tabooing it also.
In general, the taboo solution feels right to me; when I imagine re-doing various conversations I’ve had, except without that phrase, and people instead using more specific terms, I feel like things would just be better. I shudder at the prospect of having a discussion about “Outside view vs inside view: which is better? Which is overrated and which is underrated?” (and I’ve worried that this thread may be tending in that direction) but I would really look forward to having a discussion about “let’s look at Daniel’s list of techniques and talk about which ones are overrated and underrated and in what circumstances each is appropriate.”
Now I’ll try to say what I think your position is:
How does that sound?
Thank you (and sorry for my delayed response)!
I also shudder a bit at that prospect.
I am sometimes happy making pretty broad and sloppy statements. For example: “People making political predictions typically don’t make enough use of ‘outside view’ perspectives” feels fine to me, as a claim, despite some ambiguity around the edges. (Which perspectives should they use? How exactly should they use them? Etc.)
But if you want to dig in deep, for example when evaluating the rationality of a particular prediction, you should definitely shift toward making more specific and precise statements. For example, if someone has based their own AI timelines on Katja’s expert survey, and they wanted to defend their view by simply evoking the principle “outside views are better than inside views,” I think this would probably be a horrible conversation. A good conversation would focus specifically on the conditions under which it makes sense to defer heavily to experts, whether those conditions apply in this particular case, etc. Some general Tetlock stuff might come into the conversation, like: “Tetlock’s work suggests it’s easy to trip yourself up if you try to use your own detailed/causal model of the world to make predictions, so you shouldn’t be so confident that your own ‘inside view’ prediction will be very good either.” But mostly you should be more specific.
I’d say that sounds basically right!
The only thing is that I don’t necessarily agree with 3a.
I think some parts of the community lean too much on things in the bag (the example you give at the top of the post is an extreme example). I also think that some parts of the community lean too little on things in the bag, in part because (in my view) they’re overconfident in their own abilities to reason causally/deductively in certain domains. I’m not sure which is overall more problematic, at the moment, in part because I’m not sure how people actually should be integrating different considerations in domains like AI forecasting.
There also seem to be biases that cut in both directions. I think the ‘baseline bias’ is pretty strongly toward causal/deductive reasoning, since it’s more impressive-seeming, can suggest that you have something uniquely valuable to bring to the table (if you can draw on lots of specific knowledge or ideas that it’s rare to possess), is probably typically more interesting and emotionally satisfying, and doesn’t as strongly force you to confront or admit the limits of your predictive powers. The EA community has definitely introduced an (unusual?) bias in the opposite direction, by giving a lot of social credit to people who show certain signs of ‘epistemic virtue.’ I guess the pro-causal/deductive bias often feels more salient to me, but I don’t really want to make any confident claim here that it actually is more powerful.
As a last thought here (no need to respond), I thought it might be useful to give one example of a concrete case where: (a) Tetlock’s work seems relevant, and I find the terms “inside view” and “outside view” natural to use, even though the case is relatively different from the ones Tetlock has studied; and (b) I think many people in the community have tended to underweight an “outside view.”
A few years ago, I pretty frequently encountered the claim that recently developed AI systems exhibited roughly “insect-level intelligence.” This claim was typically used to support an argument for short timelines, since the claim was also made that we now had roughly insect-level compute. If insect-level intelligence has arrived around the same time as insect-level compute, then, it seems to follow, we shouldn’t be at all surprised if we get ‘human-level intelligence’ at roughly the point where we get human-level compute. And human-level compute might be achieved pretty soon.
For a couple of reasons, I think some people updated their timelines too strongly in response to this argument. First, it seemed like there are probably a lot of opportunities to make mistakes when constructing the argument: it’s not clear how “insect-level intelligence” or “human-level intelligence” should be conceptualised, it’s not clear how best to map AI behaviour onto insect behaviour, etc. The argument also hadn’t yet been vetted closely or expressed very precisely, which seemed to increase the possibility of not-yet-appreciated issues.
Second, we know that there are previous examples of smart people looking at AI behaviour and forming the impression that it suggests “insect-level intelligence.” For example, in Nick Bostrom’s paper “How Long Before Superintelligence?” (1998) he suggested that “approximately insect-level intelligence” was achieved sometime in the 70s, as a result of insect-level computing power being achieved in the 70s. In Moravec’s book Mind Children (1990), he also suggested that insect-level intelligence and insect-level compute had both recently been achieved. Rodney Brooks also had this whole research program, in the 90s, that was based around going from “insect-level intelligence” to “human-level intelligence.”
I think many people didn’t give enough weight to the reference class “instances of smart people looking at AI systems and forming the impression that they exhibit insect-level intelligence” and gave too much weight to the more deductive/model-y argument that had been constructed.
This case is obviously pretty different than the sorts of cases that Tetlock’s studies focused on, but I do still feel like the studies have some relevance. I think Tetlock’s work should, in a pretty broad way, make people more suspicious of their own ability to perform linear/model-heavy reasoning about complex phenomena without getting tripped up or fooling themselves. It should also make people somewhat more inclined to take reference classes seriously, even when the reference classes are fairly different from the sorts of reference classes good forecasters used in Tetlock’s studies. I do also think that the terms “inside view” and “outside view” apply relatively neatly, in this case, and are nice bits of shorthand, although, admittedly, it’s far from necessary to use them.
This is the sort of case I have in the back of my mind.
(There are also, of course, cases that point in the opposite direction, where many people seemingly gave too much weight to something they classified as an “outside view.” Early under-reaction to COVID is arguably one example.)
The Nick Bostrom quote (from here) is:
I would have guessed this is just a funny quip, in the sense that (i) it sure sounds like it’s just a throw-away quip, no evidence is presented for those AI systems being competent at anything (he moves on to other topics in the next sentence), “approximately insect-level” seems appropriate as a generic and punchy stand in for “pretty dumb,” (ii) in the document he is basically just thinking about AI performance on complex tasks and trying to make the point that you shouldn’t be surprised by subhuman performance on those tasks, which doesn’t depend much on the literal comparison to insects, (iii) the actual algorithms described in the section (neural nets and genetic algorithms) wouldn’t plausibly achieve insect-level performance in the 70s since those algorithms in fact do require large training processes (and were in fact used in the 70s to train much tinier neural networks).
(Of course you could also just ask Nick.)
I also think it’s worth noting that the prediction in that section looks reasonably good in hindsight. It was written right at the beginning of resurgent interest in neural networks (right before Yann LeCun’s paper on MNIST with neural networks). The hypothesis “computers were too small in the past so that’s why they were lame” looks like it was a great call, and Nick’s tentative optimism about particular compute-heavy directions looks good. I think overall this is a significantly better take than mainstream opinions in AI. I don’t think this literally affects your point, but it is relevant if the implicit claim is “And people talking about insect comparisons were led astray by these comparisons.”
I suspect you are more broadly underestimating the extent to which people used “insect-level intelligence” as a generic stand-in for “pretty dumb,” though I haven’t looked at the discussion in Mind Children and Moravec may be making a stronger claim. I’d be more inclined to tread carefully if some historical people tried to actually compare the behavior of their AI system to the behavior of an insect and found it comparable as in posts like this one (it’s not clear to me how such an evaluation would have suggested insect-level robotics in the 90s or even today; I think the best that can be said is that it now seems compatible with insect-level robotics in simulation). I’ve seen Moravec use the phrase “insect-level intelligence” to refer to the particular behaviors of “following pheromone trails” or “flying towards lights,” so I might also read him as referring to those behaviors in particular. (It’s possible he is underestimating the total extent of insect intelligence, e.g. discounting the complex motor control performed by insects, though I haven’t seen him do that explicitly and it would be a bit off brand.)
ETA: While I don’t think 1990s robotics could plausibly be described as “insect-level,” I actually do think that the linked post on bee vision could plausibly have been written in the 90s and concluded that computer vision was bee-level, it’s just a very hard comparison to make and the performance of the bees in the formal task is fairly unimpressive.
I think that’s good push-back and a fair suggestion: I’m not sure how seriously the statement in Nick’s paper was meant to be taken. I hadn’t considered that it might be almost entirely a quip. (I may ask him about this.)
Moravec’s discussion in Mind Children is similarly brief: He presents a graph of the computing power of different animals’ brains and states that “lab computers are roughly equal in power to the nervous systems of insects.” He also characterizes current AI behaviors as “insectlike” and writes: “I believe that robots with human intelligence will be common within fifty years. By comparison, the best of today’s machines have minds more like those of insects than humans. Yet this performance itself represents a giant leap forward in just a few decades.” I don’t think he’s just being quippy, but there’s also no suggestion that he means anything very rigorous/specific by his suggestion.
Rodney Brooks, I think, did mean for his comparisons to insect intelligence to be taken very seriously. The idea of his “nouvelle AI program” was to create AI systems that match insect intelligence, then use that as a jumping-off point for trying to produce human-like intelligence. I think walking and obstacle navigation, with several legs, was used as the main dimension of comparison. The Brooks case is a little different, though, since (IIRC) he only claimed that his robots exhibited important aspects of insect intelligence or fell just short of insect intelligence, rather than directly claiming that they actually matched insect intelligence. On the other hand, he apparently felt he had gotten close enough to transition to the stage of the project that was meant to go from insect-level stuff to human-level stuff.
A plausible reaction to these cases, then, might be:
OK, Rodney Brooks did make a similar comparison, and was a major figure at the time, but his stuff was pretty transparently flawed. Moravec’s and Bostrom’s comments were at best fairly off-hand, suggesting casual impressions more than the outcomes of rigorous analysis. The more recent “insect-level intelligence” claim is pretty different, since it’s built on top of much more detailed analysis than anything Moravec/Bostrom did, and it’s less obviously flawed than Brooks’ analysis. The likelihood that it reflects an erroneous impression is, therefore, a lot lower. The previous cases shouldn’t actually do much to raise our suspicion levels.
I think there’s something to this reaction, particularly if there’s now more rigorous work being done to operationalize and test the “insect-level intelligence” claim. I hadn’t yet seen the recent post you linked to, which, at first glance, seems like a good and clear piece of work. The more rigorous work that is done to flesh out the argument, the less I’m inclined to treat the Bostrom/Moravec/Brooks cases as part of an epistemically relevant reference class.
My impression a few years ago was that the claim wasn’t yet backed by any really clear/careful analysis. At least, the version that filtered down to me seemed to be substantially based on fuzzy analogies between RL agent behavior and insect behavior, without anyone yet knowing much about insect behavior. (Although maybe this was a misimpression.) So I probably do stand by the reference class being relevant back then.
Overall, to sum up, my position here is something like: “The Bostrom/Moravec/Brooks cases do suggest that it might be easy to see roughly insect-level intelligence, if that’s what you expect to see and you’re relying on fuzzy impressions, paying special attention to stuff AI systems can already do, or not really operationalizing your claims. This should make us more suspicious of modern claims that we’ve recently achieved ‘insect-level intelligence,’ unless they’re accompanied by transparent and pretty obviously robust reasoning. Insofar as this work is being done, though, the Bostrom/Moravec/Brooks cases become weaker grounds for suspicion.”
I do think my main impression of insect <-> simulated robot parity comes from very fuzzy evaluations of insect motor control vs simulated robot motor control (rather than from any careful analysis, of which I’m a bit more skeptical, though I do think it’s a relevant indicator that we are at least trying to actually figure out the answer here in a way that wasn’t true historically). And I do have only a passing knowledge of insect behavior, from watching YouTube videos and reading some book chapters about insect learning. So I don’t think it’s unfair to put it in the same reference class as Rodney Brooks’ evaluations to the extent that his was intended as a serious evaluation.
Yeah, FWIW I haven’t found any recent claims about insect comparisons particularly rigorous.
I guess we can just agree to disagree on that for now. The example statement you gave would feel fine to me if it used the original meaning of “outside view” but not the new meaning, and since many people don’t know (or sometimes forget) the original meaning…
100% agreement here, including on the bolded bit.
Also agree here, but again I don’t really care which one is overall more problematic because I think we have more precise concepts we can use and it’s more helpful to use them instead of these big bags.
I think I agree with all this as well, noting that this causal/deductive reasoning definition of inside view isn’t necessarily what other people mean by inside view, and also isn’t necessarily what Tetlock meant. I encourage you to use the term “causal/deductive reasoning” instead of “inside view,” as you did here; it was helpful (e.g. if you had instead used “inside view” I would not have agreed with the claim about baseline bias).