Researcher (on bio) at FHI
For my part, I’m more partial to ‘blaming the reader’, but (evidently) better people mete out better measure than I in turn.

Insofar as it goes, I think the challenge (at least for me) is that qualitative terms can cover multitudes (or orders of magnitude) of precision. I’d take ~0.3% to be ‘significant’ credence for some values of significant. ‘Strong’, ‘compelling’, or ‘good’ arguments could be an LR of 2 (after all, RCT confirmation can be ~3) or 200. I also think quantitative articulation would help the reader (or at least this reader) better benchmark the considerations here.

Taking the rough posterior of 0.1% and prior of 1 in 100 million, this implies a likelihood ratio of ~100,000 - loosely, ultra-decisive evidence. If we partition out the risk-based considerations (which the discussion seems to set as ‘less than decisive’, so <100), the other considerations (perhaps mostly those in S5) give you an LR of >~1,000 - loosely, very decisive evidence. Yet the discussion of the considerations in S5 doesn’t give the impression we should conclude they give us ‘massive updates’. You note there are important caveats to these considerations, you say in summing up that these arguments are ‘far from watertight’, and I also inferred the sort of criticisms given in S3 around our limited reasoning ability and scepticism of informal arguments would apply here too. Hence my presumption that these other considerations, although more persuasive than object-level arguments around risks, would still end up below the LR ~100 for ‘decisive’ evidence, rather than much higher.

Another way this would help would be illustrating the uncertainty. Given some indicative priors you note vary by ten orders of magnitude, the prior is not just astronomical but extremely uncertain. By my lights, the update doesn’t greatly reduce our uncertainty (and could compound it, given the challenges in calibrating around very high LRs).
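To make the arithmetic explicit (a toy sketch using the quoted ~0.1% posterior and 1-in-100-million prior, i.e. the figures under discussion rather than my own estimates):

```python
def implied_likelihood_ratio(prior, posterior):
    """Bayes factor implied by moving from prior to posterior (working in odds form)."""
    prior_odds = prior / (1 - prior)
    posterior_odds = posterior / (1 - posterior)
    return posterior_odds / prior_odds

total = implied_likelihood_ratio(1e-8, 1e-3)  # ~100,000 overall
residual = total / 100  # what remains if risk-based considerations cap out at LR 100
```

On these stipulated numbers, even granting the risk-based considerations a fully ‘decisive’ LR of 100 leaves an LR of ~1,000 to be supplied by the remaining considerations.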
If the posterior odds could be ‘out by 100,000x either way’, the central estimate being at ~0.3% could still give you (given some naive log-uniform) 20%+ of the mass distributed at better than even odds of HH. The moaning about hiding the ball arises from the sense that this numerical articulation reveals (I think) some powerful objections the more qualitative treatment obscures. E.g.:
Typical HH proponents are including considerations around earliness/single planet/etc. in their background knowledge/prior when discussing object-level risks. Noting the prior becomes astronomically adverse when we subtract these out of background knowledge, and so the object-level case for (e.g.) AI risk can’t possibly be enough to carry the day alone, seems a bait-and-switch: you agree the prior becomes massively less astronomical when we include single planet etc. in background knowledge, and in fact things like ‘we live on only one planet’ are in our background knowledge (and were being assumed at least tacitly by HH proponents).
The attempt to ‘bound’ object level arguments by their LR (e.g. “Well, these are informal, and it looks fishy, etc. so it is hard to see how you can get LR >100 from these”) doesn’t seem persuasive when your view is that the set of germane considerations (all of which seem informal, have caveats attached, etc.) in concert are giving you an LR of ~100 000 or more. If this set of informal considerations can get you more than half way from the astronomical prior to significant credence, why be so sure additional ones (e.g.) articulating a given danger can’t carry you the rest of the way?
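Relatedly, the earlier ‘20%+ mass at better than even odds’ figure can be reproduced with a naive model (assuming, purely for illustration, log-odds uniform across ±5 orders of magnitude around a 0.3% central estimate):

```python
import math

def mass_above_even_odds(central_p, spread_factor):
    """Fraction of a log-odds-uniform distribution lying above even odds (p > 0.5)."""
    centre = math.log10(central_p / (1 - central_p))
    half_width = math.log10(spread_factor)
    lo, hi = centre - half_width, centre + half_width
    # Share of the uniform [lo, hi] interval above 0 on the log-odds scale
    return max(0.0, min(1.0, hi / (hi - lo)))

share = mass_above_even_odds(0.003, 1e5)  # roughly a quarter of the mass
```

So even a sub-1% central estimate, if uncertain to this degree, leaves a substantial chunk of probability mass on HH being more likely than not.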
I do a lot of forecasting, and I struggle to get a sense of what priors of 1/100 million or decisive evidence to the tune of LR 1,000 would look like in ‘real life’ scenarios. Numbers this huge (where you end up virtually ‘off the end of the tail’ of your stipulated prior) raise worries about consilience (cf. “I guess the sub-prime mortgage crisis was a 10 sigma event”), but moreover pragmatic defeat: there seems a lot to distrust in an epistemic procedure along the lines of “With anthropics given stipulated subtracted background knowledge we end up with an astronomically minute prior (where we could be off by many orders of magnitude), but when we update on adding back in elements of our actual background knowledge this shoots up by many orders of magnitude (but we are likely still off by many orders of magnitude)”.

Taking it at face value would mean a minute update to our ‘pre-theoretic prior’ on the topic before embarking on this exercise (providing these overlapped and it was not as radically uncertain, varying by no more than a couple rather than many orders of magnitude). If we suspect (which I think we should) this procedure of partitioning out background knowledge into update steps which approach log-log variance and where we have minimal calibration is less reliable than using our intuitive gestalt over our background knowledge as a whole, we should discount its deliverances still further.
But what is your posterior? Like Buck, I’m unclear whether your view is that the central estimate should be (e.g.) 0.1% or 1/1 million. I want to push on this because if your own credences are inconsistent with your argument, the reasons why seem both important to explore and to make clear to readers, who may be misled into taking this at ‘face value’. From this on page 13, I guess a generous estimate (/upper bound) is something like 1/1 million for the ‘among most important million people’ claim:
[W]e can assess the quality of the arguments given in favour of the Time of Perils or Value Lock-in views, to see whether, despite the a priori implausibility and fishiness of HH, the evidence is strong enough to give us a high posterior in HH. It would take us too far afield to discuss in sufficient depth the arguments made in Superintelligence, or Pale Blue Dot, or The Precipice. But it seems hard to see how these arguments could be strong enough to move us from a very low prior all the way to significant credence in HH. As a comparison, a randomised controlled trial with a p-value of 0.05, under certain reasonable assumptions, gives a Bayes factor of around 3 in favour of the hypothesis; a Bayes factor of 100 is regarded as ‘decisive’ evidence. In order to move from a prior of 1 in 100 million to a posterior of 1 in 10, one would need a Bayes factor of 10 million — extraordinarily strong evidence.
I.e. a prior of ~1/100 million (which is less adverse than others you moot earlier), and a Bayes factor <100 (i.e. we should not think the balance of reason, all considered, is ‘decisive’ evidence), so you end up at best at ~1/1 million. If this argument is right, you can be ‘super confident’ that giving a credence of 0.1% is wrong (out by a ratio of >~1,000, the difference between ~1% and 91%), and vice-versa.

Yet I don’t think your credence on ‘this is the most important century’ is 1/1 million. Among other things, it seems to imply we can essentially dismiss things like short TAI timelines, Bostrom-Yudkowsky AI accounts, etc., as these are essentially upper-bounded by the 1/1M credence above.* So (presuming I’m right and you don’t place negligible credence on these things) I’m not sure how these things can be in reflective equilibrium.

1: ‘Among the most important million people’ and ‘this is the most important century’ are not the same thing, and so perhaps one has a (much) higher prior on the latter than the former. But if the action really was here, then the precisification of ‘hinge of history’ as the former claim seems misguided: “Oh, this being the most important century could have significant credence, but this other sort-of related proposition nonetheless has an astronomically adverse prior” confuses rather than clarifies.

2: Another possibility is there are sources of evidence which give us huge updates, even if the object-level arguments in (e.g.) Superintelligence, The Precipice etc. are not among them. Per the linked conversation, maybe earliness gives a huge shift up from the astronomically adverse prior, so this plus the weak object-level evidence gets you to lowish but not negligible credence. Whether cashed out via prior or update, it seems important to make such considerations explicit, as the true case in favour of HH would include these considerations too.
Yet the discussion of ‘how far you should update’ on p11-13ish doesn’t mention these massive adjustments, instead noting reasons to be generally sceptical (e.g. fishiness) and that the informal/heuristic arguments for object-level risks should not be getting you Bayes factors of ~100 or more. This seems to be hiding the ball if in fact your posterior is ultimately 1,000x or more your astronomically adverse prior, but not for reasons which are discussed (and so a reader may neglect to include them when forming their own judgement).

*: I think there’s also a presumptuous philosopher-type objection lurking here too. Folks (e.g.) could have used a similar argument to essentially rule out any x-risk from nuclear winter before any scientific analysis, as this implies significant credence in HH, which the argument above essentially rules out. Similar to ‘using anthropics to hunt’, something seems to be going wrong where the mental exercise of estimating potentially-vast future populations can also allow us to infer the overwhelmingly probable answers for disparate matters in climate modelling, AI development, the control problem, civilisation recovery, and so on.
“It’s not clear why you’d think that the evidence for x-risk is strong enough to think we’re one-in-a-million, but not stronger than that.” This seems pretty strange as an argument to me. Being one-in-a-thousand is a thousand times less likely than being one-in-a-million, so of course if you think the evidence pushes you to thinking that you’re one-in-a-million, it needn’t push you all the way to thinking that you’re one-in-a-thousand. This seems important to me. Yes, you can give me arguments for thinking that we’re (in expectation at least) at an enormously influential time—as I say in the blog post and the comments, I endorse those arguments! I think we should update massively away from our prior, in particular on the basis of the current rate of economic growth. (My emphasis)
Asserting an astronomically adverse prior, then a massive update, yet being confident you’re in the right ballpark re. orders of magnitude does look pretty fishy though. For a few reasons:

First, (in the webpage version you quoted) you don’t seem sure of a given prior probability, merely that it is ‘astronomical’: yet astronomical numbers (including variations you note about whether to multiply by how many accessible galaxies there are or not, etc.) vary by substantially more than three orders of magnitude - you note two possible prior probabilities (of being among the million most influential people) of 1 in a million trillion (10^-18) and 1 in a hundred million (10^-8), a span of 10 orders of magnitude. It seems hard to see how a Bayesian update from this (seemingly) extremely wide prior would give a central estimate at a (not astronomically minute) value, yet confidently rule against values ‘only’ 3 orders of magnitude higher (a distance a ten-millionth the width of this implicit span in prior probability). [It also suggests the highest VoI is to winnow this huge prior range, rather than spending effort evaluating considerations around the likelihood ratio.]

Second, whatever (very) small value we use for our prior probability, getting to non-astronomical posteriors implies likelihood ratios/Bayes factors which are huge. From (say) 10^-8 to 10^-4 is a factor of 10,000. As you say in your piece, this is much, much stronger than the benchmark for decisive evidence of ~100. It seems hard to say (e.g.) evidence from the rate of economic growth is ‘decisive’ in this sense, and so it is hard to see how in concert with other heuristic considerations you get 10-100x more confirmation (indeed, your subsequent discussion seems to supply many defeaters to exactly this). Further, similar to worries about calibration out on the tail, it seems unlikely many of us can accurately assess LRs >100 which are not direct observations, even to within orders of magnitude.
Third, priors should be consilient, and can be essentially refuted by posteriors. A prior that gets surprised to the tune of 1-in-millions should get hugely penalized versus any alternative (including naive intuitive gestalts) which does not. It seems particularly costly as non-negligible credences in (e.g.) nuclear winter, the industrial revolution being crucial, etc. facially represent this prior being surprised by ‘1 in large X’ events at a rate much greater than 1/X.

To end up with not-vastly lower posteriors than your interlocutors (presuming Buck’s suggestion of 0.1% is fair, and not something like 1/million), it seems one must assert a much lower prior which is mostly (but not completely) cancelled out by a much stronger update step. This prior seems to be ranging over many orders of magnitude, yet the posterior does not - and it is hard to see where the orders of magnitude of better resolution are arising from (if we knew for sure the prior is 10^-12 versus knowing for sure it is 10^-8, shouldn’t the posterior shift a lot between the two cases?).

It seems more reasonable to say ‘our’ prior is rather some mixed gestalt on considering the issue as a whole, and the concern about base rates etc. should be seen as an argument for updating this downwards, rather than a bid to set the terms of the discussion.
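Both the size of the required update and the ‘where does the resolution come from?’ worry can be made concrete (a toy sketch; the 10^-8 prior, 10^-4 posterior, and decisive-evidence benchmark of 100 are the figures discussed above, while the fixed LR of 10^5 is purely a hypothetical for illustration):

```python
import math

def bayes_factor(prior, posterior):
    """Bayes factor implied by moving from prior to posterior (working in odds form)."""
    return (posterior / (1 - posterior)) / (prior / (1 - prior))

def posterior_from(prior, lr):
    """Posterior probability after applying likelihood ratio lr to prior."""
    odds = (prior / (1 - prior)) * lr
    return odds / (1 + odds)

# Getting from 10^-8 to 10^-4 needs a Bayes factor of ~10,000 -
# the equivalent of two independent pieces of 'decisive' (LR 100) evidence:
bf = bayes_factor(1e-8, 1e-4)
n_decisive = math.log(bf) / math.log(100)

# Holding a (hypothetical) LR of 10^5 fixed, the posterior tracks the prior
# almost one-for-one, so shifting the prior by four orders of magnitude
# shifts the posterior by roughly four orders of magnitude too:
p_high = posterior_from(1e-8, 1e5)
p_low = posterior_from(1e-12, 1e5)
```

So a posterior confidently pinned to within an order of magnitude, sitting atop a prior uncertain across ten, needs that prior uncertainty to be cancelled somewhere - which an update step that is itself uncertain across orders of magnitude does not obviously supply.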
I agree with this in the abstract, but for the specifics of this particular case, do you in fact think that online mobs / cancel culture / groups who show up to protest your event without warning should be engaged with on a good faith assumption? I struggle to imagine any of these groups accepting anything other than full concession to their demands, such that you’re stuck with the BATNA regardless.
I think so. In the abstract, ‘negotiating via ultimatum’ (e.g. “you must cancel the talk, or I will do this”) does not mean one is acting in bad faith. Alice may foresee there is no bargaining frontier, but is informing you what your BATNA looks like and giving you the opportunity to consider whether ‘giving in’ is nonetheless better for you (this may not be very ‘nice’, but it isn’t ‘blackmail’).

A lot turns on whether her ‘or else’ is plausibly recommended by the lights of her interests (e.g. she would do these things if we had already held the event/she believed our pre-commitment to do so) or she is threatening spiteful actions whose primary value is her hope they alter our behaviour (e.g. she would at least privately wish she didn’t have to ‘follow through’ if we defied her). The reason these are important to distinguish is that ‘folk game theory’ gives a pro tanto reason to not give in in the latter case, even if doing so is better than suffering the consequences (as you deter future attempts to coerce you). But not in the former one, as Alice’s motivation to retaliate does not rely on the chance you may acquiesce to her threats, and so she will not ‘go away’ after you’ve credibly demonstrated to her you will never do this.

On the particular case, I think some of it was plausibly bad faith (i.e. if a major driver was a ‘fleet in being’ threat that people would antisocially disrupt the event) but a lot of it probably wasn’t: “people badmouthing/thinking less of us for doing this” or (as Habryka put it) the ‘very explicit threat’ of an organisation removing their affiliation from EA Munich are all credibly/probably good-faith warnings, even if the only way to avoid them would have been complete concession. (There are lots of potential reasons I would threaten to stop associating with someone or something where the only way for me to relent is their complete surrender.)

(I would be cautious about labelling things as mobs or cancel culture.)
[G]iven that she’s taking actions that destroy value for Bob without generating value for Alice (except via their impact on Bob’s actions), I think it is fine to think of this as a threat. (I am less attached to the bully metaphor—I meant that as an example of a threat.)
Let me take a more in-group example readers will find sympathetic.

When the NYT suggested it would run an article using Scott’s legal name, many of his supporters responded by complaining to the editor, organising petitions, cancelling their subscriptions (and encouraging others to do likewise), trying to coordinate sources/public figures to refuse access to NYT journalists, and so on. These are straightforwardly actions which ‘destroy value’ for the NYT, were substantially motivated to try and influence its behaviour, and were an ultimatum to boot (i.e. the only way the NYT could placate this ‘online mob’ was to fully concede on not using Scott’s legal name).

Yet presumably this strategy was not predicated on ‘only we are allowed to (or smart enough to) use game theory, so we can expect the NYT to irrationally give in to our threats when they should be ostentatiously doing exactly what we don’t want them to do to demonstrate they won’t be bullied’. For although these actions are ‘threats’, they are warnings/good faith/non-spiteful, as these responses are not just out of hope to coerce: these people would be minded to retaliate similarly if they only found out the NYT’s intention after the article had been published.

Naturally the hope is that one can resolve conflict by a meeting of the minds: we might hope we can convince Alice to see things our way; and the NYT probably hopes the same. But if the disagreement prompting conflict remains, we should be cautious about how we use the word threat, especially in equivocating between commonsense use of the term (e.g. “I threaten to castigate Charlie publicly if she holds a conference on holocaust denial”) and the subspecies where folk game theory - and our own self-righteousness - strongly urges us to refute (e.g. “Life would be easier for us at the NYT if we acquiesced to those threatening to harm our reputation and livelihoods if we report things they don’t want us to.
But we will never surrender the integrity of our journalism to bullies and blackmailers.”)
Another case where ‘precommitment to refute all threats’ is an unwise strategy (and a case more relevant to the discussion, as I don’t think all opponents to hosting a speaker like Hanson either see themselves or should be seen as bullies attempting coercion) is where your opponent is trying to warn you rather than trying to blackmail you. (cf. 1, 2)

Suppose Alice sincerely believes some of Bob’s writing is unapologetically misogynistic. She believes it is important one does not give misogynists a platform and implicit approbation. Thus she finds hosting Bob abhorrent, and is dismayed that a group at her university is planning to do just this. She approaches this group, making clear her objections and stating her intention, if this goes ahead, to (e.g.) protest this event, stridently criticise the group in the student paper for hosting him, petition the university to withdraw affiliation, and so on.

This could be an attempt to bully (where usual game theory provides a good reason to refuse to concede anything on principle). But it also could not be: Alice may be explaining what responses she would make to protect her interests which the group’s planned action would harm, and hoping to find a better negotiated agreement for her and the EA group besides “they do X and I do Y”. It can be hard to tell the difference, but some elements in this example speak against Alice being a bully wanting to blackmail the group to get her way. First is the plausibility of her interests recommending these actions to her even if they had no deterrent effect whatsoever (i.e. she’d do the same if the event had already happened). Second, the actions she intends fall roughly within the ‘fair game’ of how one can retaliate against those doing something they’re allowed to do which you deem to be wrong.

Alice is still not a bully even if her motivating beliefs re. Bob are both completely mistaken and unreasonable.
She’s also still not a bully even if her implied second-order norms are wrong (e.g. maybe the public square would be better off if people didn’t stridently object to hosting speakers based on their supposed views on topics they are not speaking upon, etc.). Conflict is typically easy to navigate when you can dictate to your opponent what their interests should be and what they can license themselves to do. Alas, such cases are rare.

It is extremely important not to respond to Alice as if she were a bully if in fact she is not, for two reasons. First, if she is acting in good faith, pre-committing to refuse any compromise for ‘do not give in to bullying’ reasons means one always ends up at one’s respective BATNAs even if there were mutually beneficial compromises to be struck. Maybe there is no good compromise with Alice this time, but there may be the next time one finds oneself at cross-purposes.

Second, wrongly presuming bad faith for Alice seems apt to induce her to make the symmetrical mistake of presuming bad faith for you. To Alice, malice explains well why you were unwilling to even contemplate compromise, why you considered yourself obliged out of principle to persist with actions that harm her interests, and why you call her desire to combat misogyny bullying and blackmail. If Alice also thinks about these things through the lens of game theory (although perhaps not in the most sophisticated way), she may reason she is rationally obliged to retaliate against you (even spitefully) to deter you from doing harm again. The stage is set for continued escalation.

Presumptive bad faith is pernicious, and can easily lead to martyring oneself needlessly on the wrong hill. I also note that ‘leaning into righteous anger’ or ‘taking oneself as justified in thinking the worst of those opposed to you’ are not widely recognised as promising approaches in conflict resolution, bargaining, or negotiation.
This isn’t much more than a rotation (or maybe just a rephrasing), but:

When I offer a 10-second-or-less description of Effective Altruism, it is hard to avoid making it sound platitudinous. Things like “using evidence and reason to do the most good”, or “trying to find the best things to do, then doing them”, are things I can imagine the typical person nodding along with, but then wondering what the fuss is about (“Sure, I’m also a fan of doing more good rather than less good - aren’t we all?”). I feel I need to elaborate with a distinctive example (e.g. “I left clinical practice because I did some amateur health econ on how much good a doctor does, and thought I could make a greater contribution elsewhere”) for someone to get a good sense of what I am driving at.

I think a related problem is that the ‘thin’ version of EA can seem slippery when engaging with those who object to it. “If indeed intervention Y was the best thing to do, we would of course support intervention Y” may (hopefully!) be true, but is seldom the heart of the issue. I take it most common objections are not against the principle but the application (I also suspect this reply may inadvertently annoy an objector, given it can paint them as - bizarrely - ‘preferring less good to more good’).

My best try at what makes EA distinctive is a summary of what you spell out with spread, identifiability, etc.: that there are very large returns to reason for beneficence (maybe ‘deliberation’ instead of ‘reason’, or whatever). I think the typical person does “use reason and evidence to do the most good”, and can be said to be doing some sort of search for the best actions. I think the core of EA (at least the ‘E’ bit) is the appeal that people should do a lot more of this than they would otherwise - as, if they do, their beneficence would tend to accomplish much more.

Per the OP, motivating this is easier said than done.
The best case is global health, as there is a lot more (common-sense) evidence one can point to about some things being a lot better than others, and these object-level matters a hypothetical interlocutor is fairly likely to accept also offer support for the ‘returns to reason’ story. For most other cause areas, the motivating reasons are typically controversial, and the (common-sense) evidence is scant-to-absent. Perhaps the best moves here would be pointing to these as salient considerations which plausibly could dramatically change one’s priorities, and so exploring to uncover these is better than exploiting after more limited deliberation (but cf. cluelessness).
I’m afraid I’m also not following. Take an extreme case (which is not that extreme, given I think the average number of forecasts per forecaster per question on GJO is 1.something). Alice predicts a year out P(X) = 0.2 and never touches her forecast again, whilst Bob predicts P(X) = 0.3, but decrements proportionately as time elapses. Say X doesn’t happen (and say the right ex ante probability a year out was indeed 0.2). Although Alice > Bob on the initial forecast (and so if we just scored that day she would be better), if we carry forecasts forward Bob overtakes her overall [I haven’t checked the maths for this example, but we can tweak the initial forecasts so he does].
As time elapses, Alice’s forecast steadily diverges from the ‘true’ ex ante likelihood, whilst Bob’s converges to it. A similar story applies if new evidence emerges which dramatically changes the probability, if Bob updates on it and Alice doesn’t. This seems roughly consonant with things like the stock market - trading off month-old (or older) prices rather than current prices seems unlikely to go well.
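The Alice/Bob example can be simulated directly (a sketch assuming daily Brier scoring with forecasts carried forward, a linear proportional decay for Bob, and X resolving ‘no’):

```python
def mean_brier(forecasts, outcome=0):
    """Average daily Brier score for a binary question, carrying each forecast forward."""
    return sum((p - outcome) ** 2 for p in forecasts) / len(forecasts)

days = 365
alice = [0.2] * days  # forecasts once a year out, never updates
# Assumed decay model: Bob scales his 0.3 by the fraction of time remaining
bob = [0.3 * (days - t) / days for t in range(days)]

alice_score = mean_brier(alice)  # ≈ 0.04
bob_score = mean_brier(bob)      # ≈ 0.030: worse on day one, better carried forward
```

With these particular numbers Bob’s worse initial 0.3 is overcome well before year-end (lower Brier is better), consonant with the claim that carrying forecasts forward rewards updating - no tweak to the initial forecasts is even needed.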
FWIW I agree with Owen. I agree the direction of effect supplies a pro tanto consideration which will typically lean in favour of other options, but it is not decisive (in addition to the scenarios he notes, some people have pursued higher degrees concurrently with RSP).
So I don’t think you need to worry about potentially leading folks astray by suggesting this as an option for them to consider—although, naturally, they should carefully weigh their options up (including considerations around which sorts of career capital are most valuable for their longer term career planning).
As such, blackmail feels like a totally fair characterization [of a substantial part of the reason for disinviting Hanson (though definitely not 100% of it).]
As your subsequent caveat implies, whether blackmail is a fair characterisation turns on exactly how substantial this part was. If in fact the decision was driven by non-blackmail considerations, the (great-)grandparent’s remarks about it being bad to submit to blackmail are inapposite.
Crucially, (q.v. Daniel’s comment), not all instances where someone says (or implies), “If you do X (which I say harms my interests), I’m going to do Y (and Y harms your interests)” are fairly characterised as (essentially equivalent to) blackmail. To give a much lower resolution of Daniel’s treatment, if (conditional on you doing X) it would be in my interest to respond with Y independent of any harm it may do to you (and any coercive pull it would have on you doing X in the first place), informing you of my intentions is credibly not a blackmail attempt, but a better-faith “You do X then I do Y is our BATNA here, can we negotiate something better?” (In some treatments these are termed warnings versus threats, or using terms like ‘spiteful’, ‘malicious’ or ‘bad faith’ to make the distinction).
The ‘very explicit threat’ of disassociation you mention is a prime example of ‘plausibly (/prima facie) not-blackmail’. There are many credible motivations to (e.g.) renounce (or denounce) a group which invites a controversial speaker you find objectionable, independent from any hope that threatening this makes them ultimately resile from running the event after all. So too ‘trenchantly criticising you for holding the event’, ‘no longer supporting your group’, ‘leaving in protest (and encouraging others to do the same)’, etc. Any or all of these might be wrong for other reasons - but (again, per Daniel) ‘they’re trying to blackmail us!’ is not necessarily one of them.
(Less-than-coincidentally, the above are also acts of protest which are typically considered ‘fair game’, versus disrupting events, intimidating participants, campaigns to get someone fired, etc. I presume neither of us take various responses made to the NYT when they were planning to write an article about Scott to be (morally objectionable) attempts to blackmail them, even if many of them can be called ‘threats’ in natural language).
Of course, even if something could plausibly not be a blackmail attempt, it may in fact be exactly this. I may posture that my own interests would drive me to Y, but I would privately regret having to ‘follow through’ with this after X happens; or I may pretend my threat of Y is ‘only meant as a friendly warning’. Yet although our counterparty’s mind is not transparent to us, we can make reasonable guesses.
It is important to get this right, as the right strategy to deal with threats is a very wrong one to deal with warnings. If you think I’m trying to blackmail you when I say “If you do X, I will do Y”, then all the usual stuff around ‘don’t give in to the bullies’ applies: by refuting my threat, you deter me (and others) from attempting to bully you in future. But if you think I am giving a good-faith warning when I say this, it is worth looking for a compromise. Being intransigent as a matter of policy—at best—means we always end up at our mutual BATNAs even when there were better-for-you negotiated agreements we could have reached.
At worst, it may induce me to make the symmetrical mistake - wrongly believing your behaviour is in bad faith: that your real reasons for doing X, and for being unwilling to entertain the idea of compromise to mitigate the harm X will do to me, are because you’re actually ‘out to get me’. Game theory will often recommend retaliation as a way of deterring you from doing this again. So the stage is set for escalating conflict.
Directly: Widely across the comments here you have urged for charity and good faith to be extended to evaluating Hanson’s behaviour which others have taken exception to—that adverse inferences (beyond perhaps “inadvertently causes offence”) are not only mistaken but often indicate a violation of discourse norms vital for EA-land to maintain. I’m a big fan of extending charity and good faith in principle (although perhaps putting this into practice remains a work in progress for me). Yet you mete out much more meagre measure to others than you demand from them in turn, endorsing fervid hyperbole that paints those who expressed opposition to Munich inviting Hanson as bullies trying to blackmail them, and those sympathetic to the decision they made as selling out. Beyond this being normatively unjust, it is also prudentially unwise—presuming bad faith in those who object to your actions is a recipe for making a lot of enemies you didn’t need to, especially in already-fractious intellectual terrain.
You could still be right—despite the highlighted ‘very explicit threat’ which is also very plausibly not blackmail, despite the other ‘threats’ alluded to which seem also plausibly not blackmail and ‘fair game’ protests for them to make, and despite what the organisers have said (publicly) themselves, the full body of evidence should lead us to infer what really happened was bullying which was acquiesced to. But I doubt it.
I’m fairly sure the real story is much better than that, although still bad in objective terms: in culture-war threads, the typical karma norms roughly morph into ‘barely restricted tribal warfare’. So people have much lower thresholds both to slavishly upvote their ‘team’, and to downvote the opposing one.
Talk of ‘blackmail’ (here and elsethread) is substantially missing the mark. To my understanding, there were no ‘threats’ being acquiesced to here.
If some party external to the Munich group pressured them into cancelling the event with Hanson (and without this, they would want to hold the event), then the standard story of ‘if you give in to the bullies you encourage them to bully you more’ applies.
Yet unless I’m missing something, the Munich group changed their minds of their own accord, and not in response to pressure from third parties. Whether or not that was a good decision, it does not signal they’re vulnerable to ‘blackmail threats’. If anything, they’ve signalled the opposite by not reversing course after various folks castigated them on Twitter etc.
The distinction between ‘changing our minds on the merits’ and ‘bowing to public pressure’ can get murky (e.g. public outcry could genuinely prompt someone to change their mind that what they were doing was wrong after all, but people will often say this insincerely when what really happened is they were cowed by opprobrium). But again, the apparent absence of people pressuring Munich to ‘cancel Hanson’ makes this moot.
(I agree with Linch that the incentives look a little weird here given if Munich had found out about work by Hanson they deemed objectionable before they invited him, they presumably would not have invited him and none of us would be any the wiser. It’s not clear “Vet more carefully so you don’t have to rescind invitations to controversial speakers (with attendant internet drama) rather than not inviting them in the first place” is the lesson folks would want to be learned from this episode.)
I recall Hsiung being in favour of conducting disruptive protests against EAG 2015:
I honestly think this is an opportunity. “EAs get into fight with Elon Musk over eating animals” is a great story line that would travel well on both social and possibly mainstream media.
Organize a group. Come forward with an initially private demand (and threaten to escalate, maybe even with a press release). Then start a big fight if they don’t comply.
Even if you lose, you still win because you’ll generate massive dialogue!
It is unclear whether the motivation was more ‘blackmail threats to stop them serving meat’ or ‘as Elon Musk will be there we can co-opt this to raise our profile’. Whether Hsiung calls himself an EA or not, he evidently missed the memo on ‘eschew narrow minded obnoxious defection against others in the EA community’.
For similar reasons, it seems generally wiser for a community not to help people who previously wanted to throw it under the bus.
My reply is a mix of the considerations you anticipate. With apologies for brevity:
It’s not clear to me whether avoiding anchoring favours (e.g.) round numbers or not. If my listener, in virtue of being human, is going to anchor on whatever number I provide them, I might as well anchor them on a number I believe to be more accurate.
I expect there are better forms of words for my examples which can better avoid the downsides you note (e.g. maybe saying ‘roughly 12%’ instead of ‘12%’ still helps, even if you give a later articulation).
I’m less fussed about precision re. resilience (e.g. ‘I’d typically expect drift of several percent from this with a few more hours to think about it’ doesn’t seem much worse than ‘the standard error of this forecast is 6% versus me with 5 hours more thinking time’ or similar). I’d still insist something at least pseudo-quantitative is important, as verbal riders may not put the listener in the right ballpark (e.g. does ‘roughly’ 10% pretty much rule out it being 30%?)
Similar to the ‘trip to the shops’ example in the OP, there’s plenty of cases where precision isn’t a good way to spend time and words (e.g. I could have counter-productively littered many of the sentences above with precise yet non-resilient forecasts). I’d guess there’s also cases where it is better to sacrifice precision to better communicate with your listener (e.g. despite the rider on resilience you offer, they will still think ‘12%’ is claimed to be accurate to the nearest percent, but if you say ‘roughly 10%’ they will better approximate what you have in mind). I still think when the stakes are sufficiently high, it is worth taking pains on this.
I had in mind the information-theoretic sense (per Nix). I agree the ‘first half’ is more valuable than the second half, but I think this is better parsed as diminishing marginal returns to information.
Very minor, re. child thread: You don’t need to calculate numerically, as log_a(x^y) = y·log_a(x), and 100 = 10^2. Admittedly the numbers (or maybe the remark in the OP generally) weren’t chosen well, given ‘number of decimal places’ seems the more salient difference than the squaring (e.g. per-thousandths does not have double the information of per-cents, but 50% more).
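(For concreteness, the arithmetic can be checked directly; a quick Python sketch of the information comparison, using the log identity above:)

```python
import math

# Bits of information in an equally-likely choice among n bins: log2(n).
bits_percent = math.log2(100)    # per-cents: 100 bins = 10^2
bits_permil = math.log2(1000)    # per-thousandths: 1000 bins = 10^3

# By log_a(x^y) = y * log_a(x), the ratio is exactly 3/2:
# per-thousandths carry 50% more information than per-cents, not double.
ratio = bits_permil / bits_percent  # ≈ 1.5
```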
It’s fairly context dependent, but I generally remain a fan.
There’s a mix of ancillary issues:
There could be a ‘why should we care what you think?’ if EA estimates diverge from consensus estimates, although I imagine folks tend to gravitate to neglected topics etc.
There might be less value in ‘relative to self-ish’ accounts of resilience: major estimates in a front facing report I’d expect to be fairly resilient, and so less “might shift significantly if we spent another hour on it”.
Relative to some quasi-ideal seems valuable though: E.g. “Our view re. X is resilient, but we have a lot of knightian uncertainty, so we’re only 60% sure we’d be within an order of magnitude of X estimated by a hypothetical expert panel/liquid prediction market/etc.”
There might be better or worse ways to package this given people are often sceptical of any quantitative assessment of uncertainty (at least in some domains). Perhaps something like ‘subjective confidence intervals’ (cf.), although these aren’t perfect.
But ultimately, if you want to tell someone an important number you aren’t sure about, it seems worth taking pains to be precise, both on it and its uncertainty.
It is true that given the primary source (presumably this), the implication is that rounding supers to 0.1 hurt them, but 0.05 didn’t:
To explore this relationship, we rounded forecasts to the nearest 0.05, 0.10, or 0.33 to see whether Brier scores became less accurate on the basis of rounded forecasts rather than unrounded forecasts. [...]
For superforecasters, rounding to the nearest 0.10 produced significantly worse Brier scores [by implication, rounding to the nearest 0.05 did not]. However, for the other two groups, rounding to the nearest 0.10 had no influence. It was not until rounding was done to the nearest 0.33 that accuracy declined.
That said, despite the absence of evidence I’m confident accuracy for superforecasters (and ~anyone else—more later, and elsewhere) does numerically drop with rounding to 0.05 (or anything else), even if this has not been demonstrated to be statistically significant:
From first principles, if the estimate has signal, shaving bits of information from it by rounding should make it less accurate (and it obviously shouldn’t make it more accurate, which pretty reliably sets the upper bound of the effect at zero).
Further, there seems very little motivation for the idea we have n discrete ‘bins’ of probability across the number line (often equidistant!) inside our heads, and as we become better forecasters n increases. That we have some standard error to our guesses (which ~smoothly falls with increasing skill) seems significantly more plausible. As such the ‘rounding’ tests should be taken as loose proxies to assess this error.
Yet if the error process is this, rather than ‘n real values + jitter no more than 0.025’, undersampling and aliasing should introduce a further distortion. Even if you think there really are n bins someone can ‘really’ discriminate between, intermediate values are best seen as a form of anti-aliasing (“Think it is more likely 0.1 than 0.15, but not sure; maybe it’s 60/40 between them, so I’ll say 0.12”) which rounding ablates. In other words, ‘accurate to the nearest 0.1’ does not mean the second decimal place carries no information.
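A toy illustration of the anti-aliasing point (my own numbers, using the 60/40 example above and the Brier score, which is a proper scoring rule): the blended report of 0.12 scores better in expectation than snapping to either 0.10 or 0.15.

```python
# Expected Brier score for reporting q when the event truly occurs
# with probability p: E[(q - outcome)^2] = p*(1-q)^2 + (1-p)*q^2
def expected_brier(q, p):
    return p * (1 - q) ** 2 + (1 - p) * q ** 2

# A 60/40 split between the event chance being 0.10 or 0.15 gives an
# all-things-considered probability of 0.12.
p_true = 0.6 * 0.10 + 0.4 * 0.15   # = 0.12

blended = expected_brier(0.12, p_true)
snap_lo = expected_brier(0.10, p_true)
snap_hi = expected_brier(0.15, p_true)
# blended beats both snap_lo and snap_hi: rounding away the
# intermediate value costs accuracy, if only slightly.
```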
Also, if you are forecasting distributions rather than point estimates (cf. Metaculus), said forecast distributions typically imply many intermediate value forecasts.
Empirically, there’s much to suggest a type 2 error explanation of the lack of a ‘significant’ drop. As you’d expect, the size of the accuracy loss grows with both how coarsely things are rounded and the performance of the forecaster. Even if relatively finer coarsening makes things slightly worse, we may expect to miss it. This looks better to me on priors than these trends ‘hitting a wall’ at a given level of granularity (so I’d guess untrained forecasters are numerically worse if rounded to 0.1, even if their worse performance means there is less signal to be lost, which in turn makes this harder to detect as ‘statistically significant’).
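This type 2 error story can be sketched with a rough Monte Carlo toy model (my own construction, not the GJP data): forecasters report the true probability plus Gaussian noise (skill = less noise), and we compare Brier scores with and without rounding the reports. The numerical loss from rounding is real but small for fine grids, which is exactly the regime where significance tests in modest samples will miss it.

```python
import random

def brier(pairs):
    """Mean squared error between forecasts and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in pairs) / len(pairs)

def simulate(noise_sd, grid=None, n=200_000, seed=0):
    """Brier score for a forecaster reporting true p + Gaussian noise,
    optionally rounded to the nearest multiple of `grid`.
    A fixed seed means all conditions see identical questions and outcomes."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        p = rng.random()                          # true event probability
        f = min(1.0, max(0.0, rng.gauss(p, noise_sd)))
        if grid is not None:
            f = round(f / grid) * grid            # quantise the report
        outcome = 1.0 if rng.random() < p else 0.0
        pairs.append((f, outcome))
    return brier(pairs)

skilled = simulate(0.02)
# Rounding a skilled forecaster's reports numerically worsens the score,
# and coarser rounding worsens it more.
assert skilled < simulate(0.02, grid=0.10) < simulate(0.02, grid=0.33)
```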
I’d adduce other facts against it too. One is simply that superforecasters are prone to not give forecasts on a 5% scale, using intermediate values instead: given their good calibration, you’d expect them to iron out this Brier-score-costly jitter (also, this would be one of the few things they are doing worse than regular forecasters). You’d also expect discretization in things like their calibration curve (e.g. events they say happen 12% of the time in fact happen 10% of the time, whilst events they say happen 13% of the time in fact happen 15% of the time), or other derived figures like ROC.
This is ironically foxy, so I wouldn’t be shocked for it to be slain by the numerical data. But I’d bet at good odds (north of 3:1) on things like: “Typically, for ‘superforecasts’ of X%, these events happened more frequently than those forecast at (X-1)%, (X-2)%, etc.”
On-site image hosting for posts/comments? This is mostly a minor QoL benefit, and maybe there would be challenges with storage. Another benefit would be that images would not vanish if their original source does.