But someone else with slightly different (but pretty arbitrary) precise probabilities could get the opposite sign and still a huge expected impact. It would seem bad to bet a lot on one side if the sign and magnitude of the expected value are sensitive to arbitrarily chosen numbers.
I wonder if the problem here is a failure to disentangle “what is our best estimate currently” and “what do we expect is the value of doing further analysis, given how fragile our current estimates are.”
If my research agent Alice said “I think there’s a 50% chance that doing X leads to +2,000,000,000 utils and a 50% chance that doing X leads to −1,000,000,000 utils (and the same probabilities that not doing X leads to the opposite outcomes), but these probability estimates are currently just pure 1/n uncertainty; such estimates could easily shift over time pending further analysis” I would probably say “wow I don’t like the uncertainty here, can we maybe do further analysis to make sure we’re right before choosing to do X?”
In other words, the concern seems to be that you don’t want to misrepresent the potential for new information to change your estimates.
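To make the “value of new information” point concrete, here is a rough sketch of the arithmetic (this ignores the ambiguous “opposite outcomes” clause and simply treats “not doing X” as 0 utils, which is my simplification rather than anything Alice stated):

\[
\mathrm{EV}(\text{do } X \text{ now}) = 0.5\,(+2\times10^9) + 0.5\,(-1\times10^9) = +5\times10^8,
\]
\[
\mathrm{EV}(\text{resolve the uncertainty, then act}) = 0.5\,(+2\times10^9) + 0.5\,(0) = +1\times10^9,
\]

so under these assumptions resolving the uncertainty before acting is itself worth about \(5\times10^8\) utils in expectation—exactly the kind of consideration a bare “50-50” can obscure.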
However, suppose Alice actually says “… and no matter how much more research effort we apply (within real-world constraints) we are confident that our probability estimates will not meaningfully change.” At that point, there is no chance of improving, so you are stuck with pure 1/n ignorance.
Perhaps I’m just unclear what it would even mean to be in a situation where you “can’t” put a probability estimate on things that does as good as or better than pure 1/n ignorance. I can understand the claim that in some scenarios you perhaps “shouldn’t” because it risks miscommunicating about the potential value of trying to improve your probability estimates, but that doesn’t seem like an insurmountable problem (i.e., we could develop better terms and communication norms for this)?
(and the same probabilities that not doing X leads to the opposite outcomes)
I’m not sure exactly what you mean by this, and I expect this will make it more complicated to think about than just giving utility differences with the counterfactual.
The idea of sensitivity to new information has been called credal resilience/credal fragility, but the problem I’m concerned with is having justified credences. I would often find it deeply unsatisfying (i.e. it seems unjustifiable) to represent my beliefs with a single probability distribution; I’d feel like I’m pulling numbers out of my ass, and I don’t think we should base important decisions on such numbers. So, I’d often rather give ranges for my probabilities. You literally can give single distributions/precise probabilities, but it seems unjustifiable, overconfident and silly.
If you haven’t already, I’d recommend reading the illustrative example here. I’d say it’s not actually justifiable to assign precisely 50-50 in that case or in almost any realistic situation that actually matters, because:
if you actually tried to build a model, it would be extraordinarily unlikely for you to get 50-50 unless you specifically pick your model parameters to get that result (which would be motivated reasoning and kind of defeat the purpose of building the model in the first place) or round the results, given that the evidence isn’t symmetric and you’d have multiple continuous parameters.
if you thought 50-50 was a good estimate before the evidential sweetening, then you can’t use 50-50 after, even though it seems just as appropriate for it. Furthermore, if you would have used 50-50 if originally presented with the sweetened information, then your beliefs depend on the timing/order in which you become aware of evidence (say you just miscounted witnesses the first time), which should be irrelevant and is incompatible with Bayesian rationality (unless you have specific reasons for dependence on the timing/order).
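(To spell out the order-independence point: with a fixed prior, Bayesian conditioning on the same total evidence gives the same posterior regardless of order,

\[
P(H \mid E_1, E_2) \;=\; \frac{P(E_1, E_2 \mid H)\,P(H)}{P(E_1, E_2)} \;=\; P(H \mid E_2, E_1),
\]

so the order in which you happen to notice or count the witnesses shouldn’t matter unless the order itself carries evidence.)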
For the same reasons, in almost any realistic situation that actually matters, Alice in your example could not justifiably get 50-50. And in general, you shouldn’t get numbers with short exact decimal or fractional representations.
So, say in your example, it comes out 51.28… to 48.72..., but could have gone the other way under different reasonable parameter assignments; those are just the ones Alice happened to pick at that particular time. Maybe she also tells you it seems pretty arbitrary, and she could imagine having come up with the opposite conclusion and probabilities much further from 50-50 in each direction. And that she doesn’t have a best guess, because, again, it seems too arbitrary.
How would you respond if there isn’t enough time to investigate further? Suppose you could instead support something that seems cost-effective without being so sensitive to pretty arbitrary parameter assignments, but that isn’t nearly as cost-effective (in expectation) as Alice’s intervention or an intervention doing the opposite.
Also imagine Bob gets around 47-53 and agrees with Alice about the arbitrariness and reasonable ranges. Furthermore, you can’t weight Alice’s and Bob’s distributions evenly, because Alice has slightly more experience as a researcher and/or a slightly better forecasting score, so you should give her estimate more weight.
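A minimal sketch of what unequal weighting of Alice and Bob might look like as a linear opinion pool (all numbers here, including the weights, are hypothetical, and the reported ranges are ignored):

```python
# Crude linear opinion pool over Alice's and Bob's point estimates.
alice_p = 0.5128   # Alice's probability that the intervention is net positive
bob_p = 0.47       # Bob's probability for the same proposition
w_alice, w_bob = 0.55, 0.45   # hypothetical weights reflecting Alice's slight edge

pooled = w_alice * alice_p + w_bob * bob_p
print(f"pooled probability: {pooled:.4f}")   # ~0.4935, i.e. just below 0.5
```

Note that with these (arbitrary) weights the pooled estimate lands just below 0.5, so the sign of the expected value flips relative to Alice’s estimate alone—which is the fragility being pointed at.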
Great to see people digging into the crucial assumptions! In my view, @MichaelStJules makes great counterpoints to @Harrison Durland’s objection. I would like to add two further points.
The notion of 1/n probability kind of breaks down if you look at an infinite number of scenarios or uncertainty values (if you talk about one particular uncertain variable). For example, let’s take population growth in economic models. Depending on your model and potential sensitivities to initial conditions, the resolution of this variable matters. For some context, current population growth is at 1.1% per annum. But we might be uncertain about how this will develop in the future. Maybe 1.0%? Maybe 1.2%? Maybe a resolution of 0.1% is enough. In that case, what range would you feel comfortable putting a probability distribution over? [0.6, 1.5] maybe? So then n=10, and with a uniform distribution you get a 10% probability of 1.4% population growth? But what if minor changes are important? You end up with an infinite number of potential values – even if you restrict the range of possible values. How do we square this situation with the 1/n approach? I’m uncertain.
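(One standard way to square the continuous case with the 1/n idea is to replace the finite 1/n weights with a uniform density over the chosen range—though whether that density is a justified representation of ignorance is exactly what is in dispute:

\[
f(g) = \frac{1}{1.5-0.6} \;\;\text{for } g \in [0.6, 1.5]\ (\%\ \text{per annum}),
\qquad
P(1.35 \le g \le 1.45) = \frac{0.1}{0.9} \approx 0.11,
\]

so single points get probability zero while intervals still get well-defined probabilities.)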
My other point is more of a disclaimer. I’m not advocating for throwing out expected-utility thinking completely. And I’m still a Bayesian at heart (which sometimes means that I pull numbers out of my behind^^). My point is that it is sometimes problematic to use a model, run it in a few configurations (i.e. for a few scenarios), calculate a weighted average of the outcomes, and call it a day. This is especially problematic if we look at complex systems and models in which non-linearities compound quickly. If you have 10 uncertainty variables, each of them of type float with huge ranges of plausible values, how do you decide what scenarios (points in uncertainty space) to run? A posteriori weighted averaging likely fails to capture the complex interactions and the outcome distributions. What I’m trying to say is that I’m still going to assume probabilities and probability distributions in daily life. And I will still conduct expected-utility calculations. However, when things get more complex (e.g. in model land), I might advocate for more caution.
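As a sketch of what “deciding which points in uncertainty space to run” can look like, here is plain Monte Carlo sampling over a hypothetical 10-dimensional uncertainty box (real DMDU work typically uses space-filling designs such as Latin hypercube sampling, and an actual simulation model instead of the toy function below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical plausible ranges for 10 uncertain float-valued inputs.
lows = np.array([0.6, 0.0, 1.0, 0.1, 0.0, 0.5, 10.0, 0.0, 0.01, 0.0])
highs = np.array([1.5, 5.0, 3.0, 0.9, 2.0, 1.5, 50.0, 1.0, 0.10, 4.0])

def model(x):
    # Stand-in for the actual simulation model: a made-up non-linear
    # interaction between a few of the inputs.
    return x[0] * x[1] ** 2 - np.sin(x[2]) * x[3]

# Sample 10,000 scenarios uniformly from the box and inspect the whole
# outcome distribution instead of collapsing it to one weighted average.
scenarios = rng.uniform(lows, highs, size=(10_000, 10))
outcomes = np.apply_along_axis(model, 1, scenarios)
print(outcomes.min(), np.median(outcomes), outcomes.max())
```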
I’m not sure I understand the concern with (1). I would first say that I think infinities are occasionally thrown around too lightly, and in this example it seems like it might be unjustified to say there are infinite possible values, especially since we are talking about units of people/population (which is composed of finite matter and discrete units). Moreover, the actual impact of a difference between 1.0000000000002% and 1.00000000000001% seems unimportant in most cases for practical decision-making considerations—which, notably, are not made with infinite computation, data, and action capabilities—even if it is theoretically possible to have such a difference. If such a seemingly tiny difference is actually meaningful (e.g., it flips the sign), however, then that might update you towards beliefs like “within analytical constraints the current analysis points to [balancing out |OR| one side being favored].” In other words, perhaps not pure uncertainty, since now you plausibly have some information that leans one way or another (with some caveats I won’t get into).
I think I would agree to some extent with (2). My main concern is mostly that I see people write things that (seemingly) make it sound like you just logically can’t do expected utility calculations when you face something like pure uncertainty; you just logically have to put a “?” in your models instead of “1/n,” which just breaks the whole model. Sometimes (like the examples I mentioned), the rest of the model is fine!
I contend that you can use “1/n”; it’s more just a matter of “should you do so, given that you run the risk of misleading yourself or your audience towards X, Y, and Z failure modes (e.g., downplaying the value of doing further analysis, putting too many eggs in one basket/ignoring non-linear utility functions, creating bad epistemic cultures which disincentivize people from speaking out against overconfidence, …).”
In other words, I would prefer to see clearer disentangling of epistemic/logical claims from strategic/communication claims.
“While useful, even models that produced a perfect probability density function for precisely selected outcomes would not prove sufficient to answer such questions. Nor are they necessary.”
I recommend reading DMDU, since it goes into much more detail than I can do justice to here.
Yet I believe you are focusing heavily on whether the distribution exists, when the claim should be restated.
Deep uncertainty implies that the range of reasonable distributions allows so many reasonable decisions that attempting to “agree on assumptions then act” is a poor frame. Instead, you want to explore all reasonable distributions then “agree on decisions”.
If you are in a state where reasonable people are producing meaningfully different decisions (i.e., a different sign, in your convention above) based on the distribution and weighting terms, then it becomes more useful to focus on the timeline and tradeoffs rather than on the current understanding of the distribution:
Explore the largest range of scenarios (in the 1/n case each time you add another plausible scenario it changes all scenario weights)
Understand the sequence of actions/information released
Identify actions that won’t change with new info
Identify information that will meaningfully change your decision
Identify actions that should follow given the new information
Quantify the tradeoffs forced by decisions
This results in building an adaptive policy pathway, rather than making a single decision or even choosing a model framework.
Value is derived from expanding the suite of policies, scenarios, and objectives, or from illustrating the tradeoffs between objectives and how to minimize those tradeoffs via sequencing.
This is in contrast to emphasizing the optimal distribution (or worse, a point estimate) conditional on all available data, since that distribution is still subject to change over time and is evaluated under different weights by different stakeholders.
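A toy illustration of “agree on decisions” versus “agree on assumptions” (the payoff matrix, weights, and policies are all hypothetical, and minimax regret is only one of several robustness criteria used in this literature):

```python
import numpy as np

# Rows: candidate policies; columns: scenarios (all payoffs hypothetical).
payoffs = np.array([
    [10.0,   9.0,  -8.0],   # policy A
    [ 6.0,   6.0,   5.0],   # policy B
    [12.0, -10.0,   2.0],   # policy C
])

# "Agree on assumptions": pick scenario weights, rank policies by expected value.
weights = np.array([0.5, 0.3, 0.2])      # contestable
expected_value = payoffs @ weights       # A: 6.1, B: 5.8, C: 3.4

# "Agree on decisions": compare regret across all scenarios instead.
regret = payoffs.max(axis=0) - payoffs   # shortfall vs. the best policy in each scenario
worst_case_regret = regret.max(axis=1)   # A: 13, B: 6, C: 19

print("best by expected value :", "ABC"[int(expected_value.argmax())])     # A
print("best by minimax regret :", "ABC"[int(worst_case_regret.argmin())])  # B
```

The point is not that minimax regret is the right criterion, only that the question shifts from “which scenario weights are correct?” to “which policy are we least likely to regret across the weightings reasonable people hold?”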
I’m not sure exactly what you mean by this, and I expect this will make it more complicated to think about than just giving utility differences with the counterfactual.
I just added this in hastily to address any objection that says something like “What if I’m risk averse and prefer a 100% chance of getting 0 utility instead of an x% chance of getting very negative utility.” It would probably have been better to just say something like “ignore risk aversion and non-linear utility.”
I would often find it deeply unsatisfying (i.e. it seems unjustifiable) to represent my beliefs with a single probability distribution; I’d feel like I’m pulling numbers out of my ass, and I don’t think we should base important decisions on such numbers. So, I’d often rather give ranges for my probabilities. You literally can give single distributions/precise probabilities, but it seems unjustifiable, overconfident and silly.
I think this boils down to my point about the fear of miscommunicating—the questions like “how should I communicate my findings,” “what do my findings say about doing further analysis,” and “what are my findings’ current best-guess estimates.” If you think it goes beyond that—that it is actually “intrinsically incorrect-as-written”—I could write up a longer reply elaborating on the following: I’d pose the question back at you and ask whether it’s really justified or optimal to include ambiguity-laden “ranges,” assuming there will be no miscommunication risks (e.g., nobody assumes “he said 57.61% so he must be very confident he’s right and doing more analysis won’t be useful”)? If you say “there’s a 1%-99% chance that a given coin will land on heads” because the coin is weighted but you don’t know whether it’s for heads or tails, how is this functionally any different from saying “my best guess is that on one flip the coin has a 50% chance of landing on heads”? (Again, I could elaborate further if needed)
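To spell out the coin arithmetic behind that question: for a single flip, if \(f(p)\) is your distribution over the coin’s bias and it is symmetric about 0.5 (you have no idea which way it’s weighted), then

\[
P(\text{heads}) = \int_0^1 p\, f(p)\, dp = \mathbb{E}[p] = 0.5,
\]

so for a one-off bet the “1%-99%” range and the precise “50%” recommend the same action; differences only start to bite with repeated flips, non-linear utilities, or the option to investigate the coin first.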
if you actually tried to build a model, it would be extraordinarily unlikely for you to get 50-50
Sure, I agree. But that doesn’t change the decision in the example I gave, at least when you leave it at “upon further investigation it’s actually about 51-49.” In either case, the expected benefit-cost ratio is still roughly around 2:1. When facing analytical constraints and for this purely theoretical case, it seems optimal to do the 1/n estimate rather than “NaN” or “” or “???” which breaks your whole model and prevents you from calculating anything, so long as you’re setting aside all miscommunication risks (which was the main point of my comment: to try to disentangle miscommunication and related risks from the ability to use 1/n probabilities as a default optimal). To paraphrase what I said for a different comment, in the real world maybe it is better to just throw a wrench in the whole model and say “dear principal: no, stop, we need to disengage autopilot and think longer.” But I’m not at the real world yet, because I want to make sure I am clear on why I see so many people say things like you can’t give probability estimates for pure uncertainty (when in reality it seems nothing is certain anyway and thus you can’t give 100.0% “true” point or range estimates for anything).
Perhaps I’m just unclear what it would even mean to be in a situation where you “can’t” put a probability estimate on things that does as good as or better than pure 1/n ignorance.
Suppose you think you might come up with new hypotheses in the future which will cause you to reevaluate how the existing evidence supports your current hypotheses. In this case probabilistically modelling the phenomenon doesn’t necessarily get you the right “value of further investigation” (because you’re not modelling hypothesis X), but you might still be well advised to hold off acting and investigate further—collecting more data might even be what leads to you thinking of the new hypothesis, leading to a “non Bayesian update”. That said, I think you could separately estimate the probability of a revision of this type.
Similarly, you might discover a new outcome that’s important that you’d previously neglected to include in your models.
One more thing: because probability is difficult to work with, even if it is in principle compatible with adaptive plans, it might in practice tend to steer away from them.
In this case probabilistically modelling the phenomenon doesn’t necessarily get you the right “value of further investigation” (because you’re not modelling hypothesis X)
I basically agree (although it might provide a decent amount of information to this end), but this does not reject the idea that you can make a probability estimate equally or more accurate than pure 1/n uncertainty.
Ultimately, if you want to focus on “what is the expected value of doing further analyses to improve my probability estimates,” I say go for it. You often shouldn’t default to accepting pure 1/n ignorance. But I still can’t imagine a situation that truly matches “Level 4 or Level 5 Uncertainty,” where there is nothing as good as or better than pure 1/n ignorance. If you truly know absolutely and purely nothing about a probability distribution—which almost never happens—then it seems 1/n estimates will be the default optimal distribution, because anything else would require being able to offer supposedly-nonexistent information to justify that conclusion.
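This is essentially the maximum-entropy argument: with no information beyond “there are n mutually exclusive outcomes,” the uniform distribution is the unique entropy maximizer, so any departure from it implicitly claims information you said you didn’t have:

\[
\max_{p}\; -\sum_{i=1}^{n} p_i \log p_i
\quad \text{s.t.}\quad \sum_{i=1}^{n} p_i = 1,\ p_i \ge 0
\qquad\Longrightarrow\qquad p_i = \frac{1}{n}.
\]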
Ultimately, a better framing (to me) would seem like “if you find yourself at 1/n ignorance, you should be careful not to accept that as a legitimate probability estimate unless you are really rock solid confident it won’t improve.” No?
I think this question—whether it’s better to take 1/n probabilities (or maximum entropy distributions or whatever) or to adopt some “deep uncertainty” strategy—does not have an obvious answer
I actually think it probably (pending further objections) does have a somewhat straightforward answer with regards to the rather narrow, theoretical cases that I have in mind, which relate to the confusion I had which started this comment chain.
It’s hard to accurately convey the full degree of my caveats/specifications, but one simple example is something like “Suppose you are forced to choose whether to do X or nothing (Y). You are purely uncertain whether X will lead to outcome Great (Q), Good (P), or Bad (W), and there is guaranteed to be no way to get further information on this. However, you can safely assume that outcome Q is guaranteed to lead to +1,000 utils, P is guaranteed to lead to +500 utils, and W is guaranteed to lead to −500 utils. Doing nothing is guaranteed to lead to 0 utils. What should you do, assuming utils do not have non-linear effects?”
In this scenario, it seems very clear to me that a strategy of “do nothing” is inferior to doing X: even though you don’t know what the actual probabilities of Q, P, and W are, I don’t understand how the 1/n default will fail to work (across a sufficiently large number of 1/n cases). And when taking the 1/n estimate as a default, the expected utility is positive.
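Concretely, under the 1/3–1/3–1/3 default:

\[
\mathrm{EU}(X) = \tfrac{1}{3}(+1000) + \tfrac{1}{3}(+500) + \tfrac{1}{3}(-500) = +\tfrac{1000}{3} \approx +333 \;>\; 0 = \mathrm{EU}(Y).
\]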
Of course, outside of barebones theoretical examples (i.e., in the real world) I don’t think there is a simple, straightforward algorithm for deciding when to pursue more information vs. act on limited information with significant uncertainty.
Good point! I think this is also a matter of risk aversion. How severe is it to get to a state of −500 utils? If you are very risk-averse, it might be better to do nothing. But I cannot make such a blanket statement.
I’d like to emphasize at this point that the DMDU approach tries to avoid the following pattern:
test the performance of a set of policies for a set number of scenarios,
decide how likely each scenario is (this is the crux), and
calculate some weighted average for each policy.
Instead, we use DMDU to explore the full range of plausible scenarios and identify particularly vulnerable scenarios. We want to pay special attention to these scenarios and find optimal and robust solutions for them. In this way, we cover tail risks, which IMO is quite in line with efforts to mitigate GCRs, x-risks, and s-risks.
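A minimal sketch of the “identify particularly vulnerable scenarios” step, with a hypothetical two-input toy model (real scenario-discovery work uses methods like PRIM over many more uncertain inputs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical uncertainty box: two uncertain inputs, u1 and u2, each in [0, 1].
u = rng.uniform([0.0, 0.0], [1.0, 1.0], size=(5_000, 2))

def outcome_under_policy(u1, u2):
    # Stand-in for how a fixed candidate policy performs in scenario (u1, u2):
    # it fails (goes negative) when u1 and u2 are both large.
    return 1.0 - 3.0 * u1 * u2

values = outcome_under_policy(u[:, 0], u[:, 1])

# Crude scenario discovery: in which part of the box does the policy fail?
failing = values < 0.0
print(f"policy fails in {failing.mean():.0%} of sampled scenarios")
print("average failing scenario:", u[failing].mean(axis=0))   # both inputs high
```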
If you truly know absolutely and purely nothing about a probability distribution—which almost never happens
I would disagree with this particular statement. I’m not saying the opposite either. I think it’s reasonable in a lot of cases to assume some probability distributions. However, there are a lot of cases where we just do not know at all. E.g., take the space of possible minds. What’s our probability distribution over this space for the first AGI? I personally don’t know. Even looking at binary events: what’s our probability of AI x-risk this century? 10%? I find this widely used number implausible.
But I agree that we can try gathering more information to get more clarity on that. What is often done in DMDU analysis is that we figure out that some uncertainty variables don’t have much of an impact on our system anyway (so we fix those variables to some value), or we constrain their value ranges to focus on more relevant subspaces. The DMDU framework does not necessitate or advocate for total ignorance. I think there is room for an in-between.