Postdoc in statistics. Three kids, two cats, one wife. I write about statistics, EA, psychometrics, and other things at my blog.
Jonas Moss
FYI: I wrote a post about the statistics used in pairwise comparison experiments of the sort used in this post.
Estimating value from pairwise comparisons
Thanks a lot!
I will probably try to write a paper in the future but have some other stuff to finish first. Even though the concepts and mathematics are worked out, rewriting this post into a journal article would probably take a month. Thanks for the encouraging words though =)
That said, if anyone would like to write a paper about this topic together with me, please get in touch.
That’s sufficient information to calculate the conditional prediction curves I’m proposing. What you need is $P(X \leq t \mid X > s)$. If you have $P(X \leq t)$ and $P(X \leq s)$, which you can find by integrating the density for “when will X happen”, you can calculate

$$P(X \leq t \mid X > s) = \frac{P(X \leq t) - P(X \leq s)}{1 - P(X \leq s)}.$$
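Here’s a minimal sketch in Python; the log-normal and its parameters are stand-ins I made up for illustration, not anything from the post:

```python
from scipy.stats import lognorm

# Hypothetical density for "when will X happen": log-normal with median 10.
X = lognorm(s=0.5, scale=10)

def conditional_prediction(t, s):
    """P(X <= t | X hasn't happened by time s) = (F(t) - F(s)) / (1 - F(s))."""
    return (X.cdf(t) - X.cdf(s)) / (1 - X.cdf(s))

print(conditional_prediction(t=15, s=5))
```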
$i$ and $j$ are indices for the causes. I wrote the joint probability of causes $i$ and $j$ because you don’t have to assume that cause $i$ and cause $j$ are independent for the math to work. But everything else will have to be independent.
Maybe the uncertainties shouldn’t be independent, but often they will be. Our uncertainty about the probability of AI doom is probably not related to our uncertainty about the probability of pandemic doom, for instance.
If the probability of extinction by cause $i$ is $p_i$ and the probability reduction for that cause is $r_i$, the probability of extinction becomes $1 - (1 - p_i(1 - r_i))\prod_{j \neq i}(1 - p_j)$ if you choose to focus on cause $i$.
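A quick numerical illustration of that formula, with made-up numbers:

```python
import numpy as np

# Made-up numbers: three independent causes of extinction, and a 50%
# risk reduction r for the single cause you choose to focus on.
p = np.array([0.10, 0.05, 0.02])
r = 0.5

def extinction_prob(p, i, r):
    """1 - (1 - p_i * (1 - r)) * prod_{j != i} (1 - p_j)."""
    survive = 1 - p
    survive[i] = 1 - p[i] * (1 - r)
    return 1 - survive.prod()

for i in range(len(p)):
    print(i, round(extinction_prob(p, i, r), 4))
```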
A model about the effect of total existential risk on career choice
Thank you!
I think the integral will converge when we use bounded scoring rules. The integral can be rewritten to have its integration limits finite with probability one, as it in practice integrates from $0$ to the event time $T$, which should be assumed to be finite with probability 1. (The score is $0$ from $T$ to $\infty$, since the prediction equals the outcome at any time after $T$. I might not have been sufficiently clear about that though.)
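Spelled out (in my notation, with $C$ a bound on the score $S_t$): if $|S_t| \leq C$ for all $t$ and $S_t = 0$ for $t > T$, then

$$\left|\int_0^\infty S_t \, dt\right| = \left|\int_0^T S_t \, dt\right| \leq CT < \infty$$

with probability one whenever $T$ is finite with probability one.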
Interesting post! Two comments:
You claim that Rwanda is a concrete example of a case where peacebuilding could have averted the conflict, but it’s not obvious to me how.
More concrete examples would’ve helped too. What do you think about South Sudan for instance?
We’ll probably have to differentiate between “ok” and “excellent” peacebuilding efforts, but it’s not clear whether doing so is easy enough to make this a high-impact area.
I agree the argument doesn’t work, but there are at least two arguments for investing in charities with sub-optimal expected values that critically depend on time.
- Going bust. Suppose you have two charity investments $A$ and $B$ with expected values $E[A]$ and $E[B]$. Here $E[A] > E[B]$, but there’s a potential for $E[B] > E[A]$ in the future, for instance since you receive better information about the charities. If you invest once, investing everything in $A$ is the correct answer since $E[A] > E[B]$. Now suppose that each time you don’t invest in $B$, it has a chance of going bust. Then, if you invest more than once, it would be best to invest something in $B$ if the probability of going bust is high enough and $E[B] > E[A]$ happens with a sufficiently high probability. (A simulation sketch follows below.)
- Signaling effects. Not investing in the charity may signal to charity entrepreneurs that there is nothing to gain by starting a new charity similar to $B$, thus limiting your future pool of potential investments. I can imagine this being especially important if your calculation of the expected value $E[B]$ is contentious, or if $E[B]$ has high epistemic uncertainty.
Edit: I think the “going bust” example is similar in spirit to the Kelly criterion, so I suppose you might say the argument does work.
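Here’s a minimal Monte Carlo sketch of the going-bust argument. All the numbers are made up: two giving rounds, charity $A$ worth a known 1.0 per dollar, and a charity $B$ that might turn out to be worth 2.0 per dollar.

```python
import numpy as np

rng = np.random.default_rng(313)
n = 100_000

# B looks like 0.9 per dollar now, but before round 2 you learn its
# true value, which is 2.0 with probability p_good. If B gets nothing
# in round 1, it goes bust with probability q before round 2.
p_good, q = 0.3, 0.8
value_B = np.where(rng.random(n) < p_good, 2.0, 0.9)

# Strategy 1: all-in on A in round 1.
bust = rng.random(n) < q
round_2 = np.where(bust, 1.0, np.maximum(1.0, value_B))
total_all_in = 1.0 + round_2

# Strategy 2: give a small amount eps to B in round 1 to keep it alive.
eps = 0.1
total_hedge = (1 - eps) * 1.0 + eps * 0.9 + np.maximum(1.0, value_B)

print(total_all_in.mean())  # about 2.06
print(total_hedge.mean())   # about 2.29: the hedge wins here
```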
Updating on the passage of time and conditional prediction curves
Here’s a rough sketch of how we could, potentially, think about anthropic problems. Let $P_t$ be a sequence of true, bird’s-eye view probability measures and $Q_t$ your own measures, trying to mimic $P_t$ as closely as possible. These measures aren’t defined on the same sigma-algebra. The sequence of true measures is defined on some original sigma-algebra $\mathcal{F}$, but your measure is defined only on the trace sigma-algebra $\mathcal{F}_B = \{A \cap B \mid A \in \mathcal{F}\}$, where $B$ is the event you observe (a blue sky, say).
Now, the best-known probability measure defined on this set is the conditional probability $P_t(\cdot \mid B) = P_t(\cdot \cap B)/P_t(B)$. This is, in a sense, the probability measure that most closely mimics $P_t$. On the other hand, the measure that mimics $P_t$ most closely is the restriction $P_t(\cdot \cap B)$, hands down. This measure has a problem though, namely that its total mass is $P_t(B) \leq 1$, hence it isn’t a probability measure anymore.
I think the main reason why I intuitively want to condition on the color of the sky is that I want to work with proper probability measures, not just measures bounded by 0 and 1. (That’s why I’m talking about, e.g., being “uncomfortable pretending we could have observed non-existence”.) But your end goal is to have the best measure on the data you can actually observe, taking into account possibilities you can’t observe. This naturally leads us to $P_t(\cdot \cap B)$ instead of $P_t(\cdot \mid B)$.
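A toy discrete illustration of the difference (the numbers are mine):

```python
from fractions import Fraction

# Four equally likely outcomes; B = {0, 1, 2} is the observed event
# (a blue sky, say).
P = {w: Fraction(1, 4) for w in range(4)}
B = {0, 1, 2}

restriction = {w: P[w] for w in B}                         # P(. ∩ B)
conditional = {w: P[w] / sum(P[v] for v in B) for w in B}  # P(. | B)

print(sum(restriction.values()))  # 3/4 -- not a probability measure
print(sum(conditional.values()))  # 1 -- a probability measure
```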
The roulette example might get to the heart of the problem with the worm’s-eye view! From the worm’s-eye view, the sky will always be blue, so the probability of a blue sky equals $1$, making it impossible to deal with problems where the sky might turn green in the future.
In the roulette example, we’re effectively dealing with an expected utility problem where we condition on existence when learning about the probability, but not when we act. That looks incoherent to me; we can’t condition and uncondition on an event willy-nilly: either we live in a world where the event must be true, or we don’t. So yeah, it seems like you’re right, and we’re effectively treating existence as a certainty when looking at the problem from the worm’s-eye view.
As I see it, this strongly suggests we should take the bird’s-eye view, as you proposed, and not the worm’s-eye view. Or something else entirely; I’m still uncomfortable pretending we could have observed non-existence.
I wouldn’t say you treat existence as certainty, as you could certainly be dead, but you have to condition on it when you’re alive. You have to condition on it since you will never find yourself outside the space of existence (or blue skies!) in anthropic problems. And that’s the purpose / meaning of conditioning: you restrict your probability space to the subset of basic events you can possibly see.
Then again, there might be nothing very special about existence here. Let’s revisit the green sky problem, but consider it from a slightly different point of view. Instead of living in the world with a blue or a green sky, imagine yourself living outside of that whole universe. I promise to give you a sample of a world, with registered catastrophes and all, but I will not show you a world with a green sky (i.e., I will sample worlds until the sky turns out blue). In this case, the math is clear. You should condition on the sky being blue. Is there a relevant difference between the existence scenario and this scenario?
Maybe there is? You are not guaranteed to see a world at all in the “existence” scenario, as you will not exist if the world turns out to be a green-sky world, but you are guaranteed an observation in the “outside view” scenario. Does this matter though? I don’t think it does, as you can’t do an analysis either way if you’re dead, but I might be wrong. Maybe this is where our disagreement lies?
I don’t find the Russian roulette objection persuasive at all. Intuition shouldn’t be trusted in probability, as e.g. the Monty Hall problem tells us, and least of all in confusing anthropic problems. We should focus on getting the definitions, concepts, and math right without stopping to think about how intuitive different solutions are. (By the way, I don’t even find the Russian roulette experiment weird or counterintuitive. I find it intuitive and obvious. Strange? Maybe not. Philosophical intuitions aren’t as widely shared as one would believe.)
> “The fine tuning of the cosmological constants for the existence of life is (Bayesian) evidence of a multiverse.”
> My impression is that this statement is generally accepted by people who engage in anthropic reasoning, but you can’t explain it if you treat existence as a certainty. If existence is never surprising, then the fine tuning of cosmological constants for life cannot be evidence for anything.
I don’t know if that’s true, though it might be. I suppose the fine-tuning problem could be sufficiently different from this one to warrant its own analysis.
“The BIG PROBLEM with PhDs (at least in my opinion) is that you can learn most of these skills in other settings as well but with less suffering.” Could you elaborate on the suffering part?
I’ve never used Squiggle, but I imagine its main benefits are ease of use and transparency. Consider the line
> transfer_efficiency = 0.75 to 0.9
in the Squiggle doc. In Numpy, you’d most likely have to select the number of samples, initiate an rng object (at least if you do as Numpy recommends), transform the (0.05, 0.95) quantiles 0.75 and 0.9 into the log-normal’s mean and sigma, call the log-normal random generator and store the samples in an array, then call the appropriate plot function. Most of these steps are minor nuisances, except for the transformation of quantiles, which might be beyond the analyst’s skill level to do efficiently.
Here’s my replication in Python, which was kind of a chore to make… All of this can be done in one line in Squiggle.
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

rng = np.random.default_rng(313)
n = 10000

# Translate the (0.05, 0.95) quantiles into the log-normal's mean and sigma.
a = np.log(0.75)
b = np.log(0.9)
k1 = st.norm.ppf(0.05)
k2 = st.norm.ppf(0.95)
sigma = (b - a) / (k2 - k1)
mean = b - sigma * k2

transfer_efficiency = rng.lognormal(mean=mean, sigma=sigma, size=n)

x = np.linspace(0.7, 1, 100)
# Scipy's parameterization of the log-normal is stupid. Cost me another
# 5 minutes to figure out how to do this one.
plt.plot(x, st.lognorm.pdf(x, sigma, scale=np.exp(mean)))
plt.show()

# It's prudent to check if I've done the calculations correctly too...
np.quantile(transfer_efficiency, [0.05, 0.95])  # e.g. array([0.75052923, 0.90200089])
Edit: I don’t endorse the arguments of this post anymore!
Your example with the sky turning green is illuminating, as it shows there is nothing super special about the event “the observer exists” in anthropic problems (at least some of them). But I don’t think the rest of your analysis is likely to be correct, as you’re looking at the wrong likelihood from the start.
In the anthropic shadow problem, as in most selection problems, we are dealing with two likelihoods.
The first is the likelihood from the bird’s-eye view. This is the ideal likelihood, with no selection at all, and the starting point of any analysis. In our case, the bird’s-eye view likelihood at the time $t$ is

$$p(n_t, s_t \mid \theta),$$

where $s_t$ equals $1$ if the sky has turned green within time $t$ and $n_t$ is the number of catastrophic events up to time $t$, and $\theta$ is some parameter (corresponding to the parameter in your post). From the bird’s-eye view, we observe every $(n_t, s_t)$ regardless of the outcome of $s_t$, and your Bayesian analysis is correct. But we do not have the bird’s-eye view, as we only observe the $n_t$s associated with $s_t = 0$!
The second likelihood is from the worm’s-eye view. To make your green-and-blue sky analogy truly analogous to the anthropic shadow, you will have to take into account that you will never be in a world with a green sky. In our case, we could suppose that worms cannot live in a world with a green sky, making $s_t = 0$ a certainty. That entails conditioning on the event $s_t = 0$ in the likelihood above, yielding the conditional likelihood $p(n_t \mid s_t = 0, \theta)$.
The likelihood from the bird’s-eye view and the likelihood from the worm’s-eye view are not the same; they do not even have the same signature. We find that the worm’s-eye view likelihood is

$$p(n_t \mid s_t = 0, \theta) = \frac{(1-q)^{n_t} \, p(n_t \mid \theta)}{\sum_m (1-q)^m \, p(m \mid \theta)},$$

where $q$ is the (independent) probability of the sky turning green whenever a catastrophic event occurs.
The posterior from the bird’s-eye view is

$$\pi(\theta \mid n_t, s_t = 0) \propto (1-q)^{n_t} \, p(n_t \mid \theta) \, \pi(\theta) \propto p(n_t \mid \theta) \, \pi(\theta),$$

and is independent of $q$, as you said. However, the posterior from the worm’s-eye view is

$$\pi(\theta \mid n_t, s_t = 0) \propto \frac{(1-q)^{n_t} \, p(n_t \mid \theta)}{\sum_m (1-q)^m \, p(m \mid \theta)} \, \pi(\theta).$$

As you can see, the factor involving $q$ can’t be canceled out, as the normalizing sum depends on $\theta$.
By the way, the likelihood proportional to $(1-q)^{n_t} \, p(n_t \mid \theta)$ is not always hard to work with. If we assume that $n_t$ is binomial with success probability $\theta$, one can use the binomial theorem to show that the integrating constant is $(1 - q\theta)^N$, yielding the normalized pmf of a binomial with success probability $(1-q)\theta/(1 - q\theta)$.
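A quick numerical check of that binomial claim, with made-up parameter values:

```python
import numpy as np
from scipy.stats import binom

# Conditioning a Binomial(N, theta) catastrophe count on "the sky is
# still blue", where each catastrophe independently turns the sky
# green with probability q.
N, theta, q = 20, 0.3, 0.4
m = np.arange(N + 1)

unnormalized = (1 - q) ** m * binom.pmf(m, N, theta)
print(np.isclose(unnormalized.sum(), (1 - q * theta) ** N))  # True
print(np.allclose(unnormalized / unnormalized.sum(),
                  binom.pmf(m, N, (1 - q) * theta / (1 - q * theta))))  # True
```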
> I am undecided on the best approach to take going forward as I can see good arguments on both sides. Ranking interventions in terms of cost-effectiveness makes ESH’s work more unique and practical.
Do both? And include brave, precise practical recommendations. Ranking solutions in terms of cost-effectiveness is riskier for you (you will be criticized more), but probably the most helpful thing for most of your audience. But effect sizes are nice to know too.
One of the greatest barriers to starting an important transformation is the practicalities (which one of the 20+ group sessions at the gym should I sign up for?). Choice paralysis makes everything harder; it makes it hard just to start. For instance, I think that you bravely taking a stand on questions such as which group session I should go to (Zumba? Cardio combat? Indoor running?) would help me a lot.
I like the following quote of Karolina Sarek:
> What about a situation where your audience really wants an answer? Maybe they’re a funder who is thinking about funding a new project or organization. If you, as the researcher, do not draw conclusions from your own research, in a way you’re passing this responsibility through to your audience. And very often, the audience has less knowledge or time to explore all of the nuances of your research as a whole, and therefore could draw worse conclusions compared to those that you would.

So yeah, you’re passing on the cost-effectiveness research to people who know much less about the subject than you.
Scoring scientific fields
Epistemic Institutions
Some fields of science are uncontroversially more reliable than others. Physics is more reliable than theoretical sociology, for example. But other fields aren’t that easy to score. Should you believe the claims of a random sleep research paper? Or a paper from personality psychology? Efficacy is just as important, as a scientific field with low efficacy is probably not worth engaging with at all.
A scientific field can be evaluated by giving it a score along one or more dimensions, where a lower score indicates the field might not be worth taking seriously. Right now, people score fields of science informally. For instance, it is common to be skeptical of results from social psychology due to the replication crisis. Claims of nutrition scientists are often ignored due to their over-reliance on observational studies. If the field hasn’t been well investigated, the consumer of the scientific literature is on his own.

Scoring can be based on measurable factors such as
community norms in the field,
degree of p-hacking and publication bias,
reliance on observational studies over experimentation,
amount of “skin in the game”,
open data and open code,
how prestige-driven it is.
Scoring of the overall quality of a field serves multiple purposes.
A low score can dissuade people from taking the field seriously, potentially saving lots of time and money.
The scores can be used informally when forming an opinion. More formally, they can be used as input into other methods, e.g. to correct p-values when reading a paper.
If successful, the scores can incite reform in the poorly performing subfields.
The scores can be used as input by other EA organizations such as 80,000 Hours.
Thank you for telling me about this! In economics, the discrete choice model is used to estimate a scale-free utility function in a similar way. It is used in health research for estimating QALYs, among other things; see e.g. this review paper.
But discrete choice / the Schulze method should probably not be used by themselves, as they cannot give us information about scale, only ordering. A possibility, which I find promising, is to combine the methods. Say that I have ten items $I_0, \dots, I_9$ I want you to rate. Then I can ask “Do you prefer $I_i$ to $I_j$?” for some pairs and “How many times better is $I_i$ than $I_j$?” for other pairs, hopefully in an optimal way. Then we would lessen the cognitive load on the study participants and make it easier to scale this kind of thing up.
(The cognitive load of using distributions is the main reason why I’m skeptical about having participants use them in place of point estimates when doing pairwise comparisons.)