Postdoc in statistics. Three kids, two cats, one wife. I write about statistics, EA, psychometrics, and other things at my blog
Jonas Moss
Updating on the passage of time and conditional prediction curves
Estimating value from pairwise comparisons
A peek at pairwise preference estimation in economics, marketing, and statistics
I’ve never used Squiggle, but I imagine its main benefits are ease of use and transparency. Consider the line
> transfer_efficiency = 0.75 to 0.9
in the Squiggle doc. In NumPy, you’d most likely have to select the number of samples, initiate an rng object (at least if you do as NumPy recommends), transform the (0.05, 0.95)-quantiles 0.75 and 0.9 into mean and sigma, call the log-normal random generator and store the results in an array, then call the appropriate plot function. Most of these steps are minor nuisances, except for the transformation of quantiles, which might be beyond the analyst’s skill level to do efficiently.
Here’s my replication in Python, which was kind of a chore to make… All of this can be done in one line in Squiggle.
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

rng = np.random.default_rng(313)
n = 10000

# Translate the (0.05, 0.95)-quantiles 0.75 and 0.9 into mean and sigma.
a = np.log(0.75)
b = np.log(0.9)
k1 = st.norm.ppf(0.05)
k2 = st.norm.ppf(0.95)
sigma = (b - a) / (k2 - k1)
mean = b - sigma * k2

transfer_efficiency = rng.lognormal(mean=mean, sigma=sigma, size=n)

x = np.linspace(0.7, 1, 100)
# Scipy’s parameterization of the log-normal is stupid; it wants the shape
# sigma and the scale exp(mean). Cost me another 5 minutes to figure out
# how to do this one.
plt.plot(x, st.lognorm.pdf(x, sigma, scale=np.exp(mean)))

# It’s prudent to check if I’ve done the calculations correctly too.
np.quantile(transfer_efficiency, [0.05, 0.95])  # close to [0.75, 0.9]
FYI: I wrote a post about the statistics used in pairwise comparison experiments of the sort used in this post.
A model about the effect of total existential risk on career choice
Great! Looking forward to reading this.
For those of us using ebook readers, there’s an .epub here https://www.smashwords.com/books/view/1134610 (Magnus, maybe add the link after the pdf?)
Thomas Hurka’s St Petersburg Paradox: Suppose you are offered a deal—you can press a button that has a 51% chance of creating a new world and doubling the total amount of utility, but a 49% chance of destroying the world and all utility in existence. If you want to maximise total expected utility, you ought to press the button—pressing the button has positive expected value. But the problem comes when you are asked whether you want to press the button again and again and again—at each point, the person trying to maximise expected utility ought to agree to press the button, but of course, eventually they will destroy everything.[2]
I have two gripes with this thought experiment. First, time is not modelled. Second, it’s left implicit why we should feel uneasy about the thought experiment. And that doesn’t work, due to highly variable philosophical intuitions. I honestly don’t feel uneasy about the thought experiment at all (only slightly annoyed). But maybe I would, had it been completely specified.
I can see two ways to add a time dimension to the problem. First, you could let all the presses be predetermined and happen in one go, which gets us into Satan’s apple territory. Second, you could have a 30-second pause between presses. But in that case, we would accumulate massive amounts of utility in a very short time; just the seconds in between presses would be invaluable! And who cares if the world ends in five minutes with probability close to 1 when every second it survives is so sweet? :p
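The tension is easy to make concrete with a quick calculation (my own sketch, using only the 51%/49% numbers from the quote): expected utility explodes with repeated presses, while the survival probability vanishes.

```python
# Expected utility grows with each press of the button, while the
# probability that the world survives shrinks towards zero.
# Numbers are from the thought experiment: 51% double, 49% destroy.
for n in [1, 10, 100]:
    expected_utility = (0.51 * 2) ** n  # relative to the starting utility
    survival_probability = 0.51 ** n
    print(n, expected_utility, survival_probability)
```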
Do I understand you correctly here?
Each agent has a computable partial preference ordering that decides if it prefers a to b.
We’d like this partial relation to be complete (i.e., defined for all pairs a, b) and transitive (i.e., a ≼ b and b ≼ c implies a ≼ c).
Now, if the relation is sufficiently non-trivial, it will be expensive to compute for some pairs a, b. So it’s better left undefined...?
If so, I can surely relate to that, as I often struggle to compute my preferences, even if they are theoretically complete. But it seems to me the relation is still defined; it might just not be practical to compute.
It’s also possible to think of it in this way: You start out with a partial preference ordering and need to calculate one of its transitive, complete extensions. But that is computationally difficult, and not unique either.
I’m unsure what these observations add to the discussion, though.
Thanks for your suggestions! Big fan of yours for many years, by the way. Mating Intelligence was the article collection that made me want to become an evolutionary psychologist (I ended up a statistician though, mostly due to its much safer career path).
Only now did I notice that I didn’t write in the post that these four points are just a summary; the meat of the post is in the link. I think I have explained these terms in the linked post, at least graded pairwise comparisons and discrete choice models. But yeah… I will modify the summary to use less technical jargon and provide an introduction.
I think it’s important to build more connections between EA approaches to value (e.g. in AI alignment) and existing behavioral sciences methods for studying values.
Yes, and also to academia in general. I honestly didn’t think about AI alignment when writing this post, but that could be one of the applications.
Sure, if your goal is to be a good writer! But, I’m not worried about that. I just want people to understand me.
I don’t understand the relevance of the Kelly criterion. The wikipedia page for the Kelly criterion states that “[t]he Kelly bet size is found by maximizing the expected value of the logarithm of wealth,” but that’s not relevant here, is it?
Edit: I don’t endorse the arguments of this post anymore!
Your example with the sky turning green is illuminating, as it shows there is nothing super special about the event “the observer exists” in anthropic problems (at least some of them). But I don’t think the rest of your analysis is likely to be correct, as you’re looking at the wrong likelihood from the start.
In the anthropic shadow problem, as in most selection problems, we are dealing with two likelihoods.
The first is the likelihood from the bird’s-eye view. This is the ideal likelihood, with no selection at all, and the starting point of any analysis. In our case, the bird’s-eye view likelihood at time t is

p(g, n | θ),

where g equals 1 if the sky has turned green within time t, n is the number of catastrophic events up to time t, and θ is some parameter (corresponding to the catastrophe probability in your post). From the bird’s-eye view, we observe every n regardless of the outcome of g, and your Bayesian analysis is correct. But we do not have the bird’s-eye view, as we only observe the n’s associated with g = 0!

The second likelihood is from the worm’s-eye view. To make your green-and-blue sky analogy truly analogous to the anthropic shadow, you will have to take into account that you will never be in a world with a green sky. In our case, we could suppose that worms cannot live in a world with a green sky, making g = 0 a certainty. That entails conditioning on the event g = 0 in the likelihood above, yielding the conditional likelihood p(n | g = 0, θ).

The likelihood from the bird’s-eye view and the likelihood from the worm’s-eye view are not the same; they do not even have the same signature. We find that the worm’s-eye view likelihood is

p(n | g = 0, θ) = (1 − q)^n p(n | θ) / Σ_m (1 − q)^m p(m | θ),

where q is the (independent) probability of the sky turning green whenever a catastrophic event occurs.

The posterior from the bird’s-eye view is

p(θ | g, n) ∝ p(g, n | θ) π(θ) ∝ p(n | θ) π(θ),

and is independent of q, as you said. However, the posterior from the worm’s-eye view is

p(θ | n, g = 0) ∝ (1 − q)^n p(n | θ) π(θ) / Σ_m (1 − q)^m p(m | θ).

As you can see, the integrating factor Σ_m (1 − q)^m p(m | θ) depends on θ and can’t be canceled out.

By the way, the likelihood proportional to (1 − q)^n p(n | θ) is not always hard to work with. If we assume that n is binomial with N trials and success probability θ, one can use the binomial theorem to show that the integrating constant is (1 − qθ)^N, yielding the normalized pmf C(N, n) (θ(1 − q))^n (1 − θ)^(N − n) / (1 − qθ)^N.
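As a sanity check on the binomial claim, here is a quick numerical verification (the values of N, θ, and q are arbitrary):

```python
import numpy as np
from scipy.stats import binom

# Check that sum_n C(N, n) theta^n (1 - theta)^(N - n) (1 - q)^n
# equals (1 - q * theta)^N, so the worm's-eye likelihood normalizes.
N, theta, q = 20, 0.3, 0.4  # arbitrary values
n = np.arange(N + 1)
unnormalized = binom.pmf(n, N, theta) * (1 - q) ** n
print(np.isclose(unnormalized.sum(), (1 - q * theta) ** N))  # True
```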
Satan cuts an apple into a countable infinity of slices and offers it to Eve, one piece at a time. Each slice has positive utility for Eve. If Eve eats only finitely many pieces, there is no difficulty; she simply enjoys her snack. If she eats infinitely many pieces, however, she is banished from Paradise. To keep things simple, we may assume that the pieces are numbered: in each time interval, the choice is Take piece n or Don’t take piece n. Furthermore, Eve can reject piece n, but take later pieces. Taking any countably infinite set leads to the bad outcome (banishment). Finally, regardless of whether or not she is banished, Eve gets to keep (and eat) her pieces of apple. Call this the original version of Satan’s apple.
We shall sometimes discuss a simplified version of Satan’s apple, different from the original version in two respects. First, Eve is banished only if she takes all the pieces. Second, once Eve refuses a piece, she cannot take any more pieces. These restrictions make Satan’s apple a close analogue to the two earlier puzzles.
Problem: When should Eve stop taking pieces?
Thank you for telling me about this! In economics, the discrete choice model is used to estimate a scale-free utility function in a similar way. It is used in health research for estimating QALYs, among other things; see e.g. this review paper.
But discrete choice / the Schulze method should probably not be used by themselves, as they cannot give us information about scale, only ordering. A possibility, which I find promising, is to combine the methods. Say that I have ten items I want you to rate. Then I can ask “Do you prefer a to b?” for some pairs and “How many times better is a than b?” for other pairs, hopefully in an optimal way. Then we would lessen the cognitive load of the study participants and make it easier to scale this kind of thing up.
(The cognitive load of using distributions is the main reason why I’m skeptical about having participants use them in place of point estimates when doing pairwise comparisons.)
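As an illustration of the ratio-question idea, here is a toy sketch (all items, pairs, and answers are made up) that recovers item values from noiseless “how many times better?” answers by least squares on the logs:

```python
import numpy as np

# Toy example: recover item values from ratio answers r_ij ~ v_i / v_j.
true_v = np.array([1.0, 2.0, 4.0])
pairs = [(0, 1), (1, 2), (0, 2)]
answers = [true_v[i] / true_v[j] for i, j in pairs]  # noiseless answers

# Solve log v_i - log v_j = log r_ij by least squares,
# fixing log v_0 = 0 since the scale is arbitrary.
A = np.zeros((len(pairs), len(true_v)))
for row, (i, j) in enumerate(pairs):
    A[row, i], A[row, j] = 1.0, -1.0
coef = np.linalg.lstsq(A[:, 1:], np.log(answers), rcond=None)[0]
v_hat = np.exp(np.concatenate([[0.0], coef]))
print(v_hat)  # recovers [1., 2., 4.]
```

With noisy answers, the same least-squares step gives an averaged estimate instead of an exact recovery.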
That’s sufficient information to calculate the conditional prediction curves I’m proposing. What you need is P(X ≤ t | X > s), the probability that X happens by time t given that it hasn’t happened by time s. If you have P(X ≤ t) and P(X ≤ s), which you can find by integrating the density for “when will X happen”, you can calculate P(X ≤ t | X > s) = (P(X ≤ t) − P(X ≤ s)) / (1 − P(X ≤ s)).
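A minimal sketch of such a conditional curve, assuming (purely for illustration) that “when will X happen” follows an exponential distribution with mean 10:

```python
import numpy as np
from scipy.stats import expon

dist = expon(scale=10)  # hypothetical density for "when will X happen"

def conditional_cdf(t, s):
    """P(X <= t | X > s) = (F(t) - F(s)) / (1 - F(s)) for t >= s."""
    return (dist.cdf(t) - dist.cdf(s)) / (1 - dist.cdf(s))

# The exponential is memoryless, so conditioning on surviving to time s
# just shifts the curve: P(X <= s + u | X > s) = F(u).
print(np.isclose(conditional_cdf(15, 5), dist.cdf(10)))  # True
```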
Here i and j are indices for the causes. I wrote it that way because you don’t have to assume that causes i and j are independent for the math to work. But everything else will have to be independent.
Maybe the uncertainties shouldn’t be independent, but often they will be. Our uncertainty about the probability of AI doom is probably not related to our uncertainty about the probability of pandemic doom, for instance.
If the probability of extinction by cause i is p_i and the probability reduction for that cause is r_i, the probability of extinction becomes 1 − (1 − p_i + r_i) ∏_{j ≠ i} (1 − p_j) if you choose to focus on cause i.
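Here is a small sketch of that calculation, assuming independent causes; the per-cause extinction probabilities and the reductions achievable by focusing are made up for illustration.

```python
import numpy as np

# Hypothetical extinction probabilities per cause and the reduction
# achievable by focusing on each cause.
p = np.array([0.10, 0.05, 0.02])
r = np.array([0.04, 0.03, 0.01])

def extinction_probability(p, focus=None):
    """Total extinction probability under independence; focusing on a
    cause lowers its extinction probability by r[focus]."""
    p = p.copy()
    if focus is not None:
        p[focus] -= r[focus]
    return 1 - np.prod(1 - p)

baseline = extinction_probability(p)
best = min(range(3), key=lambda i: extinction_probability(p, focus=i))
print(best)  # the cause whose focus lowers extinction probability most
```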
I agree the argument doesn’t work, but there are at least two arguments for investing in charities with sub-optimal expected values that critically depend on time.
- Going bust. Suppose you have two charity investments A and B with expected values E_A > E_B, but there’s a potential for E_B > E_A in the future, for instance because you receive better information about the charities. If you invest once, investing everything in A is the correct answer, since E_A > E_B. Now suppose that each time you don’t invest in B, it has a chance of going bust. Then, if you invest more than once, it would be best to invest something in B if the probability of going bust is high enough and E_B > E_A with a sufficiently high probability.
- Signaling effects. Not investing in the charity may signal to charity entrepreneurs that there is nothing to gain by starting a new charity similar to B, thus limiting your future pool of potential investments. I can imagine this being especially important if your calculation of the expected value is contentious or has high epistemic uncertainty.
Edit: I think the “going bust” example is similar in spirit to the Kelly criterion, so I suppose you might say the argument does work.
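A crude simulation of the going-bust argument (all numbers are hypothetical): charity A has the higher expected value now, B might turn out better later, and B goes bust if it receives nothing. Hedging a little into B can then beat going all-in on A.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-period model. Period 1: split 1 unit between A
# (per-unit value 1.0) and B (per-unit value 0.8). If B receives
# nothing, it goes bust with probability 0.9. Period 2: with
# probability 0.5 you learn B is actually worth 2.0 per unit, and
# you invest 1 unit in the best surviving option.
a, b, b_high = 1.0, 0.8, 2.0
p_bust, p_flip = 0.9, 0.5

def expected_total(share_b, n=100_000):
    value1 = share_b * b + (1 - share_b) * a
    bust = (share_b == 0) & (rng.random(n) < p_bust)
    flip = rng.random(n) < p_flip
    value2 = np.where(flip & ~bust, b_high, a)
    return value1 + value2.mean()

print(expected_total(0.1) > expected_total(0.0))  # True: hedging wins here
```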
I agree that academic language should be avoided in both forums and research papers.
It might be a good idea for forum writers to use a tool like ChatGPT to make their posts more readable before posting them. For example, they can ask ChatGPT to “improve the readability” of their text. This way, writers don’t have to change their writing style too much and can avoid feeling uncomfortable while writing. Plus, it saves time by not having to go back and edit clunky sentences. Additionally, by asking ChatGPT to include more slang or colloquial language, the tool can better match the writer’s preferred style. (Written with the aid of ChatGPT in exactly the way I proposed. :p)