I’m a theoretical CS grad student at Columbia specializing in mechanism design. I write a blog called Unexpected Values which you can find here: https://ericneyman.wordpress.com/. My academic website can be found here: https://sites.google.com/view/ericneyman/.
Eric Neyman
Great comment, I think that’s right.
I know that “give your other values an extremely high weight compared with impact” is an accurate description of how I behave in practice. I’m kind of tempted to bite that same bullet when it comes to my extrapolated volition—but again, this would definitely be biting a bullet that doesn’t taste very good (do I really endorse caring about the log of my impact?). I should think more about this, thanks!
Thanks for writing this up; I agree with your conclusions.
There’s a neat one-to-one correspondence between proper scoring rules and probabilistic opinion pooling methods satisfying certain axioms, and this correspondence maps Brier’s quadratic scoring rule to arithmetic pooling (averaging probabilities) and the log scoring rule to logarithmic pooling (geometric mean of odds). I’ll illustrate the correspondence with an example.
Let’s say you have two experts: one says 10% and one says 50%. You see these predictions and need to come up with your own prediction, and you’ll be scored using the Brier loss: (1 - x)^2, where x is the probability you assign to whichever outcome ends up happening (you want to minimize this). Suppose you know nothing about pooling; one really basic thing you can do is to pick an expert to trust at random: report 10% with probability 1/2 and 50% with probability 1/2. Your expected Brier loss in the case of YES is (0.81 + 0.25)/2 = 0.53, and your expected loss in the case of NO is (0.01 + 0.25)/2 = 0.13.
But, you can do better. Suppose you say 35% -- then your loss is 0.4225 in the case of YES and 0.1225 in the case of NO—better in both cases! So you might ask: what is the strategy that gives me the largest possible guaranteed improvement over choosing a random expert? The answer is linear pooling (averaging the experts’ probabilities). This gets you 0.49 in the case of YES and 0.09 in the case of NO (an improvement of 0.04 in each case).
Now suppose you were instead being scored with a log loss—so your loss is -ln(x), where x is the probability you assign to whichever outcome ends up happening. Your expected log loss in the case of YES is (-ln(0.1) - ln(0.5))/2 ~ 1.498, and in the case of NO is (-ln(0.9) - ln(0.5))/2 ~ 0.399.
Again you can ask: what is the strategy that gives you the largest possible guaranteed improvement over this “choose a random expert” strategy? This time, the answer is logarithmic pooling (taking the geometric mean of the odds). This is 25%, which has a loss of 1.386 in the case of YES and 0.288 in the case of NO, an improvement of about 0.111 in each case.
(This works just as well with weights: say you trust one expert more than the other. You could choose an expert at random in proportion to these weights; the strategy that guarantees the largest improvement over this is to take the weighted pool of the experts’ probabilities.)
This generalizes to other scoring rules as well. I co-wrote a paper about this, which you can find here, or here’s a talk if you prefer.
What’s the moral here? I wouldn’t say that it’s “use arithmetic pooling if you’re being scored with the Brier score and logarithmic pooling if you’re being scored with the log score”; as Simon’s data somewhat convincingly demonstrated (and as I think I would have predicted), logarithmic pooling works better regardless of the scoring rule.
Instead I would say: the same judgments that would influence your decision about which scoring rule to use should also influence your decision about which pooling method to use. The log scoring rule is useful for distinguishing between extreme probabilities; it treats 0.01% as substantially different from 1%. Logarithmic pooling does the same thing: the pool of 1% and 50% is about 10%, and the pool of 0.01% and 50% is about 1%. By contrast, if you don’t care about the difference between 0.01% and 1% (“they both round to zero”), perhaps you should use the quadratic scoring rule; and if you’re already not taking distinctions between low and extremely low probabilities seriously, you might as well use linear pooling.
Historically, there have been ~24 Republicans vs ~19 Democrats as senators (and 1 independent) from Oregon, so partisan affiliation doesn’t seem that important.
A better way of looking at this is the partisan lean of his particular district. The answer is D+7, meaning that in a neutral environment (i.e. an equal number of Democratic and Republican votes nationally), a Democrat would be expected to win this district by 7 percentage points.
This year is likely to be a Republican “wave” year, i.e. Republicans are likely to outperform Democrats (the party out of power almost always overperforms in midterm elections); however, D+7 is a substantial lean that’s hard to overcome. I’d give Carrick a 75% chance of winning the general election conditional on winning the primary. His biggest challenge is winning the primary election.
Hi! I’m an author of this paper and am happy to answer questions. Thanks to Jsevillamol for the summary!
A quick note regarding the context in which the extremization factor we suggest is “optimal”: rather than taking a Bayesian view of forecast aggregation, we take a robust/”worst case” view. In brief, we consider the following setup:
(1) you choose an aggregation method.
(2) an adversary chooses an information structure (i.e. joint probability distribution over the true answer and what partial information each expert knows) to make your aggregation method do as poorly as possible in expectation (subject to the information structure satisfying the projective substitutes condition).
In this setup, the 1.73 extremization constant is optimal, i.e. maximizes worst-case performance.
That said, I think it’s probably possible to do even better by using a non-linear extremization technique. Concretely, I strongly suspect that the less variance there is in experts’ forecasts, the less it makes sense to extremize (because the experts have more overlap in the information they know). I would be curious to see how low a loss it’s possible to get by taking into account not just the average log odds, but also the variance in the experts’ log odds. Hopefully we will have formal results to this effect (together with a concrete suggestion for taking variance into account) sometime soon :)
Does anyone have an estimate of how many dollars donated to the campaign are about equal in value to one hour spent phonebanking? Thanks!
Thanks—I should have been a bit more careful with my words when I wrote that “measurement noise likely follows a distribution with fatter tails than a log-normal distribution”. The distribution I’m describing is your subjective uncertainty over the standard error of your experimental results. That is, you’re (perhaps reasonably) modeling your measurement as being the true quality plus some normally distributed noise. But—normal with what standard deviation? There’s an objectively right answer that you’d know if you were omniscient, but you don’t, so instead you have a subjective probability distribution over the standard deviation, and that’s what I was modeling as log-normal.
I chose the log-normal distribution because it’s a natural choice for the distribution of an always-positive quantity. But something more like a power law might’ve been reasonable too. (In general I think it’s not crazy to guess that the standard error of your measurement is proportional to the size of the effect you’re trying to measure—in which case, if your uncertainty over the size of the effect follows a power law, then so would your uncertainty over the standard error.)
(I think that for something as clean as a well-set-up experiment with independent trials of a representative sample of the real world, you can estimate the standard error well, but I think the real world is sufficiently messy that this is rarely the case.)
(This comment is mostly cross-posted from Nuño’s blog.)
In “Unflattering aspects of Effective Altruism”, you write:
Third, I feel that EA leadership uses worries about the dangers of maximization to constrain the rank and file in a hypocritical way. If I want to do something cool and risky on my own, I have to beware of the “unilateralist curse” and “build consensus”. But if Open Philanthropy donates $30M to OpenAI, pulls a not-so-well-understood policy advocacy lever that contributed to the US overshooting inflation in 2021, funds Anthropic while Anthropic’s President and the CEO of Open Philanthropy were married, and romantic relationships are common between Open Philanthropy officers and grantees, that is ¿an exercise in good judgment? ¿a good ex-ante bet? ¿assortative mating? ¿presumably none of my business?
I think the claim that Open Philanthropy is hypocritical re: the unilateralist’s curse doesn’t quite make sense to me. To explain why, consider the following two scenarios.
Scenario 1: you and 999 other smart, thoughtful people have a button. You know that there are 1,000 people with such a button. If anyone presses their button, all mosquitoes will disappear.
Scenario 2: you and you alone have a button. You know that you’re the only person with such a button. If you press the button, all mosquitoes will disappear.
The unilateralist’s curse applies to Scenario 1 but *not* Scenario 2. That’s because, in Scenario 1, your estimate of the counterfactual impact of pressing the button should be your estimate of the expected utility of all mosquitoes disappearing, *conditioned on no one else pressing the button*. In Scenario 2, where no one else has the button, your estimate of the counterfactual impact of pressing the button should be your estimate of the (unconditional) expected utility of all mosquitoes disappearing.
So, at least the way I understand the term, the unilateralist’s curse refers to the fact that taking a unilateral action is worse than it naively appears, *if other people also have the option of taking the unilateral action*.
This relates to Open Philanthropy because, at the time of buying the OpenAI board seat, Dustin was one of the only billionaires approaching philanthropy with an EA mindset (maybe the only?). So he was sort of the only one with the “button” of having this option, in the sense of having considered the option and having the money to pay for it. So for him it just made sense to evaluate whether or not this action was net positive in expectation.
Now consider the case of an EA who is considering launching an organization with a potentially large negative downside, where the EA doesn’t have some truly special resource or ability. (E.g., AI advocacy with inflammatory tactics—think DxE for AI.) Many people could have started this organization, but no one did. And so, when deciding whether this org would be net positive, you have to condition on this observation.
Thanks for asking! The first thing I want to say is that I got lucky in the following respect. The set of possible outcomes isn’t the interior of the ellipse I drew; rather, it is a bunch of points that are drawn at random from a distribution, and when you plot that cloud of points, it looks like an ellipse. The way I got lucky is: one of the draws from this distribution happened to be in the top-right corner. That draw is working at ARC theory, which has just about the most intellectually interesting work in the world (for my interests) and is also just about the most impactful place for me to work (given my skills and my models of what sort of work is impactful). I interned there for 4-5 months and I’ll be starting there full-time soon!
Now for my report card on how well I checked in (in the ways listed in the post):
Writing the above post was useful in an interesting way: I formed some amount of identity around “I care about things besides impact” in a way that somewhat decreased value drift. (I endorse this, I think.) This manifested as me thinking a lot over the last year about whether I’m happy. Sometimes the answer was “not really”! But I noticed this and took steps toward fixing it. In particular, I noticed when I was in Berkeley last summer that I had a need for a social group that doesn’t talk about maximizing impact all the time. This was super relevant to my criteria for choosing a living situation when I came back to Berkeley in October. I ended up choosing a “chill” group house, and I think that was the right choice.
I had the goal of keeping a monthly diary about my values. I updated it four times—in June, July, October, and March—and I think that captured most of the value. (I’m not sure that this was a particularly valuable intervention.)
Regarding the four specific non-EA things I cared about that I listed above:
Family and non-EA friends: I continue to be close with my family and remain similarly close with the non-EA friends I had at the time.
Puzzles and puzzle hunts: I continue caring about this. Empirically I haven’t done many puzzle hunts over the last year, but that was more for a lack of good opportunities. But I recently joined a new puzzle hunt team, so I might have more opportunities ahead!
Spending time in nature: yup, I continue to care about this. I went to Alaska for a few weeks last month and it was great.
Random statistical analyses: honestly, much less? Which I’m a bit sad about.
One interest that I had not listed, because I had mixed feelings about how much I endorsed it, was politics. I indeed care less about politics now (though I still engage a decent amount).
I also picked up an interest—I’m part of the Bayesian Choir! I’ve also been playing some small amount of tennis, for the first time since high school.
I didn’t do any of the CFAR techniques, like focusing or internal double crux.
I’d say that this looks pretty good.
I do think that there are a couple of yellow flags, though:
I currently believe that the Berkeley EA community is unhealthy (I’m not sure whether to add the caveat “for me” or whether I think it’s unhealthy, period). The main reason for this, I think, is that there’s a status hierarchy. The way I sometimes put this is: if you asked me which of my friends in college were highest status, I would’ve been like “...what does that even mean, that question doesn’t make sense”. But unfortunately I think if you asked about people’s status in this community, I’d often have thoughts. I have a theory that this comes out of having a large group of people with really similar values and goals. To elaborate on this: in college, everyone was pursuing their own thing and had their own values, which means that different people had very different standards for what it meant for someone to be cool. (There would have been way more status if, say, everyone were trying to be a member of some society; my impression is that this caused status dynamics in parts of my college that I didn’t interact with.) In the Berkeley EA community, most people have pretty similar goals (such as furthering AI safety or having interesting conversations). If people agree on what’s important, then naturally they’ll agree more on who’s good at the important things (who’s good at AI safety research, or who’s good at having interesting conversations—and by the way, there’s way more agreement in the Berkeley EA community about what constitutes an interesting conversation than there is in college).
This theory would predict that political party organizations (the Democratic and Republican parties) have a strong social status hierarchy, since they mostly share the same goals (get the party into a position of power). If I learn that actually these organizations mostly don’t have strong social status hierarchies, I’ll retract my diagnosis.
I weakly think that something about the Berkeley EA community makes it harder for me to have original thoughts. Maybe it’s that there’s so much stuff going on that I don’t spend very much time alone with my thoughts. Or maybe it’s that there’s more of a “party line” about the right takes, in a way that discourages free-thinking. Or maybe it’s that people in this community really like talking about some things but not other things, and this implicitly discourages thinking about the “other things”.
I haven’t figured out how to navigate this. These may be genuine trade-offs—a case where I can’t both work at ARC and be immune from these downsides—or maybe I’ll learn to deal with the downsides over time. I do think that the benefits of my decision to work at ARC are worth the costs for me, though.
Cool idea! Some thoughts I have:
A different thing you could do, instead of trading models, is compromise by assuming that there’s a 50% chance that your model is right and a 50% chance that your peer’s model is right. Then you can do utility calculations under this uncertainty. Note that this would have the same effect as the one you desire in your motivating example: Alice would scrub surfaces and Bob would wear a mask.
This would however make utility calculations twice as difficult compared with just using your own model, since you’d need to compute the expected utility under each model. But note that this amount of computational intensity is already assumed by the premise that it makes sense for Alice and Bob to trade models. In order for Alice and Bob to reach this conclusion, each needs to compute their utility under each action in each of their models.
I would say that this is more epistemically sound than switching models with your peer, since it’s reasonably well-motivated by the notion that you are epistemic peers and could have ended up in a world where you had had the information your peer has and vice versa.
But the fundamental issue you’re getting at here is that reaching an agreement can be hard, and we’d like to make good/informed decisions anyway. This motivates the question: how can you effectively improve your decision making without paying the cost required by trying to reach an agreement?
One answer is that you can share partial information with your peer. For instance, maybe Alice and Bob decide that they will simply tell each other their best guess about the percentage of COVID transmission that is airborne and leave it at that (without trying to resolve subsequent disagreement). This is enough to, in most circumstances, cause each of them to update a lot (and thus be much better informed in expectation) without requiring a huge amount of communication.
Which is better: acting as if each model is 50% to be correct, or sharing limited information and then updating? I think the answer depends on (1) how well you can conceptualize your peer’s model, (2) how hard updating is, and (3) whether you’ll want to make similar decisions in the future but without communicating. The sort of case when the first approach is better is when both Alice and Bob have simple-to-describe models and will want to make good COVID-related decisions in the future without consulting each other. The sort of case when the second approach is better is when Alice and Bob have difficult-to-describe models, but have pretty good heuristics about how to update their probabilities based on the other’s probabilities.
I started making a formal model of the “sharing partial information” approach and came up with an example where it makes sense for Alice and Bob to swap behaviors upon sharing partial information. But ultimately this wasn’t super interesting because the underlying behavior was that they were updating on the partial information. So while there are some really interesting questions of the form “How can you improve your expected outcome the most while talking to the other person as little as possible?”, ultimately you’re getting at something different (if I understand correctly) -- that adopting a different model might be easier than updating your own. I’d love to see a formal approach to this (and may think some more about it later!).
This is probably my favorite proposal I’ve seen so far, thanks!
I’m a little skeptical that warnings from the organization you propose would have been heeded (especially by people who don’t have other sources of funding and so relying on FTX was their only option), but perhaps if the organization had sufficient clout, this would have put pressure on FTX to engage in less risky business practices.
There sort of is—I’ve seen some EAs use the light bulb emoji 💡 on Twitter (I assume this comes from the EA logo) -- but it’s not widely used, and it’s unclear to me whether it means “identifies as an EA” or “is a practicing EA” (i.e. donates a substantial percentage of their income to EA causes and/or does direct work on those causes).
I’m unsure whether I want there to be an easy way to “identify as EA”, since identities do seem to make people worse at thinking clearly. I’ve thought/written about this (in the context of a neoliberal identity too, as it happens), and my conclusion was basically that a strong EA identity would be okay so long as the centerpiece of the identity continues to be a question (“How can we do the most good?”) as opposed to any particular answer. I’m not sure how realistic that is, though.
Thanks for putting this together; I might be interested!
I just want to flag that if your goal is to avoid internships, then (at least for American students) I think the right time to do this would be late May-early June rather than late June-early July as you suggest on the Airtable form. I think the most common day for internships to start is the day after Memorial Day, which in 2022 will be May 31st. (Someone correct me if I’m wrong.)
My understanding is that the Neoliberal Project is a part of the Progressive Policy Institute, a DC think tank (correct me if I’m wrong).
Are you guys trying to lobby for any causes, and if so, what has your experience been on the lobbying front? Are there any lessons you’ve learned that may be helpful to EAs lobbying for EA causes like pandemic preparedness funding?
Thanks for the thoughts. I agree that the first thing you point out is a problem, but let me just point out: in the event that it becomes a problem, that means that our platform is already a wild success. After all, I’d be very happy if our platform took single-digit millions of dollars out of politics (compared with the single-digit billions that are spent). If we become a large fraction of all money going into politics, then yeah, this will become a problem, perhaps solvable in the way you suggest.
Regarding your thoughts on ads, that seems like a plausible hypothesis. But regarding matching funds going toward anti-polarization organizations: well, I’d be quite interested in that if there were effective anti-polarization organizations. And maybe there are, but I’m not aware of any, and I’m not super optimistic.
I think my crux with this argument is “actions are taken by individuals”. This is true, strictly speaking; but when e.g. a member of U.S. Congress votes on a bill, they’re taking an action on behalf of their constituents, and affecting the whole U.S. (and often world) population. I like to ground morality in questions of a political philosophy flavor, such as: “What is the algorithm that we would like legislators to use to decide which legislation to support?”. And as I see it, there’s no way around answering questions like this one, when decisions have significant trade-offs in terms of which people benefit.
And often these trade-offs need to deal with population ethics. Imagine, as a simplified example, that China is about to deploy an AI that has a 50% chance of killing everyone and a 50% chance of creating a flourishing future of many lives like the one many longtermists like to imagine. The U.S. is considering deploying its own “conservative” AI, which we’re pretty confident is safe, and which will prevent any other AGIs from being built but won’t do much else (so humans might be destined for a future that looks like a moderately improved version of the present). Should the U.S. deploy this AI? It seems like we need to grapple with population ethics to answer this question.
(And so I also disagree with “I can’t imagine a reasonable scenario in which I would ever have the power to choose between such worlds”, insofar as you’ll have an effect on what we choose, either by voting or more directly than that.)
Maybe you’d dispute that this is a plausible scenario? I think that’s a reasonable position, though my example is meant to point at a cluster of scenarios involving AI development. (Abortion policy is a less fanciful example: I think any opinion on the question built on consequentialist grounds needs to either make an empirical claim about counterfactual worlds with different abortion laws, or else wrestle with difficult questions of population ethics.)
Yeah, I agree this would be bad. I talk a bit about this here: https://ericneyman.wordpress.com/2019/09/15/incentives-in-the-election-charity-platform/
A possible solution is to send only half of any matched money to charity. Then, from an apolitical altruist’s perspective, donating $100 to the platform would cause at most $100 extra to go to charity (your $100 plus the $100 it’s matched against make $200, half of which goes to charity), and less if their money doesn’t end up matched. (On the other hand, this still leaves the problem of a slightly political altruist, who cares somewhat about politics but more about charity; I don’t know how to solve this problem.)
And yeah, we’ve run into Repledge++ and are trying a small informal trial with it right now!
Let’s take the very first scatter plot. Consider the following alternative way of labeling the x and y axes. The y-axis is now the quality of a health intervention, and it consists of two components: short-term effects and long-term effects. You do a really thorough study that perfectly measures the short-term effects, while the long-term effects remain unknown to you. The x-value is what you measured (the short-term effects); the actual quality of the intervention is the x-value plus some unknown mean-zero, variance-1 number.
So whereas previously (i.e. in the setting I actually talk about), we have E[measurement | quality] = quality (I’m calling this the frequentist sense of “unbiased”), now we have E[quality | measurement] = measurement (what I call the Bayesian sense of “unbiased”).
Great question—you absolutely need to take that into account! You can only bargain with people who you expect to uphold the bargain. This probably means that when you’re bargaining, you should weight “you in other worlds” in proportion to how likely they are to uphold the bargain. This seems really hard to think about and probably ties in with a bunch of complicated questions around decision theory.
Yup—that would be the limiting case of an ellipse tilted the other way!
The idea for the ellipse is that what EA values is correlated (but not perfectly) with my utility function, so (under certain modeling assumptions) the space of most likely career outcomes is an ellipse, see e.g. here.
I guess I have two reactions. First, which of the categories are you putting me in? My guess is you want to label me as a mop, but “contribute as little as they reasonably can in exchange” seems an inaccurate description of someone who’s strongly considering devoting their career to an EA cause; also I really enjoy talking about the weird “new things” that come up (like idk actually trade between universes during the long reflection).
My second thought is that while your story about social gradients is a plausible one, I have a more straightforward story about who EA should accept which I like more. My story is: EA should accept/reward people in proportion to (or rather, as some monotone increasing function of) how much good they do.* For a group that tries to do the most good, this pretty straightforwardly incentivizes doing good! Sure, there are secondary cultural effects to consider—but I do think they should be thought of as secondary to doing good.
*You can also reward trying to do good to the best of each person’s ability. I think there’s a lot of merit to this approach, but it might create some not-great incentives of the form “always looking like you’re trying” (regardless of whether you really are trying effectively).