Thanks for your thoughtful comment. Based on your comment and others, I am going to focus my next post in this series on how I think about movement reputation in general, including some specific replies to your points here. Just flagging that the lack of a substantive reply here is because I’m going to write a full-scale post on the subject, hopefully over the next few days.
Is the 10% Giving What We Can Pledge Core to EA’s Reputation?
Further defense of the 2% fuzzies/8% EA causes pledge proposal
That makes sense. I don’t think there are any official prerequisites to being an EA, but there are community norms. I think the GWWC pledge (or a direct-work equivalent) is a common-enough practical or aspirational norm that I’m comfortable conflating EA and GWWC-adjacent EA for the purposes of this post, but I acknowledge you’d prefer to split these apart for a sensible reason.
I’d guess donating for warm fuzzies is generally an ineffective way to gain influence/status.
As a simple and costless way to start operationalizing this disagreement, I claim that if I ask my mom (not an EA, pretty opposed to the vibe) whether she’d like EA better with a 2%/8% standard, she’d prefer it and say that she’d think warmly of a movement that encouraged this style of donating. I’m only sort of being facetious here: I think having accurate models of how to build the movement’s reputation is important, and that EAs need a way to gather evidence and update.
My post relates to the Giving What We Can pledge and the broad idea of focusing on “utilons, not fuzzies.” From the wording of your comment I’m unclear whether you’re unfamiliar with these ideas or are just taking this as an opportunity to say you disagree with them. If you don’t think that standards like the GWWC pledge are good for EA, then what do you think of the 2%/8% norm I propose here as a better alternative, even if, on your view, still far worse than having no pledge at all?
Roughly, I think the community isn’t able (isn’t strong enough?) to both think much about how it’s perceived and think well or in-a-high-integrity-manner about how to do good, and I’d favor thinking well and in a high-integrity manner.
Just want to flag that I completely disagree with this, and that moreover I find it bewildering that in EA and rationalism this seemingly passes almost as a truism.
I think we can absolutely think both about perceptions and charitable effectiveness—their tradeoffs, how to get the most of one without sacrificing too much of the other, how they might go together—and both my post here and jenn’s post that I link to are examples of that.
People can think about competing values and priorities, and they do it all the time. I want to have fun, but I also want to make ends meet. I want to do good, but I also want to enjoy my life. I want to be liked, but I also want to be authentic. These are normal dilemmas that just about everybody deals with all the time. The people I meet in EA are mostly smart, sophisticated people, and I think that’s more than sufficient to engage in this kind of tradeoffs-and-strategy-based reasoning.
EAs should donate 2% to warm fuzzy causes and 8% to EA causes
I would not be surprised if this small cohort of volunteers accelerated the pace of getting to this result by a year or more. I’m not going to take a chance on plugging in numbers, but that’s a lot of lives saved per volunteer. While most of the badass points/moral credit goes to the people who received the jab, we should also feel proud of the people who were lined up behind them ready to endure the same.
Yes it does, thank you for the added context!
That makes sense! Thank you.
I have a second question. You compared before/after intervention malaria rates for the treated vs. control districts, and found that the multiplier was 52.5% lower in the treated areas. Do we have information on how this compares to historical data? Also, were the districts randomly selected for the treatment vs. control groups, or were they chosen on a convenience basis?
I am thinking about the possibility that the treated and control districts may have significantly different base rates of malarial increase at the seasonal time points chosen for the before and after measurements, since there are only 7 districts and it sounds like they may be ecologically and demographically heterogeneous.
Having looked at the original paper, I found a partial answer in table 3:
Before the intervention, the treatment district had a malaria rate 3.3 times higher than the control district. After the intervention, the treatment district had a malaria rate 1.6 times higher than the control district. There were large differences in the levels of malaria incidence between the two districts before the intervention.
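If I’m reading Table 3 right, the 52.5% figure quoted above is essentially the relative drop in this treatment-to-control ratio (the small gap from 52.5% presumably comes from rounding in the table):

$$1 - \frac{1.6}{3.3} \approx 0.52$$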
As far as I can tell, there has not been an attempt to rule out, or estimate the size of, systematic differences in how malaria fluctuates or how it is measured between the treatment and control districts. To address this, historical data showing the average multiplier in these districts at the same time of year, in previous years when the treatment was not applied, could be compared with the current figures.
If historical data is available for these districts, it seems like it ought to be possible to examine that historical data prior to rolling out a larger-scale $6 million RCT.
If I am making mistakes in this analysis, please let me know and I will correct my comment. Thank you!
This looks like excellent work: a very logical intervention, and you’ve clearly put a lot of effort into assembling the data needed to attract serious funding for a scale-up.
One thing I would like to know: in urban areas, I presume access to medical care is higher, and so I am wondering whether the death rate, as well as incidence, may be lower. I see that you achieved a 52% reduction in cases, and I am wondering if you have data, or will be gathering data, on the effect on deaths due to malaria?
I have also encountered deletionism. When I was improving the aptamer article for a good article nomination, the reviewer recommended splitting a section on peptide aptamers into a separate article. After some thinking, I did so. Then some random editor who I’d never interacted with before deleted the whole peptide aptamer article and accused me of plagiarism/copying it from someplace else on the internet, and never responded to my messages trying to figure out what he was doing or why.
It’s odd to me because the Foreign Dredge Act is a political issue, while peptide aptamers are an extremely niche topic. And the peptide aptamer article contained nothing but info that had been on Wikipedia for years, while I wrote the Dredge Act article from scratch. Hard to see rhyme or reason, and very frustrating that there’s no apparent process for dealing with a vandal who thinks of themselves as an “editor.”
That hasn’t been entirely my experience. In fact, when I made the page for the Foreign Dredge Act of 1906, I was pleasantly surprised at how quickly others jumped in to improve on my basic efforts—it was clearly a case of just needing the page to exist at all before it started getting the attention it deserved.
By contrast, I’ve found that trying to do things like good article nominations, where you’re trying to satisfy the demands of self-selected nonexpert referees, can be frustrating. The same is true for trying to improve pages already getting a lot of attention. Even minor improvements to the Monkeypox page during the epidemic were the subject of heated debate and accusations on the talk page. When a new page is created, it doesn’t have egos invested in it yet, so you don’t really have to argue with anybody very much.
I’d be interested in learning more about the experiences that lead you to say it’s harder to create pages than to improve them. I’m not a complete novice, but you seem to have a lot more experience than I do.
If that database would have been important for pandemic prevention and vaccine development, I would have expected the virologists to write OPs publically calling on China to release the data. That they didn’t is a clear statement about what they think for how useful that data is for pandemic prevention and how afraid they are that people look critically at the Wuhan Institute of Virology.
Are you sure that virologists didn’t write such OPs?
The virologists seemed to ignore the basic science questions such as “How do these viruses spread?” and “Are they airborne?” that actually mattered.
My understanding is that in the US, they actually studied these questions hard and knew about things like airborne transmission and asymptomatic spread pretty early on, but were suppressed by the Trump administration. That doesn’t excuse them (they ought to have grown a spine!), but it’s important to recognize the cause of failure accurately so that we can work on the right problem.
I have an app in development on my GitHub called aiRead, a text-based reading app that integrates a number of chatbot prompts to provide all sorts of interactive features for the text you’re reading. It’s unpolished, since I’m focusing on the prompt engineering and figuring out how to work with it effectively rather than making it attractive for general consumption. If you’d like to check it out, here’s the link, and I’d be happy to answer questions if you find it confusing! It just requires the ability to run a Python script.
A couple other important ideas:
Ask the model to summarize AND compress the previous work every other prompt (sketched in code below this list). This frees up room in its context window, so more of the conversation stays effectively available.
Ask it to describe ideas in no more than 3 high-level concepts. Then select one and ask it to break it down into 3 sub-points, etc.
Start by asking it to break down your goal, to verify it understands what you are trying to do, before you ask it to execute. You can ask for positive and negative examples.
If you get a faulty reply, regenerate or edit your prompt rather than critiquing it with a follow-up prompt. Keep the context window as pure as possible.
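To make the summarize-and-compress idea concrete, here’s a minimal sketch. The `ask_model` stub, the compression cadence, and the exact compression prompt are my own placeholders rather than aiRead’s actual code; swap in whatever chat API you use:

```python
def ask_model(messages):
    # Stand-in for a real chat-completion call; replace with your provider's client.
    # Takes a list of {"role": ..., "content": ...} dicts and returns a string.
    return f"[model reply to {len(messages)} messages]"

def chat_with_compression(user_prompts):
    messages = []
    for i, prompt in enumerate(user_prompts):
        messages.append({"role": "user", "content": prompt})
        messages.append({"role": "assistant", "content": ask_model(messages)})
        # Every other turn, replace the running history with a compressed summary,
        # freeing up context-window space for new material.
        if i % 2 == 1:
            summary = ask_model(messages + [{
                "role": "user",
                "content": "Summarize and compress everything above as tersely as "
                           "possible, preserving all key facts and decisions.",
            }])
            messages = [{"role": "assistant", "content": summary}]
    return messages

# Example: three prompts, with the history compressed after the second.
print(chat_with_compression(["Outline chapter 1.", "Critique the outline.", "Draft the intro."]))
```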
Honey baked ham is 5g fat and 3g sugar/3oz. ~28g = 1oz, so that’s 6% fat and 4% sugar, so ice cream is about 5x sugarier and ~2x fattier than honey-baked ham. In other words, for sugar and fat content, honey-drenched fat > ice cream > honey-baked ham. Honey-baked ham is therefore not a modern American equivalent to honey-drenched Gazelle fat, a sentence I never thought I’d write but I’m glad I had the chance to once in my life.
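Spelling out the arithmetic, using 3 oz ≈ 85 g:

$$\frac{5\,\text{g fat}}{85\,\text{g}} \approx 6\%, \qquad \frac{3\,\text{g sugar}}{85\,\text{g}} \approx 4\%$$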
Although you are right that modesty (or deference) often outperforms one’s own personal judgment, this isn’t always the case. The results below are based on Monte Carlo simulations I haven’t published yet.
Take the case of a crowd estimating a cow’s weight. The members of the crowd announce their guesses sequentially. They adopt a uniform deference level D (between 0 and 1), so that each person’s guess is a weighted average of a direct observation (a sample from a Normal distribution centered on the cow’s true weight) and the current crowd average guess:
Guess_i = D*(Crowd average) + (1-D)*(Direct observation)
Under this rule, as deference increases, the crowd converges more slowly on the cow’s true weight: deference is bad for group epistemics. This isn’t the same thing as an information cascade, because the crowd will converge eventually unless they are completely deferent.
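For concreteness, here is a minimal sketch of the kind of simulation I mean; the true weight, noise scale, and crowd size below are illustrative placeholders, not the parameters from my actual runs:

```python
import numpy as np

def run_crowd(deference, true_weight=1000.0, noise_sd=100.0, n_guessers=200, seed=0):
    """Sequential guessing: each guess blends a noisy direct observation with
    the average of all guesses announced so far."""
    rng = np.random.default_rng(seed)
    guesses = []
    for _ in range(n_guessers):
        observation = rng.normal(true_weight, noise_sd)
        if guesses:
            guess = deference * np.mean(guesses) + (1 - deference) * observation
        else:
            guess = observation  # the first guesser has no crowd to defer to
        guesses.append(guess)
    return np.array(guesses)

# Higher deference -> the crowd average stays stuck near the earliest guesses.
for d in (0.0, 0.5, 0.9, 1.0):
    print(f"deference={d:.1f}: final crowd average = {run_crowd(d).mean():.0f}")
```

The final crowd average tends to land farther from the true weight as deference rises, which is the slow-convergence effect described above.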
Furthermore, the benefits of deference for individual guess accuracy are maximized at about 78% deference except potentially on very long timescales. Beyond this point, the group converges so slowly that it undermines, though doesn’t fully cancel out, the individual benefit of adopting the group average.
Finally, when faced with a choice between guessing according to the group’s deference rule or being 100% deferent and simply guessing the current group average, you will actually do better to make a partially deferent guess if the group is more than about 78% deferent. Below that point, it’s better for individual accuracy to 100% defer. This suggests a Prisoner’s Dilemma-style model in which individuals ‘defect’ on the project of obtaining high crowd accuracy by deferring to the crowd rather than contributing their own independent guess, leading to very high and deeply suboptimal, though not maximally bad, levels of deference.
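Here is a rough way one could probe that individual-accuracy comparison, reusing run_crowd from the sketch above (again with toy parameters, so don’t expect the exact 78% crossover to reproduce):

```python
import numpy as np

# Assumes run_crowd() from the previous sketch is already defined.
true_weight = 1000.0
for d in (0.0, 0.25, 0.5, 0.78, 0.9, 0.99):
    rule_err, parrot_err = [], []
    for seed in range(200):
        guesses = run_crowd(d, true_weight=true_weight, seed=seed)
        # Average error of guesses made under the group's deference rule.
        rule_err.append(np.abs(guesses - true_weight).mean())
        # Average error if each person (after the first) had instead parroted
        # the average of all guesses announced before them.
        prior_avg = np.cumsum(guesses)[:-1] / np.arange(1, len(guesses))
        parrot_err.append(np.abs(prior_avg - true_weight).mean())
    print(f"group deference {d:.2f}: rule error {np.mean(rule_err):.1f}, "
          f"100%-defer error {np.mean(parrot_err):.1f}")
```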
These results depend on specific modeling assumptions.
We might penalize inaccuracy according to its square. This makes deference less useful for individual accuracy, putting the optimal level closer to 65% rather than 78%.
We can also imagine that, instead of using the crowd average directly, the deferent portion of the guess is sampled from a Normal distribution around the crowd average. In this case, the optimal level of deference is closer to 45%, and beyond about 78% deference it’s better to ignore the crowd entirely and just rely on pure direct observation. (Both of these variations are sketched in code below.)
I haven’t simulated this yet, but I am curious to know what happens if we assume a fixed 1% of guesses are purely independent.
These results evaluate the whole timeline of guesses and observations. What if we are instead thinking about a person joining a mature debate, where the crowd average has had more time to converge?
The model assumes that making and sharing observations is cost-free. In reality, of course, if every scientist had to redo all the experiments in their field (i.e. never deferred to previous results), science could not progress, and the same holds everywhere else.
It also matters whether the cost of further observations is linked to current group accuracy. If individual accuracy is a heavy driver of the cost of individual observations, then we might want to prioritize deference to keep costs down and permit more or faster guesses. If instead it is crowd accuracy that controls the cost of observations, then we might want to focus on independent observations.
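To make the squared-penalty and noisy-deference variations concrete, here is how the guess rule and scoring from the earlier sketch could be modified (illustrative only; `deference_noise_sd` is a made-up parameter name, not something from my actual runs):

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_deferent_guess(deference, crowd_average, observation, deference_noise_sd=50.0):
    # Variation: the deferent component is a noisy read of the crowd average,
    # rather than the crowd average itself.
    noisy_crowd_read = rng.normal(crowd_average, deference_noise_sd)
    return deference * noisy_crowd_read + (1 - deference) * observation

def squared_error(guess, true_weight):
    # Variation: penalize inaccuracy by its square rather than its absolute value.
    return (guess - true_weight) ** 2
```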
Overall, I think we need to refocus the debate on epistemic modesty around tradeoffs and modeling assumptions in order to help people make the best choice given their goals:
How much to defer seems to depend on a few key factors:
The cost of making independent observations vs. deferring, and whether or not these costs are linked to current group or individual accuracy
How inaccuracy is penalized
How deferent we think the group is
Whether we are prioritizing our own individual accuracy or the speed with which the group converges on the truth
Basically all the problems with deference can be eliminated if we are able to track the difference between independent observations and deferent guesses.
My main takeaways are that:
Intuition is only good enough to be dangerous when reasoning about deference in the abstract
Real-world empirical information is crucial for making the right choice of modeling assumptions to decide on how much to defer
Deference is a common and important topic in the rationalist and EA communities on a number of subjects, and should motivate us to try and take a lot more guesses
It is probably worth trying to figure out how to better track the difference between deferent guesses and independent observations in our discourse