My post is related to the Giving What We Can pledge and the broad idea of focusing on “utilons, not fuzzies.” From the wording of your comment I’m unclear on whether you’re unfamiliar with these ideas or whether you’re just taking this as an opportunity to say that you disagree with them. If you don’t think that standards like the GWWC pledge are good for EA, then what do you think of the 2%/8% norm I propose here as a better alternative, even if you’d still consider it far worse than no pledge at all?
> Roughly, I think the community isn’t able (isn’t strong enough?) to both think much about how it’s perceived and think well or in-a-high-integrity-manner about how to do good, and I’d favor thinking well and in a high-integrity manner.
Just want to flag that I completely disagree with this, and moreover that I find it bewildering that this seemingly passes almost as a truism in EA and rationalism.
I think we can absolutely think both about perceptions and charitable effectiveness—their tradeoffs, how to get the most of one without sacrificing too much of the other, how they might go together—and both my post here and jenn’s post that I link to are examples of that.
People can think about competing values and priorities, and they do it all the time. I want to have fun, but I also want to make ends meet. I want to do good, but I also want to enjoy my life. I want to be liked, but I also want to be authentic. These are normal dilemmas that just about everybody deals with constantly. The people I meet in EA are mostly smart, sophisticated people, and I think that’s more than sufficient for this kind of tradeoff-and-strategy reasoning.
I would not be surprised if this small cohort of volunteers accelerated the pace of getting to this result by a year or more. I’m not going to take a chance on plugging in numbers, but that’s a lot of lives saved per volunteer. While most of the badass points/moral credit goes to the people who received the jab, we should also feel proud of the people who were lined up behind them ready to endure the same.
Yes it does, thank you for the added context!
That makes sense! Thank you.
I have a second question. You compared before/after intervention malaria rates for the treated vs. control districts, and found that the multiplier was 52.5% lower in the treated areas. Do we have information on how this compares to historical data? Also, were the districts randomly assigned to the treatment vs. control group, or were they chosen on a convenience basis?
I am thinking about the possibility that the treated and control districts may have significantly different baseline rates of seasonal malaria increase at the time points chosen for the before and after measurements, since there are only 7 districts and it sounds like they may be ecologically and demographically heterogeneous.
Having looked at the original paper, I found a partial answer in Table 3:
Before the intervention, the treatment district’s malaria rate was 3.3 times the control district’s; after the intervention, it was 1.6 times the control’s. So there were large differences in malaria incidence between the two districts even before the intervention.
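To make the arithmetic explicit, here’s a quick sanity check using the two rounded ratios above (a sketch only; the exact figures in the paper are presumably rounded differently, which would explain the small mismatch with the reported 52.5%):

```python
# Ratio-of-ratios check using the (rounded) Table 3 figures quoted above.
ratio_before = 3.3  # treatment incidence / control incidence, before intervention
ratio_after = 1.6   # treatment incidence / control incidence, after intervention

# The treatment district's before->after multiplier, relative to the control's,
# is the after-ratio divided by the before-ratio.
relative_multiplier = ratio_after / ratio_before  # ~0.48
print(f"Treated multiplier is {1 - relative_multiplier:.1%} lower")
# -> ~51.5%, close to the reported 52.5%
```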
As far as I can tell, there has been no attempt to rule out, or estimate the size of, systematic differences in how malaria fluctuates or how it is measured between the treatment and control districts. One way to address this would be to compare the current multipliers against historical data showing the average multiplier in these districts at the same time of year, in previous years when the treatment was not applied.
If such historical data is available for these districts, it seems like it ought to be examined before rolling out a larger-scale $6 million RCT.
If I am making mistakes in this analysis, please let me know and I will correct my comment. Thank you!
This looks like excellent work: a very logical intervention, and you’ve clearly put a lot of effort into assembling the data needed to attract serious funding for a scale-up.
One thing I would like to know: in urban areas, I presume access to medical care is better, so I am wondering whether the death rate per case, and not just the incidence, may be lower. I see that you achieved a 52% reduction in cases; do you have data, or will you be gathering data, on the effect on deaths due to malaria?
I have also encountered deletionism. When I was improving the aptamer article for a good article nomination, the reviewer recommended splitting a section on peptide aptamers into a separate article. After some thought, I did so. Then some random editor I’d never interacted with before deleted the whole peptide aptamer article, accused me of plagiarizing it from someplace else on the internet, and never responded to my messages trying to figure out what he was doing or why.
It’s odd to me because the Foreign Dredge Act is a political issue, while peptide aptamers are an extremely niche topic. And the peptide aptamer article contained nothing but info that had been on Wikipedia for years, while I wrote the Dredge Act article from scratch. Hard to see rhyme or reason, and very frustrating that there’s no apparent process for dealing with a vandal who thinks of themselves as an “editor.”
That hasn’t been entirely my experience. In fact, when I made the page for the Foreign Dredge Act of 1906, I was pleasantly surprised at how quickly others jumped in to improve on my basic efforts—it was clearly a case of just needing the page to exist at all before it started getting the attention it deserved.
By contrast, I’ve found that trying to do things like good article nominations, where you’re trying to satisfy the demands of self-selected nonexpert referees, can be frustrating. The same is true for trying to improve pages already getting a lot of attention. Even minor improvements to the Monkeypox page during the epidemic were the subject of heated debate and accusations on the talk page. When a new page is created, it doesn’t have egos invested in it yet, so you don’t really have to argue with anybody very much.
I’d be interested in learning more about the experiences that lead you to say it’s harder to create pages than to improve them. I’m not a complete novice, but you seem to have a lot more experience than me.
> If that database had been important for pandemic prevention and vaccine development, I would have expected virologists to write op-eds publicly calling on China to release the data. That they didn’t is a clear statement about how useful they think that data is for pandemic prevention, and about how afraid they are of people looking critically at the Wuhan Institute of Virology.
Are you sure that virologists didn’t write such op-eds?
> The virologists seemed to ignore the basic science questions such as “How do these viruses spread?” and “Are they airborne?” that actually mattered.
My understanding is that in the US, they actually studied these questions hard and knew about things like airborne transmission and asymptomatic spread pretty early on, but were suppressed by the Trump administration. That doesn’t excuse them (they ought to have grown a spine!), but it’s important to recognize the cause of failure accurately so that we can work on the right problem.
I have an app in development on my GitHub called aiRead: a text-based reading app that integrates a number of chatbot prompts to provide all sorts of interactive features with the text you’re reading. It’s unpolished, as I’m focusing on the prompt engineering and figuring out how to work with it more effectively rather than making it attractive for general consumption. If you’d like to check it out, here’s the link; I’d be happy to answer questions if you find it confusing! It just requires the ability to run a Python script.
A few other important ideas:
Ask the model to summarize AND compress the previous work every other prompt. This increases the amount of information that fits in its context window. (See the sketch after this list.)
Ask it to describe ideas in no more than 3 high-level concepts. Then select one and ask it to break it down into 3 sub-points, and so on.
Start by asking it to break down your goal to verify it understands what you are trying to do before you ask it to execute. You can ask for positive and negative examples.
If you get a faulty reply, regenerate or edit your prompt rather than critiquing it with a follow-up prompt. Keep the context window as pure as possible.
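To make the first tip concrete, here’s a minimal sketch of the summarize-and-compress loop. Note that `chat()` is a hypothetical stand-in for whatever chat-completion API you’re using, and the every-other-prompt cadence is just the rule of thumb above:

```python
# Minimal sketch of a summarize-and-compress conversation loop.
# `chat` is a hypothetical placeholder: wire it up to your LLM API of choice.
def chat(messages: list[dict]) -> str:
    raise NotImplementedError("connect this to your chat-completion API")

COMPRESS_PROMPT = (
    "Summarize AND compress everything above into the fewest tokens that "
    "preserve all decisions, open questions, and key facts."
)

def converse(user_prompts: list[str]) -> list[dict]:
    history: list[dict] = []
    for i, prompt in enumerate(user_prompts):
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": chat(history)})
        if i % 2 == 1:  # every other prompt, fold the history into a compact summary
            summary = chat(history + [{"role": "user", "content": COMPRESS_PROMPT}])
            history = [{"role": "assistant", "content": summary}]
    return history
```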
Honey-baked ham is 5g fat and 3g sugar per 3oz. ~28g = 1oz, so that’s about 6% fat and 4% sugar, which makes ice cream about 5x sugarier and ~2x fattier than honey-baked ham. In other words, for sugar and fat content, honey-drenched fat > ice cream > honey-baked ham. Honey-baked ham is therefore not a modern American equivalent to honey-drenched gazelle fat, a sentence I never thought I’d write, but I’m glad I had the chance to once in my life.
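For the curious, a quick check of that arithmetic. The ham percentages are the rounded figures above; the ice cream figures are my own assumption of typical values for regular vanilla ice cream (roughly 11% fat and 21% sugar by weight), not numbers from this thread:

```python
# Sanity check of the ham vs. ice cream comparison above.
ham_fat, ham_sugar = 0.06, 0.04  # ~6% fat, ~4% sugar (5 g fat, 3 g sugar per ~85 g)

# Assumed typical values for regular vanilla ice cream (my assumption):
ice_cream_fat, ice_cream_sugar = 0.11, 0.21

print(f"fattier:  {ice_cream_fat / ham_fat:.1f}x")      # ~1.8x, i.e. "~2x fattier"
print(f"sugarier: {ice_cream_sugar / ham_sugar:.1f}x")  # ~5.2x, i.e. "about 5x sugarier"
```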
Thank you for contributing more information.
I understand and appreciate the thinking behind each step in Ren’s argument. However, the ultimate result is this:
> I experienced “disabling”-level pain for a couple of hours, by choice and with the freedom to stop whenever I wanted. This was a horrible experience that made everything else seem not to matter at all...
> A single laying hen experiences hundreds of hours of this level of pain during their lifespan, which lasts perhaps a year and a half, and there are as many laying hens alive at any one time as there are humans. How would I feel if every single human were experiencing hundreds of hours of disabling pain?
My main takeaway is that the breadth and variety of experience that arguably falls under the umbrella of “disabling pain” is enormous, and that we can have only low-to-moderate confidence in animal welfare pain metrics. As a result, I am updating toward increased skepticism of high-level summaries of animal welfare research.
The impact of nest deprivation on laying hen welfare may still be among the most pressing animal welfare issues. But if tractability were held constant, I might prefer to focus on alleviating physical pain among a smaller number of birds.
Also, to disagreevoters, I’m genuinely curious about why you disagree! Were you already appropriately skeptical before? Do you think I am being too skeptical? Why or why not?
I had more trouble understanding how nest deprivation could be equivalent to “**** me, make it stop. Like someone slicing into my leg with a hot, sharp live wire.” So I looked up the underpinnings of this metric in Ch. 6 of the book they build their analysis on (pp. 6-9 are the key material).
They base this on the fact that chickens pace, preen, and show aggressive competition for nests when availability is limited, and will work as hard to push open heavy doors to access nests as they will to access food after 4-28 hours of food deprivation. On this basis, the authors categorize nest deprivation as a disabling experience that each hen endures for an average of about 45 minutes per day.
This is a technically accurate definition, but I still had trouble intuiting it as a daily experience of disabling physical pain on par with having your leg sliced open with a hot, sharp live wire.
Researchers are limited to showing that chickens exhibit distress during nest deprivation, or, in more sophisticated research, that they work as hard to access nest boxes as they do to access food after 4-28 hours of food deprivation.
I am suspicious of the claim that these methods are adequate for comparing physical and emotional pain across species. This is especially true of the willingness-to-work metric they use to compare the severity of nest deprivation and starvation in chickens.
Willingness-to-work is probably mediated by energy. After starvation, chickens will be low-energy, so willingness-to-work probably underestimates their suffering. A starving person might want to do 100 pushups to access an all-you-can-eat buffet, but be physically unable to. If, when well-fed, he’s willing to do 100 pushups to join the football team, does that mean that keeping him off the team is as bad as starving him?
People show distressed behaviors in the absence of suffering. I bite my fingernails pretty severely. Sometimes they even bleed. It’s not motivated by severe anxiety in those moments; it’s just force of habit. Chickens may be hardwired by evolution to work hard to access nests without necessarily suffering while they do so.
Our perceptions of how distressed a behavior is are culturally specific, not to mention species-specific. I pace and walk around the neighborhood when I’m thinking hard. People get piercings and tattoos. People fight recreationally. We don’t assume that people are experiencing high emotional distress in the moments they choose to do these things. Why do we assume that about chickens?
I’ve spent too long writing this comment, so I’m going to just stop here.
I’ve used ChatGPT for writing landing pages for my own websites, and as you say, it does a “good enough” job. It’s the linguistic equivalent of a house decorated in knick-knacks from Target. For whatever reason, we have had a cultural expectation that websites have to have this material in order to look respectable, but it’s not business-critical beyond that.
By contrast, software remains business-critical. One of the key points that’s being made again and again is that many business applications require extremely high levels of reliability. Traditional software and hardware engineering can accomplish that. For now, at least, large language models cannot, unless they are imitating existing high-reliability software solutions.
A large language model can provide me with reliable working code for an existing sorting algorithm, but when applications become large, dynamic, and integrated with the real world, it won’t be possible to build a whole application off a short, simple prompt. Instead, the work is going to be about using both human and AI-generated code to put these applications together more efficiently, debug them, improve the features, and so on.
This is one reason why I think that LLMs are unlikely to replace software engineers, even though they are replacing copy editors, and even though they can write code: SWEs create business-critical high-reliability products, while copy editors create non-critical low-reliability products, which LLMs are eminently suitable for.
IMO, the main potential power of a boycott is symbolic, and I think you only achieve that by eschewing LLMs entirely. Instead, we can use them to communicate, plan, and produce examples. As I see it, this needs to be a story about engaged and thoughtful users advocating for real responsibility with potentially dangerous tech, not panicky Luddites mounting a weak-looking protest.
Seems to me that we’ll only see a change in course from relentless profit-seeking LLM development if intermediate AIs start misbehaving—smart enough to seek power and fight against control, but dumb enough to be caught and switched off.
I think instead of a boycott, this is a time to practice empathic communication with the public now that the tech is on everybody’s radar and AI x-risk arguments are getting a respectability boost from folks like Ezra Klein.
A poster on LessWrong recently harvested a comment from a NY Times reader that talked about x-risk in a way that clearly resonated with the readership. Figuring out how to scale that up seems like a good task for an LLM. In this theory of change, we need to double down on our communication skills to steer the conversation in appropriate ways. And we’ll need LLMs to help us do that. A boycott takes us out of the conversation, so I don’t think that’s the right play.
This might be an especially good time to enter the field. Instead of having to compete with more experienced SWEs at writing code the old-fashioned way, you can be on a nearly level playing field when it comes to incorporating LLMs into your workflow. You’ll still need to learn a traditional language, at least for now, but you’ll be able to learn more quickly with the assistance of an LLM tutor. As the field adapts to a whole new way of writing code, you can learn along with everybody else.
As a simple and costless way to start operationalizing this disagreement, I claim that if I asked my mom (not an EA, pretty opposed to the vibe) whether she’d like EA better with a 2%/8% standard, she’d prefer it and say she’d think warmly of a movement that encouraged this style of donating. I’m only sort of being facetious here; I think having accurate models of how to build the movement’s reputation is important, and EAs need a way to gather evidence and update.