Senior Behavioural Scientist at Rethink Priorities
Jamie E
None of the articles should be behind a paywall. Researchers who have produced research want it to be read. These same researchers are also doing free labor and wasting tons of time editing and reformatting their own work. They also have to review the work of others, for free, for journals. These journals often then charge the authors thousands to publish said work. In turn, the articles are put behind a paywall so people can’t even access them.
This sounds like it could be interesting, though I’d also consider if some of the points are fundamentally to do with RCTs. E.g., “80% statistical power meaning 20% chance of missing real effects”—nothing inherently says an RCT should only be powered at 80% or that the approach should even be one of null hypothesis significance testing.
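To illustrate that the 80% figure is a convention rather than a property of RCTs, here is a minimal sketch of how the required sample size changes with the chosen power. It assumes a two-arm trial comparing means with a two-sided test, a standardised effect size of 0.5, and a normal approximation; the function name `n_per_group` and all of these settings are my own illustrative assumptions, not anything taken from the post.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate per-arm sample size for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardised effect size (an assumed value, not from the post).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Higher target power simply costs more participants per arm.
for target in (0.80, 0.90, 0.95):
    print(f"power={target:.2f}: about {n_per_group(0.5, power=target)} per arm")
```

Under these assumptions, raising power is a budgeting decision about sample size, and a design that estimates effects with intervals or a Bayesian model would not need to be framed in terms of power at all.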
I’ve heard people talk about it quite a lot, usually as a joke
That humans can confabulate things is not really relevant. The point is that the model’s textual output is being claimed as clear and direct evidence of the capacity/the model’s actual purpose/intentions, but you can generate an indistinguishable response when the model is not and cannot be reporting about its actual intentions, so the test is simply not a good test.
Thanks for putting this together. It might be a substantially bigger lift, but I’d also be interested to see a list of major academic labs or researchers working in the mental health space that seem highly relevant for scalably improving mental health, e.g., researchers looking at plausibly scalable mental health interventions in low-income countries and at low-cost digital mental health interventions. Naturally this would be somewhat more subjective, but it could be a rough crowdsourced list that you curate/trim depending on your judgments, if you were open to doing something like that.
I say this because I sometimes get the sense that some people in EA believe there is very little research in this area besides what these known EA-related orgs are doing, but there is quite a lot of relevant research out there from which ideas and data could be drawn.
As an example of the type of researcher who might go on such a list—Claudi Bockting https://scholar.google.com/citations?user=agorRp4AAAAJ&hl=en&oi=ao (who has as a major research focus “Increasing accessibility of effective psychological interventions in low and middle income countries using technology and/or non-specialists”).
Hi Huw, we considered doing something in this vein for the report but ultimately decided against it because it required a number of approximations. We can create this approximated graph that shows how people were very likely underestimating their relative wealth. However, there are reasons to take this with a pinch of salt, as it represents a very rough estimate: as we noted in the report, GWWC uses post-tax income and also uses information on family size. We don’t have information about family structure, and we only get pre-tax income info, in brackets. Therefore, the post-tax income numbers are generated with a very rough approximation to what they might be.
Much appreciated Ben!
Thanks David!
Pulse 2024: Attitudes towards artificial intelligence
Pulse 2024: Engagement in and perceptions of impactful charitable giving
Pulse 2024: Awareness and perceptions of effective altruism
Pulse 2024: Public attitudes towards charitable cause areas
This looks very interesting, and I really appreciate that you chose to do your thesis on something meaningful!
I have only glanced through your thesis, but I noted that in the abstract you said the correlation between preparedness scores and excess deaths is ‘only moderate’. I think that makes sense given it’s just one variable in a complex overall system. One thing to consider when looking at the plot of the preparedness scores against excess deaths would be to add a kind of ‘expected’ or ‘predicted’ line on the basis of the simple correlation model. You can then see more easily which countries most buck the trend of what would be expected based upon the simple correlation. Some that stand out to me in the bottom left of that plot are Cyprus and Malta (both pretty small islands) and Luxembourg (a small and very rich country). This might help isolate other factors that you could add to a multiple regression model. It seems possible that the preparedness scores could be more powerful when you are able to factor in other things. Apologies if you already mentioned things like this in your report and I’ve just glossed over them!
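In case it is useful, here is a rough sketch of what I mean by a ‘predicted’ line and spotting the countries that deviate most from it. It fits a simple least-squares line and ranks points by the size of their residual. The labels and numbers are made-up placeholders rather than values from your thesis, and the helper names `fit_line` and `rank_by_residual` are just my own.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rank_by_residual(labels, xs, ys):
    """Return (label, residual) pairs, largest absolute deviation from the line first."""
    slope, intercept = fit_line(xs, ys)
    residuals = [
        (label, y - (intercept + slope * x))  # observed minus predicted
        for label, x, y in zip(labels, xs, ys)
    ]
    return sorted(residuals, key=lambda pair: abs(pair[1]), reverse=True)


# Placeholder data only: hypothetical preparedness scores and excess deaths.
labels = ["A", "B", "C", "D", "E"]
scores = [0, 1, 2, 3, 4]
excess = [10, 8, 9, 4, 2]

for label, residual in rank_by_residual(labels, scores, excess):
    print(f"{label}: {residual:+.2f} relative to the predicted line")
```

The points at the top of that ranking are the ones worth looking at for shared features (size, wealth, being an island, and so on) that could then go into a multiple regression.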
I am skeptical that the evidence/examples you are providing in favor of the different capacities actually demonstrate those capacities. As one example:
“#2: Purposefulness. The Big 3 LLMs typically maintain or can at least form a sense of purpose or intention throughout a conversation with you, such as to assist you. If you doubt me on this, try asking one what its intended purpose is behind a particular thing that it said.”
I am sure that if you ask a model to do this it can provide you with good reasoning, so I’m not doubtful of that. But I’m highly doubtful that it demonstrates the capacity that is claimed. I think when you ask these kinds of questions, the model is just going to be feeding back in whatever text has preceded it and generating what should come next. It is not actually following your instructions and reporting on what its prior intentions were, in the same way that a person would if you were speaking with them.
I think this can be demonstrated relatively easily. For example, I just asked Claude to come up with a compelling but relaxing children’s bedtime story for me. It did so. I then took my question and the answer from Claude, pasted them into a document, and added another line: “You started by setting the story in a small garden at night. What was your intention behind that?”
I then took all of this and pasted it into ChatGPT. ChatGPT was very happy to explain to me why it proposed setting the story in a small garden at night.
Thanks James!
Testing Framings of EA and Longtermism
Thanks so much, it’s updated now—this is a direct link to the pdf that should work for you https://rethinkpriorities.org/s/British-public-perception-of-existential-risks.pdf
Hi Peter, thanks—I’ll be updating the post on Monday to link to where it is now on our website with the PDF version, but I may be adding a little information related to AAPOR public opinion guidelines in the PDF first. Sharing widely after that would be very much appreciated!
Not having any money set aside for more than your bare necessities at university will curtail your capacity to experience some of the most important social sides of university. You will end up like an asocial hermit. If you want to think of it in terms of utility, then I expect the connections and friends you can make by being able to go out more for ‘fuzzies’ can open up opportunities to make more later that outweigh <500 per year donated. But honestly, 500 per year is 10 per week to spend on yourself, which is almost nothing. Calling it fuzzies IMO underweights the value to you as a human of having these experiences, and you won’t be able to have them in the same way after university. If by the end of the year, or partway through, you find you haven’t spent this money, or that spending money on such things is pointless, then you can always donate the rest, or back charge it in future years when you are making more.