A useful post and an interesting starting point for further discussion.
A few more that spring to mind:
Is there an intuitively plausible alternative or complementary causal factor which might explain the results? Has the study made some attempt to control for it or to estimate its effect size?
Does the paper test a large number of hypotheses and find only some of them to be statistically significant at the 5% or 10% level? You would expect that to happen by random chance, and it smells of p-hacking, though including them all in the paper is at least more intellectually honest than the alternative approach of testing a lot of hypotheses and only acknowledging the ones that were [incidentally] statistically significant. This is why preregistration is valuable. Note that testing so many hypotheses that one of them is bound to be a “finding” is not the same thing as testing whether an association between x and y persists across a large number of alternative regressions that include other variables which might plausibly have a relationship; that’s good practice.
Is there a standard methodology for conducting research of this nature in the field? Has it been used, and if not, has a plausible rationale for taking a different approach been provided?
It would be interesting to hear some of the more specialized questions used by organizations like GiveWell and Rethink Priorities, which evaluate a lot of research papers in particular fields.
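To put rough numbers on the multiple-hypotheses point above, here is a minimal sketch of how quickly a chance "finding" becomes likely. It assumes the tests are independent and that every null hypothesis is actually true, which real papers rarely satisfy exactly; the function names are mine, and the Bonferroni threshold is a standard correction I'm adding for comparison, not something the post discusses.

```python
def prob_at_least_one_false_positive(num_tests: int, alpha: float) -> float:
    """Chance that at least one of `num_tests` independent tests of a true
    null hypothesis comes out 'significant' at level `alpha`.

    Assumption: independence between tests (an idealization).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float) -> float:
    """Per-test significance level that keeps the overall chance of any
    false positive at or below `alpha` (Bonferroni correction)."""
    return alpha / num_tests


if __name__ == "__main__":
    for alpha in (0.05, 0.10):
        for num_tests in (1, 5, 20, 50):
            p_any = prob_at_least_one_false_positive(num_tests, alpha)
            per_test = bonferroni_threshold(num_tests, alpha)
            print(
                f"alpha={alpha:.2f}, tests={num_tests:>2}: "
                f"P(at least one spurious finding)={p_any:.3f}, "
                f"Bonferroni per-test level={per_test:.4f}"
            )
```

With 20 independent tests at the 5% level, the chance of at least one spurious result is roughly 64%, which is why a handful of significant results out of many tested hypotheses is weak evidence on its own.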
N.B. on the SMC vs ITN example, I’m fairly confident the answer is that the ITNs are a baseline directly comparable to “no treatment”, as they shouldn’t affect the progression from bite to symptomatic malaria infection targeted by SMC at all; they simply reduce the frequency of bites (though not to zero, given a sufficiently large sample). Prevalence of malarial bites already varies between cohorts in “no treatment” studies. Access to some level of treatment after the fact (HMM) isn’t a problem of study construction either; it complicates comparing severe malaria or death statistics with papers where sufferers may have had no treatment at all, but if anything it would probably reduce the reported effect size for SMC. Medical ethics means the appropriate baseline or comparator for lifesaving treatment usually isn’t “do absolutely nothing”, it’s “do the [next] best alternative”.
In this case Anthropic chose to supply the DoW via a partnership with a company deeply embedded in the administration’s part of the political spectrum, and even pointedly denied any objections to being used to support the administration’s little expedition in Venezuela, and the administration decided that wasn’t enough. There are many criticisms that can be made of Anthropic’s stance on those issues; reluctance to engage with the current US administration isn’t one of them.
If declining to actively support MAGA’s demands that they support the development of AI with the explicit purpose of being an autonomous killing device is “virtue signalling”, what’s left of “AI alignment” to pursue?