In general, there is no reason to expect Atlas' founders to spend money needlessly. Nobody suspects they are spending it on themselves (except for the alleged expensive table), and, just as with businesses, I expect them to at least be trying to use their resources as efficiently as possible.
You raise, imho, valid arguments. To address some of your points:
I guess the Atlas Foundation is going off a model where impact is heavy-tailed, in which case it makes sense to spend what seem like disproportionate resources on attracting the most talented. In such a model, attracting one fellow from the 99th “potential impact” percentile rather than 10 fellows from the 95th percentile could still be worth a marginal $45k, even though that sounds excessive.
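As a toy illustration of why this can pencil out (with entirely made-up parameters, not anything the Atlas Fellowship has published): if “impact” is lognormally distributed with a heavy enough tail, a single 99th-percentile draw can be worth more than ten 95th-percentile draws combined.

```python
import numpy as np

# Toy heavy-tailed "impact" model: lognormal with a large sigma.
# The parameters are invented purely for illustration.
sigma = 4.0
rng = np.random.default_rng(0)
impact = rng.lognormal(mean=0.0, sigma=sigma, size=1_000_000)

p95 = np.percentile(impact, 95)
p99 = np.percentile(impact, 99)

print(f"95th percentile impact: {p95:,.0f}")
print(f"99th percentile impact: {p99:,.0f}")
# With sigma = 4, the 99th percentile is roughly 15x the 95th,
# so one 99th-percentile fellow "outweighs" ten 95th-percentile fellows.
print(f"one 99th-percentile vs ten 95th-percentile: {p99:,.0f} vs {10 * p95:,.0f}")
```

Of course, whether real-world impact is anywhere near this heavy-tailed is exactly the crux.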
“From friends who are Atlas Fellows, they said many Atlas Fellows do not require the scholarship as their parents earn a lot and can already pay for college.” If true, this is actually evidence in favor of offering them such a ludicrous amount of money: since they do not really need the money, its marginal value to them is lower, so you have to offer more of it to entice such potential students (or think of other benefits). And an unfortunate fact of life seems to be that a person’s financial earnings are highly correlated with those of their parents. If earnings are taken as a proxy for potential impact, then a program like the Atlas Fellowship should also consider privileged students worth attracting.
And maybe that’s just me, but some of the phrasing comes off as somewhat combative (though I am aware that many people here think we should state our opinions more directly). As an example, the question in the title, “why do high schoolers need $50k each?”, is not really accurate and sounds rhetorical, because nobody has claimed that the applicants need that money, just as high-frequency traders do not need their high compensation, yet firms still pay it to hire them.
I would usually not go around tone-policing, but I think it would be beneficial in controversial times like these to remember that **as a community we wanted to move away** from evaluating charitable initiatives based on how they sound and instead evaluate them on their results. In that vein, I do not think it is helpful to quote rumoured single sentences from the founders without any context (“not believing in budgets”) and without actually engaging with them. The founders do not owe us accountability for private remarks they might have uttered at some point.
Hits-based giving means that Open Phil should not police their grantees’ furniture, and I am also unsure whether the way they manage inventory is really of public interest, given that they are not soliciting donations from the public at the moment.
Great post; we need more summaries of disagreeing viewpoints!
Having said that, here are a few replies:
I am only slightly acquainted with Bay Area AI safety discourse, but my impression is indeed that people lack familiarity with some of the empirically true and surprising points made by skeptics such as Yann LeCun (LLMs do lack common sense and robustness), and that is bad. Nevertheless, I do not think you are outright banished if you express such a viewpoint. IIRC Yudkowsky himself asserted in the past that LLMs are not sufficient for AGI (he made a point about being surprised by GPT-4’s abilities on the Lex Fridman podcast). I would not put too much stock in LW upvotes as a measure of AIS researchers’ point of view, as most LW users engage with AIS as a hobby and consequently do not have a very sophisticated understanding of the current pitfalls of LLMs.
On priors, it seems odd to place very high credence in results on exactly one benchmark. The fate of most “fundamentally difficult for LLMs, this time we mean it” benchmarks (e.g. Winograd schemas, GPQA) has usually been that next-gen LLMs perform substantially better on them, a point “Situational Awareness” also makes. Focusing on the ARC challenge now and declaring it the one true test of intelligence is a bit of survivorship bias.
Acknowledging that status games are bad in general, I do think it is valid to point out that, historically speaking, the “scale is almost all you need” worldview has so far been much more predictive of the performance we actually see from large models. The fact that this was taken seriously by the AIS community/Scott/Open Phil (I think) well before GPT-3 came out, whereas mainstream academic research treated such models as fun toys of little practical significance, is a substantial win.
Even under uncertainty about whether the scaling hypothesis turns out to be essentially correct, it makes a lot of sense to focus on the possibility that it is correct and plan/work accordingly. If it is not, we only incur the opportunity cost of whatever else we could have done with our time and money. If it is, well... you know the scenarios.