Great post, we need more summaries of disagreeing viewpoints!
Having said that, here are a few replies:
I think this makes me very concerned about a strong ideological and philosophical bubble in the Bay regarding these core questions of AI.
I am only slightly acquainted with Bay Area AI safety discourse, but my impression is indeed that people lack familiarity with some of the empirically true and surprising points made by skeptics, e.g. Yann LeCun (LLMs DO lack common sense and robustness), and that is bad. Nevertheless, I do not think you are outright banished if you express such a viewpoint. IIRC Yudkowsky himself asserted in the past that LLMs are not sufficient for AGI (he made a point about being surprised at GPT-4's abilities on the Lex Fridman podcast). I would not put too much stock in LW upvotes as a measure of AIS researchers' POV, as most LW users engage with AIS as a hobby and consequently do not have a very sophisticated understanding of the current pitfalls of LLMs.
On priors, it seems odd to place very high credence in results on exactly one benchmark. The fate of most "fundamentally difficult for LLMs, this time we mean it" benchmarks (e.g. Winograd schemas, GPQA) has usually been that next-gen LLMs perform substantially better on them, which is also a point "Situational Awareness" makes. Focusing on the ARC challenge now and declaring it the actual true test of intelligence smacks a bit of survivorship bias.
Scale Maximalists, both within the EA community and without, would stand to lose a lot of Bayes points/social status/right to be deferred to
Acknowledging that status games are bad in general, I do think it is valid to point out that, historically speaking, the "Scale is almost all you need" worldview has so far been much more predictive of the performance we actually see with large models. The fact that the AIS community/Scott/Open Phil (I think) took this seriously well before GPT-3 came out, whereas mainstream academic research regarded large models as fun toys of little practical significance, is a substantial win.
Even under uncertainty about whether the scaling hypothesis turns out to be essentially correct, it makes a lot of sense to focus on the possibility that it is indeed correct and plan/work accordingly. If it is not correct, we only lose the opportunity cost of what else we could have done with our time and money. If it is correct, well... you know the scenarios.
In my experience, these kinds of recommendations often make a lot of sense for running a university group at US/UK universities, but perhaps less so in mainland Europe. Or more precisely, at universities with a large natural pool of potential interest in EA, vs. universities where fewer people have heard of EA, where there is a much weaker culture of having university groups in the first place, etc.
Having done around five years of university group organizing at various levels of engagement, I am not sure how impactful it was, but more often than not it was certainly a slog and unrewarding. This was mostly because group sizes were so small that meetings often felt awkward and not fun, and consequently advertising and preparing talks did not feel like a good use of time. Things might have changed in the last two or three years, but I cannot confirm that the existing resources and knowledge from orgs like CEA or in various fora helped us in a substantial way.