Interesting work! It’s fascinating that the “Egregore” analysis essentially likens EA to a religion; it reads like it was written by an EA critic. Maybe it was influenced by the introduction of a mystical term like “egregore”, or perhaps external criticism the chatbots have read has seeped in.
I am skeptical of the analysis of “epistemic quality”. I don’t think chatbots are very good at epistemology, and frankly most humans aren’t either. I worry that you’re actually measuring other things, like the tone or complexity of the language used. These signifiers would also correlate with forum karma.
I wonder if the question about “holistic epistemic quality” is influencing this: it does not appear to be a widely used term in the wider world. Would a more plain-language question give the same results?
I’d describe myself as also skeptical of model/human ability here! And I’d agree we are to some extent measuring things LLMs confuse for Quality, or whatever target metric we’re interested in. But I think humans/models can be bad at this and the practice can still be valuable.
My take is that even crude measures of quality are helpful, which is why the EA Forum uses ‘karma’. And most of the time crowdsourced quality scores are not available, e.g. outside the EA Forum or before publication. LLMs change this by providing cheap cognitive labor. They’re (informally) providing quality feedback to users all the time. So marginally better quality judgement might marginally improve human epistemics.
I think right now LLM quality judgement is not near the ceiling of human capability. Their quality measures will probably be worse than, for example, EA Forum karma. This is (somewhat) supported by the finding that better models produce quality scores that are more correlated with karma. Depending on how much better human judgement is than model judgement, one could potentially use things like karma as tentpoles to optimize prompts, context scaffolding, or even models towards.
One nice thing about models, maybe worth discussing here, is that they are sensitive to prompts. If you tell them to score quality they’ll try to score quality, and if you tell them to score controversy they’ll try to score controversy. Both model-graded Quality and Controversy correlate with both post karma and number of post comments, but Quality correlates more with karma and Controversy correlates more with number of comments. You can see this in the correlations tab here: https://moreorlesswrong.streamlit.app/. So careful prompts should (to an extent*) help with overfitting.
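To make that concrete, here’s a minimal sketch of the kind of correlation check I mean. The file and column names (“quality”, “controversy”, “karma”, “n_comments”) are placeholders for illustration, not the app’s actual schema:

```python
# Sketch: do prompted attributes track different forum signals?
# Assumes a table with one row per post and model scores already attached.
import pandas as pd
from scipy.stats import spearmanr

posts = pd.read_csv("scored_posts.csv")  # hypothetical file

for score in ["quality", "controversy"]:
    for signal in ["karma", "n_comments"]:
        rho, p = spearmanr(posts[score], posts[signal])
        print(f"{score} vs {signal}: rho={rho:.2f} (p={p:.3f})")
```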
Got to agree with the AI “analysis” being pretty limited, even though it flatters me by describing my analysis as “rigorous”.[1] It’s not a positive sign that this news update and jobs listing is flagged as having particularly high “epistemic quality”.
That said, I enjoyed the ‘egregore’ section bits about the “ritualistic displays of humility”, “elevating developers to a priesthood” and “compulsive need to model, quantify, and systematize everything, even with acknowledged high uncertainty and speculative inputs ⇒ illusion of rigor”.[2] Gemini seems to have absorbed the standard critiques of EA and rationalism better than many humans, including humans writing criticisms of and defences of those belief systems. It’s also not wrong.
Its poetry is still Vogon-level though.
For a start, I think most people reading our posts would conclude that Vasco and I disagree on far too much to be considered “intellectually aligned”, even if we do it mostly politely by drilling down to the details of each other’s arguments.
OK, if my rigour is illusory, maybe that compliment is more backhanded than I thought :)
It’s a good callout that ‘holistic epistemic quality’ could potentially confuse models. I wanted to articulate the quality we were interested in as concisely as possible, but maybe it wasn’t the best choice. But which result would you like to see replicated (or not) with a more natural prompt?
It could be a fun experiment to see how different wording affects the correlation with karma, for example. Would you get the same result if you asked it to evaluate “logical and empirical rigor”? What if you asked about simpler things like how “well structured” or “articulate” the articles are? You could maybe get a sense of which aspects of writing are valued on the forum, as in the sketch below.
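A rough sketch of what that experiment could look like, assuming the post texts and karma are in a table and there’s some LLM call that returns a numeric score. The prompt wordings and the `score_post` helper here are hypothetical placeholders:

```python
# Sketch: score the same posts under several prompt wordings and compare
# each variant's correlation with karma.
import pandas as pd
from scipy.stats import spearmanr

# Illustrative prompt variants, not the wordings actually used.
PROMPTS = {
    "holistic epistemic quality": "Rate the holistic epistemic quality of this post from 1 to 10.",
    "logical and empirical rigor": "Rate the logical and empirical rigor of this post from 1 to 10.",
    "well structured": "Rate how well structured this post is from 1 to 10.",
    "articulate": "Rate how articulate this post is from 1 to 10.",
}

def score_post(prompt: str, text: str) -> float:
    """Placeholder for whatever LLM call turns (prompt, post text) into a score."""
    raise NotImplementedError

posts = pd.read_csv("posts.csv")  # assumed columns: "text", "karma"

for name, prompt in PROMPTS.items():
    scores = [score_post(prompt, t) for t in posts["text"]]
    rho, _ = spearmanr(scores, posts["karma"])
    print(f"{name}: Spearman correlation with karma = {rho:.2f}")
```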
On the Egregore / religion part
I agree! Egregore is occult so definitely religion-adjacent. But I also believe EA as a concept/community is religion-adjacent (not necessarily in a bad way).
It has a community, an ethical belief system, suggested tithing, a sense of purpose/meaning, etc.
Funny, I don’t think it feels written by a critic, but definitely like a pointed outsider (somewhat neutral?) third-party analysis.
I do expect the Egregore report to trigger some people (in good and bad ways, see the comment below about feeling heard). The purpose is to make things known that are pushed into the shadows, the good and the bad. Usually things are pushed into the shadows because people don’t want to or can’t talk about them openly.
I’ll let @alejbo take this question; I think it’s a good one.
At a high level I somewhat disagree with “I don’t think chatbots are very good at epistemology”; my guess would be they’re better than you think, though I agree they’re not perfect or amazing.
But as you admit, most humans aren’t either, so it’s already a low bar.