I can see that you’ve put a lot of effort into this, and I think that if there were some way of reliably automating it I’d say “go for it.” And perhaps there’s just something I’m missing about all this!
But I’ll be entirely honest: this feels overwhelming and overcomplicated relative to the value that it might provide, especially since it goes for a 200% implementation before we’ve even tried the prototypical 20% version: 7 vectors with 25 dimensions plus another vector with “68 values”. That’s an enormous ask.
And it’s for the purpose of enabling “users to know where in the EA landscape” a post fits at a glance? 1) I don’t think it would accomplish that for most people; you’d still have to reason through where it fits in by thinking in your octo-vectorial space. 2) Does that really matter even if you do achieve it? 3) Is it not already possible to roughly understand where it fits—at least to the extent that such understanding would be valuable—by looking at the title, author, and tags? 4) I don’t think that objective rating will be as reliable/consistent as you hope—assuming people even try to provide all the metrics.
In contrast, I was expecting this article to talk about something like “the option to see narrower ratings such as ‘how interesting was this,’ ‘how clear was it,’ ‘how valuable was it to the level that I understood it,’ etc.” That seems plausibly implementable and still directly valuable for users.
Sure, what about a 20% version: 1) encouraging users to write collections and summaries of posts that they recommend (then, if I meet someone whose work or perspectives I like or would like to respond to, it can be easier to learn and contribute if there is a summary); 2) tags under Longtermism: Human survival, Human agency, Human wellbeing, Sentience wellbeing, and Non-wellbeing objectives; and 3) ‘red’ tags that show Repugnant Conclusion and Sadistic Conclusion in grey?
Responding to your points:
1) Steep learning curve? Human minds are faster than you think?
2) No: by the time I achieve it, posts will avoid scoring poorly on these metrics, so it does not matter what the pictures on any given post are. It is, kind of, guidance on how to write good posts. Again, the human mind can synthesize from these categories and optimize for overall great content, considering complementarity with other posts and the ability to score high more uniquely? Otherwise, users may optimize for attention …
3) Not the title: you cannot know whether, for example, it is a title written to catch readers on a post that provides valuable solution-oriented (or problem-oriented, or otherwise valuable) content, or a neutral title on a post whose content motivates impulsive reasoning. The tags: also not really. If something is tagged ‘Community infrastructure,’ for example, you cannot be sure whether it is a scale-up write-up, an innovation, a problem, a solution, inspiration for synthesis, a directive recommendation, etc. If you are specifically looking for posts with this ‘spirit’ of ‘I employed emotional reasoning to synthesize problems and am offering solutions that I am quite certain about in the long term and are inclusive in wellbeing,’ you cannot use tags. Can you look at the author? Not really either, because there are many people whom you do not know and who may be presenting certain public-facing narratives, partly because their posts would otherwise be scored low? But sure, to some extent you can just glance at the preview and see what the post is about.
4) Hm, yes, that is a real risk: if something becomes defined by the community as ‘wellness,’ for instance, but entities are then found to be suffering, it is challenging to change the definition (although I actually paid attention to this in the math: users have to continuously reallocate scarce points). As another example, posts with a high ‘Agency of some humans’ score that are later discovered to actually limit human agency can decrease users’ ability to point out that they limit agency, because ‘no, the bar is high, so they safeguard it.’ Still, even thinking about scoring these categories can be valuable, and the overall picture can be quite informative?
What do you mean? Something that enables users to become better writers by seeing an (imperfect) score, normalizes judging posts by their conformity to a Western standard of writing, and motivates rejecting some content because it ‘did not go through’? No, I think this is not a good idea: users will optimize for conformity out of fear of being publicly shamed, and this will limit creativity and innovation. But something like ‘Is there a concise and comprehensive summary?’ ‘How did I feel reading it?’ ‘Did I read it or skim it?’ ‘Who should read it (what level of expertise in what field)?’ can be less about judging the author according to arbitrary standards and more about motivating readers to engage with the authors to whom they can provide valuable feedback, while letting others know how it is normal to engage with the post.
I’m not sure I follow how your 20% version relates to the original post/proposal about categorized voting: summaries seem reasonable/good but unrelated, and the two points about tagging just seem to be “it would be nice if we used/had more tags.”
There are a lot of other points/responses I could address, but I think that it’s probably better to step back and summarize my big-picture concerns rather than continue narrowing in:
1. Time: How much time would this system require on the part of users?
2. Quality: At the estimated time input, will the quality/consistency reach a point where the system can actually be reliably used to the extent that it saves time/improves understanding?
I think the answer to (1) is “probably a lot”:
Suppose there are 10 relevant posts per day on average.
Suppose that each of the 25 dimensions requires an average minimum of ~1 minute of thought to make a single passable evaluation (especially before users become familiar with doing this, and then even once they become familiar they “have to continuously reallocate scarce points”). We’ll just ignore the eighth vector.
This produces an estimate of ~250 minutes (>4 hours) per day for a single perspective on each article, on average.
It seems plausible that for the metric to have much value, it probably warrants at least 2–3 perspectives per article, effectively >doubling the time commitment for it to be valuable.
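As a rough sketch of the arithmetic above: the numbers are the assumptions already stated in this list, while the constant and function names are purely illustrative and not part of any proposal.

```python
# Back-of-the-envelope estimate of daily rating time.
# All three figures are the assumptions stated in the list above.
POSTS_PER_DAY = 10          # relevant posts per day
DIMENSIONS = 25             # dimensions across the seven vectors; the eighth is ignored
MINUTES_PER_DIMENSION = 1   # ~1 minute of thought per dimension


def daily_rating_minutes(perspectives: int) -> int:
    """Total minutes per day if `perspectives` people each rate every post."""
    return POSTS_PER_DAY * DIMENSIONS * MINUTES_PER_DIMENSION * perspectives


for perspectives in (1, 2, 3):
    minutes = daily_rating_minutes(perspectives)
    print(f"{perspectives} perspective(s): {minutes} min = {minutes / 60:.1f} h per day")
```

Under these assumptions, one perspective per article costs 250 minutes per day, and two to three perspectives cost 500 to 750 minutes in total.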
I’m not going to go much deeper to cover (2), as I think the issue is fairly understandable, but I will just highlight that the time and quality are clearly proportional, and so skimping on time will make the quality suffer.
Ultimately, I do not see this metric being sufficiently valuable to be worth a daily commitment of >5 hours of EA time; I would much rather people spend that time creating new posts, commenting on existing posts, etc.
Hm, ok, maybe just more tags is the solution.
1. For anyone who would opt in to switch to or add voting matrices: about 30 minutes to learn on their favorite post, and then about the same time as one-score voting, times however many categories/subcategories they want to vote on (if you intuitively assign an upvote, you would just intuitively assign maybe 3 upvotes by clicking on images).
2. Yes, depending on the learning curve, and assuming people who would spend too much time learning would not opt in, this would be sufficiently accurate and quick. This would also provide aggregate data. However, it may be easier if experts who have seen a lot of posts make estimates. So, assuming that one to a few humans keep awareness of posts and can assess what a person may like, someone like an EA Librarian can recommend the posts an individual would benefit from most. The recommendations can be of higher quality and more efficient. So, you may be right: the quality/time ratio may be much worse than that of the best alternative.
Oh, yes, if there is a moderator who would have to be digitizing their perspective (and who would probably not capture the complexity of the post with these categories anyway, since the human brain is much better at this), a reminder note can function better. But if you upvote only one post per week by clicking once, and you would instead have to upvote one post per week by clicking 4x4 times, on average, it is still ok. Yes, the reallocation of the points: users would be so affected that they would even stop paying attention to FB or other media, since there are these demands on upvoting. Yes, at least 10 similar perspectives can be taken as saturation, unless new perspectives emerge?
Hm, I guess you do not place much weight on the intuitive understanding of these infographics. In general, when people develop something, it is much easier for them to orient themselves in its summary (including an image), so somehow everyone would need to be involved in the development of the scoring metrics.
I would much rather people regularly pause their posting and commenting to reflect on where their actions are leading, why they do what they do, whether they are missing something, whether solutions have already been developed, what some of the problems are, who is liking what in the community, etc. This can improve epistemics and cooperation efficiency.
I may agree with you that categorized scoring metrics are not the only way to achieve this objective. There may be much better ways, such as expert recommendations of posts and cooperation opportunities.
Thank you very much for the reply.