Improving Karma: $8mn of possible value (my estimate)

Karma Police

[Epistemic confidence: 80%: I may offer specific bets at the bottom]

Summary

1 min video summary
The Forum is a key community space
Regular Forum users should have opportunity to know the most valuable EA information
If they don’t, it is probably because someone didn’t write it or the Forum didn’t surface it
I did some simple analysis and the Forum karma system does okay
Even with this, I estimate that changing the way the Forum ranks posts would create between -$9mn and $250mn of value (90% CI; 15 min video explanation here)

The variance is wide here, but maybe we can become more confident that the gains are positive

We should test alternative ranking systems such as
- You choose whose karma you listen to
- Forecasting the result of the next Forum Review (watch this space)
My sense is that the Forum team know this, but until I read something about it, I think surfacing the size of the problem is positive

The value of the EA Forum

The EA Forum is and should be a sufficient way to learn about EA. Those who scroll through the main feed should expect to find the most useful EA-related information and community discussions without having to look too hard. There are other ways that people learn about EA (for example, by reading foundational texts or listening to the 80,000 Hours podcast); but if you just read the top Forum posts, you ought to be up to speed with current discussions.

As such, the Forum is an important space that should be sorted and curated in a way that boosts high-value content. There might be other ways that we want to sort content — for example, by how much people enjoy it. But the most relevant ranking is ‘how much does this post contribute to making the world better?’ ^[1]

There is room for other rankings—we could rank posts by how funny they are. I’m happy for such a ranking to exist. But the primary goal of this community is to use resources to help the most and so the Forum should primarily rank things according to that metric. This has been discussed before, but I wanted to add some analysis and concrete suggestions^[2].

Theory: The Forum ranks too much by consensus and not enough by value

My theory is as follows:

Everyone gets to upvote or downvote posts
Most of us are not experts in a given topic
The posts which receive many upvotes are those that the median EA can understand and test
They are likely to be about broad concepts like community, spending, giving or EAGs
They are unlikely to be specific, valuable and hard to verify criticisms
The Forum shows the incorrect balance of the two
This leads to us creating and consuming less valuable content than we might otherwise

More broadly, I think we underrate how valuable it can be to lower the barriers to community education. When I was a Christian we would have called this “equipping”: ensuring that the church community is well prepared for the tasks it needs to do. Personally, I understand what I am to do in forecasting and Twitter spaces, but as an EA in general, I understand my job/place/role less well and I guess others feel the same. I think improving the forum algorithm is low-hanging fruit (another example of this is the fact that there aren’t that many high-quality brief summaries of EA content; see this article). I discuss community roles more broadly in this article, What are we for?.

Before we test my theory, this is another possible response: Karma doesn’t measure quality and nor should it. Karma measures popularity, and that’s fine actually, because the purpose of karma isn’t to measure quality but to nudge people towards reading some posts and away from reading others. And random forum users will get more out of an accessible post than a niche or technical post, even a high-quality one.

My response is that the crux is whether we can do better at providing people with things that they themselves will look back on and consider to have been valuable. It’s not a case of everyone reading OpenPhil grants, but rather thinking about which key pieces of content are worth everyone’s time.^[3]

Testing the theory

First, I think the Forum team knows much of what I’m going to say and probably either agrees or has good reasons to disagree. They are struggling to hire and developers are at a particular premium given our views on AI. I still think surfacing the magnitude of the problem is the correct call.

If the theory (that the ranking algorithm leaves lots of room for improvement) were wrong, we’d expect to see:

High karma posts well-correlated with some agreed ranking of “the best posts”. We’re gonna use the Decade Review
Few posts that are either very overrated or very underrated by karma

My work here is mediocre and may have errors. If you want to improve on it, I’m happy to include yours and split any prizes for this piece.

First, let’s look at the Decade Review. The Review took place about 6 months ago. The aim was for people to choose which content had been most valuable to them. Any forum user could give posts from −9 to +9 review karma. Then posts were ranked according to (some function of) these scores. This is a good way to get a retrospective ranking, though it still likely undervalues posts which delivered huge value to small groups within EA.

If the Forum is good at ranking things, we should see the karma ranking and the Decade review ranking being similar.

The column on the left shows top-ranked posts in the Decade Review. The column on the right shows the top-ranked posts on the Forum according to “Top (inflation-adjusted)”. Because the forum used to be smaller, older good posts have less karma. This was the Forum’s way of accounting for that. For that reason, the column on the right isn’t in strict numerical order. The arrows show where the top posts in the Decade Review fall in the Karma rankings.

The Decade Review has (at times) years of hindsight for us to think about which items were valuable. Karma has much less time for reflection. I think it does okay here.

I also plotted the two rankings against one another. The x-axis is Decade Review, the y-axis is Top (inflation-adjusted) Forum karma

I scraped the first 150 articles by Decade Review and the first 371 articles by karma. Any Decade Review article that wasn’t in the 371 top-rated Forum articles I gave a rank of 1000. If you want to improve on this, my data is here^[4]

The correlation (CORREL in google sheets) here is .47 which I’m told is pretty good.
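For anyone who wants to redo this outside a spreadsheet, here is a minimal Python sketch of the same calculation. CORREL in Google Sheets is the Pearson correlation, so the sketch computes that by hand. The titles and orderings are made up for illustration, and `rank_correlation` and `MISSING_RANK` are names I have invented here, not anything from the Forum or my sheet.

```python
from math import sqrt

MISSING_RANK = 1000  # penalty rank for Review articles absent from the karma list


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_correlation(review_order, karma_order):
    """Correlate Review rank with karma rank, penalising unmatched articles."""
    karma_rank = {title: i + 1 for i, title in enumerate(karma_order)}
    review_ranks = list(range(1, len(review_order) + 1))
    karma_ranks = [karma_rank.get(title, MISSING_RANK) for title in review_order]
    return pearson(review_ranks, karma_ranks)


# Hypothetical data: "E" is in the Review list but not in the karma list.
review_order = ["A", "B", "C", "D", "E"]
karma_order = ["B", "A", "D", "F", "C"]
r = rank_correlation(review_order, karma_order)
```

Note that the 1000 penalty dominates the result when many articles are missing, so a rank-based measure that handles ties might be a fairer choice.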

On the other hand, over half of the entries in Decade Review didn’t show up in the first 371 articles by Karma.

What’s more, the Decade Review isn’t a perfect metric for which posts are important ex-post. There were a lot of articles to read and I’m sure that content got missed. Influential EA content such as Doing Good Better, the Precipice and popular 80,000 Hours podcasts were missing, which suggests that the aim of the Decade Review was not to decide ‘what is the most valuable EA content.’

This seems like a passing grade though I think it warrants more investigation.

An alternate criticism is that the Forum does not merely rank incorrectly, it doesn’t rank quantitatively. You only have two choices, upvote or strong upvote, but some posts are 1000x more valuable than others. How do we signal that and how does the Forum signal that back to us?^[5]^[6]^[7]

Ranking affects incentives

Some of the top articles on the Forum were written by people as part of their jobs. But many weren’t. These people could have easily decided to just not write the articles that they did.

Let’s take a look at Nuño Sempere’s piece ‘A Critical Review of Open Philanthropy’s Bet On Criminal Justice Reform’. (Full disclosure, Nuño is my friend). I don’t claim that this piece was ranked badly. While it is Nuño’s job to work on estimation, he isn’t an expert on Criminal Justice Reform. The article gives an in-depth evaluation of Open Philanthropy’s $200m in criminal justice spending. If Nuño had been incentivised to write this piece two years earlier than he did, perhaps Open Philanthropy would have spent less on criminal justice—I would guess between $1 million and $30 million less. That would have been a pretty significant amount of impact achieved by writing a Forum post.

Other articles like this don’t even get the clout this one did. Take a look at Daniel_Eth’s ‘Some potential lessons from Carrick’s Congressional bid’, which explores the risks and potential pitfalls associated with EAs running for office. How much value are we losing by not incentivising people to a) write this kind of content and b) read it? If your response is that there is a better way to incentivise this kind of content, great, but here is a tractable option with good benefits compared to costs.

I made a squiggle model^[8] to try and estimate the value of improving the forum.

The model gives a median benefit of $8mn in better allocated funding with a 5% chance of costing $9mn or more and a 5% chance of benefitting $250mn or more

Here is a link but I’ve also put the code here (and I explain it below). 

//critical stuff
pc_to(a,b) = truncate(a to b,0,1)
//ranking improvement can close what % of distance between current and perfection
chance_ranked_correctly_before = pc_to(.4,.95) // for what it's worth, ranking 40% of articles correctly might be really good
ranking_error = truncate(.1 to 1.1,0,2) //this needs to get smaller when better
timing_and_contribution_sensitivity = pc_to(.01, .5) //how sensitive are EAs writing articles in time to forum ranking? it might improve by 10% to 10x

//stuff that's the same before and after

criminal_justice_error_spending = 0 to 30000000 //let's imagine this is the upper bound for a non-expert article on the forum
chance_change_given_article = pc_to(.05, .5)
//there is some chance that the forum is ranking perfectly
//the article needs to be written in time
value_per_nonexpert_article(article_value_upper_bound, chance_article_written_in_time, chance_change_given_article) = 0 to truncateLeft(article_value_upper_bound * chance_change_given_article * chance_article_written_in_time,0.01)
// then the article needs to be written at all
number_of_EAs = 2000 to 20000

//how this affects percentages is non-trivial
percentages_improving_by_percent(percentage, change, sensitivity) = truncate(percentage + (1 - percentage) * change * sensitivity,0,1)

//before
percent_nonexpert_EAs_contribute_before = pc_to(.001, .02)
chance_article_written_in_time_before = pc_to(.01, .1)

value_per_nonexpert_article_before = value_per_nonexpert_article(criminal_justice_error_spending, chance_article_written_in_time_before, chance_change_given_article)
eA_nonexpert_value_before = number_of_EAs*percent_nonexpert_EAs_contribute_before*value_per_nonexpert_article_before

//after

percent_nonexpert_EAs_contribute_after = percentages_improving_by_percent(percent_nonexpert_EAs_contribute_before, ranking_error, timing_and_contribution_sensitivity)
chance_article_written_in_time_after = percentages_improving_by_percent(chance_article_written_in_time_before, ranking_error, timing_and_contribution_sensitivity)

value_per_nonexpert_article_after = value_per_nonexpert_article(criminal_justice_error_spending, chance_article_written_in_time_after, chance_change_given_article)
eA_nonexpert_value_after = number_of_EAs*percent_nonexpert_EAs_contribute_after*value_per_nonexpert_article_after

//value creation possible

ranking_value_possible = eA_nonexpert_value_after - eA_nonexpert_value_before

If you are interested I recommend this video explanation (15 min). Run the model in your browser.

When I give 2 values I’m giving 90% confidence eg 90% of the time I think the value is more than a and less than b. Alternatively, there is a 5% chance that the value is below a and a 5% chance it is above b.
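To make that concrete, here is a rough Python sketch of how a pair of values can be turned into a distribution. It assumes the pair is read as the 5th and 95th percentiles of a lognormal, which is my understanding of how Squiggle treats `a to b` when both bounds are positive (when a bound is zero or negative it uses a normal instead). The helper names are mine and are not part of Squiggle.

```python
import math
import random

Z_95 = 1.6448536269514722  # 95th percentile of the standard normal


def lognormal_from_90ci(low, high):
    """Return (mu, sigma) of a lognormal with 5th/95th percentiles low/high."""
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z_95)
    return mu, sigma


def sample(low, high, n, seed=0):
    """Draw n samples from the lognormal implied by the 90% interval."""
    rng = random.Random(seed)
    mu, sigma = lognormal_from_90ci(low, high)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Mirrors "number_of_EAs = 2000 to 20000" from the model.
mu, sigma = lognormal_from_90ci(2000, 20000)
median = math.exp(mu)  # geometric mean of the two bounds
draws = sample(2000, 20000, 10_000)
share_inside = sum(2000 <= d <= 20000 for d in draws) / len(draws)
```

Roughly 90% of the draws should land between the two bounds, with about 5% falling on each side.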

Roughly, the model works like this:

A great EA non-expert article provides value of somewhere between $0 and $30,000,000
There are somewhere between 2000 and 20000 EAs
Between .1% and 2% of EAs contribute outside their areas of expertise
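There is a 1% to 10% chance that they will contribute a really valuable article in time for a change to happen (these are the values used in the code above), and a 5% to 50% chance that such an article changes the decision.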
If 0 is random and 1 is perfection, the Forum ranking is currently somewhere between .4 and .95
If we try and improve the ranking we might make it 10% worse or move it 90% closer to perfection
The Forum ranking has somewhere between a 1% and a 50% effect on whether people write non-expert articles and whether they write them in time.
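The steps above can be sketched with single numbers instead of distributions. This is a simplified Python illustration, not the Squiggle model itself: `improve` mirrors `percentages_improving_by_percent`, every input is a hypothetical point value picked from inside the ranges in the model code, and the article value is taken as the upper bound itself rather than a distribution from 0 to that bound, which overstates it.

```python
def clamp01(x):
    """Keep a probability inside [0, 1]."""
    return max(0.0, min(1.0, x))


def improve(percentage, change, sensitivity):
    """Close a share of the gap between the current percentage and 100%."""
    return clamp01(percentage + (1 - percentage) * change * sensitivity)


def yearly_value(n_eas, share_contributing, value_per_article):
    """Number of contributors times the value of each article."""
    return n_eas * share_contributing * value_per_article


# Hypothetical point values (assumptions, not outputs of the real model).
n_eas = 6000
share_before = 0.005      # non-experts who contribute
in_time_before = 0.03     # chance the article is written in time
change = 0.5              # ranking moves halfway towards perfect
sensitivity = 0.1         # how strongly writers respond to the ranking
upper_bound = 10_000_000  # value at stake for one great article
chance_change = 0.2       # chance the article changes the decision

share_after = improve(share_before, change, sensitivity)
in_time_after = improve(in_time_before, change, sensitivity)

before = yearly_value(n_eas, share_before, upper_bound * chance_change * in_time_before)
after = yearly_value(n_eas, share_after, upper_bound * chance_change * in_time_after)
gain = after - before
```

Because `improve` closes part of the gap to 100%, very small starting percentages rise by a large multiple, which is one reason the result is so sensitive to the `sensitivity` input.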

All models are wrong, but some are useful. Ideally you will criticise my model and we can make it better.

Some comments:

The 90% confidence interval is −9.2M to 250M. The median is 9.3M
The 5th percentile is negative because changing the Forum’s ranking system might make it worse
The 95th percentile might seem high, but if the community corrects a few key decisions each year, we could easily see that much value
The model is very sensitive to the effects of karma on non-expert contributions. Perhaps you have different intuitions here, but for me, karma matters a lot!
The downside risk shouldn’t make us give up here. I suggest we can do more work to become more confident of a value in this range.
The model is about shifting money, because that was something that was easy for me to think about. That money might be in global health, or longtermist giving

Solutions

As I say above, I would prefer that one of these suggestions replaced karma as the central method of ranking; but if you want a more defensible view, imagine I’m arguing for alternative options.

Here are a few possible ranking methods, scored:

‘Core EA’ rankings

CEA denotes ~50 accounts as “core EAs” (elites) and allows you to look at the rankings from these accounts alone. Honestly, this seems like a perfectly serviceable solution to me. Some commenters told me to change the word “elites” because it had bad connotations. I cannot think of a more accurate word and I mean it positively.

Ease: ¹⁰⁄₁₀
Close to ideal ranking: ⁷⁄₁₀

Forecast the Decade Review (or if we do it more often):

Manifold Markets is building a system to forecast the results of EA contests (here is the Open Phil cause prioritisation contest) and then allow people to bet on them. We then use the forecasts to rank the posts. They could easily do the same for the EA Forum review (ideally it wouldn’t be a decade away^[9]). This would be easy for the Forum to integrate (they just need to take a feed from the API) and would get very close to the Decade Review, which I’d argue is one of the best things we have.

Ease ¹⁰⁄₁₀
Close to ideal ranking ⁹⁄₁₀

Reduce community posts (the forum already does this)

The forum puts a negative modifier on all posts with the “community” tag. This was a good move but has bad incentives—people try to avoid receiving the community tag

Ease: ¹⁰⁄₁₀ (done)
Close to ideal ranking ⁵⁄₁₀

ML algorithm

Ben West found my draft and told me he’s excited about this. I am excited about it too. That said, I think it’s probably harder than several other methods here. And I’m not sure it will give a higher information feed. Feel free to disagree

Ease: ⁰⁄₁₀
Close to ideal ranking 6.5/10

Create your own karma

Choose a set of accounts, only see karma generated by their upvotes and downvotes. This probably produces a better ranking, but seems like a lot of work.

Ease: ²⁄₁₀
Close to ideal ranking: ⁸⁄₁₀
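A minimal Python sketch of the idea, assuming votes are available as (voter, post, weight) records. That record shape, the account names and `personal_karma` are all hypothetical; I don’t know how the Forum actually stores votes.

```python
from collections import defaultdict


def personal_karma(votes, trusted):
    """Sum vote weights per post, counting only voters in `trusted`."""
    totals = defaultdict(int)
    for voter, post, weight in votes:
        if voter in trusted:
            totals[post] += weight
    # Highest personal karma first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up votes: positive weights are upvotes, negative are downvotes.
votes = [
    ("alice", "post-1", 2),
    ("bob", "post-1", -1),
    ("carol", "post-2", 1),
    ("alice", "post-2", 8),
    ("dave", "post-3", 5),
]
ranking = personal_karma(votes, trusted={"alice", "carol"})
```

Posts that none of your chosen accounts voted on simply don’t appear, so a real version would need a fallback for them.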

“Most-followed” karma

As above, but with an additional step where the top 5% of accounts that people follow are denoted as “most-followed”. This is like ‘create your own karma’, but now we can rank by some community consensus on who elites are. I think this will turn out pretty similar to CEA’s choice of elites, but at least I can pick my own elites.

Ease ²⁄₁₀
Close to ideal ranking 8.5/10
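Picking the “most-followed” set could look something like the Python sketch below. It assumes we have a follower count per account; the data and the `most_followed` helper are invented for illustration.

```python
def most_followed(follower_counts, top_percent=5):
    """Return the accounts in the top `top_percent` percent by followers."""
    # Round up, and always keep at least one account.
    n_keep = max(1, (len(follower_counts) * top_percent + 99) // 100)
    ordered = sorted(follower_counts, key=follower_counts.get, reverse=True)
    return set(ordered[:n_keep])


# Hypothetical data: 40 accounts, where user-i has i followers.
follower_counts = {f"user-{i}": i for i in range(40)}
elites = most_followed(follower_counts)
```

The resulting set can then be used as the list of accounts whose votes count towards the ranking.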

Subforums

Again, I hear that some of the Forum team are investigating this one. I think my concern would be that it would fracture the community: I want to see what other people are working on. But maybe it would create higher fidelity chat in those areas. Probably net positive

Ease ⁰⁄₁₀
Close to ideal ranking 6 to ⁹⁄₁₀

Impact markets

Everyone has said this, but the right way to do this is to allow people to purchase the impact from everything and then trade the certificates. This would be a huge pain to implement but if the value is high enough, it would be worth it. I sense that it is, but I’m not suggesting it here.

Ease ⁰⁄₁₀
Close to ideal ranking ¹⁰⁄₁₀

Potential criticisms of these options

‘I am unconvinced that elite choices or forecasting are a good way to predict value’

I argue here that elite choices are a good way to predict value. If you don’t think public markets to forecast value (with loans to incentivise long term betting) are a good way to forecast things, I’d be interested to hear why. If your reason is that this creates a high barrier to voting on posts, see below.

‘I am concerned that I will trust a voting system less if I understand it less’

Fair point. But while I think that these suggestions are better than the current karma system, and they should be the main systems, perhaps new users shouldn’t have them initially.

‘It’s expensive and difficult to implement’

I think the value we could generate from implementing these changes is worth the initial headache of fixing the current system. My model above assumes that the best Forum posts help reallocate tens of millions of dollars. The value created here is at median $10 million a year. This is worth the expense.

Having worked with software developers, I can say that having an alternate ranking (alt button) seems like a task for more than half a PM and half a developer. This would cost about $200k a year.

‘This would be non-transparent and create barriers to voting’

If we cared about making this simple to understand, I think we could. Likewise this is one of the key EA community spaces (alongside the in-person community, 80k and Twitter). We should take finding ways to improve it seriously.

This could lead to some bad outcomes

Please comment on what they are. If you can’t, then that argument could apply to anything. Likewise, a new system can first exist alongside the current one. This way, we’d get to see which system people prefer, and adjust for any unexpected effects.

Conclusion: let’s sort this out

We could relatively easily implement a ranking system of CEA’s top 50 EAs or a prediction market for each of the Yearly and Decade Review. We could see how it compared to the current karma system and then replace karma, remove the changes, or allow people to have some mix of both.

It seems to me that we are losing maybe $10 million a year from not ranking content properly, and I feel that we generally don’t talk about how great a loss this is. I get that it’s hard to hire developers. I get that it’s tiresome to overhaul the current system. But I want to make sure that we’re all on the same page about what we’re missing out on and what the solutions are. This seems pretty fixable if we’re willing to put the effort in.

I originally spoke this to Rob Long who transcribed it (a fun afternoon with a friend is to do this back and forth) then Rachel Frankwick and I wrote it up and Amber Ace copy-edited it. Then I added a load more spelling errors back in. It takes a village.

This post is part of a test I’m doing on works-in-progress.

Appendix: Post publication edits

Appendix: Bets about this article

Appendix: Broader thoughts

As we grow as a community, our systems for gathering what we consider to be the most important content are going to be more important. We want a way for segments of the community to surface their disagreement early in order that we can deal with it. Currently I think we don’t take the ranking of the forum (or its dissemination) nearly seriously enough.

Appendix: other problems with the forum as a repository

The forum presents high barriers to comment, by not making it easy to write in the right style. I’d like writing an article to have a process it leads users through
It isn’t easy to pull information from the forum. As I wrote here the forum should have a much wider range of email, podcast, RSS options
It doesn’t contain everything. Every EA and EA adjacent book should have a link post.
It has no methods of synthesis. As I wrote here there should be a way for the community to aggregate content. The wiki isn’t it because:
It’s hard to find wiki pages from the front page
The forum team want the wiki pages to be more like tag explainers than community summaries

^[1] Do I need to spend more time on this?
^[2] Also, I wish that I could combine my version with Arepo’s. One day
^[3] I also think the ranking isn’t quantitative and incentivises new things more than value creation
^[4] If you want a longer explanation, there are two rankings: one good one, one that we are testing. I got both sets and matched them. If something appears highly in the good ranking but not in the other ranking, that’s a point against the other ranking. To try and show that, I pretended it had been given a very poor rank (1000th). Feel free to suggest better methods.
^[5] I’d love suggestions for what a more quantitative solution might look like. Currently all I have is impact markets.
^[6] It’s frustrating that writing footnotes takes you out of context and that you can’t search forum pages to link.
^[7] There is room for a discussion of the types of post that I think are misranked but I couldn’t think of how to do that
^[8] Squiggle is really easy to use. Go on, click it.
^[9] though even then Manifold have recently done some work on long term bets
