Without having the data, it seems the controversy graph could be driven substantially by posts which get exactly zero downvotes.
Almost all posts get at least one vote (magnitude >= 1), and balance >= 0, so magnitude^balance >= 1. Since the controversy graph goes below 1, I assume you are including the special case that sets controversy to zero when there are zero downvotes, per the Reddit code you linked to.
e.g. if a post has 50 upvotes:
0 downvotes --> controversy 0 (not 1.00)
1 downvote --> controversy 1.08
2 downvotes --> controversy 1.17
10 downvotes --> controversy 2.27
so a lot of the action is in whether a post gets 0 downvotes or at least 1, and we know a lot of posts get 0 downvotes because the graph is often below 1.
If this is a major contributor, the spikes should look different when you run the same calculation without that handling (or, equivalently, with the override set to 1 instead of 0). This discontinuity also makes me suspect that Reddit uses this calculation for ordering only, not as a cardinal measure, or that zero downvotes is an edge case on Reddit!
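For reference, here is a minimal Python sketch of the calculation as I understand it from the description above. It is not a copy of the Reddit source: the function name `controversy` and the `zero_override` parameter are my own, added so the same code can be run with the override set to 0 or to 1. I am also assuming that balance is the smaller vote count divided by the larger one, which is what reproduces the numbers in my example.

```python
def controversy(ups: int, downs: int, zero_override: float = 0.0) -> float:
    """Sketch of the controversy score discussed above (names are hypothetical)."""
    # Special case: if either side has no votes, return the override value
    # (0 in the behaviour described above; pass 1.0 to remove the jump).
    if ups <= 0 or downs <= 0:
        return zero_override
    magnitude = ups + downs
    # Assumed definition: balance is the smaller count over the larger, so 0 < balance <= 1.
    balance = downs / ups if ups > downs else ups / downs
    return magnitude ** balance


# Reproduce the 50-upvote example.
for downs in (0, 1, 2, 10):
    print(downs, round(controversy(50, downs), 2))
```

Calling `controversy(50, 0, zero_override=1.0)` returns 1.0, which is the value magnitude^balance would give at zero downvotes and removes the jump between 0 and 1 downvotes.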
Thanks for this.
That’s a valid point. Here’s the controversy graph if you exclude all posts that don’t have any downvotes:
The overall trend seems to be similar, though. And it makes me even more curious about what happened in 2018 that sparked so much controversy^^