I think this is a reasonable exercise in the abstract, and could help people more easily communicate how they approach different forms of evidence.
However, if actually implemented in practice, I think it would be too easily gamed to be of any use. Using your system as an example: if person A has a mathematical proof of X (20 points), but person B makes 11 clever tweets suggesting not-X (11 × 2 = 22 points), then person B “wins” the argument.
The other problem I see is that there’s no modifier here for “actually being correct”. If person A presents a correct mathematical proof for X, and person B presents a mathematical proof for not X that is actually false, do they both get 20 points?
If you check the proofs yourself and one is obviously wrong while the other is not obviously (to you) wrong, then you only give the not-obviously-wrong one 20 points. If you can’t tell which is wrong, then they cancel out. If a professor then comes along and says “that proof is wrong, because [reason that you can’t understand], but the other one is OK”, then epistemically it boils down to “tenured academic in field” (6 points) for the proof that the professor says is OK.
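To make that concrete, here is a toy encoding of the rule in Python, using the 20-point proof value and the 6-point “tenured academic in field” value from the scale. The verdict labels and the function itself are just illustrative, not part of the original system:

```python
def net_points(verdict_a, verdict_b):
    """Toy net score for proof A vs. a contradicting proof B
    (positive favors A, negative favors B, zero means they cancel)."""
    POINTS = {
        "passes_my_check": 20,  # I checked it; not obviously wrong
        "fails_my_check":   0,  # I checked it; it's broken
        "cant_evaluate":   20,  # unverified; cancels an equal opposite
        "expert_ok":        6,  # rests on a professor's say-so
        "expert_wrong":     0,  # the professor says it's wrong
    }
    return POINTS[verdict_a] - POINTS[verdict_b]

print(net_points("passes_my_check", "fails_my_check"))  # 20: A wins outright
print(net_points("cant_evaluate", "cant_evaluate"))     # 0: they cancel out
print(net_points("expert_ok", "expert_wrong"))          # 6: A, via testimony
```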
This equation was definitely meant as a rough initial guide. I think it’s still usable as a heuristic, i.e. most of the time you pay attention to higher-point evidence over lower-point evidence. It’s meant to be better than other heuristics, not to be a complete solution.
> if person A has a mathematical proof of X (20 points), but person B makes 11 clever tweets suggesting not-X (11 × 2 = 22 points), then person B “wins” the argument.
I didn’t get into adding up evidence, for this reason. I think it’s very clear that evidence is not linearly additive like that. I think that an aggregation function would take into account the similarity of the specific pieces of content (two tweets that are clever, but near-identical, add little to each other), and also the similarity of the types of content (it’s better to have a diverse set of different kinds of evidence, like a meta-study plus “businesses commonly use it”). There would be quick leveling off, so that 50 clever tweets would have an evidence strength of something like 2 to 5 points, not 100.
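A minimal sketch of what I mean, assuming a geometric discount on each additional piece of evidence (the 0.5 decay rate and the `similarity` parameter are illustrative choices of mine, not part of the original equation):

```python
def aggregate_evidence(point_values, similarity=0.0):
    """Strongest piece counts in full; each further piece is
    geometrically discounted, and near-identical pieces decay
    faster, so totals level off instead of summing linearly."""
    decay = 0.5 * (1.0 - similarity)  # illustrative decay schedule
    total, weight = 0.0, 1.0
    for points in sorted(point_values, reverse=True):
        total += points * weight
        weight *= decay
    return total

# 50 independent clever tweets at 2 points each:
print(aggregate_evidence([2] * 50))                # ~4.0, not 100
# Two clever but near-identical tweets:
print(aggregate_evidence([2, 2], similarity=0.9))  # ~2.1
```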
> The other problem I see is that there’s no modifier here for “actually being correct”.
I thought this was a fairly obvious thing to add. Again, I think this would need a lot more complexity, depending on how much you actually rely on it.