The greatest downside, though, is precision: you lose half the information if you round percents to per-tenths.
Could you explain what you mean by this?
I’d have thought that, in many cases, an estimate to the nearest 10% has far more than half the information contained in an estimate to the nearest percent. E.g., let’s say I start out with no idea what you’d estimate the chance of X is (and thus my median guess would be that you’d estimate there’s a 50% chance of X). If I then learn you believe it’s roughly 20%, doesn’t that provide most of the value I’d get from learning you believe it’s 23% or 18%?
In a literal information-theoretic sense, a percentage has log2(100)≈6.6 bits of information while a per-tenth has log2(10)≈3.3 bits. This might have been what was meant?
I agree that the half of the information that is preserved is the much more valuable half, however.
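For anyone who wants to check those figures, here is a minimal Python sketch. It assumes all 100 percentage values (or all 10 per-tenth values) are equally likely, which is the simplification the thread is using; the helper name `bits` is my own.

```python
import math

def bits(n_outcomes):
    """Bits needed to single out one of n equally likely outcomes."""
    return math.log2(n_outcomes)

percent_bits = bits(100)  # estimate given to the nearest percent
tenth_bits = bits(10)     # estimate given to the nearest 10%

# Fraction of the percent-level information that survives rounding to per-tenths.
ratio = tenth_bits / percent_bits

print(round(percent_bits, 1), round(tenth_bits, 1), round(ratio, 2))
```

Under that uniform assumption, rounding keeps half of the bits, which says nothing yet about how useful each half is.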
Yes, in most cases if somebody has important information that an event has XY% probability of occurring, I’d usually pay a lot more to know what X is than what Y is.
(There are exceptions if most of the VoI is in knowing whether you think the event is, e.g., >1%, but the main point still stands.)
As you should, but Greg is still correct in saying that Y should be provided.
Regarding the bits of information, I think he’s wrong because I’d assume information should be independent of the numeric base you use. So I think Y provides 10% of the information of X. (If you were using base 4 numbers, you’d throw away 25%, etc.)
But again, there’s no point in throwing away that 10%.
In the technical information-theoretic sense, ‘information’ counts how many bits are required to convey a message. And bits describe proportional changes in the number of possibilities, not absolute changes. The first bit of information reduces 100 possibilities to 50, the second reduces 50 possibilities to 25, etc. So the bit that takes you from 100 possibilities to 50 is the same amount of information as the bit that takes you from 2 possibilities to 1.
And similarly, the 3.3 bits that take you from 100 possibilities to 10 are the same amount of information as the 3.3 bits that take you from 10 possibilities to 1. In each case you’re reducing the number of possibilities by a factor of 10.
To take your example: If you were using two digits in base four to represent per-sixteenths, then each digit contains 50% of the information (two bits each, reducing the space of possibilities by a factor of four). To take the example of per-thousandths: Each of the three digits contains a third of the information (3.3 bits each, reducing the space of possibilities by a factor of 10).
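The "proportional, not absolute" point can be sketched in a few lines of Python. This is only an illustration under the same equally-likely assumption; `bits_gained` is a hypothetical helper name, not anything standard.

```python
import math

def bits_gained(before, after):
    """Information from narrowing `before` equally likely options down to `after`."""
    return math.log2(before / after)

# Halving is always one bit, whether you start from 100 options or from 2.
one_bit_steps = (bits_gained(100, 50), bits_gained(2, 1))

# Per-thousandths: each decimal digit cuts the options by a factor of 10.
decimal_digits = [bits_gained(1000, 100), bits_gained(100, 10), bits_gained(10, 1)]

# Per-sixteenths in base four: each digit cuts the options by a factor of 4.
base4_digits = [bits_gained(16, 4), bits_gained(4, 1)]

print(one_bit_steps, decimal_digits, base4_digits)
```

Every digit in a given base comes out with the same number of bits, regardless of its position.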
But upvoted for clearly expressing your disagreement. :)
Ahhh. Thanks for clearing that up for me. Looking at the entropy formula, that makes sense and I get the same answer as you for each digit (3.3). If I understand, I incorrectly conflated “information” with “value of information”.
I had in mind the information-theoretic sense (per Nix). I agree the ‘first half’ is more valuable than the second half, but I think this is better parsed as diminishing marginal returns to information.
Very minor, re. child thread: You don’t need to calculate numerically, as log_a(x^y) = y⋅log_a(x), and 100 = 10^2. Admittedly the numbers (or maybe the remark in the OP generally) weren’t chosen well, given ‘number of decimal places’ seems the more salient difference than the squaring (e.g. per-thousandths does not have double the information of per-cents, but 50% more).
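A small Python sketch of the same identity, in case it helps. It assumes each decimal digit is uniform and independent, and the variable names are mine.

```python
import math

bits_per_digit = math.log2(10)

# log2(10**k) = k * log2(10), so k decimal digits carry k times the bits of one digit.
per_tenth = 1 * bits_per_digit
per_cent = 2 * bits_per_digit
per_thousandth = 3 * bits_per_digit

# Relative gain from adding one more decimal place.
gain_tenth_to_cent = per_cent / per_tenth - 1
gain_cent_to_thousandth = per_thousandth / per_cent - 1

print(gain_tenth_to_cent, gain_cent_to_thousandth)
```

So going from per-tenths to per-cents doubles the bits, while going from per-cents to per-thousandths adds half again.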
I think this is better parsed as diminishing marginal returns to information.
How does this account for the leftmost digit giving the most information, rather than the rightmost digit (or indeed any digit between them)?
per-thousandths does not have double the information of per-cents, but 50% more
Let’s say I give you $1 + $Y where Y is either 0, $0.1, $0.2 … or $0.9. (Note $1 is analogous to 1%, and Y is equivalent to adding a decimal place, i.e. per-thousandths vs per-cents.) The average value of Y, given a uniform distribution, is $0.45. Thus, against $1, Y adds almost half the original value, i.e. $0.45/$1 (45%). But what if I instead gave you $99 + $Y? $0.45 is less than 1% of the value of $99.
The leftmost digit is more valuable because it corresponds to a greater place value (so the magnitude of the value difference between places is going to be dependent on the numeric base you use). I don’t know information theory, so I’m not sure how to calculate the value of the first two digits compared to the third, but I don’t think per-thousandths has 50% more information than per-cents.
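To put the dollar example next to the bit count used earlier in the thread, here is a rough Python sketch. It assumes Y is uniform over the ten values above, and it only reproduces the arithmetic; the variable names are my own.

```python
import math
from statistics import mean

# Y is one of $0.0, $0.1, ..., $0.9, assumed equally likely.
y_values = [i / 10 for i in range(10)]
expected_y = mean(y_values)

# Magnitude view: how much Y adds relative to the amount it is attached to.
share_of_1 = expected_y / 1
share_of_99 = expected_y / 99

# Information-theoretic view: bits needed to pin down Y, whatever it is attached to.
bits_in_y = math.log2(len(y_values))

print(expected_y, share_of_1, share_of_99, bits_in_y)
```

The relative magnitude of Y depends heavily on the leading amount, whereas the bit count for Y is the same in both cases. The first measures place value and the second measures how many possibilities are ruled out.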