> The greatest downside, though, is precision: you lose half the information if you round percents to per-tenths.
Could you explain what you mean by this?
I'd have thought that, in many cases, an estimate to the nearest 10% has far more than half the information contained in an estimate to the nearest percent. E.g., let's say I start out with no idea what you'd estimate the chance of X is (and thus my median guess would be that you'd estimate there's a 50% chance of X). If I then learn you believe it's roughly 20%, doesn't that provide most of the value I'd get from learning you believe it's 23% or 18%?
(Hopefully this isn't a stupid question.)
In a literal information-theoretic sense, a percentage has log2(100) ≈ 6.6 bits of information while a per-tenth has log2(10) ≈ 3.3 bits. This might have been what was meant?
I agree that the half of the information that is preserved is the much more valuable half, however.
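For concreteness, a minimal Python check of those bit counts (assuming equiprobable outcomes, which is what the log2 figures presume):

```python
import math

# Information needed to pin down one outcome among N equally likely
# possibilities is log2(N) bits.
bits_percent = math.log2(100)  # whole percentages: ~6.64 bits
bits_tenth = math.log2(10)     # per-tenths: ~3.32 bits

print(round(bits_percent, 2), round(bits_tenth, 2))  # 6.64 3.32
print(round(bits_tenth / bits_percent, 3))           # 0.5, i.e. half the bits
```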
> I agree that the half of the information that is preserved is the much more valuable half, however.
Yes, in most cases if somebody has important information that an event has XY% probability of occurring, I'd usually pay a lot more to know what X is than what Y is.
(There are exceptions if most of the VoI is knowing whether you think the event is, e.g., >1%, but the main point still stands.)
> Yes, in most cases if somebody has important information that an event has XY% probability of occurring, I'd usually pay a lot more to know what X is than what Y is.
As you should, but Greg is still correct in saying that Y should be provided.
Regarding the bits of information, I think he's wrong because I'd assume information should be independent of the numeric base you use. So I think Y provides 10% of the information of X. (If you were using base 4 numbers, you'd throw away 25%, etc.)
But again, there's no point in throwing away that 10%.
In the technical information-theoretic sense, "information" counts how many bits are required to convey a message. And bits describe proportional changes in the number of possibilities, not absolute changes. The first bit of information reduces 100 possibilities to 50, the second reduces 50 possibilities to 25, etc. So the bit that takes you from 100 possibilities to 50 is the same amount of information as the bit that takes you from 2 possibilities to 1.
And similarly, the 3.3 bits that take you from 100 possibilities to 10 are the same amount of information as the 3.3 bits that take you from 10 possibilities to 1. In each case you're reducing the number of possibilities by a factor of 10.
To take your example: if you were using two digits in base four to represent per-sixteenths, then each digit contains 50% of the information (two bits each, reducing the space of possibilities by a factor of four). To take the example of per-thousandths: each of the three digits contains a third of the information (3.3 bits each, reducing the space of possibilities by a factor of 10).
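A minimal numeric sketch of that factor-of-ten point (same equiprobability assumption as above):

```python
import math

# Per-thousandths: three digits, each cutting the possibilities by 10x,
# so the three digits split the total information equally.
total_bits = math.log2(1000)  # ~9.97 bits in a per-thousandth estimate
per_digit = math.log2(10)     # ~3.32 bits per digit
print(round(total_bits, 2), round(3 * per_digit, 2))  # 9.97 9.97

# Base-four example: two digits for per-sixteenths, 2 bits per digit.
print(math.log2(4), math.log2(16))  # 2.0 4.0 -> each digit is half the info
```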
But upvoted for clearly expressing your disagreement. :)
> And bits describe proportional changes in the number of possibilities, not absolute changes...
> And similarly, the 3.3 bits that take you from 100 possibilities to 10 are the same amount of information as the 3.3 bits that take you from 10 possibilities to 1. In each case you're reducing the number of possibilities by a factor of 10.
Ahhh. Thanks for clearing that up for me. Looking at the entropy formula, that makes sense and I get the same answer as you for each digit (3.3). If I understand correctly, I incorrectly conflated "information" with "value of information".
I had in mind the information-theoretic sense (per Nix). I agree the "first half" is more valuable than the second half, but I think this is better parsed as diminishing marginal returns to information.
Very minor, re. child thread: You don't need to calculate numerically, as log_a(x^y) = y · log_a(x), and 100 = 10^2. Admittedly the numbers (or maybe the remark in the OP generally) weren't chosen well, given "number of decimal places" seems the more salient difference than the squaring (e.g. per-thousandths does not have double the information of per-cents, but 50% more).
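(The ratio can also be confirmed in one line, though per the identity above no computation is really needed:)

```python
import math

# log_a(x**y) == y * log_a(x), so a per-thousandth (10**3 outcomes)
# carries 3/2 the bits of a per-cent (10**2 outcomes).
print(round(math.log2(1000) / math.log2(100), 3))  # 1.5
```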
> I think this is better parsed as diminishing marginal returns to information.
How does this account for the leftmost digit giving the most information, rather than the rightmost digit (or indeed any digit between them)?
> per-thousandths does not have double the information of per-cents, but 50% more
Let's say I give you $1 + $Y, where Y is either $0, $0.10, $0.20, ..., or $0.90. (Note $1 is analogous to 1%, and Y is equivalent to adding a decimal place, i.e. per-thousandths vs per-cents.) The average value of Y, given a uniform distribution, is $0.45. Thus, against $1, Y adds almost half the original value, i.e. $0.45/$1 (45%). But what if I instead gave you $99 + $Y? $0.45 is less than 1% of the value of $99.
The leftmost digit is more valuable because it corresponds to a greater place value (so the magnitude of the value difference between places is going to be dependent on the numeric base you use). I don't know information theory, so I'm not sure how to calculate the value of the first two digits compared to the third, but I don't think per-thousandths has 50% more information than per-cents.
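For what it's worth, the dollar example above runs like this in Python (the uniform distribution over Y is the parent comment's assumption):

```python
# A numeric version of the dollar example; Y is assumed uniform over
# $0.00, $0.10, ..., $0.90 per the parent comment.
ys = [d / 10 for d in range(10)]
avg_y = sum(ys) / len(ys)

print(round(avg_y, 2))       # 0.45  -> ~45% of the value against $1
print(round(avg_y / 99, 4))  # 0.0045 -> well under 1% against $99
```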