I’d like to see some justification for using this approach over the myriad of more responsible ways of generating random draws.
The draw should have the following properties:
The source of randomness needs to be generated independently from both CEA and all possible entrants
The resulting random number needs to be published publicly
The randomness needs to be generated at a specific, precommitted time in the future
The method for arriving at the final number should ideally be open to public inspection
This is because, if we generated the number ourselves, or used a private third party, there are no good guarantees against collusion. Entrants in the lottery could reasonably say ‘how do I know that the draw is fair?’, especially as the prize pool is large enough that it could incentivise cheating. The future precommitment is important because it guarantees that we can’t secretly know the number, and the specific timing is important because it means that we can’t just keep waiting for numbers to be generated until we see one that we like the look of.
The method proposed above means that anyone can see how we arrived at the final random number, because it takes a public number that we can’t possibly influence, and then hashes it using SHA256, which is well-verified, deterministic (i.e. anyone can run it on their own computer and check our working) and distributes the possible answers uniformly (so everyone has an equal chance of winning).
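As a rough illustration of the "hash a public number, then read off the result" step, here is a minimal sketch. The 10-hex-digit (40-bit) cut-off is the one discussed elsewhere in this thread; the function name `draw_number` and the sample input are hypothetical, and how the resulting integer maps to a ticket is not shown because it isn't specified here.

```python
import hashlib


def draw_number(public_data: bytes, hex_digits: int = 10) -> int:
    """Hash the public data with SHA256 and read the leading hex digits as an integer.

    Assumption: the draw uses the first `hex_digits` hex characters of the digest
    (10 hex digits = 40 bits). This is a sketch, not the official procedure.
    """
    digest = hashlib.sha256(public_data).hexdigest()
    return int(digest[:hex_digits], 16)


# Hypothetical stand-in for the published response text.
sample = b"example public response"
print(draw_number(sample))
```

Because SHA256 is deterministic, anyone who has the same input bytes can rerun this and should get the same number.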
Typical lottery drawings have these properties too: live broadcast, studio audience (i.e. they are publicly verifiable), balls being mixed and then picked out of a machine (i.e. an easy-to-inspect, uniformly-distributed source of randomness that, because it is public, cannot be gamed by the people running the lottery).
Earthquakes have the nice property that their incidence follows a rough power law distribution (so you know approximately how regularly they’ll happen), but the specifics of the location, magnitude, depth or any other properties of any given future earthquake are entirely unpredictable. This means that we know that there will be a set of unpredictable (i.e. random) numbers generated by seismometers, but we (and anyone trying to game the lottery) have no way of knowing what they will be in advance.
(This is not actually that different to how your computer generates randomness — it uses small unpredictable events, like the very precise time between keystrokes, or tiny changes in mouse direction, to generate the entropy pool for generating random numbers locally. We’re just using the same technique, but allowing people to see into the entropy pool).
Other plausible sources of randomness we considered included the block hash of the first block mined after the draw date on the Bitcoin blockchain, and the numbers of a particular Powerball drawing.
I’m not sure what the myriad of more responsible ways are. If you trust CEA to not mess with the lottery more than you trust IRIS not to change their earthquake reports to mess with the lottery, then just having CEA pick numbers out of a hat could be better.
It definitely seems like free-riding on some other public lottery drawing that people already trust might be better.
Can you give some examples of “more responsible” ways?
I agree that in general calculating your own random digits feels a lot like rolling your own crypto. (Edit: I misunderstood the method and thought there was an easy exploit, which I was wrong about. Nevertheless at least 1⁄3 of the digits in the API response are predictable, maybe more, and the whole thing is quite small, so it might be possible to increase your probability of winning slightly by brute-force calculating possibilities, assuming you get to pick your own contiguous ticket number range. My preliminary calculations suggest that this method would be too difficult, but I’m not an expert, and there may be more sophisticated hacks.)
My troubles with this method are two-fold.
1. SHA256 is a hashing algorithm. Its security is well-vetted for certain kinds of applications and certain kinds of attacks, but “randomly distribute the first 10 hex digits” is not one of those applications. The post does not include so much as a graph of the distribution of what the past drawing results would have been with this method, so CEA hasn’t really justified why the result would be uniformly distributed.
2. The least-significant digits in the IRIS data can probably be fudged by adversaries. It is hard to check them, and IRIS has no reason to secure their data pipeline against attacks that might cost tens of thousands of dollars, because there are normally no stakes whatsoever attached to those bits.
Random.org is exactly in the business that we’re looking for, so they’d be a good option for their own institutional guarantee. Otherwise, any big lottery in any country will work as a source of randomness: the prizes there are bigger, which means that, even if these lotteries could be corrupted, nobody would waste that ability on rigging the donor lottery.
Re 1, this is less of a worry to me. You’re right that this isn’t something that SHA256 has been specifically vetted for, but my understanding is that the SHA-2 family of algorithms should have uniformly-distributed outputs. In fact, the NIST beacon values are all just SHA-512 hashes (of a random seed plus the previous beacon’s value and some other info), so this method vs the NIST method shouldn’t have different properties (although, as you note, we didn’t do a specific analysis of this particular set of inputs — noted, and mea culpa).
However, the point re 2 is definitely a fair concern, and I think that this is the biggest defeater here. As such (and given that the NIST Beacon is back online), we’re reverting to the original NIST method.
Thanks for raising the concerns.
ETA: On further reflection, you’re right that it’s problematic knowing whether the first 10 hex digits will be uniformly distributed given that we don’t have a full-entropy source (which is a significant difference between this method and the NIST beacon — we just made sure that the method had greater entropy than the 40 bits we needed to cover all the possible ticket values). So, your point about testing sample values in advance is well-made.
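One way to test sample values in advance would be something like the sketch below. It hashes a batch of inputs, sorts the leading 10 hex digits into equal-width buckets, and computes a chi-square statistic against a uniform expectation. The inputs here are synthetic strings (`sample-0`, `sample-1`, ...) chosen purely for illustration; a real check would need to use past API responses, which are not reproduced here. All function names are hypothetical.

```python
import hashlib


def leading_value(data: bytes, hex_digits: int = 10) -> int:
    """Integer value of the first `hex_digits` hex characters of the SHA256 digest."""
    return int(hashlib.sha256(data).hexdigest()[:hex_digits], 16)


def bucket_counts(n_samples: int, n_buckets: int = 16, hex_digits: int = 10) -> list[int]:
    """Count how many hashed samples land in each equal-width bucket."""
    span = 16 ** hex_digits  # number of possible leading values
    counts = [0] * n_buckets
    for i in range(n_samples):
        # Assumption: synthetic inputs stand in for real past responses.
        value = leading_value(f"sample-{i}".encode())
        counts[value * n_buckets // span] += 1
    return counts


def chi_square(counts: list[int]) -> float:
    """Chi-square statistic against a uniform expectation over the buckets."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


counts = bucket_counts(20000)
print(counts)
print(chi_square(counts))
```

With 16 buckets there are 15 degrees of freedom, so a statistic far above 15 would be a reason to look more closely. A pass on synthetic inputs says nothing about whether an adversary can influence the real inputs, which is a separate concern.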
There is plenty of entropy in the API responses; that’s not the worst concern.
I think the most serious question is whether a participant can influence the lottery draw (e.g. by getting IRIS to change low order digits of the reported latitude or longitude).
Agree with the sentiment, but we’re most definitely not rolling our own crypto. The method above relies on the public and extremely-widely-vetted SHA256 algorithm. First, this algorithm has the nice property that even slightly different inputs produce wildly different outputs. Second, it should distribute these outputs uniformly across the entire possibility space. This means that it would be useless to brute-force the prediction, because each of your candidates would have an even chance of ending up basically anywhere.
For example, compare the input strings 1111111111111111111111111111 and 1111111111111111111111111112 with their SHA256 outputs:
It doesn’t matter how much of the API response remains the same (for example, we could pad the input of every hash we generated with the same fixed string and have the same randomness properties as the proposal above). All that matters is that each response is going to be (unpredictably) different from the next.
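The comparison of the two example input strings can be reproduced locally. The sketch below hashes both with Python's standard `hashlib` and counts how many of the 256 output bits differ; the helper names are hypothetical, and the digests are printed rather than quoted here so that you can check them yourself.

```python
import hashlib

# The two example inputs, differing only in the final character.
first = b"1111111111111111111111111111"
second = b"1111111111111111111111111112"


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of `data` as a 64-character hex string."""
    return hashlib.sha256(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions at which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


digest_first = sha256_hex(first)
digest_second = sha256_hex(second)
print(digest_first)
print(digest_second)
print(differing_bits(digest_first, digest_second), "of 256 bits differ")
```

For a well-behaved hash you would expect roughly half of the 256 bits to differ, even though the inputs differ by a single character.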
ETA: It’s perhaps more helpful to see the digits from the API response as a publicly verifiable seed to a pseudorandom number generator, rather than as the random number itself.