richard_ngo comments on Announcing an updated drawing protocol for the EffectiveAltruism.org donor lotteries

richard_ngo Jan 25, 2019, 11:10 AM
2 points
Can you give some examples of “more responsible” ways?
I agree that, in general, calculating your own random digits feels a lot like rolling your own crypto. (Edit: I misunderstood the method and thought there was an easy exploit; I was wrong about that. Nevertheless, at least ¹⁄₃ of the digits in the API response are predictable, maybe more, and the whole response is quite small, so it might be possible to increase your probability of winning slightly by brute-force calculating possibilities, assuming you get to pick your own contiguous ticket number range. My preliminary calculations suggest that this would be too difficult, but I’m not an expert, and there may be more sophisticated hacks.)
- beth Jan 26, 2019, 12:36 PM
  10 points
  My troubles with this method are two-fold.
  1. SHA256 is a hashing algorithm. Its security is well vetted for certain kinds of applications and certain kinds of attacks, but “randomly distribute the first 10 hex digits” is not one of those applications. The post does not include so much as a graph of what the distribution of past drawing results would have been with this method, so CEA hasn’t really justified why the result would be uniformly distributed.
  2. The least-significant digits in the IRIS data can probably be manipulated by adversaries. It is hard to check them, and IRIS has no reason to secure their data pipeline against attacks that might cost tens of thousands of dollars, because there are normally no stakes whatsoever attached to those bits.
  Random.org is in exactly the business we’re looking for, so they’d be a good option, backed by their own institutional guarantee. Otherwise, any big lottery in any country will work as a source of randomness: the prizes there are bigger, which means that, even if these lotteries could be corrupted, nobody would waste that ability on rigging the donor lottery.
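
  As a rough illustration of the distribution check described in point 1, the sketch below hashes a batch of made-up input strings, takes the first 10 hex digits of each SHA256 digest, and tallies the leading digit. The sample strings, the helper names, and the choice of a chi-square statistic are all assumptions for illustration; they are not real IRIS responses and not something CEA published.

```python
import hashlib
from collections import Counter

HEX_ALPHABET = "0123456789abcdef"


def first_hex_digits(data: str, n: int = 10) -> str:
    """Return the first n hex digits of the SHA-256 digest of data."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()[:n]


def leading_digit_counts(samples):
    """Count how often each hex digit appears as the leading digit."""
    return Counter(first_hex_digits(s)[0] for s in samples)


def chi_square(counts, total):
    """Chi-square statistic against a uniform spread over 16 hex digits."""
    expected = total / len(HEX_ALPHABET)
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in HEX_ALPHABET)


# Hypothetical stand-in inputs; a real check would use archived API responses.
samples = [f"lat=12.{i:04d},lon=45.{i:04d}" for i in range(16000)]
counts = leading_digit_counts(samples)
stat = chi_square(counts, len(samples))

print(sorted(counts.items()))
print(f"chi-square (15 degrees of freedom): {stat:.2f}")
```

  With 16 buckets there are 15 degrees of freedom, so a statistic well above roughly 25 (the 5% critical value) would hint at non-uniformity. This looks at only one digit on synthetic inputs, and it says nothing about point 2, namely whether an adversary can influence the inputs.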
  - SamDeere Jan 29, 2019, 12:30 AM
    2 points
    Re 1, this is less of a worry to me. You’re right that this isn’t something that SHA256 has been specifically vetted for, but my understanding is that the SHA-2 family of algorithms should have uniformly-distributed outputs. In fact, the NIST beacon values are all just SHA-512 hashes (of a random seed plus the previous beacon’s value and some other info), so this method vs the NIST method shouldn’t have different properties (although, as you note, we didn’t do a specific analysis of this particular set of inputs — noted, and mea culpa).
    
    However, the point re 2 is definitely a fair concern, and I think that this is the biggest defeater here. As such (and given that the NIST Beacon is back online), we’re reverting to the original NIST method.
    
    Thanks for raising the concerns.
    
    ETA: On further reflection, you’re right that it’s hard to know whether the first 10 hex digits will be uniformly distributed, given that we don’t have a full-entropy source. That is a significant difference between this method and the NIST beacon: we only made sure that the method had more entropy than the 40 bits we needed to cover all the possible ticket values. So your point about testing sample values in advance is well made.
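
    The arithmetic here is that 10 hex digits carry 4 bits each, giving 40 bits, or 16^10 = 2^40 possible values. A minimal sketch of deriving such a value from a seed string follows; the function name and the sample seed are hypothetical, and how the value is then matched to a ticket is not specified in this thread.

```python
import hashlib

HEX_DIGITS = 10  # 10 hex digits * 4 bits each = 40 bits


def draw_value(seed: str) -> int:
    """Interpret the first 10 hex digits of SHA-256(seed) as an integer."""
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()
    return int(digest[:HEX_DIGITS], 16)


# Hypothetical seed; the real one would be the published API response.
value = draw_value("example API response body")
print(value)
print(value < 2 ** 40)  # always True: the value fits in 40 bits
```

    Anyone holding the same seed string can rerun this and get the same number, which is what makes the draw publicly verifiable. Whether the number is unpredictable depends entirely on the seed.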
- Paul_Christiano Jan 25, 2019, 5:54 PM
  3 points
  There is plenty of entropy in the API responses; that’s not the worst concern.
  I think the most serious question is whether a participant can influence the lottery draw (e.g. by getting IRIS to change low order digits of the reported latitude or longitude).
- SamDeere Jan 25, 2019, 5:11 PM
  3 points
  Agree with the sentiment, but we’re most definitely not rolling our own crypto. The method above relies on the public and extremely widely vetted SHA256 algorithm. This algorithm has the nice property that even slightly different inputs produce wildly different outputs. It should also distribute these outputs uniformly across the entire possibility space. This means that it would be useless to brute-force the prediction, because each of your candidates would have an even chance of ending up basically anywhere.
  
  For example, compare the input strings 1111111111111111111111111111 and 1111111111111111111111111112 with their SHA256 outputs:
```
sha256(1111111111111111111111111111)
  = fe16863cfd4015c58da63aa5d2fe80e6e1fcd0bbdd57296fe28844cc7d79581b


sha256(1111111111111111111111111112)
  = b74822540995e7aa1b50a4d9d23a4b13aff99910c3c2111b9bf649e947e5f49c
```
  It doesn’t matter how much of the API response remains the same (for example, we could pad the input of every hash we generated with the same fixed string and have the same randomness properties as the proposal above). All that matters is that each response is going to be (unpredictably) different from the next.
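
  A small sketch of both claims, using Python’s standard `hashlib`: it counts how many of the 256 output bits differ between the two example inputs, with and without a fixed prefix. The `PAD` string and the helper names are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """SHA-256 of a UTF-8 string, as 64 hex digits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Number of bit positions at which two hex strings differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


PAD = "fixed-padding:"  # hypothetical constant prefix shared by every input
a = "1111111111111111111111111111"
b = "1111111111111111111111111112"

plain = differing_bits(sha256_hex(a), sha256_hex(b))
padded = differing_bits(sha256_hex(PAD + a), sha256_hex(PAD + b))

print(f"{plain} of 256 bits differ without padding")
print(f"{padded} of 256 bits differ with padding")
```

  For a well-behaved hash one would expect roughly half of the bits to differ in both cases, even though the inputs differ in only their last character.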
  
  ETA: It’s perhaps more helpful to see the digits from the API response as a publicly verifiable seed to a pseudorandom number generator, rather than as the random number itself.