The issue is that FLOPS cannot accurately represent computing power across different computing architectures, in particular between single CPUs and computing clusters. As an example, let’s compare 1 computer of 100 MFLOPS with a cluster of 1000 computers of 1 MFLOPS each. The latter option has 10 times as many FLOPS, but there is a wide variety of computational problems for which the former will always be much faster. This means that FLOPS don’t meaningfully tell you which option is better; it always depends on how well the problem you want to solve maps onto your hardware.
In large-scale computing, the bottleneck is often the communication speed of the network. If the calculations you have to do don’t decompose neatly into roughly independent tasks, the different computers have to communicate a lot, which slows everything down. Adding more FLOPS (more computers) won’t prevent that in the slightest.
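A back-of-the-envelope sketch of what I mean, with made-up numbers (the task count, latency, and the assumption of a fully sequential workload are all mine, purely for illustration):

```python
# Toy model with made-up numbers: a job of 10,000 tasks of 1e6 FLOP each,
# where every task depends on the result of the previous one, so no
# parallelism is possible.

FLOP_PER_TASK = 1e6
N_TASKS = 10_000
COMM_LATENCY = 1e-3   # seconds to hand a result from one node to another

# One 100 MFLOPS machine: runs all tasks back to back, no network involved.
single_machine = N_TASKS * FLOP_PER_TASK / 100e6
print(f"single machine: {single_machine:.0f} s")   # 100 s

# 1000-node cluster of 1 MFLOPS machines: 10x the nominal FLOPS, but a
# sequential chain of tasks only ever keeps one slow node busy, and every
# hand-off between nodes adds communication latency on top.
cluster = N_TASKS * (FLOP_PER_TASK / 1e6 + COMM_LATENCY)
print(f"cluster:        {cluster:.0f} s")           # ~10,010 s
```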
You cannot extrapolate FLOPS estimates without justifying why communication overhead doesn’t make the estimated quantity meaningless on parallel hardware.
I don’t think that 11% figure is correct. It depends on how long you would stay at the company if you got the job, and on how long you would be unemployed if the offer were rescinded.
Without commenting on your wider message, I want to pick on two specific factual claims that you are making.
AlphaZero went from a bundle of blank learning algorithms to stronger than the best human chess players in history...in less than two hours.
Training time of the final program is a deeply misleading metric, as these programs have been through endless reruns and tests to get the setup right. I think it is most honest to count total engineering time.
I know people are wary of Kurzweil, but he does seem to be on fairly solid ground here.
Extrapolating FLOPS is inherently fraught, as is the very idea of FLOPS being a useful unit. The problem is best illustrated by the following CS proverb: “A supercomputer is a device for turning computational complexity into communication complexity.” In particular, estimates of the complexity of imitating a small, mostly separate part of a brain don’t scale linearly to estimates of imitating the much more interconnected whole.
The EA forum doesn’t seem like an obvious best choice. Just because it is related to EA does not make it effective, especially considering the existence of discussion software like Reddit, Discourse, and phpBB.
I’d say it mostly depends on what kind of skills and career capital you are aiming for. There are a number of important (scientific) software packages with either zero or one maintainers, which could be useful to work on either upstream or downstream.
Personally, I am presently just doing (easy) fixes for bugs that I run into myself. But I am considering either officially maintaining a driver that I keep patching for my own use anyway, or contributing to some decentralized web project.
It might not be super relevant for you specifically, but I do want to plug Google Summer of Code as a wonderful opportunity for all university students aged 18 and older. (The application deadline is April 9th.)
I used to believe pretty much exactly the argument you’re describing, so I don’t think I will change my mind by discussing this with you in detail.
On the other hand, the last sentence of your comment makes me feel that you’re equating my not agreeing with you with my not understanding probability. (I’m talking about my own feelings here, irrespective of what you intended to say.) So, I don’t think I will change your mind by discussing this with you in detail.
I don’t feel motivated to go back and forth on this thread, because I think we will both end up feeling like it was a waste of time. I want to make it clear that I do not say this because I think badly of you.
I will try to clear up the points you said were confusing. In the Language section, I am referring to MIRI’s writing, as well as Bostrom’s Superintelligence, and most in-person conversations and forum discussions I’ve seen. “Bits” are an abstraction akin to log-odds; I made them up because not every statement in that post is a probabilistic claim in a rigorous sense, and the blog post was mostly written for myself. I really do estimate that there is less than a 2^-170 chance that AI is risky in a way that would lead to extinction, that this risk can be prevented, and moreover that it is possible to make meaningful progress on such prevention within the next 20 years, along with some more qualifiers that I believe are necessary to support the cause right now.
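Roughly, and only as a loose sketch of the analogy to log-odds rather than a rigorous definition:

```python
from math import log2

# Loose sketch of the "bits" analogy: a claim assigned probability p
# carries -log2(p / (1 - p)) bits of evidence against it, which for very
# small p is approximately -log2(p).

def bits_against(p: float) -> float:
    return -log2(p / (1 - p))

print(bits_against(2 ** -170))   # 170.0 -> the "less than 2^-170" estimate
print(bits_against(0.5))         # -0.0  -> even odds carry zero bits either way
```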
Thank you for your response and helpful feedback.
I’m not making any predictions about future cars in the language section. “Self-driving cars” and “pre-driven cars” are the exact same things. I think I’m grasping at a point closer to Clarke’s third law, which also doesn’t give any obvious falsifiable predictions. My only prediction is that thinking about “self-driving cars” leads to more wrong predictions than thinking about “pre-driven cars”.
I changed the sentence you mention to: “If you want to understand present-day algorithms, the ‘pre-driven car’ model of thinking works a lot better than the ‘self-driving car’ model of thinking. The present and past are the only tools we have to think about the future, so I expect the ‘pre-driven car’ model to make more accurate predictions.” I hope this is clearer.
Your remark on “English that’s precise enough to translate into code” is close to, but not exactly, what I meant. I think it is a hopeless endeavour to aim for such precise language in these discussions at this point in time, because I estimate that it would take a ludicrous amount of additional intellectual labour to reach that level of rigour. It’s too high a target. I think the correct target is summarised in the first sentence: “All sentences are wrong, but some are useful.”
I think that I literally disagree with every sentence in your last paragraph on multiple levels. I read both pages you linked a couple of months ago and I didn’t find them at all convincing. I’m sorry to give such a useless response to this part of your message. Mounting a proper answer would take more time and effort than I have to spare in the foreseeable future. I might post some scraps of arguments on my blog soonish, but those posts won’t be well-written and I don’t expect anyone to really read them.
My troubles with this method are two-fold.
1. SHA-256 is a hashing algorithm. Its security is well-vetted for certain kinds of applications and certain kinds of attacks, but “randomly distribute the first 10 hex-digits” is not one of those applications. The post does not include so much as a graph of the distribution of what past drawing results would have been under this method (see the sketch after this list), so CEA hasn’t really justified why the result would be uniformly distributed.
2. The least-significant digits in the IRIS data can probably be manipulated by adversaries. They are hard to check, and IRIS has no reason to secure its data pipeline against attacks that might cost tens of thousands of dollars, because normally there are no stakes whatsoever attached to those digits.
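For concreteness, here is a minimal sketch of the kind of check I’d want to see, under my own assumptions about how the draw works (hash a published data string with SHA-256, take the first 10 hex digits, map them onto tickets); this is my paraphrase, not necessarily CEA’s exact procedure, and the seed strings are placeholders standing in for real historical IRIS data:

```python
import hashlib
from collections import Counter

# Hypothetical paraphrase of the draw: hash a published data string with
# SHA-256, take the first 10 hex digits as an integer, and reduce it
# modulo the number of tickets. NOT necessarily CEA's exact method.

def draw(seed_string: str, n_tickets: int) -> int:
    digest = hashlib.sha256(seed_string.encode()).hexdigest()
    return int(digest[:10], 16) % n_tickets

# Placeholder seeds; with real past IRIS records one would plot these
# counts and check whether they actually look uniform.
past_seeds = [f"placeholder-iris-record-{i}" for i in range(10_000)]
counts = Counter(draw(s, n_tickets=100) for s in past_seeds)
print(sorted(counts.items())[:5])
```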
Random.org is in exactly the business we’re looking for, so their institutional guarantee would make them a good option. Otherwise, any big national lottery will work as a source of randomness: the prizes there are bigger, which means that, even if these lotteries could be corrupted, nobody would waste that ability on rigging the donor lottery.
I’d like to see some justification for using this approach over the myriad of more responsible ways of generating random draws.