We deliberately included only information based on some specific empirical evidence, not simply advice or recommendations. Of course, if readers of the review wish to incorporate additional information or assumptions in deciding how they will run their groups, they are welcome to do so.
If you have any particular sources or documents outlining what has been effective in London I’d love to see them!
Hi everyone, thanks for your comments. I’m not much for debating in comments, but if you would like to discuss anything further with me or have any questions, please feel free to send me a message.
I just wanted to make one clarification that I feel didn’t come across strongly in the original post. Namely, I don’t think it’s a bad thing that EA is an ideology. I do personally disagree with some commonly believed assumptions and methodological preferences, but the fact that EA itself is an ideology is, I think, a good thing, because it gives EA substance. If EA were merely a question, I think it would have very little to add to the world.
The point of this post was therefore not to argue that EA should try to avoid being an ideology, but that we should recognise the assumptions and methodological frameworks we typically adopt as an EA community, critically evaluate whether they are all justified, and, to the extent they are justified, defend them with the best arguments we can muster, always remaining open-minded to new evidence or arguments that might change our minds.
People who aren’t “cool with utilitarianism / statistics / etc” already largely self-select out of EA. I think my post articulates some of the reasons why this is the case.
Thanks for the comment!
I agree that the probabilities matter, but then it becomes a question of how these are assessed and weighed against each other. On this basis, I don’t think it has been established that AGI safety research has a stronger claim to high overall EV than other such potential mugging causes.
Regarding the Dutch book issue, I don’t really agree with the argument that we ‘may as well go with’ EV because it avoids these cases. Many people would argue that the limitations of the EV approach, such as having to assign a precise probability to every belief and not being able to suspend judgement, also do not fit with our picture of ‘rational’. It’s not obvious why avoiding hypothetical Dutch books matters more than these considerations. I am not pretending to resolve this argument; I am simply raising the issue as relevant for assessing high-impact, low-probability events. EV is potentially problematic in such cases and we need to talk about this seriously.
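To make the worry concrete, here is a minimal sketch of how a naive EV comparison lets an arbitrarily improbable cause dominate. All of the numbers are hypothetical, invented purely for illustration:

```python
# Hypothetical causes: (probability of success, value if successful).
# All figures are invented for illustration only.
causes = {
    "proven_intervention": (0.9, 1_000),   # well-evidenced, modest payoff
    "speculative_mugging": (1e-9, 1e15),   # tiny probability, astronomical payoff
}

def expected_value(p, v):
    """Naive EV: probability times payoff, with no discounting
    and no option to suspend judgement about p."""
    return p * v

for name, (p, v) in causes.items():
    print(name, expected_value(p, v))

# The speculative cause wins on naive EV (about 1e6 vs 900) purely because
# of the size of the stipulated payoff, however implausible the probability.
```

The point of the sketch is only that the ranking is driven entirely by the stipulated payoff, which is exactly the feature of the framework that seems problematic in high-impact, low-probability cases.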
I give some reasons here why I think that such work won’t be very effective, namely that I don’t see how one can achieve sufficient understanding to control a technology without also attaining sufficient understanding to build that technology. Of course that isn’t a decisive argument so there’s room for disagreement here.
Thanks for the link about the Fermi paradox. Obviously I could not hope to address all arguments about this issue in my critique here. All I meant to establish is that Bostrom’s argument does rely on particular views about the resolution of that paradox.
You say ‘it is tautologically true that agents are motivated against changing their final goals, this is just not possible to dispute’. Respectfully, I just don’t agree. It all hinges on what is meant by ‘motivation’ and ‘final goal’. You also say “it just seems clear that you can program an AI with a particular goal function and that will be all there is to it”, and again I disagree. A narrow AI, sure, or even a highly competent AI, but not an AI with human-level competence in all cognitive activities. Such an AI would have the ability to reflect on its own goals and motivations, because humans have that ability, and therefore it would not be ‘all there is to it’.
Regarding your last point, what I was getting at is that one can change a goal either by explicitly rejecting it and choosing a new one, or by changing one’s interpretation of the existing goal. This latter method is an alternative path by which an AI could change its goals in practice, even if it still regarded itself as following the same goals it was programmed with. My point isn’t that this makes goal alignment not a problem. My point is that it makes the position that ‘an AI will never change its goals’ implausible.
Hi rohinmshah, I agree that our current methods for building an AI do involve maximising particular functions and have nothing to do with common sense. The problem with extrapolating this to AGI is 1) these sorts of techniques have been applied for decades and have never achieved anything close to human-level AI (of course that’s not proof they never can, but I am quite skeptical, and Bostrom doesn’t really make the case that such techniques are likely to lead to human-level AI), and 2) as I argue in part 2 of my critique, other parts of Bostrom’s argument rely upon much broader conceptions of intelligence that would entail the AI having common sense.
Thanks for these links, this is very useful material!
Hi Denkenberger, thanks for engaging!
Bostrom mentions this scenario in his book, and although I didn’t discuss it directly, I believe I address the key issues in my piece above. In particular, the amount of protein one can receive in the mail in a few days is small, and in order to achieve its goal of world domination an AI would need large quantities of such materials to produce the weapons, technology, or other infrastructure needed to compete with world governments and militaries. If the AI chose to produce the protein itself, which it would likely wish to do, it would need extensive laboratory space, which takes time to build and equip. The more expansive its operations become, the more time they take to build. It would likely need to hire lawyers to acquire legal permits to build the facilities needed to make the nanotech, and so on. I outline these sorts of practical issues in my article. None of them are insuperable, but I argue that they aren’t things that can be solved ‘in a matter of days’.
Thanks for your thoughts. Regarding spreading my argument across 5 posts, I did this in part because I thought connected sequences of posts were encouraged?
Regarding the single quantity issue, I don’t think it is a red herring, because if there are multiple distinct quantities then the original argument for self-sustaining rapid growth becomes significantly weaker (see my responses to Flodorner and Lukas for more on this).
You say “Might the same thing be true of AI—that a few factors really do allow for drastic improvements in problem-solving across many domains? It’s not at all clear that it isn’t.” I believe we have good reason to think no such few factors exist: A) this does not seem to be how human intelligence works, and B) it does not seem consistent with the history of progress in AI research. Both, I would say, are characterised by many different functionalities or optimisations for particular tasks. That is not to say there are no general principles, but I think these are not as extensive as you seem to believe. However, regardless of this point, if Bostrom’s argument is to succeed I think he needs to give some persuasive reasons or evidence as to why we should think such factors exist. It’s not sufficient just to argue that they might.
Thanks for your thoughts.
Regarding your first point, I agree that the situation you posit is a possibility, but it isn’t something Bostrom talks about (and remember, I only focused on what he argued, not other possible expansions of the argument). Also, once we consider the possibility of numerous distinct cognitive abilities, it is just as possible that there could be complex interactions which inhibit the growth of particular abilities. There could easily be dozens of separate abilities, and the full matrix of interactions becomes very complex. The original force of the argument that ‘the rate of growth of intelligence is proportional to current intelligence, leading to exponential growth’ is, in my view, substantially blunted.
Regarding your second point, it seems unlikely to me, because if an agent had all these abilities, I believe it would use them to uncover reasons to reject highly reductionistic goals like tiling the universe with paperclips. It might end up with goals that are still in opposition to human values, but I just don’t see how an agent with these abilities would not become dissatisfied with extremely narrow goals.
Thanks for your thoughts!
1) The idea I’m getting at is that an exponential-type argument, in which self-improvement ability is proportional to current intelligence, doesn’t really work if there are multiple distinct and separate cognitive abilities, because the ability to improve ability X might not be related in any clear way to the current level of X. For example, the ability to design a better chess-playing program might not be related to chess-playing ability, and object-recognition performance might not be related to the ability to improve that performance. These examples are probably not ideal, since such abilities may not be fundamental enough and we should perhaps be looking at more abstract cognitive abilities, but hopefully they serve as a general illustration. A superhuman AI would, sure, be better at designing AIs than a human, but I don’t think the sort of exponential growth arguments Bostrom uses hold if there are multiple distinct cognitive abilities.
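The contrast can be sketched numerically. This is my own toy model, not anything from Bostrom, and the parameter values and ability names are arbitrary: in the single-quantity model, growth compounds exponentially; in a two-ability model where improvement in one ability is driven by a different, non-compounding ability, growth is only linear.

```python
# Toy models of recursive self-improvement (illustrative only).

def single_quantity(i0=1.0, k=0.1, steps=50):
    """Single intelligence quantity I with dI/dt = k*I:
    each step's improvement is proportional to the current level,
    so growth is exponential."""
    i = i0
    for _ in range(steps):
        i += k * i
    return i

def decoupled_abilities(design=1.0, chess=1.0, k=0.1, steps=50):
    """Two distinct abilities: 'chess' improves in proportion to 'design',
    but 'design' is not improved by playing chess, so it never compounds
    and chess grows only linearly."""
    for _ in range(steps):
        chess += k * design  # improvement driven by a *different* ability
        # design stays flat: nothing feeds back into it
    return design, chess

print(single_quantity())      # exponential: roughly (1.1)**50, on the order of 100
print(decoupled_abilities())  # linear: chess ends near 1 + 0.1*50 = 6
```

The model is deliberately crude, but it shows why the exponential conclusion depends on the assumption that improvement feeds back into the same quantity doing the improving.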
2) The idea of a simplistic paperclip-maximising AI instantiating separate mind simulations is very interesting. As you describe it, this would amount to one agent creating distinct agents to perform a set task, rather than a single agent possessing those abilities itself. This seems relevant to me because any created mind simulations, being distinct from the original agent, would not necessarily share its goals or beliefs, and therefore a principal-agent problem arises. To be smart enough to solve this problem, I think the original AI would probably have to be enhanced well beyond paperclip-maximising levels. I think there’s a lot more to be said here, but I am not convinced this counterexample really undermines my argument.