I work on AI Grantmaking at Open Philanthropy. Comments here are posted in a personal capacity.
alex lawsen
I read this comment as implying that HLI’s reasoning transparency is currently better than GiveWell’s, and think that this is both:
- False.
- Not the sort of thing it is reasonable to bring up before immediately hiding behind “that’s just my opinion and I don’t want to get into a debate about it here”.
I therefore downvoted as well as disagree-voted. I don’t think downvotes always need comments, but this one seemed worth explaining, as the comment contains several statements people might reasonably disagree with.
I’m keen to listen to this, thanks for recording it! Are you planning to make the podcast available on other platforms (Stitcher, Google Podcasts, etc.)? I haven’t found it on those.
whether you have a 5-10 year timeline or a 15-20 year timeline
Something I’d like this post to address, but which it doesn’t, is that having “a timeline” rather than a distribution seems ~indefensible given the amount of uncertainty involved. People quote medians (or modes, and it’s not clear to me that they reliably differentiate between these) ostensibly as a shorthand for their entire distribution, but then discussion proceeds based only on the point estimates.
I think a shift of 2 years in the median of your distribution looks like a shift of only a few % in your P(AGI by 20XX) numbers for all 20XX, and that means discussion of what people who “have different timelines” should do is usually better framed as “what strategies will turn out to have been helpful if AGI arrives in 2030”.
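To make that concrete, here is a minimal sketch that compares P(AGI by year X) under two distributions whose medians differ by two years. The lognormal shape, the spread (sigma), and the 2024 reference year are arbitrary assumptions of mine for illustration, not anyone’s actual forecast:

```python
# Minimal sketch: a lognormal over "years from now", with an arbitrarily chosen
# spread to represent substantial uncertainty. Not anyone's actual forecast.
from scipy.stats import lognorm

def p_agi_by(year, median_year, sigma=1.0, now=2024):
    """P(AGI arrives by `year`) if years-from-now ~ lognormal with the given median."""
    # For lognorm(s=sigma, scale=m), the median is m.
    return lognorm.cdf(year - now, s=sigma, scale=median_year - now)

for target in (2030, 2040, 2050):
    later = p_agi_by(target, median_year=2040)
    sooner = p_agi_by(target, median_year=2038)  # median shifted 2 years earlier
    print(f"P(AGI by {target}): {later:.2f} vs {sooner:.2f} (shift: {sooner - later:+.2f})")
```

With these assumed parameters, the two-year shift in the median moves each cumulative probability by only a few percentage points; the exact numbers depend entirely on the distribution chosen.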
While this doesn’t make discussion like this post useless, I don’t think this is a minor nitpick. I’m extremely worried by “plays for variance”, some of which are briefly mentioned above (though far from the worst I’ve heard). I think these tend to look good only on worldviews which are extremely overconfident (and treat timelines as point estimates/extremely sharp peaks). More balanced views, even those with a median much sooner than mine, should typically realise that the EV gained in the worlds where things move quickly is not worth the expected cost in worlds where they don’t. This is in addition to the usual points about co-operative behaviour when uncertain about the state of the world, adverse selection, the unilateralist’s curse, etc.
Huh, I took ‘confidently’ to mean you’d be willing to offer much better odds than 1:1.
I’m going to try to stop paying so much attention to the story while it unfolds, which means I’m retracting my interest in betting. Feel free to call this a win (as with Joel).
This seems close to what you’re looking for.
If there’s any money left over after you’ve agreed a line with Joel and Nuno, I’ve got next.
No worries on the acknowledgement front (though I’m glad you found chatting helpful)!
One failure mode of the filtering idea is that the AGI corporation does not use it because of the alignment tax, or because they don’t want to admit that they are creating something that is potentially dangerous
I think it’s several orders of magnitude easier to get AGI corporations to use filtered safe data than to agree to stop using any electronic communication for safety research. Why is it appropriate to consider the alignment tax of “train on data that someone has nicely collected and filtered for you so you don’t die”, which is plausibly negative, but not the alignment tax of “never use googledocs or gmail again”?
I think preserving the secrecy-based value of AI safety plans will realistically be a Swiss cheese approach that combines many helpful but incomplete solutions (hopefully without correlated failure modes)
Several others have made this point, but you can’t just say “well anything we can do to make the model safer must be worth trying because it’s another layer of protection” if adding that layer massively hurts all of your other safety efforts. Safety is not merely a function of the number of layers, but also of how good they are, and the proposal would force every other alignment research effort to use completely different systems. That the Manhattan Project happened at all does not constitute evidence that the cost of this huge shift would be trivial.
the two-player zero-sum game can be a decent model of the by-default adversarial interaction
I think this is the key crux between you and the several people who’ve brought up points 1-3. The model you’re operating with here is roughly that the alignment game we need to play goes something like this:
1. Train an unaligned ASI
2. Apply “alignment technique”
3. ASI either ‘dodges’ the technique (having anticipated it), or fails to dodge the technique and is now aligned.
I think most of the other people thinking about alignment are trying to prevent step 1 from happening. If your adversary is SGD, rather than a fully trained misaligned model, then the 2-player zero-sum game assumption fails and so does everything that follows from it.
Putting all that to one side:
Why doesn’t filtering training data strictly dominate this proposal?
- It gets you the same result as co-ordinating to remove all alignment posting from the internet (which I think is worthless, but whatever)
- It’s much less costly to alignment research efforts
- Because it’s much less costly, it’s also much easier to get people to co-ordinate on. You could for example just release a new version of the pile which has more data in it and doesn’t have anything about alignment in it.
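As a rough illustration of the kind of filtering I mean, here is a minimal sketch. The keyword list and the toy corpus are hypothetical placeholders of mine; a real effort would need much more careful (and probably model-assisted) filtering:

```python
# Minimal sketch of keyword-based filtering of a text corpus.
# The keyword list and `toy_corpus` are hypothetical placeholders, not a real pipeline.
ALIGNMENT_KEYWORDS = {"ai alignment", "mesa-optimizer", "deceptive alignment", "rlhf"}

def is_clean(doc: str) -> bool:
    """Keep a document only if it mentions none of the flagged topics."""
    text = doc.lower()
    return not any(keyword in text for keyword in ALIGNMENT_KEYWORDS)

def filter_corpus(documents):
    """Yield only the documents that pass the keyword filter."""
    for doc in documents:
        if is_clean(doc):
            yield doc

# Example usage on a toy corpus:
toy_corpus = ["A post about gardening.", "An introduction to deceptive alignment."]
print(list(filter_corpus(toy_corpus)))  # -> ["A post about gardening."]
```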
If you haven’t already seen them, you might find some of the posts tagged “task y” interesting to read.
EA fellowships and summer programmes should have (possibly more competitive) “early entry” cohorts with deadlines in September/October, where if you apply by then you get a guaranteed place, funding, and maybe some extra perk to encourage it (which could literally just be a Slack with the other participants).
Consulting, finance, etc. have really early application processes; people feel pressure to accept those offers in case they don’t get anything else, and then don’t want to back out of them.
That last comment seems very far from the original post, which claimed:
We have no good reason, only faith and marketing, to believe that we will accomplish AGI by pursuing the DL based AI route.
If we don’t have a biological account of how BNNs can represent and perform symbolic manipulation, why do we have reason to believe that we know ANNs can’t?
Without an ability to point to the difference, this isn’t anything close to a reductio; it’s just saying “yeah I don’t buy it dude, I don’t reckon AI will be that good”.
Could you mechanistically explain how any of the ‘very many ways’ biological neurons are different means that the capacity for symbol manipulation is unique to them?
They’re obviously very different, but what I don’t think you’ve done is show that the differences are responsible for the impossibility of symbolic manipulation in artificial neural networks.
I live in London and have quite a lot of EA and non-EA friends/colleagues/acquaintances, and my impression is that group houses “by choice” are much more common among the EAs. It’s worth noting, though, that group houses are common among students and lower-paid/early-stage working professionals for financial reasons.
If you agree that bundles of biological neurons can have the capacity for symbolic thought, and that non-classical systems can create something symbolic, I don’t understand why you think anything you’ve said shows that DL cannot scale to AGI, even granting your unstated assumption that symbolic thought is necessary for AGI.
(I think that last assumption is false, but don’t think it’s a crux here so I’m keen to grant it for now, and only discuss once we’ve cleared up the other thing)
But humans made Python.
If you claim it’s impossible for a non-classical system to create something symbolic, I don’t think you get to hide behind “we don’t know how human cognition works”. I think you need to defend the position that human cognition must be symbolic, and then explain how this arises from biological neural networks but not artificial ones.
I don’t think I quite follow what you consider to be the reductio. In particular, I don’t see why your argument wouldn’t also go through with humans. Why doesn’t the following hold?
Biological Learning (BL) is an alternative form of computation that does not involve classical symbol systems, but instead just a bunch of neurons and some wet stuff, and its amazing success at producing human intelligence shows that human intelligence is not based on classical symbolic systems
Thinking about where to work seems reasonable; listening to others’ thoughts on where to work seems reasonable; this post advises both.
This post also pretty strongly suggests that LessWrong comments are the best source of others’ thoughts, and I would like to see that claim made explicit and then argued for rather than slipped in. As a couple of other comments have noted, LessWrong is far from a perfect signal of the alignment space.
Feel free to reach out if you do want to re-make my video series! I haven’t had time to improve/continue it, so I can’t promise to be able to offer a lot of support, but I’ll do what I can, and at minimum if you have specific questions about things I mention I’ll try to answer.
The part about newcomers doesn’t reflect my experience FWIW, though my sample size is small. I published a major criticism while a relative newcomer (knew a handful of EAs, mostly online, was working as a teacher, certainly felt like I had no idea what I was doing). Though it wasn’t the goal of doing so, I think that criticism ended up causing me to gain status, possibly (though it’s hard to assess accurately) more status than I think I “deserved” for writing it.
[I no longer feel like a newcomer so this is a cached impression from a couple of years ago and should therefore be taken with a pinch of salt]
I’m very happy to see this! Thank you for organising it.