AFAIK this is not something that can be shared publicly.
Ah, sure. The justification is something like: there are many people who would be great at running FHI and would want to, and I’m guessing there is someone among them who is much better for optics and would make people more comfortable than Bostrom. Replacing him with one of these just-as-capable people seems to have few downsides and several upsides.
Hm, fairly confused by the downvotes. I’m guessing a) people disagree with this being a good decision or b) there’s a really obvious answer? If it’s b), can you please tell me?
This makes more sense. I still feel a bit irked by the downvotes, though: I would like people to be aware of the email, and I feel much more strongly about that than about not wanting people to see some of pseudonym’s takes about the apology.
While I agree that these kinds of “bad EA optics” posts are generally unproductive and it makes sense for them to get downvoted, I’m surprised that this specific one isn’t getting upvoted more. Unlike most links to hit pieces and criticisms of EA, this post actually contains new information that has changed my perception of EA and EA leadership.
with less intensity, we should discourage the framing of ‘auditing’ very established journalists for red flags
Why? If I were deciding whether or not to be interviewed by Rachel, probably the top thing I’d be worried about is whether she has previously written not-very-journalistic hit pieces on tech-y people (which is not the same as critical pieces in general! some of those are pretty good and well researched). I agree that there’s such a thing as going too far, but I don’t think my comment was doing that.
I think “there are situations this is valid (but not for the WSJ!)” is wrong? There have been tons of examples of kind of crap articles in usually highly credible newspapers. For example, this article in the NYT seemed to be pretty wrong and not that good.
I think it makes more sense to look at articles that Rachel has written about SBF/EA. Here’s one:
I (very briefly) skimmed it and didn’t see any major red flags.
Not an answer, but why are you trying to do this? If you’re excited about Biology, there seem to be plenty of ways to do impactful biological work.
Even if you’re purely trying to maximize your impact, for areas like AI alignment, climate change, or bioweapons, the relevant question is something like: what is the probability that me working on this area prevents a catastrophic event? According to utilitarianism, your expected number of lives saved is basically this probability times the total number of people who will ever live, or something like that.
So if there’s a 10% chance of AI killing everyone, and you working on this brings it down to 9.999999%, that’s less impactful than if there’s a 0.5% chance of climate change killing everyone and you working on this brings it down to 0.499%. Since you’re much more likely to do impactful work in an area that excites you, bio seems solid, since it’s relevant to bioweapons and climate change?
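To make the comparison concrete, here’s a minimal sketch of that arithmetic (the total-future-population figure N_FUTURE is a purely hypothetical placeholder, not an actual estimate):

```python
# Expected lives saved ≈ (reduction in extinction probability) × (number of future people).
# N_FUTURE is a hypothetical placeholder, not a real estimate.
N_FUTURE = 1e14

ai_delta = 0.10 - 0.09999999      # ~1e-8 reduction in P(AI kills everyone)
climate_delta = 0.005 - 0.00499   # ~1e-5 reduction in P(climate change kills everyone)

print(ai_delta * N_FUTURE)        # ~1e6 expected lives saved
print(climate_delta * N_FUTURE)   # ~1e9 expected lives saved
```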
Ok, cool, that’s helpful to know. Is your intuition that these examples will definitely occur and we just haven’t seen them yet (due to model size or something like this)? If so, why?
Right, so I’m pretty much on board with optimal policies (i.e., “global maximum” policies) usually involving seeking power. However, gradient descent only finds local maxima, not global maxima. It’s unclear to me whether these local maxima would involve something like power-seeking. My intuition for why they might not is that “small tweaks” in the direction of power-seeking would probably not reap immediate benefits, so gradient descent wouldn’t go down this path (see the toy sketch at the end of this comment).
This is where my question kind of arose from. If you have empirical examples of power-seeking coming up in tasks where it’s nontrivial that it would come up, I’d find that particularly helpful.
Does the paper you sent address this? If so, I’ll spend more time reading it.
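As a toy illustration of the local-vs-global point above (a sketch only, not a claim about how real RL training behaves), gradient ascent on a simple one-dimensional function converges to whichever peak is nearest its starting point, even when a higher peak exists elsewhere:

```python
import math

# f has a local peak near x = 1 (height ~1) and a higher, global peak near x = -2 (height ~2).
def f(x):
    return math.exp(-(x - 1) ** 2) + 2 * math.exp(-(x + 2) ** 2)

def grad_f(x):
    return -2 * (x - 1) * math.exp(-(x - 1) ** 2) - 4 * (x + 2) * math.exp(-(x + 2) ** 2)

x = 0.5  # initialize near the smaller peak
for _ in range(2000):
    x += 0.01 * grad_f(x)  # gradient ascent step

print(round(x, 2), round(f(x), 2))  # ~1.0, 1.0: stuck at the local maximum, not the global one at x = -2 (where f ≈ 2)
```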
Thanks! I think most of this made sense to me. I’m a bit fuzzy on the fourth bullet. Also, I’m still confused about why a model would even develop an alternative goal to maximizing its reward function, even if it’s theoretically able to pursue one.
Should we expect power-seeking to often be naturally found by gradient descent? Or should we primarily expect it to come up when people are deliberately trying to make power-seeking AI, and train the model as such?
“RL agents with coherent preference functions will tend to be deceptively aligned by default.”—Why?
I’m relatively unconvinced by most arguments I’ve read that claim deceptive alignment will be a thing (which I understand to be a model that intentionally behaves differently on its training data than on its test data, in order to avoid having its parameters changed during training).
Most toy examples or thought experiments I’ve seen don’t really seem to be examples of deceptive alignment, since the model is actually trained on the “test” data in these examples. For example, while humans can deceive their teachers in etiquette school and then use poor manners outside the school, your brain gets updated both inside etiquette school and outside of it, so it makes sense that you would learn to distinguish the two contexts.
I certainly think it is possible for a model to be deceptive, but this seems much, much more complicated and harder to find in the gradient descent landscape, and it seems to me that there’s pretty much no reason for a model to learn to be deceptive in the first place. This makes it seem like it won’t come up at all in practice, or at the least will be very easy to avoid.
Why do people buy deceptive alignment? Could you give some concrete examples where it could come up?
[Question] What are the best ways to encourage de-escalation in regards to Ukraine?
Thanks! Why did you not put this on the EA Forum?
This is extremely helpful, thank you.
[Question] What highly-impactful work is the most similar to solving fun math puzzles (if any)?
From what I’ve heard, Cambridge EAs have already been elected to positions in the Cambridge Union. I think several EAs at Oxford have also been in Oxford Union-adjacent circles, although it seems like most haven’t engaged with the Union due to lack of interest.
FWIW, I’m guessing most people focused on a social science at Oxbridge have already made up their minds about whether they’d like to network at the Unions or try their hand at getting elected to something. That being said, it seems like relatively few people have considered the potential impact of voting in said elections, or of other options that would not require serious engagement.
My source is that I remember Ajeya mentioning at one point that it led to positive changes and that she doesn’t think it was a bad decision in retrospect, but that she can’t get into said changes for NDA reasons.