EA and AI safety
Conceptual alignment research at MIRI
I have similar views to Marius’s comment. I did AISC in 2021 and I think it was somewhat useful for starting in AI safety, although I think my views and understanding of the problems were pretty dumb in hindsight.
AISC does seem extremely cheap (at least for the budget options). If you have like 80% on the “Only top talent matters” model (MATS, Astra, others) and 20% on the “Cast a wider net” model (AISC), I would still guess that AISC seems like a good thing to do.
My main worries here are with the negative effects. These are mainly related to the “To not build uncontrollable AI” stream; 3 out of 4 of these seem to be about communication/politics/advocacy.[1] I’m worried about these having negative effects, making the AI safety people seem crazy, uninformed, or careless. I’m mainly worried about this because Remmelt’s recent posting on LW really doesn’t seem like careful or well thought through communication. (In general I think people should be free to do advocacy etc, although please think of externalities) Part of my worry is also from AISC being a place for new people to come, and new people might not know how fringe these views are in the AI safety community.
I would be more comfortable with these projects (and they would potentially still be useful!) if they were focused on understanding the things they were advocating for more. E.g. a report on “How could lawyers and coders stop AI companies using their data?”, rather than attempting to start an underground coalition.
All the projects in the “Everything else” streams (run by Linda) seem good or fine, and likely a decent way to get involved and start thinking about AI safety. Although, as always, there is a risk of wasting time with projects that end up being useless.
[ETA: I do think that AISC is likely good on net.]
The other one seems like a fine/non-risky project related to domain whitelisting.
This is missing a very important point, which is that I think humans have morally relevant experience and I’m not confident that misaligned AIs would. When the next generation replaces the current one this is somewhat ok because those new humans can experience joy, wonder, adventure etc. My best guess is that AIs that take over and replace humans would not have any morally relevant experience, and basically just leave the universe morally empty. (Note that this might be an ok outcome if by default you expect things to be net negative)
I also think that there is way more overlap in the “utility functions” between humans, than between humans and misaligned AIs. Most humans feel empathy and don’t want to cause others harm. I think humans would generally accept small costs to improve the lives of others, and a large part of why people don’t do this is because people have cognitive biases or aren’t thinking clearly. This isn’t to say that any random human would reflectively become a perfectly selfless total utilitarian, but rather that most humans do care about the wellbeing of other humans. By default, I don’t think misaligned AIs will really care about the wellbeing of humans at all.
Yeah, that’s reasonable, as of 5:36pm PST, November 18, 2023 it still seems like a good bet.
I definitely am worried about either Sam Altman + Greg Brockman starting a new, less safety-focused lab, or Sam+Greg somehow returning to OpenAI and removing the safety-focused people from the board.
Even with this, it seems pretty good to have safety-focused people with some influence over OpenAI. I’m a bit confused about situations where it’s like “Yes, it was good to get influence, but it turned out you made a bad tactical mistake and ended up making things worse.”
Based.
Yeah, a more quantitative survey sounds like a useful thing to have, although I don’t have concrete plans to do this currently.
I’m slightly wary of causing ‘survey fatigue’ by constantly emailing AI safety people with surveys, but this one seems like it wouldn’t be too fatiguing.
Not exactly, but it seems useful to know what other people have done if you want to do similar work to them.
Obviously with all the standard hedges that we don’t want everyone doing exactly the same thing and thinking the same way.
That is definitely part of studying math. The thing I was trying to point to is the process of going from an idea or intuition to something that you can write in math. For example, in linear algebra you might have a feeling about some property of a matrix but then you actually have to show it with math. Or more relevantly, in Optimal Policies Tend to Seek Power it seems like the definition of ‘power’ came from formalizing what properties we would want this thing called ‘power’ to have.
But I’m curious to hear your thoughts on this, and if you think there are other useful ways to develop this ‘formalization’ skill.
I got to the same stage (and also didn’t get in) and had the same experience as you. I was definitely a bit sad about not getting in, but I did appreciate the call and feedback.
Maybe some construction megaprojects might count; I’m thinking of Notre-Dame Cathedral, which took about 100 years to complete.
This might not really count because the choir was completed after about 20 years. I’m also not sure if it was meant to take so long.
One example would be Benjamin Franklin bequeathing $2,000 to Boston and Philadelphia each, which could only be spent after 200 years.
This sounds like an almost exact description of the EA Hotel (CEEALAR), which is mentioned in the post. I think this does a pretty decent job of selecting for ‘genuine EA’ people.
For the MIRI Conversations, some people have said they’ll pay at least some money for this https://twitter.com/lxrjl/status/1463845239664394240
How worried are people actually about suffering in neural networks/artificial minds?
(My impression is that this is a fun thing to talk about, but won’t be that useful for a long time)
Updating Moral Beliefs
Imagine there is a box with a ball inside it, and you believe the ball is red. But you also believe that in the future you will update your belief and think that the ball is blue (the ball is a normal, non-color-changing ball). This seems like a very strange position to be in, and you should just believe that the ball is blue now.
This is an example of how we should deal with beliefs in general; if you think in the future you will update a belief in a specific direction then you should just update now.
I think the same principle applies to moral beliefs. If you think that in the future you’ll believe that it’s wrong to do something, then you should believe that it’s wrong now.
As an example of this, if you think that in the future you’ll believe eating meat is wrong, then you sort of already believe eating meat is wrong. I was in exactly this position for a while, thinking in the future I would stop eating meat, while also continuing to eat meat. A similar case to this is deliberately remaining ignorant about something because learning would change your moral beliefs. If you’re avoiding learning about factory farming because you think it would cause you to believe eating factory farmed meat is bad, then you already on some level believe that.
Another case of this is in politics when a politician says it’s ‘not the time’ for some political action but in the future it will be. This is ‘fine’ if it’s ‘not the time’ due to political reasons, such as the electorate not reelecting the politician. But I don’t think it’s consistent to say an action is currently not moral, but will be moral in the future. Obviously this only works if the action now and in the future are actually equivalent.
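The belief-updating principle above is essentially conservation of expected evidence: a coherent Bayesian’s expected future credence equals their current credence, so predictably expecting to update toward “blue” is inconsistent. A minimal sketch with made-up numbers (the 0.3 prior and the test likelihoods are hypothetical, just to illustrate):

```python
# Hypothetical numbers: current credence that the ball is blue,
# plus a noisy "test" we expect to observe in the future.
prior_blue = 0.3
p_pos_given_blue = 0.9  # P(test positive | ball is blue)
p_pos_given_red = 0.2   # P(test positive | ball is red)

# Probability the test comes back positive.
p_pos = prior_blue * p_pos_given_blue + (1 - prior_blue) * p_pos_given_red

# Posterior credence after each possible test result (Bayes' rule).
post_if_pos = prior_blue * p_pos_given_blue / p_pos
post_if_neg = prior_blue * (1 - p_pos_given_blue) / (1 - p_pos)

# Expected posterior, averaged over the possible results, equals the prior:
expected_posterior = p_pos * post_if_pos + (1 - p_pos) * post_if_neg
assert abs(expected_posterior - prior_blue) < 1e-9
```

The individual posteriors can move a lot in either direction, but their probability-weighted average is exactly the prior. So if you expect your credence to go up on net, that expectation is itself evidence, and you should move your credence now.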
Wow, that’s great! Very happily surprised that a charity focused on wild animal welfare is getting recognition in an event like this which isn’t explicitly EA.
Flipping the Repugnant Conclusion
Imagine a world populated by many, many (trillions) of people. These people’s lives aren’t purely full of joy, and do have a lot of misery as well. But each person thinks that their life is worth living. Their lives might be a bit boring or they might be full of huge ups and downs, but on the whole they are net-positive.
From this view it seems really strange to think that it would be good for every person in this world to die/not exist/never have existed in order to allow a very small number of privileged people to live spectacular lives. It seems bad to stop many people from living a life that they mostly enjoy, in order to allow the flourishing of the few.
I think this hypothetical is a decent intuition pump for why the Repugnant Conclusion isn’t actually repugnant. But I do think it might be a little bit dishonest or manipulative. It frames the situation in terms of fairness and equality; we can sympathize with the many slightly happy people who are maybe being denied the right to exist, and think of the few extremely happy people as the privileged elite. It also takes advantage of status quo bias; by beginning with the many slightly happy people it seems worse to then ‘remove’ them.
It seems as if EA organisations were in need of more operations people around 2018 (as evidenced by that 80k article), is there currently a need for more operations people in EA orgs?
Relatedly, how difficult is it to get a position doing operations work for an EA org, especially if you have some but not tonnes of operations experience?
Yep, that is what I was referring to. It does seem like you’re likely to be more careful in the future, but I’m still fairly worried about advocacy done poorly. (Although, like, I also think people should be able to do advocacy if they want)