I am Issa Rice. https://issarice.com/
Back in July, you held an in-person Q&A at REACH and said “There are a bunch of things about AI alignment which I think are pretty important but which aren’t written up online very well. One thing I hope to do at this Q&A is try saying these things to people and see whether people think they make sense.” Could you say more about what these important things are, and what was discussed at the Q&A?
I read the paper (skipping almost all the math) and Philip Trammell’s blog post. I’m not sure I understood the paper, and in any case I’m pretty confused about the topic of how growth influences x-risk, so I want to ask you a bunch of questions:
- Why do the time axes in many of the graphs span hundreds of years? In discussions about AI x-risk, I mostly see something like 20-100 years as the relevant timescale in which to act (i.e. by the end of that period, we will either go extinct or else build an aligned AGI and reach a technological singularity). Looking at Figure 7, if we only look ahead 100 years, it seems like the risk of extinction actually goes up in the accelerated growth scenario.
- What do you think of Wei Dai’s argument that safe AGI is harder to build than unsafe AGI and we are currently putting less effort into the former, so slower growth gives us more time to do something about AI x-risk (i.e. slower growth is better)?
- What do you think of Eliezer Yudkowsky’s argument that work for building an unsafe AGI parallelizes better than work for building a safe AGI, and that unsafe AGI benefits more in expectation from having more computing power than safe AGI, both of which imply that slower growth is better from an AI x-risk viewpoint?
- What do you think of Nick Bostrom’s urn analogy for technological developments? It seems like in the analogy, faster growth just means pulling out the balls at a faster rate without affecting the probability of pulling out a black ball. In other words, we hit the same amount of risk but everything just happens sooner (i.e. growth is neutral).
- Looking at Figure 7, my “story” for why faster growth lowers the probability of extinction is this: The richer people are, the less they value marginal consumption, so the more they value safety (relative to consumption). Faster growth gets us sooner to the point where people are rich and value safety. So faster growth effectively gives society less time in which to mess things up (however, I’m confused about why this happens; see the next point). Does this sound right? If not, I’m wondering if you could give a similar intuitive story.
- I am confused why the height of the hazard rate in Figure 7 does not increase in the accelerated growth case. I think equation (7) (for the hazard rate) might be the cause of this, but I’m not sure. My own intuition says accelerated growth not only condenses the graph along the time axis, but also stretches it along the vertical axis (so that the area under the curve is mostly unaffected).
As an extreme case, suppose growth halted for 1000 years. It seems like in your model, the graph of the hazard rate would stay constant at some fixed level, accumulating extinction probability during that time. But my intuition says the hazard rate would first drop to near zero and then stay constant, because no new dangerous technologies are being invented. At the opposite extreme, suppose we suddenly get a huge boost in growth and effectively reach “the end of growth” (near period 1800 in Figure 7) in an instant. Your model seems to say that the graph would compress so much that we almost certainly never go extinct, but my intuition says we would still experience a lot of extinction risk. Is my interpretation of your model correct, and if so, could you explain why the height of the hazard rate graph does not increase?
This reminds me of the question of whether it is better to walk or run in the rain (keeping the distance traveled constant). We can imagine a modification where the raindrops are motionless in the air: then the amount of water you run into depends only on the distance traveled, not on how fast you move.
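To spell out my “area under the curve” intuition a bit more concretely (the notation below is mine, not the paper’s, and I may be misreading the model): writing δ(t) for the hazard rate and k > 1 for a speed-up factor,

```latex
% Probability of surviving to time T under hazard rate \delta(t):
\[
  S(T) \;=\; \exp\!\Big(-\int_0^T \delta(t)\,dt\Big).
\]
% Pure compression of the time axis (everything happens k times sooner)
% divides the accumulated hazard by k:
\[
  \int_0^{T/k} \delta(kt)\,dt \;=\; \frac{1}{k}\int_0^T \delta(u)\,du.
\]
% Compression plus a proportional vertical stretch leaves it unchanged:
\[
  \int_0^{T/k} k\,\delta(kt)\,dt \;=\; \int_0^T \delta(u)\,du.
\]
```

So on this reading, whether acceleration lowers total extinction risk comes down to whether the hazard curve merely shifts earlier or also scales up in height, which is what I am asking about above.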
- Can you give some examples of EA organizations that have done things the “right way” (in your view)?
Several background variables give rise to worldviews/outlooks about how to make the transition to a world with AGIs go well. Answering this question requires assigning values to the background variables or placing weights on the various worldviews, and then thinking about how likely “Disneyland with no children” scenarios are under each worldview, by e.g. looking at how they solve philosophical problems (particularly deliberation) and how likely obvious vs non-obvious failures are.
That is to say, I think answering questions like this is pretty difficult, and I don’t think there are any deep public analyses about it. I expect most EAs who don’t specialize in AI alignment to do something on the order of “under MIRI’s views the main difficulty is getting any sort of alignment, so this kind of failure mode isn’t the main concern, at least until we’ve solved alignment; under Paul’s views we will sort of have control over AI systems, at least in the beginning, so this kind of failure seems like one of the many things to be worried about; overall I’m not sure how much weight I place on each view, and don’t know what to think so I’ll just wait for the AI alignment field to produce more insights”.
The inconsistency is itself a little concerning.
I am one of the contributors to the Donations List Website (DLW), the site you link to. DLW is not affiliated with the EA Hotel in any way (although Vipul, the maintainer of DLW, made a donation to the EA Hotel). Some reasons for the discrepancy in this case:
- As stated in bold letters at the top of the page, “Current data is preliminary and has not been completely vetted and normalized”. I don’t think this is the main reason in this case.
- Pulling data into DLW is not automatic, so there is a lag between when the donations are made and when they appear on DLW.
- DLW only tracks public donations.
The reason may be somewhat simple: most AI alignment researchers do not participate (post or comment) on LW/AF or participate only a little.
I’m wondering how many such people there are. Specifically, how many people (i) don’t participate on LW/AF, (ii) don’t already get paid for AI alignment work, and (iii) do seriously want to spend a significant amount of time working on AI alignment or already do so in their free time? (So I want to exclude researchers at organizations, random people who contact 80,000 Hours for advice on how to get involved, people who attend a MIRI workshop or AI safety camp but then happily go back to doing non-alignment work, etc.) My own feeling before reading your comment was that there are maybe 10-20 such people, but it sounds like there may be many more than that. Do you have a specific number in mind?
if you follow just LW, your understanding of the field of AI safety is likely somewhat distorted
I’m aware of this, and I’ve seen Wei Dai’s post and the comments there. Personally I don’t see an easy way to get access to more private discussions due to a variety of factors (not being invited to workshops, some workshops being too expensive for it to be worth traveling to, not being eligible to apply for certain programs, and so on).
A trend I’ve noticed in the AI safety independent research grants for the past two rounds (April and August) is that most of the grantees have little to no online presence as far as I know (they could be using pseudonyms I am unaware of); I believe Alex Turner and David Manheim are the only exceptions. However, when I think about “who am I most excited to give individual research grants to, if I had that kind of money?”, the names I come up with are people who leave interesting comments and posts on LessWrong about AI safety. (This isn’t surprising because I mostly interact with the AI safety community publicly online, so I don’t have much access to private info.) To give an idea of the kind of people I am thinking of, I would name John Wentworth, Steve Byrnes, Ofer G., Morgan Sinclaire, and Evan Hubinger as examples.
This has me wondering what’s going on. Some possibilities I can think of:
- the people who contribute on LW aren’t applying for grants
- the private people are higher quality than the online people
- the private people have more credentials than the online people (e.g. Hertz Fellowship, math contests experience)
- fund managers are more receptive offline than online and it’s easier to network offline
- fund managers don’t follow online discussions closely
I would appreciate it if the fund managers could weigh in on this so I have a better sense of why my own thinking seems to diverge so much from the actual grant recommendations.
various people’s pressure on OpenPhil to fund MIRI
I’m curious what this is referring to. Are there specific instances of such pressure being applied on Open Phil that you could point to?
this graph is also fairly misleading by putting OpenPhil on the same footing as an individual ETG-funder, although OpenPhil is disbursing wholly 1000x more funds
See my reply to Ozzie.
Also, do you think by moving the nodes around you could reduce the extent to which lines cross over each other, to increase clarity?
I added three additional graphs that use different layout algorithms in here. I don’t know if they’re any better.
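For anyone who wants to experiment with other layouts, here is a rough sketch (not the script I actually used) of how one can render the same graph with several Graphviz layout engines, using the Python graphviz bindings; the node and edge data are placeholders, not the real funding data:

```python
# Render one funding graph with several Graphviz layout engines to compare
# how much the edges cross. Requires the `graphviz` Python package plus a
# Graphviz installation. Node and edge data below are placeholders.
from graphviz import Digraph

edges = [
    ("Donor A", "Org 1"),
    ("Donor A", "Org 2"),
    ("Foundation B", "Org 1"),
    ("Foundation B", "Org 3"),
    ("Org 3", "Regrantee C"),
]

for engine in ["dot", "neato", "fdp"]:
    g = Digraph(engine=engine)
    for tail, head in edges:
        g.edge(tail, head)  # a label= argument here could show a rough grant size
    g.render(f"funding-graph-{engine}", format="png", cleanup=True)
```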
I suggest adding labels to the edges to state a rough number of funding
I find that remembering the typical grant/donation size of a donor is easier than remembering all the connections between different donors and donees, so having the edges visually represented (without further decorating the edges) captures most of the value of the exercise. I realize that others who don’t follow the EA granting space as closely as I do may feel differently.
Perhaps it would ideally be an interactive application
I don’t have experience making such applications, so I will let someone else do this.
was there any reason for having Patrick in particular on the top of this?
The node positions were chosen by Graphviz, so I didn’t choose to put Patrick on top. I included Patrick because Vipul suggested doing this (I would guess because Patrick was the most available example of an ETG donor who has given to many x-risk charities).
Funding chains in the x-risk/AI safety ecosystem
I’m not sure I understand the difference between mathematical thinking and mathematical knowledge. Could you briefly explain or give a reference? (e.g. I am wondering what it would look like if someone had a lot of one and very little of the other)
It seems to me that this post has introduced a new definition of cause X that is weaker (i.e. easier to satisfy) than the one used by CEA.
This post defines cause X as:
The concept behind a “cause X” is that there could be a cause neglected by the EA community but that is as important, or more important, to work on than the four currently established EA cause areas.
But from Will MacAskill’s talk:
What are the sorts of major moral problems that in several hundred years we’ll look back and think, “Wow, we were barbarians!”? What are the major issues that we haven’t even conceptualized today?
I will refer to this as Cause X.
See also the first paragraph of Emanuele Ascani’s answer here.
From the “New causes one could consider” list in this post, I think only Invertebrates and Moral circle expansion would qualify as a potential cause X under CEA’s definition (the others already have researchers/organizations working on them full-time, or wouldn’t sound crazy to the average person).
I think it would be good to have a separate term specifically for the cause areas that seem especially crazy or unconceptualized, since searching for causes in this stricter class likely requires different strategies, more open-mindedness, etc.
Related: Guarded definition.
Hi Oliver, are you still planning to reply to this? (I’m not involved with this project, but I was curious to hear your feedback on it.)
filtering by highest rating over several different time ranges
The EA Forum Reader I made a while ago has the ability to do this. The top view shows posts in order of score, and one can filter by various date ranges (“Restrict date range: Today · This week · This month · Last three months · This year · All time” exactly like on the old forum). In addition, the “Archive” links (in the sidebar on desktop, or at the bottom of the page on mobile) in the top view show the top posts from the given time period, so e.g. one can view the top posts in 2018 or the top posts in February 2019.
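In case the description above is unclear, the underlying logic is just “filter by date range, then sort by score”; a rough sketch of that logic is below (with made-up placeholder posts; the Reader itself reads data from the forum rather than an in-memory list):

```python
# Rough sketch of "top posts within a date range": keep posts whose publication
# date falls in the range, then sort by score. Placeholder data for illustration.
from datetime import date

posts = [
    {"title": "Post A", "score": 120, "posted": date(2019, 2, 14)},
    {"title": "Post B", "score": 95,  "posted": date(2018, 7, 2)},
    {"title": "Post C", "score": 40,  "posted": date(2019, 2, 28)},
]

def top_posts(posts, start, end):
    in_range = [p for p in posts if start <= p["posted"] <= end]
    return sorted(in_range, key=lambda p: p["score"], reverse=True)

# e.g. the top posts in February 2019:
for p in top_posts(posts, date(2019, 2, 1), date(2019, 2, 28)):
    print(p["score"], p["title"])
```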
Open Phil used to have its own page; see e.g. this version and the revision history for some context. (Disclosure: I wrote the original version of the page.)
Animal Charity Evaluators previously had a page on Wikipedia, but it was deleted after a deletion discussion. You can see a copy of what the page looked like, which could also be used as a starting point in case someone wants to recreate the page.
My guess (based on intuition/experience and without spending any time digging up sources) is that almost all of these do not meet Wikipedia’s general notability guideline, so it is uncertain whether they would survive for long (if someone were to write the pages). In other words, they might be deleted like the ACE article.
The Chivers book will likely meet the notability criteria for books (if it doesn’t already).
Can you clarify which timezone is being used to determine whether a post is published in one month vs another? (A post I am curious about was published in March in some timezones but in April in others, so I’m wondering if it was even considered for the March prize.)
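To illustrate why I am asking (the timestamp here is made up, not the actual post’s): a post published shortly after midnight UTC on April 1 still counts as a March post in US timezones:

```python
# A single publication timestamp can fall in different months depending on the
# timezone used. The timestamp below is made up for illustration (Python 3.9+).
from datetime import datetime
from zoneinfo import ZoneInfo

published = datetime(2019, 4, 1, 2, 30, tzinfo=ZoneInfo("UTC"))

print(published.strftime("%B"))  # April
print(published.astimezone(ZoneInfo("America/Los_Angeles")).strftime("%B"))  # March
```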
I don’t see this as a risk for EA/rationalist types though, and would argue that pretty strongly.
Would you be willing to supply this argument? I am very curious to hear more about your thinking on this, as it is something I have wondered about. (For the sake of transparency, I should mention that my own take is that there is a significant risk that even EAs and rationalists will be overtaken by unscientific thinking after strong psychedelic experiences, and that it takes unusually solid worldviews and/or some sort of hard-to-describe personality trait to resist this influence.)
On the SSC roadtrip post, you say “After our trip, I’ll write up a post-mortem for other people who might be interested in doing things like this in the future”. Are you still planning to write this, and if so, when do you expect to publish it?