[for context, I’ve talked to Eli about this in person]
I’m interpreting you as having two concerns here.
Firstly, you’re asking how this is different from deferring to people about the impact of the two orgs.
From my perspective, the nice thing about the impact certificate setup is that if you get paid in org B impact certificates, you’re making the people at orgs A and B put their money where their mouths are. Analogously, suppose Google is trying to hire me, but I’m actually unsure about Google’s long-term profitability, and I’d rather be paid in Facebook stock than Google stock. If Google pays me in Facebook stock, I’m not deferring to them about the relative values of these stocks; I’m just getting paid in Facebook stock, so that if Google is overvalued it’s no longer my problem, it’s the problem of whoever traded their Facebook stock for Google stock.
I think the policy of maximizing impact certificates is better for the world in this case because people are more likely to give careful answers to the question “how valuable is org A’s work relative to org B’s?” when they’re thinking about it as a trade they might make than when some random EA is asking for their quick advice.
Secondly, you’re worrying that people might end up seeming like they’re endorsing an org that they don’t endorse, and that this might harm community epistemics. This is an interesting objection that I haven’t thought much about. A few possible responses:
It’s already an issue that people have different amounts of optimism about their workplaces, and people rarely state publicly how much they agree or disagree with their employer (though I personally try to be clear about this). Impact equity trades are unlikely to exacerbate this problem much.
Also, people often work at places for reasons that aren’t “I think this is literally the best org”, eg:
thinking that the job is fun
the job paying them a high salary (this is exactly analogous to them paying in impact equity of a different org)
thinking that the job will give them useful experience
random fluke of who happened to offer them a job at a particular point
thinking the org is particularly flawed, and so they can do unusual amounts of good by pushing it in a good direction
Also, if there were liquid markets in the impact equity of different orgs, then we’d have access to much higher-quality information about the community’s guess about the relative promisingness of different orgs. So pushing in this direction would probably be overall helpful.
This was nice to read, because I’m not sure I’ve ever seen anyone actually admit this before.
Not everyone agrees with me on this point. Many safety researchers think that their path to impact is establishing a strong research community around safety, which seems more plausible as a mechanism for affecting the world 50 years out than the “my work is actually relevant” plan. (And partially for this reason, these people tend to do different research from me.)
You say you think there’s a 70% chance of AGI in the next 50 years. How low would that probability have to be before you’d say, “Okay, we’ve got a reasonable number of people to work on this risk, we don’t really need to recruit new people into AI safety”?
I don’t know at what size of the AI safety field marginal effort becomes better spent elsewhere. Presumably this is a continuous thing rather than a discrete one. Eg compared to five years ago there are way more people in AI safety now, so if your comparative advantage is some other way of positively influencing the future, you should consider that other thing more strongly.
What do you think about participating in a forecasting platform, e.g. Good Judgement Open or Metaculus? It seems to cover all the ingredients, and even to be a good signal for others to evaluate your judgement quality.
Seems pretty good for predicting things about the world that get resolved on short timescales. Sadly it seems less helpful for practicing judgement about things like the following:
judging arguments about things like the moral importance of wild animal suffering, plausibility of AI existential risk, and existence of mental illness
predictions about small-scale things like how a project should be organized (though you can train calibration on this kind of question)
Re my own judgement: I appreciate your confidence in me. I spend a lot of time talking to people who have IMO better judgement than me; most of the things I say in this post (and a reasonable chunk of things I say other places) are my rephrasings of their ideas. I think that people whose judgement I trust would agree with my assessment of my judgement quality as “good in some ways” (this was the assessment of one person I asked about this in response to your comment).
It seems that your current strategy is to focus on training, hiring, and doing outreach to the most promising talented individuals.
This seems like a pretty good summary of the strategy I work on, and it’s the strategy that I’m most optimistic about.
Other alternatives might include more engagement with amateurs, and providing more assistance to groups and individuals who want to learn and conduct independent research.
I think that it would be quite costly and difficult for more experienced AI safety researchers to try to cause more good research to happen by engaging more with amateurs or providing more assistance to independent research. So I think that experienced AI safety researchers are probably going to do more good by spending more time on their own research than by trying to help other people with theirs. This is because I think that experienced and skilled AI safety researchers are much more productive than other people, and because I think that a reasonably large number of very talented math/CS people become interested in AI safety every year, so we can set a pretty high bar for which people to spend a lot of time with.
Also, what would change if you had 10 times the amount of management and mentorship capacity?
If I had ten times as many copies of various top AI safety researchers and I could only use them for management and mentorship capacity, I’d try to get them to talk to many more AI safety researchers, through things like weekly hour-long calls with PhD students, or running more workshops like MSFP.
I’m a fairly good ML student who wants to decide on a research direction for AI Safety.
I’m not actually sure whether I think it’s a good idea for ML students to try to work on AI safety. I am pretty skeptical of most of the research done by pretty good ML students who try to make their research relevant to AI safety: it usually feels to me like their work ends up not contributing to one of the core difficulties, and I think they might have been better off spending that effort becoming really good at ML, with the goal of being better skilled up to work on AI safety later.
I don’t have much better advice for how to get started on AI safety; I think the “recommend applying to AIRCS and pointing at 80K and maybe the Alignment Newsletter” path is pretty reasonable.
It was a good time; I appreciate all the thoughtful questions.
Most of them are related to AI alignment problems, but it’s possible that I should work specifically on them rather than other parts of AI alignment.
I suppose that the latter goes a long way towards explaining the former.
Yeah, I suspect you’re right.
Personally, there are few technologies that I think are likely to radically change the world within the next 100 years (assuming that your definition of radical is similar to mine). Maybe the only ones that would really qualify are bioengineering and nanotech. Even in those fields, though, I expect the pace of change to be fairly slow if AI isn’t heavily involved.
I think there are a couple more radically transformative technologies that are reasonably likely over the next hundred years, eg whole brain emulation. And I suspect we disagree about the expected pace of change in bioengineering and maybe nanotech.
Yeah, makes sense; I didn’t mean “unintentional” by “incidental”.
I think of myself as making a lot of gambles with my career choices. And I suspect that regardless of which way the propositions turn out, I’ll have an inclination to think that I was an idiot for not realizing them sooner. For example, I often have both the following thoughts:
“I have a bunch of comparative advantage at helping MIRI with their stuff, and I’m not going to be able to quickly reduce my confidence in their research directions. So I should stop worrying about it and just do as much as I can.”
“I am not sure whether the MIRI research directions are good. Maybe I should spend more time evaluating whether I should do a different thing instead.”
But even if it feels obvious in hindsight, it sure doesn’t feel obvious now.
So I have big gambles that I’m making, which might turn out to be wrong, but which feel now like they’ll have been reasonable gambles either way once we see how things turn out. The main two such gambles are thinking that AI alignment might be really important in the next couple of decades, and working on MIRI’s approaches to AI alignment instead of some other approach.
When I ask myself “what things have I not really considered as much as I should have”, I get answers that change over time (because I ask myself that question pretty often and then try to consider the things that are important). At the moment, my answers are:
Maybe I should think about/work on s-risks much more
Maybe I spend too much time inventing my own ways of solving design problems in Haskell and I should study other people’s more.
Maybe I am much more productive working on outreach stuff and I should do that full time.
(This one is only on my mind this week and will probably go away pretty soon) Maybe I’m not seriously enough engaging with questions about whether the world will look really different in a hundred years from how it looks today; perhaps I’m subject to some bias towards sensationalism and actually the world will look similar in 100 years.
I hadn’t actually noticed that.
One factor here is that a lot of AI safety research seems to need ML expertise, which is one of my least favorite types of CS/engineering.
Another is that, compared to many EAs, I think I have a comparative advantage in roles that require technical knowledge but don’t involve doing technical research day-to-day.
I’m emphasizing strategy 1 because I think that there are EA jobs for software engineers where the skill ceiling is extremely high, so if you’re really good it’s still worth it for you to try to become much better. For example, AI safety research orgs need really great engineers.
I worry very little about losing the opportunity to get external criticism from people who wouldn’t engage very deeply with our work if they did have access to it. I worry more about us doing worse research because it’s harder for extremely engaged outsiders to contribute to our work.
A few years ago, Holden had a great post where he wrote:
For nearly a decade now, we’ve been putting a huge amount of work into putting the details of our reasoning out in public, and yet I am hard-pressed to think of cases (especially in more recent years) where a public comment from an unexpected source raised novel important considerations, leading to a change in views. This isn’t because nobody has raised novel important considerations, and it certainly isn’t because we haven’t changed our views. Rather, it seems to be the case that we get a large amount of valuable and important criticism from a relatively small number of highly engaged, highly informed people. Such people tend to spend a lot of time reading, thinking and writing about relevant topics, to follow our work closely, and to have a great deal of context. They also tend to be people who form relationships of some sort with us beyond public discourse.
The feedback and questions we get from outside of this set of people are often reasonable but familiar, seemingly unreasonable, or difficult for us to make sense of. In many cases, it may be that we’re wrong and our external critics are right; our lack of learning from these external critics may reflect our own flaws, or difficulties inherent to a situation where people who have thought about a topic at length, forming their own intellectual frameworks and presuppositions, try to learn from people who bring very different communication styles and presuppositions.
The dynamic seems quite similar to that of academia: academics tend to get very deep into their topics and intellectual frameworks, and it is quite unusual for them to be moved by the arguments of those unfamiliar with their field. I think it is sometimes justified and sometimes unjustified to be so unmoved by arguments from outsiders.
Regardless of the underlying reasons, we have put a lot of effort over a long period of time into public discourse, and have reaped very little of this particular kind of benefit (though we have reaped other benefits—more below). I’m aware that this claim may strike some as unlikely and/or disappointing, but it is my lived experience, and I think at this point it would be hard to argue that it is simply explained by a lack of effort or interest in public discourse.
My sense is pretty similar to Holden’s, though we’ve put much less effort into explaining ourselves publicly. When we’re thinking about topics like decision theory which have a whole academic field, we seem to get very little out of interacting with the field. This might be because we’re actually interested in different questions and academic decision theory doesn’t have much to offer us (eg see this Paul Christiano quote and this comment).
I think that MIRI also empirically doesn’t change its strategy much as a result of talking to highly engaged people who have very different world views (eg Paul Christiano), though individual researchers (eg me) often change their minds from talking to these people. (Personally, I also change my mind from talking to non-very-engaged people.)
Maybe talking to outsiders doesn’t shift MIRI strategy because we’re totally confused about how to think about all of this. But I’d be surprised if we figured that out soon, given that we haven’t figured it out so far. So I’m pretty willing to say “look, either MIRI’s onto something or not; if we’re onto something, we should go for it wholeheartedly, and I don’t seriously think that we’re going to update our beliefs much from more public discourse, so it doesn’t seem that costly to have our public discourse become costlier”.
I guess I generally don’t feel that convinced that external criticism is very helpful for situations like ours where there isn’t an established research community with taste that is relevant to our work. Physicists have had a lot of time to develop a reasonably healthy research culture where they notice what kinds of arguments are wrong; I don’t think AI alignment has that resource to draw on. And in cases where you don’t have an established base of knowledge about what kinds of arguments are helpful (sometimes people call this “being in a preparadigmatic field”; I don’t know if that’s correct usage), I think it’s plausible that people with different intuitions should do divergent work for a while and hope that eventually some of them make progress that’s persuasive to the others.
By not engaging with critics as much as we could, I think MIRI is probably increasing the probability that we’re barking completely up the wrong tree. I just think that this gamble is worth taking.
I’m more concerned about costs incurred because we’re more careful about sharing research with highly engaged outsiders who could help us with it. Eg Paul has made some significant contributions to MIRI’s research, and it’s a shame to have less access to his ideas about our problems.
I think it’s plausible that “solving the alignment problem” isn’t a very clear way of phrasing the goal of technical AI safety research. Consider the question “will we solve the rocket alignment problem before we launch the first rocket to the moon”—to me the interesting question is whether the first rocket to the moon will indeed get there. The problem isn’t really “solved” or “not solved”, the rocket just gets to the moon or not. And it’s not even obvious whether the goal is to align the first AGI; maybe the question is “what proportion of resources controlled by AI systems end up being used for human purposes”, where we care about a weighted proportion of AI systems which are aligned.
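(As a sketch of what I mean, and this is just my own rough formalization rather than a standard definition: you could score an outcome by something like alignment fraction = (resources controlled by AI systems that end up used for human purposes) / (total resources controlled by AI systems), and the goal of safety work is then to push that fraction up, rather than to flip a binary from “unsolved” to “solved”.)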
I am not sure whether I’d bet for or against the proposition that humans will go extinct for AGI-misalignment-related-reasons within the next 100 years.
It’s getting late and it feels hard to answer this question, so I’m only going to say briefly:
for something MIRI wrote re this, see the “strategic background” section here
I think there are cases where alignment is non-trivial but prosaic AI alignment is possible, and where people who are cautious about AGI alignment are influential in the groups working on AGI development and cause them to put lots of effort into alignment (eg maybe the only way to align the thing involves spending an extra billion dollars on human feedback). Because of these cases, I am excited for the leading AI orgs to have many people in important positions who are concerned about and knowledgeable about these issues.
I don’t think you can prep that effectively for x-risk-level AI outcomes, obviously.
I think you can prep for various transformative technologies; you could for example buy shares of computer hardware manufacturers if you think that they’ll be worth more due to increased value of computation as AI productivity increases. I haven’t thought much about this, and I’m sure this is dumb for some reason, but maybe you could try to buy land in cheap places in the hope that in a transhuman utopia the land will be extremely valuable (the property rights might not carry through, but it might be worth the gamble for sufficiently cheap land).
I think it’s probably at least slightly worthwhile to do good and hope that you can sell some of your impact certificates after good AI outcomes.
You should ask Carl Shulman, I’m sure he’d have a good answer.
“Do you have any advice for people who want to be involved in EA, but do not think that they are smart or committed enough to be engaging at your level?”—I just want to say that I wouldn’t have phrased it quite like that.
One role that I’ve been excited about recently is making local groups good. I think that having better local EA communities might be really helpful for outreach, and lots of different people can do great work on this.
(I’ve spent a few hours talking to people about the LTFF, but I’m not sure about things like “what order of magnitude of funding did they allocate last year”; my guess without looking it up was $1M, which turns out to be correct. So take all this with a grain of salt.)
Re Q1: I don’t know, I don’t think that we coordinate very carefully.
Re Q2: I don’t really know. When I look at the list of things the LTFF funded in August or April (excluding regrants to orgs like MIRI, CFAR, and Ought), about 40% look meh (~0.5x MIRI), about 40% look like things I’m reasonably glad someone funded (~1x MIRI), about 7% are things I’m really glad someone funded (~3x MIRI), and 3% are things I wish they hadn’t funded (-1x MIRI). Note that the mean outcomes of the meh, good, and great categories are much higher than the median outcomes: a lot of them are “I think this is probably useless but seems worth trying for value of information”. Apparently this adds up to thinking that they’re 78% as good as MIRI.
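(To spell out the arithmetic behind that 78%: the listed categories only sum to about 90%, so this is rough, but 0.40 × 0.5 + 0.40 × 1 + 0.07 × 3 + 0.03 × (−1) = 0.20 + 0.40 + 0.21 − 0.03 = 0.78, ie roughly 78% as good as MIRI per dollar on this weighting.)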
Q3: I don’t really know. My median outcome is that they turn out to do less well than my estimate above, but I think there’s a reasonable probability that they turn out much better than that estimate, and I’m excited to see them try to do good. This isn’t really tied up with AI capability or safety progressing, though.