Program Associate at Open Philanthropy and chair of the Long-Term Future Fund. I spend half my time on AI and half my time on EA community-building. Any views I express on the forum are my own, not the views of my employer.
abergal
How to change minds
I am worried that investing forgoes compounding effects from spending on movement building now, effects that have nothing to do with financial returns. In particular:
Maybe we should care more about the fraction of the world’s population that’s longtermist than the fraction of the world’s wealth that we control.
Maybe a substantial fraction of the world population can become receptive to longtermism only via slow diffusion from other longtermists, and cannot be converted through money alone.
That is to say: if there’s a sufficient compounding effect from movement building that we can’t replace with money, then maybe we should spend a lot now on movement building.
I haven’t thought through how much of an effect this is, but something with this flavor feels intuitively compelling to me because we’re in a situation now where it would be nice if e.g. key political figures were longtermists, but there’s no obvious way to spend money to make that happen.
Limited data availability and generality in practice now: this paper (https://arxiv.org/abs/2006.16668) shows that improving translation performance for “low resource” languages, which have few training examples available, relies on “positive language transfer” from training on other languages.
Planned summary of the podcast episode for the Alignment Newsletter:
In this podcast, Ben Garfinkel goes through several reasons why he is skeptical of classic AI risk arguments (some previously discussed <@here@>(@How Sure are we about this AI Stuff?@)). The podcast has considerably more detail and nuance than this summary.
Ben thinks that historically, it has been hard to affect transformative technologies in a way that was foreseeably good for the long term; it’s hard, e.g., to see what you could have done around the development of agriculture or industrialization that would foreseeably improve the world today. He thinks some potential avenues for long-term influence could be through addressing increased political instability or the possibility of lock-in, though he thinks that it’s unclear what we could do today to influence the outcome of a lock-in, especially if it’s far away.
In terms of alignment, Ben focuses on the standard set of arguments outlined in Nick Bostrom’s Superintelligence, because they are broadly influential and relatively fleshed out. Ben has several objections to these arguments:
- He thinks it isn’t likely that there will be a sudden jump to extremely powerful and dangerous AI systems, and he thinks we have a much better chance of correcting problems as they come up if capabilities grow gradually.
- He thinks that making AI systems capable and making AI systems have the right goals are likely to go together.
- He thinks that just because there are many ways to create a system that behaves destructively doesn’t mean that the engineering process that creates such a system is likely to be attracted to those destructive systems; it seems like we are unlikely to accidentally create systems that are destructive enough to end humanity.
Ben also spends a little time discussing <@mesa-optimization@>(@Risks from Learned Optimization in Advanced Machine Learning Systems@), a much newer argument for AI risk. He largely thinks that the case for mesa-optimization hasn’t yet been fleshed out sufficiently. He also thinks it’s plausible that learning incorrect goals may be a result of having systems that are insufficiently sophisticated to represent goals appropriately. With sufficient training, we may in fact converge to the system we want.
Given the current state of argumentation, Ben thinks that it’s worth EA time to flesh out newer arguments around AI risk, but also thinks that EAs who don’t have a comparative advantage in AI-related topics shouldn’t necessarily switch into AI. Ben thinks it’s a moral outrage that we have spent less money on AI safety and governance than the 2017 movie ‘The Boss Baby’, starring Alec Baldwin.
Planned opinion:
This podcast covers a really impressive breadth of the existing argumentation. A lot of the reasoning is similar to <@what I’ve heard from other researchers@>(@Takeaways from safety by default interviews@). I’m really glad that Ben and others are spending time critiquing these arguments; in addition to showing us where we’re wrong, it helps us steer towards more plausible risky scenarios.
I largely agree with Ben’s criticisms of the Bostrom AI model; I think mesa-optimization is the best current case for AI risk and am excited to see more work on it. The parts of the podcast where I most disagreed with Ben were:
- Even in the absence of solid argumentation, I feel good about a prior where AI has a non-trivial chance of being existentially threatening, partially because I think it’s reasonable to put AI in the reference class of ‘new intelligent species’ in addition to ‘new technology’.
- I’m not sure that institutions will address failures sufficiently, <@even if progress is gradual and there are warnings@>(@Possible takeaways from the coronavirus pandemic for slow AI takeoff@).
Rohin’s opinion:
I recommend listening to the full podcast, as it contains a lot of detail that wouldn’t fit in this summary. Overall I agree pretty strongly with Ben. I do think that some of the counterarguments are coming from a different frame than the classic arguments. For example, a lot of the counterarguments involve an attempt to generalize from current ML practice to make claims about future AI systems. However, I usually imagine that the classic arguments are basically ignoring current ML, and instead claiming that if an AI system is superintelligent, then it must be goal-directed and have convergent instrumental subgoals. If current ML systems don’t lead to goal-directed behavior, I expect that proponents of the classic arguments would say that they also won’t lead to superintelligent AI systems. I’m not particularly sold on this intuition either, but I can see its appeal.
Fixed! Whoops.
What are the arguments that speeding up economic growth has a positive long run impact?
Movement building and investing to give later
Great post! I most agree that we should be clearer that things are still very, very uncertain. I think there are several factors that push against this:
The EA community and discourse doesn’t have any formal structure for propagating ideas, unlike academia. You are likely to hear about something if it’s already popular. Critical or new posts and ideas are unpopular by definition to begin with, so they fall by the wayside.
The story for impact for many existing EA organizations often relies on a somewhat narrow worldview. It does seem correct to me that we should both be trying to figure out the truth and taking bets on worlds where we have a lot of important things to do right now. But it’s easy to mentally conflate “taking an important bet” and “being confident that this is what the world looks like”, both from inside and outside an organization. I personally try to pursue a mixed strategy, where I take some actions assuming a particular worldview where I have a lot of leverage now, and some actions trying to get at the truth. But it’s kind of a weird mental state to hold, and I assume most EAs don’t have enough career flexibility to do this.
I do think that the closer you get to people doing direct work, the more people are skeptical and consider alternative views. I think the kind of deference you talk about in this post is much more common among people who are less involved with the movement.
That being said, it’s not great that the ideas that newcomers and people who aren’t in the innermost circles see are not the best representatives of the truth or of the amount of uncertainty involved. I’m interested in trying to think of ways to fix that—like I said, I think it’s hard because there are lots of different channels and no formal mechanism for what ideas “the movement” is exposed to. Without formal mechanisms, it seems hard to leave an equilibrium where a small number of reputable people or old but popular works of literature have disproportionate influence.
That being said, I really appreciate a lot of recent attempts by people to express uncertainty more publicly—see e.g. Ben’s podcast, Will’s talk, 80K’s recent posts, my talk and interviews. For better or for worse, it does seem like a small number of individuals have disproportionate influence over the discourse, and as such I think they do have some responsibility to convey uncertainty in a thoughtful way.
This seems right!
I think that instead of talking about potential failures in the way the EA community prioritized AI risk, it might be better to talk about something more concrete, e.g.
The views of the average EA
How much money was given to AI
How many EAs shifted their careers to be AI-focused as opposed to something else that deserved more EA attention
I think if we think there were mistakes in the concrete actions people have taken, e.g. mistaken funding decisions or mistaken career changes (I’m not sure that there were), we should look at the process that led to those decisions, and address that process directly.
Targeting ‘the views of the average EA’ seems pretty hard. I do think it might be important, because it has downstream effects on things like recruitment, external perception, funding, etc. But then I think we need to have a story for how we affect the views of the average EA (as Ben mentions). My guess is that we don’t have a story like that, and that’s a big part of ‘what went wrong’—the movement is growing in a chaotic way that no individual is responsible for, and that can lead to collectively bad epistemics.
‘Encouraging EAs to defer less’ and ‘expressing more public uncertainty’ could be part of the story for helping the average EA have better views. It also seems possible to me that we want some kind of centralized official source for presenting EA beliefs that keeps up to date the best case for and against certain views (though this obviously has its own issues). Then we can be more sure that people have come to their views after being exposed to alternatives, and we can have something concrete to point to when we worry that there hasn’t been enough criticism.
1. Oh man, I wish. :( I do think there are some people working on making a crisper case, and hopefully as machine learning systems get more powerful we might even see early demonstrations. I think the crispest statement of it I can make is “Similar to how humans now optimize for goals other than the genetic fitness evolution selects for, other systems which contain optimizers may start optimizing for goals other than the ones specified by the outer optimizer.”
Another related concept that I’ve seen (but haven’t followed up on) is what johnswentworth calls “Demons in Imperfect Search”, which basically argues for the possibility of runaway inner processes in a variety of imperfect search spaces (not just ones that have inner optimizers). This arguably happened with metabolic reactions early in the development of life, with greedy genes, and with managers in companies. Basically, I’m convinced that we don’t know enough about how powerful search mechanisms work to be sure that we’re going to end up somewhere we want.
I should also say that I think these kinds of arguments feel like the best current cases for AI alignment risk. Even if AI systems end up perfectly aligned with human goals, I’m still quite worried about what the balance of power looks like in a world with lots of extremely powerful AIs running around.
2. Yeah, here I should have said ‘new species more intelligent than us’. I think I was thinking of two things here:
Humans causing the extinction of less intelligent species
Some folk intuition around intelligent aliens plausibly causing human extinction (I admit this isn’t the best example...).
Mostly I meant here that since we don’t actually have examples of existentially risky technology (yet), putting AI in the reference class of ‘new technology’ might make you think it’s extremely implausible that it would be existentially bad. But we do have examples of species causing the extinction of lesser species (and scarier intuitions around it), so in the sense that AI is a new, more intelligent species, we should think there’s at least some chance that it could be existentially bad.
3. Obviously not the same thing, but ‘The Boss Baby: Back in Business’, a spin-off of the original, not starring Alec Baldwin, is available on Netflix right now. I’ve watched about 20 seconds of it and feel comfortable saying that the money would be better spent on AI safety and governance work.
This is awesome, you’re completely right and I’m totally updating my post with your model.
Seriously. This did an incredible job of crystallizing my own confusions.
Nit: Maybe you mean something stronger than transformative AI here? I don’t know if it makes sense to me that future explosive growth should tell us much about timelines for transformative AI as traditionally defined (as a transition comparable to the agricultural or industrial revolution). If we know that neither past transition caused explosive growth, it feels like we should think it’s quite plausible that transformative AI will have only a moderate impact on the growth rate.
I think everyone agrees that the industrial revolution led to an increase in the growth rate. I think ‘explosive’ growth as Roodman talks about it hasn’t happened yet, so I would avoid that term.
On the acceleration model, the periods from 1500-2000, 10kBC-1500, and “the beginning of history to 10kBC” are roughly equally important data (and if that hypothesis has higher prior I don’t think you can reject that framing). Changes within 10kBC-1500 are maybe 1/6 of the evidence, and 1/3 of the relevant evidence for comparing “continuous acceleration” to “3 exponentials.” I still think it’s great to dig into one of these periods, but I don’t think it’s misleading to present this period as only 1/3 of the data on a graph.
I’m going to try and restate what’s going on here, and I want someone to tell me if it sounds right:
If your prior is that growth rate increases happen on a timescale determined by the current growth rate, e.g. you’re likely to have a substantial increase once every N doublings of output, you care more about later years in history when you have more doublings of output. This is what Paul is advocating for.
If your prior is that growth rate increases happen randomly throughout history, e.g. you’re likely to have a substantial increase at an average rate of once every T years, all the years in history should have the same weight. This is what Ben has done in his regressions.
The more weight you start with on the former prior, the more strongly you should weight later time periods.
In particular: If you start with a lot of weight on the former prior, then T years of non-accelerating data at the beginning of your dataset won’t give you much evidence against it, because it won’t correspond to many doublings. But T years of non-accelerating data at the end of your dataset would correspond to many doublings, so would be more compelling evidence against.
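A toy calculation makes this asymmetry concrete (the growth rates below are made-up round numbers purely for illustration, not estimates from the actual historical dataset):

```python
import math

def doublings(growth_rate, years):
    """Number of output doublings over `years` at a constant
    exponential growth rate (per year)."""
    return growth_rate * years / math.log(2)

# Hypothetical round numbers: very slow ancient growth vs. modern growth.
early = doublings(growth_rate=0.0001, years=500)  # ~0.07 doublings
late = doublings(growth_rate=0.02, years=500)     # ~14.4 doublings

# Under the "once every N doublings" prior, 500 quiet years of modern
# data cover ~200x more doublings than 500 quiet years of ancient data,
# so they provide far stronger evidence against that prior.
print(early, late)
```

So the same number of calendar years carries very different evidential weight under the two priors, which is exactly the disagreement being restated above.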
Planned summary for the Alignment Newsletter:
In this blog post, Joseph Carlsmith gives a summary of his longer report estimating the number of floating point operations per second (FLOP/s) which would be sufficient to perform any cognitive task that the human brain can perform. He considers four different methods of estimation.
Using the mechanistic method, he estimates the FLOP/s required to model the brain’s low-level mechanisms at a level of detail adequate to replicate human task-performance. He does this by estimating that ~1e13 − 1e17 FLOP/s is enough to replicate what he calls “standard neuron signaling” — neurons signaling to each other via electrical impulses (at chemical synapses) — and learning in the brain, and arguing that including the brain’s other signaling processes would not meaningfully increase these numbers. He also suggests that various considerations point weakly to the adequacy of smaller budgets.
Using the functional method, he identifies a portion of the brain whose function we can approximate with computers, and then scales up to FLOP/s estimates for the entire brain. One way to do this is by scaling up models of the human retina: Hans Moravec’s estimates for the FLOP/s of the human retina imply 1e12 − 1e15 FLOP/s for the entire brain, while recent deep neural networks that predict retina cell firing patterns imply 1e16 − 1e20 FLOP/s.
Another way to use the functional method is to assume that current image classification networks with known FLOP/s requirements do some fraction of the computation of the human visual cortex, adjusting for the increase in FLOP/s necessary to reach robust human-level classification performance. Assuming somewhat arbitrarily that 0.3% to 10% of what the visual cortex does is image classification, and that the EfficientNet-B2 image classifier would require a 10x to 1000x increase in frequency to reach fully human-level image classification, he gets 1e13 − 3e17 implied FLOP/s to run the entire brain. Joseph holds the estimates from this method very lightly, though he thinks that they weakly suggest that the 1e13 − 1e17 FLOP/s estimates from the mechanistic method are not radically too low.
Using the limit method, Joseph uses the brain’s energy budget, together with physical limits set by Landauer’s principle, which specifies the minimum energy cost of erasing bits, to upper-bound required FLOP/s to ~7e21. He notes that this relies on arguments about how many bits the brain erases per FLOP, which he and various experts agree is very likely to be > 1 based on arguments about algorithmic bit erasures and the brain’s energy dissipation.
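The shape of that ~7e21 upper bound can be reproduced with a back-of-the-envelope calculation. Note the ~20 W brain power draw and ~310 K body temperature are my own assumed inputs (commonly cited figures), not numbers taken from the summary above:

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K
BODY_TEMP = 310.0           # K, roughly human body temperature (assumption)
BRAIN_POWER = 20.0          # W, a commonly cited brain power estimate (assumption)

# Landauer's principle: minimum energy to erase one bit at temperature T.
landauer_energy = K_BOLTZMANN * BODY_TEMP * math.log(2)  # ~3e-21 J per bit

# Upper bound on bit erasures per second allowed by the brain's energy budget.
max_erasures_per_s = BRAIN_POWER / landauer_energy  # ~7e21

# If each FLOP implies at least one bit erasure (as the report argues),
# this also upper-bounds the brain's FLOP/s.
print(f"{max_erasures_per_s:.1e}")
```

This is only a sanity check on the order of magnitude; the report itself handles the "bits erased per FLOP" question with much more care.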
Lastly, Joseph briefly describes the communication method, which uses the communication bandwidth in the brain as evidence about its computational capacity. Joseph thinks this method faces a number of issues, but some extremely preliminary estimates suggest 1e14 FLOP/s based on comparing the brain to a V100 GPU, and 1e16 − 3e17 FLOP/s based on estimating the communication capabilities of brains in traversed edges per second (TEPS), a metric normally used for computers, and then converting to FLOP/s using the TEPS to FLOP/s ratio in supercomputers.
Overall, Joseph thinks it is more likely than not that 1e15 FLOP/s is enough to perform tasks as well as the human brain (given the right software, which may be very hard to create). And he thinks it’s unlikely (<10%) that more than 1e21 FLOP/s is required. For reference, an NVIDIA V100 GPU performs up to 1e14 FLOP/s (although FLOP/s is not the only metric which differentiates two computational systems).
Planned opinion:
I really like this post, although I haven’t yet gotten through the full-length report. I found the reasoning extremely legible and transparent, and there’s no place where I disagree with Joseph’s estimates or conclusions. See also [Import AI’s summary](https://jack-clark.net/2020/09/14/import-ai-214-nvidias-40bn-arm-deal-a-new-57-subject-nlp-test-ai-for-plant-disease-detection/).
Random thought: I think it would be kind of cool if there were EA forum prizes for people publicly changing their minds in response to comments/ feedback.
Really good question!
We currently have ~$315K in the fund balance.* My personal median guess is that we could use $2M over the next year while maintaining this year’s bar for funding. This would be:
$1.7M more than our current balance
$500K more per year than we’ve spent in previous years
$800K more than the total amount of donations received in 2020 so far
$400K more than a naive guess for what the total amount of donations received will be in all of 2020. (That is, if we wanted a year of donations to pay for a year of funding, we would need $400K more in donations next year than what we got this year.)
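These bullet points can be sanity-checked with some quick arithmetic (all figures in thousands of dollars, rounded as in the text above):

```python
# All figures in $K, taken from the figures stated above.
balance = 315          # current fund balance
target = 2000          # median guess for spending over the next year
prior_annual = 1500    # roughly what the fund has spent per year previously

gap_vs_balance = target - balance        # 1685, i.e. ~$1.7M more than balance
gap_vs_spending = target - prior_annual  # 500, i.e. $500K more per year

# Donation figures implied by the stated gaps:
donations_2020_so_far = target - 800     # 1200, since target is $800K more
naive_full_2020_guess = target - 400     # 1600, since target is $400K more

print(gap_vs_balance, gap_vs_spending, donations_2020_so_far, naive_full_2020_guess)
```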
Reasoning below:
Generally, we fund anything above a certain bar, without accounting explicitly for the amount of money we have. According to this policy, for the last two years, the fund has given out ~$1.5M per year, or ~$500K per grant round, and has not accumulated a significant buffer.
This round had an unusually large number of high-quality applicants. We spent $500K, but we pushed two large grant decisions to our next payout round, and several of our applicants happened to receive money from another source just before we communicated our funding decision. This makes me think that if this increase in high-quality applicants persists, it would be reasonable to have $600K - $700K per grant round, for a total of ~$2M over the next year.
My personal guess is that the increase in high-quality applications will persist, and I’m somewhat hopeful that we will get even more high-quality applications, via a combination of outreach and potentially some active grantmaking. This makes me think that $2M over the next year would be reasonable for not going below the ROI on the last marginal dollar of the grants we made this year, though I’m not certain. (Of the two other fund managers who have made quantitative guesses on this so far, one fund manager also had $2M as their median guess, while another thought slightly above $1.5M was more likely.)
I also think there’s a reasonable case for having slightly more than our median guess available in the fund. This would both act as a buffer in case we end up with more grants above our current bar than expected, and would let us proactively encourage potential grantees to apply for funding without being worried that we’ll run out of money.
If we got much more money than applications that meet our current bar, we would let donors know. I think we would also consider lowering our bar for funding, though this would only happen after checking in with the largest donors.
* This is less than the amount displayed on our fund page, which is still being updated with our latest payouts.
This is a really good post! I have some bold, unsubstantiated claims that I’d be curious to hear people’s thoughts on. Source: I’ve done some small amount of security-related work / coursework, hung around a lot of infosec-ey type people in college, and tried to hire a security officer once.
I’ve noticed some hard to articulate but consistent seeming differences in personality / mindset in people I know who work in security. I think it’s plausible that it’s much harder to become “good” at infosec through pursuing an infosec career path than to become “good” at machine learning by pursuing a machine learning career path. I think this may be especially true the broader you go, e.g. you might be able to become “good” at securing web browsers, but will have trouble transferring general infosec insights to broader problems like biosecurity.
As a result, I think it might be worth EA effort getting people who are already fairly far in the infosec field to be more concerned about GCRs. (Though I think getting people to try infosec careers is also worth it.)
Related to this, many people I know in infosec think EA concerns about GCRs are wrong for a variety of reasons, even though a lot of them have x-risk-style thoughts about how e.g. surveillance could lead to a totalitarian state with a lot of lock-in. I think this might be an interesting viewpoint difference to look into.