Software engineer interested in AI safety.
Stephen McAleese
At OpenAI, I’m pretty sure there are far more people working on near-term problems than long-term risks. Though the Superalignment team now has over 20 people from what I’ve heard.
Thanks for the post. It was an interesting read.
According to The Case For Strong Longtermism, 10^36 people could ultimately inhabit the Milky Way. Under this assumption, one micro-doom is equal to 10^30 expected lives.
If a 50th-percentile AI safety researcher reduces x-risk by 31 micro-dooms, they could save about 10^31 expected lives during their career or about 10^29 expected lives per year of research. If the value of their research is spread out evenly across their entire career, then each second of AI safety research could be worth about 10^22 expected future lives, which is a very high number.
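To spell out the arithmetic (the 40-year career length is my own assumption for illustration):

$$
\begin{aligned}
10^{36} \times 10^{-6} &= 10^{30} && \text{expected lives per micro-doom} \\
31 \times 10^{30} &\approx 10^{31} && \text{expected lives per career (order of magnitude)} \\
10^{31} / 40\ \text{years} &\approx 2.5 \times 10^{29} && \text{expected lives per year} \\
2.5 \times 10^{29} / (3.15 \times 10^{7}\ \text{s/year}) &\approx 10^{22} && \text{expected lives per second}
\end{aligned}
$$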
These numbers sound impressive but I see several limitations of these kinds of naive calculations. I’ll use the three-part framework from What We Owe the Future to explain them:
Significance: the value of research tends to follow a long-tailed curve where most papers get very few citations and a few get an enormous number. Therefore, most research probably has low value.
Contingency: the value of some research is decreased if it would have been created anyway at some later point in time.
Longevity: it’s hard to produce research that has a lasting impact on a field or the long-term trajectory of humanity. Most research probably has a sharp drop off in impact after it is published.
After taking these factors into account, I think the value of any given AI safety research is probably much lower than naive calculations suggest. Therefore, I think grant evaluators should take into account their intuitions on what kinds of research are most valuable rather than relying on expected value calculations.
Thanks for pointing this out. I didn’t know there was a way to calculate the exponentially moving average (EMA) using NumPy.
Previously I was using alpha = 0.33 for weighting the current value. When that value is plugged into the formula alpha = 2 / (N + 1), it means I was averaging over the past 5 years.
I’ve now decided to average over the past 4 years so the new alpha value is 0.4.
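For reference, here's a minimal sketch of the recursive EMA I had in mind (the `ema` helper and the yearly values are made up for illustration):

```python
import numpy as np

def ema(values, alpha):
    """Exponentially weighted moving average with smoothing factor alpha."""
    out = np.empty(len(values), dtype=float)
    out[0] = values[0]
    for i in range(1, len(values)):
        # Weight the current value by alpha and the previous average by (1 - alpha).
        out[i] = alpha * values[i] + (1 - alpha) * out[i - 1]
    return out

# alpha = 2 / (N + 1), so averaging over the past N = 4 years gives alpha = 0.4
yearly_counts = np.array([10, 12, 15, 20, 28, 35], dtype=float)
print(ema(yearly_counts, alpha=0.4))
```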
I recommend this web page for a narrative on what’s happening in our world in the 21st century. It covers many themes such as the rise of the internet, the financial crisis, Covid, global warming, AI, and demographic decline.
Thanks for the post. Until now, I learned about what the LTFF funds by manually reading through its grants database. It’s helpful to know what the funding bar looks like and how it would change with additional funding.
I think increased transparency is helpful because it’s valuable for people to have some idea of how likely their applications are to be funded if they’re thinking of making major life decisions (e.g. relocating) based on them. More transparency is also valuable for funders who want to know how their money would be used.
According to Price’s Law, the square root of the number of contributors contributes half of the progress. If there are 400 people working on AI safety full-time then it’s quite possible that just 20 highly productive researchers are making half the contributions to AI safety research. I expect this power law to apply to both the quantity and the quality of research.
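In symbols, Price's Law says that roughly $\sqrt{N}$ of $N$ contributors account for half the output, so:

$$
\sqrt{400} = 20
$$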
I’m surprised that GPT-4 can’t play tic tac toe given that there’s evidence that it can play chess pretty well (though it eventually makes illegal moves).
Thanks for spotting that. I updated the post.
I like the AI Alignment Wikipedia page because it provides an overview of the field that’s well-written, informative, and comprehensive.
Excellent story! I believe there’s strong demand for scenarios explaining how current AI systems could go on to have a catastrophic effect on the world and the story you described sounds very plausible.
I like how the story combines several key AI safety concepts such as instrumental convergence and deceptive alignment with a description of the internal dynamics of the company and its interaction with the outside world.
AI risk has been criticized as implausible given the current state of AI (e.g. chatbots) but your realistic story describes how AI in its present form could eventually cause a catastrophe if it’s not developed safely.
Thanks for writing the post.
I know the sequence is about criticisms of labs, but I think I would personally get more value if the post focused mainly on describing what the lab is doing, with less emphasis on evaluating the organization, because readers can form their own opinions given an informative description. To use more technical language, I would be more interested in a descriptive post than a normative one.
My high-level opinion is that the post is somewhat more negative than I would like. My general sentiment on Conjecture is that it’s one of the few AI safety labs established outside of the Bay Area and the US.
As a result, Conjecture seems to have significantly boosted London as an AI safety hub, which is extremely valuable because London is much more accessible than the Bay Area for Europeans interested in AI safety.
I think only one person can do this every year because any other 0-word post would be a duplicate.
Great post. What I find most surprising is how small the scalable alignment team at OpenAI is. Though similar teams at DeepMind and Anthropic are probably bigger.
I added them to the list of technical research organizations. Sorry for the delay.
Inspiring progress! This post is a positive update for me.
Good point. It’s important to note that black swans are subjective and depend on the person. For example, a Christmas turkey’s slaughter is a black swan for it but not for its butcher.
I disagree because I think these kinds of post hoc explanations are invalidated by the hindsight fallacy. I think the FTX crash was a typical black swan because it seems much more foreseeable in retrospect than it was before the event.
To use another example, the 2008 financial crisis made sense in retrospect, but the movie The Big Short shows that, before the event, even the characters shorting the mortgage bonds had strong doubts about whether they were right, and most other people were completely oblivious.
Although the FTX crisis makes sense in retrospect, I have to admit that I had absolutely no idea that it was about to happen before the event.
Thanks! I used that format because it was easy for me to write. I’m glad to see that it improves the reading experience too.
I really like this post and I think it’s now my favorite post so far on the recent collapse of FTX.
Many recent posts on this subject have focused on topics such as Sam Bankman-Fried’s character, what happened at FTX, and how it reflects on EA as a whole.
While these are interesting subjects, I got the impression that a lot of the posts were too backward-looking and not constructive enough.
I was looking for a post that was more reflective and less sensational, one focused on what we can learn from the experience and how to adjust the strategy of EA going forward, and I think this post meets these criteria better than most of the previous posts.
The Superalignment team currently has about 20 people according to Jan Leike. Previously I think the scalable alignment team was much smaller and probably only 5-10 people.