AI safety researcher
Thomas Kwa
2-year update on infant outreach
To our knowledge, there have been no significant infant outreach efforts in the past two years. We are deeply saddened by this development, because by now there could have been two full generations of babies, including community builders who would go on to attract even more talent. However, one silver lining is that no large-scale financial fraud has been committed by EA infants.
We think the importance of infant outreach is higher than ever, and still largely endorse this post. However, given FTX events, there are a few changes we would make, including a decreased focus on galactic-scale ambition and especially some way to select against sociopathic and risk-seeking infants. We tentatively propose that future programs favor infants who share their toys, are wary of infants who take others’ toys without giving them back, and never support infants who, when playing with blocks, try to construct tall towers that have high risk of collapse.
This post is important and I agree with almost everything it says, but I do want to nitpick one crucial sentence:
There may well come a day when humanity would tear apart a thousand suns in order to prevent a single untimely death.
I think it is unlikely that we should ever pay the price of a thousand suns to prevent one death, because tradeoffs will always exist. The same resources used to prevent that death could support trillions upon trillions of sentient beings at utopic living standards for billions of years, either biologically or in simulation. The only circumstances where I think such a decision would be acceptable are things like:
The “person” we’re trying to save is actually a single astronomically vast hivemind/AI/etc that runs on a star-sized computer and is worth that many resources.
Our moral views at the time dictate that preventing one death now is at least fifteen orders of magnitude worse than extending another being’s life by a billion years.
The action is symbolic, like how in The Martian billions of dollars were spent to save Mark Watney, rather than driven by cause prioritization.
Otherwise, we are always in triage and always will be, and while prices may fluctuate, we will never be rich enough to get everything we want.
My study of the monkeys and infants, i.e. my analysis of past wars, suggested an annual extinction risk from wars of 6.36*10^-14, which is still 1.07 % (= 5.93*10^-12/(5.53*10^-10)) of my best guess.
The fact that one model of one process gives a low number doesn’t mean the true number is within a couple orders of magnitude of that. Modeling mortgage-backed security risk in 2007 using a Gaussian copula gives an astronomically low estimate of something like 10^-200, even though they did in fact default and cause the financial crisis. If the bankers adjusted their estimate upward to 10^-198 it would still be wrong.
IMO it is not really surprising for very near 100% of the risk of something to come from unmodeled risks, if the modeled risk is extremely low. Like say I write some code to generate random digits, and the first 200 outputs are zeros. One might estimate this at 10^-200 probability or adjust upwards to 10^-198, but the probability of this happening is way more than 10^-200 due to bugs.
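A rough way to write out that decomposition (my notation; “bug” here means any bug that makes the generator output all zeros):
P(200 zeros) = P(bug) * P(200 zeros | bug) + P(no bug) * 10^-200 ≈ P(bug),
since for any realistic codebase P(bug) is vastly larger than 10^-200, so nearly all of the probability comes from the unmodeled term.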
Don’t have time to reply in depth, but here are some thoughts:
If a risk estimate is used for EA cause prio, it should be our betting odds / subjective probabilities, that is, an average over our epistemic uncertainty. If from our point of view a risk is 10% likely to be >0.001%, and 90% likely to be ~0%, this lower bounds our betting odds at 0.0001% (arithmetic spelled out after these points). It doesn’t matter that it’s more likely to be ~0%.
Statistics of human height are much better understood than those of nuclear war, because we have billions of humans but no nuclear wars. The situation is more analogous to estimating the probability of a 10-meter-tall adult human when you have only ever observed a few thousand monkeys (conventional wars) plus one human infant (WWII), and you also know that every few individuals, humans mutate into an entirely new species (technological progress).
It would be difficult to create a model suggesting a much higher risk because most of the risk comes from black swan events. Maybe one could upper bound the probability by considering huge numbers of possible mechanisms for extinction and ruling them out, but I don’t see how you could get anywhere near 10^-12.
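Spelling out the lower bound from the first point above (my arithmetic, using the numbers given there): the expected probability is at least 0.10 * 0.001% + 0.90 * ~0% = 0.0001% = 10^-6. The ~0% branch contributes essentially nothing, so the bound comes entirely from the 10% branch.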
Any probability as low as 5.93*10^-12 about something as difficult to model as the effects of nuclear war on human society seems extremely overconfident to me. Can you really make 1/(5.93*10^-12) (about 170 billion) predictions about independent topics and expect to be wrong only once? Are you 99.99% [edit: fixed this number] sure that there is no unmodeled set of conditions under which civilizational collapse occurs quickly, and which a nuclear war is at least 0.001% likely to cause? I think the minimum probability one should assign, given these considerations, is not much lower than the superforecasters’ numbers.
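To spell out the arithmetic implicit in that last question (my calculation from the numbers above): even a 0.01% chance that such an unmodeled mechanism exists, times a 0.001% chance that a nuclear war triggers it, is 10^-4 * 10^-5 = 10^-9, over two orders of magnitude above 5.93*10^-12.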
There was likely no FTX polycule (a Manifold question resolved 15%) but I was aware that the FTX and Alameda CEOs were dating. I had gone to a couple of FTX events but try to avoid gossip, so my guess is that half of the well-connected EAs had heard gossip about this.
Being attention-getting and obnoxious probably paid off with slavery because abolition was tractable. But animal advocacy is different. I think a big question is whether he was being strategic or just obnoxious by nature. If we put Benjamin Lay in 2000, would he start cage-free campaigns or become PETA? Or perhaps find some angle we’re overlooking?
My comment is not an ad hominem. An ad hominem attack would be if someone is arguing point X and you distract from X by attacking their character. I was questioning only Remmelt’s ability to distinguish good research from crankery, which is directly relevant to the job of an AISC organizer, especially because some AISC streams are about the work in question by Forrest Landry. I apologize if I was unintentionally making some broader character attack. Whether it’s obnoxious is up to you to judge.
Crossposted from LessWrong.
Maybe I’m being cynical, but I’d give >30% that funders have declined to fund AI Safety Camp in its current form for some good reason. Has anyone written the case against? I know from talking to various colleagues that AISC used to be good, but I have no particular reason to believe in its current quality.
MATS has steadily increased in quality over the past two years, and is now more prestigious than AISC. We also have Astra, and people who go directly to residencies at OpenAI, Anthropic, etc. One should expect that AISC doesn’t attract the best talent.
If so, AISC might not make efficient use of mentor / PI time, which is a key goal of MATS and one of the reasons it’s been successful.
Why does the founder, Remmelt Ellen, keep linkposting writing by Forrest Landry which I’m 90% sure is obvious crankery? It’s not just my opinion; Paul Christiano said “the entire scientific community would probably consider this writing to be crankery”, one post was so obviously flawed it got −46 karma, and generally the community response has been extremely negative. Some AISC work is directly about the content in question. This seems like a concern especially given the philosophical/conceptual focus of AISC projects, and the historical difficulty of choosing useful AI alignment directions without empirical grounding. [Edit: To clarify, this is not meant to be a character attack. I am concerned that Remmelt does not have the skill of distinguishing crankery from good research, even if he has substantially contributed to AISC’s success in the past.]
All but 2 of the papers listed on Manifund as coming from AISC projects are from 2021 or earlier. Because I’m interested in the current quality in the presence of competing programs, I looked at the two from 2022 or later: this in a second-tier journal and this in a NeurIPS workshop, with no top-conference papers. I count 52 participants in the last AISC, so this seems like a pretty poor rate, especially given that the 2022 and 2023 cohorts (#7 and #8) could both have published by now. (Though see this reply from Linda on why most of AISC’s impact is from upskilling.)
The impact assessment was commissioned by AISC, not independent. They also use the number of AI alignment researchers created as an important metric, but impact is heavy-tailed, so the better metric is the value of the total research produced. Because there seems to be little direct research, to estimate the impact we should count the research that AISC alums from the last two years go on to produce. Unfortunately I don’t have time to do this.
Surely as their gold standard “career change” pin-up story, they could find a higher EV career change.
You’re assuming that the EV of switching from global health to biosecurity is lower than the EV of switching from something else to biosecurity. Even though global health is better than most cause areas, this could be false in practice for at least two reasons:
If the impact of biosecurity careers is many times higher than the impact of global health, and people currently in global health are slightly more talented, altruistic, or hardworking.
If people currently in global health are not doing the most effective global health interventions.
This article just made HN. It’s a report saying that 39 of 50 top offsetting programs are likely junk, 8 “look problematic”, and 3 lack sufficient information; none were found to be good.
I think most climate people are very suspicious of charities like this, rather than or in addition to not believing in ethical offsetting. See this Wendover Productions video on problematic, non-counterfactual, and outright fraudulent climate offsets. I myself am not confident that CATF offsets are good and would need to do a bunch of investigation, and most people are not willing to do this starting from, say, an 80% prior that CATF offsets are bad.
Upvoted. I don’t agree with all of these takes but they seem valuable and underappreciated.
But with no evidence, just your guesses. IMO we should wait until things shake out, and even then the evidence will require lots of careful interpretation. Also, EA is 2/3 male, which means that even if women are involved in only a minority of scandals, the harm they cause could still be proportionate to their share of the community.
I’m looking to join AI safety projects with people who have some amount of experience. I have 3/4 of a CS degree from Caltech, one year at MIRI, and have finished the WMLB and ARENA bootcamps. I’m most excited about activation engineering, but willing to do anything that builds research and engineering skill.
If you’ve published 2 papers in top ML conferences or have a PhD in something CS related, and are interested in working with me, send me a DM.
Upvoted for making an actual calculation with reasonable numbers.
Is there any evidence for this claim? One can speculate about how average personality gender differences would affect p(scandal), but you’ve just cited two cases where women caused huge harms, which seems to argue neutrally or against you.
Who tends to be clean?
With all the scandals in the last year or two, has anyone looked at which recruitment sources are least likely to produce someone extremely net negative in direct impact or to the community (i.e. a justified scandal)? Maybe this should inform outreach efforts.
In addition to everything mentioned so far, there’s the information and retributive justice effect of the public exposé, which can be positive. As long as it doesn’t devolve into a witch hunt, we want to discourage people from using EA resources and trust in the ways Nonlinear did, and this only works if it’s public. If this isn’t big enough, think about the possibility of preventing FTX. (I don’t know if the actual fraud was preventable, but negative aspects of SBF’s character and the lack of separation between FTX and Alameda could have been well substantiated and made public. Just the reputation of EAs doing due diligence here could have prevented a lot of harm.)
I think someone should do an investigation much wider in scope than what happened at FTX, covering the entire causal chain from SBF first talking to EAs at MIT to the damage done to EA. Here are some questions I’m particularly curious about:
Did SBF show signs of dishonesty early on at MIT? If so, why did he not have a negative reputation among the EAs there?
To what extent did EA “create SBF”—influence the values of SBF and others at FTX? Could a version of EA that placed more emphasis on integrity, diminishing returns to altruistic donations, or something else have prevented FTX?
Alameda was started by various traders from Jane Street, especially EAs. Did they do this despite concerns about how the company would be run, and were they correct to leave at the time?
[edited to add] I have heard that Tara Mac Aulay and others left Alameda in 2018. Mac Aulay claims this was “in part due to concerns over risk management and business ethics”. Do they get a bunch of points for this? Why did this warning not spread, and can we even spread such warnings without overloading the community with gossip even more than it is?
Were Alameda/FTX ever highly profitable controlling for the price of crypto? (edit: ruling out that FTX’s market share was due to artificially tight spreads enabled by money-losing trades from Alameda). How should we update on the overall competence of companies with lots of EAs?
SBF believed in linear returns to altruistic donations (I think he said this on the 80k podcast), unlike most EAs. Did this cause him to take on undue risk, or would fraud have happened if FTX had a view on altruistic returns similar to that of OP or SFF but linear moral views?
What is the cause of the exceptionally poor media perception of EA after FTX? When I search for “effective altruism news”, around 90% of the articles I could find were negative and none were positive, including many with extremely negative opinions unrelated to FTX. One would expect at least some article saying “Here’s why donating to effective causes is still good”. (In no way do I want to diminish the harms done to customers whose money was gambled away, but it seems prudent to investigate the harms to EA per se.)
My guess is that this hasn’t been done simply because it’s a lot of work (perhaps 100 interviews and one person-year of work), no one thinks it’s their job, and conducting such an investigation would somewhat entail someone both speaking for the entire EA movement and criticizing powerful people and organizations.
See also: Ryan Carey’s comment