AI safety researcher
Thomas Kwa
I’m worried that trying to estimate these parameters by looking at wages is subject to lots of noise from violated assumptions, which could result in the large discrepancy you see between the two estimates.
One worry: I would guess that Anthropic could derive more output from extra researchers (1.5x/doubling?) than from extra GPUs (1.18x/doubling?), yet it spends more on compute than on researchers. In particular I’d guess alpha/beta = 2.5, while the ratio of researcher spending to compute spending is around 0.28 (maybe you have better data here). Under Cobb-Douglas and perfect competition these should be equal, but they’re off by a factor of 9! I’m not totally sure, but I think this would give you strange parameter values under CES as well. This huge gap between output elasticities and where firms are actually spending their money is strange to me, so I strongly suspect that one of the assumptions is broken, rather than the true value just being something extreme like −0.10 or 2.58 with large firm fixed effects.
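To spell out the arithmetic (a rough sketch using my guessed numbers above; I'm converting the per-doubling multipliers to Cobb-Douglas output elasticities as log2 of the multiplier, which is my own assumption):

```python
import math

# Implied Cobb-Douglas output elasticities from "m times more output per doubling":
# if output multiplies by m when an input doubles, the elasticity is log2(m).
alpha = math.log2(1.5)    # researchers: ~0.585
beta = math.log2(1.18)    # GPUs/compute: ~0.239

# Under Y = A * L^alpha * K^beta with competitive factor markets,
# the spending ratio (wages on researchers) / (spending on compute) should equal alpha/beta.
predicted_spending_ratio = alpha / beta   # ~2.45
observed_spending_ratio = 0.28            # my rough guess for researcher vs. compute spending

print(f"alpha/beta = {predicted_spending_ratio:.2f}")
print(f"discrepancy factor = {predicted_spending_ratio / observed_spending_ratio:.1f}")  # ~9x
```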
My guess at why: The AI industry is very different than it was in 2012 so it is plausible these firm fixed effects have actually greatly changed over time, which would affect the regression coefficients. Just some examples of possible changes over time:
Some of the compute growth over time is due to partnerships, e.g. Amazon’s $4 billion investment in Anthropic. Maybe these don’t reflect buying compute at market prices.
The value of equity now is hundreds of times higher than in 2017, and employees are compensated mainly in equity.
The cost of hiring might be primarily in onboarding / management capacity rather than wages; this would hit OpenAI and Anthropic harder since 2021, as they’ve both grown furiously in that period.
The whole industry is much larger now and the elasticity of substitution might not be constant; if so, this is worrying, because to predict whether there’s a software-only singularity we’ll need to extrapolate over more orders of magnitude of growth and over the human labor → AI labor transition.
Companies could be investing in compute because it increases their revenue (since ChatGPT), or stockpiling compute so they can take advantage of research automation later.
Nevertheless I’m excited about the prospect of estimating these parameters, and I’m glad this was posted. Are you planning follow-up work, or is there other economic data we could theoretically collect that could give us higher-confidence estimates?
(edited to fix numbers, I forgot 2 boxes means +3dB)
dB is logarithmic so a proportional reduction in sound energy will mean subtracting an absolute number of dB, not a percentage reduction in dB.
HouseFresh tested the AirFanta 3Pro https://housefresh.com/airfanta-3pro-review/ at different voltage levels and found:
12.6 V: 56.3 dBA, 14 minutes
6.54 V: 43.3 dBA, 28 minutes
So basically you subtract 13 dB when halving the CADR. I now realize that if you have two boxes, the sound energy will double (+3dB) and so you’ll actually only get −10 dB from running two at half speed. So a more accurate statement for the Airfanta would be that for −15dB noise at the same CADR, you need something like 2.8 purifiers running at 36% speed. It’s still definitely possible to markedly lower noise by adding more filter area.
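For reference, the arithmetic behind those numbers (a rough sketch: I'm treating the HouseFresh measurements as "13 dB per halving of CADR" and adding 10*log10(N) for N sound sources, ignoring any other fan effects):

```python
import math

DB_PER_HALVING = 13.0   # from the HouseFresh data: 56.3 dBA -> 43.3 dBA when CADR halves

def noise_change_db(n_units: float) -> float:
    """Total noise change vs. one unit at full speed, holding total CADR fixed.

    Each of n units runs at 1/n of full CADR, cutting its own noise by
    DB_PER_HALVING per halving; running n sources at once adds 10*log10(n).
    """
    per_unit = -DB_PER_HALVING * math.log2(n_units)
    return per_unit + 10 * math.log10(n_units)

print(f"{noise_change_db(2):.1f} dB")   # ~ -10 dB: two units at half speed

# Units needed for -15 dB at the same CADR:
# solve -15 = -(DB_PER_HALVING - 10*log10(2)) * log2(n)
n = 2 ** (15 / (DB_PER_HALVING - 10 * math.log10(2)))
print(f"{n:.1f} units at {100 / n:.0f}% speed each")   # ~2.8 units at ~35% speed
```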
Your box fan CR box data tell a similar story. If logarithmic scaling is accurate, the sound reduction for halving CADR would be ln(1/2)/ln(165/239)*(8 dB) = 15 dB, or 12 dB for maintaining CADR with double the units. It just doesn’t have a speed low enough to get these low noise levels (and due to the box fan’s low static pressure you might need to add more filters per fan at low speeds).
Airfanta’s absolute noise levels are high for a CR box-type design, but this is a device that retails for 298 CNY = $41 USD in China, runs at high speed, and uses near-HEPA (95%) rather than MERV filters, so this is to be expected.
FWIW I predict they will be a constant factor harder but improve at similar rates. Any particular benchmarks you think I should look at?
I broadly agree with section 1, and in fact since we published I’ve been looking into how time horizon varies between domains. Not only is there lots of variance in time horizon, the rate of increase also varies significantly.
See a preliminary graph plus further observations on LessWrong shortform.
You can get down to 25 dB by running two at half speed. Fan noise is proportional to RPM^5, so 50% speed will mean −15 dB noise (10*log10(0.5^5) ≈ −15). The fans just need enough static pressure to maintain close to 50% airflow at 50% speed.
Usage varies—the top five posts on /r/crboxes all use PC fans. Other guides do too, and CleanAirKits and Nukit both describe themselves as PC fan CR boxes.
That’s a box fan CR box; the better design (and the one linked) uses PC fans, which are better optimized for noise. I don’t have much first-hand experience with this, but physics suggests that noise from the fan will be roughly proportional to power usage (pressure * airflow, assuming constant efficiency), and this is roughly consistent with various tests I’ve found online.
Both further upsizing and better sound isolation would be great. What’s the best way to reduce duct noise in practice? Is an 8″ flexible duct quieter than a 6″ rigid duct, or will most of the noise improvement come from oversizing the termination, removing tight bends, or installing some kind of silencer device? I might suggest this to a relative.
Isn’t particulate what we care about? The purpose of the filters is to get particulate out of the air, and the controlled experiment Jeff did basically measures that. If air mixing is the concern, ceiling fans can mix air far more than required, and you can just measure particulate in several locations anyway.
A pair of CR boxes can also get 350 CFM CADR at the same noise level for less materials cost than either this or the ceiling fan, and also have much less installation cost. E.g. two of this CleanAirKits model on half speed would probably cost <$250 if it were mass-produced. This is the setup in my group house living room and it works great! DIY CR boxes can get to $250/350 CFM right now.
The key is having enough filter area to make the static pressure and thus power and noise minimal—the scaling works out such that every doubling of filter area at a given CADR decreases noise by 4.5 dB, assuming noise is proportional to power and pressure goes as (face velocity)^1.5, which are common rules of thumb. I’d guess that the pair of CR boxes has 5x more filter area, so an 11dB advantage for the closet sound isolation to make up. MERV filters also get slightly higher efficiency when the face velocity is slower.
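For anyone who wants to check that scaling, here is a minimal sketch of where the 4.5 dB per doubling comes from, using only the two rules of thumb above (noise proportional to fan power, pressure going as face velocity^1.5):

```python
import math

def noise_change_db(area_multiple: float) -> float:
    """Noise change at fixed CADR when filter area is multiplied by area_multiple.

    Assumes face velocity ~ 1/area, static pressure ~ velocity^1.5, and
    sound power ~ fan power = pressure * airflow (airflow fixed at the target CADR).
    """
    power_multiple = area_multiple ** -1.5   # pressure falls as velocity^1.5, airflow unchanged
    return 10 * math.log10(power_multiple)

print(f"{noise_change_db(2):.1f} dB")   # ~ -4.5 dB per doubling of filter area
print(f"{noise_change_db(5):.1f} dB")   # ~ -10.5 dB for ~5x the area, roughly the figure above
```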
I have used inline fans for other purposes and even the air passing through a 6″ duct generates some noise and adds static pressure. With a CR box you’re doing the minimal work necessary to filter air.
Standard HVAC parts do have many advantages though. The aesthetics are unmatched and all parts are likely to be available, and they’re very durable.
I’m a big fan of this. Imagine if this becomes the primary way billionaires are ranked on prestige.
Efficiency can decrease too, especially when there are lots of very small particles like smoke. See this reddit thread: https://www.reddit.com/r/crboxes/comments/1fznar2/comment/lr2j404/.
My understanding is that the small particles can basically cover the surface area of the fibers and block their electric field. Here’s an image from one of the linked studies showing filters that are (a) clean, (b) after one test, and (c) after absorbing 2 grams/m^2 of smoke, at which point efficiency dropped from 92% to 33%.
There are at least three common justifications for not donating, each of which can be quite reasonable:
1. A high standard of living and saving up money are important selfish wants for EAs in AI, just as they are in broader society.
2. EAs in AI have needs (either career or personal) that require lots of money.
3. Donations are much lower impact than one’s career.
I don’t donate to charity other than animal product offsets; this is mainly due to 1 and 2. As for 1, I’m still early career enough that immediate financial stability is a concern. Also for me, forgoing luxuries like restaurant food and travel makes me demotivated enough that I have difficulty working. I have tried to solve this in the past but have basically given up and now treat these luxuries as partially needs rather than wants.
For people just above the top-1% threshold of $65,000, 3 and 2 are very likely. $65,000 is roughly the rate paid to marginal AI safety researchers, so donating 20% of it funds at most a fifth of a marginal researcher, bringing only ~20% of someone’s own career impact even if the grantmakers find an opportunity as good as them. If they also live in a HCOL area, 2 is very likely: in San Francisco the average rent for a one-bedroom is $2,962/month, and an individual making less than $104,000 qualifies for public housing assistance!
But shouldn’t I have more dedication to the cause and donate anyway? I would prefer to instead spend more effort on getting better at my job (since I’m nowhere near the extremely high skillcap of AI safety research) and working more hours (possibly in ways that funge with donations eg by helping out grantmakers). I actually do care about saving for retirement, and finding a higher-paying job at a lab safety team just so I can donate is probably counterproductive, because trying to split one’s effort between two theories of change while compromising on both is generally bad (see the multipliers post). If I happened to get an equally impactful job that paid double, I would probably start donating after about a year, or sooner if donations were urgent and I expected high job security.
If you’re not yet ready to commit to the 💸11% Pledge, consider taking the 🥤Trial Pledge, which obligates you to spend 5.5% of your income on increasing your productivity but offsets the cost by replacing all your food with Huel.
Did you assume the axiom of choice? That’s a reasonable modeling decision—our estimate used an uninformative prior over whether it’s true, false, or meaningless.
Introducing The Spending What We Must Pledge
It was mentioned at the Constellation office that maybe animal welfare people who are predisposed to this kind of weird intervention are working on AI safety instead. I think this is >10% correct but a bit cynical; the WAW people are clearly not afraid of ideas like giving rodents contraceptives and vaccines. My guess is animal welfare is poorly understood and there are various practical problems like preventing animals that don’t feel pain from accidentally injuring themselves constantly. Not that this means we shouldn’t be trying.
The majority of online articles about effective altruism have always been negative (it used to be 80%+). In the past, EAs were coached not to talk to journalists; perhaps the fact that people are finally reversing this is why things are getting better, so I appreciate anyone who does talk to them.
Of course there is FTX, but that doesn’t explain everything: many recent articles, including this one, are mostly not about FTX. At the risk of being obvious, for an intelligent journalist (as many are) to write a bad critique despite talking to thoughtful people, it has to be that a negative portrayal of EA serves their agenda far better than a neutral or positive one. Maybe that agenda is advocating for particular causes, a progressive politics that unfortunately aligns with Torres’ personal vendetta, or just a deep belief that charity cannot or should not be quantified or optimized. In these cases maybe there is nothing we can do except promote the ideas of beneficentrism, triage, and scope sensitivity, continue talking to journalists, and fix both the genuine problems and the perceived problems created by FTX, until bad critiques are no longer popular enough to succeed.
The Pulse survey has now basically allayed all of my concerns.
Thanks, I’ve started donating $33/month to the FarmKind bonus fund, which is double the calculator estimate for my diet. [1] I will probably donate ~$10k of stocks in 2025 to offset my lifetime diet impact—is there any reason not to do this? I’ve already looked at the non-counterfactual matching argument, which I don’t find convincing.
[1] I basically never eat chicken, substituting it with other meats, so I reduced the poultry category by 2⁄3 and allocated that proportionally between the beef and pork categories.
If your algorithms get more efficient over time at both small and large scales, and experiments test incremental improvements to architecture or data, then experiments should get cheaper to run in proportion to the algorithmic efficiency of cognitive labor. I think this is a better first approximation than assuming experiment costs are constant, and it might hold in practice, especially when you can target small-scale algorithmic improvements.
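As a minimal sketch of what I mean (the reference compute figure is made up, purely for illustration): if an experiment has to train to some fixed reference capability, and algorithmic efficiency multiplies effective compute, then the physical compute per experiment falls as 1/efficiency.

```python
# Minimal sketch under the assumptions above; reference_compute is a hypothetical number.
reference_compute = 1e21  # FLOP to hit a fixed reference capability with current algorithms

for efficiency_gain in [1, 2, 4, 8]:  # cumulative algorithmic efficiency improvement
    flop_per_experiment = reference_compute / efficiency_gain
    print(f"{efficiency_gain}x algorithmic efficiency -> {flop_per_experiment:.1e} FLOP per experiment")
```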