For deception (as opposed to deceptive alignment), see AI Deception: A Survey of Examples, Risks, and Potential Solutions (section 2)
Tom Barnes
Relationship between EA Community and AI safety
[Linkpost] Michael Nielsen remarks on ‘Oppenheimer’
This looks very exciting, thanks for posting!
I’ll quickly mention a couple of things that stood out to me and might make the CEA overoptimistic:
IQ points lost per μg/dl of lead—this is likely a logarithmic relationship (as suggested by Bernard and Schukraft). For a BLL of 2.4–10 μg/dl, the IQ loss from a 1 μg/dl increase may be close to 0.5, but above 10 it’s closer to 0.2 per 1 μg/dl, and above 20 closer to 0.1 (a rough sketch of this piecewise relationship is included at the end of this comment). Average BLL in Bangladesh seems to be around 6.8 μg/dl, though amongst residents living near turmeric sources of lead it could plausibly be (much) higher, and thus the IQ gain from a 1 μg/dl reduction in lead would be lower.
Income loss per IQ point lost—the CEA assumes that losing 1 IQ point leads to a 2.1% reduction in income. However, some work by GiveWell (here) suggests this might be closer to 0.67% (and there may be reasons to discount this further, e.g. due to replicability concerns).
Replicability of the intervention—as noted in the text, it’s hard to estimate how much the Bangladesh program reduced lead exposure by. If Bangladesh’s average BLL is around 6.8 μg/dl, then a 1.64 μg/dl reduction from the intervention implies it cut BLL by ~25% for half of children in Bangladesh. This is great, but I can see several reasons why it may not be informative about future programs’ cost-effectiveness:
Maybe turmeric is much more prevalent in rural Bangladesh than in other regions
Maybe it was unusually easy to get regulators to agree to introduce / enforce standards
Maybe it was unusually easy to get producers to switch away from lead chromate
Each of these reasons on its own is fairly weak, but the likelihood of at least one being true gives us reason to discount future cost-effectiveness analyses. More generally, we might expect some regression to the mean w.r.t. reducing exposure from turmeric—maybe everything went right for this particular program, but this is unlikely to be true of future programs. To be clear, there are likely also reasons this analysis is too pessimistic, and so on net cost-effectiveness may well remain at $1/DALY (or even better). Nonetheless, I think it’s good to be cautious, since $1/DALY implies this program was >800x better than cash transfers and >80x better than GiveWell’s top charities—a strong claim to make (though still possible!)
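To make the first two points concrete, here is a minimal sketch (in Python) of how the piecewise slopes and the income elasticities above interact. The breakpoints, the 6.8 μg/dl baseline, and the 1.64 μg/dl reduction are the figures quoted above; the function itself and the exact handling of the breakpoints are purely illustrative, not the CEA’s actual model.

```python
# Illustrative only -- not the CEA's model. Approximate marginal IQ loss per
# ug/dl of blood lead, using the rough piecewise slopes described above.
def iq_loss_per_ugdl(bll):
    """IQ points lost per additional ug/dl at a given blood lead level (BLL)."""
    if bll <= 10:
        return 0.5   # ~0.5 points per ug/dl in the ~2.4-10 ug/dl range
    elif bll <= 20:
        return 0.2   # flatter slope between ~10 and ~20 ug/dl
    else:
        return 0.1   # flatter still above ~20 ug/dl

baseline_bll = 6.8    # rough average BLL in Bangladesh (ug/dl)
reduction = 1.64      # estimated BLL reduction from the intervention (ug/dl)
iq_gained = reduction * iq_loss_per_ugdl(baseline_bll)

# Income effect per IQ point: 2.1% (the CEA's assumption) vs ~0.67% (GiveWell-style).
for label, elasticity in [("CEA assumption", 0.021), ("GiveWell-style", 0.0067)]:
    print(f"{label}: {iq_gained:.2f} IQ points -> {iq_gained * elasticity:.2%} income gain")
```

Note that if BLLs near turmeric sources are much higher than the 6.8 μg/dl average, the relevant marginal slope (and hence the IQ gain per μg/dl removed) would be smaller, as per the first point.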
My bad, thanks so much!
It would be great to have some way to filter for multiple topics.
Example: Suppose I want to find posts related to the cost-effectiveness of AI safety. Instead of filtering for just “AI safety” or just “Forecasting and estimation”, I might want to find posts at the intersection of those two. I attempted to do this by customizing my frontpage feed, but this doesn’t really work (since it heavily biases towards new/upvoted posts).
it relies primarily on heuristics like organiser track record and higher-level reasoning about plans.
I think this is mostly correct, with the caveat that we don’t exclusively rely on qualitative factors and subjective judgement alone. The way I’d frame it is more as a spectrum between
[Heuristics] <------> [GiveWell-style cost-effectiveness modelling]
I think I’d place FP’s longtermist evaluation methodology somewhere between those two poles, with flexibility based on what’s feasible in each cause
I’ll +1 everything Johannes has already said, and add that several people (including myself) have been chewing over the “how to rate longtermist projects” question for quite some time. I’m unsure when we will post something publicly, but I hope it won’t be too far in the future.
If anyone is curious for details feel free to reach out!
Quick take: renaming shortforms to Quick takes is a mistake
Tom Barnes’s Quick takes
This looks super interesting, thanks for posting! I especially appreciate the “How to apply” section
One thing I’m interested in is seeing how this actually looks in practice—specifying real exogenous uncertainties (e.g. about timelines, takeoff speeds, etc.), policy levers (e.g. these ideas, different AI safety research agendas, etc.), relations (e.g. between AI labs, governments, etc.) and performance metrics (e.g. “p(doom)”, plus many of the sub-goals you outline). What are the conclusions? What would this imply about prioritization decisions? etc.
I appreciate this would be super challenging, but if you are aware of any attempts to do it (even if using just a very basic, simplifying model), I’d be curious to hear how it’s gone
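In case it helps make the question concrete, here is a deliberately toy sketch of what specifying those four ingredients could look like in code. Every uncertainty range, lever, and the relation mapping them to p(doom) below is invented purely for illustration—it is not a proposal for actual parameter values.

```python
import itertools

# Toy exploratory-modelling sketch: all values and relations below are made up.
uncertainties = {                      # exogenous uncertainties
    "agi_year":      [2030, 2040, 2060],
    "takeoff_speed": ["slow", "fast"],
}
levers = {                             # policy levers
    "agenda":         ["interpretability", "evals", "governance"],
    "spend_fraction": [0.1, 0.5],
}

def p_doom(scenario, policy):
    """Entirely illustrative relation from a scenario + policy to p(doom)."""
    base = 0.4 if scenario["takeoff_speed"] == "fast" else 0.15
    base += 0.1 if scenario["agi_year"] == 2030 else 0.0
    effect = {"interpretability": 0.05, "evals": 0.03, "governance": 0.04}[policy["agenda"]]
    return max(0.0, base - 0.3 * policy["spend_fraction"] - effect)

scenarios = [dict(zip(uncertainties, v)) for v in itertools.product(*uncertainties.values())]
policies  = [dict(zip(levers, v)) for v in itertools.product(*levers.values())]

# One simple robustness comparison: each policy's worst-case p(doom) across scenarios.
for policy in policies:
    worst = max(p_doom(s, policy) for s in scenarios)
    print(policy, f"worst-case p(doom) = {worst:.2f}")
```

Even at this toy level, the interesting output is less the numbers themselves than which policies look robust across many scenarios—which is the kind of prioritization conclusion I’d be curious to see from a real attempt.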
Should recent AI progress change the plans of people working on global health who are focused on economic outcomes?
I think so; see here or here for a bit more discussion of this
If you think that AI will go pretty well by default (which I think many neartermists do)
My guess/impression is that this just hasn’t been discussed by neartermists very much (which I think is one sad side-effect of bucketing all AI work under a “longtermist” worldview)
Great question!
One can claim Gift Aid on a donation to the Patient Philanthropy Fund (PPF), e.g. if donating through Giving What We Can. So a basic-rate taxpayer gets a 25% “return” on the initial donation (via Gift Aid). The fund can then be expected to make a financial return equivalent to an index fund (~10% p.a. for e.g. the S&P 500).
So, if you buy the claim that your expected impact will be 9x larger in 10 years than today, then a £1,000 donation today will have an expected (mean) impact of £11,250 for longtermist causes (£1,000 * 1.25 * 9).[1]
Therefore I think the question of:
“donate now and claim gift aid” OR “invest then donate later”
...can be reframed as:
“donate now and claim gift aid” OR “donate to (e.g. PPF) now and claim gift aid, for the PPF to invest and then donate later”
(I.e. I think gift aid considerations don’t favour one option over the other)
Of course, one may reasonably disagree on giving now vs giving later—this is a much more messy question, and one that I won’t attempt to answer here.
I’m not sure about paying into an organisation’s fund.
I think that conditional on giving later, the PPF is a better option than individually taking an “investing to give” approach (roughly for reasons described here)
(disclaimer: I work on the operations side of the PPF)
- ^
A £1,000 donation becomes £1,250 for a basic-rate taxpayer. Over 10 years, expected impact will increase by 9x (using the Investing to Give report model’s mean estimate)
Using the same logic for global health or animal welfare, your expected (mean) impact from a £1,000 donation in 10 years would be £2,625 (£1,000 * 1.25 * 2.1) and £5,250 (£1,000 * 1.25 * 4.2) respectively.
Note however that no “PPF equivalent” for global health or animal welfare currently exists, AFAIK
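For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation above. The 1.25 Gift Aid uplift and the 9x / 2.1x / 4.2x impact multipliers are the figures quoted in this comment (the multipliers being the Investing to Give report model’s mean estimates); nothing else is assumed.

```python
# Sketch of the arithmetic above: Gift Aid uplift plus expected growth in
# impact over 10 years, using the multipliers quoted in this comment.
GIFT_AID_UPLIFT = 1.25  # basic-rate taxpayer: a £1,000 donation becomes £1,250

impact_multiplier_10y = {
    "longtermist":    9.0,
    "global health":  2.1,
    "animal welfare": 4.2,
}

donation = 1_000  # £
for cause, multiplier in impact_multiplier_10y.items():
    expected_impact = donation * GIFT_AID_UPLIFT * multiplier
    print(f"{cause}: £{expected_impact:,.0f} expected (mean) impact in 10 years")
```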
How Founders Pledge’s Patient Philanthropy Fund and Global Catastrophic Risks Fund Work Together
Air Pollution: Founders Pledge Cause Report
I think this could be an interesting avenue to explore. One very basic way to do this (very roughly) is to model p(doom) as an addition to the discount rate. This could be an additional user input on GiveWell’s spreadsheets.
So, for example, if your p(doom) is 20% over 20 years, then you could increase the discount rate by roughly 1 percentage point per year.
[Technically this will be somewhat off, since (I’m guessing) most people’s p(doom) doesn’t increase at a constant rate in the way a fixed discount rate does.]
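A small sketch of that conversion, under the same constant-hazard simplification flagged in the caveat above (the function name and the numbers plugged in are just for illustration):

```python
# Convert a cumulative p(doom) over some horizon into the (approximately)
# equivalent constant annual discount-rate increment. Assumes a constant
# hazard rate -- exactly the simplification flagged in the caveat above.
def annual_rate_from_p_doom(p_doom: float, years: int) -> float:
    """Constant annual rate r such that 1 - (1 - r)**years == p_doom."""
    return 1 - (1 - p_doom) ** (1 / years)

print(f"{annual_rate_from_p_doom(0.20, 20):.2%} per year")  # ~1.11%, i.e. roughly 1%/year
```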
Rob Bensinger of MIRI tweets:
...I’m happy to say that MIRI leadership thinks “humanity never builds AGI” would be the worst catastrophe in history, would cost nearly all of the future’s value, and is basically just unacceptably bad as an option.
Just to add that the Research Institute for Future Design (RIFD) is a Founders Pledge recommendation for longtermist institutional reform
(disclaimer: I am a researcher at Founders Pledge)
OpenPhil might be in a position to expand EA’s expected impact if it added a cause area that allowed for more speculative investments in Global Health & Development.
My impression is that Open Philanthropy’s Global Health and Development team already does this? For example, OP has focus areas on Global aid policy, Scientific research and South Asian air quality, areas which are inherently risky/uncertain.
They have also taken a hits-based approach philosophically, and this is what distinguishes them from GiveWell—see e.g.
Hits. We are explicitly pursuing a hits-based approach to philanthropy with much of this work, and accordingly might expect just one or two “hits” from our portfolio to carry the whole. In particular, if one or two of our large science grants ended up 10x more cost-effective than GiveWell’s top charities, our portfolio to date would cumulatively come out ahead. In fact, the dollar-weighted average of the 33 BOTECs we collected above is (modestly) above the 1,000x bar, reflecting our ex ante assessment of that possibility. But the concerns about the informational value of those BOTECs remain, and most of our grants seem noticeably less likely to deliver such “hits”.
[Reposting my comment here from previous version]
Following the episode with Mustafa, it would be great to interview the founders of leading AI labs—perhaps Dario (Anthropic) [again], Sam (OpenAI), or Demis (DeepMind). Alternatively, the heads of the companies that invest in / support them—Sundar (Google) or Satya (Microsoft).
It seems valuable to elicit their honest opinions[1] about “p(doom)”, timelines, whether they believe they’ve been net-positive for the world, etc.
I think there’s a risk here of either:
a) not challenging them firmly enough—lending them undue credibility / legitimacy in the minds of listeners
b) challenging them too strongly—reducing willingness to engage, less goodwill, etc