My blog is here. My personal site is here. You can contact me using this form.
L Rudolf L
AI & wisdom 3: AI effects on amortised optimisation
AI & wisdom 2: growth and amortised optimisation
AI & wisdom 1: wisdom, amortised optimisation, and AI
Investigating an insurance-for-AI startup
You discuss three types of AI safety ventures:
Infrastructure: Tooling, mentorship, training, or legal support for researchers.
New AI Safety Organizations: New labs or fellowship programs.
Advocacy Organizations: Raising awareness about the field.
Where would, for example, insurance for AI products fit in this? This is a for-profit idea that creates a natural business incentive to understand & research risks from AI products at a very granular level, and if it succeeds, it puts you in a position to influence the entire industry (e.g. “we will lower your premiums if you implement safety measure X”).
I agree that if you restrict yourself to either supporting AIS researchers, launching field-building projects or research labs, or doing advocacy, then you will in fact not find good startup ideas, for the structural reasons you do a good job of listing in your post, as well as the fact that these are all things people are already doing.
METR is a very good AIS org. In addition to just being really solid and competent, a lot of why they succeeded was that they started doing something that few people were thinking about at the time. Everyone and their dog is launching an evals startup today, but the real value is finding ideas like METR before they are widespread. If the startup ideas you consider are all about doing the same thing that existing orgs do, you will miss out on the most important ones.
I do agree that the intersection of impact & profit & bootstrappability is small and hard to hit, and there’s no law of nature that says something should definitely exist there. But if something exists in that corner, it will be a novel type of thing.
(reposted from a Slack thread)
Positive visions for AI
I’ve been thinking about this space. I have some ideas for hacky projects in the direction of “argument type-checkers”; if you’re interested in this, let me know.
I’d like to add an asterisk. It is true that you can and should support things that seem good while they seem good and then retract support, or express support on the margin but not absolutely. But sometimes supporting things for a period has effects you can’t easily take back. This is especially the case if (1) added marginal support summons some bigger version of the thing that, once in place, cannot be re-bottled, or (2) increased clout for that thing changes the culture significantly (I think cultural changes are very hard to reverse; culture generally doesn’t go back, only moves on).
I think there are many cases where, before throwing their lot in with a political cause for instrumental reasons, people should’ve first paused to think more about whether this is the type of thing they’d like to see more of in general. Political movements also tend to have an enormous amount of inertia, and often end up very influenced by path-dependence and memetic fitness gradients.
I think it’s worth trying hard to stick to strict epistemic norms. The main argument you bring against this is that it’s more effective to be more permissive about bad epistemics. I doubt this. It seems to me that people overstate the track record of populist activism at solving complicated problems. If you’re considering populist activism, I would think hard about where, how, and on what it has worked.
Consider environmentalism. It seems quite uncertain whether the environmentalist movement has been net positive (!). This is an insane admission to have to make, given that the science is fairly straightforward, environmentalism is clearly necessary, and the movement has had huge wins (e.g. massive shift in public opinion, pushing governments to make commitments, & many mundane environmental improvements in developed-country cities over the past few decades). However, the environmentalist movement has repeatedly spent enormous effort on directly harming its stated goals through things like opposing nuclear power and GMOs. These failures seem very directly related to bad epistemics.
In contrast, consider EA. It’s not trivial to imagine a movement much worse along the activist/populist metrics than EA. But EA seems quite likely positive on net, and the loosely-construed EA community has gained a striking amount of power despite its structural disadvantages.
Or consider nuclear strategy. It seems a lot of influence was had by e.g. the staff of RAND and other sober-minded, highly-selected, epistemically-strong actors. Do you want more insiders at think-tanks and governments and companies, and more people writing thoughtful pieces that swing elite opinion, all working in a field widely seen as credible and serious? Or do you want more loud activists protesting on the streets?
I’m definitely not an expert here, but by thinking through what I understand about the few cases I can think of, the impression I get is that activism and protest have worked best to fix the wrongs of simple and widespread political oppression, but that on complex technical issues higher-bandwidth methods are usually how actual progress is made.
I think there are also some powerful but abstract points:
Choosing your methods is not just a choice over methods, but also a choice over who you appeal to. And who you appeal to will change the composition of your movement, and therefore, in the long run, the choice of methods. Consider carefully before summoning forces you can’t control (this applies both to superhuman AI as well as epistemically-shoddy charismatic activist-leaders).
If we make the conversation about AIS more thoughtful, reasonable, and rational, it increases the chances that the right thing (whatever that ends up being—I think we should have a lot of intellectual humility here!) ends up winning. If we make it more activist, political, and emotional, we privilege the voice of whoever is better at activism, politics, and narratives. I think you basically always want to push the thoughtfulness/reasonableness/rationality. This point is made well in one of Scott Alexander’s best essays (see section IV in particular, for the concept of asymmetric vs symmetric weapons). There is a spirit here, of truth-seeking and liberalism and building things, of fighting Moloch rather than sacrificing our epistemics to him for +30% social clout. I admit that this is partly an aesthetic preference on my part. But I do believe in it strongly.
(A) Call this “Request For Researchers” (RFR). OpenPhil has tried a more general version of this in the form of the Century Fellowship, but they discontinued this. That in turn is a Thiel Fellowship clone, like several other programs (e.g. Magnificent Grants). The early years of the Thiel Fellowship show that this can work, but I think it’s hard to do well, and it does not seem like OpenPhil wants to keep trying.
(B) I think it would be great for some people to get support for multiple years. PhDs work like this, and good research can be hard to do over a series of short few-month grants. But also the long durations just do make them pretty high-stakes bets, and you need to select hard not just on research skill but also the character traits that mean people don’t need external incentives.
(C) I think “agenda-agnostic” and “high quality” might be hard to combine. It seems like there are three main ways to select good people: rely on competence signals (e.g. lots of cited papers, works at a selective organisation), rely on more-or-less standardised tests (e.g. a typical programming interview, SATs), or rely on inside-view judgements of what’s good in some domain. New researchers are hard to assess by the first, I don’t think there’s a cheap programming-interview-but-for-research-in-general that spots research talent at high rates, and therefore it seems you have to rely a bunch on the third. And this is very correlated with agendas; a researcher in domain X will be good at judging ideas in that domain, but less so in others.
The style of this that I’d find most promising is:
Someone with a good overview of the field (e.g. at OpenPhil) picks a few “department chairs”, each with some agenda/topic.
Each department chair picks a few research leads who they think have promising work/ideas in the direction of their expertise.
These research leads then get collaborators/money/ops/compute through the department.
I think this would be better than a grab-bag of people selected according to credentials and generic competence, because I think an important part of the research talent selection process is the part where someone with good research taste endorses the agenda takes of someone else on agenda-specific inside-view grounds.
A model of research skill
[Fiction] A Disneyland Without Children
Yes, letting them specifically set a distribution, especially as this was implicitly done anyways in the data analysis, would have been better. We’d want to normalise this somehow, either by trusting and/or checking that it’s a plausible distribution (i.e. sums to 1), or by just letting them rate things on a scale of 1-10 and then getting an implied “distribution” from that.
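As a rough sketch of the two normalisation options above (not the analysis code actually used), the first function below turns 1-10 ratings into an implied "distribution" by dividing by their sum, and the second checks whether a directly reported distribution is plausible. The function names, the category labels, and the tolerance are all made up for illustration.

```python
import math


def ratings_to_distribution(ratings):
    """Convert raw ratings (e.g. on a 1-10 scale) into shares that sum to 1.

    `ratings` maps a (hypothetical) source-of-value label to its rating.
    """
    total = sum(ratings.values())
    if total <= 0:
        raise ValueError("ratings must have a positive sum")
    return {label: value / total for label, value in ratings.items()}


def is_plausible_distribution(weights, tol=1e-6):
    """Check that reported weights are non-negative and sum to 1 (within tol)."""
    values = list(weights.values())
    return all(v >= 0 for v in values) and math.isclose(
        sum(values), 1.0, abs_tol=tol
    )


# Hypothetical respondent: the labels and numbers are invented for this example.
example_ratings = {"research skills": 8, "networking": 6, "career clarity": 6}
implied = ratings_to_distribution(example_ratings)
```

Dividing by the sum is only one possible choice; it treats the 1-10 scale as if a rating of 8 meant twice the value of a rating of 4, which respondents may not intend.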
I agree that this is confusing. Also note:
Interestingly, the increase in perceived comfort with entrepreneurial projects is larger for every org than that for research. Perhaps the (mostly young) fellows generally just get slightly more comfortable with every type of thing as they gain experience.
However, this is additional evidence that ERI programs are not increasing fellows’ self-perceived comfort with research any more than they increase fellows’ comfort with anything. It would be interesting to see if mentors of fellows think they have improved overall; it may be that changes in self-perception and actual skill don’t correlate very much.
And also note that fellows consistently ranked the programs as providing on average slightly higher research skill gain than standard academic internships (average 5.7 on a 1-10 scale where 5 = standard academic internship skill gain, see “perceived skills and skill changes” section).
I can think of many possible theories, including:
fellows don’t become more comfortable with research despite gaining competence at it because the competence does not lead to feeling good at research (e.g. maybe they update towards research being hard, or there is some form of Dunning-Kruger type thing here, or they already feel pretty comfortable as you mention); therefore self-rated research comfort is a bad indicator and we might instead try e.g. asking their mentors or looking at some other external metric
fellows don’t actually get better at research, but still rate it as a top source of value because they want to think they did, and their comfort with research staying the same is a more reliable indicator than them marking it as a top source of value (and also they either have a low opinion of skill gain from standard academic internships, or they haven’t experienced those and are just (pessimistically) imagining what it would be like)
The main way to answer this seems to be getting a non-self-rated measure of research skill change.
I’ve now posted my entries on LessWrong:
part 1: wisdom, amortised optimisation, and AI
part 2: growth and amortised optimisation
part 3: AI effects on amortised optimisation
I’d also like to really thank the judges for their feedback. It’s a great luxury to be able to read many pages of thoughtful, probing questions about your work. I made several revisions & additions (and also split the entire thing into parts) in response to feedback, which I think improved the finished sequence a lot, and wish I had had the time to engage even more with the feedback.