Focus on the places where you feel shocked everyone’s dropping the ball
Writing down something I’ve found myself repeating in different conversations:
If you’re looking for ways to help with the whole “the world looks pretty doomed” business, here’s my advice: look around for places where we’re all being total idiots.
Look for places where everyone’s fretting about a problem that some part of you thinks it could obviously just solve.
Look around for places where something seems incompetently run, or hopelessly inept, and where some part of you thinks you can do better.
Then do it better.
For a concrete example, consider Devansh. Devansh came to me last year and said something to the effect of, “Hey, wait, it sounds like you think Eliezer does a sort of alignment-idea-generation that nobody else does, and he’s limited here by his unusually low stamina, but I can think of a bunch of medical tests that you haven’t run, are you an idiot or something?” And I was like, “Yes, definitely, please run them, do you need money”.
I’m not particularly hopeful there, but hell, it’s worth a shot! And, importantly, this is the sort of attitude that can lead people to actually trying things at all, rather than assuming that we live in a more adequate world where all the (seemingly) dumb obvious ideas have already been tried.
Or, this is basically my model of how Paul Christiano manages to have a research agenda that seems at least internally coherent to me. From my perspective, he’s like, “I dunno, man, I’m not sure I can solve this, but I also think it’s not clear I can’t, and there’s a bunch of obvious stuff to try, that nobody else is even really looking at, so I’m trying it”. That’s the sort of orientation to the world that I think can be productive.
Or the shard theory folks. I think their idea is basically unworkable, but I appreciate the mindset they are applying to the alignment problem: something like, “Wait, aren’t y’all being idiots, it seems to me like I can just do X and then the thing will be aligned”.
I don’t think we’ll be saved by the shard theory folk; not everyone audaciously trying to save the world will succeed. But if someone does save us, I think there’s a good chance that they’ll go through similar “What the hell, are you all idiots?” phases, where they autonomously pursue a path that strikes them as obviously egregiously neglected, to see if it bears fruit. (Regardless of what I think.)
Contrast this with, say, reading a bunch of people’s research proposals and explicitly weighing the pros and cons of each approach so that you can work on whichever seems most justified. This has more of a flavor of taking a reasonable-sounding approach based on an argument that sounds vaguely good on paper, and less of a flavor of putting out an obvious fire that for some reason nobody else is reacting to.
I dunno, maybe activities of the vaguely-good-on-paper character will prove useful as well? But I mostly expect the good stuff to come from people working on stuff where a part of them sees some way that everybody else is just totally dropping the ball.
In the version of this mental motion I’m proposing here, you keep your eye out for ways that everyone’s being totally inept and incompetent, ways that maybe you could just do the job correctly if you reached in there and mucked around yourself.
That’s where I predict the good stuff will come from.
And if you don’t see any such ways?
Then don’t sweat it. Maybe you just can’t see something that will help right now. There don’t have to be ways you can help in a sizable way right now.
I don’t see ways to really help in a sizable way right now. I’m keeping my eyes open, and I’m churning through a giant backlog of things that might help a nonzero amount—but I think it’s important not to confuse this with taking meaningful bites out of a core problem the world is facing, and I won’t pretend to be doing the latter when I don’t see how to.
Like, keep your eye out. For sure, keep your eye out. But if nothing in the field is calling to you, and you have no part of you that says you could totally do better if you deconfused yourself some more and then handled things yourself, then it’s totally respectable to do something else with your hours.
If you don’t have an active sense that you could put out some visibly-raging fires yourself (maybe after skilling up a bunch, which you also have an active sense you could do), then I recommend stuff like cultivating your ability to get excited about things, and doing other cool stuff.
Sure, most stuff is lower-impact than saving the world from destruction. But if you can be enthusiastic about all the other cool ways to make the world better off around you, then I’m much more optimistic that you’ll be able to feel properly motivated to combat existential risk if and when an opportunity to do so arises. Because that opportunity, if you get one, probably isn’t going to suddenly unlock every lock on the box your heart hides your enthusiasm in, if your heart is hiding your enthusiasm.
See also Rob Wiblin’s “Don’t pursue a career for impact — think about the world’s most important, tractable and neglected problems and follow your passion.”
Or the Alignment Research Field Guide’s advice to “optimize for your own understanding” and chase the things that feel alive and puzzling to you, as opposed to dutifully memorizing other people’s questions and ideas: “[D]on’t ask ‘What are the open questions in this field?’ Ask: ‘What are my questions in this field?’”
I basically don’t think that big changes come from people who aren’t pursuing a vision that some part of them “believes in”, and I don’t think low-risk, low-reward, modest, incremental help can save us from here.
To be clear, when I say “believe in”, I don’t mean that you necessarily assign high probability to success! Nor do I mean that you’re willing to keep trying in the face of difficulties and uncertainties (though that sure is useful too).
English doesn’t have great words for me to describe what I mean here, but it’s something like: your visualization machinery says that it sees no obstacle to success, such that you anticipate either success or getting a very concrete lesson.
The possibility seems open to you, at a glance; and while you may suspect that there’s some hidden reason that the possibility is not truly open, you have an opportunity here to test whether that’s so, and to potentially learn why this promising-looking idea fails.
(Or maybe it will just work. It’s been known to happen, in many a scenario where external signs and portents would have predicted failure.)
I don’t operate with this mindset frequently, but thinking back on some of the highest-impact things I’ve done, I realize now that I did them because I had this attitude. So I’m inclined to think it’s good advice.
This principle holds especially for relatively unexplored fields with few people working on them. If there’s only something like 10 people working on some subfield of AI, then it’s actually highly likely that all of them are missing something important. This is especially true once you factor in groupthink and homogeneity: if all 10 of them are mathematicians, it would be completely unsurprising if they shared flawed assumptions about neighboring fields like computer science or biology.
Everyone being wrong about some core assumption is actually fairly common, if the assumption in question is untested and the set of “everyone” is not that large. This is one of the reasons I am unbothered by having substantially different opinions on AI risk from the majority here.
One question that comes to mind for me is: how do you differentiate this situation from unilateral action on bad things? There’s a part of me that wants to give an uninformative answer like “oh yeah, just do the good type of unilateral action but not the bad type. Easy!”
But my guess is that reality isn’t quite that unforgiving, and that there are at least some good heuristics for reducing the risk of bad unilateral actions. If so, what are they?
One obvious heuristic is to not act on the thing without first asking people why they’ve left that gap, which is arguably part of the OP’s model here. That seems to fit the Devansh example: if the unilateral action would have been bad, the OP, with a better map, could have explained why.
In my experience, most times I’ve had the feeling “why the hell isn’t anybody doing X?!”, my first response has been to run around asking people exactly that question.
This seems like one of a few heuristics we’d need to make this go safely by default.
What you’re saying here resonates with me, but I wonder whether some people, when they’re in the situation you’re describing, would be more inclined to assume they’re missing something, and would consequently feel differently about what’s going on. In particular, I’m thinking of people prone to impostor syndrome. I don’t know what their feeling in this situation would be (I’m not prone to impostor syndrome myself), but I think it might be different.
Thanks for the post!
The link is broken.