https://www.lesswrong.com/users/brook
President, EA Edinburgh. Moderator, EA Forum. Background in Neuroscience and Medicine.
Reach out to me about alignment or tools for collaborative truth-seeking (like Loom, Guesstimate, Metaculus etc.).
Hi everybody! I’m Victoria, I’m currently based in Edinburgh and I heard about EA through LessWrong. I’ve been involved with the local EA group for almost a year now, and with rationalism for a few years longer than that. I’m only now getting around to being active on the forum here.
I was a medical student, but I’m taking a year out and seriously considering moving into either direct existential risk research/policy or something like operations/‘interpreting’ research. When I’ve had opportunities to do things like that I’ve really enjoyed it. I’ve also previously freelanced with Nonlinear and CEA for research and writing gigs.
Long-term I could see myself getting into AI, possibly something like helping build infrastructure for AI researchers to better communicate, or direct AI work (with my neuroscience degree).
See youse all around!
I agree that publishing results of the form “it turns out that X can be done, though we won’t say how we did it” is clearly better than publishing your full results, but I think it’s much more harmful than publishing nothing in a world where other people are still doing capabilities research.
This is because it seems to me that knowing something is possible is often the first step to understanding how. This is especially true if you have any understanding of where the researcher or organisation was looking before publishing the result.
I also think there are worlds where it's importantly harmful to critique capabilities research too openly, but I lean towards thinking we're not in that world, and think the tone of this post is a pretty good model for how this should look going forwards. +1!
I think this is one reasonable avenue to explore alignment, but I don’t want everybody doing it.
My impression is that AI researchers exist on a spectrum from only doing empirical work (of the kind you describe) to only doing theoretical work (like Agent Foundations), and most fall in the middle, doing some theory to figure out what kind of experiment to run, and using empirical data to improve their theories (a lot of science looks like this!).
I think it would be unwise for all (or even a majority of) AI safety researchers to move to doing empirical work on current AI systems, for two reasons:
Bigger models have bigger problems.
Lessons learned from current misalignment may be necessary for aligning future models, but will certainly not be sufficient. For instance, GPT-3 will (we assume) never demonstrate deceptive alignment, because its model of the world is not broad enough to do so, but more complex AIs may do.
This is particularly worrying because we may only get one shot at spotting deceptive alignment! Thinking about problems in this class before we have direct access to models that could, even in theory, exhibit them seems mandatory, and is a key reason alignment seems hard to me.
AI researchers are sub-specialised.
Many current researchers working in non-technical alignment, while they presumably have a decent technical background, are not cutting-edge ML engineers. There's not a 1:1 skill translation from 'current alignment researcher' to 'GPT-3 alignment researcher'.
There is maybe some claim here that you could save money on current alignment researchers and fund a whole bunch of GPT alignment researchers, but I expect the exchange rate is pretty poor, or it’s just not possible in the medium term to find sufficient people with a deep understanding of both ML and alignment.
The first one is the biggie. I can imagine this approach working (perhaps inefficiently) in a world where (1) were false and (2) were true, but I can't imagine it working in any world where (1) holds.
Thanks for making this! I also feel like I get a lot of value out of quarterly/yearly reviews, and this looks like a nice prompting tool. If you haven’t seen it already, you might like to look at Pete Slattery’s year-review question list too!
Any feedback you have as we go would be much appreciated! I've focussed on broadening use, so I'm hoping a good chunk of the value will come from new ways to use the tools as much as anything else; if there are any uses you think are missing, those would be great to hear too!
The tool is here; there'll also be a post in a few hours, but it's pretty self-explanatory.
These look great, thanks for suggesting them! Would you be interested in writing tutorials for some/all of them that I could add to the sequence? If not, I think updating the topic page with links to tutorials you think are good would also be great!
I think something like “only a minority of people [specific researchers, billionaires, etc.] are highly influential, so we should spend a lot of energy influencing them” is a reasonable claim that implies we maybe shouldn’t spend as much energy empowering everyday people. But I haven’t seen any strong evidence either way about how easy it is to (say) convert 1,000 non-billionaires to donate as much as one billionaire.
I do think the above view has some optics problems, and that many people who ‘aren’t highly influential’ obviously could become so if they e.g. changed careers.
As somebody strongly convinced by longtermist arguments, I do find it hard to ‘onboard’ new EAs without somebody asking “do you really think most people will sit and have a protracted philosophical discussion about longtermism?” at some point. I think there are two reasonable approaches to this:
If you start small (suggest donating to the AMF instead of some other charity, and maybe coming to some EA meetings), some people who would otherwise have been put off will become more invested and research longtermism on their own.
It's useful to have two different pitches for EA for different audiences: discuss longtermism with people who are in philosophy or related fields, and use something easier to explain the rest of the time. My impression is that this is the pitch you're making in this post?
I’m not currently convinced of either view, but would be interested to hear about other peoples’ experiences.