Getting started independently in AI Safety
Drop the constraints. It’s great to have a mentor, a lab and lots of compute. But what if all you have is time and motivation?
I think too many people feel held back from doing a project-like thing on their own. Getting the prerequisites in maths, probability and software is important for being able to work on a project, but I think some people push the prerequisites further than they should. Just do a project. Drop the constraint that your project must be interesting or useful to someone else.
You could spend 6 months learning more prerequisites like programming or maths, and nobody else would be interested in seeing that work either. How many programming students have reimplemented quick-sort without a second thought? If you have 6 months to work on something, it might as well be an alignment research project. If your experiments fail, or the project turns out not to be interesting to anyone else, then the field is no worse off than if you had spent the time learning even more maths.
It seems like a waste of 6 months to not have some useful output for the field at the end of it. But if you reframe the goal as learning for yourself then this can be very successful. You can learn quite a lot about project management and the methods of research from an otherwise failed project. If you attempt another project you are likely to plan and manage it much better and are more likely to be successful.
I’m a big fan of just-in-time learning, where you come to a problem in your project and then learn the technique or formula you need to get past it. I find this kind of motivated learning far more effective than learning something out of context and just for itself. Similarly, I’ve read machine learning papers and thought that I understood them, until I later needed to use the techniques from the paper on another project. Even after going back to the paper, it turned out that I hadn’t understood it at all, and I had a lot of trouble implementing it.
So, how do you get started in AI Safety research independently?
As many suggest, start by reading papers and posts from the field. My advice differs here in that I think you should read with the intention of getting distracted. Follow what is interesting to you rather than continuing to read the next thing on your list.
Don’t take notes or summarise the papers. You’re not going to be examined on your knowledge of them and you can always access them in full later. Write down questions that you have instead. Write down things that are missing from what you are reading. Write down the things that seem unclear, that you don’t understand, or that you disagree with.
Look up some of the references or try to find other resources that answer your questions. If this leads you to some other things that are interesting to you then that’s great. Keep going down the rabbit hole (don’t go too far of course).
If there aren’t any answers to your questions, or you can’t find resources that explain something clearly, you have found an opportunity. Clarifying a small part of someone else’s publication is a great way to learn for yourself and contribute something valuable to the community. There are plenty of great forum posts on “Clarifying X”, and there is room for more.
Some questions you write down may be a one-liner that is never visited again. For others, you might write a few sentences on what you are trying to ask. Others still may take up a page, including some ideas on how you would answer the questions and some experiments you could run.
If you are following what is interesting to you, then you are likely to come across a question that you can’t help but keep thinking about. This is the project you should work on. The only thing likely holding you back is the thought that no one else would be interested in this, or the question isn’t that important. Do it anyway. It’s much harder to find a project that you are interested in than one that everyone else is.
____
Also check out MIRI’s Alignment Research Field Guide and this list of lists of study guides, research agendas and other resources.
____
I’d love to get people’s feedback on this, below or privately via email:
jj@aisafetysupport.org
Also, I’m always keen to talk to new people: calendly.com/jj-hepboin
Absolutely. Also, too many people don’t feel held back enough (e.g. maybe it really would have been beneficial to, say, go through Spinning Up in Deep RL before attempting a deep RL project). How do you tell which group you’re in?
(This comment inspired by Reversing Advice)
This is a good point, although I suppose you could still think of this in the framing of “just in time learning”, i.e. you can attempt a deep RL project, realise you are hopelessly out of your depth, then you know you’d better go through Spinning Up in Deep RL before you can continue.
Although the risk is that it may be demoralising to start something which is too far outside of your comfort zone.
Tbc, I do generally like the idea of just in time learning. But:
You may not realize when you are hopelessly out of your depth (“doesn’t everyone say that ML is an art where you just turn knobs until things work?” or “how was I supposed to know that the algorithm was going to silently clip my rewards, making all of my reward shaping useless?”)
You may not know what you don’t know. In the example I gave, you probably know very well that you don’t know RL, but you may not realize that you don’t know the right tooling to use (“what, there’s a TensorBoard dashboard I can use to visualize my training curves?”)
Both of these are often avoided by taking courses that (try to) present the details to you in the right order.
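To make the reward clipping example concrete, here is a minimal sketch in plain Python. It is not taken from any particular library: the `clip_reward` helper, the event names and the reward values are all made up for illustration, and the [-1, 1] range is just an assumption about what a default might look like. The point is only that once rewards are clamped, carefully shaped values can all collapse to the same number.

```python
def clip_reward(reward, low=-1.0, high=1.0):
    """Clamp a reward into [low, high] (hypothetical helper, assumed default range)."""
    return max(low, min(high, reward))


# Hypothetical shaped rewards: small bonuses for progress, a large one for success.
shaped_rewards = {
    "step_towards_goal": 5.0,
    "picked_up_key": 20.0,
    "reached_goal": 100.0,
}

# What the learning algorithm would actually see if it clips silently.
clipped = {event: clip_reward(r) for event, r in shaped_rewards.items()}

for event, r in shaped_rewards.items():
    print(f"{event:>18}: shaped={r:6.1f}  clipped={clipped[event]:.1f}")

# Every event now maps to the same reward, so the shaping carries no information.
distinct_after_clipping = set(clipped.values())
print("distinct rewards after clipping:", distinct_after_clipping)
```

If the implementation you are using does something like this, the fix is usually to rescale your shaped rewards into the clipping range or to turn clipping off, but you can only do that once you know it is happening.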
I feel like I’m on both sides of this, so I’ll take the fast.ai course and then immediately jump into whatever seems interesting in PyTorch
Yep, always tricky here. I was actually just reading Reversing Advice just before posting this but wasn’t sure how I should manage this.
Advice is like medication. It should come with similar rules, regulations, restrictions and warnings.
Some advice is over the counter and can be used by almost everyone. Advice should be used in moderation: do not take more than the recommended dose. Prescription medicine is illegal to advertise (in Australia) because it is not useful for everyone and should only be recommended by a health care professional. Some advice does not mix well with other advice, and care should be taken when mixing advice. Do not take advice that has been recommended to someone else, as it may not apply to you. A particular problem may have several different pieces of advice that help with it, but none of them works for everyone, so you may need to try a few before you find the one that works for you.
Having said that, I think I would default to aiming for the higher thing when you are not sure. If you aim high you may fall short, and if you aim low you can still only fall short. So if you’re on the margin, start with a deep RL project. You might quickly find that it’s hard to do and fall back to doing Spinning Up.
If symptoms persist, please consult your health care professional.
(See response to rory_greig above)
Thanks for this!
I feel both held back and out of my depth in this, so this and the comments have helped my perspective. Thank you for writing this!
I massively agree with the idea of “just do a project”, particularly since it’s a better way of practising the kind of research skills (like prioritisation and project management) that you will need to be a successful researcher.
I suppose the challenge may be choosing a topic for your project, but reaching out to others in the community may be one good avenue for harvesting project ideas.
What are your thoughts on re-implementing existing papers? It can be a good way to develop technical skills, and maybe a middle ground between learning pre-requisites and doing your own research project? Or would you say it’s better to just go for your own project?
The best thing to do is the thing that works for YOU.
Yes, reimplementing existing papers is great. Talking to others in the community for ideas is great if you can.
I don’t think there is a right way for everyone. So if you are already learning a lot through re-implementing or something else, then just ignore most of my advice. Also, if my advice isn’t helpful for you, then try one of the other ideas.