It seems plausible that there are ≥100,000 researchers working on ML/AI in total. That’s a ratio of ~300:1, capabilities researchers:AGI safety researchers.
Barely anyone is going for the throat of solving the core difficulties of scalable alignment. Many of the people who are working on alignment are doing blue-sky theory, pretty disconnected from actual ML models.
One question I’m always left with is: what is the boundary between being an AGI safety researcher and a capabilities researcher?
For instance, my friend is getting his PhD in machine learning; he barely knows about EA or LW and definitely wouldn’t call himself a safety researcher. However, when I talk to him, it seems like the vast majority of his work deals with figuring out how ML systems act when put in foreign situations wrt the training data.
I can’t claim to really understand what he is doing, but it sounds to me a lot like safety research. And it’s not clear to me this is some “blue-sky theory”: a lot of the work he does is high-level maths proofs, but he also does lots of interfacing with ML systems and testing things on them. Is it fair to call my friend a capabilities researcher?
I have only dabbled in ML, but this sounds like he may just be testing how well models generalize, i.e. evaluating whether they are overfitting or underfitting the training data based on their performance on test data (data withheld from training that the model has never seen). This is often done to tweak the model and improve its performance.
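For concreteness, here is a minimal sketch of that kind of evaluation, assuming scikit-learn and a synthetic dataset (not his actual setup): withhold a test set the model never trains on, then compare train and test accuracy to gauge over/underfitting.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Purely illustrative synthetic data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Withhold 25% of the data from training entirely.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")

# A large gap (high train, much lower test accuracy) suggests overfitting;
# low accuracy on both suggests underfitting. Either way you'd tweak the
# model (capacity, regularization, more data) and re-evaluate.
```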
I definitely have very little idea what I’m talking about, but I guess part of my confusion is that inner alignment seems like a capability of AI? Apologies if I’m just confused.
“ML systems act when put in foreign situations wrt the training data.”
Could you elaborate on this more? My guess is that he could be working on the ML ethics side of things, which is great, but different from the Safety problem.
I don’t remember the specifics, but he was looking at whether you could make certain claims about how models behave on data outside the training set, based on the shape and characteristics of the training data. I know that’s vague, sorry; I’ll try to ask him and get a better summary.
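If it helps, my very rough guess at what that kind of experiment might look like in code, with a toy dataset and model that are purely illustrative and not anything he actually uses: train on one region of input space, then measure how much predictions degrade on data drawn from outside that region.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

def make_data(low, high, n=1000):
    # Toy regression problem: noisy sine wave over a chosen input range.
    X = rng.uniform(low, high, size=(n, 1))
    y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(n)
    return X, y

# Training data covers only [-1, 1]; the "foreign" data lies in [1, 2].
X_train, y_train = make_data(-1.0, 1.0)
X_in, y_in = make_data(-1.0, 1.0)    # in-distribution test set
X_out, y_out = make_data(1.0, 2.0)   # out-of-distribution test set

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)

print("in-distribution MSE:    ", mean_squared_error(y_in, model.predict(X_in)))
print("out-of-distribution MSE:", mean_squared_error(y_out, model.predict(X_out)))
```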