Fwiw, my read is that a lot of “must have an ML PhD” requirements are gatekeeping nonsense. I think you learn useful skills doing a PhD in ML, and I think you learn some skills doing a non-ML PhD (but much less that’s relevant, though physics PhDs are probably notably more relevant than maths). But also that eg academia can be pretty terrible for teaching you skills like ML engineering and software engineering, lots of work in academia is pretty irrelevant in the world of the bitter lesson, and lots of PhDs have terrible mentorship.
I care about people having skills, but think that a PhD is only an OK proxy for them, and would broadly respect the skills of someone who worked at one of the top AI labs for four years straight out of undergrad notably more than someone straight out of a PhD program.
I particularly think that in interpretability, lots of standard ML experience isn’t that helpful, and can actively teach bad research taste and a focus on pretty unhelpful problems.
(I do think that Redwood should prioritise “hiring people with ML experience” more, fwiw, though I hold this opinion much more strongly around their adversarial training work than their interp work.)
I care about people having skills, but think that a PhD is only an OK proxy for them, and would broadly respect the skills of someone who worked at one of the top AI labs for four years straight out of undergrad notably more than someone straight out of a PhD program
I completely agree.
I’ve worked in ML engineering and research for over 5 years at two companies, I have a PhD (though not in ML), and I’ve interviewed many candidates for ML engineering roles.
If I’m reviewing a resume and I see someone has just graduated from a PhD program (and does not have other job experience), my first thoughts are:
This person might have domain experience that would be valuable in the role, but that’s not a given.
This person probably knows how to do lit searches, ingest information from an academic paper at a glance, etc., which are definitely valuable skills.
This person’s coding experience might consist only of low-quality, write-once-read-never rush jobs written to produce data/figures for a paper and discarded once the paper is complete.
More generally, this person might or might not adapt well to a non-academic environment.
I’ve never interviewed a candidate with 4 years at OpenAI on their resume, but if I had, my very first thoughts would involve things like:
OpenAI might be the most accomplished AI capabilities lab in the world.
I’m interviewing for an engineering role, and OpenAI is specifically famous for moving capabilities forward through feats of software engineering (by pushing the frontier of huge distributed training runs) as opposed to just having novel ideas.
Anthropic’s success at training huge models, and doing extensive novel research on them, is an indication of what former OpenAI engineers can achieve outside of OpenAI in a short amount of time.
OpenAI is not a huge organization, so I can trust that most people there are contributing a lot, i.e. I can generalize from the above points to this person’s level of ability.
I dunno, I might be overrating OpenAI here?
But I think the comment in the post at least requires some elaboration, beyond just saying “many places have a PhD requirement.” That’s an easy way to filter candidates, but it doesn’t mean people in the field literally think that PhD work is fundamentally superior to (and non-fungible with) all other forms of job experience.
I agree re PhD skillsets (though I think that some fraction of people gain a lot of high-value skills during a PhD, esp re research taste and agenda setting).
I think you’re way overrating OpenAI though—in particular, Anthropic’s early employees/founders include more than half of the GPT-3 first authors!! I think the company has become much more oriented around massive distributed LLM training runs in the last few years though, so maybe your inference that people would gain those skills is more reasonable now.