I’m sorry to see so many orgs take 10+ hours to get you only partway through the process, let alone multiple 40+ hour processes. This is especially glaring compared to the very low number of orgs that rejected you in under 5 hours.
It sounds like many of these orgs would benefit (both you and themselves!) from improving their evaluations to reject people earlier in the process.
Wave’s current technical interview process for my team takes under 10 hours across 4 stages (assuming you spend 1 hour on your cover letter and resume); the majority of rejections happen after less than 5 hours. The non-technical interview process is somewhat longer, but I would guess still not more than 15 hours, again with the majority of applicants rejected in under 5 hours (the final interview is a full day).
Notably, we do two work samples, a 2hr one (where most applicants are rejected) and a 4-5hr one for the final interview. If I were interviewing for a non-technical role I’d insert a behavioral interview after the first work sample as well. These shorter interviews help us screen out many candidates before we waste a ton of their time. It’s hard for me to imagine needing 8+ hours for a work sample unless the role is extremely complex and requires many different skills.
Wave is trying to do a much easier assessment than most EA orgs are: lots of people have thought about how to hire software engineers, and software engineering is a well-established industry with lots of accumulated wisdom about how orgs should be structured. EA jobs often have much less precedent, so we shouldn’t be surprised that we don’t yet know how to figure out as efficiently whether people are likely to be good fits.
I think the reason the OP had a high fraction of ‘long’ processes has more to do with his being a strong applicant who got through a lot of the early filters. I don’t think a typical ‘EA org’ hiring round passes ~50% of its applicants to a work test.
This doesn’t detract from your other points re. the length in absolute terms. (The descriptions from OP and others read as uncomfortably reminiscent of more senior academic hiring, with lots of people getting burned competing for really attractive jobs.) There may be some fundamental trade-offs (the standard argument about ‘*really* important to get the right person, so we want to spend a lot of time assessing plausible candidates to pick the right one, false negatives at intermediate stages cost more than false positives, etc. etc.’), but an easy improvement (mentioned elsewhere) is to communicate as best as one can the likelihood of success (perhaps broken down by stage) so applicants can make a better-informed decision.
This is why I think Wave’s two-work-test approach is useful; even if someone “looks good on paper” and makes it through the early filters, it’s often immediately obvious from even a small work sample that they won’t be at the top of the applicant pool, so there’s no need for the larger sample.
Per Buck’s comment, I think identifying software engineering talent is a pretty different problem than identifying e.g. someone who is already a good fit for Open Phil generalist RA roles.
A large part of Wave’s engineer hiring process was aimed at assessing fit with the team & the mission (at least when I was there), which seems similar to part of the problem of hiring Open Phil RAs.
Nearly all of Open Phil’s RA hiring process is focused on assessing someone’s immediate fit for the kind of work we do (via the remote work tests), not (other types of) fit with the team and mission.
Not super clear on the distinction you’re drawing; I feel like a lot of “team fit” and “mission fit” flows from stuff like how similar the candidate’s epistemology & communication style are to the firm’s.
Seems like those sorts of things would also bear on a candidate’s immediate fit for the kind of work the firm does.
I think there are probably a few things that some EA orgs could improve and I hope to write a post about it soon. In the meantime, it might be useful to explain where some of these high numbers come from:
1. Un-timed work test (e.g. OpenPhil research analyst):
I think most EA orgs underestimate how much time a work test takes. Take for example the conversation notes test in OpenPhil’s application procedure. The email instructions for the test include the following line: “Historically, we think people have spent 2-8 hours on this assignment.” But there is no indication of how much time you should, or are allowed to, spend. And since everyone knows that the process is really competitive, and your results keep improving as you invest more time, many people invest a lot of time. I spent 16 hours on the task. I asked three other people how much time they had spent, and the answers were 8, 16, and 24 hours.
2. Research proposals (e.g. FHI research scholar programme, OpenPhil biosecurity early career researcher grant):
Writing a research proposal just takes a lot of time. I spent 30 hours on my proposal for FHI. I know of 4 other people who applied. These are the times they spent on the proposal (full-time): one day, one week, one week, several weeks.
3. Trying to be really well prepared (my own fault, no one forced me to do that):
Knowing that the positions are competitive, I would often spend several hours preparing for (later-stage) interviews. E.g. when applying for the CEA local group specialist role, I spent 4-5 hours reading and thinking about CEA’s strategy in movement building.
4. Travel time:
As stated in the post, I counted travel time at 50%. And Oxford is really far off :-)
---------
So depending on what exactly Wave’s application process looks like, I might well have spent more than 10 hours on it as well :-)
Thanks for mentioning the issue with the conversation notes test. It was simply an oversight not to explicitly say “Please don’t spend more than X hours on this work test,” and I’ve now added such a sentence to our latest draft of those work test instructions. We had explicit time limits for our other two tests.
I would advocate for controllably timed work tests whenever possible. Simply saying “please don’t spend more than X hours on this work test” gives the opportunity to cheat by spending more time. Incentives for cheating are strong, because:
The tasks usually have tight time limits, so spending additional time will improve your results.
Applicants know the application process is highly competitive.
Applicants know that EA organisations put a lot of value on work test performance.
If you have enough applicants, some will cheat, and they will gain a significant advantage. In rare cases, this may even deter people from applying. There was one position where I was planning to apply but then didn’t because it had a work test whose timing wasn’t controlled (I don’t want to cheat, somebody probably will cheat, and I am not super-well qualified for the position anyway, so I would really need to shine in the work test → not worth applying). (I admit this deterrence probably doesn’t happen often.)
Great online tools exist for administering controllably timed work tests.
(I realize that it is not always possible to control the time limit, e.g. when the task is too long to be done in one sitting. I have no recommendation for what to do then, other than that I think Jonas Vollmer’s comment in this thread seems reasonable).
Huh. I’m really surprised that they find this useful. One of the main ways that Wave employees’ productivity has varied is in how quickly they can accomplish a task at a given level of quality, which varies by an order of magnitude between our best and worst performers. (Or equivalently, how good a job they can do in a fixed amount of time.) It seems like not time-boxing the work sample would make it much, much harder to make an apples-to-apples quality comparison between applicants, because slower applicants can spend more time to reach the same level of quality.
Two points that speak against this view a bit:
It seems easier to increase the efficiency of your work than the quality. All else equal, I’m tentatively more interested in people who can do very high-quality work inefficiently than in people who do mediocre work quickly – because I expect that the former are more likely to eventually do high-quality, highly efficient work.
Some people tend to get very nervous with timed tests and mess up badly; it seems good to give them the opportunity to prove themselves in a less stressful environment.
My current view is to ask for both timed and untimed tests, and make the untimed tests very simple/short (such that you could complete it in 20 minutes if you had to and there’s very little benefit to spending >2h on it).
In software engineering, I’ve found the exact opposite. It’s relatively easy for me to train people to identify and correct flaws in their own code: I point out the problems in code review and try to explain the underlying heuristics/models I’m using, and eventually other people learn the same heuristics/models. On the other hand, I have no idea how to train people to work more quickly.
(Of course there are many reasons why other types of work might be different from software eng!)
I expect that good software engineers are more likely to figure out for themselves how to be more efficient than how to increase their work quality. So it’s not obvious what to infer from “it’s harder for an employer to train people to work faster” – does it just mean that the employer has less need to train the slow, high-quality worker?
Good point, agree it depends on the type of work.
I hadn’t previously noticed the discrepancy between the conversation notes test and their other tests, which generally read something like this:
“This test should require somewhere between X and Y hours of work; please send us your work, even if it’s incomplete, after Y hours.”
Adjusting the notes test seems like a good step, or at least asking applicants how much time they spent, so that there’s a clear tradeoff between speed and thoroughness (maybe it’s the case that a slightly messy four-hour test gets as good a score as a better eight-hour test, and Open Phil would be happy to consider both, or something like that).
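To make that tradeoff concrete, here is a purely illustrative sketch of a time-adjusted scoring rule (Open Phil has not described any such formula; the function name, target hours, and penalty rate are all made up for this example):

```python
def time_adjusted_score(quality, hours, target_hours=4.0, penalty_per_extra_hour=0.5):
    """Hypothetical scoring rule: start from a raw quality score (say 0-10)
    and subtract a fixed penalty for each hour spent beyond the target,
    so that a messier fast submission can tie a more polished slow one."""
    extra_hours = max(0.0, hours - target_hours)
    return quality - penalty_per_extra_hour * extra_hours

# A slightly messy four-hour test vs. a better eight-hour test:
fast = time_adjusted_score(quality=7.0, hours=4)  # no penalty -> 7.0
slow = time_adjusted_score(quality=9.0, hours=8)  # 9.0 - 0.5 * 4 -> 7.0
```

Under these (arbitrary) parameters the two submissions score equally, which is exactly the “happy to consider both” outcome described above; the real work would be calibrating the penalty so it reflects how much the org actually values speed.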
It’s much more understandable to me for the grants to have labor-intensive processes, since they can’t fire bad performers later so the effective commitment they’re making is much higher. (A proposal that takes weeks to write is still a questionable format IMO in terms of information density/ease of evaluation, but I don’t know much about grant-making, so this is weakly held.)