I think there are probably a few things that some EA orgs could improve and I hope to write a post about it soon. In the meantime, it might be useful to explain where some of these high numbers come from:
1. Un-timed work test (e.g. OpenPhil research analyst):
I think most EA orgs underestimate how much time a work test takes. Take, for example, the conversation notes test in OpenPhil’s application procedure. The email instructions for the test include the following line: “Historically, we think people have spent 2-8 hours on this assignment.” But there is no indication of how much time you should, or are allowed to, spend. And since everyone knows the process is really competitive, and your results keep improving the more time you invest, many people invest a lot of time. I spent 16 hours on the task. I asked three other people how much time they had spent: 8, 16, and 24 hours.
2. Research proposals (e.g. FHI research scholar programme, OpenPhil biosecurity early career researcher grant):
Writing a research proposal just takes a lot of time. I spent 30 hours on my proposal for FHI. I know of 4 other people who applied. These are the times they spent on the proposal (full-time): one day, one week, one week, several weeks.
3. Trying to be really well prepared (my own fault, no one forced me to do that):
Knowing that the positions are competitive, I would often spend several hours preparing for (later-stage) interviews. For example, when applying for the CEA local group specialist role, I spent 4-5 hours reading and thinking about CEA’s strategy for movement building.
4. Travel time:
As stated in the post, I counted travel time at 50%. And Oxford is really far off :-)
---------
So depending on what exactly Wave’s application process looks like, I might have spent more than 10 hours on it as well :-)
Thanks for flagging the issue with the conversation notes test. It was simply an oversight not to explicitly say “Please don’t spend more than X hours on this work test,” and I’ve now added such a sentence to our latest draft of those work test instructions. We had explicit time limits for our other two tests.
I would advocate for controllably timed work tests whenever possible. Simply saying “please don’t spend more than X hours on this work test” leaves an opportunity to cheat by spending more time. The incentives to cheat are strong, because:
The suggested time limits are usually tight, so spending additional time will improve your results.
Applicants know the application process is highly competitive.
Applicants know that EA organisations put a lot of value on work test performance.
If you have enough applicants, some will cheat, and they will gain a significant advantage. In rare cases, this may even deter people from applying. There was one position where I was planning to apply but then didn’t, because it had a non-controllably timed work test (I don’t want to cheat, somebody probably will cheat, and I am not super-well qualified for the position anyway, so I would really need to shine in the work test → not worth applying). (I admit that this deterrence probably doesn’t happen often.)
Great online tools for running controllably timed work tests exist.
(I realize that it is not always possible to control the time limit, e.g. when the task is too long to be done in one sitting. I have no recommendation for what to do then, other than that I think Jonas Vollmer’s comment in this thread seems reasonable).
Huh. I’m really surprised that they find this useful. One of the main ways that Wave employees’ productivity has varied is in how quickly they can accomplish a task at a given level of quality, which varies by an order of magnitude between our best and worst candidates. (Or equivalently, how good of a job they can do in a fixed amount of time.) It seems like not time-boxing the work sample would make it much, much harder to make an apples-to-apples quality comparison between applicants, because slower applicants can spend more time to reach the same level of quality.
Two points that speak against this view a bit:
It seems easier to increase the efficiency of your work than its quality. All else equal, I’m tentatively more interested in people who can do very high-quality work inefficiently than in people doing mediocre work quickly – because I expect that the former are more likely to eventually do high-quality, highly efficient work.
Some people tend to get very nervous with timed tests and mess up badly; it seems good to give them the opportunity to prove themselves in a less stressful environment.
My current view is to ask for both timed and untimed tests, and make the untimed tests very simple/short (such that you could complete it in 20 minutes if you had to and there’s very little benefit to spending >2h on it).
It seems easier to increase the efficiency of your work than the quality.
In software engineering, I’ve found the exact opposite. It’s relatively easy for me to train people to identify and correct flaws in their own code–I point out the problems in code review and try to explain the underlying heuristics/models I’m using, and eventually other people learn the same heuristics/models. On the other hand, I have no idea how to train people to work more quickly.
(Of course there are many reasons why other types of work might be different from software eng!)
I expect that good software engineers are more likely to figure out for themselves how to be more efficient than to figure out how to increase their work quality. So it’s not obvious what to infer from “it’s harder for an employer to train people to work faster”—does it just mean that the employer has less need to train the slow, high-quality worker?
Good point, agree it depends on the type of work.
I hadn’t noticed the discrepancy before between the conversation notes test and their other tests, which generally read something like this:
“This test should require somewhere between X and Y hours of work; please send us your work, even if it’s incomplete, after Y hours.”
Adjusting the notes test seems like a good step, or at least asking applicants how much time they spent, so that there’s a clear tradeoff between speed and thoroughness (maybe it’s the case that a slightly messy four-hour test gets as good a score as a better eight-hour test, and Open Phil would be happy to consider both, or something like that).
It’s much more understandable to me for the grants to have labor-intensive processes, since they can’t fire bad performers later so the effective commitment they’re making is much higher. (A proposal that takes weeks to write is still a questionable format IMO in terms of information density/ease of evaluation, but I don’t know much about grant-making, so this is weakly held.)