Focused on impact evaluation, economics, and (lately) animal welfare
geoffrey
Quickly throwing in a related dynamic. I suspect animal welfare folks have more free time to post online.
Career advancement in animal welfare is much more generalist than in global health & development. This means there aren’t as many career goals to ‘grind’ towards, leaving more free time for public engagement. Alternative proteins feel like a space where one can specialize, but that’s all I can think of. I’d love to know of other examples. In contrast, global health & development has many distinct specialties that you have to focus on if you want to grow your career. It’s not uncommon for someone’s career to be built on an incredibly narrow topic like, say, the implications of decentralization for regulating groundwater pollution. There are even ‘playbooks’ for breaking into the space, and they rarely align with writing EA Forum posts, or really any public writing.
Thanks, this is exactly what I’m looking for.
Accuracy isn’t too important here. More interested in how people approach this
This advice sounds right to me if you already have the signal in hand and are deciding whether to job search.
But if you don’t have the signal, then you need to spend time getting it. And then I think the advice hinges on how long it takes to get the signal. Short time-capped projects are great (like OP’s support on the 80,000 Hours CEO search). But for learning and then demonstrating new skills on your own, it’s not always clear how long you’ll need.
Ooh good idea. I should do more of that.
I do think this can run into Goodhart’s Law problems. For example, in the early 2010s, back when being a self-taught software engineer was much more doable, it was a strong signal when someone had a GitHub profile with a few side projects, each with a few months of work behind them. A GitHub profile correlated with a lot of other desirable things. But then everyone got that advice (including me) and the signal got rapidly discounted.
So I guess I’d qualify that with: press really hard on why the signal is impressive, and also ask people explicitly if they agree with the signals you heard from others (e.g. “I heard from people in the field that signal X is good / bad, do you agree with that?”)
I like this advice a lot but want to add a quick asterisk when transitioning to a new field.
It’s really really hard to tell what an expensive signal is without feedback. If you’re experienced in a field or you hang out with folks who work in a field, then you’ve probably internalized what counts as an “impressive project” to some degree.
In policy land, this cashes out as advice to take a job you don’t want in the organization you do want, because that’s how you’ll learn what’s valuable and what’s not. Or taking low-paid internships and skilled volunteering roles. Or dropping a lot of money to attend a conference in your target field.
It’s also really hard to know the steps to executing the “impressive project” (which is why the signal is so expensive!). With internships and skilled volunteering, you’ll get supervision. And even a light touch can prevent you from investing a ton of time in something that doesn’t matter. Or get reassurance that task X really does take everyone a long time so don’t feel bad about the time sink.
But with grants or independent work, you’ll have to seek out the feedback yourself, brief people on the project and hope you’ve given enough context for useful feedback, and also hope you picked someone who knows your area well enough. (I haven’t had success here and I’m not sure how realistic it is for most people.)
Work tests are awesome here since they’re mini-projects. But the feedback is often noisy and hard to interpret, since orgs don’t have good incentives to specialize in concrete feedback. I’ve interpreted this feedback wrong in both directions (first being too optimistic about a generic but lengthy “there were many strong candidates”, then too pessimistic about terse but personalized rejections encouraging me to still consider research as a career).
The point I’m trying to make is that the idea of “cheap tests, expensive signal” is probably a lot easier for mid-career folks to apply independently. But for people without any experience, the advice depends on whether you can get supervision from an organization. Without that, it may be better to just “get your foot in the door” in any way possible. Maybe a “good enough, cool-sounding project” helps to demonstrate interest, but it’s tough for people to perform at a 1-year-of-experience level before they have that 1 year of experience.
Strong agree. By no means am I suggesting organizations outsource or cancel more of their non-core work. It’s hard for organizations to define what counts as non-core, non-core work needs a lot of context, and a lot of grunt work is genuinely “real work” that people don’t appreciate.
But from an individual POV, I wanted to make sense of the feeling that extra hours could sometimes be increasing in value even when I was very tired. And I think it’s this dynamic with some tasks or career goals where the last N% is where most of the rewards are. So spending more time once you get there is a big deal.
I believe Claudia Goldin calls these “greedy jobs”.
Decreasing focus over time may not mean decreasing productivity:
Suppose you want to double your productivity by doubling your work hours from 30 to 60 per week. Standard advice will say this is silly, since focus decreases over time. You may still increase your productivity but it will scale slower than your work hours.
But this assumes all assigned work is equally important. In reality, many jobs have peripheral tasks that must be done before your core tasks (or your “real work”). Civil servants have reporting requirements, academic researchers have teaching obligations, and individual contributors everywhere have to attend meetings so managers can coordinate direction.
Suppose the non-core tasks take 20 hours per week. Then going from a 30-hour to a 60-hour workweek isn’t just doubling your core task hours; it’s quadrupling your core task hours from 10 to 40! And that quadrupling of core task hours can outweigh the diminishing focus over time. It can even mean that the last 20 hours are more productive than the first 20 hours.
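A quick back-of-the-envelope sketch of this dynamic. The 20-hour peripheral load and the square-root focus decay are illustrative assumptions for the sake of the toy model, not claims about any real job:

```python
# Toy model: productivity when peripheral tasks eat a fixed weekly budget.
# Assumptions (illustrative only): 20h/week of peripheral work, and core
# output that scales like the square root of core hours (diminishing focus).

def core_hours(total_hours, peripheral_hours=20):
    """Hours left for core tasks after fixed peripheral work."""
    return max(total_hours - peripheral_hours, 0)

def core_output(total_hours, peripheral_hours=20):
    """Core output under an assumed sqrt-style focus decay."""
    return core_hours(total_hours, peripheral_hours) ** 0.5

# Going from 30h to 60h quadruples core hours (10 -> 40)...
print(core_hours(30), core_hours(60))  # 10 40
# ...so even with sqrt-diminishing focus, core output still doubles,
# i.e. it scales linearly with total hours rather than sublinearly.
print(core_output(60) / core_output(30))  # 2.0
```

Under these assumptions the focus penalty is exactly cancelled by the fixed-cost structure; with a gentler decay, the extra hours would be better than linear.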
Now 20 hours of peripheral tasks is admittedly an extreme example. But it may not be that far off for modeling career advancement. Promotions are based partly on stretch assignments (or “performing above your level”) and you won’t get to work on stretch assignments all the time. Managers may split your time between your current job and the job you want to promote into.
Once you get to a certain level of seniority and organizational maturity, then more of your hours become core task hours. So diminishing focus more directly translates into diminishing productivity. But I think the earlier you are in your career, the more exploration you’re doing, and the further you are from your target job, the more likely you’ll want those extra hours.
Any tips for running discussion groups on the WIT sequences? I’m vaguely interested in doing one on the CURVE sequence (which I’ve read deeply) or the CRAFT sequence (which I’ve only skimmed). However, the technical density seems like a big barrier, and perhaps some posts are more key than others.
What are selfish lifestyle reasons to work on the WIT team?
Is it fair to say the work WIT does is unusual outside of academia? What are closely related organizations that tackle similar problems?
How does your team define “good enough” for a sequence? What adjustments do you make when you fall behind schedule? Cutting individual posts? Shortening posts? Spending more time?
How much does the direction of a sequence change as you’re writing it? It seems like you have a vision in mind when starting out, but you also mention being surprised by some results.
Can you tell us more about the structure of research meetings? How frequently do individual authors chat with each other and for what reason? In particular, the CURVE sequence feels very intentionally like a celebration of different “EA methodologies”. Most of the posts feel individual before converging on a big cost-effectiveness analysis.
Much of your work is numerical simulation over discrete choices. Have there been attempts to define more “closed-form” analytical equations? What are pros and cons here?
What are the main constraints the WIT team faces?
Definitely wish I had read and believed this when I was just out of college.
One thing that surprised me once I got my ‘dream job’ was how behind I was on soft skills. I think if I had ‘lowered my bar’ earlier, I would have had more practice in communicating concisely, staying cool in stressful situations, and building work relationships.
Not sure if that path would have been more or less impactful in expectation, but there are definitely benefits to ‘lower ambition’ that I’m only appreciating now
Not knowing anything else about your friend, CEA intro resources + saying you’d be excited to discuss it sometime sounds like the best bet.
Cruxes here include:
How deeply does your friend want to learn about EA? They might only want to engage with it for a week or sporadically. Or they may want to know that longtermism is a thing but not go through any (or much) of the moral calculus
How does their disability manifest? The little bit I know about intellectual disabilities suggests that it’s hard to know in advance how it affects your learning, even for the person who has the disability. Struggling with math and stats is very common so that doesn’t tell me much.
Not knowing either of these makes me suspect you should do the same as usual but mention the community’s not always the best at communicating / makes stuff more complicated than it needs to be
Project-based learning seems to be an underappreciated bottleneck for building career capital in public policy and non-profits. By projects, I mean subjective problems like writing policy briefs, delivering research insights, lobbying for political change, or running community events. These have subtle domain-specific tradeoffs without a clean answer. (See the Project Work section in On-Ramps Into Biosecurity)
Thus the lessons can’t be easily generalized or made legible the way a math problem can be. With projects, even the very first step of identifying a good problem is tough. Without access to a formal network, you can spend weeks on a dead end only realizing your mistakes months or years after the fact.
This constraint seems well-known to professionals in the network, as organizers of research fellowships like SERI MATS describe their programs as valuable and highly in-demand, yet constrained in how many people they can train.
I think operations best shows the surprising importance of domain-specific knowledge. The skill set looks similar across fields, which would imply some exchangeability between the private sector and the social sector. But in practice, organizations want you to know their specific mission very well, and they’re willing (correctly or incorrectly) to hire a young Research Assistant over, say, someone with 10 years of experience at a Fortune 500 company. That domain knowledge helps you internalize the organization’s trade-offs and prioritize without using too much senior management time.
Emphasizing this supervised project-based learning mechanism of getting domain-specific career capital would clarify a few points.
With school, it would
emphasize that textbook knowledge is both necessary yet insufficient for contributing to social sector work
show the benefits of STEM electives and liberal arts fields, where the material is easier from a technical standpoint but you work on open-ended problems
illustrate how research-based master’s degrees in Europe tend to be better training than the purely coursework-based ones in the US (IMHO, true in economics)
With young professionals, it would
highlight the “Hollywood big break” element of getting a social sector job, where it’s easier to develop your career capital after you get your target job and get feedback on what to work on (and probably not as important before that)
formalize the intuition some people have about “assistant roles in effective organizations” being very valuable even though you’re not developing many hard skills
With discussions on elitism and privilege, it would
give a reason for the two-tier system many social sectors seem to have, where stable jobs require years of low-paid experience, and the financially unstable training opportunities require significant sacrifice even to access
perhaps inform some informational interventions like books highlighting the hidden curriculum in executing projects or communicating with stakeholders (Doing Economics: What You Should Have Learned in Grad School But Didn’t or The Unspoken Rules: Secrets to Starting Your Career Off Right)
I always read therapeutic alliance as advice for the patient, where one should try many therapists before finding one that fits. I imagine therapists are already putting a lot of effort on the alliance front
Perhaps an intervention could be an information campaign telling patients more about this? I feel it’s not well known or obvious that you can (1) tell your therapist their approach isn’t working and (2) switch around a ton before potentially finding a fit
I haven’t looked much into it though
Love this and excited to see more of it. (3) is the biggest surprise for me and I think I’m more positive on education now.
Interested to hear your thoughts on growth diagnostics if you ever get around to it
P.S. I imagine you’re too busy to respond, but I’d be curious to hear if these findings surprised you / what updates you made as a result
EA organizations often have to make assumptions about how long a policy intervention matters in calculating cost-effectiveness. Typically people assume that passing a policy is equivalent to having it in place for around five years more or moving the start date of the policy forward by around five years.
I am really really surprised 5 years is the typical assumption. My conservative guess would have been ~30 years persistence on average for a “referendum-sized” policy change.
Related, I’m surprised this paper is a big update for some people. I suppose that attests to the power of empirical work, however uncertain, for illuminating the discussion on big picture questions.
How Much Does Performance Differ Between People by Max Daniel and Benjamin Todd goes into this
Also there’s a post on being “vetting-constrained” that I can’t recall off the top of my head. The gist is that funders are risk-averse (not in the moral sense, but in the sense of relying on elite signals) because Program Officers don’t have as much time / knowledge as they’d like for evaluating grant opportunities. So they rely more on credentials than is ideal
I liked this a lot. For context, I work as an RA on an impact evaluation project. I have a light interest in / familiarity with meta-analysis and machine learning, but I did not know what surrogate indices were going into the paper. Some comments below, roughly in order of importance:
Unclear contribution. I feel there are 3 contributions here: (1) an application of the surrogate method to long-term development RCTs, (2) a graduate-level intro to the surrogate method, and (3) a new M-Lasso method, which I mostly ignored. I read the paper mostly for the first 2 contributions, so I was surprised to find out that the novel contribution was actually M-Lasso
Missing relevance for “Very Long-Run” Outcomes. Given the mission of Global Priorities Institute, I was thinking throughout how the surrogate method would work when predicting outcomes on a 100-year horizon or 1000-year horizon. Long-run RCTs will get you around the 10-year mark. But presumably, one could apply this technique to some historical econ studies with (I would assume) shaky foundations.
Intuition and layout are good. I followed a lot of this pretty well despite not knowing the fiddly mechanics of many methods. And I had a good idea of what insight I would gain if I dived into the details of each section. It’s also great that the paper led with a graph diagram and progressed from a simple kitchen-sink regression before going into the black-box ML methods.
Estimator properties could use more clarity.
Unsure what “negative bias” is. I don’t know if the “negative bias” in the surrogate index is an empirical result arising from this application, or a theoretical result where the estimator is biased in a negative direction. I’m also unsure if this is attenuation (biasing towards 0) or an honest-to-god negative bias. The paper sometimes mentions attenuation and other times negative bias, but as far as I can tell, there’s only one surrogacy technique used
Is the surrogate index biased and inconsistent? Maybe machine learning sees this differently, but I think of estimators as ideally being unbiased and consistent (i.e. consistent meaning more probability mass around the true value as sample size tends to infinity). I get that the surrogate index has a bias of some kind, but I’m unclear on whether there’s also the asymptotic property of consistency. And at some point a limit is mentioned, but not what it’s a limit with respect to (larger sample size within each trial is my guess, but I’m not sure)
How would null effects perform? I might be wrong about this but I think normalization of standard errors wouldn’t work if treatment effects are 0...
Got confused on the relation between the Prentice criterion and regular unconfoundedness. Maybe this is something I just have to sit down and learn one day, but I initially read the Prentice criterion as a standard econometric assumption of exogeneity. But then the theory section mentions the Prentice criterion (Assumption 3) as distinct from unconfoundedness (Assumption 1). It’s good the assumptions are spelled out, since that pointed out a bad assumption I was working with, but perhaps this can be clarified.
Analogy to Instrumental Variables / mediators could use a bit more emphasis. The econometric section (lit review?) buries this analogy towards the end. I’m glad it’s mentioned since it clarifies the first-stage vibes I was getting through the theory section, but I feel it’s (1) possibly a good hook to lead into the theory section and (2) something worth discussing a bit more
Could expand Table 1 with summary counts on outcomes per treatment. 9 RCTs sounds tiny, until I remember that these have giant sample sizes, multiple outcomes, and multiple possible surrogates. A summary table of sample size, outcomes, and surrogates used might give a bit more heft to what’s forming the estimates.
Other stuff I really liked
The “selection bias” in long-term RCTs is cool. I like the paragraph discussing how these results are biased by what gets a long-term RCT. Perhaps it’s worth emphasizing this as a limitation in the intro, or perhaps it’s a good follow-on paper. Another idea is how surrogates would perform with dynamic effects that grow over time. Urban investments, for example, might have no effect until agglomeration kicks in.
The surprising result that surrogates are more precise than actual RCT outcomes. This was a pretty good hook for me, but I could have easily passed over it in the intro. I also think the result here captures the core intuition of the bias-variance tradeoff + surrogate assumption in the paper quite strongly.
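For anyone who, like me, hadn’t seen surrogate indices before reading the paper: here’s a minimal sketch of the basic two-step idea. This is the simple OLS version, not the paper’s M-Lasso estimator, and all data and effect sizes are simulated purely for illustration:

```python
# Minimal sketch of the surrogate-index idea (simple two-step OLS version):
# 1) In a dataset where both surrogates S and the long-run outcome Y are
#    observed, learn the surrogate index E[Y | S].
# 2) In the experimental sample (where only S is observed), predict Y from S
#    and compare mean predictions across treatment arms.
# All data simulated; true long-run effect is 0.3*1.0 + 0.3*0.5 = 0.45.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# --- Observational/long-run dataset: S and Y both observed ---
S_obs = rng.normal(size=(n, 2))
Y_obs = 1.0 * S_obs[:, 0] + 0.5 * S_obs[:, 1] + rng.normal(size=n)

# Step 1: fit the surrogate index E[Y | S] by OLS
X = np.column_stack([np.ones(n), S_obs])
beta = np.linalg.lstsq(X, Y_obs, rcond=None)[0]

# --- Experimental sample: treatment shifts the surrogates, Y unobserved ---
T = rng.integers(0, 2, size=n)
S_exp = rng.normal(size=(n, 2)) + 0.3 * T[:, None]
Y_hat = np.column_stack([np.ones(n), S_exp]) @ beta

# Step 2: treatment effect on the *predicted* long-run outcome
effect = Y_hat[T == 1].mean() - Y_hat[T == 0].mean()
print(round(effect, 2))  # close to the true 0.45
```

The precision result above makes sense in this frame: the prediction step averages away outcome noise, trading a little bias (if the surrogacy assumption fails) for lower variance.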
I’ve read conflicting things about how individual contributor skills (writing the code) and people management skills relate to one another in programming.
Hacker News and the cscareerquestions subreddit give me the impression that they’re very separate, with many complaining about how advancement dries up on a non-management track.
But I’ve also read a few blog posts (which I can’t recall) arguing the most successful tech managers / coders switch between the two, so that they keep their technical skills fresh and know how their work fits in a greater whole.
What’s your take on this? Has it changed since starting your new job?
Hey John, this is very cool to read. I like the focus on what surprised you as a founder (and maybe newcomer?) in the mental health field.
I’m curious to hear more about the implementation details. Could you tell me more about the length, intensity, and duration of a typical treatment program? I saw 6 sessions in a graph, which makes me think this is a once-a-week program with 1-2 hour sessions over 1-2 months
Fewer sessions is a reliable way to reduce cost, but my understanding is there’s a U-shaped curve to cost-effectiveness here: 1 session doesn’t have enough benefits, but 100 sessions cost too much without adding much more benefit.
Also, are you targeting specific conditions? I see improvement in insomnia but that can arise from a sleep intervention or a general CBT course too