Writing about my job: Data Scientist, CDC

A post covering the topic and content of the author’s current career as a data scientist at the Centers for Disease Control and Prevention (CDC).

Context: As per You should write about your job and Writing about your job is (still) great — consider doing it, forum-users have offered information on their careers, which are typically highly relevant to EA goals, alongside their thoughts on the work they’ve done through the career; in a similar fashion, I am providing information on my current job doing data science in epidemiology, as I think sharing my experiences in this regard may benefit the EA community.

Readership: This post might be particularly valuable to the following people (NB: this is by no means an exhaustive list): graduate or undergraduate students in STEM fields or people with STEM degrees (particularly Biology, Computer Science, Statistics, or Physics), other data scientists, those who are interested in epidemiology, and those who are interested in or work in biosecurity or pandemic preparedness.

Disclaimer: Some of the information I provide is deliberately vague, so that I can avoid being identified (I highly value online anonymity). Forgive me for this—if there is something more you would like to know, my DMs are open (I usually respond within 48 hours).


Since I am discussing life as a Data Scientist, I cannot help but reproduce a quote I read not too long ago in Writing about my job: Data Scientist by Gavin (I would appreciate it if someone could provide additional information about the quote):

Data Scientist: Person who is worse at statistics than any statistician & worse at software engineering than any software engineer.

—Will Cukierski


Application Process

What was the application process like? How long did it take from application to start?

Historically, I have found Biosecurity and AI Safety to both be highly important cause-areas. Using 80K Hours in early-mid 2022, I found a listing for Data Scientist and other positions at the CDC. Given my experience with statistical modeling, the positive expected value of this position in biosecurity, and my life-needs, I opted to apply for the Data Scientist position. Following the link, I ended up on USAJOBS, where some additional searching produced another related position. The positions were junior (GS-11) and senior (GS-13) data scientist roles at a newish Center within the CDC.

In late May 2022, I sent in my application, which consisted of my CV and some other personal information (there was not a test task at this stage). There was a delay with one of my applications: the application for the senior role was received in mid-July 2022. It was not until early October that I received a notice indicating that I was referred for the junior but not the senior role, which I believe is reasonable, given my personal background (see next section). After this notice, I had to wait until January for further hiring activity: I received an email from someone in the Center who asked when I was next available for an interview.

In January, I had the interview, which lasted roughly 45 minutes, if I recall correctly. It did not involve any concrete problem solving, but did contain some descriptive questions[1]. A week or so later, I was given a coding test task, which meant that I had passed the interview and moved into the next stage. I had one week to complete the task. It was flexible (I could choose which programming language I used) and maybe of medium-LeetCode difficulty overall (the task had sections of varying difficulty). I was notified via email that I had passed the test task and also received more information on the offer. I then accepted the offer.

Timeline:

  • [0 months post-apply] May 2022: Apply

  • [5 months post-apply] October 2022: Referred

  • [8 months post-apply] January 2023: Interview & Test Task

  • [10 months post-apply] March 2023: Start

  • I’ve been working here now for 6 months


Personal Background

What did my life look like before I began this job? How do I believe my past experiences were helpful with getting and performing the job?

The following points roughly constitute my background at the time of applying [A] and interviewing [I]. If I write [A, I], I mean that I already had experience in the category at the time of applying and had gained more by the time of interviewing.

  • Bachelor’s degree in Math & other STEM field [A]

  • ~3.5 years of statistical modeling [A, I]

  • ~4.5 years of general coding experience [A, I]

  • One research work (DL in biology) published [A]

  • ~4 years of EA activities (local EA org.) [A]

  • ~1.5 years of forecasting experience + code [A, I]

  • 3 research assistantships (intra- and extra-my-school) during college [A]

  • 3-month part-time paid job at EA-org. [I]

  • 3-month part-time unpaid volunteering at other EA-org.

From my conversations later on with those who interviewed me, I believe the strongest signals of my competence, the ones that made me a particularly attractive candidate in their eyes, were the publication and the forecasting experience (some of this forecasting experience was in epidemiology, but not through formal channels).

The statistical modeling and coding experience I had, along with my performance on the coding test task, were lower-level filters for making sure that I would, in expectation, perform the job adequately. I do not think I would have been referred for the position I am now in had there not been a steady background-noise of coding and statistical modeling in my life (college played a large hand in this).

I have not been able to gauge my colleagues’ perceptions of EA, but I think that my experiences with the EA-orgs. and EA in general did not make much of a difference (partially because the work I performed in those organizations did not involve much mathematical modeling). If they did, I would expect them to have been slightly beneficial.

I am not sure where EA-optics, on average, stand at the moment, and I also do not have an accurate mental-model of how those in academia-adjacent orgs. perceive EA, so I am not confident about how the “EA” frame that encapsulated some of my past work affected my hiring prospects.

Possible helpful lesson: In applying to EA-orgs., really make sure the work of the org. is calibrated with what you roughly expect much of your future time to be spent on — do not just [apply, interview, work] there solely or mostly for the reason that the org. is an EA-org. Content > Topic.


Job Content

What typically occurs on the job at the daily, weekly, and monthly intervals?

Time & Place

I have to be on-call for the same 4 hours each work day. The other 4 hours I can fill in between 6AM and 11PM. Place = Remote.

Meetings & Presentations

If we are to think about Meetings & Presentations by the week, then in the first 6 months of my work, the situation looks roughly like this:

  • Monday: [0-2 months] ~30m [2-4 months] ~1h [4-6 months] ~1h

  • Tuesday: [0-2 months] ~30m [2-4 months] ~1.5h [4-6 months] ~1.5h

  • Wednesday: [0-2 months] ~30m [2-4 months] ~30m [4-6 months] ~45m

  • Thursday: [0-2 months] ~45m [2-4 months] ~45m [4-6 months] ~1.5h

  • Friday: [0-2 months] ~1h [2-4 months] ~2h [4-6 months] ~2h

Estimated weekly average hours spent in meetings and presentations:

  • [0-2 months]: 3.25 hours

  • [2-4 months]: 5.75 hours

  • [4-6 months]: 6.75 hours
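As a rough check on the figures above, the weekly totals can be recomputed from the per-day estimates. This is only a sketch: the dictionary below copies the approximate durations from the list, and the 40-hour work week is an assumption (8 hours a day, 5 days a week) used to express each total as a share of working time.

```python
# Approximate daily meeting/presentation time in hours, copied from the
# per-day estimates above. Keys are the three two-month periods.
MEETING_HOURS = {
    "0-2 months": {"Mon": 0.5, "Tue": 0.5, "Wed": 0.5, "Thu": 0.75, "Fri": 1.0},
    "2-4 months": {"Mon": 1.0, "Tue": 1.5, "Wed": 0.5, "Thu": 0.75, "Fri": 2.0},
    "4-6 months": {"Mon": 1.0, "Tue": 1.5, "Wed": 0.75, "Thu": 1.5, "Fri": 2.0},
}

WORK_WEEK_HOURS = 40  # assumption: 8 hours a day, 5 days a week


def weekly_totals(hours_by_period):
    """Sum the daily estimates into one weekly total per period."""
    return {period: sum(days.values()) for period, days in hours_by_period.items()}


if __name__ == "__main__":
    for period, total in weekly_totals(MEETING_HOURS).items():
        share = 100 * total / WORK_WEEK_HOURS
        print(f"{period}: {total:.2f} h/week (~{share:.0f}% of a 40 h week)")
```

The sums reproduce the three weekly figures listed above, so even in the busiest period meetings and presentations take up well under a quarter of the assumed week.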

There is a set of recurring weekly meetings that do not change much in nature and are fairly important to the actual content of the job. There are also presentations on different tools, standard practices, and bureaucratic procedures, as well as presentations from external groups; both kinds occur infrequently but are usually much longer (~1.5-2h).

Coding

The majority of my time not in meetings or presentations is spent in VSCode writing Python, Julia, Stan, or R code (ordered by time spent writing) for epidemiological models (infectious disease modeling). Most of my work to date has involved building a novel mechanistic model for influenza based on a fairly common model architecture (I do not want to be too specific here). Much of the time I spend coding involves support actions, such as learning how a particular package works, debugging, writing tests, and documenting the code. Most of the code I am writing does not reinvent the wheel, i.e., for the most part, I am not designing and implementing custom algorithms. The bulk of the coding skills I’ve picked up come from being exposed to Python and Julia packages and protocols I had not previously been familiar with.
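For readers unfamiliar with the term, a "mechanistic model" in infectious disease modeling usually means a model that encodes how infection spreads, rather than one that only fits a curve to case counts. The sketch below is the textbook SIR (susceptible, infected, recovered) compartmental model, integrated with simple forward Euler steps. It is purely illustrative and is not the model described in this post: the function name and all parameter values are made up for the example and are not fitted to any data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the textbook SIR equations with forward Euler steps.

    dS/dt = -beta * S * I / N
    dI/dt =  beta * S * I / N - gamma * I
    dR/dt =  gamma * I

    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0  # total population, constant in this model
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Illustrative values only: beta / gamma = 1.5 is the basic reproduction number.
    traj = simulate_sir(beta=0.3, gamma=0.2, s0=9990, i0=10, r0=0, days=200)
    peak_time, _, peak_infected, _ = max(traj, key=lambda row: row[2])
    print(f"peak infected ~{peak_infected:.0f} around day {peak_time:.0f}")
    print(f"recovered by day 200 ~{traj[-1][3]:.0f}")
```

Real models of this kind typically add more compartments, age structure, or time-varying transmission, and are fitted to surveillance data rather than run with fixed parameters.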

Learning

My job has been very good at providing opportunities to spend time learning (e.g., writing scratch-code, doing recommended textbook exercises, reading papers, using some new computational tool). Maybe, on average, 10% of my non-meeting-presentation time each week is spent learning something to improve my capabilities as an epidemiological data scientist. Although this has yet to be implemented in my Center, there are likely going to be formal learning-workplans where employees can take part-time online courses. In terms of what I’ve learned, I’d say that I’ve come to better understand the major algorithms for inference (HMC, MCMC, NUTS) from a mathematical standpoint, dynamical systems, downstream coding practices (building production-ready tools), and epidemiological processes in general.
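To give a flavor of what the inference algorithms mentioned above do, here is a minimal sketch of random-walk Metropolis, the simplest member of the MCMC family. It is not HMC or NUTS, which are gradient-based refinements of the same idea. The example is hypothetical: it estimates a proportion `p` from made-up counts (12 positives out of 100 tests) under a flat prior, a case where the exact posterior mean, (positives + 1) / (tests + 2), is known and can be used as a check.

```python
import math
import random


def log_posterior(p, positives, tests):
    """Log of the unnormalised posterior for a binomial proportion (flat prior)."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return positives * math.log(p) + (tests - positives) * math.log(1.0 - p)


def metropolis(positives, tests, n_samples=20000, step=0.05, seed=0):
    """Random-walk Metropolis: propose p' = p + Normal(0, step) and accept
    with probability min(1, posterior(p') / posterior(p))."""
    rng = random.Random(seed)
    p = 0.5  # arbitrary starting point
    current = log_posterior(p, positives, tests)
    samples = []
    for _ in range(n_samples):
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, positives, tests)
        # The proposal is symmetric, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples


if __name__ == "__main__":
    positives, tests = 12, 100  # made-up counts for illustration
    draws = metropolis(positives, tests)[2000:]  # drop the first 2000 as burn-in
    estimate = sum(draws) / len(draws)
    exact = (positives + 1) / (tests + 2)
    print(f"MCMC posterior mean: {estimate:.3f}, exact: {exact:.3f}")
```

In practice, tools such as Stan handle the sampling, so day-to-day work is more about specifying the model than writing the sampler by hand.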


Wage & Benefits

I receive GS-11 pay, which is around 60k USD (unadjusted) annually in 2023. The adjustment is based on which geographic region you live in; my adjusted pay is around 80k USD. My contract automatically converts my position to GS-12 after 1 year of work. There was a sign-on bonus of roughly 8k USD.

There are numerous government health and savings benefits I receive from working at the CDC. I will not list them here, but for more information see here.


Hopefully some of you found this post helpful — regardless, have a nice day.

Notes

  1. ^

    e.g., I was asked (paraphrasing) “How familiar are you with Bayesian Inference?”; upon saying that I was familiar, I was asked to describe or define Bayesian Inference. For the (I believe) 2 occasions where I did not have experience with the concept, I answered honestly that I did not have much experience with it; I think this honesty was appreciated by the interviewers (there were 2 interviewers).