I worked at OpenAI for three years, from 2021 to 2024, on the Alignment team, which eventually became the Superalignment team. I worked on scalable oversight, as part of the team developing critiques as a technique for using language models to spot mistakes in other language models. I then worked to refine an idea from Nick Cammarata into a method for using language models to generate explanations for features in language models. I was then promoted to managing a team of 4 people working on understanding language model features in context, leading to the release of an open source “transformer debugger” tool.
I resigned from OpenAI on February 15, 2024.
Another way of thinking about this is that, in an overdetermined environment, there seems to be a point at which the impact of EA movement building becomes “causing a person to join EA sooner” rather than “adding another person to EA” (which is the current basis for evaluating EA movement building impact), and the former is much less valuable.
What sort of feedback signals would we get if EA was currently falling into a meta-trap? What is the current state of those signals?
In response to this article, I followed the advice in 1) and thought about where I’d donate in the animal suffering cause area, and ended up donating $20 to New Harvest.
Idea: allow people to sign up to a list. Then, every (week/2 weeks/month) randomly pair up all people on the list and suggest they have a short Skype conversation with the person they are paired with.
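A minimal sketch of how the pairing step might work (assuming the signup list is just a list of names; `pair_up` is a hypothetical helper, not an existing tool):

```python
import random

def pair_up(participants):
    """Randomly pair participants; if the count is odd, fold the extra person into the last pair."""
    shuffled = list(participants)
    random.shuffle(shuffled)
    pairs = [shuffled[i:i + 2] for i in range(0, len(shuffled) - 1, 2)]
    if pairs and len(shuffled) % 2 == 1:
        pairs[-1].append(shuffled[-1])
    return pairs

# Run once per week/2 weeks/month over the signup list and email each pair.
print(pair_up(["Alice", "Bob", "Carol", "Dave", "Eve"]))
```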
80k now has career profiles on doing Software Engineering, Data Science and a Computer Science PhD. I’m in a position where I could plausibly pursue any of these. What is the ratio of effective altruists currently pursuing each of these options, and where do you think adding an additional EA is of most value? (Having this information on the career profiles might be a nice touch.)
Are there any areas of the current software industry that developing expertise in might be useful to MIRI’s research agenda in the future?
I wonder if delaying donations might play a role as a crude comparison of room for more funding between different EA organizations, or reflect a desire to keep all current EA organizations afloat. A donor who wants to support EA organizations but is uncertain about which provides the most value might choose the heuristic “donate to the EA organization that is farthest from its fundraising target at the end of its fundraiser”. If this is the case, providing better information for comparing EA organizations might help. Or an “EA Meta-Organization Fund” could be created: individual donors would fund it, and it would then fund the individual organizations (according to room for more funding, to avoid organizations collapsing due to lack of funds, or according to an impact evaluation of the individual organizations).
Would it work to run shorter fundraisers? If it’s the case that most donation money is tied up in this dynamic, then running a shorter fundraiser wouldn’t significantly reduce the amount of money raised (though of course that might not be true).
Maybe price in the cost of staff time spent on the fundraiser: if everyone donates immediately, it takes $X to fill the fundraiser, but if everyone donates at the end, it takes $X + $Y, where $Y is the cost of the additional staff time spent on the fundraiser.
I wonder if there’s a large amount of impact to be had in people outside of the tail trying to enhance the effectiveness of people in the tail (this might look like being someone’s personal assistant or sidekick, introducing someone in the tail to someone cool outside of the EA movement, being a solid employee for someone who founds an EA startup, etc.)? Being able to improve the impact of someone in the tail (even if you can’t quantify what you accomplished) might avert the social comparison aspect, as you would feel able to take at least partial credit for the accomplishments of the EA superstars.
One approach to this could be tying your self-esteem to something other than your personal impact. You might try setting your goal to “be an effective altruist” or “be a member of the effective altruist tribe”. There are reasonable and achievable criteria for this (e.g. the GWWC pledge), and the performance of people in the tail in no way affects your ability to meet them. And while trying to improve one’s own impact is a thing that effective altruists do, you don’t need to achieve any specific level of success to satisfy the self-esteem criteria. A useful supplement to this attitude is a feeling of excitement about where effective altruism is going, a feeling that is actually enhanced by the achievements of the people in the tail. (“I can’t wait to see what these amazing people are going to accomplish!”)
Maybe the status issues in the “lottery ticket” fields could be partially alleviated by having a formal mechanism for redistributing credit for success according to the ex-ante probabilities. For the malaria vaccine example, you could create something like impact certificates covering the output of all EAs working in the area, and distribute them according to an ex-ante estimate of each researcher’s usefulness, or some other agreed-on distribution. In that case, you would end up with a certificate saying you own x% of the discovery of the malaria vaccine, which would be pretty cool to have (and valuable to have, if the impact certificate market takes off).
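A rough sketch of the kind of split this would imply; the researchers and ex-ante weights below are made up purely for illustration:

```python
def certificate_shares(ex_ante_weights):
    """Split 100% of an impact certificate in proportion to ex-ante usefulness estimates."""
    total = sum(ex_ante_weights.values())
    return {name: weight / total for name, weight in ex_ante_weights.items()}

# Hypothetical ex-ante estimates agreed on before the research outcome is known.
shares = certificate_shares({"Researcher A": 3.0, "Researcher B": 1.0, "Researcher C": 1.0})
print(shares)  # {'Researcher A': 0.6, 'Researcher B': 0.2, 'Researcher C': 0.2}
```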
If anyone is ever at a point where they are significantly discouraged by thoughts along these lines (as I’ve been at times), there’s an Effective Altruist self-help group where you can find other EAs to talk to about how you’re feeling (and it really does help!). The group is hidden, but if you message me, I can point you in the right direction (or you can find information about it on the sidebar of the Effective Altruist Facebook group).
I haven’t heard of anything like this. It’s the sort of thing that might feel less important than identifying/supporting top charities to most EAs. It might also require some expertise both in the area of the charity and in EA, to actually provide value. It’s the sort of thing that might be a good fit for someone with, say, a commitment to an existing organization, but with an interest in EA.
Another application of the Effectiveness-alone strategy might be to create an EA organization aiming to improve the effectiveness of charities by applying EA ideas (as opposed to evaluating charities to find the best ones).
When considering working for a startup/company with significant positive externalities, would it be far off to estimate your share of impact as (estimate of the total impact of the company vs. the world where it did not exist) * (your equity share of the company)?
This seems easier to estimate than your impact on the company as a whole, and matches up with something like the impact certificate model (equity share seems like the best estimate we would have of what an impact certificate division might look like). It’s also possible that there are distortions in the allocation of money that would lead to an underestimate of true impact.
On the downside, it doesn’t fully account for replaceability, and I’m not sure if it meshes with the assessment that “negative externalities don’t matter too much in most cases because someone else would take your job”, which seems to be the typical EA position.
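To make the estimate concrete, here is the back-of-the-envelope calculation it implies; the $50M counterfactual impact and 0.1% equity stake are made-up numbers for illustration only:

```python
def impact_share(total_counterfactual_impact, equity_fraction):
    """Estimated personal share of impact: the company's counterfactual impact times your equity."""
    return total_counterfactual_impact * equity_fraction

# Made-up example: the company creates the equivalent of $50M of counterfactual value,
# and you hold 0.1% of the equity.
print(impact_share(50_000_000, 0.001))  # 50000.0
```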
For people who have worked in the technology sector, what form has the most useful learning come in? (e.g. learning from school, learning while working on a problem independently, learning while collaborating with people, learning from reading previous work/existing codebases, etc.)
It seems like the way to make the most money from working in tech jobs would be to identify startups/companies that are likely to do well in the future, work with them, and make money from the equity you get. For example, Dustin Moskovitz suggests that you can get a better return from trying to be employee #100 at the next Facebook or Dropbox than by being an entrepreneur. Any thoughts on how to identify startups/companies likely to do well/be valuable to work for, or at least rule out ones likely to fail? (It seems like the problem of doing this from an investor standpoint is well investigated, and hard to do, but the employee standpoint is different.)
It seems like the correct approach would be to make predictions on the future performance of a bunch of startups and track the results, in order to calibrate your predictive model, but one would need time to build up a prediction history. Short of this, there might be heuristics that are somewhat helpful; e.g. I’d guess that startups with more funding or more employees are more likely to succeed, since more people have confidence in them and they have already survived for some period of time, but this also indicates that you are likely to get less equity.
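A minimal sketch of what tracking and scoring such predictions might look like (the startups, probabilities, and outcomes below are made up; a Brier score is one simple way to check calibration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prediction:
    startup: str
    p_success: float                  # predicted probability of "doing well" over some horizon
    succeeded: Optional[bool] = None  # filled in once the outcome is known

def brier_score(predictions):
    """Mean squared error between predicted probabilities and resolved outcomes (lower is better)."""
    resolved = [p for p in predictions if p.succeeded is not None]
    return sum((p.p_success - float(p.succeeded)) ** 2 for p in resolved) / len(resolved)

# Record predictions now, resolve them in a few years, then check how well-calibrated you were.
log = [
    Prediction("Startup A", 0.7, succeeded=True),
    Prediction("Startup B", 0.2, succeeded=False),
    Prediction("Startup C", 0.5),  # not yet resolved
]
print(brier_score(log))  # 0.065
```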
What skills/experience do you think will be useful to have in 3-5 years, either in general or for EA projects?
This is probably not answerable until you’ve made some significant progress in your current focus, but it would be nice to get a sense of how well the pool of people available to work on technology-for-good projects lines up with the skills those projects require (for example, are there a lot of machine learning experts who are willing to work on these problems, but not many projects where that is the right solution? Is there a shortage of, say, front-end web developers who are willing to work on these kinds of projects?).