I guess orgs need to be more careful about who they hire as forecasting/evals researchers in light of a recently announced startup.
Sometimes things happen, but three people at the same org...
This is also a massive burning of the commons. It is valuable for forecasting/evals orgs to be able to hire people with a diversity of viewpoints in order to counter bias. It is valuable for folks to be able to share information freely with folks at such forecasting orgs without having to worry about them going off and doing something like this.
However, this only works if those less worried about AI risks who join such a collaboration don’t use the knowledge they gain to cash in on the AI boom in an acceleratory way. Doing so undermines the very point of such a project, namely, to try to make AI go well. Doing so is incredibly damaging to trust within the community.
Now let’s suppose you’re an x-risk funder considering whether to fund their previous org. This org does really high-quality work, but the argument for them being net-positive is now significantly weaker. This is quite likely to make finding future funding harder for them.
This is less about attacking those three folks and more just noting that we need to strive to avoid situations where things like this happen in the first place. This requires us to be more careful in terms of who gets hired.
There have been some discussions on the EA forum along the lines of “why do we care about value alignment, shouldn’t we just hire whoever can best do the job?”. My answer is that it’s myopic to only consider what happens whilst they’re working for you. Hiring someone or offering them an opportunity empowers them; you need to consider whether they’re someone who you want to empower[1].
Admittedly, this isn’t quite the same as value alignment. Suppose someone were diligent, honest, wise and responsible. You might want to empower them even if their views were extremely different from yours. Stronger: even if their views were the opposite in many ways. But in the absence of this, value alignment matters.
I’d like to suggest a little bit more clarity here. The phrases you use refer to some knowledge that isn’t explicitly stated here. “in light of a recently announced startup” and “three people at the same org” make sense to someone who already knows the context of what you are writing about, but it is confusing to a reader who doesn’t have the same background knowledge that you do.
Once upon a time, some people were arguing that AI might kill everyone, and that EA resources should address that problem instead of fighting malaria. So OpenPhil poured millions of dollars into orgs such as Epoch AI (which got $9 million). Now three people from Epoch AI have created a startup to provide training data to help AI replace human workers. Some people are worried that this startup increases AI capabilities, and therefore increases the chance that AI will kill everyone.
100 percent agree. I don’t understand the entire post because I don’t know the context. I don’t think alluding to something helps; better to say it explicitly.
I tend to agree; better to be explicit, especially as the information is public knowledge anyway.
It refers to this: https://forum.effectivealtruism.org/posts/HqKnreqC3EFF9YcEs/
Also, it is worrying if the optimists easily find financial opportunities that depend on them not changing their minds. Even if they are honest and have the best of intentions, the disparity in returns to optimism is epistemically toxic.
I agree that we need to be careful about who we are empowering.
“Value alignment” is one of those terms that has different meanings to different people. For example, the top hit I got on Google for “effective altruism value alignment” was a ConcernedEAs post, which may not reflect what you mean by the term. Without knowing exactly what you mean, I’d hazard a guess that some facets of value alignment are pretty relevant to mitigating this kind of risk, and other facets are not so important. Moreover, I think some of the key factors are less cognitive or philosophical than emotional or motivational (e.g., a strong attraction toward money increases the risk of defecting; a lack of self-awareness increases the risk of motivated reasoning toward goals one has, in a sense, repressed).
So, I think it would be helpful for orgs to consider what elements of “value alignment” are of particular importance here, as well as what other risk or protective factors might exist outside of value alignment, and focus on those specific things.
Agreed. “Value alignment” is a simplified framing.
Short update (TL;DR): Mechanize is going straight for automating software engineering.
If you only hire people who you believe are intellectually committed to short AGI timelines (and who won’t change their minds given exposure to new evidence and analysis) to work in AGI forecasting, how can you do good AGI forecasting?
One of the co-founders of Mechanize, who formerly worked at Epoch AI, says he thinks AGI is 30 to 40 years away. That was in this video from a few weeks ago on Epoch AI’s YouTube channel.
He and one of his co-founders at Mechanize were recently on Dwarkesh Patel’s podcast (note: Dwarkesh Patel is an investor in Mechanize). I didn’t watch all of it, but it seemed like they were both arguing for longer AGI timelines than Dwarkesh believes in.
I also disagree with the shortest AGI timelines and found it refreshing that within the bubble of people who are fixated on near-term AGI, at least a few people expressed a different view.
I think if you restrict who you hire to do AGI forecasting based on strong agreement with a predetermined set of views, such as short AGI timelines and views on AGI alignment and safety, then you will just produce forecasts that re-state the views you already decided were the correct ones while you were hiring.
I wasn’t suggesting only hiring people who believe in short timelines. I believe that my original post adequately lays out my position, but if any points are ambiguous, feel free to request clarification.
I don’t know how Epoch AI can both “hire people with a diversity of viewpoints in order to counter bias” and ensure that its former employees won’t try to “cash in on the AI boom in an acceleratory way”. These seem like incompatible goals.
I think Epoch has to either:
Accept that people have different views and will have different ideas about what actions are ethical, e.g., they may view creating an AI startup focused on automating labour as helpful to the world and benign
or
Only hire people who believe in short AGI timelines and high AGI risk and, as a result, bias its forecasts towards those conclusions
Is there a third option?
Presumably there are at least some people who have long timelines, but also believe in high risk and don’t want to speed things up. Or people who are unsure about timelines, but think risk is high whenever it happens. Or people (like me) who think X-risk is low* and timelines very unclear, but even a very low X-risk is very bad. (By very low, I mean like at least 1 in 1000, not 1 in 10^17 or something. I agree it is probably bad to use expected value reasoning with probabilities as low as that.)
I think you are pointing at a real tension though. But maybe try to see it a bit from the point of view of people who think X-risk is real enough and raised enough by acceleration that acceleration is bad. It’s hardly going to escape their notice that projects at least somewhat framed as reducing X-risk often end up pushing capabilities forward. They don’t have to be raging dogmatists to worry about this happening again, and it’s reasonable for them to balance this risk against risks of echo chambers when hiring people or funding projects.
*I’m less sure that merely catastrophic biorisk from human misuse is low, sadly.
Why don’t we ask ChatGPT? (In case you’re wondering, I’ve read every word of this answer and I fully endorse it, though I think there are better analogies than the journalism example ChatGPT used).
Hopefully, this clarifies a possible third option (one that my original answer was pointing at).
“I think there is a third option, though it’s messy and imperfect. The third option is to:
Maintain epistemic pluralism at the level of research methods and internal debate, while being selective about value alignment on key downstream behaviors.
In other words:
You hire researchers with a range of views on timelines, takeoff speeds, and economic impacts, so long as they are capable of good-faith engagement and epistemic humility.
But you also have clear social norms, incentives, and possibly contractual commitments around what counts as harmful conflict of interest — e.g., spinning out an acceleratory startup that would directly undermine the mission of your forecasting work.
This requires drawing a distinction between research belief diversity and behavioral alignment on high-stakes actions. That’s tricky! But it’s not obviously incoherent.
The key mechanism that makes this possible (if it is possible) is something like:
“We don’t need everyone to agree on the odds of doom or the value of AGI automation in theory. But we do need shared clarity on what types of action would constitute a betrayal of the mission or a dangerous misuse of privileged information.”
So you can imagine hiring someone who thinks timelines are long and AGI risk is overblown but who is fully on board with the idea that, given the stakes, forecasting institutions should err on the side of caution in their affiliations and activities.
This is analogous to how, say, journalists might disagree about political philosophy but still share norms about not taking bribes from the subjects they cover.
Caveats and Challenges:
Enforceability is hard. Noncompetes are legally dubious in many jurisdictions, and “cash in on the AI boom” is vague enough that edge cases will be messy. But social signaling and community reputation mechanisms can still do a lot of work here.
Self-selection pressure remains. Even if you say you’re open to diverse views, the perception that Epoch is “aligned with x-risk EAs” might still screen out applicants who don’t buy the core premises. So you risk de facto ideological clustering unless you actively fight against that.
Forecasting bias could still creep in via mission alignment filtering. Even if you welcome researchers with divergent beliefs, if the only people willing to comply with your behavioral norms are those who already lean toward the doomier end of the spectrum, your epistemic diversity might still collapse in practice.
Summary:
The third option is:
Hire for epistemic virtue, not belief conformity, while maintaining strict behavioral norms around acceleratory conflict of interest.
It’s not a magic solution — it requires constant maintenance, good hiring processes, and clarity about the boundaries between “intellectual disagreement” and “mission betrayal.” But I think it’s at least plausible as a way to square the circle.”
So, you want to try to lock in AI forecasters to onerous and probably illegal contracts that forbid them from founding an AI startup after leaving the forecasting organization? Who would sign such a contract? This is even worse than only hiring people who are intellectually pre-committed to certain AI forecasts. Because it goes beyond a verbal affirmation of their beliefs to actually attempting to legally force them to comply with the (putative) ethical implications of certain AI forecasts.
If the suggestion is simply promoting “social norms” against starting AI startups, well, that social norm already exists to some extent in this community, as evidenced by the response on the EA Forum. But if the norm is too weak, it won’t prevent the undesired outcome (the creation of an AI startup), and if the norm is too strong, I don’t see how it doesn’t end up selecting forecasters for intellectual conformity. Because non-conformists would not want to go along with such a norm (just like they wouldn’t want to sign a contract telling them what they can and can’t do after they leave the forecasting company).
Why not attack them? They defected. They did a really bad thing.