Mark Xu
I do alignment research at the Alignment Research Center. Learn more about me at markxu.com/about
I think this model is kind of misleading, and that the original astronomical waste argument is still strong. It seems to me that a ton of the work in this model is being done by the assumption of constant risk, even in post-peril worlds. I think this is pretty strange. Here are some brief comments:
If you’re talking about the probability of a universal quantifier, such as “for all humans x, x will die”, then it seems really weird to say that this remains constant, even when the thing you’re quantifying over grows larger.
For instance, it seems clear that if there were only 100 humans, the probability of x-risk would be much higher than if there were 10^6 humans. So it seems like if there are 10^20 humans, it should be harder to cause extinction than if there are 10^10 humans.
Assuming constant risk has the implication that human extinction is guaranteed to happen at some point in the future, which puts sharp bounds on the goodness of existential risk reduction.
It’s not that hard to get exponentially decreasing probability on universal quantifiers if you assume independence in survival amongst some “unit” of humanity. In computing applications, it’s not that hard to drive down the probability of error exponentially in the resources allocated, because each unit of resource can ~halve the probability of error. Naively, each human doesn’t want to die, so there are roughly as many rolls for surviving/solving x-risk as there are humans.
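As a toy illustration of that independence assumption (a minimal sketch; the per-unit failure probability and the number of “units” are purely illustrative, not estimates of anything):

```python
# Toy illustration: if each of n independent "units" (people, institutions,
# redundant systems) fails to avert a catastrophe with probability p, the
# chance that *all* of them fail shrinks exponentially in n.
def p_all_fail(p: float, n: int) -> float:
    return p ** n

for n in (1, 10, 100, 1000):
    print(n, p_all_fail(0.5, n))
# 1    0.5
# 10   ~1e-3
# 100  ~8e-31
# 1000 ~9e-302
```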
It seems like the probability of x-risk ought to be inversely proportional to the current estimated amount of value at stake. This seems to follow if you assume that civilization acts as a “value maximizer” and it’s not that hard to reduce x-risk. Haven’t worked it out, so wouldn’t be surprised if I was making some basic error here.
Generally, it seems like most of the risk is going to come from worlds where the chance of extinction isn’t actually a universal quantifier, and there’s some correlation amongst seemingly independent rolls for survival. In particularly bad cases, humans go extinct if there exists someone who wants to destroy the universe, so we actually see a rapidly increasing probability of extinction as we get more humans. These worlds would require extremely strong coordination and governance solutions.
These worlds are also slightly physically impossible because parts of humanity will rapidly become causally isolated from each other. I don’t know enough cosmology to have an intuition for which way the functional form will ultimately go.
Generally, it seems like the naive view is that as humans get richer/smarter, they’ll allocate more and more resources towards not dying. At equilibrium, it seems reasonable to assume, to first order, that we’ll drive existential risk down until the marginal cost equals the marginal benefit, so the key question is how this equilibrium behaves. My guess is that it will depend heavily on the total amount of value available in the future, which is determined by physical constraints (and potentially more galaxy-brained considerations).
This view seems to allow you to recover more of the naive astronomical waste perspective.
This makes me feel like the model makes kind of strong assumptions about the amount it will ultimately cost to drive down existential risk. E.g. you seem to imply that r_l = 0.0001 is small, but an independent chance that large each century suggests that the probability humanity survives for ~10^10 years is ~0. This feels quite absurd to me.
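Spelling out that arithmetic (a minimal sketch; the 0.0001-per-century risk and the ~10^10-year horizon are just the numbers above):

```python
import math

# Survival probability under a constant 0.0001-per-century extinction risk,
# sustained for 10^10 years = 10^8 centuries.
r = 1e-4
centuries = 10**10 // 100

log_survival = centuries * math.log1p(-r)   # log of (1 - r)**centuries
print(log_survival)            # about -10000.5
print(math.exp(log_survival))  # underflows to 0.0, i.e. survival chance ~ e^-10000
```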
Regarding the sentence “Note that for the Pessimist, this is a reduction of 200,000%”: humans routinely reduce the probabilities of failure by much more than 200,000% via engineering effort, producing highly complex artifacts like computers, airplanes, rockets, satellites, etc. It feels like you should naively expect “breaking” human civilization to be harder than breaking an airplane, especially when civilization is actively trying to ensure that it doesn’t go extinct.
Also, you seem to assume each century eventually has some constant value v, which seems reasonable to me, but the implication “Warming (slightly) on short-termist cause areas” relies on an assumption that the current century is close to value v, when even pretty naive bounds (e.g. the percent of the sun’s energy we currently use) suggest that the current century is not even within a factor of 10^9 of the long-run value-per-century humanity could reach.
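As a rough back-of-the-envelope check, treating energy use as a crude proxy for value (the energy figures are standard order-of-magnitude numbers; the proxy itself is an assumption):

```python
# Order-of-magnitude comparison of current energy use with solar energy budgets.
current_use_w       = 2e13    # present world primary energy use, ~20 TW
sunlight_on_earth_w = 1.7e17  # solar power intercepted by Earth
sun_total_output_w  = 3.8e26  # total solar luminosity

print(sunlight_on_earth_w / current_use_w)  # ~1e4: Earth's sunlight alone
print(sun_total_output_w / current_use_w)   # ~2e13: the sun's full output
# Even a small fraction of the sun's output is far more than 10^9x current use.
```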
Assuming that value grows quadratically also seems quite weird, because of analyses like Eternity in Six Hours, which seem to imply that a resource-maximizing civilization will undergo a period of incredibly rapid expansion to achieve per-century rates of value much higher than the current century’s, and then have nowhere else to go. A better model from my perspective is logistic growth of value, with the upper bound given by some weak proxy like “suppose that value is linear in the amount of energy a civilization uses, then take the total amount of value in the year 2020”, with the ultimate unit being “value in 2020”. This would produce much higher numbers, and give a more intuitive sense of “astronomical waste.”
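A minimal sketch of that logistic picture (V_MAX, the growth rate k, and the midpoint t0 are placeholder parameters chosen so that value-per-century today is ~1 in “value in 2020” units; none of this is an estimate):

```python
import math

def value_per_century(t, v_max, k=0.3, t0=102):
    """Logistic value-per-century, in units of 'value in 2020'.
    t is measured in centuries from now; saturates at v_max."""
    return v_max / (1 + math.exp(-k * (t - t0)))

# Placeholder ceiling: if value is linear in energy use and a mature
# civilization captures ~the sun's whole output, that's ~2e13x the 2020 level.
V_MAX = 2e13
for t in (0, 50, 100, 200, 1000):
    print(t, value_per_century(t, V_MAX))
# ~1 today, rapid growth over the first ~100 centuries, then flat near V_MAX
```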
I like the process of proposing concrete models for things as a substrate for disagreement, and I appreciate that you wrote this. It feels much better to articulate objections like “I don’t think this particular parameter should be constant in your model” than to have abstract arguments. I also like how it’s now more clear that if you do believe that risk in post-peril worlds is constant, then the argument for longtermism is much weaker (although I think still quite strong because of my comments about v).
I expect 10 people donating 10% of their time to be less effective than 1 person using 100% of their time, because you don’t get to reap the benefits of learning for the 10% people. Example: if people work for 40 years, then 10 people donating 10% of their time gives you 10 person-years with 0 years of experience, 10 with 1 year, 10 with 2 years, and 10 with 3 years; however, if someone is doing EA work full-time, you get 1 person-year with 0 years of experience, 1 with 1 year, 1 with 2, etc. I expect 1 person-year done with 20 years of experience to plausibly be as good/useful as 10 person-years done with ~3 years of experience. (A toy version of this model is sketched after the caveats below.) Caveats to the simple model:
labor-years might be more valuable during the present
if you’re volunteering for a thing that is similar to what you spend the other 90% of your time doing, then you still get better at the thing you’re volunteering for
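Here is the promised toy version of the model (the usefulness-vs-experience function is a made-up placeholder, and the numbers just mirror the example above):

```python
# Toy version of the model above: 10 people each donating 10% of a 40-year
# career vs. 1 person working full-time for 40 years. Total person-years are
# equal (40); only the accumulated experience behind each year differs.
def usefulness(experience_years: float) -> float:
    """Placeholder assumption: output grows with the square root of prior
    full-time-equivalent experience ("learning compounds")."""
    return (1 + experience_years) ** 0.5

# 10%-ers: each accumulates only 4 FTE-years over a career, so the group
# contributes 10 person-years at each of ~0, 1, 2, and 3 years of experience.
part_time_output = sum(10 * usefulness(e) for e in range(4))

# Full-timer: 1 person-year at each of 0, 1, ..., 39 years of experience.
full_time_output = sum(usefulness(e) for e in range(40))

print(part_time_output, full_time_output)  # ~61 vs ~171: the full-timer wins
```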
I make a similar argument here.
One key difference is that “continuing school” usually has a specific mental image attached, whereas “drop out of school” is much vaguer, making the two difficult to compare.
Many people in EA depart from me here: they see choices that do not maximize impacts as personal mistakes. Imagine a button that, if you press it, would cause you to always take the impact-maximizing action for the rest of your life, even if it entails great personal sacrifice. Many (most?) longtermist EAs I talk to say they would press this button – and I believe them. That’s not true of me; I’m partially aligned with EA values (since impact is an important consideration for me), but not fully aligned.
I think there are people (e.g. me) that value things besides impact and would also press the button because of golden-rule type reasoning. Many people optimize for impact to the point where it makes them less happy.
A title like “How many lives might have been saved given an earlier COVID-19 vaccine rollout?” would have given me much more information about what the post was about than the current title, which I find very vague.
Kindles are smaller, have backlights, and the Kindle store is a good user experience.
Note: I work for ARC.
I would consider someone a “pretty good fit” (whatever that means) for alignment research if they started out with a relatively technical background, e.g. an undergrad degree in math/CS, but without really having engaged with alignment before, and they were able to come up with a decent proposal after:
~10 hours of engaging with the ELK doc.
~10 hours of thinking about the document and resolving confusions they had, which might involve asking some questions to clarify the rules and the setup.
~10 hours of trying to come up with a proposal.
If someone starts from having thought about alignment a bunch, I would consider them a potentially “pretty good researcher” if they were able to come up with a decent proposal in 2-8 hours. I expect many existing (alignment) researchers to be able to come up with proposals in <1 hour.
Note that I’m saying “if (can come up with proposal in N hours), then (might be good alignment researcher)” and not saying the other implication also holds, i.e. it is not the case that “if (might be good alignment researcher), then (can come up with proposal in N hours)”.
Can confirm we would be interested in hearing what you came up with.
nit: link on “reasons” was pasted twice. For others it’s https://www.lesswrong.com/posts/PZtsoaoSLpKjjbMqM/the-case-for-aligning-narrowly-superhuman-models
Also hadn’t seen that paper. Thanks!
Ben Pace, Ben Khun, Ben Todd, Ben West, and Ben Garfinkel should all become the same person, to avoid confusion.
Thanks for writing this up. Just ordered a misto, elastic laces, and a waterpik. My own personal list of recommendations is on https://markxu.com/things, but it lacks justifications. Feel free to ask me about any of the items though.
Systematic undervaluing of some fields is not something I considered and slightly undermines my argument.
I still think the main problem would be identifying rising-star historians in advance instead of in retrospect.
Hey Charles! Glad to see that you’re still around.
It seems we can immediately evaluate “earning to give” and the purchasing of labor for EA
I don’t think OpenPhil or the EA Funds are particularly funding constrained, so this seems to suggest that “people who can do useful things with money” is more of a bottleneck than money itself.
It seems easy to construct EA projects that benefit from monies and purchasable talent
I think I disagree about the quality of execution one is likely to get by purchasing talent. I agree that in areas like global health, it’s likely possible to construct scalable projects.
I am pessimistic about applying “standard skills” to projects in the EA space for reasons related to Goodhart’s Law.
It seems implausible that market forces are ineffective
I think my take is “money can coordinate activity around a broad set of things, but EA is bottlenecked by things that are outside this set.”
I also don’t get this section “Talent is very desirable”:
I don’t think this section is very important. It is arguing that paying people less than market rate means they’re effectively “donating their time”. If those people were earning money, they would be donating money instead. In both cases, the amount of donations is roughly constant, assuming some market efficiency. Note that this argument is probably false because the efficiency assumption doesn’t hold in practice.
What is Mark’s model for talent?
I think your guesses are mostly right. Perhaps one analogy is that I think EA is trying to do something similar to “come up with revolutionary insights into fundamental physics”, although that’s not quite right because money can be used to build large measuring instruments, which has no obvious analogue on the EA side.
However, in either of these cases, it seems that special organizations can find ways to motivate, mentor or cultivate these people, or the environment they grow up in. These organizations can be funded for money.
I agree this is true, but I claim that the current bottleneck is, by far, that the organizations/mentors don’t yet exist. I would much rather someone become a mentor than earn money and try to hire a mentor.
I am confused by EA orgs not meeting basic living thresholds. Could you provide some examples?
The purpose of hiring two people isn’t just to do twice the amount of work. Two people can complement each other, creating a team which is better than the sum of their parts. Even two people with the same job title are never doing exactly the same work, and this matters in determining how much value they’re adding to the firm. I think this works against the point you’re making in this passage. Do you account for this somewhere else in your post, and/or do you think it affects your overall point?
My claim is that having one person with the skill-set of two people is more useful than having both of those people. I have some sense that teams are actually rarely better than the sum of their parts, but I have not thought about this very much. I don’t account for this and don’t think it weakens my overall point very much.
But if we can’t really even measure talent to begin with, what are we even talking about when we talk about talent? What do you mean when you say “talent”?
I mean something vaguely like “has good judgement” and “if I gave this person a million dollars, I would be quite pleased with what they did with it” and “it would be quite useful for this person to spend time thinking about important things”.
It is difficult to measure this property, which is why hiring talented people is difficult.
I agree I use the word talent a lot and this is unfortunate, but I couldn’t think of a better word to use.
Rather than “earn to give” or “do direct work,” I think it might be “try as hard as you can to become a highly talented person” (maybe by acquiring domain expertise in an important cause area).
“Try and become very talented” is good advice to take from this post. I don’t have a particular method in mind, but becoming the Pareto best in the world at some combination of relevant skills might be a good starting point.
The flip side is that if you value money/monetary donations linearly—or more linearly than other talented people—then you’ve got a comparative advantage in earning to give! The fact that “people don’t value money” means that no one’s taking the exhausting/boring/bad-location jobs that pay really well. If you do, you can earn more than you “should” (in an efficient market) and make an outsize impact.
This is a good point. People able to competently perform work they’re unenthusiastic about should, all else being equal, have an outsized impact because the work they do can more accurately reflect the true value behind the work.
I’m excited about more efficient matching between people who want career advice and people who are not-maximally-qualified to give it, but can still give aid nonetheless. For example, when planning my career, I often find it helpful to talk to other students making similar decisions, even though they’re no more “qualified” than me. I suspect that other students/people feel similarly and one doesn’t need to be a career coach to be helpful.
I will now consider everything that Carl writes henceforth to be in a parenthetical.
This creates weird incentives, e.g. I could construct a plausible-but-false view, make a post about it, then make a big show of changing my mind. I don’t think the amounts of money involved make it worth it, but I’m wary of incentivizing things that are so easily gamed.
yes, thanks!