(I’ve only skimmed the post)
This seems right theoretically, but I’m worried that people will read this and think this consideration ~conclusively implies fewer people should go into AI alignment, when my current best guess is the opposite is true. I agree sometimes people make the argmax vs. softmax mistake and there are status issues, but I still think not enough people proportionally go into AI for various reasons (underestimating risk level, it being hard/intimidating, not liking rationalist/Bay vibes, etc.).
Agree that this could be misused, just as the sensible 80k framework is misused, or as anything can be.
Some skin in the game then: Jan and I both spend most of our time on AI.
Thanks for clarifying. It might be worth making this clear in the post (if it isn’t already; I may have missed something).
I’m a bit confused about whether by ‘fewer people’ / ‘not enough people proportionally’ you mean ‘EAs’. In my view, while too few people (as ‘humans’) work on AI alignment, too large a fraction of EAs ‘go into AI’.
I mean EAs. I’m most confident about “talent-weighted EAs”. But probably also EAs in general.
In particular, I think many of the epistemically best EAs go into stuff like grantmaking, philosophy, general longtermist research, etc., which leaves a gap of really epistemically good people focusing full-time on AI. And I think the current epistemic situation in the AI alignment field (both technical and governance) is pretty bad in part due to this.
Interestingly, I have the opposite intuition, that entire subareas of EA/longtermism are kinda plodding along and not doing much because our best people keep going into AI alignment. Some of those areas are plausibly even critical for making the AI story go well.
Still, it’s not clear to me whether the allocation is inaccurate, just because alignment is so important.
Technical biosecurity and maybe forecasting might be exceptions though.
A couple of comments.
(1) I found this post quite hard to understand: it was quite jargon-heavy.
(2) I’d have appreciated it if you’d located this in what you take to be the relevant literature. I’m not sure if you’re making an argument about (A) why you might want to diversify resources across various causes, even if certain in some moral view (for instance because there are diminishing marginal returns, so you fund option X up to some point and then switch to Y) or (B) why you might want to diversify because you are morally uncertain.
(3) Because of (2), I’m not sure what your objection to ‘argmax’ is. You say ‘naive argmax’ doesn’t work. But isn’t that a reason to do ‘non-naive argmax’ rather than do something else? Cf. debates where people object to consequentialism by claiming it implies you ought to kill people and harvest their organs, and the consequentialist says that’s naive and not actually what consequentialism would recommend.
FWIW, the standard approaches to moral uncertainty (‘my favourite theory’ and ‘maximise expected choiceworthiness’) provide no justification in themselves for splitting your resources. In contrast, the ‘worldview diversification’ approach does do this. You say that worldview diversification is ad hoc, but I think it can be justified by a non-standard approach to moral uncertainty, one I call ‘internal bargaining’ and have written about here.
(1) The post attempts to skirt between being completely non-technical and being very technical. It’s unclear whether it succeeds.
(2) The technical claim is mostly that argmax(actions) is a dumb decision procedure in the real world for boundedly rational agents, if the actions are not very meta.
Softmax is one of the more principled alternative choices (see e.g. here).
(3) That argmax(actions) is not the optimal thing to do for boundedly rational agents is perhaps best illuminated by information-theoretic bounded rationality.
In my view, the technical research useful for developing a good theory of moral uncertainty for bounded agents in the real world is currently mostly located in other fields of research (ML, decision theory, AI safety, social choice theory, mechanism design, etc.), so I would not take the absence of something from the moral uncertainty literature as evidence of anything.
E.g., the internal bargaining you link is mostly simply OCB and HG applying bargaining theory to bargaining between moral theories.
We say worldview diversification is less ad hoc than the other things: worldview diversification is mostly Thompson sampling.
(4) You can often “rescue” some functional form if you really want. Love argmax()? Well, do argmax(ways to choose actions) or something. Really attached to the label of utilitarianism, but in practice want to do something closer to virtues? Well, do utilitarianism, but just over actions of the type “select your next self” or similar.
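Point (2) contrasts argmax with softmax, and point (3) appeals to information-theoretic bounded rationality. A minimal Python sketch, assuming the common free-energy formulation of that idea (maximise expected utility minus a KL-divergence "information cost" relative to a default prior, scaled by 1/beta): its maximiser is a prior-weighted softmax over utilities, and it approaches argmax as beta grows. The utilities, prior and beta values are illustrative assumptions, not numbers from the post.

```python
import math

def free_energy(p, prior, utilities, beta):
    """Expected utility minus (1/beta) * KL(p || prior): utility net of an information cost."""
    expected = sum(pi * u for pi, u in zip(p, utilities))
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, prior) if pi > 0)
    return expected - kl / beta

def bounded_rational_policy(prior, utilities, beta):
    """Closed-form maximiser of free_energy: p_i proportional to prior_i * exp(beta * U_i)."""
    top = max(utilities)  # shift for numerical stability; it cancels when normalising
    weights = [qi * math.exp(beta * (u - top)) for qi, u in zip(prior, utilities)]
    total = sum(weights)
    return [w / total for w in weights]

def argmax_policy(utilities):
    """All weight on the single highest-utility option (ties go to the first)."""
    best = max(range(len(utilities)), key=lambda i: utilities[i])
    return [1.0 if i == best else 0.0 for i in range(len(utilities))]

utilities = [10.0, 8.0, 3.0]   # hypothetical values of three options
prior = [1 / 3] * 3            # hypothetical uniform default behaviour
for beta in (0.1, 0.5, 5.0):   # low beta: deliberation is costly; high beta: near-argmax
    soft = bounded_rational_policy(prior, utilities, beta)
    print(beta, [round(x, 3) for x in soft],
          round(free_energy(soft, prior, utilities, beta), 3),
          round(free_energy(argmax_policy(utilities), prior, utilities, beta), 3))
```

Under this objective, pure argmax always scores at most as well as the softmax policy, because it pays the full information cost of collapsing onto one option.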
A comment and then a question. One problem I’ve encountered in trying to explain ideas like this to a non-technical audience is that the standard rationales for ‘why softmax’ are either a) technical or b) not convincing or even condescending about its value as a decision-making approach. Indeed, the ‘Agents as probabilistic programs’ page you linked to introduces softmax as “People do not always choose the normatively rational actions. The softmax agent provides a simple, analytically tractable model of sub-optimal choice.” The ‘Softmax demystified’ page offers relatively technical reasons (smoothing is good, flickering bad) and an unsupported claim (it is good to pick lower utility options some of the time). Implicitly, this gives presentations of ideas like this the flavor of “trust us, you should use this because it works in practice, even if it has origins in what we think is irrational or that we can’t justify”. And, to be clear, I say that as someone who’s on your side, trying to think of how to share these ideas with others. I think there is probably a link between what I’ve described above and Michael Plant’s point (3).
So I’m wondering whether ‘we can do better’ in justifying softmax (and similar approaches). What is the most convincing argument you’ve seen?
I feel like the holy grail would be an empirical demonstration that an RL agent develops softmax-like properties across a range of realistic environments, and/or a theoretical argument for why this should happen.
One justification might be that in an online setting where you have to learn which options are best from past observations, the naive “follow the leader” approach—exactly maximizing your action based on whatever seems best so far—is easily exploited by an adversary.
This problem resolves itself if you make actions more likely if they’ve performed well, but regularize a little to smooth things out. The most common regularizer is entropy, and then as described on the “Softmax demystified” page, you basically end up recovering softmax (this is the well-known “multiplicative weight updates” algorithm).
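The "multiplicative weight updates" idea above can be sketched as follows for the full-feedback "experts" setting, assuming rewards in [0, 1] and an illustrative learning rate `eta`. Repeatedly multiplying each option's weight by exp(eta * reward) produces exactly a softmax of eta times the cumulative rewards, whereas naive follow the leader puts everything on the current leader. The reward history is hypothetical.

```python
import math

def hedge(reward_history, eta):
    """Multiplicative weights (Hedge): full feedback on every option each round.
    reward_history is a list of rounds; each round lists one reward per option."""
    k = len(reward_history[0])
    weights = [1.0 / k] * k
    for rewards in reward_history:
        # Options that did well this round become proportionally more likely.
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalise to avoid overflow
    return weights

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def follow_the_leader(reward_history):
    """Naive alternative: everything on whichever option has the highest total so far."""
    totals = [sum(col) for col in zip(*reward_history)]
    leader = max(range(len(totals)), key=lambda i: totals[i])
    return [1.0 if i == leader else 0.0 for i in range(len(totals))]

history = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 0.0, 0.5]]   # hypothetical rewards
eta = 0.5
cumulative = [sum(col) for col in zip(*history)]   # [2.0, 1.0, 1.5]
print(hedge(history, eta))
print(softmax([eta * c for c in cumulative]))       # same distribution as hedge(...)
print(follow_the_leader(history))                   # [1.0, 0.0, 0.0]
```

As `eta` grows, Hedge's distribution approaches follow the leader; a smaller `eta` is the "regularize a little" that makes it harder for an adversary to exploit.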
Yes, and is there a proof of this that someone has put together? Or at least a more formal justification?
Here’s one set of lecture notes (I don’t claim they’re necessarily the best; they’re just the first I found): https://lucatrevisan.github.io/40391/lecture12.pdf
Keywords to search for other sources would be “multiplicative weight updates”, “follow the leader”, “follow the regularized leader”.
Note that this is for what’s sometimes called the “experts” setting, where you get full feedback on the counterfactual actions you didn’t take. But the same approach basically works with some slight modification for the “bandit” setting, where you only get to see the result of what you actually did.
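One standard modification for the bandit setting is EXP3: the same exponential weights, but each update uses an importance-weighted reward estimate (observed reward divided by the probability of having chosen that option), mixed with a little uniform exploration. A minimal sketch, assuming rewards in [0, 1]; the exploration rate `gamma` and the reward rates in the hypothetical environment are illustrative.

```python
import math
import random

def exp3(reward_fn, k, rounds, gamma=0.1, seed=0):
    """EXP3 for the bandit setting: only the chosen option's reward is observed.
    reward_fn(arm, rng) must return a reward in [0, 1]. Returns pulls per option."""
    rng = random.Random(seed)
    weights = [1.0] * k
    pulls = [0] * k
    for _ in range(rounds):
        total = sum(weights)
        # Exponential weights mixed with a little uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        pulls[arm] += 1
        reward = reward_fn(arm, rng)
        # Importance weighting: the implicit estimate (reward / probability if chosen,
        # 0 otherwise) is an unbiased estimate of each option's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale so the largest weight is 1; only ratios between weights matter.
        top = max(weights)
        weights = [w / top for w in weights]
    return pulls

def bernoulli_rewards(rates):
    """Hypothetical environment: option i pays 1 with probability rates[i], else 0."""
    return lambda arm, rng: 1.0 if rng.random() < rates[arm] else 0.0

pulls = exp3(bernoulli_rewards([0.2, 0.5, 0.8]), k=3, rounds=5000)
print(pulls)   # most pulls should go to the option paying 0.8
```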
Do we have reason to believe softmax is a better approximation to “Enlightened argmax” than just directly trying to approximate Enlightened argmax or its outputs?
See also the multi-armed bandit <https://en.wikipedia.org/wiki/Multi-armed_bandit> problem.
Upvoted, though I was struck by this part of the appendix:
While I totally agree with the conclusion of the post (the community should have a portfolio of causes, and not invest everything in the top cause), I feel very unsure that a lot of these reasons are good ones for spreading out from the most promising cause.
Or if they do imply spreading out, they don’t obviously justify the standard EA alternatives to AI Risk.
I noticed I felt like I was disagreeing with your reasons for not doing argmax throughout the post, and this list helped to explain why.
1. Starting with VOI, that assumes that you can get significant information about how good a cause is by having people work on it. In practice, a ton of uncertainty is about scale and neglectedness, and having people work on the cause doesn’t tell you much about that. Global priorities research usually seems more useful.
VOI would also imply working on causes that might be top, but that we’re very uncertain about. So, for example, it probably wouldn’t imply that longtermist-interested people should work on global health or factory farming, but rather that they should spread out over lots of weirder small causes, like those listed here: https://80000hours.org/problem-profiles/#less-developed-areas
2. “You don’t know the whole option set” sounds like a similar issue to VOI. It would imply trying to go and explore totally new areas, rather than working on familiar EA priorities.
3. Many approaches to moral uncertainty suggest that you factor in uncertainty in your choice of values, but then you just choose the best option with respect to those values. It doesn’t obviously suggest supporting multiple causes.
4. Concave altruism. Personally I think there are increasing returns on the level of orgs, but I don’t think there are significant increasing returns at the level of cause areas. (And that post is more about exploring the implications of concave altruism rather than making the case it actually applies to EA cause selection.)
5. Optimizer’s curse. This seems like a reason to think your best guess isn’t as good as you think, rather than to support multiple causes.
6. Worldview diversification. This isn’t really an independent reason to spread out – it’s just the name of Open Phil’s approach to spreading out (which they believe for other reasons).
7. Risk aversion. I don’t think we should be risk averse about utility, so agree with your low ranking of it.
8. Strategic skullduggery. This actually seems like one of the clearest reasons to spread out.
9. Decreased variance. I agree with you this is probably not a big factor.
You didn’t add diminishing returns to your list, though I think you’d rank it near the top. I’d also agree it’s a factor, though I also think it’s often oversold. E.g. if there are short-term bottlenecks in AI that create diminishing returns, it’s likely the best response is to invest in career capital and wait for the bottlenecks to disappear, rather than to switch into a totally different cause. You also need big increases in resources to get enough diminishing returns to change the cause ranking: e.g. if you think AI safety is 10x as effective as pandemics at the margin, you might need to see the AI safety community grow roughly 10x in size relative to biosecurity before they’d equalise.
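The 10x figure follows under an assumption the comment doesn't state explicitly: logarithmic returns within each cause, so marginal value is proportional to scale / resources. A small sketch with hypothetical numbers; other return curves give different answers (with square-root returns the size ratio would have to grow about 100x).

```python
def marginal_value(scale, resources, elasticity=1.0):
    """Marginal value of one more unit when returns diminish as resources ** -elasticity.
    elasticity = 1.0 corresponds to logarithmic returns (scale * ln(resources))."""
    return scale * resources ** -elasticity

def required_growth(marginal_ratio, elasticity=1.0):
    """Factor by which the size ratio (cause X / cause Y) must grow before
    the two marginal values equalise, holding each cause's scale fixed."""
    return marginal_ratio ** (1.0 / elasticity)

# Hypothetical: equal-sized communities, AI safety 10x as valuable at the margin.
ai_scale, bio_scale, size = 10.0, 1.0, 100.0
ratio = marginal_value(ai_scale, size) / marginal_value(bio_scale, size)   # about 10
print(required_growth(ratio))          # log returns: about 10x
print(required_growth(ratio, 0.5))     # square-root returns: about 100x
# Check: after AI safety grows 10x, the marginal values match under log returns.
print(marginal_value(ai_scale, size * 10), marginal_value(bio_scale, size))
```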
I tried to summarise what I think the good reasons for spreading out are here.
For a longtermist, I think those considerations would suggest a picture like:
50% into the top 1-3 issues
20% into the next couple of issues
20% into exploring a wide range of issues that might be top
10% into other popular issues
If I had to list a single biggest driver, it would be personal fit / idiosyncratic opportunities, which can easily produce orders of magnitude differences in what different people should focus on.
The question of how to factor in neartermism (or other alternatives to AI-focused longtermism) seems harder. It could easily imply still betting everything on AI, though putting some % of resources into neartermism in proportion to your credence in it also seems sensible.
Some more here about how worldview diversification can imply a wide range of allocations depending on how you apply it: https://twitter.com/ben_j_todd/status/1528409711170699264
3. Tarsney suggests one other plausible reason moral uncertainty is relevant: nonunique solutions leaving some choices undetermined. But I’m not clear on this.
Excellent comment, thanks!
Yes, wasn’t trying to endorse all of those (and should have put numbers on their dodginess).
1. Interesting. I disagree for now but would love to see what persuaded you of this. Fully agree that softmax implies long shots.
2. Yes, new causes and also new interventions within causes.
3. Yes, I really should have expanded this, but was lazy / didn’t want to disturb the pleasant brevity. It’s only “moral” uncertainty about how much risk aversion you should have that changes anything. (à la this.)
4. Agree.
5. Agree.
6. I’m using (possibly misusing) WD to mean something more specific like “given cause A, what is best to do?; what about under cause B? what about under discount x?...”
7. Now I’m confused about whether 3=7.
8. Yeah it’s effective in the short run, but I would guess that the loss of integrity hurts us in the long run.
Will edit in your suggestions, thanks again.
Looking back two weeks later, this post really needs
to discuss the cost of prioritisation (we use softmax because we are boundedly rational) and the Price of Anarchy;
to have separate sections for individual prioritisation and collective prioritisation;
to at least mention bandits and the Gittins index, which is optimal, whereas softmax is highly principled but suboptimal cope.
FWIW, I didn’t get the impression there’s a very principled justification for softmax in this post, if that’s what you intended by “highly principled”. That it might work better than naive argmax in practice on some counts isn’t really enough, and there wasn’t really much comparison to enlightened argmax, which is optimal in theory.
I’d probably require being provably (approximately) optimal for a principled justification. From a quick check of bandits and the Gittins index on Wikipedia, bandits are general problems and the Gittins index is just the value of the aggregate reward. I guess you could say “maximize Gittins index” (use the Gittins index policy), but that’s, imo, just a formal characterization of what enlightened argmax should be under certain problem assumptions, and doesn’t provide much useful guidance on its own. Like, what procedure should we follow to maximize the Gittins index? Is it just ‘calculate really hard’?
Also, according to the Wikipedia page, the Gittins index policy is optimal if the projects are independent, but not necessarily if they aren’t, and the problem is NP-hard in general if they can be dependent.
Not in this post, we just link to this one. By “principled” I just mean “not arbitrary, has a nice short derivation starting with something fundamental (like the entropy)”.
Yeah, the Gittins stuff would be pitched at a similar level of handwaving.
Curious what people think of the argument that, given that people in the EA community have different rankings of the top causes, a close-to-optimal community outcome could be reached if individuals argmax using their own ranking?
(At least assuming that the number of people who rank a certain cause as the top one is proportional to how likely it is to be the top one.)
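Under that assumption the community as a whole ends up doing probability matching even though every individual argmaxes. A minimal simulation, assuming each person's ranking comes from an independent draw from the same uncertain (normal) estimates of each cause; the means and spreads are hypothetical.

```python
import random

def community_allocation(means, sds, people, seed=0):
    """Each person ranks causes using their own independent draw from the shared
    uncertain estimates (normal, mean/sd per cause), then works on their personal
    top cause (individual argmax). Returns the fraction of people on each cause."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    for _ in range(people):
        belief = [rng.gauss(m, s) for m, s in zip(means, sds)]
        favourite = max(range(len(means)), key=lambda i: belief[i])
        counts[favourite] += 1
    return [c / people for c in counts]

# Hypothetical cause estimates: the first looks best, but the others might be.
shares = community_allocation(means=[1.0, 0.8, 0.2], sds=[0.5, 0.5, 0.5], people=10_000)
print(shares)   # each share approximates that cause's probability of being the best
```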
Animals do this intuitively:
Matching Law
The above makes EA’s huge investment in research seem like a better bet: “do more research” is a sort of exploration. Arguably we don’t do enough active exploration (learning by doing), but we don’t want less research.
Are there any principled probability assignments we could use? E.g., the probability that this would be my top choice after N further hours of investigation into it and alternatives (including realistically collecting data or performing experiments), maybe allowing N to be unrealistic?
From my understanding, softmax is formally sensitive to parametrizations, so the specific outputs seem pretty unprincipled unless you actually have feedback and are doing some kind of optimization like minimizing some kind of softmax loss.
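A small illustration of that sensitivity, with hypothetical numbers: changing the units in which value is measured leaves the argmax unchanged but changes the softmax allocation a lot unless the temperature is rescaled by the same factor, so the temperature carries a choice that needs its own justification. Adding a constant to every value, by contrast, leaves softmax unchanged.

```python
import math

def softmax(values, temperature=1.0):
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

value_a = [2.0, 1.0]                      # two options, hypothetical units
value_b = [10 * v for v in value_a]       # same options, measured in units 10x smaller

print(softmax(value_a))                     # about [0.73, 0.27]
print(softmax(value_b))                     # about [0.99995, 0.00005]
print(softmax(value_b, temperature=10))     # back to about [0.73, 0.27]
print(softmax([v + 100 for v in value_a]))  # a constant shift changes nothing
```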
On the other hand, I can see even using the credences as I proposed being way too upside-focused, i.e. focused on picking the best interventions, with little concern for avoiding the worst (badly net negative in EV) interventions. Consider an intervention that has a 55% chance of being the best and vastly net positive in expectation after further investigation, but a 45% chance of being the worst and vastly net negative in expectation (of similar magnitude), and your current overall belief is that it’s vastly net positive and highest in EV. It’s plausible some high-leverage interventions are sensitive in this way, because they involve tradeoffs for existential risks (tradeoffs between different x-risks, but also within x-risks, like differential progress), or, in the near term, because of wild animal effects dominating and having uncertain sign. Would we really want to put the largest share of resources, let alone most resources, into such an intervention?
Alternatively, we may have multiple choices, among which three, A, B and C, are such that, for some c > 0, after further investigation, we expect that:
A is 40% likely to be the best, with EV = c, and 35% likely to be the worst, with EV = -c, and the rest of the time EV = 0.
B is 35% likely to be the best, with EV = c, and 30% likely to be the worst, with EV = -c, and the rest of the time EV = 0.
C is 5% likely to be the best, with EV = c, and otherwise has EV = 0 and probability 0 of being the worst.
How should we weight our resources between these three (ignoring other options)? Currently, they all have the same overall EV (=5%*c). What if we increase the probability that A is best slightly, without changing anything else? Or, what if we increase the probability that C is best slightly, without changing anything else?
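The "same overall EV" arithmetic can be checked exactly, along with what allocating in proportion to each option's chance of being best would do here (renormalised over A, B and C only, since the example ignores other options). A sketch using Python's `fractions` module to avoid rounding; EVs are in units of c.

```python
from fractions import Fraction as F

# (probability, EV in units of c) for each possible finding, per the example above;
# the first entry of each list is the "best" outcome.
outcomes = {
    "A": [(F(40, 100), 1), (F(35, 100), -1), (F(25, 100), 0)],
    "B": [(F(35, 100), 1), (F(30, 100), -1), (F(35, 100), 0)],
    "C": [(F(5, 100), 1), (F(95, 100), 0)],
}

expected = {k: sum(p * v for p, v in outs) for k, outs in outcomes.items()}
p_best = {k: outs[0][0] for k, outs in outcomes.items()}
p_worst = {k: sum(p for p, v in outs if v < 0) for k, outs in outcomes.items()}

total_best = sum(p_best.values())
matching = {k: p / total_best for k, p in p_best.items()}

print(expected)   # every option: Fraction(1, 20), i.e. 0.05c
print(matching)   # A: 1/2, B: 7/16, C: 1/16
print(p_worst)    # A: 7/20, B: 3/10, C: 0
```

So probability matching would put half the resources into A, the option with the largest chance of being the worst, even though all three have equal EV.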
Ord’s undergrad thesis is a tight argument in favour of enlightened argmax: search over decision procedures and motivations and pick the best of those instead of acts or rules.
Interesting thesis! Though, it’s his doctoral thesis, not from one of his bachelor’s degrees, right?
Yep ta, even says so on page 1.
Also maybe of interest, I think the current EA portfolio is actually allocated pretty well in line with what this heuristic would imply:
https://forum.effectivealtruism.org/posts/nws5pai9AB6dCQqxq/how-are-resources-in-ea-allocated-across-issues#What_might_we_learn_from_this_
I think the bigger issue might be that it’s currently demoralising not to work on AI or meta. So I appreciate this post as an exploration of ways to make it more intuitive that not everyone should work on AI.
Also see Brian Christian briefly suggesting a cause allocation rule a bit like this towards the end of 80k’s interview with him.
We were discussing solutions to the explore-exploit problem, and one is that you allocate resources in proportion to your credence the option is best.
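Applied round by round, that rule is what Thompson sampling does: draw one plausible value for each option from your current beliefs and act on whichever draw is highest, so each option is chosen with probability equal to your current credence that it's best, and those credences sharpen as results come in. A minimal Beta-Bernoulli sketch, assuming each option succeeds or fails on each try with an unknown fixed rate; the rates are hypothetical.

```python
import random

def thompson_sampling(success_rates, rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over options with unknown success rates.
    Returns how many times each option was chosen."""
    rng = random.Random(seed)
    k = len(success_rates)
    successes = [0] * k
    failures = [0] * k
    chosen = [0] * k
    for _ in range(rounds):
        # One draw per option from its Beta(1 + successes, 1 + failures) posterior
        # (i.e. a uniform prior); the highest draw wins this round.
        draws = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        pick = max(range(k), key=lambda i: draws[i])
        chosen[pick] += 1
        # Observe only the chosen option's outcome and update its posterior.
        if rng.random() < success_rates[pick]:
            successes[pick] += 1
        else:
            failures[pick] += 1
    return chosen

# Hypothetical options whose true success rates the agent does not know.
counts = thompson_sampling([0.2, 0.5, 0.6], rounds=5000)
print(counts)   # early rounds spread out; later rounds concentrate on the better options
```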
Seems like people agree with you!
Oh full disclosure I guess: I am a well-known shill for argmin.
Booo <3
In practice the thing that the EA community is doing is much closer to quantilization (video explanation) than maximization anyway, and that’s okay. The goal could be an ever-increasing q.
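For readers who haven't met it: a quantilizer samples from a base distribution of "normal" actions restricted to its top fraction by estimated utility, rather than taking the single best-scoring action. A minimal sketch, assuming a uniform base distribution over a finite list of actions and a hypothetical `top_fraction` parameter; shrinking `top_fraction` until only one action is kept recovers argmax.

```python
import random

def quantilize(actions, utility, top_fraction, rng):
    """Quantilizer sketch with a uniform base distribution over a finite list of
    actions: keep the top_fraction of actions by utility, then pick uniformly
    among them. Keeping only one action reduces to argmax."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(actions, key=utility, reverse=True)
    keep = max(1, int(len(ranked) * top_fraction))  # always keep at least one
    return rng.choice(ranked[:keep])

rng = random.Random(0)
actions = list(range(100))        # hypothetical actions, scored by their own value
print([quantilize(actions, lambda a: a, 0.1, rng) for _ in range(5)])   # all in 90..99
print(quantilize(actions, lambda a: a, 0.01, rng))                      # 99
```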
Mostly true, but a string of posts about the risks attests to there being some unbounded optimisers. (Or at least that we are at risk of having some.)