EDIT: I’ve now written up my own account of how we should do epistemic deference in general, which fleshes out more clearly a bunch of the intuitions I outline in this comment thread.
I think that a bunch of people are overindexing on Yudkowsky’s views; I’ve nevertheless downvoted this post because it seems like it’s making claims that are significantly too strong, based on a methodology that I strongly disendorse. I’d much prefer a version of this post which, rather than essentially saying “pay less attention to Yudkowsky”, is more nuanced about how to update based on his previous contributions; I’ve tried to do that in this comment, for example. (More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements. Note that the list of agreements there, which I expect that many other alignment researchers also buy into, serves as a significant testament to Yudkowsky’s track record.)
The part of this post which seems most wild to me is the leap from “mixed track record” to
In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.
For any reasonable interpretation of this sentence, it’s transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn’t write a similar “mixed track record” post about, it’s almost entirely because they don’t have a track record of making any big claims, in large part because they weren’t able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.
Based on his track record, I would endorse people deferring more towards the general direction of Yudkowsky’s views than towards the views of almost anyone else. I also think that there’s a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large. The EA community has ended up strongly moving in Yudkowsky’s direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.
The part of this post which seems most wild to me is the leap from “mixed track record” to
In particular, I think, they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk.
For any reasonable interpretation of this sentence, it’s transparently false. Yudkowsky has proven to be one of the best few thinkers in the world on a very difficult topic. Insofar as there are others who you couldn’t write a similar “mixed track record” post about, it’s almost entirely because they don’t have a track record of making any big claims, in large part because they weren’t able to generate the relevant early insights themselves. Breaking ground in novel domains is very, very different from forecasting the weather or events next year; a mixed track record is the price of entry.
I disagree that the sentence is false for the interpretation I have in mind.
I think it’s really important to separate out the question “Is Yudkowsky an unusually innovative thinker?” and the question “Is Yudkowsky someone whose credences you should give an unusual amount of weight to?”
I read your comment as arguing for the former, which I don’t disagree with. But that doesn’t mean that people should currently weigh his risk estimates more highly than they weigh the estimates of other researchers currently in the space (like you).
I also think that there’s a good case to be made that Yudkowsky tends to be overconfident, and this should be taken into account when deferring; but when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.
But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it’s important to know whether the risk of everyone dying soon is 5% or 99%. It’s not enough just to determine whether we should take AI risk seriously.
We’re also now past the point, as a community, where “Should AI risk be taken seriously?” is that much of a live question. The main epistemic question that matters is what probability we assign to it—and I think this post is relevant to that.
(More generally, rather than reading this post, I recommend people read this one by Paul Christiano, which outlines specific agreements and disagreements.)
I definitely recommend people read the post Paul just wrote! I think it’s overall more useful than this one.
But I don’t think there’s an either-or here. People—particularly non-experts in a domain—do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.
The EA community has ended up strongly moving in Yudkowsky’s direction over the last decade, and that seems like much more compelling evidence than anything listed in this post.
I discuss this in response to another comment, here, but I’m not convinced of that point.
I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional “downweight this person”. I don’t think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky’s views if they’re doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it’s hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).
By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant; see my reply to Rohin for more details. Basically, as soon as we move beyond toy models of deference, the “innovative thinking” part becomes crucially important, and the “well-calibrated” part becomes much less so.
One last intuition: different people have different relationships between their personal credences and their all-things-considered credences. Inferring track records in the way you’ve done here will, in addition to favoring people who are quieter and say fewer useful things, also favor people who speak primarily based on their all-things-considered credences rather than their personal credences. But that leads to a vicious cycle where people are deferring to people who are deferring to people who… And then the people who actually do innovative thinking in public end up getting downweighted to oblivion via cherrypicked examples.
when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.
This seems like an overly research-centric position.
When your job is to come up with novel relevant stuff in a domain, then I agree that it’s mostly about “which ideas and arguments to take seriously” rather than specific credences.
When your job is to make decisions right now, the specific credences matter. Some examples:
Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
What should AI-focused community builders provide as starting resources?
Should there be an organization dedicated to solving Eliezer’s health problems? What should its budget be?
Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?
I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we’re talking about.
Like, suppose you think that Eliezer’s credences on his biggest claims are literally 2x higher than they should be, even for claims where he’s 90% confident. This is a huge hit in terms of Bayes points; if that’s how you determine deference, and you believe he’s 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved—this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).
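A rough illustration of what “a huge hit in terms of Bayes points” means here (a minimal sketch with hypothetical numbers: a claim stated at 90% confidence whose calibration-corrected probability is 45%; the log-scoring rule is just one standard way of cashing out “Bayes points”, not anyone’s actual scoring method):

```python
import math

# Hypothetical claim: stated credence 0.90, but the "2x overconfidence-corrected"
# credence is 0.45, which we here assume is the true probability of the claim.
p_true = 0.45      # assumed true probability of the claim
stated = 0.90      # overconfident stated credence
corrected = 0.45   # credence after correcting for 2x overconfidence

def expected_log_score(forecast: float, p: float) -> float:
    """Expected log score of a forecast on an event that occurs with probability p."""
    return p * math.log(forecast) + (1 - p) * math.log(1 - forecast)

print(expected_log_score(stated, p_true))     # ~ -1.31: a large penalty
print(expected_log_score(corrected, p_true))  # ~ -0.69: much better
```

So the scoring gap is big, even though (as argued above) the decision-relevance of the same gap is often small.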
Then you might say: well, okay, we’re not just making binary decisions, we’re making complex decisions where we’re choosing between lots of different options. But the more complex the decisions you’re making, the less you should care about whether somebody’s credences on a few key claims are accurate, and the more you should care about whether they’re identifying the right types of considerations, even if you want to apply a big discount factor to the specific credences involved.
As a simple example, as soon as you’re estimating more than one variable, you typically start caring a lot about whether the errors on your estimates are correlated or uncorrelated. But there are so many different possibilities for ways and reasons that they might be correlated that you can’t just update towards experts’ credences, you have to actually update towards experts’ reasons for those credences, which then puts you in the regime of caring more about whether you’ve identified the right types of considerations.
Like, suppose you think that Eliezer’s credences on his biggest claims are literally 2x higher than they should be, even for claims where he’s 90% confident. This is a huge hit in terms of Bayes points; if that’s how you determine deference, and you believe he’s 2x off, then plausibly you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved—this should very rarely move you from a yes to no, or vice versa.
Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy, all the strategies on offer have arguments that they might lead to nuclear war or to a worse war. In AI alignment there are multiple such tradeoffs, with people embracing strategies that push the same variable in opposite directions and high stakes on both sides.
I haven’t thought much about nuclear policy, so I can’t respond there. But at least in alignment, I expect that pushing on variables where there’s less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people.
(By contrast, upweighting or downweighting Eliezer’s opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn’t make much difference is deferring to a version of Eliezer who’s 90% confident about something, versus deferring to the same extent to a version of Eliezer who’s 45% confident in the same thing.)
My more general point, which doesn’t hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.
Like, suppose you think that Eliezer’s credences on his biggest claims are literally 2x higher than they should be, even for claims where he’s 90% confident.
I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I’m not sure why you’re only considering probabilities on specific claims; when I think of “deferring” I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.
(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don’t think that matters much for my point.)
Taking my examples:
should funders reallocate nearly all biosecurity money to AI?
Since Eliezer thinks there’s something like a 99.99% chance of doom from AI, that reduces the cost-effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you put the chance of doom from AI at < 50% (as I do), then that’s a discount factor of < 2x on x-risk-targeted biosecurity work. So that’s almost 4 OOMs of difference.
What should AI-focused community builders provide as starting resources?
Eliezer seems very confident that a lot of existing alignment work is useless. So if you imagine taking a representative set of such papers as starting resources, I’d imagine that Eliezer would be at < 1% on “this will help the person become an effective alignment researcher” whereas I’d be at > 50% (for actual probabilities I’d want a better operationalization), leading to a >50x difference in cost effectiveness.
(And if you compare against the set of readings Eliezer would choose, I’d imagine the difference becomes even greater—I could imagine we’d each think the other’s choice would be net negative.)
Should there be an organization dedicated to solving Eliezer’s health problems? What should its budget be?
I don’t have a citation but I’m guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking that the rest of the alignment field, even if fully funded, can’t make a dent of more than 0.01 percentage points, suggesting that “improve Eliezer’s health + project management skills” is 3 OOM more important than “all other alignment work” (saying nothing about tractability, which I don’t know enough about to evaluate). Whereas I’d have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.
Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?
This one is harder to make up numbers for but intuitively it seems like there should again be many OOMs of difference, primarily because we differ by many OOMs on “regular EAs trying to solve technical AI alignment” but roughly agree on the value of “culture of secrecy”.
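To spell out the order-of-magnitude arithmetic behind these examples (a minimal sketch; all the probabilities and percentage points are the hypothetical numbers above, not precise views):

```python
import math

def ooms(ratio: float) -> float:
    """Order-of-magnitude gap between two estimates."""
    return math.log10(ratio)

# Biosecurity: x-risk-targeted bio work only matters in worlds without AI doom,
# so discount it by 1 / P(no AI doom).
discount_at_9999 = 1 / (1 - 0.9999)  # 10,000x discount at 99.99% doom
discount_at_50 = 1 / (1 - 0.5)       # 2x discount at 50% doom
print(ooms(discount_at_9999 / discount_at_50))  # ~3.7 OOMs

# Starting resources: P(these readings help someone become an effective researcher).
print(ooms(0.5 / 0.01))  # ~1.7 OOMs, i.e. the >50x gap

# Eliezer's-health org: percentage points of x-risk reduction vs the rest of the field.
eliezer_estimate = 10 / 0.01  # ~3 OOMs more important than all other alignment work
rohin_estimate = 1 / 10       # 1-2 OOMs less important (using 1 OOM here)
print(ooms(eliezer_estimate / rohin_estimate))  # ~4 OOMs of disagreement
```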
I realize I haven’t engaged with the abstract points you made. I think I mostly just don’t understand them and currently they feel like they have to be wrong given the obvious OOMs of difference in all of the examples I gave. If you still disagree it would be great if you could explain how your abstract points play out in some of my concrete examples.
We both agree that you shouldn’t defer to Eliezer’s literal credences, because we both think he’s systematically overconfident. The debate is between two responses to that:
a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).
b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn’t overconfident.
I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn’t make much sense.
For instance:
should funders reallocate nearly all biosecurity money to AI?
It doesn’t make sense to defer to Eliezer’s estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.
Should there be an organization dedicated to solving Eliezer’s health problems? What should its budget be?
I’m guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking that the rest of the alignment field, even if fully funded, can’t make a dent of more than 0.01 percentage points, suggesting that “improve Eliezer’s health + project management skills” is 3 OOM more important than “all other alignment work” (saying nothing about tractability, which I don’t know enough about to evaluate). Whereas I’d have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.
Again, the problem is that you’re deferring on a question-by-question basis, without considering the correlations between different questions—in this case, the likelihood that Eliezer is right, and the value of his work. (Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined? His tone is strong but I don’t think he’s ever made a claim that big.)
Here’s an alternative calculation which takes into account that correlation. I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that’s 90% likely and I think that’s 10% likely. Then if our choices are “defer entirely to Eliezer” or “defer entirely to Richard”, there’s a 9x difference in funding efficacy. In practice, though, the actual disagreement here is between “defer to Eliezer no more than a median AI safety researcher” and something like “assume Eliezer is, say, 2x overconfident and then give calibrated-Eliezer, say, 30%ish of your deference weight”. If we assume for the sake of simplicity that every other AI safety researcher has my worldview, then the practical difference here is something like a 2x difference in this org’s efficacy (0.1 vs 0.3*0.9*0.5+0.7*0.1). Which is pretty low!
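Spelling out the arithmetic in that last parenthetical (a sketch using the made-up numbers above: Eliezer at 90% on the key claims, me at 10%, a 2x overconfidence correction, and a 30% deference weight on corrected-Eliezer):

```python
# Probability that the key claims behind the org's research agenda are correct,
# under two deference policies (all numbers are the hypothetical ones from the text).
p_eliezer = 0.9   # Eliezer's stated credence in the key claims
p_richard = 0.1   # my credence, assumed shared by every other AI safety researcher

# Policy A: defer to Eliezer no more than to a median AI safety researcher.
policy_a = p_richard  # 0.1

# Policy B: correct Eliezer for ~2x overconfidence, give him ~30% deference weight.
policy_b = 0.3 * (p_eliezer / 2) + 0.7 * p_richard  # ~0.205

print(policy_b / policy_a)  # ~2x difference in the org's estimated efficacy
```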
Won’t go through the other examples but hopefully that conveys the idea. The basic problem here, I think, is that the implicit “deference model” that you and Ben are using doesn’t actually work (even for very simple examples like the ones you gave).
It doesn’t make sense to defer to Eliezer’s estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.
There’s lots of things you can do under Eliezer’s worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn’t expect those sorts of things to happen.
I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that’s 90% likely and I think that’s 10% likely.
This seems like a crazy way to do cost-effectiveness analyses.
Like, if I were comparing deworming to GiveDirectly, would I be saying “well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there’s only a 1.4x difference”? Something has clearly gone wrong here.
It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there’s a 10% chance of that, so there’s only a 9x gap? And then once you do all of your adjustments it’s only 2x? Why do we even bother with cause prioritization under this worldview?
I don’t have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.
(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to “all other alignment work”.)
The debate is between two responses to that:
a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).
b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn’t overconfident.
I don’t see why you are not including “c) give significant deference weight to his actual worldview”, which is what I’d be inclined to do if I didn’t have significant AI expertise myself and so was trying to defer.
(Aside: note that Ben said “they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk”, which is slightly different from your rephrasing, but that’s a nitpick)
Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined?
¯\_(ツ)_/¯ Both the 10% and 0.01% (= 100% − 99.99%) numbers are ones I’ve heard reported (though both second-hand, not directly from Eliezer), and it also seems consistent with other things he writes. It seems entirely plausible that people misspoke or misremembered or lied, or that Eliezer was reporting probabilities “excluding miracles” or something else that makes these not the right numbers to use.
I’m not trying to be “charitable” to Eliezer, I’m trying to predict his views accurately (while noting that often people predict views inaccurately by failing to be sufficiently charitable). Usually when I see people say things like “obviously Eliezer meant this more normal, less crazy thing” they seem to be wrong.
Rob thinking that it’s not actually 99.99% is in fact an update for me.
(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to “all other alignment work”.)
IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren’t very many good worldviews going around—hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he’s totally wrong.)
Again, the difference is in large part determined by whether you think you’re in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?) In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer’s worldview and the best ways to generate utility according to other worldviews become much smaller.
This seems like a crazy way to do cost-effectiveness analyses.
Like, if I were comparing deworming to GiveDirectly, would I be saying “well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there’s only a 1.4x difference”? Something has clearly gone wrong here.
Within a worldview, you can assign EVs which are orders of magnitude different. But once you do worldview diversification, if a given worldview gets even 1% of my resources, then in some sense I’m acting like that worldview’s favored interventions are in a comparable EV ballpark to all the other worldviews’ favored interventions. That’s a feature not a bug.
It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there’s a 10% chance of that, so there’s only a 9x gap? And then once you do all of your adjustments it’s only 2x? Why do we even bother with cause prioritization under this worldview?
I don’t have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.
An arbitrary critic typically gets well less than 0.1% of my deference weight on EA topics (otherwise it’d run out fast!) But also see above: because in high-dimensional spaces there are few tradeoffs between different worldviews’ favored interventions, changing the weights on different worldviews doesn’t typically lead to many OOM changes in how you’re acting like you’re assigning EVs.
Also, I tend to think of cause prio as trying to integrate multiple worldviews into a single coherent worldview. But with deference you intrinsically can’t do that, because the whole point of deference is you don’t fully understand their views.
There’s lots of things you can do under Eliezer’s worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn’t expect those sorts of things to happen.
What do you mean “he doesn’t expect this sort of thing to happen”? I think I would just straightforwardly endorse doing a bunch of costly things like these that Eliezer’s worldview thinks are our best shot, as long as they don’t cause much harm according to other worldviews.
I don’t see why you are not including “c) give significant deference weight to his actual worldview”, which is what I’d be inclined to do if I didn’t have significant AI expertise myself and so was trying to defer.
Because neither Ben nor myself was advocating for this.
Okay, my new understanding of your view is that you’re suggesting that (if one is going to defer) one should:
Identify a panel of people to defer to
Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].
I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don’t particularly make sense to think about.
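Here’s a minimal sketch of that three-step model (the names, weights, and budget are made up):

```python
# Step 2: weights assigned to the panel of people one defers to (hypothetical numbers).
weights = {"expert_A": 0.5, "expert_B": 0.3, "expert_C": 0.2}

# Step 3: allocate a fixed pool of resources in proportion to those weights;
# each expert's share goes to the policies that expert advocates.
total_budget = 1_000_000  # e.g. dollars or researcher-hours
allocation = {name: w * total_budget for name, w in weights.items()}
print(allocation)  # {'expert_A': 500000.0, 'expert_B': 300000.0, 'expert_C': 200000.0}
```

On this model, moving expert_A’s weight from 0.5 to 0.3 reallocates 20% of the pool, which is the point I make below.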
However, I still disagree with the original claim I was disagreeing with:
when it comes to making big-picture forecasts, the main value of deference is in helping us decide which ideas and arguments to take seriously, rather than the specific credences we should place on them, since the space of ideas is so large.
Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of “credences”, and the sort of thing that Ben’s post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that’s a reallocation of 20% of your resources, which is pretty large!
Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn’t have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong.
Upon further reflection I think I’d make two changes to your rephrasing.
First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don’t want to give many resources to Kurzweil’s policies, because Kurzweil might have no idea which policies make any difference.
So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there’s a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you’ll probably still recommend working on nanotech (or nanotech safety) either way.
Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between “good” and “lucky”. But fundamentally we should think of these as approximations to policy evaluation, at least if you’re assuming that we mostly can’t fully evaluate whether their reasons for holding their views are sound.
Second change: what about the case where we don’t get to allocate resources, but we have to actually make a set of individual decisions? I think the theoretically correct move here is something like: let policies spend their weight on the domains which they think are most important, and then follow the policy which has spent most weight on that domain.
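A rough sketch of that “spend their weight on the domains they think are most important” rule (the worldviews, domains, and importance numbers are all made up):

```python
# Each worldview has a deference weight and says how much it cares about each
# domain of decisions; it "spends" its weight across domains in proportion to that.
weights = {"worldview_A": 0.7, "worldview_B": 0.3}
importance = {  # how much each worldview cares about each domain (sums to 1 per worldview)
    "worldview_A": {"grantmaking": 0.8, "curricula": 0.2},
    "worldview_B": {"grantmaking": 0.1, "curricula": 0.9},
}

def winning_worldview(domain: str) -> str:
    """Follow the worldview that has spent the most weight on this domain."""
    spend = {wv: weights[wv] * importance[wv][domain] for wv in weights}
    return max(spend, key=spend.get)

print(winning_worldview("grantmaking"))  # worldview_A
print(winning_worldview("curricula"))    # worldview_B, despite its lower overall weight
```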
Some complications:
I say “domains” not “decisions” because you don’t want to make a series of related decisions which are each decided by a different policy, that seems incoherent (especially if policies are reasoning adversarially about how to undermine each other’s actions).
More generally, this procedure could in theory be sensitive to bargaining and negotiating dynamics between different policies, and also the structure of the voting system (e.g. which decisions are voted on first, etc). I think we can just resolve to ignore those and do fine, but in principle I expect it gets pretty funky.
Lastly, two meta-level notes:
I feel like I’ve probably just reformulated some kind of reinforcement learning. Specifically the case where you have a fixed class of policies and no knowledge of how they relate to each other, so you can only learn how much to upweight each policy. And then the best policy is not actually in your hypothesis space, but you can learn a simple meta-policy of when to use each existing policy.
It’s very ironic that in order to figure out how much to defer to Yudkowsky we need to invent a theory of idealised cooperative decision-making. Since he’s probably the person whose thoughts on that I trust most, I guess we should meta-defer to him about what that will look like...
In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil’s beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is “moral parliament” style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn’t end up influencing your decisions at all.
That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I’d do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.)
But notably, the quality of past recommended policies is often not very sensitive to credences!
I think you’re thinking way too much about credences-in-particular. The relevant notion is not “credences”, it’s that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben’s post would be “I think people assign too high a weight to Eliezer”, rather than anything about credences. I don’t think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions.
I do agree that Ben’s post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people’s credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don’t agree with), and it seems different from what you’ve been arguing so far (except possibly in the parent comment).
Second change:
This change seems fine. Personally I’m pretty happy with a rough heuristic of “here’s how I should be splitting my resources across worldviews” and then going off of intuitive “how much does this worldview care about this decision” + intuitive trading between worldviews rather than something more fleshed out and formal but that seems mostly a matter of taste.
In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil’s beliefs, but that hypothetical-Kurzweil is completely indifferent over policies.
Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he’ll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don’t care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out.
(If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person’s worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don’t know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.)
I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I’d do it for Eliezer in any sane way.
I think I’m happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn’t: e.g. when I say that credences matter less than coherence of worldviews, that’s because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like “total risk level” aren’t very important, that’s because in principle we should be aggregating policies not risk estimates between worldviews.
I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like “do the standard things while remembering what’s a proxy for what”.
Meta: This comment (and some previous ones) get a bunch into “what should deference look like”, which is interesting, but I’ll note that most of this seems unrelated to my original claim, which was just “deference* seems important for people making decisions now, even if it isn’t very important in practice for researchers”, in contradiction to a sentence in your top-level comment. Do you now agree with that claim?
*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this “credences” because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.
Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he’ll throw his full weight behind that policy.
Agreed, but I’m not too worried about that. It seems like you’ll necessarily have some edge cases like this; I’d want to see an argument that the edge cases would be common before I switch to something else.
The chain of approximations could look something like:
The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I’m assuming for now that you’re not in the business of coming up with new ideas of things to do.)
Second approximation: Actually it’s still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we’ll instead do the ones that the experts say is highest impact. Since the experts disagree, we’ll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert.)
Third approximation: Actually expected impact of an expert’s portfolio of actions is still pretty hard to assess, we can save ourselves decision time by choosing weights for the portfolios according to some proxy that’s easier to assess.
It seems like right now we’re disagreeing about proxies we could use in the third approximation. It seems to me like proxies should be evaluated based on how close they reach the desired metric (expected future impact) in realistic use cases, which would involve both (1) how closely they align with “expected future impact” in general and (2) how easy they are to evaluate. It seems to me like you’re thinking mostly of (1) and not (2) and this seems weird to me; if you were going to ignore (2) you should just choose “expected future impact”. Anyway, individual proxies and my thoughts on them:
Beliefs / credences: 5⁄10 on easy to evaluate (e.g. Ben could write this post). 3⁄10 on correlation with expected future impact. Doesn’t take into account how much impact experts think their policies could have (e.g. the Kurzweil example above).
Coherence: 3⁄10 on easy to evaluate (seems hard to do this without being an expert in the field). 2⁄10 on correlation with expected future impact (it’s not that hard to have wrong coherent worldviews, see e.g. many pop sci books).
Hypothetical impact of past policies: 1⁄10 on easy to evaluate (though it depends on the domain). 7⁄10 on correlation with expected future impact (it’s not 9⁄10 or 10⁄10 because selection bias seems very hard to account for).
As is almost always the case with proxies, I would usually use an intuitive combination of all the available proxies, because that seems way more robust than relying on any single one. I am not advocating for only relying on beliefs.
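One way to picture “an intuitive combination of all the available proxies” (purely a sketch: nobody is proposing to literally compute this, and the weighting rule is illustrative; the x/10 scores are the ones from the list above):

```python
# Combine the proxies from the list above, weighting each by (ease of evaluation
# x correlation with expected future impact). Scores are the illustrative x/10 values.
proxies = {
    # proxy: (ease of evaluation, correlation with expected future impact)
    "beliefs_credences":        (5, 3),
    "coherence":                (3, 2),
    "hypothetical_past_impact": (1, 7),
}

raw = {name: ease * corr for name, (ease, corr) in proxies.items()}
total = sum(raw.values())
print({name: round(w / total, 2) for name, w in raw.items()})
# {'beliefs_credences': 0.54, 'coherence': 0.21, 'hypothetical_past_impact': 0.25}
```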
Which I claim is an accurate description of what I was doing, and what Ben wasn’t
I get the sense that you think I’m trying to defend “this is a good post and has no problems whatsoever”? (If so, that’s not what I said.)
Summarizing my main claims about this deference model that you might disagree with:
In practice, an expert’s beliefs / credences will be relevant input into deciding what weight to assign them,
Ben’s post provides relevant information about Eliezer’s beliefs (note this is not taking a stand on other aspects of the post, e.g. the claim about how much people should defer to Eliezer)
The weights assigned to experts are important / valuable to people who need to make decisions now (but they are usually not very important / valuable to researchers).
Meta: I’m currently writing up a post with a fully-fleshed-out account of deference. If you’d like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I’ve described the position I’m defending in more detail.
I’ll note that most of this seems unrelated to my original claim, which was just “deference* seems important for people making decisions now, even if it isn’t very important in practice for researchers”, in contradiction to a sentence on your top-level comment. Do you now agree with that claim?
I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the “specific credences” of the people you’re deferring to. You were arguing above that the difference between your and Eliezer’s views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don’t?
individual proxies and my thoughts on them
This is helpful, thanks. I of course agree that we should consider both correlations with impact and ease of evaluation; I’m talking so much about the former because not noticing this seems like the default mistake that people make when thinking about epistemic modesty. Relatedly, I think my biggest points of disagreement with your list are:
1. I think calibrated credences are badly-correlated with expected future impact, because:
a) Overconfidence is just so common, and top experts are often really miscalibrated even when they have really good models of their field.
b) The people who are best at having impact have goals other than sounding calibrated—e.g. convincing people to work with them, fighting social pressure towards conformity, etc. By contrast, the people who are best at being calibrated are likely the ones who are always stating their all-things-considered views, and who therefore may have very poor object-level models. This is particularly worrying when we’re trying to infer credences from tone—e.g. it’s hard to distinguish the hypotheses “Eliezer’s inside views are less calibrated than other people’s” and “Eliezer always speaks based on his inside-view credences, whereas other people usually speak based on their all-things-considered credences”.
c) I think that “directionally correct beliefs” are much better-correlated, and not that much harder to evaluate, and so credences are especially unhelpful by comparison to those (like, 2⁄10 before conditioning on directional correctness, and 1⁄10 after, whereas directional correctness is like 3⁄10).
2. I think coherence is very well-correlated with expected future impact (like, 5⁄10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don’t think it’s that hard to evaluate in hindsight, because the more coherent a view is, the more easily it’s falsified by history.
3. I think “hypothetical impact of past policies” is not that hard to evaluate. E.g. in Eliezer’s case the main impact is “people do a bunch of technical alignment work much earlier”, which I think we both agree is robustly good.
You were arguing above that the difference between your and Eliezer’s views makes much more than a 2x difference;
I was arguing that EV estimates have more than a 2x difference; I think this is pretty irrelevant to the deference model you’re suggesting (which I didn’t know you were suggesting at the time).
do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don’t?
No, I don’t agree with that. It seems like all the worldviews are going to want resources (money / time) and access to that is ~zero-sum. (All the worldviews want “get more resources” so I’m assuming you’re already doing that as much as possible.) The bargaining helps you avoid wasting resources on counterproductive fighting between worldviews, it doesn’t change the amount of resources each worldview gets to spend.
Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change. It’s a big difference if you start with twice as much money / time as you otherwise would have, unless there just happens to be a sharp drop in marginal utility of resources between those two points for some reason.
Maybe you think that there are lots of things one could do that have way more effect than “redirecting 10% of one’s resources” and so it’s not a big deal? If so can you give examples?
I think calibrated credences are badly-correlated with expected future impact
I agree overconfidence is common and you shouldn’t literally calculate a Brier score to figure out who to defer to.
I agree that directionally-correct beliefs are better correlated than calibrated credences.
When I say “evaluate beliefs” I mean “look at stated beliefs and see how reasonable they look overall, taking into account what other people thought when the beliefs were stated” and not “calculate a Brier score”; I think this post is obviously closer to the former than the latter.
I agree that people’s other goals make it harder to evaluate what their “true beliefs” are, and that’s one of the reasons I say it’s only 3⁄10 correlation.
I think coherence is very well-correlated with expected future impact (like, 5⁄10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don’t think it’s that hard to evaluate in hindsight, because the more coherent a view is, the more easily it’s falsified by history.
Re: correlation, I was implicitly also asking the question “how much does this vary across experts”. Across the general population, maybe coherence is 7⁄10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2⁄10, because most experts seem pretty coherent (within the domains they’re thinking about and trying to influence) and so the differences in impact depend on other factors.
Re: evaluation, it seems way more common to me that there are multiple strong, coherent, conflicting views that all seem compelling (see epistemic learned helplessness), which do not seem to have been easily falsified by history (in a sufficiently obvious manner that everyone agrees which one is false).
This too is in large part because we’re looking at experts in particular. I think we’re good at selecting for “enough coherence” before we consider someone an expert (if anything I think we do it too much in the “public intellectual” space), and so evaluating coherence well enough to find differences between experts ends up being pretty hard.
I think “hypothetical impact of past policies” is not that hard to evaluate. E.g. in Eliezer’s case the main impact is “people do a bunch of technical alignment work much earlier”, which I think we both agree is robustly good.
I feel like looking at any EA org’s report on estimation of their own impact makes it seem like “impact of past policies” is really difficult to evaluate?
Eliezer seems like a particularly easy case, where I agree his impact is probably net positive from getting people to do alignment work earlier, but even so I think there’s a bunch of questions that I’m uncertain about:
How bad is it that some people completely dismiss AI risk because they encountered Eliezer and found it off putting? (I’ve explicitly heard something along the lines of “that crazy stuff from Yudkowsky” from multiple ML researchers.)
How many people would be working on alignment without Eliezer’s work? (Not obviously hugely fewer, Superintelligence plausibly still gets published, Stuart Russell plausibly still goes around giving talks about value alignment and its importance.)
To what extent did Eliezer’s forceful rhetoric (as opposed to analytic argument) lead people to focus on the wrong problems?
I’ve now written up a more complete theory of deference here. I don’t expect that it directly resolves these disagreements, but hopefully it’s clearer than this thread.
Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change.
Note that this wouldn’t actually make a big change for AI alignment, since we don’t know how to use more funding. It’d make a big change if we were talking about allocating people, but my general heuristic is that I’m most excited about people acting on strong worldviews of their own, and so I think the role of deference there should be much more limited than when it comes to money. (This all falls out of the theory I linked above.)
Across the general population, maybe coherence is 7⁄10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2⁄10, because most experts seem pretty coherent (within the domains they’re thinking about and trying to influence) and so the differences in impact depend on other factors.
Experts are coherent within the bounds of conventional study. When we try to apply that expertise to related topics that are less conventional (e.g. ML researchers on AGI; or even economists on what the most valuable interventions are) coherence drops very sharply. (I’m reminded of an interview where Tyler Cowen says that the most valuable cause area is banning alcohol, based on some personal intuitions.)
I feel like looking at any EA org’s report on estimation of their own impact makes it seem like “impact of past policies” is really difficult to evaluate?
The question is how it compares to estimating past correctness, where we face pretty similar problems. But mostly I think we don’t disagree too much on this question—I think epistemic evaluations are gonna be bigger either way, and I’m mostly just advocating for the “think-of-them-as-a-proxy” thing, which you might be doing but very few others are.
Note that this wouldn’t actually make a big change for AI alignment, since we don’t know how to use more funding.
Funding isn’t the only resource:
You’d change how you introduce people to alignment (since I’d guess that has a pretty strong causal impact on what worldviews they end up acting on). E.g. if you previously flipped a 10%-weighted coin to decide whether to send them down the Eliezer track or the other track, now you’d flip a 20%-weighted coin, and this straightforwardly leads to different numbers of people working on particular research agendas that the worldviews disagree about. Or if you imagine the community as a whole acting as an agent, you send 20% of the people to MIRI fellowships and the remainder to other fellowships (whereas previously it would be 10%).
(More broadly I think there’s a ton of stuff you do differently in community building, e.g. do you target people who know ML or people who are good at math?)
You’d change what you used political power for. I don’t particularly understand what policies Eliezer would advocate for but they seem different, e.g. I think I’m more keen on making sure particular alignment schemes for building AI systems get used and less keen on stopping everyone from doing stuff besides one secrecy-oriented lab that can become a leader.
What do you mean “he doesn’t expect this sort of thing to happen”?
I mean that he predicts that these costly actions will not be taken despite seeming good to him.
Because neither Ben nor myself was advocating for this.
I think it’s also important to consider Ben’s audience. If I were Ben I’d be imagining my main audience to be people who give significant deference weight to Eliezer’s actual worldview. If you’re going to write a top-level comment arguing against Ben’s post it seems pretty important to engage with the kind of deference he’s imagining (or argue that no one actually does that kind of deference, or that it’s not worth writing to that audience, etc).
(Of course, I could be wrong about who Ben imagines his audience to be.)
Why do you think it suggests that? There are two MIRI responses in that range, but responses are anonymous, and most MIRI staff didn’t answer the survey.
I should have clarified that I think (or at least I thought so, prior to your question; kind of confused now) Yudkowsky’s answer is probably one of those two MIRI responses. Sorry about that.
I recall that you or somebody else at MIRI once wrote something to the effect that most MIRI researchers don’t actually believe that p(doom) is extremely high, like >90% doom. Then, in the linked post, there is a comment from someone who marked themselves as both a technical safety and a strategy researcher and who gave 0.98 and 0.96 on your questions. The style/content of the comment struck me as something Yudkowsky would have written.
Cool! I figured your reasoning was probably something along those lines, but I wanted to clarify that the survey is anonymous and hear your reasoning. I personally don’t know who wrote the response you’re talking about, and I’m very uncertain how many researchers at MIRI have 90+% p(doom), since only five MIRI researchers answered the survey (and marked that they’re from MIRI).
Musing out loud: I don’t know of any complete model of deference which doesn’t run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.
If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy—i.e. a set of decisions that’s inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.
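A toy illustration of how question-by-question majority voting can produce an incoherent policy (a standard doctrinal-paradox-style example with made-up worldviews and questions):

```python
# Three worldviews vote on two premises and on the decision "act as if both
# premises are true". Per-question majority voting endorses both premises but
# rejects acting on them, a combination no individual worldview endorses.
votes = {
    # worldview: (premise_1, premise_2, act_on_both)
    "worldview_A": (True,  True,  True),
    "worldview_B": (True,  False, False),
    "worldview_C": (False, True,  False),
}

def majority(question_index: int) -> bool:
    return sum(v[question_index] for v in votes.values()) > len(votes) / 2

policy = tuple(majority(i) for i in range(3))
print(policy)  # (True, True, False): accepts both premises, refuses to act on them
```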
Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer’s worldview doesn’t end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolveable).
Yeah, I’m gonna ballpark guess he’s around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom “without miracles”, which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I’m not sure that’s a mental move he does at all, or would ever report on if he did).
It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?
On the positive side, I’d be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*
*What do I mean by this? Idk, here’s a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).
[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky’s views, I too think there are a bunch of people who defer to him too much, I too think he is often overconfident, wrong about various things, etc.]
[ETA: OK, I guess I think Bostrom probably was actually slightly better than Yudkowsky even on 20-year timespan.]
[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don’t adopt his credences. E.g. think “we’re probably doomed” but not “99% chance of doom” Also, Yudkowsky doesn’t seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]
Oops! Dunno what happened, I thought it was not yet posted. (I thought I had posted it at first, but then I looked for it and didn’t see it & instead saw the unposted draft, but while I was looking for it I saw Richard’s post… I guess it must have been some sort of issue with having multiple tabs open. I’ll delete the other version.)
I’ve nevertheless downvoted this post because it seems like it’s making claims that are significantly too strong, based on a methodology that I strongly disendorse.
I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. There is a version of the post – rephrased and reframed – that I think would be perfectly fine even though I would still disagree with it.
Separately, my takeaway from Ben’s 80k interview has been that I think that Eliezer’s take on AI risk is much more truth-tracking than Ben’s. To improve my understanding, I would turn to Paul and ARC’s writings rather than Eliezer and MIRI’s, but Eliezer’s takes are still up there among the most plausible ones in my mind.
I suspect that the motivation for this post comes from a place that I would find epistemically untenable and that bears little semblance to the sophisticated disagreement between Eliezer and Paul. But I’m worried that a reader may come away with the impression that Ben and Paul fall into one camp and Eliezer into another on AI risk when really Paul agrees with Eliezer on many points when it comes to the importance and urgency of AI safety (see the list of agreements at the top of Paul’s post).
Maybe, but I find it important to maintain the sort of culture where one can be confidently wrong about something without fear that it’ll cause people to interpret all future arguments only in light of that mistake instead of taking them at face value and evaluating them for their own merit.
The sort of entrepreneurialness that I still feel is somewhat lacking in EA requires committing a lot of time to a speculative idea on the off-chance that it is correct. If it is not, the entrepreneur has wasted a lot of time and usually money. If additionally it has the social cost that they can’t try again because people will dismiss them because of that past failure, it makes it just so much less likely still that anyone will try in the first place.
Of course that’s not the status quo. I just really don’t want EA to move in that direction.
“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.
I disagree that the sentence is false for the interpretation I have in mind.
I think it’s really important to separate out the question “Is Yudkowsky an unusually innovative thinker?” and the question “Is Yudkowsky someone whose credences you should give an unusual amount of weight to?”
I read your comment as arguing for the former, which I don’t disagree with. But that doesn’t mean that people should currently weigh his risk estimates more heavily than the estimates of other researchers in the space (like you).
But we do also need to try to have well-calibrated credences, of course. For the reason given in the post, it’s important to know whether the risk of everyone dying soon is 5% or 99%. It’s not enough just to determine whether we should take AI risk seriously.
We’re also now past the point, as a community, where “Should AI risk be taken seriously?” is that much of a live question. The main epistemic question that matters is what probability we assign to it—and I think this post is relevant to that.
I definitely recommend people read the post Paul just wrote! I think it’s overall more useful than this one.
But I don’t think there’s an either-or here. People—particularly non-experts in a domain—do and should form their views through a mixture of engaging with arguments and deferring to others. So both arguments and track records should be discussed.
I discuss this in response to another comment, here, but I’m not convinced of that point.
I phrased my reply strongly (e.g. telling people to read the other post instead of this one) because deference epistemology is intrinsically closely linked to status interactions, and you need to be pretty careful in order to make this kind of post not end up being, in effect, a one-dimensional “downweight this person”. I don’t think this post was anywhere near careful enough to avoid that effect. That seems particularly bad because I think most EAs should significantly upweight Yudkowsky’s views if they’re doing any kind of reasonable, careful deference, because most EAs significantly underweight how heavy-tailed the production of innovative ideas actually is (e.g. because of hindsight bias, it’s hard to realise how much worse than Eliezer we would have been at inventing the arguments for AI risk, and how many dumb things we would have said in his position).
By contrast, I think your post is implicitly using a model where we have a few existing, well-identified questions, and the most important thing is to just get to the best credences on those questions, and we should do so partly by just updating in the direction of experts. But I think this model of deference is rarely relevant; see my reply to Rohin for more details. Basically, as soon as we move beyond toy models of deference, the “innovative thinking” part becomes crucially important, and the “well-calibrated” part becomes much less so.
One last intuition: different people have different relationships between their personal credences and their all-things-considered credences. Inferring track records in the way you’ve done here will, in addition to favoring people who are quieter and say fewer useful things, also favor people who speak primarily based on their all-things-considered credences rather than their personal credences. But that leads to a vicious cycle where people are deferring to people who are deferring to people who… And then the people who actually do innovative thinking in public end up getting downweighted to oblivion via cherrypicked examples.
Modesty epistemology delenda est.
This seems like an overly research-centric position.
When your job is to come up with novel relevant stuff in a domain, then I agree that it’s mostly about “which ideas and arguments to take seriously” rather than specific credences.
When your job is to make decisions right now, the specific credences matter. Some examples:
Any cause prioritization decision, e.g. should funders reallocate nearly all biosecurity money to AI?
What should AI-focused community builders provide as starting resources?
Should there be an organization dedicated to solving Eliezer’s health problems? What should its budget be?
Should people try to solve technical AI alignment or try to, idk, create a culture of secrecy within AGI labs?
I think that there are very few decisions which are both a) that low-dimensional and b) actually sensitive to the relevant range of credences that we’re talking about.
Like, suppose you think that Eliezer’s credences on his biggest claims are literally 2x higher than they should be, even for claims where he’s 90% confident. This is a huge hit in terms of Bayes points; if that’s how you determine deference, and you believe he’s 2x off, then plausibly that implies you should defer to him less than you do to the median EA. But when it comes to grantmaking, for example, a cost-effectiveness factor of 2x is negligible given the other uncertainties involved—this should very rarely move you from a yes to no, or vice versa. (edit: I should restrict the scope here to grantmaking in complex, high-uncertainty domains like AI alignment).
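To make that contrast concrete, here’s a minimal sketch (purely illustrative: it uses the hypothetical 90%-vs-2x-off numbers from the paragraph above, and takes “Bayes points” to mean expected log score, which is just one standard reading):

```python
import math

# Expected log score ("Bayes points") per claim, if the events a forecaster
# calls 90% actually happen only 45% of the time (i.e. credences ~2x too high).
p_true = 0.45

def expected_log_score(reported, true_freq):
    # Average log score of always reporting `reported` on events that
    # occur with frequency `true_freq`.
    return true_freq * math.log(reported) + (1 - true_freq) * math.log(1 - reported)

print(expected_log_score(0.90, p_true))  # ~ -1.31 per claim (overconfident forecaster)
print(expected_log_score(0.45, p_true))  # ~ -0.69 per claim (calibrated forecaster)
# A large per-claim penalty, so naive scoring would heavily downweight the
# overconfident forecaster -- even though, for a grant decision, a 2x factor
# is small next to uncertainties that span orders of magnitude.
```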
Then you might say: well, okay, we’re not just making binary decisions, we’re making complex decisions where we’re choosing between lots of different options. But the more complex the decisions you’re making, the less you should care about whether somebody’s credences on a few key claims are accurate, and the more you should care about whether they’re identifying the right types of considerations, even if you want to apply a big discount factor to the specific credences involved.
As a simple example, as soon as you’re estimating more than one variable, you typically start caring a lot about whether the errors on your estimates are correlated or uncorrelated. But there are so many different possibilities for ways and reasons that they might be correlated that you can’t just update towards experts’ credences, you have to actually update towards experts’ reasons for those credences, which then puts you in the regime of caring more about whether you’ve identified the right types of considerations.
Such differences are crucial for many of the most important grant areas IME, because they are areas where you are trading off multiple high-stakes concerns. E.g. in nuclear policy, all the strategies on offer have arguments that they might lead to nuclear war or to a worse war. In AI alignment there are multiple such tradeoffs, with people embracing strategies that push the same variable in opposite directions, and high stakes on both sides.
I haven’t thought much about nuclear policy, so I can’t respond there. But at least in alignment, I expect that pushing on variables where there’s less than a 2x difference between the expected positive and negative effects of changing that variable is not a good use of time for altruistically-motivated people.
(By contrast, upweighting or downweighting Eliezer’s opinions by a factor of 2 could lead to significant shifts in expected value, especially for people who are highly deferential. The specific thing I think doesn’t make much difference is deferring to a version of Eliezer who’s 90% confident about something, versus deferring to the same extent to a version of Eliezer who’s 45% confident in the same thing.)
My more general point, which doesn’t hinge on the specific 2x claim, is that naive conversions between metrics of calibration and deferential weightings are a bad idea, and that a good way to avoid naive conversions is to care a lot more about innovative thinking than calibration when deferring.
I think differences between Eliezer + my views often make way more than a 2x difference to the bottom line. I’m not sure why you’re only considering probabilities on specific claims; when I think of “deferring” I also imagine deferring on estimates of usefulness of various actions, which can much more easily have OOMs of difference.
(Fwiw I also think Eliezer is way more than 2x too high for probabilities on many claims, though I don’t think that matters much for my point.)
Taking my examples (a rough arithmetic sketch of the first and third follows this list):
Since Eliezer thinks something like 99.99% chance of doom from AI, that reduces cost effectiveness of all x-risk-targeted biosecurity work by a factor of 10,000x (since only in 1 in 10,000 worlds does the reduced bio x-risk matter at all), whereas if you have < 50% of doom from AI (as I do) then that’s a discount factor of < 2x on x-risk-targeted biosecurity work. So that’s almost 4 OOMs of difference.
Eliezer seems very confident that a lot of existing alignment work is useless. So if you imagine taking a representative set of such papers as starting resources, I’d imagine that Eliezer would be at < 1% on “this will help the person become an effective alignment researcher” whereas I’d be at > 50% (for actual probabilities I’d want a better operationalization), leading to a >50x difference in cost effectiveness.
(And if you compare against the set of readings Eliezer would choose, I’d imagine the difference becomes even greater—I could imagine we’d each think the other’s choice would be net negative.)
I don’t have a citation but I’m guessing that Eliezer thinks that with more energy and project management skills he could make a significant dent in x-risk (perhaps 10 percentage points), while thinking that the rest of the alignment field, if fully funded, can’t make a dent of more than 0.01 percentage points, suggesting that “improve Eliezer’s health + project management skills” is 3 OOM more important than “all other alignment work” (saying nothing about tractability, which I don’t know enough to evaluate). Whereas I’d have it at, idk, 1-2 OOM less important, for a difference of 4-5 OOMs.
This one is harder to make up numbers for but intuitively it seems like there should again be many OOMs of difference, primarily because we differ by many OOMs on “regular EAs trying to solve technical AI alignment” but roughly agree on the value of “culture of secrecy”.
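Here’s the rough arithmetic behind the first and third examples, using only the numbers stated in this thread (the variable names are purely illustrative):

```python
import math

# Example 1: x-risk-targeted biosecurity work only matters in worlds where
# AI doesn't kill everyone first (numbers as stated above).
discount_eliezer = 1 / (1 - 0.9999)  # 10,000x, using the 99.99% doom figure
discount_rohin = 1 / (1 - 0.5)       # 2x, treating "< 50%" as ~50%
print(math.log10(discount_eliezer / discount_rohin))  # ~3.7, i.e. almost 4 OOMs

# Example 3: "fix Eliezer's health / project management" vs "all other alignment
# work", using the guessed dents in x-risk (in percentage points) from above.
dent_eliezer_fixed = 10
dent_rest_of_field = 0.01
print(math.log10(dent_eliezer_fixed / dent_rest_of_field))  # 3 OOMs on that guess,
# versus ~1-2 OOMs in the other direction on my view: a 4-5 OOM disagreement.
```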
I realize I haven’t engaged with the abstract points you made. I think I mostly just don’t understand them and currently they feel like they have to be wrong given the obvious OOMs of difference in all of the examples I gave. If you still disagree it would be great if you could explain how your abstract points play out in some of my concrete examples.
We both agree that you shouldn’t defer to Eliezer’s literal credences, because we both think he’s systematically overconfident. The debate is between two responses to that:
a) Give him less deference weight than the cautious, sober, AI safety people who make few novel claims but are better-calibrated (which is what Ben advocates).
b) Try to adjust for his overconfidence and then give significant deference weight to a version of his worldview that isn’t overconfident.
I say you should do the latter, because you should be deferring to coherent worldviews (which are rare) rather than deferring on a question-by-question basis. This becomes more and more true the more complex the decisions you have to make. Even for your (pretty simple) examples, the type of deference you seem to be advocating doesn’t make much sense.
For instance:
It doesn’t make sense to defer to Eliezer’s estimate of the relative importance of AI without also accounting for his estimate of the relative tractability of funding AI, which I infer he thinks is very low.
Again, the problem is that you’re deferring on a question-by-question basis, without considering the correlations between different questions—in this case, the likelihood that Eliezer is right, and the value of his work. (Also, the numbers seem pretty wild; maybe a bit uncharitable to ascribe to Eliezer the view that his research would be 3 OOM more valuable than the rest of the field combined? His tone is strong but I don’t think he’s ever made a claim that big.)
Here’s an alternative calculation which takes into account that correlation. I claim that the value of this organization is mainly determined by the likelihood that Eliezer is correct about a few key claims which underlie his research agenda. Suppose he thinks that’s 90% likely and I think that’s 10% likely. Then if our choices are “defer entirely to Eliezer” or “defer entirely to Richard”, there’s a 9x difference in funding efficacy. In practice, though, the actual disagreement here is between “defer to Eliezer no more than a median AI safety researcher” and something like “assume Eliezer is, say, 2x overconfident and then give calibrated-Eliezer, say, 30%ish of your deference weight”. If we assume for the sake of simplicity that every other AI safety researcher has my worldview, then the practical difference here is something like a 2x difference in this org’s efficacy (0.1 vs 0.3*0.9*0.5+0.7*0.1). Which is pretty low!
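Spelling out the parenthetical calculation as a tiny sketch (all numbers are the hypothetical ones from the paragraph above, not anyone’s actual credences):

```python
# Probability that the key claims underlying the research agenda are correct.
p_key_claims_eliezer = 0.9   # Eliezer's hypothetical confidence
p_key_claims_richard = 0.1   # Richard's hypothetical credence

# Option A: defer to Eliezer no more than to a median AI safety researcher
# (simplified here to: everyone else shares the Richard-like worldview).
value_option_a = p_key_claims_richard                          # 0.1

# Option B: treat Eliezer as 2x overconfident, give calibrated-Eliezer 30%
# of your deference weight, and the remaining 70% to the Richard-like worldview.
weight_eliezer = 0.3
value_option_b = (weight_eliezer * p_key_claims_eliezer * 0.5
                  + (1 - weight_eliezer) * p_key_claims_richard)  # ~0.205

print(value_option_b / value_option_a)  # ~2x difference in the org's efficacy
```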
Won’t go through the other examples but hopefully that conveys the idea. The basic problem here, I think, is that the implicit “deference model” that you and Ben are using doesn’t actually work (even for very simple examples like the ones you gave).
There are lots of things you can do under Eliezer’s worldview that add dignity points, like paying relevant people millions of dollars to spend a week really engaging with the arguments, or trying to get whole-brain emulation before AGI. My understanding is that he doesn’t expect those sorts of things to happen.
This seems like a crazy way to do cost-effectiveness analyses.
Like, if I were comparing deworming to GiveDirectly, would I be saying “well, the value of deworming is mainly determined by the likelihood that the pro-deworming people are right, which I estimate is 70% but you estimate is 50%, so there’s only a 1.4x difference”? Something has clearly gone wrong here.
It also feels like this reasoning implies that no EA action can be > 10x more valuable than any other action that an EA critic thinks is good? Since you assign a 90% chance that the EA is right, and the critic thinks there’s a 10% chance of that, so there’s only a 9x gap? And then once you do all of your adjustments it’s only 2x? Why do we even bother with cause prioritization under this worldview?
I don’t have a fleshed out theory of how and when to defer, but I feel pretty confident that even our intuitive pretheoretic deference should not be this sort of thing, and should be the sort of thing that can have orders of magnitude of difference between actions.
(One major thing is that I think you should be comparing between two actions, rather than evaluating an action by itself, which is why I compared to “all other alignment work”.)
I don’t see why you are not including “c) give significant deference weight to his actual worldview”, which is what I’d be inclined to do if I didn’t have significant AI expertise myself and so was trying to defer.
(Aside: note that Ben said “they shouldn’t defer to him more than they would defer to anyone else who seems smart and has spent a reasonable amount of time thinking about AI risk”, which is slightly different from your rephrasing, but that’s a nitpick)
¯\_(ツ)_/¯ Both the 10% and 0.01% (= 100% − 99.99%) numbers are ones I’ve heard reported (though both second-hand, not directly from Eliezer), and it also seems consistent with other things he writes. It seems entirely plausible that people misspoke or misremembered or lied, or that Eliezer was reporting probabilities “excluding miracles” or something else that makes these not the right numbers to use.
I’m not trying to be “charitable” to Eliezer, I’m trying to predict his views accurately (while noting that often people predict views inaccurately by failing to be sufficiently charitable). Usually when I see people say things like “obviously Eliezer meant this more normal, less crazy thing” they seem to be wrong.
Rob thinking that it’s not actually 99.99% is in fact an update for me.
IMO the crux is that I disagree with both of these. Instead I think you should use each worldview to calculate a policy, and then generate some kind of compromise between those policies. My arguments above were aiming to establish that this strategy is not very sensitive to exactly how much you defer to Eliezer, because there just aren’t very many good worldviews going around—hence why I assign maybe 15 or 20% (inside view) credence to his worldview (updated from 10% above after reflection). (I think my all-things-considered view is similar, actually, because deference to him cancels out against deference to all the people who think he’s totally wrong.)
Again, the difference is in large part determined by whether you think you’re in a low-dimensional space (here are our two actions, which one should we take?) versus a high-dimensional space (millions of actions available to us, how do we narrow it down?). In a high-dimensional space the tradeoffs between the best ways to generate utility according to Eliezer’s worldview and the best ways to generate utility according to other worldviews become much smaller.
Within a worldview, you can assign EVs which are orders of magnitude different. But once you do worldview diversification, if a given worldview gets even 1% of my resources, then in some sense I’m acting like that worldview’s favored interventions are in a comparable EV ballpark to all the other worldviews’ favored interventions. That’s a feature not a bug.
An arbitrary critic typically gets well less than 0.1% of my deference weight on EA topics (otherwise it’d run out fast!) But also see above: because in high-dimensional spaces there are few tradeoffs between different worldviews’ favored interventions, changing the weights on different worldviews doesn’t typically lead to many OOM changes in how you’re acting like you’re assigning EVs.
Also, I tend to think of cause prio as trying to integrate multiple worldviews into a single coherent worldview. But with deference you intrinsically can’t do that, because the whole point of deference is you don’t fully understand their views.
What do you mean “he doesn’t expect this sort of thing to happen”? I think I would just straightforwardly endorse doing a bunch of costly things like these that Eliezer’s worldview thinks are our best shot, as long as they don’t cause much harm according to other worldviews.
Because neither Ben nor I was advocating for this.
Okay, my new understanding of your view is that you’re suggesting that (if one is going to defer) one should (see the minimal code sketch after this list):
Identify a panel of people to defer to
Assign them weights based on how good they seem (e.g. track record, quality and novelty of ideas, etc)
Allocate resources to [policies advocated by person X] in proportion to [weight assigned to person X].
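A minimal sketch of that three-step model, with made-up names and weights purely for illustration:

```python
def allocate_resources(total_budget, weights):
    """Split a budget across each person's recommended policy portfolio,
    in proportion to the deference weight assigned to them (step 3)."""
    total = sum(weights.values())
    return {person: total_budget * w / total for person, w in weights.items()}

# e.g. weights from step 2 (made-up numbers):
print(allocate_resources(100, {"Eliezer": 0.3, "Paul": 0.4, "others": 0.3}))
# {'Eliezer': 30.0, 'Paul': 40.0, 'others': 30.0}
```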
I agree that (a) this is a reasonable deference model and (b) under this deference model most of my calculations and questions in this thread don’t particularly make sense to think about.
However, I still disagree with the original claim I was disagreeing with:
Even in this new deference model, it seems like the specific weights chosen in step 2 are a pretty big deal (which seem like the obvious analogues of “credences”, and the sort of thing that Ben’s post would influence). If you switch from a weight of 0.5 to a weight of 0.3, that’s a reallocation of 20% of your resources, which is pretty large!
Yepp, thanks for the clear rephrasing. My original arguments for this view were pretty messy because I didn’t have it fully fleshed out in my mind before writing this comment thread, I just had a few underlying intuitions about ways I thought Ben was going wrong.
Upon further reflection I think I’d make two changes to your rephrasing.
First change: in your rephrasing, we assign people weights based on the quality of their beliefs, but then follow their recommended policies. But any given way of measuring the quality of beliefs (in terms of novelty, track record, etc) is only an imperfect proxy for quality of policies. For example, Kurzweil might very presciently predict that compute is the key driver of AI progress, but suppose (for the sake of argument) that the way he does so is by having a worldview in which everything is deterministic, individuals are powerless to affect the future, etc. Then you actually don’t want to give many resources to Kurzweil’s policies, because Kurzweil might have no idea which policies make any difference.
So I think I want to adjust the rephrasing to say: in principle we should assign people weights based on how well their past recommended policies for someone like you would have worked out, which you can estimate using things like their track record of predictions, novelty of ideas, etc. But notably, the quality of past recommended policies is often not very sensitive to credences! For example, if you think that there’s a 50% chance of solving nanotech in a decade, or a 90% chance of solving nanotech in a decade, then you’ll probably still recommend working on nanotech (or nanotech safety) either way.
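A toy illustration of that insensitivity, with made-up payoffs (nothing here is a real estimate):

```python
# The recommended action can be the same across very different credences.
def recommend(p_nanotech_soon):
    value_if_it_pans_out = 100   # made-up payoff of having worked on nanotech (safety)
    value_of_alternative = 10    # made-up payoff of the best alternative project
    ev_nanotech = p_nanotech_soon * value_if_it_pans_out
    return "work on nanotech" if ev_nanotech > value_of_alternative else "alternative"

print(recommend(0.5))  # work on nanotech
print(recommend(0.9))  # work on nanotech -- same policy despite very different credences
```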
Having said all that, since we only get one rollout, evaluating policies is very high variance. And so looking at other information like reasoning, predictions, credences, etc, helps you distinguish between “good” and “lucky”. But fundamentally we should think of these as approximations to policy evaluation, at least if you’re assuming that we mostly can’t fully evaluate whether their reasons for holding their views are sound.
Second change: what about the case where we don’t get to allocate resources, but we have to actually make a set of individual decisions? I think the theoretically correct move here is something like: let policies spend their weight on the domains which they think are most important, and then follow the policy which has spent the most weight on that domain (there’s a rough code sketch of this after the list of complications below).
Some complications:
I say “domains” not “decisions” because you don’t want to make a series of related decisions which are each decided by a different policy, that seems incoherent (especially if policies are reasoning adversarially about how to undermine each other’s actions).
More generally, this procedure could in theory be sensitive to bargaining and negotiating dynamics between different policies, and also the structure of the voting system (e.g. which decisions are voted on first, etc). I think we can just resolve to ignore those and do fine, but in principle I expect it gets pretty funky.
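Here’s a rough sketch of the weight-spending idea, ignoring the bargaining and ordering complications just listed; the worldview names, importance numbers, and the winner-takes-the-domain rule are all illustrative assumptions:

```python
def decide_domains(weights, importance):
    """weights: deference weight per worldview.
    importance: for each worldview, how much it cares about each domain
    (normalized so each worldview spends exactly its total weight)."""
    spent = {}
    for view, w in weights.items():
        total_care = sum(importance[view].values())
        for domain, care in importance[view].items():
            spent.setdefault(domain, {})[view] = w * care / total_care
    # The worldview that spent the most weight on a domain gets to decide it.
    return {domain: max(spending, key=spending.get)
            for domain, spending in spent.items()}

weights = {"Eliezer-ish": 0.2, "median-safety": 0.8}
importance = {
    "Eliezer-ish": {"curriculum": 9, "biosecurity budget": 1},
    "median-safety": {"curriculum": 1, "biosecurity budget": 9},
}
print(decide_domains(weights, importance))
# {'curriculum': 'Eliezer-ish', 'biosecurity budget': 'median-safety'}
```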
Lastly, two meta-level notes:
I feel like I’ve probably just reformulated some kind of reinforcement learning. Specifically the case where you have a fixed class of policies and no knowledge of how they relate to each other, so you can only learn how much to upweight each policy. And then the best policy is not actually in your hypothesis space, but you can learn a simple meta-policy of when to use each existing policy.
It’s very ironic that in order to figure out how much to defer to Yudkowsky we need to invent a theory of idealised cooperative decision-making. Since he’s probably the person whose thoughts on that I trust most, I guess we should meta-defer to him about what that will look like...
In your Kurzweil example I think the issue is not that you assigned weights based on hypothetical-Kurzweil’s beliefs, but that hypothetical-Kurzweil is completely indifferent over policies. I think the natural fix is “moral parliament” style decision making where the weights can still come from beliefs but they now apply more to preferences-over-policies. In your example hypothetical-Kurzweil has a lot of weight but never has any preferences-over-policies so doesn’t end up influencing your decisions at all.
That being said, I agree that if you can evaluate quality of past recommended policies well, without a ton of noise, that would be a better signal than accuracy of beliefs. This just seems extremely hard to do, especially given the selection bias in who comes to your attention in the first place, and idk how I’d do it for Eliezer in any sane way. (Whereas you get to see people state many more beliefs and so there are a lot more data points that you can evaluate if you look at beliefs.)
I think you’re thinking way too much about credences-in-particular. The relevant notion is not “credences”, it’s that-which-determines-how-much-influence-the-person-has-over-your-actions. In this model of deference the relevant notion is the weights assigned in step 2 (however you calculate them), and the message of Ben’s post would be “I think people assign too high a weight to Eliezer”, rather than anything about credences. I don’t think either Ben or I care particularly much about credences-based-on-deference except inasmuch as they affect your actions.
I do agree that Ben’s post looks at credences that Eliezer has given and considers those to be relevant evidence for computing what weight to assign Eliezer. You could take a strong stand against using people’s credences or beliefs to compute weights, but that is at least a pretty controversial take (that I personally don’t agree with), and it seems different from what you’ve been arguing so far (except possibly in the parent comment).
This change seems fine. Personally I’m pretty happy with a rough heuristic of “here’s how I should be splitting my resources across worldviews” and then going off of intuitive “how much does this worldview care about this decision” + intuitive trading between worldviews rather than something more fleshed out and formal but that seems mostly a matter of taste.
Your procedure is non-robust in the sense that, if Kurzweil transitions from total indifference to thinking that one policy is better by epsilon, he’ll throw his full weight behind that policy. Hmm, but then in a parliamentary approach I guess that if there are a few different things he cares epsilon about, then other policies could negotiate to give him influence only over the things they don’t care about themselves. Weighting by hypothetical-past-impact still seems a bit more elegant, but maybe it washes out.
(If we want to be really on-policy then I guess the thing which we should be evaluating is whether the person’s worldview would have had good consequences when added to our previous mix of worldviews. And one algorithm for this is assigning policies weights by starting off from a state where they don’t know anything about the world, then letting them bet on all your knowledge about the past (where the amount they win on bets is determined not just by how correct they are, but also how much they disagree with other policies). But this seems way too complicated to be helpful in practice.)
I think I’m happy with people spending a bunch of time evaluating accuracy of beliefs, as long as they keep in mind that this is a proxy for quality of recommended policies. Which I claim is an accurate description of what I was doing, and what Ben wasn’t: e.g. when I say that credences matter less than coherence of worldviews, that’s because the latter is crucial for designing good policies, whereas the former might not be; and when I say that all-things-considered estimates of things like “total risk level” aren’t very important, that’s because in principle we should be aggregating policies not risk estimates between worldviews.
I also agree that selection bias could be a big problem; again, I think that the best strategy here is something like “do the standard things while remembering what’s a proxy for what”.
Meta: This comment (and some previous ones) get a bunch into “what should deference look like”, which is interesting, but I’ll note that most of this seems unrelated to my original claim, which was just “deference* seems important for people making decisions now, even if it isn’t very important in practice for researchers”, in contradiction to a sentence on your top-level comment. Do you now agree with that claim?
*Here I mean deference in the sense of how-much-influence-various-experts-have-over-your-actions. I initially called this “credences” because I thought you were imagining a model of deference in which literal credences determined how much influence experts had over your actions.
Agreed, but I’m not too worried about that. It seems like you’ll necessarily have some edge cases like this; I’d want to see an argument that the edge cases would be common before I switch to something else.
The chain of approximations could look something like:
The correct thing to do is to consider all actions / policies and execute the one with the highest expected impact.
First approximation: Since there are so many actions / policies, it would take too long to do this well, and so we instead take a shortcut and consider only those actions / policies that more experienced people have thought of, and execute the ones with the highest expected impact. (I’m assuming for now that you’re not in the business of coming up with new ideas of things to do.)
Second approximation: Actually it’s still pretty hard to evaluate the expected impact of the restricted set of actions / policies, so we’ll instead do the ones that the experts say are highest impact. Since the experts disagree, we’ll divide our resources amongst them, in accordance with our predictions of which experts have highest expected impact across their portfolios of actions. (This is assuming a large enough pile of resources that it makes sense to diversify due to diminishing marginal returns for any one expert.)
Third approximation: Actually expected impact of an expert’s portfolio of actions is still pretty hard to assess, we can save ourselves decision time by choosing weights for the portfolios according to some proxy that’s easier to assess.
It seems like right now we’re disagreeing about proxies we could use in the third approximation. It seems to me like proxies should be evaluated based on how close they reach the desired metric (expected future impact) in realistic use cases, which would involve both (1) how closely they align with “expected future impact” in general and (2) how easy they are to evaluate. It seems to me like you’re thinking mostly of (1) and not (2) and this seems weird to me; if you were going to ignore (2) you should just choose “expected future impact”. Anyway, individual proxies and my thoughts on them:
Beliefs / credences: 5/10 on easy to evaluate (e.g. Ben could write this post). 3/10 on correlation with expected future impact. Doesn’t take into account how much impact experts think their policies could have (e.g. the Kurzweil example above).
Coherence: 3/10 on easy to evaluate (seems hard to do this without being an expert in the field). 2/10 on correlation with expected future impact (it’s not that hard to have wrong coherent worldviews, see e.g. many pop sci books).
Hypothetical impact of past policies: 1/10 on easy to evaluate (though it depends on the domain). 7/10 on correlation with expected future impact (it’s not 9/10 or 10/10 because selection bias seems very hard to account for).
As is almost always the case with proxies, I would usually use an intuitive combination of all the available proxies, because that seems way more robust than relying on any single one. I am not advocating for only relying on beliefs.
I get the sense that you think I’m trying to defend “this is a good post and has no problems whatsoever”? (If so, that’s not what I said.)
Summarizing my main claims about this deference model that you might disagree with:
In practice, an expert’s beliefs / credences will be relevant information for deciding what weight to assign them,
Ben’s post provides relevant information about Eliezer’s beliefs (note this is not taking a stand on other aspects of the post, e.g. the claim about how much people should defer to Eliezer)
The weights assigned to experts are important / valuable to people who need to make decisions now (but they are usually not very important / valuable to researchers).
Meta: I’m currently writing up a post with a fully-fleshed-out account of deference. If you’d like to drop this thread and engage with that when it comes out (or drop this thread without engaging with that), feel free; I expect it to be easier to debate when I’ve described the position I’m defending in more detail.
I always agreed with this claim; my point was that the type of deference which is important for people making decisions now should not be very sensitive to the “specific credences” of the people you’re deferring to. You were arguing above that the difference between your and Eliezer’s views makes much more than a 2x difference; do you now agree that, on my account of deference, a big change in the deference-weight you assign to Eliezer plausibly leads to a much smaller change in your policy from the perspective of other worldviews, because the Eliezer-worldview trades off influence over most parts of the policy for influence over the parts that the Eliezer-worldview thinks are crucial and other policies don’t?
This is helpful, thanks. I of course agree that we should consider both correlations with impact and ease of evaluation; I’m talking so much about the former because not noticing this seems like the default mistake that people make when thinking about epistemic modesty. Relatedly, I think my biggest points of disagreement with your list are:
1. I think calibrated credences are badly-correlated with expected future impact, because:
a) Overconfidence is just so common, and top experts are often really miscalibrated even when they have really good models of their field
b) The people who are best at having impact have goals other than sounding calibrated—e.g. convincing people to work with them, fighting social pressure towards conformity, etc. By contrast, the people who are best at being calibrated are likely the ones who are always stating their all-things-considered views, and who therefore may have very poor object-level models. This is particularly worrying when we’re trying to infer credences from tone—e.g. it’s hard to distinguish the hypotheses “Eliezer’s inside views are less calibrated than other people’s” and “Eliezer always speaks based on his inside-view credences, whereas other people usually speak based on their all-things-considered credences”.
c) I think that “directionally correct beliefs” are much better-correlated, and not that much harder to evaluate, and so credences are especially unhelpful by comparison to those (like, 2/10 before conditioning on directional correctness, and 1/10 after, whereas directional correctness is like 3/10).
2. I think coherence is very well-correlated with expected future impact (like, 5/10), because impact is heavy-tailed and the biggest sources of impact often require strong, coherent views. I don’t think it’s that hard to evaluate in hindsight, because the more coherent a view is, the more easily it’s falsified by history.
3. I think “hypothetical impact of past policies” is not that hard to evaluate. E.g. in Eliezer’s case the main impact is “people do a bunch of technical alignment work much earlier”, which I think we both agree is robustly good.
I was arguing that EV estimates have more than a 2x difference; I think this is pretty irrelevant to the deference model you’re suggesting (which I didn’t know you were suggesting at the time).
No, I don’t agree with that. It seems like all the worldviews are going to want resources (money / time) and access to that is ~zero-sum. (All the worldviews want “get more resources” so I’m assuming you’re already doing that as much as possible.) The bargaining helps you avoid wasting resources on counterproductive fighting between worldviews, it doesn’t change the amount of resources each worldview gets to spend.
Going from allocating 10% of your resources to 20% of your resources to a worldview seems like a big change. It’s a big difference if you start with twice as much money / time as you otherwise would have, unless there just happens to be a sharp drop in marginal utility of resources between those two points for some reason.
Maybe you think that there are lots of things one could do that have way more effect than “redirecting 10% of one’s resources” and so it’s not a big deal? If so can you give examples?
I agree overconfidence is common and you shouldn’t literally calculate a Brier score to figure out who to defer to.
I agree that directionally-correct beliefs are better correlated than calibrated credences.
When I say “evaluate beliefs” I mean “look at stated beliefs and see how reasonable they look overall, taking into account what other people thought when the beliefs were stated” and not “calculate a Brier score”; I think this post is obviously closer to the former than the latter.
I agree that people’s other goals make it harder to evaluate what their “true beliefs” are, and that’s one of the reasons I say it’s only 3/10 correlation.
Re: correlation, I was implicitly also asking the question “how much does this vary across experts”. Across the general population, maybe coherence is 7/10 correlated with expected future impact; across the experts that one would consider deferring to I think it is more like 2/10, because most experts seem pretty coherent (within the domains they’re thinking about and trying to influence) and so the differences in impact depend on other factors.
Re: evaluation, it seems way more common to me that there are multiple strong, coherent, conflicting views that all seem compelling (see epistemic learned helplessness), which do not seem to have been easily falsified by history (in a sufficiently obvious manner that everyone agrees which one is false).
This too is in large part because we’re looking at experts in particular. I think we’re good at selecting for “enough coherence” before we consider someone an expert (if anything I think we do it too much in the “public intellectual” space), and so evaluating coherence well enough to find differences between experts ends up being pretty hard.
I feel like looking at any EA org’s report estimating its own impact makes it seem like “impact of past policies” is really difficult to evaluate?
Eliezer seems like a particularly easy case, where I agree his impact is probably net positive from getting people to do alignment work earlier, but even so I think there’s a bunch of questions that I’m uncertain about:
How bad is it that some people completely dismiss AI risk because they encountered Eliezer and found it off putting? (I’ve explicitly heard something along the lines of “that crazy stuff from Yudkowsky” from multiple ML researchers.)
How many people would be working on alignment without Eliezer’s work? (Not obviously hugely fewer, Superintelligence plausibly still gets published, Stuart Russell plausibly still goes around giving talks about value alignment and its importance.)
To what extent did Eliezer’s forceful rhetoric (as opposed to analytic argument) lead people to focus on the wrong problems?
I’ve now written up a more complete theory of deference here. I don’t expect that it directly resolves these disagreements, but hopefully it’s clearer than this thread.
Note that this wouldn’t actually make a big change for AI alignment, since we don’t know how to use more funding. It’d make a big change if we were talking about allocating people, but my general heuristic is that I’m most excited about people acting on strong worldviews of their own, and so I think the role of deference there should be much more limited than when it comes to money. (This all falls out of the theory I linked above.)
Experts are coherent within the bounds of conventional study. When we try to apply that expertise to related topics that are less conventional (e.g. ML researchers on AGI; or even economists on what the most valuable interventions are) coherence drops very sharply. (I’m reminded of an interview where Tyler Cowen says that the most valuable cause area is banning alcohol, based on some personal intuitions.)
The question is how it compares to estimating past correctness, where we face pretty similar problems. But mostly I think we don’t disagree too much on this question—I think epistemic evaluations are gonna be bigger either way, and I’m mostly just advocating for the “think-of-them-as-a-proxy” thing, which you might be doing but very few others are.
Funding isn’t the only resource:
You’d change how you introduce people to alignment (since I’d guess that has a pretty strong causal impact on what worldviews they end up acting on). E.g. if you previously flipped a 10%-weighted coin to decide whether to send them down the Eliezer track or the other track, now you’d flip a 20%-weighted coin, and this straightforwardly leads to different numbers of people working on particular research agendas that the worldviews disagree about. Or if you imagine the community as a whole acting as an agent, you send 20% of the people to MIRI fellowships and the remainder to other fellowships (whereas previously it would be 10%).
(More broadly I think there’s a ton of stuff you do differently in community building, e.g. do you target people who know ML or people who are good at math?)
You’d change what you used political power for. I don’t particularly understand what policies Eliezer would advocate for but they seem different, e.g. I think I’m more keen on making sure particular alignment schemes for building AI systems get used and less keen on stopping everyone from doing stuff besides one secrecy-oriented lab that can become a leader.
Yeah, that’s what I mean.
Responding to other more minor points:
I mean that he predicts that these costly actions will not be taken despite seeming good to him.
I think it’s also important to consider Ben’s audience. If I were Ben I’d be imagining my main audience to be people who give significant deference weight to Eliezer’s actual worldview. If you’re going to write a top-level comment arguing against Ben’s post it seems pretty important to engage with the kind of deference he’s imagining (or argue that no one actually does that kind of deference, or that it’s not worth writing to that audience, etc).
(Of course, I could be wrong about who Ben imagines his audience to be.)
This survey suggests that he was at 96-98% a year ago.
Why do you think it suggests that? There are two MIRI responses in that range, but responses are anonymous, and most MIRI staff didn’t answer the survey.
I should have clarified that I think (or at least I thought so, prior to your question; kind of confused now) Yudkowsky’s answer is probably one of those two MIRI responses. Sorry about that.
I recall that you or somebody else at MIRI once wrote something to the effect that most MIRI researchers don’t actually believe p(doom) is extremely high, like >90% doom. Then, in the linked post, there is a comment from someone who marked themselves both as a technical safety and strategy researcher and who gave 0.98, 0.96 on your questions. The style/content of the comment struck me as something Yudkowsky would have written.
Cool! I figured your reasoning was probably something along those lines, but I wanted to clarify that the survey is anonymous and hear your reasoning. I personally don’t know who wrote the response you’re talking about, and I’m very uncertain how many researchers at MIRI have 90+% p(doom), since only five MIRI researchers answered the survey (and marked that they’re from MIRI).
Musing out loud: I don’t know of any complete model of deference which doesn’t run into weird issues, like the conclusion that you should never trust yourself. But suppose you have some kind of epistemic parliament where you give your own views some number of votes, and assign the rest of the votes to other people in proportion to how defer-worthy they seem. Then you need to make a bunch of decisions, and your epistemic parliament keeps voting on what will best achieve your (fixed) goals.
If you do naive question-by-question majority voting on each question simultaneously then you can end up with an arbitrarily incoherent policy—i.e. a set of decisions that’s inconsistent with each other. And if you make the decisions in some order, with the constraint that they each have to be consistent with all prior decisions, then the ordering of the decisions can become arbitrarily important.
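To illustrate the incoherence point, here’s a standard discursive-dilemma-style toy example (my own, not from this thread): three equally weighted advisors are each individually coherent, but question-by-question majority voting endorses both premises while rejecting the action they jointly support.

```python
# The action "fund the project" is only coherent if you also accept both
# premises ("AGI is near" and "alignment is tractable"). Each advisor is
# individually coherent; the majority view is not.
votes = {
    "advisor_1": {"agi_near": True,  "tractable": False, "fund": False},
    "advisor_2": {"agi_near": False, "tractable": True,  "fund": False},
    "advisor_3": {"agi_near": True,  "tractable": True,  "fund": True},
}

def majority(question):
    yes = sum(v[question] for v in votes.values())
    return yes > len(votes) / 2

print(majority("agi_near"))   # True  (2 of 3)
print(majority("tractable"))  # True  (2 of 3)
print(majority("fund"))       # False (1 of 3)
# The majority accepts both premises but rejects the action they jointly imply:
# an incoherent overall policy.
```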
Instead, you want your parliament to negotiate some more coherent joint policy to follow. And I expect that in this joint policy, each worldview gets its way on the questions that are most important to it, and cedes responsibility on the questions that are least important. So Eliezer’s worldview doesn’t end up reallocating all the biosecurity money, but it does get a share of curriculum time (at least for the most promising potential researchers). But in general how to conduct those negotiations is an unsolved problem (and pretty plausibly unsolveable).
I could be wrong, but I’d guess Eliezer’s all-things-considered p(doom) is less extreme than that.
Yeah, I’m gonna ballpark guess he’s around 95%? I think the problem is that he cites numbers like 99.99% when talking about the chance of doom “without miracles”, which in his parlance means assuming that his claims are never overly pessimistic. Which seems like wildly bad epistemic practice. So then it goes down if you account for that, and then maybe it goes down even further if he adjusts for the possibility that other people are more correct than him overall (although I’m not sure that’s a mental move he does at all, or would ever report on if he did).
Even at 95% you get OOMs of difference by my calculations, though significantly fewer OOMs, so this doesn’t seem like the main crux.
Beat me to it & said it better than I could.
My now-obsolete draft comment was going to say:
It seems to me that between about 2004 and 2014, Yudkowsky was the best person in the world to listen to on the subject of AGI and AI risks. That is, deferring to Yudkowsky would have been a better choice than deferring to literally anyone else in the world. Moreover, after about 2014 Yudkowsky would probably have been in the top 10; if you are going to choose 10 people to split your deference between (which I do not recommend, I recommend thinking for oneself), Yudkowsky should be one of those people, and had you dropped Yudkowsky from the list in 2014 you would have missed out on some important stuff. Would you agree with this?
On the positive side, I’d be interested to see a top ten list from you of people you think should be deferred to as much or more than Yudkowsky on matters of AGI and AI risks.*
*What do I mean by this? Idk, here’s a partial operationalization: Timelines, takeoff speeds, technical AI alignment, and p(doom).
[ETA: lest people write me off as a Yudkowsky fanboy, I wish to emphasize that I too think people are overindexing on Yudkowsky’s views, I too think there are a bunch of people who defer to him too much, I too think he is often overconfident, wrong about various things, etc.]
[ETA: OK, I guess I think Bostrom probably was actually slightly better than Yudkowsky even on 20-year timespan.]
[ETA: I wish to reemphasize, but more strongly, that Yudkowsky seems pretty overconfident not just now but historically. Anyone deferring to him should keep this in mind; maybe directly update towards his credences but don’t adopt his credences. E.g. think “we’re probably doomed” but not “99% chance of doom”. Also, Yudkowsky doesn’t seem to be listening to others and understanding their positions well. So his criticisms of other views should be listened to but not deferred to, IMO.]
Didn’t you post that comment right here?
Oops! Dunno what happened, I thought it was not yet posted. (I thought I had posted it at first, but then I looked for it and didn’t see it & instead saw the unposted draft, but while I was looking for it I saw Richard’s post… I guess it must have been some sort of issue with having multiple tabs open. I’ll delete the other version.)
I agree, and I’m a bit confused that the top-level post does not violate forum rules in its current form. There is a version of the post – rephrased and reframed – that I think would be perfectly fine even though I would still disagree with it.
And I say that as someone who loved Paul’s response to Eliezer’s list!
Separately, my takeaway from Ben’s 80k interview has been that I think that Eliezer’s take on AI risk is much more truth-tracking than Ben’s. To improve my understanding, I would turn to Paul and ARC’s writings rather than Eliezer and MIRI’s, but Eliezer’s takes are still up there among the most plausible ones in my mind.
I suspect that the motivation for this post comes from a place that I would find epistemically untenable and that bears little resemblance to the sophisticated disagreement between Eliezer and Paul. But I’m worried that a reader may come away with the impression that Ben and Paul fall into one camp and Eliezer into another on AI risk when really Paul agrees with Eliezer on many points when it comes to the importance and urgency of AI safety (see the list of agreements at the top of Paul’s post).
That seems like a considerable overstatement to me. I think it would be bad if the forum rules said an article like this couldn’t be posted.
Maybe, but I find it important to maintain the sort of culture where one can be confidently wrong about something without fear that it’ll cause people to interpret all future arguments only in light of that mistake, instead of taking them at face value and evaluating them on their own merits.
The sort of entrepreneurialness that I still feel is somewhat lacking in EA requires committing a lot of time to a speculative idea on the off-chance that it is correct. If it is not, the entrepreneur has wasted a lot of time and usually money. If it additionally carries the social cost that they can’t try again, because people will dismiss them over that past failure, it becomes that much less likely that anyone will try in the first place.
Of course that’s not the status quo. I just really don’t want EA to move in that direction.
If anything, I think that prohibiting posts like this from being published would have a more detrimental effect on community culture.
Of course, people are welcome to criticise Ben’s post—which some in fact do. That’s a very different category from prohibition.
Yeah, that sounds perfectly plausible to me.
“A bit confused” wasn’t meant to be any sort of rhetorical pretend understatement or something. I really just felt a slight surprise that caused me to check whether the forum rules contain something about ad hom, and found that they don’t. It may well be the right call on balance. I trust the forum team on that.