Here’s one possible way to distinguish the two: under the optimizer’s curse + judgement-stickiness scenario, retrospective evaluation should usually take a step towards the truth, though it could be a very small one if judgements are very sticky! Under motivated reasoning, retrospective evaluation should take a step towards the “desired truth” (or some combination of truth and desired truth, if the organisation wants both).
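To make the distinction concrete, here is a toy sketch in Python (the numbers and update rules are illustrative assumptions, not a model of any real organisation):

import numpy as np

rng = np.random.default_rng(0)

true_effect = 0.1        # the underlying truth (hypothetical)
desired_effect = 0.5     # what the organisation would like to be true (hypothetical)
initial_estimate = 0.4   # an inflated initial judgement, e.g. from the optimizer's curse

# Sticky-but-honest re-evaluation: mix the old judgement with a fresh, noisy but
# unbiased re-estimate. On average this moves towards the truth, but only slowly
# if stickiness is high.
stickiness = 0.8
fresh_estimate = true_effect + rng.normal(0, 0.05)
sticky_update = stickiness * initial_estimate + (1 - stickiness) * fresh_estimate

# Motivated re-evaluation: the new judgement is pulled towards the desired
# conclusion (or some blend of truth and desired conclusion).
motivation = 0.7
motivated_update = motivation * desired_effect + (1 - motivation) * true_effect

print(sticky_update, motivated_update)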
David Johnston
One thing to think about: in order to reason about “observations” using mathematical theory, we need to (and do) convert them into mathematical things. Probability theory can only address the mathematical things we get in the end.
Most schemes for doing this ignore a lot of important stuff. E.g. “measure my height in cm, write down the answer” is a procedure that produces a real number, but also one that is indifferent to almost every “observation” I might care about in the future.
(The quotes around observation are to indicate that I don’t know if it’s exactly the right word).
One thing we could try to do is to propose a scheme for mathematising every observation we care about. One way to do this is to come up with a sequence of questions “are my observations like X or like not X?”. The mathematical object our observations become will then be a binary sequence. In practice, this will never solve the problem of distinguishing any two observations we care to distinguish, but maybe imagining something like this that goes on forever is not a bad idealization, in the sense that we might care less and less about the remaining undistinguished observations.
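As a toy illustration of that scheme (the questions and the example observation below are made up purely for illustration):

# Each question is a yes/no predicate; the mathematised observation is the
# sequence of answers, i.e. a binary sequence (finite here, infinite in the idealization).
questions = [
    lambda obs: obs["height_cm"] > 170,
    lambda obs: obs["wearing_hat"],
    lambda obs: obs["time_of_day"] == "morning",
]

def mathematise(obs):
    return [int(q(obs)) for q in questions]

print(mathematise({"height_cm": 180, "wearing_hat": False, "time_of_day": "morning"}))
# -> [1, 0, 1]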
Can this story capture something like the tale of the universal prior? The problem here is that what I’ve described looks a bit like a Turing machine—it outputs a sequence of binary digits—but it isn’t a Turing machine because it has no well-defined domain. In fact, the problem of getting from a vague domain to something mathematical is what it was meant to solve to begin with.
One way we can conceptualize inputs to this process is to postulate “more powerful observers”. For example, if I turn an observation into n binary questions, a more powerful observer is one that asks the same n questions and also asks one more. Then our “observation process” is a Turing machine that takes the output of the more powerful observer and drops the last digit.
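As a trivial sketch of that relationship (continuing the toy example above, and purely illustrative):

# The more powerful observer answers the same questions plus one extra; our
# observation process just drops the last answer from its output.
def weaker_observation(stronger_output):
    return stronger_output[:-1]

print(weaker_observation([1, 0, 1, 1]))  # -> [1, 0, 1]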
However, if we consider the n -> infinity limit of this, it seems consistent to me that the more powerful observer could be an anti-inductor or a randomiser relative to us at every step of the way.
So it seems that this story at least requires an assumption like “we can eventually predict the more powerful observer perfectly”.
There are lots of other ways to make more powerful observers, they just need to be capable of distinguishing everything our observation process distinguishes.
I think the follow-up is much more helpful, but I found the original helpful too. I think it may be possible to say the same content less rudely, but “I think strong minds research is poor” is still a useful comment to me.
I share this concern. I don’t have much of a baseline for how much meta-analyses overstate effect sizes, but I suspect the overstatement is substantial.
One comparison I do know about: as of about 2018, the average effect size of unusually careful studies funded by the EEF (https://educationendowmentfoundation.org.uk/projects-and-evaluation/projects) was 0.08, while the mean of meta-analytic effect sizes overall was allegedly 0.40 (https://visible-learning.org/hattie-ranking-influences-effect-sizes-learning-achievement/), suggesting that meta-analysis in that field on average yields effect sizes about five times higher than is realistic.
The point is, these concerns cannot be dealt with simply by suggesting that they won’t make enough difference to change the headline result; in fact they could.
If this issue was addressed in the research discussed here, it’s not obvious to me how it was done.
GiveWell rated the evidence of impact for GiveDirectly as “Exceptionally strong”, though it’s not clear exactly what this means with regard to the credibility of studies that estimate the size of the effect of cash transfers on wellbeing (https://www.givewell.org/charities/top-charities#cash). Nevertheless, if a charity were being penalized in such comparisons for doing rigorous research, then I would expect to see assessments like “strong evidence, lower effect size”, which is what we see here.
I would be interested in this same concept but framed so as to compare personal utility instead of impersonal utility, because I feel like I’m trying to estimate other people’s values for personal utility and aggregate them in order to get an idea of impersonal utility. It seems tricky, though:
- How many {50} year old {friends/family members/strangers} would you save vs {5} year old {friends/family members/strangers}? This seems straightforward, except maybe it’s necessary to add “considering only your own benefit” if we want personal utilities that we can aggregate instead of a mixture of personal and impersonal utilities.
- How many 50 year old yourselves would you save vs 5 year old yourselves?
This one doesn’t make much sense to me, and if I try to frame it differently, e.g.
“imagine a group of 50-74 year olds and a group of <5 year olds. There’s a treatment that saves {X} 50 year olds and {Y} 5 year olds, and the <5 year olds dictate who gets it. What is the minimum X:Y for there to be a 50% chance of choosing the 50-74 year olds?”
My first thought is there’s no way to sensibly answer this question because 3 year olds are incredibly stubborn and also won’t understand.
Anyway, don’t know if this is very helpful, but that was my first response to the app and the result of my first few minutes thinking about it.
The world’s first slightly superhuman AI might be only slightly superhuman at AI alignment. Thus if creating it was a suicidal act by the world’s leading AI researchers, it might be suicidal in exactly the same way. On the other hand, if it has a good grasp of alignment then its creators might also have a good grasp of alignment.
In the first scenario (but not the second!), creating more capable but not fully aligned descendants seems like it must be a stable behaviour of intelligent agents, as by assumption:
- the behaviour of descendants is only weakly controlled by parents
- the parents keep making better descendants until the descendants are strongly superhuman
I think that Buck’s also right that the world’s first superhuman AI might have a simpler alignment problem to solve.
[Question] Who is working on structured crowd forecasting?
Forecast procedure competitions
Someone should offer him a bet! Best EA vs best lib in a debate or something.
“And suppose, per various respectable cosmologies, that the universe is filled with an infinite number of people very much like you”
I’m not familiar with these cosmologies: do they also say that the universe is filled with an equally large number of people quite like me except they make the opposite decision whenever considering a donation?
Are you at ~65% that marginal scientific acceleration is net negative, or is most of your weight on costs = benefits?
However, I also expect them to be a large waste of resources (especially forecasting time), compared to idealized setups
Incidentally, I was just working on this post about efficiency of forecast markets vs consultancies.
I share your intuition that liquid/large scale markets may solve some of the inefficiency problems—forecasters would have a much better idea about what kinds of questions they can get paid for, specialise in particular kinds of questions, work in teams that can together answer broader ranges of questions and so forth. However, there’s a kind of chicken and egg problem that unless people can be confident of getting cost-effective answers to crucial questions, they’re probably not going to invest a huge amount into forecasting markets.
[Cross-post] A nuclear war forecast is not a coin flip
Thanks, that was actually my intention.
Nice explanation, thanks
If you think there’s an exchangeable model underlying someone else’s long-run prediction, I’m not sure of a good way to try to figure it out. Off the top of my head, you could do something like this:
import numpyro
import numpyro.distributions as dist

def model(a, b, conc_expert, expert_forecast):
    # forecasted distribution over the annual probability of nuclear war
    prior_rate = numpyro.sample('rate', dist.Beta(a, b))
    # 1000 simulated years of war / no war at that annual rate
    with numpyro.plate('w', 1000):
        war = numpyro.sample('war', dist.Bernoulli(prior_rate), infer={'enumerate': 'parallel'})
    # fraction of the ten 100-year blocks containing at least one war
    anywars = (war.reshape(10, 100).sum(1) >= 1).mean()
    # treat the expert's forecast as a noisy observation of that fraction
    expert_prediction = numpyro.sample('expert', dist.Beta(conc_expert * anywars, conc_expert * (1 - anywars)), obs=expert_forecast)
This is saying that the expert is giving you a noisy estimate of the 100-year rate of war occurrence, and then treating their estimate as an observation. I don’t really know how to think about how much noise to attribute to their estimate, and I wonder if there’s a better way to incorporate it. The noise level is given by the parameter conc_expert; see here for an explanation of the “concentration” parameter in the beta distribution.

I don’t know! I think in general if it’s an estimate for (say) 100-year risk with <= 100 years of data (or evidence that is equivalently good), then you should at least be wary of this pitfall. If there’s >>100 years of data and it’s a 100-year risk forecast, then the binomial calculation is pretty good.
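For reference, a minimal sketch of the kind of “binomial calculation” meant here, assuming each year is an independent draw at a well-estimated annual probability (the number used is purely illustrative, not a real estimate):

annual_p = 0.01                           # hypothetical annual probability of nuclear war
century_risk = 1 - (1 - annual_p) ** 100  # probability of at least one war in 100 years
print(century_risk)                       # ~0.63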
Do you know of work on this off the top of your head? I know Ord has his estimate of 6% extinction in the next 100 years, but I don’t know of attempts to extrapolate this or other estimates.
I think for long timescales, we wouldn’t want to use an exchangeable model, because the “underlying risk” isn’t stationary
From the title, I thought this was going to be a defense of being money pumped!
You’ve gotten me interested in looking at total extinction risk as a follow up, are you interested in working together on it?
I like this post. Some ideas inspired by it:
If “bias” is pervasive among EA organisations, the most direct implication of this seems to me to be that we shouldn’t take judgements published by EA organisations at face value. That is, if we want to know what is true we should apply some kind of adjustment to their published judgements.
It might also be possible to reduce bias in EA organisations, but that depends on other propositions like how effective debiasing strategies actually are.
A question that arises is “what sort of adjustment should be applied?”. The strategy I can imagine, which seems hard to execute, is: try to anticipate the motivations of EA organisations, particularly those that aren’t “inform everyone accurately about X”, and discount those aspects of their judgements that support these aims.
I imagine that doing this overtly would cause a lot of offence A) because it involves deliberately standing in the way of some of the things that people at EA organisations want and B) because I have seen many people react quite negatively to accusations “you’re just saying W because you want V”.
Considering this issue—how much should we trust EA organisations—and this strategy of trying to make “goals-informed” assessments of their statements, it occurs to me that a question you could ask is “how well has this organisation oriented itself towards truthfulness?”.
I like that this post has set out the sketch of a theory of organisation truthfulness. In particular:
“In worlds where motivated reasoning is commonplace, we’d expect to see:
- Red-teaming will discover errors that systematically slant towards an organization’s desired conclusion.
- Deeper, more careful reanalysis of cost-effectiveness or impact analyses usually points towards lower rather than higher impact.”
Presumably, in worlds where motivated reasoning is rare, red-teaming will discover errors that slant towards and away from an organisation’s desired conclusion and deeper, more careful reanalysis of cost-effectiveness points towards lower and higher impact equally often.
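One rough way to quantify that check (the error counts below are hypothetical, just to illustrate the logic): if red-teaming finds n errors and k of them slant towards the organisation’s desired conclusion, then in the “no motivated reasoning” world each error should slant that way with probability around 0.5, and we can ask how surprising k is.

from math import comb

n, k = 20, 16  # hypothetical: 20 errors found, 16 slanting towards the desired conclusion
p_at_least_k = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(p_at_least_k)  # ~0.006: evidence against the "errors slant both ways equally" world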
I note that you are talking about a collection of organisations while I’m talking about a specific organisation. I think you are thinking about it from “how can we evaluate truth-alignment” and I’m thinking about “what do we want to know about truth-alignment”. Maybe it is only possible to evaluate collections of organisations for truth-alignment. At the same time I think it would clearly be useful to know about the truth-alignment of individual organisations, if we could.
It would be interesting, and I think difficult, to expand this theory in three ways:
- To be more specific about what “an organisation’s desired conclusion” is, so we can unambiguously say whether something “slants towards” it
- To consider whether there are other indications of truth-misalignment
- To consider whether it is possible to offer a quantitative account of (A) the relationship between the degree of truth-misalignment of an organisation and the extent to which we see certain indications, like consistent updating in the face of re-analysis, and (B) the relationship between an organisation’s truth-misalignment and the manner and magnitude by which we should discount its judgements
To be clear, I’m not saying these things are priorities, just ideas I had and haven’t carefully evaluated.