Developing my worldview. Interested in meta-ethics, epistemics, psychology, AI safety, and AI strategy.
Jack R
No worries! Seemed mostly coherent to me, and please feel free to respond later.
I think the thing I am hung up on here is what counts as “happiness” and “suffering” in this framing.
Could you try to clarify what you mean by the AI (or an agent in general) being “better off”?
I’m actually a bit confused here, because I’m not settled on a meta-ethics: why isn’t it the case that a large part of human values is about satisfying the preferences of moral patients, and that human values consider many or most advanced AIs to be non-trivial moral patients?
I don’t put much weight on this currently, but I haven’t ruled it out.
If you had to do it yourself, how would you go about a back-of-the-envelope calculation for estimating the impact of a Flynn donation?
Asking this question because I suspect that other people in the community won’t actually do this, and you’re maybe one of the best-positioned people to do it, given that you seem interested in it.
Yeah, I had to look this up
e.g. from P(X) = 0.8, I may think that in a week I will, most of the time, have notched this forecast slightly upwards, and, less of the time, have notched it further downwards, and that this averages out to E[P(X) next week] = 0.8.
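(To make the arithmetic concrete, with made-up numbers of my own: if 90% of the time I expect to have notched the forecast up to 0.85, then the remaining 10% of the time I must expect to have notched it down to about 0.35, since 0.9 * 0.85 + 0.1 * 0.35 = 0.8.)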
I wish you had said this in the BLUF—it is the key insight, and the one that made me go from “Greg sounds totally wrong” to “Ohhh, he is totally right”
ETA: you did actually say this, but you said it in less simple language, which is why I missed it
I really like your drawings in section 2 -- they convey the idea surprisingly succinctly
Ha!
Note to self: I should really, really try to avoid speaking like this when facilitating in the EA intro fellowship
Hah!
The entire time I’ve been thinking about this, I’ve been thinking of utility curves as logarithmic, so you don’t have to sell me on that. I think my original comment here is another way of understanding why tractability perhaps doesn’t vary much between problems, not within a problem.
Ah, I see now that within a problem, tractability shouldn’t change as the problem gets less neglected if you assume that u(r) is logarithmic, since then the derivative scales like 1/R, making tractability scale like 1/u_total (i.e. constant in R).
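To spell out that algebra (my own gloss, using the Tractability = u’(R) * R * (1/u_total) formalization from my other comment): if u(r) = k * log(r), then u’(R) = k/R, so u’(R) * R = k is constant, and tractability = k/u_total regardless of how many resources R have already been invested in the problem.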
But why is tractability roughly constant with neglectedness in practice? Equivalently, why are there logarithmic returns to many problems?
I don’t see why utility is logarithmic iff tractability doesn’t change with neglectedness.
There was an inference there: you need the tractability to balance with the neglectedness so that the factors still add up to equal cost-effectiveness.
I don’t know if I understand why tractability doesn’t vary much. It seems like it should be able to vary just as much as cost-effectiveness can vary.
For example, imagine two problems with the same cost-effectiveness, the same importance, but one problem has 1000x fewer resources invested in it. Then the tractability of that problem should be 1000x lower [ETA: so that the cost-effectiveness can still be the same, even given the difference in neglectedness.]
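(Concretely, using the formalization from my other comment, with illustrative symbols only: same cost-effectiveness means u’(R) equals the same marginal value c for both problems, and same importance means the same u_total, so Tractability = c * R / u_total for one problem and c * (R/1000) / u_total for the other, i.e. a 1000x difference in tractability coming entirely from the difference in resources.)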
Another example: suppose an AI safety researcher solved AI alignment after 20 years of research. Then the two problems “solve the sub-problem which will have been solved by tomorrow” and “solve AI alignment” have the same local cost-effectiveness (since they are locally the same actions) and the same amount of resources invested into each, but potentially massively different importances. This means the tractabilities must also be massively different.

These two examples lead me to believe that insofar as tractability doesn’t vary much, it’s because of a combination of two things:
The world isn’t dumb enough to massively underinvest in really cost-effective and important problems
The things we tend to think of as problems are “similarly sized” or something like that
I’m still not fully convinced, though, and am confused for instance about what “similarly sized” might actually mean.
When I formalize “tractability,” it turns out to be directly related to neglectedness. If R is the number of resources currently invested in a problem, u(r) is the difference in world utility from investing 0 vs. r resources into the problem, and u_total is u(r) once the problem is solved, then tractability turns out to be:

Tractability = u’(R) * R * (1/u_total)

So I’m not sure I really understand yet why tractability wouldn’t change much with neglectedness. I have a preliminary understanding, though, which I’m writing up in another comment.
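Here’s a tiny numerical sketch of that formula (my own illustration, with a made-up logarithmic u and arbitrary constants), checking that under logarithmic returns the tractability term barely moves as R grows:

```python
import math

u_total = 100.0  # utility from fully solving the problem (arbitrary units)
k = 10.0         # scale of the assumed logarithmic returns (arbitrary)

def u(r):
    """Utility gained from investing r resources, assuming logarithmic returns."""
    return k * math.log(1.0 + r)

def tractability(R):
    """Tractability = u'(R) * R * (1/u_total), with u'(R) estimated numerically."""
    dr = 1e-6 * R
    du_dr = (u(R + dr) - u(R - dr)) / (2 * dr)
    return du_dr * R / u_total

for R in [1e3, 1e4, 1e5, 1e6]:
    print(f"R = {R:>9.0f}   tractability ~ {tractability(R):.4f}")

# With logarithmic u, u'(R) * R is roughly the constant k, so tractability
# stays near k / u_total (here ~0.1) however neglected the problem is.
```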
each additional doubling will solve a similar fraction of the problem, in expectation
Aren’t you assuming the conclusion here?
As a note, it’s only ever the case that something is good “in expectation” from a particular person’s point of view or from a particular epistemic state. It’s possible for someone to disagree with me because they know different facts about the world, and so for instance think that different futures are more or less likely.
In other words, the expected value referred to by the term “expectation” is subtly an expected value conditioned on a particular set of beliefs.
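In symbols (my own gloss, reusing the E[...] notation from above): what gets written as E[value of action] is really E[value of action | my current beliefs], an average over possible futures weighted by my subjective probabilities, so two people can disagree about what is good “in expectation” without either of them making a calculation error.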
I disagree with your reasons for downvoting the post, since I generally judge posts on their content, but I do appreciate your transparency here and found it interesting to see that you disliked a post for these reasons. I’m tempted to upvote your comment, though that feels weird since I disagree with it
Because of Evan’s comment, I think that the signaling consideration here is another example of the following pattern:
Someone suggests we stop (or limit) doing X because of what we might signal by doing X, even though we think X is correct. But this person is somewhat blind to the negative signaling effects of not living up to our own stated ideals (i.e. of lacking integrity). It turns out that some more rationalist-type people report that they would be put off by this lack of honesty and integrity (speculation: perhaps because these types have an automatic norm of honesty).
The other primary example of this I can think of is with veganism and the signaling benefits (and usually unrecognized costs).
A solution: when you find yourself saying “X will put off audience Y,” ask yourself “but what audience does X help attract, and who is put off by my alternative to X?”
Maybe someone should user-interview or survey Oregonians to see what made people not want to vote for Carrick