The Nobel Prize comes with a million dollars (9,000,000 SEK). 50k doesn’t seem like that much, in comparison.
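For a rough sanity check (using an exchange rate of about 0.10 USD per SEK, which is my own ballpark figure, not one from the thread):

$$9{,}000{,}000\ \text{SEK} \times 0.10\ \tfrac{\text{USD}}{\text{SEK}} \approx 900{,}000\ \text{USD},$$

so 50k is on the order of 5% of a Nobel Prize.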
Another Karnofsky series that I thought was important (and perhaps doesn’t fit anywhere else) is his posts on The Straw Ratio.
Also: Charity: The video game that’s real, by Holden Karnofsky
FYI, “Purchase fuzzies and utilons separately” is showing up twice in the list.
ballistic ones are faster, but reach Mach 20 and similar speeds outside of the atmosphere
This seems notable, since there is no sound without an atmosphere, and Mach number is defined relative to the local speed of sound. So perhaps ballistic missiles never actually engage in hypersonic flight, despite reaching speeds that would be hypersonic if they were in the atmosphere? Though I would be surprised if they reach Mach 20 at a high altitude and then aren’t still going super fast (above Mach 5) on the way down.
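Rough numbers (my own back-of-the-envelope arithmetic, using a sea-level speed of sound of roughly 343 m/s):

$$20 \times 343\ \text{m/s} \approx 6{,}860\ \text{m/s}, \qquad 5 \times 343\ \text{m/s} \approx 1{,}715\ \text{m/s},$$

so a warhead peaking at Mach-20-equivalent speeds would have to shed roughly three quarters of that speed during reentry before it dropped below the Mach 5 threshold.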
according to Thomas P. Christie (DoD director of Operational Test and Evaluation from 2001–2005) current defense systems “haven’t worked with any degree of confidence”.[12] A major unsolved problem is that credible decoys are apparently “trivially easy” to build, so much so that during missile defense tests, balloon decoys are made larger than warheads—which is not something a real adversary would do. Even then, tests fail 50% of the time.
I didn’t follow this. What are the decoys? Are they made by the attacking side or the defending side? Why does the fact that they’re easy to build mean that people make large ones during tests, and why wouldn’t that also happen in a real attack? And why is it notable that tests still fail at a high rate even with these oversized decoys?
Thanks! Just read it.
I think there’s a key piece of your thinking that I don’t quite understand / disagree with, and it’s the idea that normativity is irreducible.
I think I follow you that if normativity were irreducible, then it wouldn’t be a good candidate for abandonment or revision. But that seems almost like begging the question. I don’t understand why it’s irreducible.
Suppose normativity is not actually one thing, but is a jumble of 15 overlapping things that sometimes come apart. This doesn’t seem like it poses any challenge to your intuitions from footnote 6 in the document (starting with “I personally care a lot about the question: ‘Is there anything I should do, and, if so, what?’”). And at the same time it explains why there are weird edge cases where the concept seems to break down.
So few things in life seem to be irreducible. (E.g. neither Eric nor Ben is irreducible!) So why would normativity be?
[You also should feel under no social obligation to respond, though it would be fun to discuss this the next time we find ourselves at the same party, should such a situation arise.]
Don’t Make Things Worse: If a decision would definitely make things worse, then taking that decision is not rational.
Don’t Commit to a Policy That In the Future Will Sometimes Make Things Worse: It is not rational to commit to a policy that, in the future, will sometimes output decisions that definitely make things worse.
...
One could argue that R_CDT sympathists don’t actually have much stronger intuitions regarding the first principle than the second—i.e. that their intuitions aren’t actually very “targeted” on the first one—but I don’t think that would be right. At least, it’s not right in my case.
I would agree that, with these two principles as written, more people would agree with the first. (And I certainly believe you that that’s right in your case.)
But I feel like the second doesn’t quite capture what I had in mind regarding the DMTW intuition applied to P_’s.
Consider an alternate version:
If a decision would definitely make things worse, then taking that decision is not good policy.
Or alternatively:
If a decision would definitely make things worse, a rational person would not take that decision.
It seems to me that these two claims are naively intuitive on their face, in roughly the same way that the “… then taking that decision is not rational.” version is. And it’s only after you’ve considered prisoners’ dilemmas or Newcomb’s paradox, etc. that you realize that good policy (or being a rational agent) actually diverges from what’s rational in the moment.
(But maybe others would disagree on how intuitive these versions are.)
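To make the divergence concrete, here’s a minimal sketch of the standard Newcomb payoffs (my own illustration, assuming the usual $1M/$1k amounts and a predictor that’s right 99% of the time):

```python
# Newcomb's problem: box B always contains $1,000; opaque box A contains
# $1,000,000 iff the predictor predicted you would take only box A.
ACCURACY = 0.99          # assumed predictor accuracy
MILLION, THOUSAND = 1_000_000, 1_000

# One-boxer: gets the $1M exactly when the predictor correctly foresaw one-boxing.
ev_one_box = ACCURACY * MILLION

# Two-boxer: always gets the $1k, plus the $1M only when the predictor was wrong.
ev_two_box = ACCURACY * THOUSAND + (1 - ACCURACY) * (MILLION + THOUSAND)

print(f"E[one-box] = ${ev_one_box:,.0f}")   # $990,000
print(f"E[two-box] = ${ev_two_box:,.0f}")   # $11,000
```

Yet once the boxes are filled, two-boxing is guaranteed to net you $1,000 more than one-boxing in that same situation, which is exactly the sense in which the in-the-moment “don’t make things worse” verdict comes apart from the verdict on policies.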
EDIT: And to spell out my argument a bit more: if several alternate formulations of a principle are each intuitively appealing, and it turns out that whether some claim (e.g. R_CDT is true) is consistent with the principle comes down to the precise formulation used, then it’s not quite fair to say that the principle fully endorses the claim and that the claim is not counter-intuitive from the perspective of the original intuition.
Of course, this argument is moot if it’s true that the original DMTW intuition was always about rational in-the-moment action, and never about policies or actors. And maybe that’s the case? But I think it’s a little more ambiguous with the “… is not good policy” or “a rational person would not...” versions than with the “Don’t commit to a policy...” version.
EDIT2: Does what I’m trying to say make sense? (I felt like I was struggling a bit to express myself in this comment.)
There may be a pretty different argument here, which you have in mind. I at least don’t see it yet though.
Perhaps the argument is something like:
1. “Don’t make things worse” (DMTW) is one of the intuitions that leads us to favor R_CDT.
2. But the actual policy that R_CDT recommends does not in fact follow DMTW.
3. So R_CDT only gets intuitive appeal from DMTW to the extent that DMTW was about R_’s, and not about P_’s.
4. But intuitions are probably(?) not that precisely targeted, so R_CDT shouldn’t get to claim the full intuitive endorsement of DMTW. (Yes, DMTW endorses it more than it endorses R_FDT, but R_CDT is still at least somewhat counter-intuitive when judged against the DMTW intuition.)
both R_UDT and R_CDT imply that the decision to commit yourself to a two-boxing policy at the start of the game would be rational
That should be “a one-boxing policy”, right?
Thanks! This is helpful.
It seems like the following general situation is pretty common: Someone is initially inclined to think that anything with property P will also have properties Q1 and Q2. But then they realize that properties Q1 and Q2 are inconsistent with one another.
One possible reaction to this situation is to conclude that nothing actually has property P. Maybe the idea of property P isn’t even conceptually coherent and we should stop talking about it (while continuing to independently discuss properties Q1 and Q2). Often the more natural reaction, though, is to continue to believe that some things have property P—but just drop the assumption that these things will also have both property Q1 and property Q2.
I think I disagree with the claim (or implication) that keeping P is more often the natural reaction. Well, you’re just saying it’s “often” natural, and I suppose it’s natural in some cases and not others. But I think we may disagree on how often it’s natural, though it’s hard to say at this very abstract level. (Did you see my comment in response to your Realism and Rationality post?)
In particular, I’m curious what makes you optimistic about finding a “correct” criterion of rightness. In the case of the politician, it seems clear that learning they don’t have some of the properties you thought they did shouldn’t call into question whether they exist at all.
But for the case of a criterion of rightness, my intuition (informed by the style of thinking in my comment) is that there’s no particular reason to think there should be one criterion that obviously fits the bill. Your intuition seems to be the opposite, and I’m not sure I understand why.
My best guess, informed particularly by reading through footnote 15 of your Realism and Rationality post, is that when you’re faced with ethical dilemmas (like your torture vs lollipop examples), it seems to you like there is a correct answer. Does that seem right?
(I realize at this point we’re talking about intuitions and priors on a pretty abstract level, so it may be hard to give a good answer.)
But the arguments I’ve seen for “CDT is the most rational decision theory” to date have struck me as either circular, or as reducing to “I know CDT doesn’t get me the most utility, but something about it just feels right”.
It seems to me like they’re coming down to saying something like: the “Guaranteed Payoffs Principle” / “Don’t Make Things Worse Principle” is more core to rational action than being self-consistent. Whereas others think self-consistency is more important.
Mind you, if the sentence “CDT is the most rational decision theory” is true in some substantive, non-trivial, non-circular sense
It’s not clear to me that the justification for CDT is more circular than the justification for FDT. Doesn’t it come down to which principles you favor?
Maybe you could say FDT is more elegant. Or maybe that it satisfies more of the intuitive properties we’d hope for from a decision theory (where elegance might be one of those). But I’m not sure that would make the justification less-circular per se.
I guess one way the justification for CDT could be more circular is if the key or only principle that pushes in favor of it over FDT can really just be seen as a restatement of CDT in a way that the principles that push in favor of FDT do not. Is that what you would claim?
Just want to note that I found the R_ vs P_ distinction to be helpful.
I think using those terms might be useful for getting at the core of the disagreement.
is more relevant when trying to judge the likelihood of a criterion of rightness being correct
Sorry to drop in in the middle of this back and forth, but I am curious—do you think it’s quite likely that there is a single criterion of rightness that is objectively “correct”?
It seems to me that we have a number of intuitive properties (meta criteria of rightness?) that we would like a criterion of rightness to satisfy (e.g. “don’t make things worse”, or “don’t be self-effacing”). And so far there doesn’t seem to be any single criterion that satisfies all of them.
So why not just conclude that, as with voting and Arrow’s theorem, perhaps there’s no single perfect criterion of rightness?
In other words, once we agree that CDT doesn’t make things worse, but that UDT is better as a general policy, is there anything left to argue about regarding which is “correct”?
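To spell out the analogy a little (this is my own toy illustration, not something from the thread): the voting case already shows how several individually appealing desiderata can fail to be jointly satisfiable. The classic Condorcet-cycle example that motivates results like Arrow’s theorem:

```python
from itertools import combinations

# Three voters with cyclic preferences over candidates A, B, C.
ballots = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of voters rank x above y."""
    return sum(b.index(x) < b.index(y) for b in ballots) > len(ballots) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"{winner} beats {loser} by majority vote")
# Prints: A beats B, C beats A, B beats C. The pairwise majorities form a
# cycle, so "follow the majority" cannot single out one best candidate.
```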
EDIT: Decided I had better go and read your Realism and Rationality post, and ended up leaving a lengthy comment there.
IMHO the most natural name for “people at any time have equal value” should be something like temporal indifference, which more directly suggests that meaning.
Edit: I retract temporal indifference in favor of Holly Elmore’s suggestion of temporal cosmopolitanism.
Given this, I’m inclined to stick with the stronger version — it already has broad appeal, and has some advantages over the weaker version.
Why not include this in the definition of strong longtermism, but not weak longtermism?
Having longtermism just mean “caring a lot about the long-term future” seems the most natural and least likely to cause confusion. I think for it to mean anything other than that, you’re going to have to keep beating people over the head with the definition (analogous to the sorry state of the phrase, “begs the question”).
When most people first hear the term longtermism, they’re going to hear it in conversation or see it in writing without the definition attached to it. And they are going to assume it means caring a lot about the long-term future. So why define it to mean anything other than that?
On the other hand, anyone who comes across strong longtermism, is much more likely to realize that it’s a very specific technical term, so it seems much more natural to attach a very specific definition to it.
Extra potency may arise if the product is important enough to affect the market or indeed the society it operates in creating a feedback loop (what George Soros calls reflexivity). The development of credit derivatives and subsequent bust could be a devastating example of this. And perhaps ‘the Big Short’ is a good illustration of Eliezer’s points.
Could you say more about this point? I don’t think I understand it.
My best guess is that it means that when changes to the price of an asset result in changes out in the world, which in turn cause the asset price to change again in the same direction, then the asset price is likely to be wrong, and one can expect a correction. Is that it?
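In case it helps pin down whether we’re picturing the same mechanism, here’s a toy version of that loop (entirely my own illustration, with made-up numbers):

```python
# Toy reflexivity loop: higher prices improve the measured "fundamentals"
# (e.g. easier credit), and better fundamentals pull the price up further.
# All parameters are arbitrary and purely illustrative.
price = 100.0
for step in range(8):
    credit_boost = 0.05 * (price - 100.0)  # conditions loosen as the price rises
    price = price * 1.01 + credit_boost    # a bit of optimism, plus the feedback term
    print(f"step {step}: price = {price:.2f}")
# Each rise feeds the next one, so the price can drift well away from
# anything justified by outside fundamentals, until the loop breaks and
# a correction arrives.
```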
Is it good for keeping people safe against x-risks? Nope. In what scenario does having a lunar colony efficiently make humanity more resilient? If there’s an asteroid, go somewhere safe on Earth...
What if it’s a big asteroid?
Note that this is particularly an argument about money. I think that there are important reasons to skew work towards scenarios where AI comes particularly soon, but I think it’s easier to get leverage over that as a researcher choosing what to work on (for instance doing short-term safety work with longer-term implications firmly in view) than as a funder.
I didn’t understand this part. Are you saying that funders can’t choose whether to fund short-term or long-term work (either because they can’t tell which is which, or there aren’t enough options to choose from)?
Doesn’t the uniform prior require picking an arbitrary start point and an arbitrary end point? If so, switching to a prior that only requires an arbitrary start point seems like an improvement, all else equal. (Though it’s maybe still worth pointing out that not all arbitrariness has been eliminated, as you’ve done here.)
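For what it’s worth, here’s the sort of contrast I have in mind, as a minimal sketch (my own illustration; I’m only guessing that something like Laplace’s rule of succession is the kind of start-point-only prior at issue, and the 1956 and 2100 years below are arbitrary placeholders):

```python
def p_next_year_uniform(current_year, start_year, end_year):
    """Uniform prior over [start_year, end_year]: defining it needs BOTH endpoints.

    Conditional on the event not having happened by current_year, the chance it
    happens next year is 1 over the years remaining before the (arbitrary) end.
    The start_year fixes the prior's support but cancels out of this conditional.
    """
    return 1 / (end_year - current_year)

def p_next_year_laplace(current_year, start_year):
    """Laplace's rule of succession: needs only the arbitrary start year.

    After n event-free years, the chance of the event occurring next year is 1/(n + 2).
    """
    n = current_year - start_year
    return 1 / (n + 2)

# Illustrative numbers only; both the 1956 start and the 2100 end are arbitrary.
print(p_next_year_uniform(2024, 1956, 2100))  # ~0.013, sensitive to the chosen end year
print(p_next_year_laplace(2024, 1956))        # ~0.014, no end year needed
```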