Strange; unless the original comment from Gerald has been edited since I responded, I think I must have misread most of it, as I thought it was making a different point (i.e., “could someone explain how misalignment could happen?”). I was tired and distracted when I read it, so that wouldn’t be surprising. However, the final paragraph of the comment (which I originally thought was reflected in the rest of it) still seems out of place and arrogant.
Marcel D
This really isn’t the right post for most of those issues/questions, and most of what you mentioned are things you should be able to find via searches on the forum, searches via Google, or maybe even just asking ChatGPT to explain it to you (maybe!). TBH your comment also just comes across quite abrasive and arrogant (especially the last paragraph), without actually appearing to be that insightful/thoughtful. But I’m not going to get into an argument on these issues.
I wish! I’ve been recommending this for a while but nobody bites, and usually (always?) without explanation. I often don’t take seriously many of these attempts at “debate series” if they’re not going to address some of the basic failure modes that competitive debate addresses, e.g., recording notes in a legible/explorable way to avoid the problem of arguments getting lost under layers of argument branches.
Hi Oisín, no worries, and thanks for clarifying! I appreciate your coverage of this topic, I just wanted to make sure there aren’t misinterpretations.
In policy spaces, this is known as the Brussels Effect; that is, when a regulation adopted in one jurisdiction ends up setting a standard followed by many others.
I am not clear on how the Brussels Effect applies here, especially since we’re not talking about manufacturing a product with high costs of running separate production lines. I recognize there may be some argument/step that I’m missing, but I can’t dismiss the possibility that the author doesn’t actually understand what the Brussels Effect really is / normally does, and is throwing it around like a buzzword. Could you please elaborate a bit more?
I’m curious whether people (e.g., David, MIRI folk) think that LLMs now or in the near future would be able to substantially speed up this kind of theoretical safety work?
I was not a huge fan of the instrumental convergence paper, although I didn’t have time to thoroughly review it. In short, it felt too slow in making its reasoning and conclusion clear, and once (I think?) I understood what it was saying, it felt quite nitpicky (or a borderline motte-and-bailey). In reality, I’m still unclear if/how it responds to the real-world applications of the reasoning (e.g., explaining why a system with a seemingly simple goal like calculating digits of pi would want to cause the extinction of humanity).
The summary in this forum post seems to help, but I really feel like the caveat identified in this post (“this paper simply argues that this would not be true of agents with randomly-initialized goals”) is not made clear in the abstract.[1]
[1] The abstract mentions “I find that, even if intrinsic desires are randomly selected [...]” but this does not at all read like a caveat, especially due to the use of “even if” (rather than just “if”).
Sorry about the delayed reply; I saw this but accidentally removed the notification (and I guess didn’t receive an email notification, contrary to my expectations) and then forgot to come back to it. Responding to some of your points/questions:
One can note that AIXR is definitely falsifiable, the hard part is falsifying it and staying alive.
I mostly agree with the sentiment that “if someone predicts AIXR and is right then they may not be alive,” although I do now think it’s entirely plausible that we could survive long enough during a hypothetical AI takeover to say “ah yeah, we’re almost certainly headed for extinction”; it would just be too late to do anything about it. The problem is how to define “falsify”: if you can’t 100% prove anything, you can’t 100% falsify anything; can the last person alive say with 100% confidence, “yep, we’re about to go extinct”? No, but I think most people would say that this outcome basically “falsifies” the claim “there is no AIXR,” even prior to the final person being killed.
Knightian uncertainty makes more sense in some restricted scenarios especially related to self-confirming/self-denying predictions.
This is interesting; I had not previously considered the interaction between self-affecting predictions and (Knightian) “uncertainty.” I’ll have to think more about this, but as you say I do still think Knightian uncertainty (as I was taught it) does not make much sense.
This can apply as well to different people: If I believe that X has a very good reasoning process based on observations on X’s past reasoning, I might not want to/have to follow X’s entire train of thought before raising my probability of their conclusion.
Yes, this is the point I’m trying to get at with forecast legibility, although I’m a bit confused about how it builds on the previous sentence.
Some people have talked about probability distributions on probability distributions, in the case of a binary forecast that would be a function f: [0,1] → [0, ∞), which is…weird. Do I need to tack on the resilience to the distribution? Do I compute it out of the probability distribution on probability distributions? Perhaps the people talking about imprecise probabilities/infrabayesianism are onto something when they talk about convex sets of probability distributions as the correct objects instead of probability distributions per se.
Unfortunately I’m not sure I understand this paragraph (including the mathematical portion). Thus, I’m not sure how to explain my view of resilience better than what I’ve already written and the summary illustration: someone who says “my best estimate is currently 50%, but within 30 minutes I think there is a 50% chance that my best estimate will become 75% and a 50% chance that my best estimate becomes 25%” has a less-resilient belief compared to someone who says “my best estimate is currently 50%, and I do not think that will change within 30 minutes.” I don’t know how to calculate/quantify the level of resilience between the two, but we can obviously see there is a difference.
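Purely as an illustration (and not claiming this is the “right” measure of resilience), one crude way to put a number on that difference is the expected absolute change in the best estimate over the 30-minute window; the metric and the names below are my own toy assumptions, nothing standard:

```python
# Toy illustration: compare resilience via the expected absolute change in a
# forecaster's best estimate over the next 30 minutes (an assumed, non-standard metric).

def expected_movement(current_estimate, scenarios):
    """scenarios: list of (probability, future_estimate) pairs whose probabilities sum to 1."""
    return sum(p * abs(future - current_estimate) for p, future in scenarios)

# Forecaster A: 50% now, expects to end up at 75% or 25% with equal probability.
movement_a = expected_movement(0.50, [(0.5, 0.75), (0.5, 0.25)])  # 0.25

# Forecaster B: 50% now, expects no change.
movement_b = expected_movement(0.50, [(1.0, 0.50)])  # 0.0

print(movement_a, movement_b)  # larger expected movement = less resilient
```

Under that toy measure, the first person’s estimate moves by 25 percentage points in expectation and the second’s by zero, which at least matches the intuitive ranking, even if it isn’t a full account of resilience.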
Epistemic status: I feel fairly confident about this but recognize I’m not putting in much effort to defend it and it can be easily misinterpreted.
I would probably just recommend not using the concept of neglectedness in this case, to be honest. The ITN framework is a nice heuristic (e.g., usually more neglected things benefit more from additional marginal contributions) but it is ultimately not very rigorous/logical except when contorted into a definitional equation (as many previous posts have explained). Importantly, in this case I think that focusing on neglectedness is likely to lead people astray, given that a change in neglectedness could equate to an increase in tractability.
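(For reference, the “definitional equation” I have in mind is roughly the standard 80,000 Hours-style decomposition, in which the intermediate terms cancel and the product holds essentially by definition:)

$$
\underbrace{\frac{\text{good done}}{\text{extra resources}}}_{\text{cost-effectiveness}}
= \underbrace{\frac{\text{good done}}{\text{\% of problem solved}}}_{\text{importance}}
\times \underbrace{\frac{\text{\% of problem solved}}{\text{\% increase in resources}}}_{\text{tractability}}
\times \underbrace{\frac{\text{\% increase in resources}}{\text{extra resources}}}_{\text{neglectedness}}
$$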
Over the past few months I have occasionally tried getting LLMs to do some tasks related to argument mapping, but I actually don’t think I’ve tried that specifically, and probably should. I’ll make a note to myself to try here.
Interesting. Perhaps we have quite different interpretations of what AGI would be able to do with some set of compute/cost and time limitations. I haven’t had the chance yet to read the relevant aspects of your paper (I will try to do so over the weekend), but I suspect that we have very cruxy disagreements about the ability of a high-cost AGI—and perhaps even pre-general AI that can still aid R&D—to help overcome barriers in robotics, semiconductor design, and possibly even aspects of AI algorithm design.
Just to clarify, does your S-curve almost entirely rely on base rates of previous trends in technological development, or do you have a component in your model that says “there’s some X% chance that conditional on the aforementioned progress (60% * 40%) we get intermediate/general AI that causes the chance of sufficiently rapid progress in everything else to be Y%, because AI could actually assist in the R&D and thus could have far greater returns to progress than most other technologies”?
I find this strange/curious. Is your preference more a matter of “Traditional interfaces have good features that a flowing interface would lack“ (or some other disadvantage to switching) or “The benefits of switching to a flowing interface would be relatively minor”?
For example on the latter, do you not find it more difficult with the traditional UI to identify dropped arguments? Or suppose you are fairly knowledgeable about most of the topics but there’s just one specific branch of arguments you want to follow: do you find it easy to do that? (And more on the less-obvious side, do you think the current structure disincentivizes authors from deeply expanding on branches?)
On the former, I do think that there are benefits to having less-structured text (e.g., introductions/summaries and conclusions) and that most argument mapping is way too formal/rigid with its structure, but I think these issues could be addressed in the format I have in mind.
Thanks for posting this, Ted, it’s definitely made me think more about the potential barriers and the proper way to combine probability estimates.
One thing I was hoping you could clarify: in some of your comments and estimates, it seems like you are suggesting that it’s decently plausible(?)[1] we will “have AGI” by 2043; it’s just that it won’t lead to transformative AGI by then because progress in robotics, semiconductors, and energy scaling will be too slow. However, it seems to me that once we have (expensive/physically-limited) AGI, it should be able to significantly help with those other things, at least over the span of 10 years. So my main question is: does your model attach significantly higher probabilities to transformative AGI by 2053? Or is it just that 2043 sits right near the base of a rise in the cumulative probability curve?
[1] It wasn’t clear to me whether this is just 60%, or 60%*40%, or something else. If you could clarify, that would be helpful!
Are you referring to this format on LessWrong? If so, I can’t say I’m particularly impressed, as it still seems to suffer from the problems of linear dialogue vs. a branching structure (e.g., it is hard to see where points have been dropped, and it is harder to trace specific lines of argument). But I don’t recall seeing this before, so thanks for the flag.
As for “I don’t think we could have predicted people…”, that’s missing my point(s). I’m partially saying “this comment thread seems like it should be a lesson/example of how text-blob comment-threads are inefficient in general.” However, even in this specific case Paul knew that he was laying out a multi-pronged criticism, and if the flow format existed he could have presented his claims that way, to make following the debate easier—assuming Ted would reply.
Ultimately, it just seems to me like it would be really logical to have a horizontal flow UI,[1] although I recognize I am a bit biased by my familiarity with such note taking methods from competitive debate.
[1] In theory it need not be as strictly horizontal as I lay out; it could be a series of vertically nested claims, kept largely within one column, where the idea is that instead of replying to an entire comment you can reply to specific blocks in the original comment (e.g., via a drop-down at the end of a specific argument block rather than at the end of the entire comment).
Am I really the only person who thinks it’s a bit crazy that we use this blobby comment thread as if it’s the best way we have to organize disagreement/argumentation for audiences? I feel like we could almost certainly improve by using, e.g., a horizontal flow as is relatively standard in debate.[1]
With a generic example below:
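(A bare-bones, purely illustrative sketch: each column is one round of replies, each row tracks a single line of argument, and a dropped point shows up as an empty cell.)

| Opening claim | Response | Counter-response |
| --- | --- | --- |
| Claim 1, supported by argument A | A fails because of B | B misreads A because of C |
| Claim 2, supported by argument D | (no response; visibly dropped) | |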
To be clear, the commentary could still incorporate regular prose rather than only short blocks.
Alternatively, people could use something like Kialo.com. But surely there has to be something better than this comment thread, in terms of 1) ease of determining where points go unrefuted, 2) ease of quickly tracing all responses in specific branches (rather than having to skim through the entire blob to find any related responses), and 3) seeing claims side-by-side, rather than having to scroll back and forth to see the full text. (Quoting definitely helps with this, though!)
[1] (Depending on the format: this is definitely standard in many policy debate leagues.)
Epistemic status: writing fast and loose, but based on thoughts I’ve mulled over for a while given personal experience/struggles and discussions with other people. Thus, it’s easy to misinterpret what I’m saying. Take it with a grain of salt.
On the topic of educational choice, I can’t emphasize enough the importance of having legible hard skills, such as language skills or, perhaps more importantly, quantitative skills. Perhaps the worst mistake I made in college was choosing to double major in international studies and public policy, rather than making the second major econ or computer science. To my freshman self, this seemed to make sense: “I really want to do international security and I like public policy, so this should demonstrate my enthusiasm and improve my policy-analysis skills.”
I’m somewhat skeptical that the second belief panned out (relative to majoring in econ), but the first clearly seems to have been misguided. Knowing what I do now about the application process, including how shallow/ineffective it is at discerning capabilities and interest, how people will just exaggerate/BS their way into positions (through claims in their interviews and cover letters), how some positions will not consider people without STEM backgrounds (even when the skills can sometimes be learned prior to the job), AND how much of the process in some places relies on connections or prestige, it’s really clear to me that having legible hard skills is crucial. In contrast, you probably get rapidly diminishing marginal returns from additional soft-science degrees.
Ultimately, I’ve found that the line between empirical and theoretical analysis is often very blurry, and if someone does develop a decent brightline to distinguish the two, it turns out that there are often still plenty of valuable theoretical methods, and some of the empirical methods can be very misleading.
For example, high-fidelity simulations are arguably theoretical under most definitions, but they can be far more accurate than empirical tests.
Overall, I tend to be quite supportive of using whatever empirical evidence we can, especially experimental methods when they are possible, but there are many situations where we cannot do this. (I’ve written more on this here: https://georgetownsecuritystudiesreview.org/2022/11/30/complexity-demands-adaptation-two-proposals-for-facilitating-better-debate-in-international-relations-and-conflict-research/ )
I see. (For others’ reference, those two points are pasted below.)
All knowledge is derived from impressions of the external world. Our ability to reason is limited, particularly about ideas of cause and effect with limited empirical experience.
History shows that societies develop in an emergent process, evolving like an organism into an unknown and unknowable future. History was shaped less by far-seeing individuals informed by reason than by contexts which were far too complex to realize at the time.
Overall, I don’t really know what to make of these. They are fairly vague statements, making them very liable to motte-and-bailey interpretations; they border on deepities, in my reading.
“All knowledge is derived from impressions of the external world” might be true in a trivially obvious sense that you often need at least some iota of external information to develop accurate beliefs or effective actions (although even this might be somewhat untrue with regard to biological instincts). However, it makes no clear claim about how much and what kind of “impressions of the external world” are necessary for “knowledge.”[1] Insofar as the claim is that forecasts about AI x-risks are not “derived from impressions of the external world,” I think this is completely untrue. And under that interpretation, I question whether the principle even lives up to its own standard: what empirical evidence was this claim derived from?
The second claim suffers from similar problems in my view: I obviously wouldn’t claim that there have always been seers who could simply divine the long-run future. However, insofar as it is saying that the future is so “unknowable” that people cannot reason about which actions in front of them are good, I also reject this: it seems obviously untrue with regard to, e.g., fighting Nazi Germany in WW2. Moreover, even if this has been true historically, that does not mean it will always be true, especially given the potential for value lock-in from superintelligent AI.
Overall, I agree that it’s important to be humble about our forecasts and that we should be actively searching for more information and methods to improve our accuracy, questioning our biases, etc. But I also don’t trust vague statements that could be interpreted as saying it’s largely hopeless to make decision-informing predictions about what to do in the short term to increase the chance of making the long-run future go well.
[1] A term I generally dislike for its ambiguity and philosophical denotations (which IMO are often dubious at best).
I haven’t looked very hard but the short answer is no, I’m not aware of any posts/articles that specifically address the idea of “methodological overhang” (a phrase I hastily made up and in hindsight realize may not be totally logical) as it relates to AI capabilities.
That being said, I have written about the possibility that our current methods of argumentation and communication could be really suboptimal, here: https://georgetownsecuritystudiesreview.org/2022/11/30/complexity-demands-adaptation-two-proposals-for-facilitating-better-debate-in-international-relations-and-conflict-research/
I definitely would have preferred a TLDR or summary at the top rather than the bottom. Still, I appreciate your investigation into this, as I have loathed Eliezer’s use of the term ever since I realized he just made it up.