Fair enough, I edited it again. I still think the larger points stand unchanged.
Sure, I understand that it’s a supposed default instrumental goal and not a terminal goal. Sorry that my wording didn’t make that distinction clear. I’ve now edited it to do so, but I think my overall points stand.
You seem to be lumping people like Richard Ngo, who is fairly epistemically humble, in with people who are absolutely sure that the default path leads to us all dying. It is only the latter that I’m criticizing.
I agree that AI poses an existential risk, in the sense that it is hard to rule out that the default path poses a serious chance of the end of civilization. That’s why I work on this problem full-time.
I do not agree that it is absolutely clear that default instrumental goals of an AGI entail it killing literally everyone, as the OP asserts.
(I provide some links to views dissenting from this extreme confidence here.)
To be clear, mostly I’m not asking for “more work”, I’m asking people to use much better epistemic hygiene. I did use the phrase “work much harder on its epistemic standards”, but by this I mean please don’t make sweeping, confident claims as if they are settled fact when there’s informed disagreement on those subjects.
Nevertheless, some examples of the sort of informed disagreement I’m referring to:
The mere existence of many serious alignment researchers who are optimistic about scalable oversight methods such as debate.
This post by Matthew Barnett arguing we’ve been able to specify values much more successfully than MIRI anticipated.
Shard theory, developed mostly by Alex Turner and Quintin Pope, calling into question the utility argmaxer framework which has been used to justify many historical concerns about instrumental convergence leading to AI takeover.
This comment by me arguing ChatGPT is pretty aligned compared to MIRI’s historical predictions, because it does what we mean and not what we say.
A detailed set of objections from Quintin Pope to Eliezer’s views, which Eliezer responded to by saying it’s “kinda long”, and engaged with extremely superficially before writing it off.
This piece by Stuhlmüller and Byun, along with many other articles by others, arguing that process oversight is a viable alignment strategy which converges with, rather than opposes, capabilities.
Notably, the extreme doomer contingent has largely failed even to understand, never mind engage with, some of these arguments, frequently pattern-matching them lazily to more basic misconceptions and misrepresenting them accordingly. A typical example is thinking that Matthew Barnett and I have been saying GPT's understanding of human values is evidence against the MIRI/doomer worldview (after all, "the AI knows what you want but does not care, as we've said all along"), when in fact we're saying there's evidence that we have actually succeeded in pointing GPT at those values.
It’s fine if you have a different viewpoint. Just don’t express that viewpoint as if it’s self-evidently right when there’s serious disagreement on the matter among informed, thoughtful people. An article like the OP which claims that labs should shut down should at least try to engage with the views of someone who thinks the labs should not shut down, and not just pretend such people are fools unworthy of mention.
These essays are well known and I’m aware of basically all of them. I deny that there’s a consensus on the topic, that the essays you link are representative of the range of careful thought on the matter, or that the arguments in these essays are anywhere near rigorous enough to meet my criterion: justifying the degree of confidence expressed in the OP (and some of the posts you link).
I’ll go further and say that I think those two claims are widely believed by many in the AI safety world (in which I count myself) with a degree of confidence that goes way beyond what can be justified by any argument that has been provided by anyone, anywhere, and I think this is a huge epistemic failure of that part of the AI safety community.
I strongly downvoted the OP for making these broad, sweeping, controversial claims as if they are established fact and obviously correct, rather than one possible way the world could be that requires good arguments to establish, and for not attempting any serious understanding of, or engagement with, the viewpoints of people who disagree that these organizations shutting down would be the best thing for the world.
I would like the AI safety community to work much harder on its epistemic standards.
Another easy thing you can do, which I did several years ago, is download Kiwix onto your phone, which allows you to save offline versions of references such as Wikipedia, WikiHow, and way, way more. Then also buy a solar-powered or hand-crank USB charger (often built into disaster radios such as this one which I purchased).
For extra credit, store this data on an old phone you no longer use, and keep that and the disaster radio in a Faraday bag.
I’m calling for a six-month pause on new font faces more powerful than Comic Sans.
It varies, but most treaties are not backed up by force (by which I assume we mean inter-state armed conflict). They’re often backed up by the possibility of tit-for-tat defection or economic sanctions, among other mechanisms.
Thanks for pointing this out. Can you clarify why the EV is so much lower when taking the standard deduction?
A better argument is that the wildness of the next century means our models of the future are untrustworthy, which should make us pretty suspicious of any claim that something is the P = 1 - ε outcome without a watertight case for the proposition.
There doesn’t seem to be such a watertight case for AI takeover. Most threat models[1] rest heavily on the assumption that transformative AI will be single-mindedly optimizing for some (misspecified or mislearned) utility function, as opposed to e.g. following a bunch of contextually-activated policies[2]. While this is plausible, and thus warrants significant effort to prevent, it’s far from clear that this is even the most likely outcome “absent highly specific conditions”, never mind a near certainty.
1. ^
2. ^ as proposed e.g. by shard theory
It appears the UK’s index-linked gilts, at least, don’t have this structural issue.
See “redemption payments” on page 6 of this document, or put in a sufficiently large negative inflation assumption here.
One possible explanation is an expectation of massive deflation (perhaps due to AI-caused decreases in production costs) which the structure of Treasury Inflation Protected Securities (TIPS) and other inflation-linked government bonds — the source of your real interest rate data — doesn’t account for.
While TIPS adjust the principal (and the corresponding coupons) up and down over time according to changes in the consumer price index, you ALWAYS get at least the initial principal back at maturity. Typical “yield” calculations, however, assume that you get your inflation-adjusted principal back (which you do if inflation was positive over the bond’s term, as it historically usually has been).
This means that iff there’s net deflation over its term, the “yield” underestimates your real rate of return with TIPS by the amount of that deflation.
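To make the floor concrete, here’s a minimal sketch in Python with made-up numbers (a 10% net deflation assumption, coupons ignored); it isn’t a full yield calculation, just the principal repayment under the two assumptions:

```python
# Minimal, made-up illustration of the TIPS principal floor (coupons ignored).

def tips_principal_repayment(par, cumulative_inflation):
    """Principal repaid at maturity: CPI-adjusted, but never below par."""
    return max(par * (1 + cumulative_inflation), par)

par = 1000.0
net_deflation = -0.10  # assume 10% net deflation over the bond's term

# What the quoted "yield" assumes you get back: the inflation-adjusted principal.
assumed = par * (1 + net_deflation)                      # 900.0
# What the contract actually pays, thanks to the deflation floor.
actual = tips_principal_repayment(par, net_deflation)    # 1000.0

print(f"assumed by yield calc: {assumed:.0f}, actually repaid: {actual:.0f}")
# Under net deflation the actual repayment exceeds the assumption baked into the
# quoted real yield, so that yield understates the realized real return.
```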
Oh lol, thanks for explaining! Sorry for misunderstanding you. (It’s a pretty amusing misunderstanding though, I think you’d agree.)