I ultimately decided to vote for the animal welfare groups, because I believe that animal welfare, in both its farmed and wild variants, is probably one of the largest and most robust problems in the world. With the exception of groups that are the logistical/epistemic backbone of the movements (they are valuable for gathering data and for making sure that the animal welfare groups can carry out their actions), I’ve become more skeptical that other causes are robustly net-positive, especially reducing existential risks.
Sharmake
This sounds very much like the missile gap/bomber gap narrative, and yeah, this is quite bad news if they actually adopt the commitments pushed here.
There is, quite frankly, very little evidence that China is racing to AGI, and I see a very dangerous arms race that could come out of this:
I honestly agree with this post. To translate it into my own thinking: we should prefer AI that is superhuman at faithful CoT reasoning over AI that is superhuman at wise forward-pass thinking.
The Peter Singer/Einstein/legible reasoning corresponds to CoT reasoning, whereas many of the directions for intuitive, wise, illegible thinking depend on making forward-pass thinking more capable, which is not a great direction for reasons of trust and alignment.
In retrospect, I agree more with 3. While I still think AI timelines are plausibly very short, I now think post-2030 timelines are reasonably plausible from my perspective.
I have become less convinced that takeoff will look slow from the perspective of the state. This is slightly because entropix reduced my confidence in the view that algorithmic progress won’t suddenly go critical and make AI radically better, but more so because I now think there will be less flashy/public progress. More importantly, I think the gap between consumer AI and the internal AI used at OpenAI will only widen, so I expect that many of the GPT-4 moments, where people were wowed and became very concerned about AI, won’t happen again.
So I expect AI governance to be less salient, by the time AIs can automate AI research, than the current AI governance field thinks, which means I’ve reduced my overall probability of a strong societal response from roughly 80-90% to only 45-60%.
This is mostly correct as a summary of my position, but on point 6, I want to point out that while it is technically true, I fear economic incentives work against this path.
Agree with the rest of the summary though.
I basically grant 2, sort of agree with 1, and drastically disagree with 3 (that timelines will be long).
This makes my position a bit unusual: while I have real confidence in the basic story that governments are likely to influence AI a lot, I doubt that governments will try to regulate AI seriously, especially if timelines are short enough.
Yeah, several comments at least have much more severe issues than tone or stylistic choices, like rewording ~every claim by Ben, Chloe and Alice and then assuming that the transformed claims had the same truth value as the original claims.
I’m in a position very similar to Yarrow’s here: while Kat Woods has mostly convinced me that the most incendiary claims are likely false, and I’m sympathetic to the case for suing Ben and Habryka, there were dangerous red flags in the responses, so much so that I’d stop funding Nonlinear entirely, and I think it’s quite bad that Kat Woods responded the way they did.
I unendorsed this primarily because, apparently, the board didn’t fire Sam Altman over safety concerns, though I’m not sure this is accurate.
It seems like the board did not fire Sam Altman for safety reasons, but for other reasons instead. This is utterly confusing, and IMO it demolishes my previous theory, though a lot of other theories also lost out.
Sources below, with their archive versions included:
https://twitter.com/norabelrose/status/1726635769958478244
https://twitter.com/eshear/status/1726526112019382275
While I generally agree that they almost certainly have more information on what happened, which is why I’m not very certain of this theory, my main reason here is that AI safety as a cause basically got away with incredibly weak standards of evidence for a long time, until the deep learning era (2019 onward), especially with all the evolution analogies. Even now it still tends to have very low standards, though I do believe this is slowly improving. This probably influenced many EA safetyists like Ilya, who almost certainly imbibed the norms of the AI safety field, one of which is that only a very low standard of evidence is needed to claim big things, and that is going to conflict with corporate/legal standards of evidence.
But I don’t think most people who hold influential positions within EA (or EA-minded people who hold influential positions in the world at large, for that matter) are likely to be that superficial in their analysis of things. (In particular, I’m strongly disagreeing with the idea that it’s likely that the board “basically had no evidence except speculation from the EA/LW forum”. I think one thing EA is unusually good at – or maybe I should say “some/many parts of EA are unusually good at” – is hiring people for important roles who think for themselves and have generally good takes about things and acknowledge the possibility of being wrong about stuff. [Not to say that there isn’t any groupthink among EAs. Also, “unusually good” isn’t necessarily that high of a bar.])
I agree with this weakly, in the sense that being high up in EA is at least a slight update towards someone actually thinking things through and being able to make real cases. My disagreement is that this effect is probably not strong enough to wash away the cultural effects of operating in a cause area where, for many reasons, they haven’t needed to meet any standard of evidence beyond long-winded blog posts, and have been rewarded for that.
Also, the board second-guessed its decision, which is evidence for the theory that they couldn’t make a case that met the standard of evidence for a corporate/legal setting.
If it were any other cause, like the areas GiveWell works on or some other cause in EA, I would trust much more that they had good reason. But AI safety has relied so heavily on very low or non-existent standards of evidence and epistemics that they probably couldn’t explain themselves in a way that meets the strictness of a corporate/legal standard of evidence.
Edit: The firing wasn’t because of safety-related concerns.
If Ilya can say “we’re pushing capabilities down a path that is imminently highly dangerous, potentially existentially, and Sam couldn’t be trusted to manage this safely” with proof that might work—but then why not say that?
I suspect this is because, quite frankly, their concerns about Sam Altman being unsafe on AI basically had no evidence behind them except speculation from the EA/LW forums, which is nowhere near enough evidence in the corporate/legal world. To be quite frank, the EA/LW standard of evidence for AI risk being a big enough deal to investigate is very low, sometimes non-existent, and that simply does not work once you have to deal with companies or the legal system.
More generally, EA/LW’s standards of evidence for AI risk are shockingly loose, sometimes non-existent, which doesn’t play well with the corporate/legal system.
This is admittedly a less charitable take than, say, Lukas Gloor’s.
My general thoughts on this: I’m mostly of the opinion that EA will survive this, barring something massively wrong like the board members willfully lying or massive fraud by EAs, primarily because most of the criticism is directed at the AI safety wing, and EA is more than AI safety, after all.
Nevertheless, the AI safety wing itself may not fare as well, and they may have just hit a key limit to their power. In particular, depending on how this goes, I could foresee a reduction in AI safety’s power and influence, and IMO this was completely avoidable.
Yeah, this is one of the few times where I believe the EAs on the board likely overreached, because they probably didn’t give enough evidence to justify their excoriating statement that Sam Altman was dishonest, and he might now be coming back to lead the company.
I’m not sure how to react to all of this, though.
Edit: My reaction is just: WTF happened, and why did they completely play themselves? Honestly, though, I just think they were inexperienced.
The Bay Area rationalist scene is a hive of techno-optimistic libertarians.[1] These people have a negative view of state/government effectiveness at a philosophical and ideological level, so their default perspective is that the government doesn’t know what it’s doing and won’t do anything. [edit: Re-reading this paragraph it comes off as perhaps mean as well as harsh, which I apologise for]
Yeah, I kind of have to agree with this. I think the Bay Area rationalist scene underrates government competence, though even I was surprised at how little politicking happened and how little AI ended up being politicized.
Similarly, ‘Politics is the Mind-Killer’ might be the rationalist idea that has aged worst, especially for its influences on EA. EA is a political project; for example, the conclusions of Famine, Affluence, and Morality are fundamentally political.
I think AI was a surprisingly good exception to the rule that politicizing something makes it harder to achieve, and I think this is mostly due to the popularity of AI regulation. I will say, though, that there’s clear evidence that, at least for now, AI safety is in a privileged position where the heuristic no longer applies.
Overly-short timelines and FOOM. If you think takeoff is going to be so fast that we get no fire alarms, then what governments do doesn’t matter. I think that’s quite a load-bearing assumption that isn’t holding up too well.
Not just that, though: I also think being overly pessimistic about AI safety contributed, as a lot of people’s mental health was almost certainly not great, at best, which made them catastrophize the situation and act ineffectively.
This is a real issue in the climate change movement, and I expect that AI safety’s embrace of pessimism was not at all good for thinking clearly.
Thinking of AI x-risk as only a technical problem to solve, and undervaluing AI Governance. Some of that might be comparative advantage (I’ll do the coding and leave political co-ordination to those better suited). But it’d be interesting to see x-risk estimates include effectiveness of governance and attention of politicians/the public to this issue as input parameters.
I agree with this for the general problem of AI governance, though I disagree when it comes to AI alignment. Still, I agree that rationalists underestimate the governance work required to achieve a flourishing future.
Okay, my crux is that the simplicity/Kolmogorov/Solomonoff prior is probably not very malign, assuming we could run it, and in general I find the prior not to be malign except in specific situations.
This is basically because the malignity argument relies on the, IMO dubious, assumption that the halting oracle can only be used once; notably, once we use the halting/Solomonoff oracle more than once, it loses its malign properties.
More generally, if the Solomonoff oracle is duplicable, as modern AIs generally are, then there’s a known way to mitigate the malignancy of the Solomonoff prior: duplicate it, and let multiple people run the Solomonoff inductor in parallel to increase the complexity of manipulation. The goal is essentially to remove the uniqueness of a single Solomonoff inductor and make an arbitrary number of such oracles, driving up the complexity of manipulation.
So under a weak assumption, the malignancy of the Solomonoff prior goes away. This is described well in the link below, and the important part is that we need either a use-once condition or some assumption of uniqueness. If neither assumption holds, as is likely to be the case, then the Solomonoff/Kolmogorov prior isn’t malign.
And that’s if it’s actually malign, which it might not be, at least in the large-data limit:
More specifically, it’s this part of John Wentworth’s comment:
In Solomonoff Model, Sufficiently Large Data Rules Out Malignness
There is a major outside-view reason to expect that the Solomonoff-is-malign argument must be doing something fishy: Solomonoff Induction (SI) comes with performance guarantees. In the limit of large data, SI performs as well as the best-predicting program, in every computably-generated world. The post mentions that:
A simple application of the no free lunch theorem shows that there is no way of making predictions that is better than the Solomonoff prior across all possible distributions over all possible strings. Thus, agents that are influencing the Solomonoff prior cannot be good at predicting, and thus gain influence, in all possible worlds.
… but in the large-data limit, SI’s guarantees are stronger than just that. In the large-data limit, there is no computable way of making better predictions than the Solomonoff prior in any world. Thus, agents that are influencing the Solomonoff prior cannot gain long-term influence in any computable world; they have zero degrees of freedom to use for influence. It does not matter if they specialize in influencing worlds in which they have short strings; they still cannot use any degrees of freedom for influence without losing all their influence in the large-data limit.
Takeaway of this argument: as long as we throw enough data at our Solomonoff inductor before asking it for any outputs, the malign agent problem must go away. (Though note that we never know exactly how much data that is; all we have is a big-O argument with an uncomputable constant.)
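A toy, computable analogue of this guarantee may help (it is not Solomonoff induction itself, which is uncomputable). A Bayesian mixture over a finite set of hypotheses assigns every sequence at least w times the probability that the true hypothesis assigns it, where w is the true hypothesis's prior weight. Its cumulative log-loss therefore exceeds the true hypothesis's by at most log2(1/w) bits however long the data runs, so the per-bit excess shrinks to zero; in the Solomonoff case w plays the role of 2^-K. The hypothesis class, prior weights, and data below are illustrative assumptions, and `mixture_regret` is a hypothetical helper.

```python
import math
import random


def mixture_regret(hypotheses, prior, data):
    """Cumulative log-loss (in bits) of a Bayes mixture vs. the true hypothesis.

    hypotheses: Bernoulli parameters p = P(bit == 1), each strictly in (0, 1).
    prior: prior weights summing to 1 (a stand-in for 2^-K).
    Assumption of this toy: hypotheses[0] is the data-generating hypothesis.
    """
    weights = list(prior)
    mixture_bits = 0.0
    true_bits = 0.0
    for x in data:
        # Probability each hypothesis assigned to the bit actually observed.
        likes = [p if x == 1 else 1.0 - p for p in hypotheses]
        pred = sum(w * l for w, l in zip(weights, likes))
        mixture_bits -= math.log2(pred)
        true_bits -= math.log2(likes[0])
        # Bayesian update: posterior weight is proportional to prior * likelihood.
        weights = [w * l / pred for w, l in zip(weights, likes)]
    return mixture_bits, true_bits


random.seed(0)
hyps = [0.7, 0.5, 0.2, 0.9]
prior = [0.125, 0.5, 0.25, 0.125]  # the true hypothesis gets only 1/8 of the prior
data = [1 if random.random() < 0.7 else 0 for _ in range(10_000)]
m, t = mixture_regret(hyps, prior, data)
print("excess bits:", m - t, "bound:", -math.log2(prior[0]))
```

The bound holds for every sequence, not just on average, which mirrors the "no degrees of freedom left for influence" point: whatever weight the wrong hypotheses start with, they can only cost a bounded number of bits in total.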
As for the actual practical question, SGD places a very important limitation on inner-misaligned agents: gradient hacking is very difficult to do, which is an underappreciated limitation on misalignment, since SGD has powerful tools for removing inner-misaligned circuits/TMs/agents, as described in the link below:
https://www.lesswrong.com/posts/w2TAEvME2yAG9MHeq/gradient-hacking-is-extremely-difficult
I want to flag that I see quite a lot of inappropriate binarization happening, and I generally see quite a lot of dismissals of valid third options.
Either they take the extinction risks seriously, or they don’t.
There are other important possibilities, such as believing that AI progress helps with, or even solves, existential risk, or thinking that increasing AI progress is actually the best strategy. More generally, once we make weaker assumptions about AI risk, or none at all, we no longer get the binary you’ve suggested.
So this doesn’t really work, because it basically requires us to assume the conclusion, especially for near-term people.
My conclusion is that they say they take the extinction risks seriously, but they’re lying, or they’re profoundly self-deceived. In any case, their revealed preferences are that they prefer a little extra money, power, and status for themselves over a lot of extra safety for everybody else—and for themselves.
Yeah, I’d really like to know how they’d respond to information saying that they’d have to stop doing something their incentives push them towards, like accelerating AI progress.
I don’t think it’s very likely, but given the incentives at play, it really matters that the organization can at least seriously consider the possibility that the solution to AI safety might be something they aren’t incentivized to do, or are actively disincentivized from doing.
The basic reasoning is that SGD is an extremely powerful optimizer, and even the real-life imperfections of SGD that mesa-optimizers could exploit are detectable without much interpretability progress at all. Also, capabilities groups have an incentive to improve SGD, so we have good reason to expect these flaws to become less worrisome over time.
In particular, SGD is basically immune to acausal trade setups or blackmail setups by mesa-optimizers.
Some choice quotes from Beren’s post below:
The key intuition is that gradient descent optimizes the entire causal graph leading to the loss (property 1) and leaves no slack (properties 2 and 3). Everything that can be optimized will be, and in the locally maximal way. If a parameter is contributing to the loss, it is optimized by its exact sensitivity to decrease the loss the most. If a circuit results in bad behaviour, every element of the circuit is optimized away independently. It is easy to construct gradient hacking examples in cases where these conditions do not hold. For instance, if the gradient hacker’s internal circuitry and external connections to other parts of the network are shielded from GD, then it is easy to construct a gradient hacker.
In general, if a gradient hacker H is on part of a differentiable path through the network, and tries to manipulate some other part of the network X to output some incorrect output, then gradient descent will simultaneously and independently do all of: a.) destroy the internal representations of the gradient hacker H to stop it doing this behaviour. b.) modify the input connections to H such that the gradient hacker becomes confused about how to do its negative behaviour. c.) modify the output connections from H to X to reduce the impact of its behaviour and causally cut off the gradient hacker, d.) modify the internal structure of X so that it is less affected by whatever H outputs, and e.) modify the outputs of X (i.e. to the loss) to reduce its influence, potentially trying to cut off the entire causal network path the gradient hacker is on.
This leads to an important point. Better optimization algorithms lead to both better capabilities and greater safety from mesaoptimizers. They do this because they reduce the slack available in the optimization process that a mesaoptimizer could exploit for its own purposes. For instance, gradient descent is very sensitive to the conditioning of the loss landscape around a parameter. A gradient hacker could exist for a very long time in an extremely flat region of poor conditioning before being optimized away, potentially persisting over an entire training run. However, this threat can be removed with better preconditioners or second order optimizations which make gradient descent much less sensitive to local conditioning and so remove the slack caused by poor conditioning.
The full post is linked below:
https://www.lesswrong.com/posts/w2TAEvME2yAG9MHeq/gradient-hacking-is-extremely-difficult
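Two toy sketches of the mechanisms in the quotes above, under loudly simplified assumptions: a "harmful path" is just three scalar weights whose product we assume should be 0, and a "flat region" is one low-curvature direction of a quadratic. Neither is Beren's construction or a model of a real network, and `shrink_harmful_path` and `flat_direction` are hypothetical names. The first shows gradient descent shrinking every link on the path at once (the hacker's input, its output connection, and the downstream unit); the second shows plain gradient descent barely moving along a nearly flat direction, while a curvature-aware preconditioner removes that slack in one step.

```python
# Toy model 1: gradient descent acts on every link of a harmful path at once.
def shrink_harmful_path(a, b, c, lr=0.1, steps=500):
    """Gradient descent on a hypothetical three-link path.

    a: input weight into a 'hacker' unit H
    b: weight from H to a downstream unit X
    c: X's own output weight
    With the input fixed at 1, the path contributes p = a * b * c to the
    output. We assume the target is 0, so loss = p**2 and the whole path
    is harmful.
    """
    for _ in range(steps):
        p = a * b * c
        # Partial derivatives of p**2: each link gets its own gradient.
        ga, gb, gc = 2 * p * b * c, 2 * p * a * c, 2 * p * a * b
        a, b, c = a - lr * ga, b - lr * gb, c - lr * gc
    return a, b, c


# Toy model 2: a nearly flat, poorly conditioned direction leaves slack.
def flat_direction(steps=100, k1=1.0, k2=1e-3, precondition=False):
    """Minimise 0.5 * (k1 * x**2 + k2 * y**2) from (1, 1) with step size 1.

    The y-direction has tiny curvature k2, standing in for a flat region
    where something could persist. With precondition=True each gradient
    component is divided by its curvature (a diagonal preconditioner,
    which for this diagonal quadratic coincides with Newton's method).
    """
    x, y = 1.0, 1.0
    for _ in range(steps):
        gx, gy = k1 * x, k2 * y
        if precondition:
            gx, gy = gx / k1, gy / k2
        x, y = x - gx, y - gy
    return x, y


a, b, c = shrink_harmful_path(1.0, 0.8, 1.2)
print("path weights:", a, b, c, "product:", a * b * c)
print("plain GD, flat direction y:", flat_direction()[1])
print("preconditioned, flat direction y:", flat_direction(precondition=True)[1])
```

With curvature 1e-3, plain gradient descent removes only about a tenth of y in 100 steps, which is the kind of persistence the quote describes, whereas the preconditioned update does not depend on the local curvature at all.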
Even though I don’t think EA needs to totally replicate outside norms, I do agree that there are good reasons why quite a few norms exist.
I’d say the biggest outside norms EA needs to adopt are less porous boundaries between work and dating and, importantly, actually having normal-ish pay structures and work environments.
I think this might not be irrationality, but a genuine difference in values.
In particular, I think something like a discount rate disagreement is at the core of a lot of disagreements on AI safety, and to be blunt, you shouldn’t expect convergence unless you successfully persuade them of this.
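To make the discount-rate point concrete, here is a toy illustration; the rates and the 100-year horizon are assumptions chosen for illustration, not figures from the discussion, and `present_value` is a hypothetical helper using standard exponential discounting. Two people who agree on every empirical fact can still value the same far-future outcome very differently.

```python
def present_value(value, annual_rate, years):
    """Value today of `value` received `years` from now, discounted
    exponentially at `annual_rate` (e.g. 0.03 for 3% per year)."""
    return value / (1.0 + annual_rate) ** years


# Same outcome 100 years out, valued by two hypothetical people who differ
# only in their discount rate.
for rate in (0.0, 0.03):
    print(rate, present_value(1.0, rate, 100))
```

At a 0% rate the outcome keeps its full value, while a 3% rate shrinks it to roughly 5% of that, so arguments about the size of long-run risks won't move someone whose disagreement is really about the rate.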