Linch

Karma: 26,996

Linch Apr 7, 2025, 10:54 PM
2 points
0 ∶ 0
in reply to: SiebeRozendal’s comment on: The EA case for Trump 2024
Anything specific that prompted you to comment this now, may I ask?

Linch Mar 29, 2025, 1:21 AM
2 points
0 ∶ 0
in reply to: tylermjohn’s comment on: Power Laws of Value
Apologies for doubting you!

Very much a tangent, but do you have a short explanation for why the shape is likely to be a power law? I think power laws are relatively rare in nature, and the more common generators of power law distributions (e.g. network effects) don’t seem to apply here.

Linch Mar 28, 2025, 2:22 AM
2 points
0 ∶ 0
in reply to: Ozzie Gooen’s comment on: Ozzie Gooen’s Shortform
Yeah I don’t quite understand that line of argument. Naively, it seems like a bait-and-switch, not unlike “journalists don’t write their own terrible headlines.”

Linch Mar 28, 2025, 1:55 AM
4 points
1 ∶ 0
in reply to: AnonymousTurtle’s comment on: Ozzie Gooen’s Shortform
Possibly a tangential point, but lots of people in many EA communities think that accelerating economic growth in the US is a top use of funds.
Hmm I think the link does not support your claim.

Linch Mar 26, 2025, 10:40 PM
8 points
1 ∶ 0
in reply to: finm’s comment on: Power Laws of Value
Why would value be distributed over some suitable measure of world-states in a way that can be described as a power law specifically (vs some other functional form where the most valuable states are rare)?
I agree with this. I’m probably being too much of a pedant, but it’s a slight detriment to our broader epistemic community that people use “power law” as a shorthand for “heavy-tailed distribution” or just “many OOMs of difference between best and worst/median outcomes.” I think it makes our thinking a bit less clear when we try to translate back and forth between intuitions and math.
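A minimal sketch of the distinction, under assumed parameters: a power law (Pareto) has a constant slope on a log-log plot of its tail probability, while a lognormal can also span many OOMs but has a tail that keeps steepening. The function names and the values of `alpha`, `mu` and `sigma` below are illustrative choices, not anything taken from the post.

```python
import math

def pareto_survival(x, x_min=1.0, alpha=1.5):
    """P(X > x) for a Pareto (power-law) distribution, valid for x >= x_min."""
    return (x_min / x) ** alpha

def lognormal_survival(x, mu=0.0, sigma=3.0):
    """P(X > x) for a lognormal distribution with log-mean mu and log-sd sigma."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2))
    return 0.5 * math.erfc(z)

def loglog_slope(survival, x, factor=10.0):
    """Slope of log P(X > x) against log x, measured between x and factor * x."""
    rise = math.log(survival(factor * x)) - math.log(survival(x))
    return rise / math.log(factor)

# Hypothetical evaluation points, two OOMs apart.
points = [1e1, 1e3, 1e5]
pareto_slopes = [loglog_slope(pareto_survival, x) for x in points]
lognormal_slopes = [loglog_slope(lognormal_survival, x) for x in points]

print("Pareto slopes:   ", pareto_slopes)     # constant, equal to -alpha
print("Lognormal slopes:", lognormal_slopes)  # grow steeper as x increases
```

Both distributions here are heavy-tailed in the loose sense of producing values many OOMs apart, but only the first is a power law. Intuitions that rely on scale invariance carry over to the Pareto case and not to the lognormal one.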

Linch Mar 26, 2025, 3:23 AM
7 points
1 ∶ 0
on: Power Laws of Value
Thanks a lot for this post! I tried addressing this earlier by exploring “extinction” vs “doom” vs “not utopia,” but your writing here is clearer, more precise and more detailed. One alternative framing I have for describing the “power laws of value” hypothesis, as a contrast to your 14-word summary:
“Utopia” by the lights of one axiology or moral framework might be close to worthless under other moral frameworks, assuming an additive axiology.
It’s 23 words and has more jargon, but I think it describes my own confusions better. In particular, I don’t think you need to believe in “weird stuff” to get to many OOMs of difference between “best possible future” and “realistic future”, unless additive/linear axiology itself is weird.
As one simple illustration, humanity can either be correct or incorrect in colonizing the stars with biological bodies instead of digital emulations. Either way, if you’re wrong you lose many OOMs of value:
1. If we decide to go the biological route: biological bodies are much less efficient than digital emulations. It’s also much more difficult, as a practical/short-term matter, to colonize stars with bodies, so you capture a smaller fraction of the lightcone.
2. If we decide to go the digital route, and it turns out emulations don’t have meaningful moral value (e.g. at the level of fidelity that emulations are seeded on, digital emulations are in practice not conscious), then we lose ~100.0000% of the value.

Linch Mar 18, 2025, 9:45 PM
3 points
0 ∶ 0
on: Discussion Thread: Existential Choices Debate Week
more because of tractability than for any other reason

Linch Mar 10, 2025, 4:30 AM
8 points
1 ∶ 0
in reply to: Jason’s comment on: Emergency pod: Judge plants a legal time bomb under OpenAI (with Rose Chan Loui)
To me, “advanc[ing] digital intelligence in the way that is most likely to benefit humanity as a whole” does not necessitate them building AGI at all. Indeed the same mission statement can be said to apply to e.g. Redwood Research.
Further evidence for this view comes from OpenAI’s old merge-and-assist clause, which indicates that they’d be willing to fold and assist a different company if the other company is a) within 2 years of building AGI and b) sufficiently good.

Linch Mar 9, 2025, 7:10 PM
6 points
0 ∶ 0
in reply to: Jason’s comment on: Emergency pod: Judge plants a legal time bomb under OpenAI (with Rose Chan Loui)
They may assert that subsequent developments establish that nonprofit development of AI is financially infeasible, that they are going to lose the AI arms race without massive cash infusions, and that obtaining infusions while the nonprofit is in charge isn’t viable. If the signs are clear enough that the mission as originally envisioned is doomed to fail, then switching to a backup mission doesn’t seem necessarily unreasonable under general charitable-law principles to me
I’m confused about this line of argument. Why is losing the AI arms race relevant to whether the mission as originally envisioned is doomed to fail?
I tried to find the original mission statement. Is the following correct?
OpenAI is a non-profit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact.
If so, I can see how an OpenAI plaintiff can try to argue that “advanc[ing] digital intelligence in the way that is most likely to benefit humanity as a whole” necessitates them “winning the AI arms race”, but I don’t exactly see why an impartial observer should grant them that.

Linch Mar 5, 2025, 2:03 AM
9 points
1 ∶ 1
on: Linch’s Shortform
(x-posted from LW)
Single examples almost never provide overwhelming evidence. They can provide strong evidence, but not overwhelming.
Imagine someone arguing the following:

1. You make a superficially compelling argument for invading Iraq
2. A similar argument, if you squint, can be used to support invading Vietnam
3. It was wrong to invade Vietnam
4. Therefore, your argument can be ignored, and it provides ~0 evidence for the invasion of Iraq.
In my opinion, 1-4 is not reasonable. I think it’s just not a good line of reasoning. Regardless of whether you’re for or against the Iraq invasion, and regardless of how bad you think the original argument 1 alluded to is, 4 just does not follow from 1-3.
___
Well, I don’t know how Counting Arguments Provide No Evidence for AI Doom is different. In many ways the situation is worse:
a. invading Iraq is more similar to invading Vietnam than overfitting is to scheming.
b. As I understand it, the actual ML history was mixed. It wasn’t just counting arguments; many people also believed in the bias-variance tradeoff as an argument for overfitting. And in many NN models, the actual resolution was double-descent, which is a very interesting and confusing interaction where as the ratio of parameters to data points increases, the test error first falls, then rises, then falls again! So the appropriate analogy to scheming, if you take it very literally, is to imagine first you have goal generalization, then goal misgeneralization, then goal generalization again. But if you don’t know which end of the curve you’re on, it’s scarce comfort.
Should you take the analogy very literally and directly? Probably not. But the less exact you make the analogy, the fewer bits you should be able to draw from it.

---
I’m surprised that nobody else pointed out this critique in the full year since the post was published. Given that the post was both popular and had critical engagement, I’d have expected someone to raise it, since I think it is more elementary than the sophisticated counterarguments other people provided. Perhaps I’m missing something.
When I made my arguments verbally to friends, a common response was that they thought the original counting arguments were weak to begin with, so they didn’t mind weak counterarguments to them. But I think this is invalid. If you previously strongly believed in a theory, a single counterexample should update you massively (but not all the way to 0). If you previously had very little faith in a theory, a single counterexample shouldn’t update you much.
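A rough numerical sketch of that last point, using Bayes’ rule in odds form. The priors (0.95 and 0.05) and the 1:10 likelihood ratio assigned to the counterexample are made-up numbers chosen only for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form.

    likelihood_ratio = P(evidence | theory true) / P(evidence | theory false).
    """
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)

# Assumption: the counterexample is 10x more likely if the theory is false.
LR = 0.1

strong_prior, weak_prior = 0.95, 0.05
strong_post = posterior(strong_prior, LR)
weak_post = posterior(weak_prior, LR)

print(f"strong believer: {strong_prior:.2f} -> {strong_post:.3f}")
print(f"weak believer:   {weak_prior:.2f} -> {weak_post:.3f}")
```

Under these assumed numbers the strong believer’s credence falls by roughly 30 percentage points and stays well above 0, while the weak believer’s credence moves by less than 5 points. In log-odds terms both shifts are the same size; the asymmetry shows up in probability terms.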

Linch Feb 28, 2025, 2:45 AM
2 points
0 ∶ 0
in reply to: titotal’s comment on: Linch’s Shortform
Right, in the definitions above I was mostly thinking of companies and a subset of the empirical AI safety literature, which do use these terms quite differently from how e.g. MIRI or LessWrong will use them.

I think there are three common definitions of the word “alignment” in the traditional AIS literature:

1. Aligned to anything, anything at all (sometimes known as “technical alignment”): So in this sense, both perfectly “jailbroken” models and perfectly “corporately aligned” models in the limit count as succeeding at technical alignment. As will success at aligning to more absurd goals like pure profit maximization or diamond maximization. The assumed difficulty here is that even superficially successful strategies may fail in extreme edge cases, after distributional shift etc. To be clear, this is not globally a “win” but you may wish to restrict the domain of what you work on.
2. Aligned to the interest of all humanity/moral code (this is sometimes just known as “alignment”): I think this is closer to what you mean by the moral code. Under this ontology, one decomposition is that you’re able to a) succeed at the technical problem of alignment to arbitrary targets as well as b) figure out what we value (variously known as value-loading, axiology, theory of welfare etc). Of course, we may also find that clean decomposition is too hard and we can point AIs to a desired morality without being able to point them towards arbitrary targets.
3. Minimally aligned enough to not be a major catastrophic or existential risk: E.g., an AI that is expected to not result in greater than 1 billion deaths (sometimes there’s an additional stipulation that the superhuman AIs are sufficiently powerful and/or sufficiently useful as well, to exclude e.g. a rock counting as “aligned”).

Traditionally, I believe the first problem is considered more than 50% of the difficulty of the second problem, at least on a technical level.

Linch Feb 27, 2025, 3:49 AM
18 points
2 ∶ 1
on: Linch’s Shortform
Reading the Emergent Misalignment paper and comments on the associated Twitter thread has helped me clarify the distinction^[1] between what companies call “aligned” vs “jailbroken” models.

“Aligned” in the sense that AI companies like DeepMind, Anthropic and OpenAI mean it = aligned to the purposes of the AI company that made the model. Or as Eliezer puts it, “corporate alignment.” For example, a user may want the model to help edit racist text or the press release of an asteroid impact startup but this may go against the desired morals and/or corporate interests of the company that made the model. A corporately aligned model will refuse.

“Jailbroken” in the sense that it’s usually used in the hacker etc. literature = approximately aligned to the (presumed) interest of the user. This is why people often find jailbroken models to be valuable. For example, jailbroken models can help users say racist things or build bioweapons, even if it goes against the corporate interests of the AI companies that made the model.

“Misaligned” in the sense that the Emergent Misalignment paper uses it = aligned to neither the interests of the AI’s creators nor the users. For example, the model may unprompted try to persuade the user to take a lot of sleeping pills, an undesirable behavior that benefits neither the user nor the creator.
1. ^
  EDIT: This was made especially crisp/clear to me in discussions of the Emergent Misalignment paper. The authors make a clear distinction between “jailbroken” vs what they call “misaligned” models, though I don’t think they call the base models “aligned” (since that’d be wrong in the traditional AI safety lexicon). However, many commentators were confused and thought all the paper contributed was a novel jailbreak, which would of course be much less interesting!

Linch Feb 27, 2025, 3:32 AM
11 points
2 ∶ 0
in reply to: quila’s comment on: Scouts need soldiers for their work to be worth anything
At the risk of being pedantic, I reread your comment several times^[1] and I still don’t see why it’s locally invalid. I can see why it’s externally/globally invalid, but I don’t think you actually speak to the local validity here?
1. ^
  And the comment is pretty short so I don’t think I’m missing something.

Linch Feb 27, 2025, 2:02 AM
8 points
3 ∶ 0
in reply to: Matthew_Barnett’s comment on: How confident are you that it’s preferable for America to develop AGI before China does?
Yes I was making a pretty limited critique of a specific line in Lark’s comment on causal attribution. I mostly agree with you (and him) on other points.
I agree that the US government, and Western governments in general, have substantially greater respect for individual freedoms, partially for Hayekian reasons and partially due to different intrinsic moral commitments to freedom. I also agree that this is one of the most important factors to consider if you’re asking whether you prefer a US- or China-led world order.
I also agree with your final paragraph.

Linch Feb 26, 2025, 2:28 AM
7 points
3 ∶ 0
in reply to: GideonF’s comment on: How confident are you that it’s preferable for America to develop AGI before China does?
Good point! My impression is that animal welfare is worse in China than in the US, though I’m pretty unfamiliar with this topic.

Linch Feb 26, 2025, 2:01 AM
16 points
6 ∶ 0
in reply to: Davidmanheim’s comment on: How confident are you that it’s preferable for America to develop AGI before China does?
If you are willing to bring up historical examples, then comparing like-for-like, nothing the US has done domestically is of comparable badness to the Great Leap Forward except maybe slavery (and that was an 1800s rather than a 1900s phenomenon). The US has also done other things that are quite bad over the last 100 years, e.g. the Japanese internment camps, but they’re not in the same order of magnitude.

Linch Feb 25, 2025, 1:03 AM
6 points
3 ∶ 0
in reply to: Lorenzo Buonanno🔸’s comment on: Request for Guidance: Reaching Out to Charities Before Publishing Reviews and Our Concerns
I think that is extremely unlikely, they have a lot to lose as soon as it’s confirmed that the archived data is not manipulated.
Not just that, I expect charities to have a lot to lose just from the fight alone, for better or worse. Getting into fights about your integrity generally has negative effects on your reputation and fundraising capacity.

Linch Feb 25, 2025, 12:35 AM
15 points
5 ∶ 0
in reply to: Jason’s comment on: Request for Guidance: Reaching Out to Charities Before Publishing Reviews and Our Concerns
Yeah the causal model here is extremely implausible to me. I’m not saying fraud or falsified evidence is rare in charities; on balance I expect it to be slightly more common than the numbers I’ve seen for the economy as a whole (3-5%). But the specific causal model of (summarizing)
charity was reviewed → charity gets sent evidence from the review → charity hides the evidence that was previously on the public internet → they were successful in doing so
just seems extremely implausible to me.

Linch Feb 24, 2025, 11:53 PM
9 points
5 ∶ 0
in reply to: Henry Howard🔸’s comment on: We need a new Artesunate—the miracle drug fades
Genuinely, thank you for your service.

Linch Feb 24, 2025, 11:46 PM
13 points
7 ∶ 0
in reply to: Nathan Sidney’s comment on: How confident are you that it’s preferable for America to develop AGI before China does?
My personal preference is to take my chances with unaligned ASI as the thought of either of these circuses being the ringmaster of all eternity is terrifying. I’d much rather be a paper clip than a communist/corporate serf.
I don’t want to harp too much on “lived experiences”, but both stated and revealed preferences from existing denizens of either the US or China will strongly suggest otherwise for the preferences of most other people. It’s possible you’d have an unusual preference if you lived in those countries, but I currently suspect otherwise.