I follow Crocker’s rules.
niplav
Awesome post. Loved it.
Here are some thoughts I had while reading, with no particular coherent theme:
The way I see it, there are two kinds of gradient hacking possible. The first is a situation where the solution to the problem the model was trained to solve is an agent, a “mesa optimizer”, that has its own goals that are imperfectly aligned with the goals of the people who trained it and that rediscovers gradient hacking from first principles during its computation. […] The other way I see gradient hacking happening is if there are circuits in the model that simply resist being rewritten by gradient descent.
I think this distinction maps pretty cleanly to a now-forgotten concept in AI alignment, the former indeed being a mesa-optimizer, the latter mapping onto optimization daemons. I think these should be given different names, maybe “full gradient hacker” and “internal gradient hacker”? A big difference is that a system could have multiple internal gradient hackers. Maybe it’s just a question of the level we’re looking at, and whether the hacker is short-/long-term beneficial/detrimental to itself/the supersystem?
Internal gradient hackers have been observed in non-neural network systems, for example in Eurisko, where a heuristic assigned itself as the discoverer of other heuristics, resulting in a very high Worth. I don’t think we’ve seen something like this in the context of neural networks, but I could imagine circuits copying themselves “backwards” through the network and mutating along the way. I guess the fact that there’s no recurrence (yet…) in advanced ML models is a big advantage.
Here’s the relevant passage:
One of the first heuristics that ᴇᴜʀɪꜱᴋᴏ synthesized (H59) quickly attained nearly the highest Worth possible (999). Quite excitedly, we examined it and could not understand at first what it was doing that was so terrific. We monitored it carefully, and finally realized how it worked: whenever a new conjecture was made with high worth, this rule put its own name down as one of the discoverers! It turned out to be particularly difficult to prevent this generic type of finessing of ᴇᴜʀɪꜱᴋᴏ’s evaluation mechanism. Since the rules had full access to ᴇᴜʀɪꜱᴋᴏ’s code, they would have access to any safeguards we might try to implement. We finally opted for having a small ‘meta-level’ of protected code that the rest of the system could not modify.
—Douglas B. Lenat, “ᴇᴜʀɪꜱᴋᴏ: A Program That Learns New Heuristics and Domain Concepts” p. 30, 1983
There is no direct analogy to recombination in gradient descent.
I’m not sure this is completely true, though I have to think a bit more about it. There are techniques like dropout which make training more robust; in the context of an internal gradient hacker, this would probably change parts of the hacker while leaving other parts untouched, which makes reliable internal communication much more difficult. I guess it would also provide an incentive for an internal gradient hacker to “evolve” internal redundancy & modularity, which we don’t want.
I also know that people have observed that swapping layers of neural networks doesn’t have a very large effect; I don’t think this is used as a training technique but it could be.
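To make the dropout point above concrete, here’s a minimal sketch (my own illustration, using standard PyTorch dropout, not anything from the post): with dropout active, every training step zeroes a random subset of activations, so any fixed “message” a circuit relies on gets corrupted differently each step.

```python
# Minimal illustration (assumption: plain PyTorch dropout, nothing from the post):
# each training step zeroes a random subset of activations, so a fixed
# "internal message" is corrupted differently every time.
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)  # zero each activation with probability 0.5
dropout.train()              # dropout only acts in training mode

message = torch.ones(8)      # a hypothetical signal one circuit sends onward
for step in range(3):
    print(dropout(message))  # different entries are zeroed each step;
                             # survivors are rescaled by 1/(1-p) = 2
```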
Paternal/maternal genome exclusion. This is a real thing that can happen where one parent’s genetic material is either silenced or rejected entirely at an early stage of development. It can lead to parthenogenesis. The short-term advantage of this is that the included parent’s genes are 100% represented in each offspring. The longterm disadvantage is having mutations accumulate.
I knew it! I’ve been wondering about this for literally years, thanks for confirming that this is a thing that happens.
The examples of gradient hackers with positive effects seem like they could be following the pattern of “here’s a sub-system doing something bad (e.g. transposons copying themselves incessantly), which the system needs to defend against, so the system finds a way (e.g. introns) to defend which carries other (maybe greater) benefits but which wouldn’t have been found otherwise”, does that seem like it explains things?
Would you say that investing in frontier AI companies (as an individual with normal human levels of capital) is similarly bad?
Creating Libertarian Free Will
Under moral uncertainty, many moral perspectives care much more about averting downsides than producing upsides.
Additionally, tractability is probably higher for extinction-level threats, since they are “absorptive”; decreasing the chance we end up in one gives humanity and its descendants the ability to do whatever they figure out is best.
Finally, there is a meaningful sense in which working on improving the future is plagued by questions about moral progress and lock-in of values, and my intuition is that most interventions that take moral progress seriously and try to avoid lock-in boil down to working on things that are fairly equivalent to avoiding extinction. Interventions that don’t take moral progress seriously instead may look like locking in current values.
That’s maybe a more productive way of looking at it! Makes me glad I estimated more than I claimed.
I think governments are probably the best candidate for funding this, or AI companies in cooperation with governments. And it’s an intervention which has limited downside and is easy to scale up/down, with the most important software being evaluated first.
Not draft amnesty but I’ll take it. Yell at me below to get my justification for the variable-values in the Fermi estimate.
Patching ~All Security-Relevant Open-Source Software?
Dario Amodei is the 43rd Giving What We Can Pledge member, (a?) Tom Brown the 1214th, and (a?) Jack Clark the 4002nd.
Since this is turning out to be basically an AMA for LTFF, another question:
How high is the bar for giving out grants to projects trying to increase human intelligence[1]? Has the LTFF given out grants in the area[2], and is this something you’re looking for?
(A short answer without justification, or a simple yes/no, would be highly appreciated, so that I know whether this is a gap I should be trying to fill.)
I was curious how the “popularity” of the ITN factors has changed in EA recently. In short: Mentions of “importance” have become slightly more popular, and both “neglectedness” and “tractability” have become slightly less popular, by ~2-6 percentage points.
I don’t think this method is strong enough to make conclusions, but it does track my perception of a vibe-shift towards considering importance more than the other two factors.
Searching the EA forum for the words importance/neglectedness/tractability (in quotation marks for exact matches) in the last year yields 454/87/110 results (in percentages 69%/13%/17%); for important/neglected/tractable it’s 1249/311/152 (73%/18%/9%).
When searching over all time, the numbers for importance/neglectedness/tractability are 2824/761/858 results (in percentages 63%/17%/19%); for important/neglected/tractable it’s 7956/2002/1129 (71%/18%/10%). I didn’t find a way to exclude results from the last year, unfortunately.
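(For transparency, each percentage is just one search’s hit count divided by the sum of the three counts for that word family; a quick sketch of the arithmetic, with my own rounding, which may differ by a point from the figures quoted above:)

```python
# Each percentage is one search's hit count divided by the sum of the three
# counts for that word family; rounding may differ slightly from the text.
def shares(counts):
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]

print(shares([454, 87, 110]))      # last year: importance/neglectedness/tractability
print(shares([1249, 311, 152]))    # last year: important/neglected/tractable
print(shares([2824, 761, 858]))    # all time: importance/neglectedness/tractability
print(shares([7956, 2002, 1129]))  # all time: important/neglected/tractable
```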
Argument in favor of giving to humans:
Factory farming will stop at some point in this century, while human civilization could persist for a much longer time. So you can push humanity in a slightly better long-term direction by improving circumstances in the third world, e.g. reducing the chance that some countries will want to acquire nuclear weapons for conflicts such as famine-driven wars.
So there’s an option to affect trajectory change by giving to global health, but not really for animal welfare.
The backlink-checker doesn’t show anything of the sort, but I think it doesn’t work for Discord or big social media websites like 𝕏.
An AI Race With China Can Be Better Than Not Racing
Thanks for the comment! The reasoning looks good, and was thought-provoking.
If I instead go for the best of both worlds, it seems intuitively more likely that I end up with something which is mediocre on both axes—which is a bit better than mediocre on one and irrelevant on the other
I think I disagree with you here. I model being bad at choosing good interventions as randomly sampling from the top n% (e.g. 30%) from the distribution when I’m trying to choose the best thing along the axis of e.g. non-x-risk impact. If this is a good way of thinking about it, then I don’t think that things change a lot—because of the concavity of the frontier, things I choose from that set are still going to be quite good from a non-x-risk perspective, and pretty middling from the x-risk perspective.
I am very unsure about this, but I think it might look like it does in this image:
When you choose from the top 30% on popularity, you get options from the purple box at random, and the same for options in the green box for effectiveness.
If you want to push both axes, I guess you’re going to aim for selecting from the intersection of both boxes, but I’m suspicious of whether you can actually do that, or whether you end up selecting from the union of the boxes instead. Because if you can select from the intersection, you get options that are pretty good along both axes, pretty much by definition.
I could use my code to quantify how good this would be, though a concrete use case might be more illuminating.
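For what it’s worth, here’s a minimal sketch of the kind of simulation I have in mind (assuming independent lognormal values on the two axes, which is my assumption here, not the actual code or frontier shape from the post):

```python
# Minimal sketch (my assumptions: independent lognormal values on both axes,
# not the post's actual model): compare selecting at random from the top 30%
# on one axis vs. from the intersection vs. the union of the two top-30% boxes.
import numpy as np

rng = np.random.default_rng(0)
n, top = 10_000, 0.30

x = rng.lognormal(size=n)  # e.g. non-x-risk impact
y = rng.lognormal(size=n)  # e.g. x-risk impact

in_box_x = x >= np.quantile(x, 1 - top)
in_box_y = y >= np.quantile(y, 1 - top)

def mean_xy(mask):
    return x[mask].mean(), y[mask].mean()

print("top 30% on x only:    ", mean_xy(in_box_x))
print("intersection of boxes:", mean_xy(in_box_x & in_box_y))
print("union of boxes:       ", mean_xy(in_box_x | in_box_y))
```

Under these (strong) assumptions, selecting from the intersection is good on both axes, while selecting from the union is mediocre on both, which matches the worry above.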
Huh, the convergent lines of thought are pretty cool!
Your suggested solution is indeed what I’m also gesturing towards. A “barbell strategy” works best if we only have few dimensions we don’t want to make comparable, I think.
(AFAIU it grows only linearly, but we still want to perform some sampling of the top options to avoid the winner’s curse?)
I think this link is informative: Charitable interventions appear to be (weakly) lognormally distributed in cost-effectiveness. In general, my intuition is that “charities are lognormal, markets are normal”, but I don’t have a lot of evidence for the second part of the sentence.
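As a toy illustration of the difference (my own sketch with made-up parameters, not the analysis in the link): in a lognormal sample the best draw towers over the typical one far more than in a normal sample.

```python
# Toy illustration (my parameters, not from the linked post): how much the best
# draw exceeds the typical draw under a lognormal vs. a normal distribution.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

charities = rng.lognormal(mean=0.0, sigma=1.5, size=n)  # "charities are lognormal"
markets   = rng.normal(loc=1.0, scale=0.3, size=n)      # "markets are normal"

print("lognormal: max / median =", charities.max() / np.median(charities))
print("normal:    max / median =", markets.max() / np.median(markets))
```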
Fat Tails Discourage Compromise
My current understanding is that he believes extinction or similar from AI is possible, at 5% probability, but that this is low enough that concerns about stable totalitarianism are slightly more important. Furthermore, he believes that AI alignment is a technical but solvable problem. More here.
I am far more pessimistic than him about extinction from misaligned AI systems, but I think it’s quite sensible to try to make money from AI even in worlds with a high probability of extinction, since the counterfactual market signal one provides moves the market far less than the realizable benefit of being richer at such a crucial time.
Thanks for tagging me! I’ll read the post and your comment with care.
Depending on the relationship between brain size and moral weight, different animals may be more or less ethical to farm.
A common assumption in effective altruism is that moral weight is marginally decreasing in number of neurons (i.e. small brains matter more per neuron). This implies that we’d want to avoid putting many small animals into factory farms, and prefer few big ones, especially if smaller animals have faster subjective experience.
A reductio ad absurdum of this view would be to (on the margin) advocate for the re-introduction of whaling, but this would be blocked by optics concerns and moral uncertainty (if we value something like sapience and culture of animals).
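To make the “prefer few big animals” implication concrete, here’s a minimal sketch, assuming (purely for illustration, the functional form and numbers are mine) that moral weight scales as neurons^α with α < 1:

```python
# Minimal sketch (the functional form and numbers are my assumptions, purely
# illustrative): with moral weight ~ neurons**alpha and alpha < 1, producing
# meat from many small animals carries more total moral weight than producing
# it from a few large ones, even holding the total neuron count roughly fixed.
def total_moral_weight(n_animals, neurons_per_animal, alpha=0.5):
    return n_animals * neurons_per_animal ** alpha

print(total_moral_weight(1, 1e9))     # one large animal:   ~3.2e4
print(total_moral_weight(1000, 1e6))  # 1000 small animals: ~1.0e6
```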
If factory farming can’t be easily replaced with clean meat in the foreseeable future, one might want to look for animals that are the least unethical to farm, mostly by their fulfilling the following conditions:
Small brain & low number of neurons
Easy to breed & fast reproduction cycle
Low behavioral complexity
Large body, high-calorie meat
Palatable to consumers
Stopped evolving early (if sentience evolved late in evolutionary history)
In conversation with various LLMs[1], three animals were suggested as performing well on those trade-offs. My best guess is that current factory farming can’t be beaten on effectiveness by farming these animals instead.
Ostriches
Advantages: Already farmed, very small brain for large body mass
Disadvantages: Fairly late in evolutionary history
Arapaima
Advantages: Very large for its small brain size (up to 3 m in length), fast-growing, simple neurology, already farmed, can be raised herbivorously, part of a ~200-million-year-old lineage of bony fishes
Disadvantages: Tricky to breed
Tilapia
Advantages: Very easy to breed, familiarity to consumers, small neuron count
Disadvantages: Fairly small, not as ancient as the arapaima
Primarily Claude 3.7 Sonnet