I follow Crocker’s rules.
niplav
I find “epistemics” neat because it is shorter than “applied epistemology” and reminds me of “athletics”, with the resulting (implied) focus on practice. I don’t think anyone ever explained to me what “epistemics” refers to, but I thought it was pretty self-explanatory from the similarity to “athletics”.
I also disagree with the general notion that jargon specific to a community is necessarily bad, especially if that jargon has fewer syllables. Most subcultures, engineering disciplines, and sciences invent words or abbreviations for more efficient communication, and while some of that may be due to gatekeeping, the practice is so universal that I’d be surprised if it didn’t carry value. There can be better and worse coinages of new terms, and three/four/five-letter abbreviations such as “TAI” or “PASTA” or “FLOP” or “ASARA” are worse than words like “epistemics” or “agentic”.
I guess ethics makes the distinction between normative ethics and applied ethics. My understanding is that epistemology is not about practical techniques, and that one can make a distinction here (just like the distinction between “methodology” and “methods”).
I tried to figure out whether there’s a pair of suffixes that expresses the difference between the theoretical study of a field and its applied version. Claude suggests “-ology”/“-urgy” (as in metallurgy, dramaturgy) and “-ology”/“-iatry” (as in psychology/psychiatry), but notes that no general pattern exists.
Yep, I wouldn’t have predicted that. I guess the standard retort is: Worst case! Existing large codebase! Experienced developers!
I know that there are software tools I use >once a week that wouldn’t have existed without AI models. They’re not very complicated, but they’d’ve been annoying to code up myself, and I wouldn’t have done it. I wonder whether there’s a slowdown in less harsh scenarios, but the value of information probably isn’t worth the cost of running such a study.
I dunno. I’ve done a bunch of calibration practice[1]; this feels like a 30%, so I’m calling 30%. My probability went up recently, mostly because some subjectively judged capabilities that I was expecting didn’t start showing up.
[1] My Metaculus calibration around 30% isn’t great (I’m overconfident there), and I’m trying to keep that in mind. My Fatebook is slightly overconfident in that range, and who can tell with Manifold.
What is the probability that the U.S. AI industry (including OpenAI, Anthropic, Microsoft, Google, and others) is in a financial bubble — as determined by multiple reliable sources such as The Wall Street Journal, the Financial Times, or The Economist — that will pop before January 1, 2031?
I put 30% on this possibility, maybe 35%. I don’t have much more to say than “time horizons!”, “look how useful they’re becoming in my day job & personal life!”, “look at the qualitative improvement over the last six years”, “we only need to automate machine learning research, which isn’t the hardest thing to automate”.
Worlds in which we get a bubble pop are worlds in which we don’t get a software intelligence explosion, and in which either useful products come too late for the investment to sustain itself or there aren’t really that many useful products beyond what we already have. (This is tied in with “are we getting TAI through the things LLMs make us/are able to do, without fundamental insights?”.)
Right, I’d forgotten that betting on this is hard. I was wondering whether one could do a sort of cross-over between an end-of-the-world bet and betting a specific proportion of one’s net worth. This is the most fleshed-out proposal I’ve seen so far.
But I don’t want to give a stranger from another country a 7-year loan that I wouldn’t be able to compel them to repay once the time is up.
I wonder if this could be solved via a trusted third person who knows both bettors. (I think there are possible solutions here via blockchains, e.g. the ability to unilaterally destroy an escrow, but I guess that’s going to get quite complicated, not be worth the setup, and rely on a technology you’re probably skeptical of anyway.)
I’ve been confused about the “defense-in-depth” cheese analogy. The analogy works in two dimensions, and we can visualize that constructing multiple barriers with holes will block any path from a point out of a three-dimensional sphere.
(What follows is me trying to think through the mathematics, but I lack most of the knowledge to evaluate it properly. Johnson-Lindenstrauss may be involved in solving this? (It’s not, GPT-5 informs me.))

But plans in the real world are very high-dimensional, right? So we’re imagining a point (let’s say at $0$) in a high-dimensional space (let’s say $\mathbb{R}^d$ for large $d$, as an example), and a $d$-sphere around that point. Our goal is that there is no straight path from $0$ to somewhere outside the sphere. Our possible actions are that we can block off sub-spaces within the sphere, or construct $n$-dimensional barriers (with $n < d$) that have “holes”, inside the sphere, to prevent any such straight paths. Do we know the scaling properties of how many such barriers we have to create, given such-and-such “moves” with some number of dimensions/porosity?
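One crude way to poke at the scaling question would be a Monte Carlo toy model. Here’s a sketch under assumptions I’m making up (barriers as random hyperplanes at uniform distances from the origin, each crossing independently passing through a “hole” with some fixed porosity); it’s an illustration of the question, not an answer:

```python
import numpy as np

rng = np.random.default_rng(0)

def escape_probability(dim, n_barriers, porosity, n_rays=20_000):
    """Toy model: barriers are random hyperplanes inside the unit sphere.

    Each barrier has a random unit normal and sits at a uniform distance
    from the origin. A ray from the origin in direction u crosses a barrier
    (inside the sphere) iff u . normal >= distance; each crossing
    independently passes through a hole with probability `porosity`.
    A ray escapes iff every crossing it makes goes through a hole.
    """
    normals = rng.normal(size=(n_barriers, dim))
    normals /= np.linalg.norm(normals, axis=1, keepdims=True)
    distances = rng.uniform(0.0, 1.0, size=n_barriers)

    rays = rng.normal(size=(n_rays, dim))
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)

    crossings = (rays @ normals.T) >= distances        # (n_rays, n_barriers)
    through_hole = rng.random(crossings.shape) < porosity
    escaped = np.all(~crossings | through_hole, axis=1)
    return escaped.mean()

for dim in (3, 10, 100):
    for n_barriers in (10, 100, 1000):
        p = escape_probability(dim, n_barriers, porosity=0.1)
        print(f"dim={dim:4d}  barriers={n_barriers:5d}  escape≈{p:.3f}")
```

In this toy version, a fixed number of randomly placed barriers intersects any given ray less and less often as the dimension grows (random directions become nearly orthogonal to the barrier normals), so escape gets easier and you need more barriers, but that’s as much an artifact of my assumptions as a real result.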
My purely guessed intuition is that, at least if you’re given porous $n$-dimensional “sheets” you can place inside of the $d$-sphere, you need many of them, with the number growing in the dimensionality $d$. Never mind, I was confused about this.
Is the post you made here AI-written?
Whereas many people in EA seem to think the probability of AGI being created within the next 7 years is 50% or more, I think that probability is significantly less than 0.1%.
Are you willing to bet on this?
Yeah, I goofed by using Claude for math, not any of the OpenAI models, which are much better at math.
The relevant bit is at this timestamp in an interview. Relevant part of the interview:
But now, getting to the job side of this, I do have a fair amount of concern about this. On one hand, I think comparative advantage is a very powerful tool. If I look at coding, programming, which is one area where AI is making the most progress, what we are finding is we are not far from the world—I think we’ll be there in three to six months—where AI is writing 90 percent of the code. And then in twelve months, we may be in a world where AI is writing essentially all of the code. But the programmer still needs to specify what the conditions of what you’re doing are, what the overall app you’re trying to make is, what the overall design decision is. How do we collaborate with other code that’s been written? How do we have some common sense on whether this is a secure design or an insecure design? So as long as there are these small pieces that a programmer, a human programmer, needs to do, the AI isn’t good at, I think human productivity will actually be enhanced. But on the other hand, I think that eventually all those little islands will get picked off by AI systems. And then we will eventually reach the point where the AIs can do everything that humans can. And I think that will happen in every industry.
For what it’s worth, at the time I thought he was talking about code at Anthropic, and another commenter agreed. The “we are finding” indicates to me that it’s about Anthropic. Claude 4.5 Sonnet disagrees with me and says that it can be read as being about the entire world.
(I really hope you’re right and the entire AI industry goes up in flames next year.)
Humanity Learned Almost Nothing From COVID-19
I’m almost centrally the guy claiming LLMs will d/acc us out of AI takeover by fixing infrastructure; technically I’m usually hedging more than that, but it’s accurate in spirit.
I’m happy this is reaching exactly the right people :-D
As for proving invariances, that makes sense as a goal, and I like it. If I perform any follow-up I’ll try to estimate how many more tokens that’ll produce, since IIRC seL4 or CakeML had proofs that exceeded 10× the length of their source code.
A recent experience I’ve had is trying to use LLMs to generate Lean definitions and proofs for a novel mathematical structure I’m toying with. They do well with anything below 10 lines, but start to falter with more complicated proofs, and `sorry` their way out of anything I’d call non-trivial. My understanding is that a lot of software formal verification is gruntwork, but there also need to be interwoven moments of brilliance.
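For concreteness, the failure mode looks roughly like this (a made-up Lean 4 sketch, not my actual structure; `hard_invariant` is a hypothetical stand-in for a statement that’s genuinely hard in context):

```lean
-- Short, definitionally true facts come back fine:
theorem add_zero_right (n : Nat) : n + 0 = n := rfl

-- Anything more involved tends to come back with a placeholder
-- instead of an actual argument:
theorem hard_invariant (n : Nat) : n ≤ n * n + 1 := by
  sorry  -- compiles with a warning, proves nothing
```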
I’m always unsure what to think of claims that markets will incentivize the correct level of investment in software security. Like, initial computer security in the aughts seems to me like it was actually pretty bad, and while it became better over time, that took at least a decade. From afar, markets look efficient; from close up, you can see efficiency establish itself. And then there’s the question of how much of the cost is internalized, which I feel should be close to 100% for private companies? For open source projects, of course, that number goes close to zero.
It’d be cool to see a time series of the number of found exploits in open source software; thanks for the `curl` numbers. You picked a fairly old/established codebase with an especially dedicated developer; I wonder what it’s like with newer ones, and whether one discovers more exploits in the early, middle, or late stage of development. The adoption of better programming languages than C/C++ of course helps.
Ontological Cluelessness
Thanks! That’s the kind of answer I was looking for. I’ll sleep on whether to pre-order, and I’m now definitely looking forward more to the online appendices. (I also should’ve specified it’s the other Barnett ;-)
Note: I’m being a bit adversarial with these questions, probably because the book launch advertisements are annoying me a bit. Still, an answer to my questions could tip me over to pre-ordering/not pre-ordering the book.
Would you say that this book meaningfully moves the frontier of the public discussion on AI x-risk forward?
As in, if someone’s read much to ~all of the publicly available MIRI material (including all of the arbital alignment domain, the 2021 dialogues, the LW sequences, and even some of the older papers), plus a bunch of writing from detractors (e.g. Pope, Belrose, Turner, Barnett, Thornley, 1a3orn), will they find updated defenses/elaborations on the evolution analogy, why automated alignment isn’t possible, why to expect expected utility maximizers, why optimization will be “infectious”, and some more on things linked here?
Additionally, would any of the long-time MIRI-debaters (as mentioned above, also including Christiano and the OpenPhil/Constellation cluster of people) plausibly endorse the book not just as a good distillation, but as moving the frontier of the public discussion forward?
It’s not OK to eat honey
My best guess is that eating honey is pretty bad, because I buy that bees have non-negligible moral weight, and the arguments for bee-lives being net-negative seem plausible too.
I’m far less wedded to bee lives being net-negative, so it could be that I’ll be convinced that eating honey isn’t just good, but extremely good—that eating honey is one of the best things modern humans do, because it allows for the existence of many flourishing bees.
Depending on the relationship between brain size and moral weight, different animals may be more or less ethical to farm.
A common assumption in effective altruism is that moral weight is marginally decreasing in number of neurons (i.e. small brains matter more per neuron). This implies that we’d want to avoid putting many small animals into factory farms, and prefer few big ones, especially if smaller animals have faster subjective experience.
A reductio ad absurdum of this view would be to (on the margin) advocate for the re-introduction of whaling, but this would be blocked by optics concerns and moral uncertainty (if we value something like sapience and culture of animals).
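To make the neuron-count trade-off concrete, here’s a toy calculation under an assumption I’m making up purely for illustration (per-animal moral weight scaling as neuron count to a power below 1; the animals and counts are hypothetical and not matched for calories):

```python
# Toy sketch: total moral weight of "many small animals" vs. "one large animal",
# assuming (hypothetically) moral weight per individual ~ (neuron count)^alpha, alpha < 1.

def total_moral_weight(n_animals: int, neurons_per_animal: float, alpha: float = 0.5) -> float:
    """Total moral weight at stake under the toy power-law assumption."""
    return n_animals * neurons_per_animal ** alpha

many_small = total_moral_weight(n_animals=100, neurons_per_animal=2e8)  # e.g. ~100 chicken-sized animals
one_large  = total_moral_weight(n_animals=1,   neurons_per_animal=3e9)  # e.g. ~1 cow-sized animal

print(f"100 small animals: {many_small:.3g}")  # roughly 1.4e6
print(f"1 large animal:    {one_large:.3g}")   # roughly 5.5e4
```

Under any exponent below 1, the many-small-animals option carries far more total moral weight, which is the intuition behind preferring few big animals (and behind the whaling reductio).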
If factory farming can’t be easily replaced with clean meat in the foreseeable future, one might want to look for the animals that are least unethical to farm, mostly by checking whether they fulfill the following conditions:
Small brain & low number of neurons
Easy to breed & fast reproduction cycle
Low behavioral complexity
Large body, high-calorie meat
Palatable to consumers
Ancient lineage that branched off early (relevant if sentience evolved late in evolutionary history)
In conversation with various LLMs[1], three animals were suggested as performing well on those trade-offs. My best guess is that farming these animals still can’t beat current factory farming in effectiveness.
Ostriches
Advantages: Already farmed, very small brain for large body mass
Disadvantages: Fairly late in evolutionary history
Arapaima
Advantages: Very large relative to its brain size (up to 3 m in length), fast-growing, simple neurology, already farmed, can be raised herbivorously, part of a lineage of bony fishes that is ~200 million years old
Disadvantages: Tricky to breed
Tilapia
Advantages: Very easy to breed, familiarity to consumers, small neuron count
Disadvantages: Fairly small, not as ancient as the arapaima
[1] Primarily Claude 3.7 Sonnet
Awesome post. Loved it.
Here’s some thoughts I had while reading, with no particular coherent theme:
The way I see it, there are two kinds of gradient hacking possible. The first is a situation where the solution to the problem the model was trained to solve is an agent, a “mesa optimizer”, that has its own goals that are imperfectly aligned with the goals of the people who trained it and that rediscovers gradient hacking from first principles during its computation. […] The other way I see gradient hacking happening is if there are circuits in the model that simply resist being rewritten by gradient descent.
I think this distinction maps pretty cleanly to a now-forgotten concept in AI alignment, the former being indeed a mesa-optimizer, the second mapping onto optimization daemons. I think these should be given different names, maybe “full gradient hacker” and “internal gradient hacker”? A big difference is that a system could have multiple internal gradient hackers. Maybe it’s just a question about the level we’re looking at, and whether the hacker is short-/long-term beneficial/detrimental to itself/the supersystem?
Internal gradient hackers have been observed in non-neural network systems, for example in Eurisko, where a heuristic assigned itself as the discoverer of other heuristics, resulting in a very high Worth. I don’t think we’ve seen something like this in the context of neural networks, but I could imagine circuits copying themselves “backwards” through the network and mutating along the way. I guess the fact that there’s no recurrence (yet…) in advanced ML models is a big advantage.
Here’s the relevant passage:
One of the first heuristics that ᴇᴜʀɪꜱᴋᴏ synthesized (H59) quickly attained nearly the highest Worth possible (999). Quite excitedly, we examined it and could not understand at first what it was doing that was so terrific. We monitored it carefully, and finally realized how it worked: whenever a new conjecture was made with high worth, this rule put its own name down as one of the discoverers! It turned out to be particularly difficult to prevent this generic type of finessing of ᴇᴜʀɪꜱᴋᴏ′s evaluation mechanism. Since the rules had full access to ᴇᴜʀɪꜱᴋᴏ′s code, they would have access to any safeguards we might try to implement. We finally opted for having a small ‘meta-level’ of protected code that the rest of the system could not modify.
—Douglas B. Lenat, “ᴇᴜʀɪꜱᴋᴏ: A Program That Learns New Heuristics and Domain Concepts” p. 30, 1983
There is no direct analogy to recombination in gradient descent.
I’m not sure this is completely true, though I have to think a bit more about it. There are techniques like dropout, which make training more robust, and in the context of an internal gradient hacker dropout would probably change parts of the hacker while leaving other parts untouched, making reliable internal communication much more difficult. I guess it would also provide an incentive for an internal gradient hacker to “evolve” internal redundancy & modularity, which we don’t want.
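As a toy illustration of the intuition (the circuit-size framing and the independence assumption are mine, not anything established): if an internal gradient hacker needs k specific units to all be active and each unit is dropped independently with probability p, the chance that the circuit is fully intact on any given forward pass falls off geometrically in k.

```python
# Toy arithmetic: probability that a k-unit circuit survives a dropout mask intact,
# assuming independent per-unit dropout with rate p (value assumed for illustration).
p = 0.1  # dropout rate; typical values range from ~0.1 to ~0.5
for k in (5, 20, 100, 500):
    intact = (1 - p) ** k
    print(f"k = {k:3d}:  P(circuit fully active) ≈ {intact:.2e}")
```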
I also know that people have observed that swapping layers of neural networks doesn’t have a very large effect; I don’t think this is used as a training technique but it could be.
Paternal/maternal genome exclusion. This is a real thing that can happen where one parent’s genetic material is either silenced or rejected entirely at an early stage of development. It can lead to parthenogenesis. The short-term advantage of this is that the included parent’s genes are 100% represented in each offspring. The longterm disadvantage is having mutations accumulate.
I knew it! I’ve been wondering about this for literally years, thanks for confirming that this is a thing that happens.
The examples of gradient hackers with positive effects seem like they could be following the pattern of “here’s a sub-system doing something bad (e.g. transposons copying themselves incessantly), which the system needs to defend against, so the system finds a way (e.g. introns) to defend which carries other (maybe greater) benefits but which wouldn’t have been found otherwise”, does that seem like it explains things?
Would you say that investing in frontier AI companies (as an individual with normal human levels of capital) is similarly bad?
This is a narrow point[1], but I want to point out that [not deep learning] is extremely broad, and the usage of the term “good old-fashioned AI” has been moving around between [not deep learning] and [deduction on Lisp symbols]; I think there’s a huge space of techniques in between (probabilistic programming, program induction/synthesis, support vector machines, dimensionality reduction à la t-SNE/UMAP, evolutionary methods…).
[1] A hobby-horse of mine.