Seth Herd

Karma: 84

Seth Herd Apr 3, 2024, 4:10 PM
5 points
1 ∶ 0
on: The Rationale-Shaped Hole At The Heart Of Forecasting
I think a major issue is that the people who would be best at predicting AGI usually don’t want to share their rationale.
Gears-level models of the phenomenon in question are highly useful in making accurate predictions. Those with the best models are either worriers who don’t want to advance timelines, or enthusiasts who want to build it first. Neither has an incentive to convince the world it’s coming soon by sharing exactly how that might happen.
The exceptions are people who have really thought about how to get from AI to AGI, but are not in the leading orgs and are either uninterested in racing or want to attract funding and attention for their approach. Yann LeCun comes to mind.
Imagine trying to predict the advent of heavier-than-air flight without studying either birds or mechanical engineering. You’d get predictions like the ones we saw historically—so wild as to be worthless, except those from the people actually trying to achieve that goal.
(copied from LW comment since the discussion is happening over here)

Seth Herd Dec 8, 2023, 12:08 AM
3 points
0 ∶ 0
in reply to: Yarrow🔸’s comment on: Biological superintelligence: a solution to AI safety
I personally think LLMs will plateau around human level, but that they will be made agentic and self-teaching, and therefore self-aware (in sum, “sapient”) and truly dangerous, by scaffolding them into language model agents or language model cognitive architectures. See Capabilities and alignment of LLM cognitive architectures for my logic in expecting that.
That would be a good outcome. We’d have agents with their own goals, capable enough to do useful and dangerous things, but probably not quite capable enough to self-exfiltrate, and probably initially under the control of relatively sane people. That would scare the pants off of the world, and we’d see some real efforts to align the things. Which is uniquely do-able, since they’d take top-level goals in natural language, and be readily interpretable by default (with real concerns still there aplenty, including waluigi effects and their utterances not reliably reflecting their real underlying cognition).

Seth Herd Dec 4, 2023, 10:50 PM
2 points
0 ∶ 0
on: Biological superintelligence: a solution to AI safety
I think the general consensus, which I share, is that neither mind uploading nor BCI good enough to allow brain extensions is likely to happen before AGI. I wish I had citations ready to hand.
I haven’t heard as much discussion of the biological superbrains approach. I think it’s probably feasible to increase intelligence through genetic engineering, but that would probably also take too long to help with alignment before AGI happens, if you took the route of altering embryos. Altering adults would be tougher and more limited. And it would hit the same legal problems.
I think that neuromorphic AGI is a possibility, which is why some of my alignment work addresses it. I think the best and most prominent work on that topic is Steve Byrnes’ Intro to Brain-Like-AGI Safety.

Seth Herd Dec 3, 2023, 11:36 PM
1 point
0 ∶ 0
in reply to: Geoffrey Miller’s comment on: We’re Not Ready: thoughts on “pausing” and responsible scaling policies
I think that’s quite a pessimistic take. I take Altman seriously on caring about x-risk, although I’m not sure he takes it quite seriously enough. This is based on public comments to that effect around 2013, before he started running OpenAI. And Sutskever definitely seems properly concerned.

Seth Herd Nov 2, 2023, 7:52 PM
1 point
0 ∶ 0
in reply to: Geoffrey Miller’s comment on: We’re Not Ready: thoughts on “pausing” and responsible scaling policies
I agree that those teams aren’t completely trustworthy, and in an ideal world, we should be making this decision by including everyone on earth. But with a partial pause, do you expect to have better or worse teams in the lead for achieving AGI? That was my point.

Seth Herd Oct 27, 2023, 8:39 PM
−1 points
0 ∶ 1
on: We’re Not Ready: thoughts on “pausing” and responsible scaling policies
I’m not sure which is the better place to have this discussion, so I’m trying both. Copied from my comment on Less Wrong:
That all makes sense. To expand a little more on some of the logic:
It seems like the outcome of a partial pause rests in part on whether that would tend to put people in the lead of the AGI race who are more or less safety-concerned.
I think it’s nontrivial that we currently have three teams in the lead who all appear to honestly take the risks very seriously, and changing that might be a very bad idea.
On the other hand, the argument for alignment risks is quite strong, and we might expect more people to take the risks more seriously as those arguments diffuse. This might not happen if polarization becomes a large factor in beliefs on AGI risk. The evidence for climate change was also pretty strong, but we saw half of America believe in it less, not more, as evidence mounted. The lines of polarization would be different in this case, but I’m afraid it could happen. I outlined that case a little in AI scares and changing public beliefs.
In that case, I think a partial pause would have a negative expected value, as the current lead decays and more people who take the risks less seriously get into the lead by circumventing the pause.
This makes me highly unsure whether a pause would be net-positive. Having alignment solutions won’t help if they’re not implemented because the alignment taxes are too high.
The creation of compute overhang is another reason to worry about a pause. It’s highly uncertain how far we are from making adequate compute for AGI affordable to individuals. Algorithms and compute will keep getting better during a pause. So will theory of AGI, along with theory of alignment.
This puts me, and I think the alignment community at large, in a very uncomfortable position of not knowing whether a realistic pause would be helpful.
It does seem clear that creating mechanisms and political will for a pause is a good idea.
Advocating for more safety work also seems clear cut.
To this end, I think it’s true that you create more political capital by successfully pushing for policy.
A pause now would create even more capital, but it’s also less likely to be a win, and it could wind up creating polarization and so costing rather than creating capital. It’s harder to argue for a pause now when even most alignment folks think we’re years from AGI.
So perhaps the low-hanging fruit is pushing for voluntary RSPs, and government funding for safety work. These are clear improvements, and likely to be wins that create capital for a pause as we get closer to AGI.
There’s a lot of uncertainty here, and that’s uncomfortable. More discussion like this should help resolve that uncertainty, and thereby help clarify and unify the collective will of the safety community.

Seth Herd Oct 24, 2023, 1:58 AM
3 points
0 ∶ 0
in reply to: Jobst Heitzig (vodle.it)’s comment on: My lab’s small AI safety agenda
I agree with you that humans have mismatched goals among ourselves, so some amount of goal mismatch is just a fact we have to deal with. I think the ideal is that we get an AGI that makes its goal the overlap in human goals; see [Empowerment is (almost) All We Need](https://www.lesswrong.com/posts/JPHeENwRyXn9YFmXc/empowerment-is-almost-all-we-need) and others on preference maximization.
I also agree with your intuition that having a non-maximizer improves the odds of an AGI not seeking power or doing other dangerous things. But I think we need to go far beyond the intuition; we don’t want to play odds with the future of humanity. To that end, I have more thoughts on where this will and won’t happen.
I’m saying “the problem” with optimization is actually mismatched goals, not optimization/maximization. In more depth, and hopefully more usefully: I think unbounded goals are the problem with optimization (not the only problem, but a very big one).
If an AGI had a bounded goal like “make one billion paperclips”, it wouldn’t be nearly as dangerous. It might still decide to eliminate humanity to make the odds of getting to a billion as good as possible (I can’t remember where I saw this important point; I think maybe Nate Soares made it). But it might instead decide that its best odds would just be making some improvements to the paperclip business, in which case it wouldn’t cause problems.

Seth Herd Oct 20, 2023, 7:04 PM
2 points
0 ∶ 1
in reply to: Jobst Heitzig (vodle.it)’s comment on: My lab’s small AI safety agenda
Mismatched goals are the problem. The logic of instrumental convergence applies to any goal, not just maximization goals.

Seth Herd Oct 20, 2023, 12:10 AM
5 points
0 ∶ 0
on: My lab’s small AI safety agenda
This is a start, but just a start. Optimization/maximization isn’t actually the problem. Any highly competent agent with goals that don’t match ours is the problem.
A world that’s 10% paperclips, with the rest composed of other stuff we don’t care about, is no better than the world a true optimizer would produce.
The idea “just don’t optimize” has a surprising amount of support in AGI safety, including quantilizers and satisficing. But to me these seem like only a bare start on taking the points off of the tiger’s teeth. The tiger will still gnaw you to death if it wants to even a little.

Seth Herd Oct 20, 2023, 12:09 AM
2 points
0 ∶ 0
in reply to: Jobst Heitzig (vodle.it)’s comment on: My lab’s small AI safety agenda
It means humans are highly imperfect maximizers of some imperfectly defined and ever-changing thing: your estimated future rewards according to your current reward function.
It doesn’t matter that you’re not exactly maximizing one certain thing; you’re working toward some set of things, and if you’re really good at that, it’s really bad for anyone who doesn’t like that set of things.
Optimization/maximization is a red herring. Highly competent agents with goals different from yours are the core problem.

Seth Herd Oct 20, 2023, 12:06 AM
3 points
1 ∶ 0
in reply to: titotal’s comment on: My lab’s small AI safety agenda
From a neuroscience/psychology perspective, I’d say that you are maximizing your future reward. And while that’s not a well-defined thing, it doesn’t matter; if you were highly competent, you’d make a lot of changes to the world according to what tickles you, and those might or might not be good for others, depending on your preferences (reward function). The slight difference between turning the world into one well-defined thing and a bunch of things you like isn’t that important to anyone who doesn’t like what you like.
This is a broader and more intuitive form of the argument Miles is trying to make precise.
If you can be Dutch-booked without limit, well, you’re just not competent enough to be a threat; but you’re not going to let that happen, and a superintelligent version of you certainly wouldn’t.

Seth Herd Jul 4, 2023, 7:40 PM
7 points
5 ∶ 0
on: Grant applications and grand narratives
The primary problem you mention is exaggerating the importance of your project. That is a fundamental issue with every grant. Every grantmaker wants to fund projects with maximum impact per dollar.
There is an incentive to aggrandize your work, but there’s a counterincentive to not bullshit. A lot of the work of reviewing grants is having a well-tuned bullshit detector.
I don’t think there’s any way around the tension between those two factors. You can change the goalposts, but there’s always a goal, and a claim of efficiency in moving toward that goal.
The other issue here is with the intended use of the grant money. If these organizations really only want to fund projects that improve our chances of survival and flourishing, that is their choice. If that’s their goal, there has to be a chain of logic for how that is going to happen. Sometimes grantmakers come up with that chain of logic, and so they fund projects like “better understanding health psychology” because they believe accomplishing that will produce a better world. The organizations you mention are trying to be broad by allowing anyone to convince them that their unique project will make the world better with a good benefit-per-dollar ratio. This work can’t be skipped, but it can be shared by grantmaker and applicant.
Therefore, I’d suggest that they add “if you don’t have a grand narrative that’s fine; we might have a grand narrative for your work that you’re not seeing. Of course it helps your odds if you do have a convincing answer for a way your project achieves our goal (X) with a good cost ratio, in case we don’t have one.”
My career to date has been mostly funded by US government grants. These do not require a well-thought-out grand narrative, or any other sort of direct causal reasoning about impacts. I believe this is disastrous. It shifts most of the competition to cultural knowledge of the granting agency and the types of individuals who are likely to be reviewers. And by not requiring much explicit logic about likely outcomes and therefore payoff ratio, I believe the government is wasting money like crazy. They effectively fund projects that “sound like good work” to the people already doing similar work, which creates a clique mentality divorced from the actual impact of the funded work.
My experience with EA organization granting processes has been vastly better, primarily based on their focus on the careful payoff logic you seem to be arguing against.