Will we solve the alignment problem before crunch time?
I think it’s plausible that “solving the alignment problem” isn’t a very clear way of phrasing the goal of technical AI safety research. Consider the question “will we solve the rocket alignment problem before we launch the first rocket to the moon?” To me, the interesting question is whether the first rocket to the moon will indeed get there. The problem isn’t really “solved” or “not solved”; the rocket just gets to the moon or not. And it’s not even obvious whether the goal is to align the first AGI; maybe the question is “what proportion of resources controlled by AI systems end up being used for human purposes”, where we care about a weighted proportion of AI systems which are aligned.
I am not sure whether I’d bet for or against the proposition that humans will go extinct for AGI-misalignment-related reasons within the next 100 years.
Apologies, aren’t we already in crunch time?
Are you referring to this comment from Eliezer Yudkowsky?
“This is crunch time. This is crunch time for the entire human species. This is the hour before the final exam, we are trying to get as much studying done as possible, and it may be that you can’t make yourself feel that, for a decade, or 30 years on end or however long this crunch time lasts.”
Sure. “Crunch time” is not exactly a technically precise term, and it is quite likely our time is measured in decades. The thing I want to ask is whether Buck expects the timeline will fully run out before we solve alignment, or whether we’ll manage to successfully build an AGI that helps us achieve our values and an existential win, or whether something different will happen instead.
I see. I asked only because I was confused why you asked “before crunch time” rather than leaving that part out.