Hi David. Are you implying this post is neglecting non-empirical evidence? If so, which type of evidence do you have in mind?
Not this person, but many AI risk arguments are necessarily logical rather than empirical: there are good reasons to believe the relevant behaviors won't appear, or will be trivially easy to counter (at least re: harmful outputs), until you have very capable systems.
Like, if I can construct a deceptive response-to-training strategy (but current models can't), that's enough evidence to be concerned future superhuman models might do similar deceptive alignment. Other concerns like inner optimizers (e.g. humans stopped being kid-maxxers at high capability, because our proxy decoupled from evolution's target) might not show up, or might change in character, as models become less limited. And even when you can demonstrate the behavior empirically, people dismiss it as overly-induced or a toy environment, which was the whole point: just to show plausibility, not prove it.
More fundamentally: If I argue that a future thing logically implies certain risks arise, responding with "there's no empirical evidence" is silly. Logical chains and structural arguments are still valid epistemic tools.
I'm not trying to imply it, I'm trying to state it clearly. You dismiss the arguments made in the book as not being empirical. If you haven't read your post, here are some quotes indicating where you do this explicitly:
"the chapter presents effectively zero empirical research"
"one might expect Y&S to substantiate their case with empirical evidence"
"lack of empirical evidence"
I did not write the post, or read the book. However, based on the podcasts with Eliezer Yudkowsky and Nate Soares I have listened to, I would also like them to focus more on empirical evidence.
Do you see another bet we could make about AI risk? I remain open to bets against short AI timelines, or what they supposedly imply, up to 10 k$. I am also open to increasing the stakes of our bet.