The “go extinct” condition is a bit fuzzy. It seems like it would be better to express what you want to change your mind about as something more like (I forget the term for this) the ratio P(go extinct | AGI) / P(go extinct).
I know you’ve written the question in terms of going extinct because of AGI, but I worry this phrasing allows ways to shift that value upward that are relatively trivial and uninformative about AI.
For instance, consider a line of argument:
1. AGI is quite likely (probably by your own lights) to be developed by 2070.
2. If AGI is developed, either it will suffer from serious alignment problems (a reason to think we go extinct) or it will seem reliable and extremely capable, and so will quickly be placed into key roles controlling things like nukes, military responses, etc.
3. The world is a dangerous place, and there is a good possibility of a substantial nuclear exchange between countries before 2070 which would substantially curtail our future potential (e.g. by causing a civilizational collapse which, because we have used up much of the easily available fossil fuels, minerals, etc., we can’t recover from).
4. By 2, that exchange will, with high probability, have AGI serving as a key element in the causal pathway that leads to it. Even though the exchange may well have happened without AGI, it will be the case that the people who press the button relied on critical intel collected by AGI, or that AGI was placed directly in charge of some of the weapons systems involved in one of the escalating incidents, etc.
I think it might be wise to either
a) Shift to a condition stated in terms of the ratio between the chance of extinction and the chance of extinction conditional on AGI, so the focus is on the effect AGI has on the likelihood of extinction (a rough formalization follows below).
b) If not that, at least clarify the kind of causation required. Is it sufficient that the particular causal pathway that occurred include AGI somewhere in it? Can I play even more unfairly and simply point out that, by a butterfly-effect-style argument, the particular incident that leads to extinction is probably but-for caused by almost everything that happens before it (if not for some random AI thing years ago, the soldiers who provoked the initial confrontation would probably have behaved differently or been different people, and instead of that year and that incident it would have been one a year earlier or later)?
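To make option (a) concrete, here is a minimal sketch of the condition I have in mind; the specific conditioning event (AGI being developed by 2070) is my own guess at the intended reading, not anyone’s official wording:

$$\frac{P(\text{extinction by 2070} \mid \text{AGI is developed by 2070})}{P(\text{extinction by 2070})}$$

On this formulation an argument only shifts the number insofar as conditioning on AGI actually changes the probability of extinction, rather than AGI merely appearing somewhere in whatever causal chain happens to lead there.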
But hey, if you aren’t going to clarify away these issues, or say that you’ll evaluate submissions according to the spirit of the question rather than its technical formulation, I’m going to include in my submission (if I find I have the time for one) a whole bunch of arguments that are technically responsive but not really what you want, about how extinction from some cause is relatively likely and how AGI will appear in that causal chain in a way that makes it a cause of the outcome.
I mean, I hope you actually judge on something that ensures you’re really learning about the impact of AGI, but one has to pick up all the allowed percentage points one can ;-).
I agree with this, and the “drastic reduction in long term value” part is even worse. It is implicitly counterfactual (drastic reductions have to be relative to *something*), but what exactly the proposed counterfactual is remains extremely vague. I worry that this vagueness will lead people not to explore some answers to the question because they’re trying to self-impose a “sensible counterfactual” constraint which, due to that vagueness, won’t actually line up well with the kinds of counterfactuals the FTX Foundation is interested in exploring.