I think this post contains some major and some minor errors and is overall fairly "one-sided", and that the post will therefore tend to overall worsen & confuse (rather than improve & clarify) debates and readers' beliefs. Below I discuss what I see as some of the errors or markers of one-sidedness in this post. I then close with some other points, e.g. emphasising that I do think good critiques and red-teaming are valuable, noting positives of this post, and acknowledging that this comment probably feels kind-of rude :)
Here are some of the things I see as issues in this post. Some are in themselves important, and others are in themselves minor but seem to me like indications that the post is generally quite inclined to support a given conclusion rather than more neutrally surveying a topic and seeing what it lands on.* I've bolded key points to help people skim this.
As Zach mentioned, I think you at least somewhat overstate the extent to which Bostrom is recommending as opposed to analyzing these interventions.
Though I do think Bostrom probably could and should have been clearer about this, given that many people have gotten this impression from the paper.
You seem to argue (or at least give the vibe) that there's so little value in trying to steer technological development for the better that we should mostly not bother and instead just charge ahead as fast as possible. It seems to me that this conclusion is probably incorrect (though I do feel unsure), that the arguments you've presented for it are somewhat weak, and that you haven't adequately discussed arguments against it.
Your arguments for this conclusion include that it's hard to predict the potential benefits and harms of various technologies, that some dangerous and powerful techs like AI can also protect us from other things, and that actors who would steer technological development have motives other than just making the world better.
I think these are in fact all true and important points, and it's good for people to consider them.
But I think there are still many cases where we can be pretty confident that our best bet is that some tech will reduce risk (or will increase it), and that some way of steering tech will have net positive effects.
I don't mean that we can be confident that this will indeed happen this way, but that we can be confident that even after another 10,000 hours of thinking and research we'd still conclude these actions are net positive in expectation (or net negative, in the cases where that's our guess). And we should take action on that basis.
(I won't try to justify this here due to time constraints, and it would be fair to not be convinced. But hopefully readers can try to think of examples and realise for themselves that my stance seems right.)
And if we can't be confident of that right now, then it seems to me that we should try to (a) gain greater clarity on what tech steering would be good and greater ability to learn that or implement our learnings effectively, and (b) avoid actively accelerating tech dev in the meantime. (As opposed to treating our inability to usefully steer things as so unchangeable that we should just charge ahead and hope for the best.)
It seems odd to me to act as though we should be so close to agnostic about the net benefits or harms of all techs and so close to untrusting of any actors who could steer tech development that we should instead just race ahead as fast as we can in all directions.
In some cases, I think making simple models, Fermi estimates, or forecasts could help make "each side"'s claims more clear and help us figure out which should get more weight. An example of what this could look like is here: https://blog.givewell.org/2015/09/30/differential-technological-development-some-early-thinking/ (This actually overall highlights the plausibility of the "maybe accelerating AI is good" stance. And I agree that that's plausible. I'm not saying this post supports my conclusion, just that it seems like an example of a productive way to advance this discussion.)
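To gesture at what I mean, here is a purely illustrative sketch of my own (every number below is a made-up placeholder, not something from the GiveWell post or anywhere else). Even a very crude model like this forces the disagreement into a few explicit parameters that each side can then argue about:

```python
# Toy Fermi-style comparison of "rush ahead on a risky tech" vs. "delay it".
# All numbers are made-up placeholders for illustration only; the point is
# the structure of the estimate, not the outputs.

# Guessed annual probability of an existential catastrophe from other sources
# (e.g. engineered pandemics) while we wait during the delay period.
p_other_risk_per_year = 0.002

# Guessed probability that the risky transition itself goes badly, under two
# scenarios for how much preparation time safety work gets.
p_catastrophe_if_rushed = 0.20
p_catastrophe_if_delayed = 0.10

# Years of delay being considered.
delay_years = 10

# Cumulative extra exposure to other risks during the delay: 1 - (1 - p)^n.
extra_other_risk = 1 - (1 - p_other_risk_per_year) ** delay_years

total_risk_rushed = p_catastrophe_if_rushed
total_risk_delayed = p_catastrophe_if_delayed + extra_other_risk

print(f"Rushed scenario, total risk:  {total_risk_rushed:.3f}")
print(f"Delayed scenario, total risk: {total_risk_delayed:.3f}")
print("Delay looks better" if total_risk_delayed < total_risk_rushed
      else "Acceleration looks better")
```

With different guessed inputs the conclusion flips, which is exactly why surfacing the inputs seems more productive than trading vibes.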
I haven't re-read your post closely to check what your precise claims are and if you somewhere provide appropriate caveats. But I at least think that the impression people would walk away with is something like "we should just race ahead as fast as possible".
A core premise/argument in your post appears to be that pulling a black ball and an antidote (i.e., discovering a very dangerous technology and a technology that can protect us from it) at the same time means we're safe. This seems false, and I think that substantially undermines the case for trying to rush forward and grab balls from the urn as fast as possible.
I think the key reason this is false is that "discovering" a technology or "pulling a ball from the urn" does not mean it has reached maturity and been deployed globally. So even if we've discovered both the dangerous and protective technology, it's still possible for the dangerous technology to be deployed in a sufficiently bad way before the protective technology has been deployed in a sufficiently good way.
I think there are also reasons why that might be likely, e.g. in some ways it seems easier to destroy than to create, and some dangerous technologies would just need to be deployed once somewhere whereas some protective technologies would need to be deployed continuously and everywhere. (That might be the same point stated in two separate ways; not sure.)
OTOH, there are also reasons why that might be unlikely, e.g. far more people want to avoid existential catastrophe than to enact it.
Overall I'm not sure which is more likely, but it definitely seems at least plausible that we could end up with disaster if we discover both a very dangerous tech and a paired protective tech at the same time.
I'll illustrate with one of your own examples: "[Increasing our technological ability] slowly, one ball at a time, just means less chance at pulling antidote technologies in time to disable black ball risks. For example, terraforming technology which allows small groups of humans to make changes to a planet's atmosphere and geography may increase existential risk until space-settling technology puts people on many planets. If terraforming technology typically precedes space-settling then accelerating the pace of progress reduces risk." But I think if we develop such terraforming technology and such space-settling technology at the same time, or even develop space-settling technology somewhat earlier, that does not guarantee we will in fact have built self-sustaining settlements in many places before an individual uses the terraforming technology in a bad way.
It's still totally possible for us to all die due to the terraforming technology before those self-sustaining settlements are set up.
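To make the timing point concrete, here is a minimal toy model of my own (entirely made-up parameters, not something from your post or from Bostrom): both technologies are "pulled from the urn" at the same moment, but each still needs time to be deployed, and disaster occurs whenever destructive deployment wins the race.

```python
import random

# Toy Monte Carlo: dangerous tech and protective tech are both discovered at
# t = 0, but each still needs time to be deployed. All distributions and
# parameters below are made up for illustration.

random.seed(0)

def years_to_misuse():
    # Time until someone deploys the dangerous tech destructively:
    # one actor succeeding once is enough.
    return random.expovariate(1 / 8)   # mean ~8 years (assumed)

def years_to_protection():
    # Time until the protective tech is deployed widely enough to matter:
    # global rollout is assumed to be slower.
    return random.expovariate(1 / 15)  # mean ~15 years (assumed)

trials = 100_000
catastrophes = sum(years_to_misuse() < years_to_protection()
                   for _ in range(trials))

print(f"Share of runs where misuse beats protection: {catastrophes / trials:.2%}")
```

Simultaneous discovery only guarantees safety in this setup if protective deployment is reliably faster than destructive deployment, which is precisely the thing in dispute.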
Another way to illustrate this: You write "If we discovered all possible technologies at once (which in Bostrom's wide definition of technology in the VWH paper includes ideas about coordination and insight), we would be in the safe region." I encourage readers to genuinely try to imagine that literally tomorrow literally the ~8 billion people who exist collectively discover literally all possible technologies at once, and then consider whether they're confident humanity will exist and be on track to thrive in 2023. Do you (the reader) feel confident that everything will go well in that world where all possible techs and insights are dumped on us at once?
I also don't agree, and don't think Bostrom would claim, that technological maturity means having discovered all possible technologies, or that we would necessarily be safe if we'd discovered & deployed all possible technologies (even if we survive the initial transition to that world).
Bostrom writes "By 'technological maturity' we mean the attainment of capabilities affording a level of economic productivity and control over nature close to the maximum that could feasibly be achieved (in the fullness of time) (Bostrom, 2013)." That phrasing is a bit vague, but I think that attaining that level of capabilities doesn't mean that we've actually got all possible technologies or that every given individual has the maximum possible capabilities.
It seems plausible/likely that some technologies are sufficiently dangerous that we'll only be safe in a world where they are prevented from ever being discovered or ever being deployed, i.e., that no protective measure would be adequate except prevention.
IIRC, Bostrom's discussion of "Type-0 vulnerabilities" is relevant here.
I think the following bolded claim is false, and I think it's very weird to make this empirical claim without providing any actual evidence for it: "AI safety researchers argue over the feasibility of 'boxing' AIs in virtual environments, or restricting them to act as oracles only, but they all agree that training an AI with access to 80+% of all human sense-data and connecting it with the infrastructure to call out armed soldiers to kill or imprison anyone perceived as dangerous would be a disaster."
I am 100% confident that not all AI safety researchers have even considered that question, let alone formed the stance you suggest they all agree on.
Perhaps you meant they "would all agree"? Still though, it would seem odd to be confident of that without providing any justification.
And I think in fact many would disagree if asked. I expect that many of them believe the future, if it goes well, would technically or basically involve this happening: we have a properly aligned superintelligent AI that either already has access to those things or could gain access to them if it simply chose to do so.
I think "If it is to fulfill its mission of preventing anthropogenic risk long into the future, the global surveillance state cannot afford to risk usurpation" and related claims are basically false or misleading.
It appears to me that we're fairly likely to be in, or soon be in, a "time of perils", where existential risk is unusually high. There are various reasons to expect this to subside in future besides a global surveillance state. So it seems pretty plausible that it would be best to temporarily have unusually strong/pervasive surveillance, enforcement, etc. for particular types of activities.
And if we've set this actor up properly, then it should be focused on what's net positive overall and should not conflate "ensuring this actor has an extremely high chance of maintaining power helps reduce some risks" with "ensuring this actor has an extremely high chance of maintaining power is overall net beneficial".
To be clear, I'm not saying that we should do things like this or that it'd work if we tried; I'm just saying that thinking that increased surveillance, enforcement, moves towards global governance, etc. would be good doesn't require thinking that permanent extreme levels (centralised in a single state-like entity) would be good.
The following seems like a misrepresentation of Bostrom, and one which is in line with what I perceive as a general one-sidedness or uncharitability in this post: "Bostrom continues to assume that the power to take a socially beneficial action is sufficient to guarantee that the state will actually do it. 'States have frequently failed to solve easier collective action problems ... With effective global governance, however, the solution becomes trivial: simply prohibit all states from wielding the black-ball technology destructively.'"
That quote does not state that the power to take a socially beneficial action is sufficient to guarantee that a state will actually take it. A solution can be trivial but not taken.
Also, the "effective" in "effective global governance" might be adding something beyond "power", along the lines of "this governance is pointed in the right direction"?
I haven't read the VWH paper in a while, so maybe he does make this claim elsewhere, or maybe he repeatedly implies it without stating it. But that quote does not seem to demonstrate this.
Some other things I want to make sure I say (not issues with the post):
To be clear, I do think it's valuable to critically discuss & red-team the VWH paper in particular and also other ideas and writings that are prominent within longtermism. And I personally wish Bostrom had written the VWH paper somewhat differently, and I don't feel confident that the interventions it discusses are net positive. So this comment is not meant to discourage other critical discussions or to strongly defend the interventions discussed in VWH.
But I do think it's important to counter mistaken and misleading posts in general, even if the posts are good-faith and are attempting to play a valuable role of criticizing prominent ideas.
I wrote this comment pretty quickly, so I don't fully justify things and my tone is sometimes a bit sharp or uncharitable; apologies in advance for that.
(I expect that if the original poster and I instead had a call we would get on the same page faster and feel more positively toward each other, and that I would come across as a bit less rude than this comment might.)
I do think there are some good elements of this post (e.g., the writing is generally clear, you include a decent summary at the start, you keep things organized nicely with headings, and some of your points seem true and important). I focus on the negatives because they seem more important and because of time constraints.
As a heads up, I'm unlikely to reply to replies to this, since I'm trying to focus on my main work atm.
*To be clear, I'm a fan of red-teaming, which is not neutral surveying but rather deliberately critical. But that should then be framed explicitly as red-teaming.
Thank you for reading and for your detailed comment. In general I would agree that my post is not a neutral survey of the VWH but a critical response, and I think I made that clear in the introduction even if I did not call it red-teaming explicitly.
I'd like to respond to some of the points you make.
"As Zach mentioned, I think you at least somewhat overstate the extent to which Bostrom is recommending as opposed to analyzing these interventions."
I think this is overall unclear in Bostrom's paper, but he does have a section called Policy Implications right at the top of the paper where he says "In order for civilization to have a general capacity to deal with 'black ball' inventions of this type, it would need a system of ubiquitous real-time worldwide surveillance. In some scenarios, such a system would need to be in place before the technology is invented." I think it is confusing because he starts out analyzing the urn of technology, then, conditioned on there being black balls in the urn, he recommends ubiquitous real-time worldwide surveillance, and then the "high-tech panopticon" example is just one possible incarnation of that surveillance that he is analyzing. I think it is hard to deny that he is recommending the panopticon if existential risk prevention is the only value we're measuring. He doesn't claim all-things-considered support, but my response isn't about other considerations of a panopticon. I don't think a panopticon is any good even if existential risk is all we care about.
"You seem to argue (or at least give the vibe) that there's so little value in trying to steer technological development for the better that we should mostly not bother and instead just charge ahead as fast as possible."
I think this is true as far as it goes, but you miss what is in my opinion the more important second part of the argument. Predicting the benefits of future tech is very difficult, but even if we knew all of that, getting the government to actually steer in the right direction is harder. For example, economists have known for centuries that domestic farming subsidies are inefficient. They are wasteful and they produce big negative externalities. But almost every country on Earth has big domestic farming subsidies because they benefit a small, politically active group in most countries. I admit that we have some foreknowledge of which technologies look dangerous and which do not. That is far from sufficient for using the government to decrease risk.
The point of Enlightenment Values is not that no one should think about the risks of technology and we should all charge blindly forward. Rather, it is that decisions about how best to steer technology for the better can and should be made on the individual level where they are more voluntary, constrained by competition, and mistakes are hedged by lots of other people making different decisions.
"A core premise/argument in your post appears to be that pulling a black ball and an antidote (i.e., discovering a very dangerous technology and a technology that can protect us from it) at the same time means we're safe. This seems false, and I think that substantially undermines the case for trying to rush forward and grab balls from the urn as fast as possible."
There are paired technologies, like engineered viruses and vaccines, but how they interact depends much more on their relative costs. An antidote to $5-per-infection viruses might need to be $1-per-dose vaccines or $0.50-per-mask PPE. If you just define an antidote to be "a technology which is powerful and cheap enough to counter the black ball should they be pulled simultaneously" then the premise stands.
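As a purely illustrative back-of-the-envelope sketch (all numbers here, including the "seed infections" figure, are made up and just extend the hypothetical prices above), the relevant asymmetry is that the attacker's cost scales with a handful of uses while the defender's cost scales with the whole population:

```python
# Illustrative arithmetic with made-up numbers: whether a "paired" defensive
# technology actually neutralises an offensive one depends on cost and scale,
# not just on whether both have been discovered.

population = 8_000_000_000

attack_cost_per_infection = 5.0   # hypothetical engineered-virus cost
vaccine_cost_per_dose = 1.0       # hypothetical vaccine cost
mask_cost_per_person = 0.50       # hypothetical PPE cost

# The attacker only needs to seed a small number of initial infections...
seed_infections = 1_000
attacker_cost = seed_infections * attack_cost_per_infection

# ...while the defence has to cover (nearly) everyone, possibly repeatedly.
vaccine_rollout_cost = population * vaccine_cost_per_dose
ppe_cost_per_wave = population * mask_cost_per_person

print(f"Attacker outlay: ${attacker_cost:,.0f}")
print(f"Vaccine rollout: ${vaccine_rollout_cost:,.0f}")
print(f"PPE, per wave:   ${ppe_cost_per_wave:,.0f}")
```

So "antidote" is doing a lot of work in the premise: it has to mean something cheap and deployable enough to close that gap, which is how I intended it.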
"Do you (the reader) feel confident that everything will go well in that world where all possible techs and insights are dumped on us at once?"
Until meta-understanding of technology greatly improves this is ultimately a matter of opinion. If you think there exists some technology that is incompatible with civilization in all contexts then I can't really prove you wrong, but it doesn't seem right to me.
Type-0 vulnerabilities were "surprising strangelets": not techs that are incompatible with civilization in all contexts, but risks that come from unexpected phenomena, like the Large Hadron Collider opening a black hole or something like that.
"I think the following bolded claim is false, and I think it's very weird to make this empirical claim without providing any actual evidence for it: 'AI safety researchers argue over the feasibility of "boxing" AIs in virtual environments, or restricting them to act as oracles only, but they all agree that training an AI with access to 80+% of all human sense-data and connecting it with the infrastructure to call out armed soldiers to kill or imprison anyone perceived as dangerous would be a disaster.'"
You're right that I didn't survey AI safety researchers on this question. The near-tautological nature of "properly aligned superintelligence" guarantees that if we had it, everything would go well. So yeah, probably lots of AI researchers would agree that a properly aligned superintelligence would use surveillance to improve the world. This is a pretty empty statement imo. The question is about what we should do next. This hypothetical aligned intelligence tells us nothing about what increasing state AI surveillance capacity does on the margin. Note that Bostrom is not recommending that an aligned superintelligent being do the surveillance. His recommendations are about increasing global governance and surveillance on the margin. The AI he mentions is just a machine learning classifier that can help a human government blur out the private parts of the footage the cameras collect.
"I'm just saying that thinking that increased surveillance, enforcement, moves towards global governance, etc. would be good doesn't require thinking that permanent extreme levels (centralised in a single state-like entity) would be good."
This is only true if you have a reliable way of taking back increased surveillance, enforcement, and moves towards global governance. The alignment and instrumental convergence problems I outlined in those sections give strong reasons why these capabilities are extremely difficult to take back. Bostrom scarcely mentions the issue of getting governments to enact his risk-reducing policies once they have the power to enforce them, let alone provides a mechanism design which would judiciously use its power to guide us through the time of perils and then reliably step down. Without such a plan, the issues of power-seeking and misalignment are not ones you can ignore.