Interesting idea – thanks for sharing, and it would be cool to see some further development.
I think there’s a crucial disanalogy between your proposal and the carbon emissions case, assuming your primary concern is x-risk. Pigouvian carbon taxes make sense because you have a huge number of emitters whose negative externality is each roughly proportional to the amount they emit: 1000 motorists collectively cause 1000 times the marginal harm, and thus collectively pay 1000 times as much, as 1 motorist. However, the first company to train GPT-n imposes a significant x-risk externality on the world by advancing the capabilities frontier, while each subsequent company that develops a similar or less powerful model imposes a somewhat lower externality. Once GPT-5 comes out, I don’t think charging $1bn (or whatever) to train models as powerful as, say, ChatGPT affects x-risk either way. I’d be interested to hear whether you have a significantly different perspective, or if I’ve misunderstood your proposal.
I’m wondering whether it makes more sense to base a tax on some kind of dynamic measure of the “state-of-the-art” – e.g. any new model with at least 30% of the parameter count of some SOTA model (currently, say, GPT-4) must pay a levy proportional to how far over the threshold the new model is (these details are purely illustrative).
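As a rough sketch of how such a SOTA-relative levy might be computed (all of the constants below – the reference parameter count, the 30% threshold, the per-parameter rate – are made up purely for illustration and not part of the proposal):

```python
# Illustrative sketch only: the reference size, threshold fraction and levy
# rate are hypothetical parameters chosen for the example, not real figures.
SOTA_PARAM_COUNT = 1e12      # assumed parameter count of the current frontier model
THRESHOLD_FRACTION = 0.3     # models above 30% of SOTA size pay the levy
LEVY_RATE_PER_PARAM = 1e-6   # dollars per parameter over the threshold (made up)

def training_levy(new_model_params: float) -> float:
    """Levy proportional to how far the new model exceeds the SOTA-relative threshold."""
    threshold = THRESHOLD_FRACTION * SOTA_PARAM_COUNT
    excess = max(0.0, new_model_params - threshold)
    return excess * LEVY_RATE_PER_PARAM

# Example: a 500B-parameter model against an assumed 1T-parameter SOTA
print(training_levy(5e11))  # -> 200000.0 (i.e. $200k at these illustrative rates)
```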
Moreover, especially if you have shorter timelines, the number of firms at any given time that have a realistic chance of winning the AGI race is probably less than five. Even if you widen this to “somehow meaningfully advance broad AI capabilities”, I don’t think it’s more than 20.
A Pigouvian tax is very appealing when you have billions of decentralised actors each pursuing their own self-interest with negative externalities – in most reasonable models you get a (much) more socially efficient outcome through carbon taxation than through direct regulation. For AGI, though, I honestly think it’s far more feasible and well-targeted to introduce international legislation along the lines of “if you want to train a model more powerful than [some measure of SOTA], you need formal permission from this international body” than to tax it – apart from the revenue argument, I don’t think you’ve made the case for why taxes are better.
That being said, as a redistributive proposal your tax makes a lot of sense (although a lot depends on whether economic impact scales roughly linearly with model size; my guess, again, is that one pioneering firm advancing capabilities a little has a more significant economic effect than 100 laggards building more GPT-3s, certainly in terms of how much profit its developers would expect, because of returns to scale).
Also, my whole argument relies on the intuition that the cost to society (in terms of x-risk) of a given model is primarily a function of its size relative to the state of the art (and hence propensity to advance capabilities), rather than absolute size, at least until AGI arrives. Maybe on further thought I’d change my mind on this.
Thank you for this feedback, and well put! I’ve been having somewhat similar thoughts in the back of my mind, and this clarifies many of them.
This is a good proposal to have out there, but it needs more discussion of its weaknesses. A couple of examples:
How would this be enforced? Global carbon taxes are a good analogue, and they have never gotten global traction. Because of the cooperation problem between countries, the hardware can simply move to an AWS server in a permissive country.
From a technical side, I can break a large model down into sub-components and then ensemble them together. It will be tough to write definitions that rule out these kinds of work-arounds without also affecting legitimate use cases.
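As a minimal sketch of the kind of work-around I have in mind (PyTorch, with made-up sizes and a hypothetical per-model parameter cap): several small sub-models are trained independently, each staying under the cap, and then ensembled at inference time.

```python
# Minimal illustrative sketch (assumed setup): each sub-model is small enough
# to stay under a hypothetical per-model threshold, but the ensemble's
# effective capacity is the sum of all of them.
import torch
import torch.nn as nn

class SmallModel(nn.Module):
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return self.net(x)

# Each sub-model would be trained separately (training loops omitted), so no
# single training run exceeds the cap a per-model tax or threshold targets.
sub_models = [SmallModel() for _ in range(4)]

def ensemble_forward(x):
    # Combine the independently trained parts by averaging their outputs.
    return torch.stack([m(x) for m in sub_models]).mean(dim=0)

out = ensemble_forward(torch.randn(8, 512))
print(out.shape)  # torch.Size([8, 512])
```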
Thank you for the examples! Could you elaborate on the technical example of breaking a large model down into sub-components, training each sub-component individually, and finally assembling them into a large model? Would such a method realistically be used to train AGI-level systems? I would think the model needs to be sufficiently large during training to learn highly complex functions. Do you have any resources you could share that indicate large models can be successfully trained this way?
This is a unique, interesting and simple proposal that I have not yet seen presented in academic form. As you develop the article, you’ll of course need to reframe a few sections to introduce the idea, its viability, and the multi-purpose potential of the proposal.
Even if effective enforcement of the policy is unlikely, it seems like a valuable idea to publish, especially combined with newer work on GPU monitoring firmware (Shavit, 2023) and your own proposals for required GPU server tracking.
To respond to the comment by kpurens: carbon taxation was a non-political issue before it became contentious, and if the lobbying hadn’t hit as hard, it seems there would have been a larger chance of a global carbon tax. At the same time, compute governance seems more enforceable because of the centralization of data centers.
Thanks for the feedback and for sharing Yonadav Shavit’s paper!