I did this analysis a while back, but it’s worth doing again; let’s see what happens:
If you are spending 1e25 FLOP per simulated second simulating the neurons of the creatures, you can afford to spend 4e24 FLOP per simulated second simulating the environment, and at the OOM level it will just be a rounding error on your overall calculation, so it won’t change the bottom line. So the question is: can we make a sufficiently detailed environment for 4e24 FLOP per second?
There are 5e14 square meters on the surface of the Earth, according to Wolfram Alpha.
So that’s about 1e10 FLOP per second per square meter available. So, you could divide the world into 10x10 meter squares and then have a 1e12 FLOP/s computer assigned to each square to handle the physics and graphics. If I’m reading this chart right, that’s about what a fancy high-end graphics card can do (it depends on whether you want double or single precision, I think). That feels like probably enough to me; certainly you could have a very detailed physics simulation at least. Remember also that you can e.g. use a planet 1 OOM smaller than Earth but with more habitable regions, and also dynamically allocate compute so that you have more of it where your creatures are and don’t waste as much simulating empty areas. Also, if you think this is still maybe not enough: IIRC Ajeya has error bars of like +/- six orders of magnitude on her estimate, so you can just add 3 OOMs no problem without really changing the bottom line that much.
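Here’s that arithmetic as a quick Python sketch, using only the estimates above (variable names are mine):

```python
# Back-of-the-envelope in Python. Every constant below is one of the
# estimates from the text, not a measured value.
NEURON_BUDGET = 1e25     # FLOP per simulated second for the creatures' neurons
ENV_BUDGET = 4e24        # FLOP per simulated second allowed for the environment
EARTH_SURFACE_M2 = 5e14  # Earth's surface area in square meters (Wolfram Alpha)

env_overhead = ENV_BUDGET / NEURON_BUDGET    # 0.4: sub-OOM, hence "rounding error"
flop_per_m2 = ENV_BUDGET / EARTH_SURFACE_M2  # 8e9, i.e. ~1e10 FLOP/s per m^2
flop_per_tile = flop_per_m2 * 10 * 10        # 8e11, i.e. ~1e12 FLOP/s per 10x10 m tile

print(f"{flop_per_m2:.0e} FLOP/s per m^2, {flop_per_tile:.0e} FLOP/s per tile")
# -> 8e+09 FLOP/s per m^2, 8e+11 FLOP/s per tile
```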
It would be harder if you wanted to assign a graphics card to each nematode worm instead of to each chunk of territory. There are a lot of nematode worms and similar tiny creatures: Ajeya says 1e21 alive at any given point in time. So that would only leave you with about 10^4 FLOP per second per worm (4e24 / 1e21 = 4e3, call it 10^4) to do the physics and graphics! If you instead wanted a proper graphics card for each worm you’d probably have to add 7 OOMs to that, getting you up to a ~100 GFLOP/s card. This would be a bit higher than Ajeya estimated: +25 OOMs more than GPT-3 cost instead of +18.
Personally I don’t think the worms matter that much, so I think the true answer is more likely to be along the lines of “a graphics card per human-sized creature.” That would be something like 10 billion graphics cards, leaving about 5e14 FLOP per second per card, which would let you create near-photorealistic real-time graphics for each human-sized creature.
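And the same sketch for the per-creature split; the 1e10 human-sized-creature count is my round number, not Ajeya’s:

```python
# Per-creature allocations, from the same 4e24 FLOP/s environment budget.
ENV_BUDGET = 4e24      # FLOP per simulated second for the environment
N_TINY = 1e21          # nematode-sized creatures alive at once (Ajeya's figure)
N_HUMAN_SIZED = 1e10   # ~10 billion human-sized creatures (my round number)

flop_per_worm = ENV_BUDGET / N_TINY          # 4e3 FLOP/s, i.e. ~1e4 per worm
flop_per_human = ENV_BUDGET / N_HUMAN_SIZED  # 4e14 FLOP/s, i.e. ~5e14 per creature

# A proper ~100 GFLOP/s card (1e11 FLOP/s) per worm needs ~1e11 / 1e4 = 1e7
# times the budget: +7 OOMs, turning "+18 OOMs over GPT-3" into "+25".
print(f"{flop_per_worm:.0e} FLOP/s per worm, {flop_per_human:.0e} per human-sized creature")
```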
Then there are also all the various ways we could optimize the evolutionary simulation, e.g. as described here. I wouldn’t be surprised if that shaved off 6 OOMs of cost.
Note that this analysis depends wildly on how progress on “environment simulation efficiency” compares to progress on “algorithmic efficiency.” If you think the former will be slower, then the analysis above doesn’t work.
If I understand you correctly, you are saying that the Evolution Anchor might not decrease in cost with time as fast as the various neural-net anchors? Seems plausible to me; could also be faster, idk. I don’t think this point undermines Ajeya’s report though, because (a) we are never going to get to the evolution anchor anyway, or anywhere close, so how fast its cost falls isn’t really relevant except in very-long-timelines scenarios, and (b) her spreadsheet splits up algorithmic progress into different buckets for each anchor, so the spreadsheet already handles this nuance.
Meta: I feel like the conversation here and with Nuno’s reply looks kinda like:
Nuno: People who want to use the evolutionary anchor as an upper bound on timelines should consider that it might be an underestimate, because the environment might be computationally costly.
You: It’s not an underestimate: here’s a plausible strategy by which you can simulate the environment.
Nuno / me: That strategy does not seem like it clearly supports the upper bound on timelines, for X, Y and Z reasons.
You: The evolution anchor doesn’t matter anyway and barely affects timelines.
This seems bad:
If you’re going to engage with a subpoint that OP made that was meant to apply in some context (namely, getting an upper bound on timelines), stick within that context (or at least signpost that you’re no longer engaging with the OP).
I don’t really understand why you bothered to do the analysis if you’re not changing the analysis based on critiques that you agree are correct. (If you disagree with the critique then say that instead.)
If I understand you correctly, you are saying that the Evolution Anchor might not decrease in cost with time as fast as the various neural net anchors?
Yes, and in particular, the mechanism is that environment simulation cost might not decrease as fast as machine learning algorithmic efficiency. (Like, the numbers for algorithmic efficiency are anchored on estimates like AI and Efficiency; those estimates seem pretty unlikely to generalize to “environment simulation cost”.)
her spreadsheet splits up algorithmic progress into different buckets for each anchor, so the spreadsheet already handles this nuance.
Just because someone could change the numbers to get a different output doesn’t mean that the original numbers weren’t flawed, or that there’s no value in pointing that out?
E.g. suppose I had the following timelines model:
Input: N, the number of years till AGI.
Output: Timeline is 2022 + N.
I publish a report estimating N = 1000, so that my timeline is 3022. If you then come and give a critique saying “actually N should be 10 for a timeline of 2032”, presumably I shouldn’t say “oh, my spreadsheet already allows you to choose your own value of N, so it handles that nuance”.
To be clear, my own view is also that the evolution anchor doesn’t matter, and I put very little weight on it and the considerations in this post barely affect my timelines.
Thanks Rohin, I really appreciate this comment.
Did I come across as unfriendly and hostile? I am sorry if so, that was not my intent.
It seems like you think I was strongly disagreeing with your claims; I wasn’t. I upvoted your response and said basically “Seems plausible idk. Could go either way.”
And then I said that it doesn’t really impact the bottom line much, for reasons XYZ. And you agree.
But now it seems like we are opposed somehow even though we seem to basically be on the same page.
For context: I think I didn’t realize until now that some people actually took the evolution anchor seriously as an argument for AGI by 2100, not in the sense I endorse (as a loose upper bound on our probability distribution over OOMs of compute) but in the much stronger sense I don’t endorse (as an actual place to clump lots of probability mass around, and to naively extrapolate Moore’s law towards across many decades). I think insofar as people are doing that naive thing I don’t endorse, they should totally stop. And yes, as Nuno has pointed out, insofar as they are doing that naive thing, they should really pay more attention to the environment cost as well as the brain-simulation cost, because it could maaaybe add a few OOMs to the estimate, which would push the extrapolated date of AGI back by decades or even centuries.
Did I come across as unfriendly and hostile? I am sorry if so, that was not my intent.
No, that’s not what I meant. I’m saying that the conversational moves you’re making are not ones that promote collaborative truth-seeking.
Any claim of actual importance usually has a giant tree of arguments that back it up. Any two people are going to disagree on many different nodes within this tree (just because there are so many nodes). In addition, it takes a fair amount of effort just to understand and get to the same page on any one given node.
So, if you want to do collaborative truth-seeking, you need to have the ability to look at one node of the tree in isolation, while setting aside the rest of the nodes.
In general when someone is talking about some particular node (like “evolution anchor for AGI timelines”), I think you have two moves available:
1. Say “I think the actually relevant node to our disagreement is <other node>.”
2. Engage with the details of that particular node, while trying to “take on” the views of the other person for the other nodes.
(As a recent example, the ACX post on underpopulation does move 2 for Sections 1-8 and move 1 for Section 9.)
In particular, the thing not to do is to talk about the particular node, then jump around into other nodes where you have other disagreements, because that’s a way to multiply the number of disagreements you have and fail to make any progress on collaborative truth-seeking. Navigating disagreements is hard enough that you really want to keep them as local / limited as possible.
(And if you do that, then other people will learn that they aren’t going to learn much from you because the disagreements keep growing rather than progress being made, and so they stop trying to do collaborative truth-seeking with you.)
Of course sometimes you start doing move (2) and then realize that actually you think your partner is correct in their assessment given their views on the other nodes, and so you need to switch to move (1). I think in that situation you should acknowledge that you agree with their assessment given their other views, and then say that you still disagree on the top-level claim because of <other node>.
Thanks for this thoughtful explanation & model.
(Aside: So, did I or didn’t I come across as unfriendly/hostile? I never suggested that you said that, only that maybe it was true. This matters because I genuinely worry that I did & am thinking about being more cautious in the future as a result.)
So, given that I wanted to do both 1 and 2, would you think it would have been fine if I had just made them as separate comments, instead of mentioning 1 in passing in the thread on 2? Or do you think I really should have picked one to do and not done both?
The thing about changing my mind also resonates: that definitely happened to some extent during this conversation, because (as mentioned above) I didn’t realize Nuno was talking about people who put lots of probability mass on the evolution anchor. For those people, a shift up or down by a couple OOMs really matters, and so the BOTEC I did about how the environment can probably be simulated for less than 10^41 FLOP needs to be held to a higher standard of scrutiny & could end up being judged insufficient.
So, did I or didn’t I come across as unfriendly/hostile?
You didn’t to me, but also (a) I know you in person and (b) I’m generally pretty happy to be in forceful arguments and don’t interpret them as unfriendly / hostile, while other people plausibly would (see also combat culture). So really I think I’m the wrong person to ask.
So, given that I wanted to do both 1 and 2, would you think it would have been fine if I had just made them as separate comments, instead of mentioning 1 in passing in the thread on 2? Or do you think I really should have picked one to do and not done both?
I think you can do both, if it’s clear that you’re doing these as two separate things. (Which could be by having two different comments, or by signposting clearly in a single comment.)
In this particular situation I’m objecting to starting with (2), then switching to (1) after a critique without acknowledging that you had updated on (2) and so were going to (1) instead. When I see that behavior from a random Internet commenter I’m like “ah, you are one of the people who rationalizes reasons for beliefs, and so your beliefs do not respond to evidence, I will stop talking with you now”. You want to distinguish yourself from the random Internet commenter.
(And if you hadn’t updated on (2), then my objection would have been “you are bad at collaborative truth-seeking, you started to engage on one node and then you jumped to a totally different node before you had converged on that one node, you’ll never make progress this way”.)
OK. I’ll DM Nuno.
Something about your characterization of what happened continues to feel unfair & inaccurate to me, but there’s definitely truth in it & I think your advice is good so I will stop arguing & accept the criticism & try to remember it going forward. :)
Hey, thanks for sharing these. They seem like a good starting point. But I don’t know whether to take them literally.
On a quick read, things I may not buy:
So that’s about 1e10 FLOP per second per square meter available. So, you could divide the world into 10x10 meter squares and then have a 1e12 FLOP computer assigned to each square to handle the physics and graphics
Not sure if I buy this decomposition. For instance, handling things that move from one 10x10 m region to another (i.e., simulating the boundaries) seems like it would be annoying. But you could have the world as a series of 10x10 rooms?
dynamically allocate compute so that you have more of it where your creatures are and don’t waste as much simulating empty areas
I buy this, but I’m worried about keeping the world consistent. There is also a memory tradeoff here.
Mmh, maybe I’m not so worried about FLOPs per se but about parallelizability/wall-clock time.
Well, totally, this thing would take a fuckton of wall-clock time etc., but that’s not a problem; this is just a thought experiment: “If we did this bigass computation, would it work?” If the answer is “Yep, 90% likely to work,” then our distribution over OOMs should have 90% of its mass by +18.
Mmh, then OOMs of compute stop being predictive of timelines for this anchor, because we can’t just think about how much compute we have; we also have to think about whether we can use it for this.
Sorta? Like, yeah, suppose you have 10% of your probability mass on the evolution anchor. That means that like maaaaybe in 2090 or so we’ll have enough compute to recapitulate evolution, and so maaaaybe you could say you have 10% credence that we’ll actually build AGI in 2090 using the recapitulate-evolution method. But that assumes basically no algorithmic progress on other paths to AGI. Anyhow, if you were doing that, then yes, it would be a good counterargument that even if we had all the compute in 2090, we wouldn’t have the clock time: latency etc. would make the computation take dozens of years at least to perform. So then (that component of) your timelines would shift out even farther.
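To put toy numbers on this, here’s a sketch where both constants are pure assumptions of mine (not figures from the report); the point is just that a hard cap on simulated seconds per wall-clock second turns abundant FLOP into a century of waiting:

```python
# Toy numbers for the latency concern. Both constants are assumptions made up
# for this sketch, not estimates from Ajeya's report.
SECONDS_PER_YEAR = 3.15e7
SIMULATED_YEARS = 1e9  # assume ~1 billion years of evolution to recapitulate
MAX_SPEEDUP = 1e7      # assume serial latency caps us at 1e7 simulated seconds
                       # per wall-clock second, however much hardware we add

wallclock_seconds = SIMULATED_YEARS * SECONDS_PER_YEAR / MAX_SPEEDUP
wallclock_years = wallclock_seconds / SECONDS_PER_YEAR  # = 1e9 / 1e7 = 100 years
print(wallclock_years)  # 100.0 -- a century even with a 1e7x speedup
```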
I think this matters approximately zero, because it is a negligible component of people’s timelines and it’s far away anyway so making it move even farther away isn’t decision-relevant.
Well, I agree that this is pretty in the weeds, but personally this has made me view the evolutionary anchor as less forceful.
Like, the argument isn’t “ha, we’re not going to be able to simulate evolution, checkmate AGI doomers.” It’s “the evolutionary anchor was a particularly forceful argument for giving a substantial probability to x-risk this century, even to people who might otherwise be very skeptical. The fact that it doesn’t go through produces a variety of small updates, e.g., it marginally increases the value of non-x-risk longtermism.”
Huh, I guess I didn’t realize how much weight some people put on the evolution anchor. I thought everyone was (like me) treating it as a loose upper bound basically, not something to actually clump lots of probability mass on.
In other words: the people I know who were using the evolutionary anchor (people like myself, Ajeya, etc.) weren’t using it in a way that would be significantly undermined by having to push the anchor up 6 OOMs or so. Like I said, it would be a minor change to the bottom line according to the spreadsheet. Insofar as people were arguing for AGI this century in a way that can be undermined by adding 6 OOMs to the evolutionary anchor, those people are silly & should stop, for multiple reasons, one of which is that maaaybe environmental simulation costs mean the evolution anchor really is 6 OOMs bigger than Ajeya estimated.