Hey David, thanks for this excellent comment.
Re: cooling skepticism, this has actually been helpful. On review I think the net rejection will be greater than 633 W/m2. You’re right that absorptivity = ε for longwave radiation from Earth (Kirchhoff would not be pleased).
On view factor, hmm, I think this will just vary over the orbital band. Computing the analytic per-face view factor as F = (1/π)[θE − ½sin(2θE)] with sin(θE) = R⊕/(R⊕+h), I get:
| Altitude | F per face | F total |
|---|---|---|
| 550 km | 0.258 | 0.515 |
| 1000 km | 0.194 | 0.388 |
| 2000 km | 0.118 | 0.236 |
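For anyone who wants to check the numbers, here is a minimal Python sketch of the per-face formula above. The Earth radius of 6371 km and the function name are my assumptions for illustration, and "F total" is taken to be twice the per-face value (both faces). Results should agree with the table to within rounding.

```python
import math

R_EARTH_KM = 6371.0  # assumed mean Earth radius, not stated in the comment


def view_factor_per_face(altitude_km: float) -> float:
    """Per-face view factor F = (1/pi) * [theta_E - 0.5*sin(2*theta_E)],
    with sin(theta_E) = R_earth / (R_earth + h)."""
    theta_e = math.asin(R_EARTH_KM / (R_EARTH_KM + altitude_km))
    return (theta_e - 0.5 * math.sin(2.0 * theta_e)) / math.pi


if __name__ == "__main__":
    # 675 km is included because it is the altitude discussed just below.
    for h in (550, 675, 1000, 2000):
        f = view_factor_per_face(h)
        print(f"{h:>5} km  per face {f:.3f}  total {2 * f:.3f}")
```

The view factor falls monotonically with altitude, which is why the low end of the band is the worst case for Earth IR loading.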
Maybe we should go worst case, since initial ODCs would also prefer to be on the low end of altitude for as long as slots are available (lower cost to orbit, less radiation). I think you do want to go high enough to avoid occasional shading in dawn-dusk SSO, so perhaps ~675 km and up. If we correct the absorptivity for Earth IR and take the low end view factor, I get your 522 W/m2. That looks like a ~1% increase in total cost for ODCs if you’re right.
While checking this, though, it occurs to me you should be able to have the radiators edge-on to the sun while still radiating from both sides, something like this:
Basically a Starlink v3 with panels at 90 degree pivots. Then with shading from direct sunlight I think I get 650 W/m2 for 675 km altitude, F = 0.473, so an improvement on radiator performance overall. I’ll have to think about this a bit more and potentially update the appendix. Certainly we can fix the erroneous use of α in the 3rd term.
Agree loss from averaging over radiator temp looks modest.
Also agree that scaling the plumbing to a massive modular station architecture looks rough. Also there are issues with stresses/strains due to maneuvers for larger orbital platforms and some structural scaling that has to happen to avoid e.g. floppiness. My guess is that the architecture isn’t viable at least near-term.
Re: inference, this is not my area of expertise and I don’t think the computer architecture is totally decided, but from looking into it I think you’d only be handing off the query and response. The routing should be the same as for traditional satellites, so this problem is already ~solved (though you may need to scale the satellite mesh as demand/traffic increases). The orbital compute is in a polar orbit, so typically not overhead. The trip in a kind of worst-case scenario might look something like this:
1. User → ground station (let’s say a 100 Gbps uplink from the ground station)
2. Ground station → uplink satellite(s) (whichever is in view at the moment; the stream can hand over seamlessly as multiple satellites pass overhead)
3. Intersatellite link (ISL) hops, routing toward wherever the GPUs are; let’s say there is as yet only one cluster in orbit and it takes 15 hops to reach it, each at ~100 Gbps (Starlink v2)
4. Arrive at cluster → run inference → return output (the KV cache, session memory etc. stay here, in the compute cluster)
5. ISL hops routing back to the ground station (say 15 hops at 100 Gbps again, each maybe 5.5 ms)
6. Downlink to ground station → serve output to user
For example I think for a 100 GB workload through a 100 Gbps optical ground station, total time is ~8 seconds serialization/transfer, essentially the same ~8 seconds as terrestrial on 100 Gbps direct connect, plus something like 175 ms of constellation overhead.
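As a rough check on that arithmetic, here is a small Python sketch using only the figures quoted above (100 GB payload, 100 Gbps links, 15 ISL hops each way at about 5.5 ms per hop). The function names are mine, and the model ignores queuing, protocol overhead and inference time. The ISL hops alone come to 165 ms round trip; my reading is that the remaining ~10 ms of the ~175 ms figure covers the ground-to-satellite legs, but that split is an assumption.

```python
def transfer_time_s(payload_gigabytes: float, link_gbps: float) -> float:
    """Serialization time: gigabytes -> gigabits, divided by link rate."""
    return payload_gigabytes * 8.0 / link_gbps


def isl_round_trip_ms(hops_each_way: int, ms_per_hop: float) -> float:
    """Latency added by intersatellite hops, out to the cluster and back."""
    return 2 * hops_each_way * ms_per_hop


if __name__ == "__main__":
    serialization_s = transfer_time_s(100, 100)  # 100 GB over 100 Gbps
    isl_ms = isl_round_trip_ms(15, 5.5)          # 15 hops each way
    print(f"serialization: {serialization_s:.1f} s")
    print(f"ISL round trip: {isl_ms:.0f} ms")
    print(f"ISL overhead relative to transfer: {isl_ms / 1000 / serialization_s:.1%}")
```

The point is that for large payloads the transfer time dominates, and the constellation hops add only a few percent on top.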
Re: terrestrial solar and battery. Good points: these make terrestrial microgrids look a good bit worse. For solar on Earth, in addition to the seasonality, we’re also assuming some of the best solar sites in the world, so our numbers should be fairly bullish for terrestrial data centers. We didn’t spot any fundamental blockers to scaling microgrids through some combo of solar overbuy + battery and gas, but prices may be at a premium either for turbines or for batteries, as you say. In some sense it seems like the data center buildout may have hyperscalers acting like water flowing downhill, pivoting into whichever buildout channel offers the least resistance at the moment. Similarly, if ODCs start going up en masse, there could be lower-lying supply chain issues that emerge. The most biting constraint of all is probably chips and memory.
Thanks for this, guys. Re: Murphy’s law, it’s true we can just have system-level failures (e.g. a cooling unit fails, connectors come unseated, a flywheel breaks and we lose pointing, etc.). Then worst case we lose an entire satellite this way. It’s difficult to say quantitatively what failure rates may be, so we bill this as a key uncertainty. ‘Space quals’ would mean every part would undergo extensive testing and many iterations for reliability (tests include e.g. vibe table, acoustic, pyroshock, vacuum testing and EMI/EMC), so a host of possible bugs would be worked out before the data centers get off the ground. Even so, certainly things do fail. One way of getting at this could be again via Starlink, which has over 10k active satellites. It looks like failure rates for Starlink V2 have been around 1.1% (3 out of 5 years into their lifecycle) based on this catalog by McDowell. It’s my understanding that SpaceX finds it more economical at their scale to actually skimp a bit on space quals and accept higher failure rates, since Starlink’s satellites are relatively cheap. This would flip the other way if you have valuable chips on board. On the other hand, there are more moving parts and more potential points of failure for ODCs than for Starlink sats, so it’s not clear to me which way this swings. You could say, for example, that perhaps 1% of all ODC satellites fail, or you could go higher or lower for the reasons I mention, and then tack that on as an X% all-cost increase. True failure rates are a significant uncertainty overall.
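To make the "tack it on" step concrete, here is one possible way to turn an assumed failure fraction into a cost uplift, sketched in Python. This is my own simplification rather than anything from the analysis: it treats a failed satellite as a total loss over its whole life, so the cost per surviving satellite scales as 1/(1 − f). The function name and the example fractions are hypothetical.

```python
def cost_uplift(failure_fraction: float) -> float:
    """Fractional increase in cost per working satellite if a fraction
    `failure_fraction` of the fleet is lost outright (assumed model)."""
    if not 0.0 <= failure_fraction < 1.0:
        raise ValueError("failure_fraction must be in [0, 1)")
    return 1.0 / (1.0 - failure_fraction) - 1.0


if __name__ == "__main__":
    # Illustrative failure fractions only; true rates are a key uncertainty.
    for f in (0.005, 0.01, 0.02, 0.05):
        print(f"failure fraction {f:.1%} -> cost uplift {cost_uplift(f):.2%}")
```

Under this model, small failure fractions map almost one-to-one onto the cost increase, so the uncertainty in the failure rate carries straight through to the cost estimate.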
Re: back calculation, the wattage and scale are consistent with SpaceX/Musk’s V3 specs; we just move them to the dawn-dusk-SSO orbit instead of the typical orbit where they get higher power draw.