This is likely based on an FP32 number representation rather than FP16, which is now more common for large language models, including BLOOM.
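To make the precision difference concrete, here is a rough weight-only memory estimate. It assumes BLOOM's published size of about 176 billion parameters and ignores activations, KV cache and framework overhead. FP32 stores each parameter in 4 bytes and FP16 in 2, so the choice of precision alone halves or doubles the footprint. The helper name `weight_memory_gb` is purely illustrative.

```python
# Rough weight-memory estimate for BLOOM at two precisions.
# Assumptions: ~176 billion parameters (BLOOM's published size); weights only,
# ignoring activations, KV cache and framework overhead.
BLOOM_PARAMS = 176e9

# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Return weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("fp32", "fp16"):
    print(f"{precision}: {weight_memory_gb(BLOOM_PARAMS, precision):.0f} GB")
# fp32: 704 GB
# fp16: 352 GB
```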
BLOOM and Galactica-120B already support INT8. GLM-130B supports INT4, and the developers of LLM.int8() are working on INT4.
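A sketch of why INT8 and INT4 support matters in practice: the hypothetical `min_gpus` helper below counts how many GPUs would be needed just to hold the weights, assuming 80 GB of usable memory per GPU (an A100 80GB-class card) and nominal parameter counts. Real deployments need extra headroom for activations and the KV cache, so treat these numbers as lower bounds rather than a deployment recipe.

```python
import math

# Hypothetical helper: minimum number of GPUs needed just to hold the weights.
# Assumptions: 80 GB usable memory per GPU; weights only (no activations,
# KV cache or framework overhead); parameter counts are nominal model sizes.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def min_gpus(n_params: float, precision: str, gpu_mem_gb: float = 80.0) -> int:
    weight_gb = n_params * BYTES_PER_PARAM[precision] / 1e9
    # Round up: a partially filled GPU still counts as a whole GPU.
    return math.ceil(weight_gb / gpu_mem_gb)


models = {"BLOOM": 176e9, "GLM-130B": 130e9}
for name, n_params in models.items():
    counts = {p: min_gpus(n_params, p) for p in BYTES_PER_PARAM}
    print(name, counts)
# BLOOM {'fp16': 5, 'int8': 3, 'int4': 2}
# GLM-130B {'fp16': 4, 'int8': 2, 'int4': 1}
```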
However, I am 80% confident that before July 2022, no other GPT-3-like models had their trained weights widely available for download.
While this is true, GLM-130B was released for download in August 2022.
I estimate there are 5,000 people (90% CI: 100 to 45,000) who are capable of running BLOOM independently.[2]
I have run Galactica-120B via Hugging Face before; it took me only about 5 hours and $10. Since BLOOM is also a Hugging Face model, and those almost always run easily, this seems like a serious underestimate. The number of people who feel capable of running such a model and are interested in doing so, however, is much smaller.
Most CS graduates could in principle afford the financial cost of $240 to run BLOOM for one day, but running BLOOM for a year (say) would then cost ~$90K, which would be affordable for perhaps only tens to hundreds of individuals.
Many software engineers take home $190K or $300K a year after tax.
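A quick check of the arithmetic behind these two comments, assuming a flat $240/day cost sustained for 365 days and treating the quoted post-tax incomes as exact figures:

```python
# Back-of-the-envelope check of the yearly cost of running BLOOM.
# Assumptions: a flat $240/day cost running every day for a year,
# and the $190K / $300K post-tax incomes quoted above.
DAILY_COST_USD = 240
DAYS_PER_YEAR = 365

annual_cost = DAILY_COST_USD * DAYS_PER_YEAR  # roughly the "~$90K" figure
print(f"Annual cost: ${annual_cost:,}")

for income in (190_000, 300_000):
    share = annual_cost / income  # fraction of post-tax income spent
    print(f"${income:,} post-tax: {share:.0%} of income")
# Annual cost: $87,600
# $190,000 post-tax: 46% of income
# $300,000 post-tax: 29% of income
```

Under these assumptions, a year of continuous use would consume roughly 46% or 29% of those post-tax incomes.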