I briefly touched on the topic of "doomsday archives" in a recent post. As far as I'm currently aware, the two most notable organizations working in this area are the Arctic World Archive, whose archive is located in Svalbard and whose headquarters is in mainland Norway, and the Arch (pronounced "ark") Mission Foundation, based in the U.S., with archives in multiple locations, including on the Moon (really).
I'm personally interested in storage media that are long-lasting and resilient to major disasters and don't require a continuous electricity supply or continuous copying over to new discs or new hard drives. Paper is probably the best medium, all things considered. It's cheap, it lasts a long time, and anyone can read it without anything other than the paper itself, as long as they can read. The main disadvantage is storage density.
In the form of ebooks, you could keep 50 million books on fifty 20 TB hard drives in a closet for about $10,000. Storing 50 million paper books would cost many millions of dollars, both for the books and the building to keep them in. So, hard drives are unbelievably superior in terms of storage density. But they are incredibly short-lasting and non-resilient to major disasters.
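To make the arithmetic behind that comparison explicit, here is a minimal sketch. It uses only the figures stated above (50 million books, fifty 20 TB drives, about $10,000) and assumes decimal units (1 TB = 1,000,000 MB); the function and variable names are just illustrative.

```python
# Figures from the paragraph above; decimal storage units are an assumption.
NUM_BOOKS = 50_000_000
NUM_DRIVES = 50
DRIVE_CAPACITY_TB = 20
TOTAL_COST_USD = 10_000


def ebook_closet_budget(num_books, num_drives, drive_capacity_tb, total_cost_usd):
    """Return the per-book and per-drive figures implied by the totals."""
    total_tb = num_drives * drive_capacity_tb
    total_mb = total_tb * 1_000_000  # 1 TB = 1,000,000 MB (decimal)
    return {
        "total_tb": total_tb,
        "mb_per_book": total_mb / num_books,
        "usd_per_drive": total_cost_usd / num_drives,
        "usd_per_book": total_cost_usd / num_books,
    }


budget = ebook_closet_budget(NUM_BOOKS, NUM_DRIVES, DRIVE_CAPACITY_TB, TOTAL_COST_USD)
for name, value in budget.items():
    print(f"{name}: {value}")
```

In other words, the estimate implicitly allows about 20 MB per ebook and about $200 per drive, which works out to a small fraction of a cent per book.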
New technologies are trying to combine the longevity of paper with the storage density of hard drives. The Arctic World Archive uses piqlFilm, developed by its associated for-profit company, Piql, in Norway. piqlFilm involves printing QR codes on film, estimated to last 2,000 years under ideal conditions, along with normal text instructions for decoding the QR codes. (You can also just print text or black-and-white images on it, but that's much less information-dense.) The Arch Mission Foundation uses Nanofiche, which is text and images engraved on nickel. It only requires magnification to read.
Microsoft is working on putting data into quartz glass. The research project is called Project Silica. The storage medium is a clear square piece of quartz glass about the size of a floppy disk or a glass coaster. Each one can hold 7 TB. The estimated longevity is over 10,000 years. However, it requires advanced technology to read the data. It isn't intended for a doomsday archive. Microsoft is exploring its use for cold storage for cloud data as a competitor to hard drive storage, on the thinking that rarely accessed data could be more cheaply stored on quartz glass than on idle hard drives.
Encoding data on dehydrated DNA is another idea that has received a lot of attention and is an active area of research. Dehydrated DNA can last a long time. DNA is also incredibly information dense. But it is ungodly expensive. It also requires advanced technology to read.
We are somewhere in between having no good options for storage and having the perfect, magical solution. Paper is really excellent and might be the best option overall. piqlFilm is a very intriguing option, but it costs about $30 per GB and requires cameras and computers to interpret. I don't know the cost of Nanofiche, but I imagine it's expensive.
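As a rough illustration of what $30 per GB means at scale, the sketch below prices the same capacity as the fifty 20 TB hard drives mentioned earlier. It assumes the price scales linearly with no bulk discount and uses decimal units (1 TB = 1,000 GB); both are simplifying assumptions, and the names are illustrative.

```python
# Figures from the post: about $30 per GB for piqlFilm,
# and fifty 20 TB hard drives for about $10,000.
PIQLFILM_USD_PER_GB = 30
HDD_TOTAL_USD = 10_000
HDD_TOTAL_GB = 50 * 20 * 1_000  # assumes 1 TB = 1,000 GB


def cost_for_capacity(usd_per_gb, capacity_gb):
    """Linear cost model: no bulk discounts, no fixed costs (an assumption)."""
    return usd_per_gb * capacity_gb


piql_cost_usd = cost_for_capacity(PIQLFILM_USD_PER_GB, HDD_TOTAL_GB)
hdd_usd_per_gb = HDD_TOTAL_USD / HDD_TOTAL_GB
cost_ratio = piql_cost_usd / HDD_TOTAL_USD

print(f"piqlFilm cost for {HDD_TOTAL_GB:,} GB: ${piql_cost_usd:,}")
print(f"Hard drive cost per GB: ${hdd_usd_per_gb:.2f}")
print(f"piqlFilm is about {cost_ratio:,.0f}x the hard drive cost")
```

Under these assumptions, the closet of ebooks that costs about $10,000 on hard drives would cost about $30 million on piqlFilm, roughly 3,000 times as much.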
In a way, books and libraries are already fairly good doomsday archives, particularly since they are everywhere. The three main ways paper could be improved upon are:
Storing data types other than text (with a rare black-and-white photo or diagram)
Storing a lot more information in a small physical volume for a manageable cost, e.g. storing the equivalent of 50 million ebooks in a closet for $10,000
Extending longevity even further than the few hundred years estimated for modern acid-free paper with a small alkaline buffer (although longevity depends a lot on storage conditions, particularly humidity, temperature, and exposure to direct sunlight; cool, dry, and dark is ideal)
On the first point, I think piqlFilm might be the current best option for storing recorded music for hundreds of years (or more). Vinyl records might also be quite good, although it's hard to find any reliable, hard data on their estimated longevity. Every source seems to just say about 100 years without any explanation of how that estimate was arrived at.
Movies and other video can be printed on film. Movie studios routinely store a copy of heavily digital movies like Avengers: Endgame on analog film as part of their standard backup strategy. Modern film, which uses the same polyester base as piqlFilm, apparently has good longevity and is expected to last for centuries when stored properly.
The part of doomsday archiving that is possibly more vexing than just storing the data is figuring out how to present information to a society in ruins that has suffered potentially decades or centuries of disruption, decline, or collapse from a major disaster like a nuclear war (God forbid). This is particularly salient when the storage medium requires building a certain level of technology to read it, which is the case for every medium except paper. But even apart from that, there are questions of how to present the information in a way that is understandable, that shows people where to start, and that tells them what's most important. You can't necessarily rely on people in a future scenario like that knowing what to look for, and if the pile of information is extremely large (e.g. if it contains millions of books), it could be daunting.
If anyone wants to ask me random questions about this or related topics to see if my research has already turned up an answer for you, please ask away (even if you're seeing this comment a long time from now, although you may need to send me a private message that includes your email address).
Interesting, thanks! I might actually sign up for the Arctic Archive thing! I don't see you mention m-discs like this; any reason for that?
Also, do you have any takes on how many physical locations a typical X is stored in, for various X?
X could be:
A wikipedia page
An EA Forum post
A YouTube video
A book that's sold 100/1k/10k/100k/1M copies
Etc
M-Discs are certainly interesting. What's complicated is that the company that invented M-Discs, Millenniata, went bankrupt, and that has sort of introduced a cloud of uncertainty over the technology.
There is a manufacturer, Verbatim, with the license to manufacture discs using the M-Disc standard and the M-Disc branding. Some customers have accused Verbatim of selling regular discs with the M-Disc branding at a huge markup. This accusation could be completely wrong and baseless (Verbatim has denied it), but it's sort of hard to verify what's going on anymore.
If Millenniata were still around, they would be able to tell us for sure whether Verbatim is still complying properly with the M-Disc standard and whether we can rely on their discs. I don't understand the nuances of optical disc storage well enough to really know what's going on. I would love to see some independent third party who has expertise in this area and who is reputable and trustworthy tell us whether the accusations against Verbatim are really just a big misunderstanding.
Millenniata's bankruptcy is an example of the unfortunate economics of archival storage media. Rather than pay more for special long-lasting media, it's far more cost-effective to use regular, short-term storage media (today, almost entirely hard drives) and periodically copy the data over to new media. This means the market for archival media is small.
As for how many physical locations digital data is kept in, that depends on what it is. The CLOCKSS academic archive keeps digital copies of 61.4 million academic papers and 550,000 books in 12 distinct physical locations. I don't know how Wikipedia does its backups, mirroring, or archiving internally, but every month an updated copy of the English Wikipedia is released that anyone can download. Given Wikipedia's openness, it is unusually well-replicated across physical locations, just considering the number of people who download copies.
I also don't know how the EA Forum manages its backups or archiving internally, but a copy of posts can be saved using the Wayback Machine, which will create at least 2 additional physical copies on the Internet Archive's servers. I don't know what Google does with YouTube videos. I think for Google Drive data they keep enough data to recover files in at least two physically separate datacentres, but those could be two datacentres in the same region. I also don't know if they do the same for YouTube data, but I hope so.
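One way to see why the number of locations matters is a toy redundancy calculation: if each location independently survives with some probability, the chance that at least one copy survives is one minus the chance that all of them are lost. This is only a sketch. The per-location survival probability below is a made-up placeholder, not an estimate, and real locations are unlikely to fail independently (two datacentres in the same region, for instance). It also says nothing about whether surviving data would ever be recovered.

```python
def prob_at_least_one_survives(num_locations, p_survive_each):
    """Chance that at least one copy survives, assuming independent locations."""
    if not 0.0 <= p_survive_each <= 1.0:
        raise ValueError("p_survive_each must be between 0 and 1")
    return 1.0 - (1.0 - p_survive_each) ** num_locations


# Hypothetical per-location survival probability, chosen only for illustration.
P_SURVIVE_EACH = 0.1

# Location counts taken from the text above.
examples = [
    ("two physically separate datacentres", 2),
    ("CLOCKSS, 12 distinct locations", 12),
]

for label, num_locations in examples:
    chance = prob_at_least_one_survives(num_locations, P_SURVIVE_EACH)
    print(f"{label}: {chance:.3f}")
```

The useful part is the shape of the curve rather than the specific numbers: each added independent location shrinks the chance of total loss by the same factor.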
I think in the event of a global catastrophe like a nuclear war, what we should think about is not whether the data would physically survive somewhere on a hard drive, but, more practically, whether it would ever actually be recovered. If society is in ruins, then it doesn't really matter if the data physically survives somewhere unless it can be accessed and continually copied over so that it's preserved. Since hard drives last for such a short time, the window of time for society to recover enough to find, access, and copy the data from hard drives is quite narrow.
I don't know if you were asking about paper books or ebooks, but for paper books, it seems clear that for any book on the New York Times bestseller list, there must be at least one copy of that book in many different libraries, bookstores, and homes in many locations. I don't know how to think about the probability of copies ending up in Argentina, Iceland, or New Zealand, but it seems like at least a lot of English bestsellers must end up in various libraries, stores, and homes in New Zealand.
Paper books printed on acid-free paper with a 2% alkaline reserve, which, as far as I understand, is the standard for paper books printed over the last 20 years or so, are expected to last over 100 years provided they are kept in reasonably cool, dry, and dark conditions. I'm not sure how exactly the longevity would be estimated to change for books kept in a tropical climate vs. a temperate one. The 2% alkaline reserve is there so that, as the natural acid in the paper's cellulose is slowly released over time, the alkaline reserve counteracts it and keeps the paper neutral. Paper is really such a fascinating technology and more miraculous than we give it credit for.
Vinyl records are more important for preserving culture, specifically music, than for preserving knowledge or information, but it's interesting that vinyl sales are so high and that vinyl would probably end up being the most important technology for the preservation of music in some sort of global disaster scenario. In 2024, the top ten bestselling albums on vinyl in the U.S. sold between 175,000 copies (for Olivia Rodrigo at #10) and 1,489,000 copies (for Taylor Swift at #1). The principle here is the same as for paper books. You have to imagine these records are spread out all over the United States. Given that both vinyl records and many of the same musicians are popular in other countries like Canada, the UK, Australia, and New Zealand, it seems likely there are many copies elsewhere in the world too.
Since looking into this topic, I have warmed considerably to vinyl. I didn't really get the vinyl trend before. I guess I still don't, really, but now I think vinyl is a wonderful thing, even if the reasons people are buying it are not that it makes the preservation of music more resilient to a global disaster.
I didn't need any convincing to be fond of paper books, but paper just seems more and more impressive the more I think about it.