April 14, 2015  |  Conservation, Media Conservation
MoMA’s Digital Art Vault
MoMA's data center. Photo: Ben Fino-Radin

Recently on Inside/Out, we heard from Assistant Media Conservator Peter Oleksik about MoMA’s efforts to preserve and digitize its collection of analog video art, amassed over the course of four decades. The heroic undertaking of digitizing over 4,000 videotapes was absolutely critical for preservation and access purposes. However, digitizing analog videotape only begins a new chapter in the artwork’s life, one that is rife with grave challenges and risks unique to digital materials. In another recent post, we learned from Media Conservator Kate Lewis that it is increasingly common for time-based media artworks to be delivered to the Museum in digital form, due to evolving tools and artistic practices. Today I’ll describe how MoMA has faced head-on the significant challenges in digital preservation by designing a state-of-the-art digital vault for these collections. To distill some rather technical and complex ideas that inform this effort, I’ll break this digital art vault down into three parts: the packager, the warehouse, and the indexer.


The packager addresses the most fundamental challenge in digital preservation: all digital files are encoded. They require special tools in order to be understood as anything more than a pile of bits and bytes. Just as a VHS tape is useless without a VCR, a digital video file is useless without some kind of software that understands how to interpret and play it, or tell you something about its contents. At least with a VHS tape you can hold it in your hand and say, “Hey, this looks like a VHS tape and it probably has an analog video signal recorded on it.” But there is essentially nothing about a QuickTime .MOV file that says, “Hello, I am a video file! You should use this sort of software to view me.” We rely on specially designed software—be it an operating system or something more specialized—to tell us these things. The problem is that these tools may not always be around, or may not always understand all formats the way they do today. This means that even if we manage to keep a perfect copy of a video file for 100 years, no one may be able to understand that it’s a video file, let alone what to do with it. To avoid this scenario, the “packager”—free, open-source software called Archivematica—analyzes all digital collections materials as they arrive, and records the results in an obsolescence-proof text format that is packaged and stored with the materials themselves. We call this an “archival information package.”
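To make the idea concrete, here is a toy sketch in Python of what "analyze the file and record the result as text" can look like. It is not how Archivematica works internally: the three-entry signature table, the function names, and the use of JSON as the text format are all assumptions made for illustration. Real identification tools consult far larger signature registries, and many QuickTime files will not match the single pattern used here.

```python
import json

# Hypothetical, deliberately tiny signature table: (byte offset, magic bytes, label).
# Real tools rely on much larger, curated registries of format signatures.
SIGNATURES = [
    (4, b"ftypqt  ", "QuickTime movie"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image"),
    (0, b"%PDF-", "PDF document"),
]


def identify(data: bytes) -> str:
    """Guess a format label by comparing leading bytes against known signatures."""
    for offset, magic, label in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return "unknown"


def describe(name: str, data: bytes) -> str:
    """Return a plain-text (JSON) record that could be stored beside the file."""
    record = {
        "file": name,
        "size_bytes": len(data),
        "format": identify(data),
    }
    return json.dumps(record, indent=2, sort_keys=True)
```

Calling `describe("work.mov", data)` on the opening bytes of a file yields a small text record that remains readable even if the software that once played the file is long gone.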


In addition to the issue of how to ensure that our successors will understand what a given stream of bits is supposed to represent, we also have the problem of authenticity. How can we prove in 100 years that a given digital object in the collection has not become corrupt, and has not been maliciously modified, since the moment it entered the collection? It would of course be impossible to manually inspect millions of digital files on a regular basis. To address this issue, the packager passes each and every digital object through a cryptographic hash algorithm to produce what is called a “checksum.” The checksum value for one digital file is essentially a short, fixed-length sequence of letters and numbers (64 hexadecimal characters in the case of the widely used SHA-256 algorithm), and changing even a single bit of the file produces a completely different value. This provides us with the ability to come back to an archival package in the future, run the digital files through the same cryptographic process, and check to make sure that we wind up with the same values that were originally recorded. So in summary, these archival packages contain not just MoMA’s digital collections, but the information that we will need in the future in order to understand what the materials are and to confirm their authenticity. These archival packages are then sent off to what we call the “warehouse,” a digital storage system maintained by the infrastructure division of MoMA’s IT department.
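The record-now, verify-later routine described above can be sketched in a few lines of Python. This is a minimal illustration, not the packager's actual code: the choice of SHA-256, the function names, and the in-memory manifest are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so that very large video files never sit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_manifest(package_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the package) to its checksum."""
    return {
        p.relative_to(package_dir).as_posix(): sha256_of(p)
        for p in sorted(package_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(package_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files that are missing or whose checksum no longer matches."""
    failures = []
    for relative_path, expected in manifest.items():
        target = package_dir / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

An empty list from `verify_manifest` means every file still produces the checksum recorded when it entered the collection. Any path in the list points to a file that has gone missing, become corrupt, or been altered.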

This is the digital equivalent of MoMA QNS, our physical art-storage facility in Long Island City. The “warehouse” is a very large cluster of hard drives configured as a Redundant Array of Independent Disks (RAID) that lives in our data center at 53 Street, along with a duplicate of the entire cluster that lives offsite at MoMA QNS. This has served us well for about five years, but this type of disk-based storage becomes an untenable expense with very large amounts of data. MoMA’s digital collection currently is about 80 terabytes in size (80,000 gigabytes). This is a lot of data, but it is minuscule compared to our anticipated collection growth over the course of the next 10 years. As MoMA acquires more digital artworks, and as the image resolutions used by artists and filmmakers increase, we project the digital collection to grow to approximately 1.2 petabytes (1.2 million gigabytes) by 2025.
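For a sense of what that projection implies year over year, here is a back-of-the-envelope calculation in Python. It assumes smooth compound growth over exactly 10 years, which is a simplification for illustration and not a claim about how the projection was made. The function name is hypothetical.

```python
def annual_growth_rate(start_tb: float, end_tb: float, years: int) -> float:
    """Compound annual growth rate needed to get from start_tb to end_tb."""
    return (end_tb / start_tb) ** (1 / years) - 1


current_tb = 80.0       # about 80 terabytes today
projected_tb = 1200.0   # 1.2 petabytes, using 1 PB = 1,000 TB
years = 10

growth_factor = projected_tb / current_tb  # a 15-fold increase overall
rate = annual_growth_rate(current_tb, projected_tb, years)
print(f"{growth_factor:.0f}x overall, about {rate:.0%} per year")
```

Under that assumption, a 15-fold increase over a decade works out to the collection growing by roughly 31 percent each year.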

A close-up of one of the servers currently used to store our digital collections. Photo: Ben Fino-Radin

It would be irresponsibly expensive to continue using hard drive storage, as it was not quite intended for this scale of data. We are currently in the final stages of designing a completely new “warehouse” with a company called Arkivum. This system will include a small cluster of hard drives, but for primary long-term storage it adds a very cool new element to the mix: data tapes. When archival packages are first stored, they land on the cluster of disks, but are shortly thereafter copied to data tape, a process that is automated by software (and robots!). The video below provides an inside look at a machine very similar to the one that will be storing the projected 1.2 million gigabytes in MoMA’s digital collection.

This system will allow us to store the projected 1.2 million gigabytes of digital collections material redundantly in three locations: the Museum, our art storage facility in Long Island City, and our film preservation center in Hamlin, Pennsylvania.

The two parts of MoMA’s digital art vault we have discussed here (the packager and the warehouse) ensure that we will be able to keep a bit-for-bit stable and authentic copy of all digital collections objects, and that we will be able to understand what these objects are and how to use them decades into the future. Unfortunately, neither the packager nor the warehouse facilitates day-to-day, active management of the contents of the warehouse. MoMA searched near and far for a system to facilitate these aspects, and found none. So we built one ourselves. Stay tuned for the next post for more about this new system.