It’s hard to believe that MoMA’s website, which celebrates its 20th anniversary today, is older than Google. It began with two relatively simple (by today’s standards) HTML exhibition sites for the Mutant Materials and Video Spaces exhibitions in 1995. Since then, over 200 exhibition sites have been created, documenting not only the Museum’s evolving curatorial interests, but also huge changes in Web coding and design. This collection of exhibition sites almost serves as its own online museum of the Internet. Clicking through them helps us see the major trends in Web design, coding, and plug-ins (Shockwave, anyone?). For the average onlooker, it’s a nostalgic tour through the history of website design. For the Web archivist, it’s a mountain to climb—an exciting, sometimes harrowing, and ultimately rewarding process that can prepare you for anything the Internet might throw at you.
But first, what has led us to this point, discussing how a Web archivist might feel about moma.org? Well, since 2013, with funding from The Andrew W. Mellon Foundation, the New York Art Resources Consortium (NYARC), a collaboration between the libraries of the Brooklyn Museum, The Frick Collection, and MoMA, has been actively Web-archiving born-digital art-historical resources. This initiative addresses the “digital black hole” by attempting to prevent significant art-historical resources from being lost to future scholars. While NYARC’s Web archiving project targets catalogues raisonnés, artists’ websites, auction catalogs, and other born-digital materials produced outside of our museums, a key component is to also archive our own institutional sites, including the moma.org domain.
The NYARC team has been archiving MoMA’s online presence since the beginning of 2014. In that time, we are proud to say that of our 200-plus exhibition sites, only 14 are not yet fully archived, and those are in process. The process is the tricky part: some sites are easier to archive than others.
For some sites—such as Mutant Materials, the site that started it all—it’s usually enough to turn the Web crawler loose to follow each link in the site, collecting the structural .html and .css files along with any embedded image or sound files. However, other sites are more complex and require human intervention. While Web crawlers get us pretty far in the archiving process, all of the work they’ve done must be verified by a Web archivist during QA (quality assurance). Any links missed by the Web crawler can then be gathered manually. Unfortunately, in the case of more complex sites, this can be a lot of links.
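For readers curious about what “following each link” actually means, the core of that step can be sketched with Python’s standard library alone. This is an illustrative sketch, not the actual crawler NYARC uses; the page snippet, class name, and base URL below are all made up for the example:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the URLs a crawler would visit next: hyperlinks (<a href>)
    plus embedded resources such as stylesheets and images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.found.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("img", "script") and "src" in attrs:
            self.found.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and "href" in attrs:
            self.found.append(urljoin(self.base_url, attrs["href"]))

# A made-up page, loosely modeled on a simple mid-1990s exhibition site.
page = """
<html><head><link rel="stylesheet" href="style.css"></head>
<body><a href="checklist.html">Checklist</a>
<img src="images/install1.jpeg"></body></html>
"""

collector = LinkCollector("http://www.moma.org/exhibitions/example/")
collector.feed(page)
for url in collector.found:
    print(url)
```

A real crawler repeats this for every page it fetches, adding each newly discovered URL to a queue until the whole site has been walked. Sites that generate their links in Flash or JavaScript defeat this simple HTML parsing, which is exactly where the human intervention described above comes in.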
For example, the site for the 2003 Kiki Smith: Prints, Books, & Things exhibition (shown above) is primarily built in Flash. This means the site is made up of dozens of .swf files, many of which have to play through before revealing the next set of .swf, .html, and .jpeg files. The Web crawler, lacking the patience to wait, misses those files. This is one of the reasons so many sites on the Internet Archive’s Wayback Machine appear incomplete. Rectifying this leads the Web archivist into a multiweek cycle of activating, manually crawling, and waiting. Once all of these files are crawled, however, they live bundled in a WARC file (an archival file that combines all of the archived digital files and resources of a website) to be hosted by the Internet Archive and accessed by researchers in perpetuity. The image below shows a side-by-side view of the moma.org homepage with and without the Web archivist’s intervention. As you can see, our NYARC capture is far more complete.
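For the technically inclined, a WARC file is not an opaque blob: it is a sequence of records, each consisting of a block of named header fields, a blank line, the captured payload, and a trailing blank line. The sketch below assembles one minimal record by hand purely to illustrate that layout; the URL and payload are invented, and real archives are produced by crawling tools rather than code like this:

```python
from datetime import datetime, timezone
import uuid

def make_warc_record(target_uri: str, payload: bytes) -> bytes:
    """Assemble one minimal WARC record: header fields, a blank line,
    the payload, and a blank line separating it from the next record."""
    headers = [
        "WARC/1.0",
        "WARC-Type: response",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Length: {len(payload)}",
    ]
    head = "\r\n".join(headers).encode("utf-8")
    return head + b"\r\n\r\n" + payload + b"\r\n\r\n"

# A WARC file is simply such records concatenated, one per captured
# resource -- every .html, .css, .jpeg, and .swf file the crawl gathered.
record = make_warc_record("http://www.moma.org/", b"<html>...</html>")
print(record.decode("utf-8").splitlines()[0])
```

Because each record carries its own URI and timestamp, replay tools like the Wayback Machine can reassemble a site from the bundle exactly as it looked on the day of capture.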
As MoMA’s site enters its 21st year, it is certain to continue to grow and change, but this project ensures that past iterations of moma.org will live on.
To look at our archive of moma.org, along with our other Web-archived collections, visit nyarc.org/webarchive.