Preserving a Presidential Administration’s Social Media Activity

Project spearheaded by GW Libraries is capturing social media produced by the federal government under President Obama.

January 9, 2017


Laura Wrubel and Daniel Kerchner have developed software that harvests social media as data. (Logan Werlinger/GW Today)

By Ruth Steinhardt

As the inauguration of Donald Trump nears, the George Washington University libraries are taking steps to preserve one ephemeral portion of President Obama’s administrative legacy: the floods of material posted by federal entities on social media.

The effort is part of the End of Term Presidential Harvest 2016 (EOT), a collaborative volunteer project to preserve public U.S. government web sites. Partners include the Internet Archive, the Library of Congress, the U.S. Government Publishing Office and libraries at the University of North Texas and Stanford University.

“This is part of American history,” said Laura Wrubel, software development librarian at the GW Libraries. “If someone in the future wanted to research this period, if they didn’t know about the government’s web presence—long gone by then—they would be missing a huge part of our history.”

This is the third EOT, with previous harvests at the end of George W. Bush’s second term in 2008 and at the end of President Obama’s first term in 2012. But this is the first preservation effort to include social media.

“The government doesn’t consider all of their social media to be part of the federal record,” said Daniel Kerchner, senior software developer at the GW Libraries. “They have an obligation to capture some of it, but even so, that doesn’t mean it would be available to researchers or to the general public.”

The GW Libraries’ major innovation is open-source software called Social Feed Manager, developed as a prototype in 2012 and improved since 2014 with a grant from the National Historical Publications & Records Commission. The software archives not only the text of a post on a social media platform like Twitter, Flickr or Tumblr, but also the metadata—time of posting, number of likes or retweets—associated with that post.

“It’s not only about gathering content that is likely to change or disappear with the [presidential] transition,” Ms. Wrubel said. “Capturing this social media as data will be really useful to future researchers who want to study this presidential administration on the web.”

To begin the EOT, the GW Libraries team used the U.S. Digital Registry to start a comprehensive list of accounts constituting the federal government’s social media presence, including agencies, offices and public figures. Quality control has been part of the job: Some accounts on the list were no longer active, and a few were missing.

The results were overwhelming. The first harvest covered almost 3,000 accounts and captured more than 5 million Tweets. The team will continue to archive the feeds on its list until March, well after the presidential transition.

Part of the complexity of the project lies in ensuring its durability. Historians now use physical artifacts like letters and documents to recapture a lost past, Mr. Kerchner pointed out. But researchers studying the present era will have few such physical relics to work with.

“You don’t even have to think that far ahead,” he said. “What if you had data stored on diskettes from 15 years ago? How would you even read that now?”

Even the language by which we understand social media is in constant evolution.

“Researchers are already thinking about how to study these things,” Ms. Wrubel said. “For example, in five years, will people understand what it means to click the ‘heart’ on Twitter? It used to be a star. We’re starting to see questions like that.”

In this case, there will be no physical archive—no shadowy basement repository stacked with printed-out Tweets. Data collected by the EOT will be stored digitally on the Internet Archive, which will have to evolve to stay accessible.

“That’s one advantage of working with partners like the Internet Archive who are strong in this area,” Mr. Kerchner said. “They’ve been around for 20 years or so, and their reason for existing is to make web history available.”

In fact, researchers are already using the technology. Faculty and students at GW and elsewhere have used Social Feed Manager for projects on targeted messaging, gender and political campaigns. Ms. Wrubel, Mr. Kerchner and their team have worked with classes in schools across the university, including the School of Engineering and the School of Media and Public Affairs. One religion student used one of Social Feed Manager’s newest capabilities, capturing the images associated with social media posts, to study the portrayal of Muslim women by mainstream news outlets on Twitter.

“This is definitely an ‘only at GW’ kind of thing,” Mr. Kerchner said.