It is fantastic to see members of our library communities
adding their own knowledge and opinions to our content through use of features
such as tags and comments, and social media tools such as Twitter and Facebook. More libraries are opening their content and sites to their communities through these tools and features than ever before. We call this user-generated content (UGC) or social metadata.
But if we think about it for too long it gives us a big
headache. Being of the ‘collecting’ mind we really want to care for and keep the
UGC in the same way we care for our collection content. Caring for it means:
- knowing how much has been added and keeping meaningful statistics.
- keeping the UGC in context with the data the users meant it to be related to.
- archiving it for the long-term.
- being able to migrate it along with our own content as our services and interfaces change in the future.
- being able to mobilise it to share with other services.
- being able to easily supply it back to the original creators if they want it.
Firstly, let’s take a simple concept. A member of the
community is actively engaged with your site.
They are contributing a lot of data to it in the form of comments and
descriptions. After a while they want to
get all of ‘their’ data out so they can use it for something else they are
working on. Let’s call this ‘user takeout’. Seems reasonable, seems simple, but
I don’t know of any library site that does this. For example a ‘user takeout’ option in Trove
newspapers would let a contributor get a copy of all of the comments and tags
they have added to historic newspaper articles. You may ask “Do people want to do this?” Contributors seem to accept that content they
add to sites will be locked to that site.
I’m not sure they even think about it very much when they start to add
stuff, or check the user licence for the terms. Many don’t intend to add the
volume of stuff that they do. But
suddenly they think about it when either a better site comes along that they
would like to transfer or copy their content to, or the site they are adding to
is unexpectedly taken down or frozen. Recent
examples in the news are Facebook users wanting to be able to transfer or ‘user
takeout’ their photographs from the site.
Although it is of course technically easy to implement this, social media sites such as Facebook are reluctant to let users do it, for fear they will take their content and move to competitors’ sites. However, in the library world it is reasonable that users may want to share their value-added data around multiple library sites, and yet we still don’t enable it. Another item in the news was the sudden closure of poetry.com. Over 7 million users were given 15 days’
notice that the 14 million poems they had added would be taken down when the
site was sold. They were not given an
easy option to ‘takeout’ their poems, but instead it was suggested that they
could copy and paste their poems if they had time. This infuriated many users
who read the message, and many others who didn’t read the message in time. It’s worth pointing out here that a lot of
sites people use frequently and think are for the common good are actually
commercial sites that can do exactly what they like, and do not ever promise to
keep, manage or archive content in the same way libraries do. Although the new owners restored the poetry.com site, it appears that the 14 million poems added prior to 2012 are still not restored, hence the large pink box at the top asking ‘Where’s my poem?’
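Coming back to ‘user takeout’: technically it is a small job. Below is a minimal sketch of what such an export could look like, assuming a hypothetical SQLite table called ugc that holds a site’s tags and comments along with the contributor’s name and the identifier of the item each one relates to. None of the names here come from any real library system; they are invented for illustration.

```python
# A hypothetical 'user takeout' export. The table and column names
# (ugc, item_id, ugc_type, body, contributor, created_at) are invented
# for this sketch and do not belong to any real system.
import json
import sqlite3

def user_takeout(db_path, contributor, out_path):
    """Export every tag and comment a contributor has added, together with
    the identifier of the item each one relates to, as a JSON file."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT item_id, ugc_type, body, created_at "
        "FROM ugc WHERE contributor = ? ORDER BY created_at",
        (contributor,),
    ).fetchall()
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump([dict(r) for r in rows], f, indent=2)
    return len(rows)

# Example: user_takeout("ugc_layer.db", "jane_doe", "jane_doe_takeout.json")
```

The point is not the code but the shape of the feature: if the contributor and the item identifier are stored with each piece of UGC, handing the data back is a single query plus a file download.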
If we think about measuring our user activity and data through all channels, i.e. our own site as well as Twitter, Facebook etc., we hit a brick wall. Providing useful statistics on both the volume and value of social metadata is difficult. For social media sites such as Twitter and Facebook your options are either to buy costly software and do it yourself, or to employ a company (many of which are springing up) to do it for you. These companies, however, would have great difficulty integrating measurements on the value and content from social media sources with those that go directly to your site, i.e. your own comments, tags and blogs. Doing measurements separately is difficult, but combining them is even more so.
Many libraries are part of central or local government so
have requirements to archive records and content they create, which should also
include social metadata and media. But
does anyone know the best way to do this and are our archives agencies telling
us how to do it? The simple answer is
no. The National Archives in the USA (NARA) says it is working on it as a matter of urgency and is due to explain how it should be done by this
July. The National Archives of Australia website states that “The Archives Act 1983 does not define a record by its format.
Generally, records created as a result of using social media are subject to the
same business and legislative requirements as records created by other means.”
But the guidelines on the NAA website as to how
this should be done simply say “Methods of capturing social media content as a
record may vary according to the tools being used”. This month the Public Record Office of
Victoria released an issues paper for comment: ‘Recordkeeping implications of social media’.
An extract of the PROV proposed guidelines for archiving
social metadata follows:
How should the record be captured?
Currently printing screenshots to .pdf and registering the
resulting document in an Electronic Document and Record Management System
(EDRMS) to record the necessary metadata is the
most accessible and expedient method of creating social media records.
Necessary metadata includes who sent it (username and real name), date and time
of sending, context and purpose of content, name of tool used to create it.
My first reaction on reading this was ‘this is mad!’ Perhaps the archives are under-estimating the amount
of social metadata and media activity that is going on. Taking a screenshot of every tweet, for example, assumes that you are not going to get thousands, whereas
successful sites and topics such as Trove do get thousands and millions of
interactions, which makes this unworkable from a staff resourcing point of view.
Twitter is notorious for ‘disappearing tweets’ after a very short amount of
time – sometimes less than a week because of the volume of activity that takes
place. This also puts pressure on us to archive tweets at the time of creation.
You don’t have the luxury of going back and archiving later. This suggested form of archiving only gives a screen-based image, which is not in context, not searchable, has no metadata, no timestamp, and cannot be authenticated. It seems there is money to be made if someone develops a simple software system to mechanically capture the tweet, its responses and its components and safely and uniformly archive/index them along with descriptive metadata. The tool could also render the page "as it appears" and save it as a PDF if that is required.
In April 2010 the Library of Congress rather bravely announced that it intended to archive all tweets since they began in 2006, to record the social fabric of the world, and signed an agreement with Twitter and Google. In 2010 the Twitter archive was growing rapidly, with users sending 50 million tweets a day. A year and a half later several news agencies tried to get a progress report from the LC without much success. Other than trying to transfer the data from Twitter’s servers to LC servers, the LC wasn’t giving any detail on what technological developments it was creating to do the mammoth task. The task seemed to be growing bigger by the day with usage of Twitter increasing. Currently 140 million tweets are sent every day.
A core element of the archive process should be that the data is kept in context with what it was referring to, and with the other elements
surrounding it. Most libraries that are
keeping UGC and social metadata are keeping it in a separate layer to their own
content in the database to protect the provenance, but may integrate it for
public display. If it is kept separate
it can easily be stored, managed, and moved, but is at risk of becoming
separated from the context it is related to. This is something libraries need
to work out. This will become more
pressing in a few years’ time when existing services are migrated as part of
their maintenance. The UGC needs to be
migrated in context with them.
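As a rough illustration of that separate-but-linked layer, here is a minimal sketch assuming nothing about any particular system: the UGC sits in its own table, but every row carries the persistent identifier of the collection item it describes, so the link survives storage, export and migration. All names are invented for the example.

```python
# A hypothetical separate UGC layer; all table and column names are invented.
# Each row keeps the persistent identifier of the item it annotates, so the
# link to its context travels with it through backup, export and migration.
import sqlite3

conn = sqlite3.connect("ugc_layer.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS ugc (
    ugc_id      INTEGER PRIMARY KEY,
    item_id     TEXT NOT NULL,   -- persistent identifier of the item annotated
    ugc_type    TEXT NOT NULL,   -- 'tag', 'comment', 'correction', ...
    body        TEXT NOT NULL,
    contributor TEXT NOT NULL,
    created_at  TEXT NOT NULL    -- ISO 8601 timestamp
);
CREATE INDEX IF NOT EXISTS ugc_by_item        ON ugc (item_id);
CREATE INDEX IF NOT EXISTS ugc_by_contributor ON ugc (contributor);
""")
conn.commit()
conn.close()
```

Kept this way, the UGC can be counted, backed up, handed back to contributors or migrated as a unit, while a join on item_id puts it back in context for public display.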
On this topic I have more questions than answers. I think
libraries and archives need to work together to take an active role in firstly encouraging
mobilisation of social metadata – ‘user takeout’ – and secondly demonstrating how social metadata and social media activity can be archived. I see massive opportunities for start-ups to create archiving tools to bolt onto Facebook, Twitter, YouTube and blogging software to meet the requirements of government archiving.
Photo: Prime Ministers Chifley and Curtin chat on the way to work, 1945. Bronze sculpture by Peter Corlett outside the National Archives of Australia, Canberra. Rose Holley