Saturday, 25 August 2012

Crowdsourcing and Social Media at US National Archives (NARA). The Citizen Archivist Dashboard

Last week I attended the International Congress on Archives (ICA 2012), which was held in Brisbane. Over 1,000 archivists from 93 countries attended.

The much-anticipated opening keynote on the first day was given by David Ferriero, head of the US National Archives. He is the first librarian to become a National Archivist, having previously been in charge of the New York Public Library, and is known for promoting the use of social media and for building relationships with Google and Wikipedia. His talk was called ‘A world of social media’. I was looking forward to hearing what the US National Archives are doing with social media and crowdsourcing. People were generally of the opinion that this organisation is, or will be, leading by example in this field.

David Ferriero took to the stage and took us by surprise. He used only 20 minutes of his 40-minute slot and gave no presentation, instead reading from his notes at breakneck speed and bombarding us with statistics that were largely out of context. At the end he took no questions and dashed off the stage, leaving a surprised and bewildered audience behind. I for one was immensely disappointed not to see and hear more about some of the exciting US Archives activities. He may of course have had mitigating circumstances that I am totally unaware of. He did, however, give small tasters of what his organisation is doing. There was brief mention of large-scale crowdsourcing on unspecified projects, a citizen archivist dashboard, and a relationship with Wikipedia, all of which piqued my interest.

So I decided to follow up online and find out for myself what may be happening at NARA. It took me quite some time to search the internet and blogs for the information I had hoped David would give in his keynote, but it was worth it. Here is what I found:

1. Citizen Archivist Dashboard Webpage

In January 2012 the US National Archives launched the Citizen Archivist Dashboard. This is a great webpage that brings all the online and in-person social engagement and crowdsourcing activities together in one place. It makes it easy for someone to see what options they have for helping the US National Archives. It is very clearly designed and I like it a lot.

2. Transcription Projects

There are two transcription projects going on for handwritten records. The first is the National Archives Transcription Pilot Project. It appears still to be in ‘pilot’ mode (it started in January 2012), since only 300 documents (about 1,000 pages) are available for transcription. They have been very carefully selected from a collection of billions of pages and graded by colour codes according to how difficult the handwriting is to read. This pre-selection must have taken very valuable staff time. You can browse or search by difficulty of transcription, year, and the status of transcription: “Not Yet Started,” “Partially Transcribed,” and “Completed.” When you choose a page to work on, that page is blocked to other users so that it is not edited by several people at the same time. The interface is very simple, much like the Australian Newspapers. In a free text box beside the image you can transcribe what you see. No login is required, though you do have to complete a captcha.

The missing part is that I can’t see how many people have transcribed what. It’s not clear whether the documents disappear from here when fully transcribed, or how and where they become full-text searchable in the collection. It also seems to be a time-consuming process for NARA staff to do the pre-selection and difficulty rating of the documents. This is of course a very small pilot, and hopefully lessons will be learnt and the site will be developed further to reach its full potential. It would also be good if more documents became available for transcription. This is one of the easiest handwritten transcription tools I have seen. I could not find any information about who developed the tool or whether it is available as open source.

Interestingly, David Ferriero says that many US school children are no longer taught cursive handwriting and therefore cannot read handwriting. He says ‘Help us transcribe records and guarantee that school children can make use of our documents’. I’m not quite clear whether he thinks this is a potential crowdsourcing exercise for school children to learn handwriting and become better educated, or whether adults are supposed to do it so that school children can just read the finished text.

Secondly, the National Archives have developed a relationship with the Wikipedia community and currently have a Wikipedian in residence. As part of that program they have shared some primary handwritten national documents into ‘Wikisource’ for transcription via the Wikisource tool. These documents are mostly at the beginner level in terms of difficulty. I’m not clear whether they are the same ones being used in the Archives’ own pilot, or different documents. I’m also not clear why they are piloting two different methods for transcription, or how the initial results compare with each other. Wikisource offers more than transcription, however: Wikipedians (if they can get access to original documents or copies) can also scan documents and OCR them.

3. Scanning Projects

  • Scanathons
For reasons I don’t understand, the US National Archives has only digitised 750,000 of its 40 million images. This is a very low figure for an organisation like this. They seem to be focusing quite a lot of effort on getting volunteers to come to the Archives in person to digitise/scan images for them at ‘Scanathons’. This started in 2011. In January 2012 there was a 4-day Wikipedia ExtravaSCANza. Over the 4 days a group of Wikipedians met in the Still Pictures Research Room and scanned 500 images on desktop scanners. Each day there was a theme: NASA, women’s history, Chile, and battleships.

NARA encourages readers to take their own photos of records in the reading rooms and upload them to a special group in Flickr.  The important thing here is that they should also be described with title, series, and record group if possible so they can be found. So far only 20 people have joined the group and 133 photos have been uploaded (most of these by the same person). I’m not clear how NARA intends to link these digital images back to the item descriptions in their collections but this is a great idea to tackle large scale digitisation of images.

4. Tagging


The tagging facility, unlike the other pilots, seems to me unlikely to succeed in its objectives. This is perhaps because of the tight controls that have been placed around it and the isolation of the activity from normal search and browse behaviour. Whilst anyone can easily transcribe a record without needing to log in, the process for tagging is difficult.

The activity is focused on Tuesdays and themed around a topic. Records for the topic are pre-selected by the Archives and made available in an online group, e.g. Elvis or Titanic. Volunteers must register and follow a set of guidelines, and tags are reviewed by NARA staff before being accepted and going live on the database. I looked at the topics and it was unclear to me why, if the Archives had already identified the items as being about Elvis, they couldn’t simply generate an automatic tag for ‘Elvis’. In my opinion tagging is not actually a crowdsourcing activity, because individuals are motivated to add tags to help themselves find things; it is a by-product of search. Research shows it is rare for users to reach consensus on tag terms and their use. Crowdsourcing activities achieve a big, clear goal that could not be achieved by individuals alone, and everyone in the crowd should be aware of how they are helping towards the ultimate goal.

5. Indexing the 1940 Census

On April 2, 2012, NARA released the digital images of the 1940 United States Federal Census after a 72-year embargo. The census images will be uploaded and made available on several sites, including those of the National Archives and ProQuest. The entire 1940 census data will be indexed by a community of volunteers and made available for free. The free index of the census records and the corresponding images will be available to the public in perpetuity.

6. Useful Links

I found a presentation given this year by Pamela Wright, Chief Digital Access Strategist at NARA, called ‘From access to engagement’, which gives screenshots of what I have talked about above.

7. Social Media

NARA are active users of social media channels and they have started to monitor their activity. NARA’s social media statistics for May 2012 may be interesting reading for some.

I would be interested in reading more presentations or articles about the citizen archivist pilot projects from NARA and finding out what they have achieved and learnt so far. I hope this information is made available to the archives and library community soon. Please reply in the comments if you have any more information on the pilot activities.
