Tuesday 2 April 2013

Crowdsourcing text correction and transcription of digitised historic newspapers: a list of sites


Last month two new websites were launched giving the public access to digitised historic newspapers. The launch of a new ‘old’ digitised newspaper site is now almost a monthly occurrence: after what seemed like a very long wait, libraries somewhere in the world are completing newspaper digitisation projects with astonishing regularity.

The two new sites last month were Welsh Newspapers Online and the Louisville Leader.

The Welsh site has been several years in the making, and the National Library of Wales seriously considered using the National Library of Australia's software for text correction before putting it in the ‘too hard basket’. The library is to be commended for making Welsh Newspapers Online free (unlike the English newspapers, which the British Library still offers under a subscription model).
 
The Louisville Leader site delivers all the issues of a key African American community newspaper covering local, national, and international news, published in Louisville, Kentucky from 1917 to 1950. Unfortunately, the building that housed the original copies of the paper was badly damaged by a fire. The remaining issues, loaned by Kentucky State University and the widow of the publisher, were microfilmed by the University of Louisville, and the digital files were created from that microfilm. The long and winding road the texts have taken toward digital representation has made them less than ideal candidates for optical character recognition (OCR), which has difficulty transcribing faded, torn, or misaligned text, even when it is readable to the human eye. For this reason the site has enabled public transcription to help improve the accuracy and searchability of the newspaper content.
 
It’s great to see both of these new sites, and having managed a newspaper digitisation project myself, I fully understand the difficult process many libraries have gone through to reach this point. I still have a particular interest in newspaper sites that involve the public in text correction, a further step that is perhaps just too challenging for many libraries to take. Five years ago libraries worldwide applauded the Australian Newspapers/Trove text correction beta, which is now an internationally hailed success, and many libraries stated their intention to follow suit with public text correction. So the question arises: “How many actually did?”
 
Many libraries internationally now offer websites for searching digitised historic newspapers. I’m not going to list all of them, just the handful that let their users correct or transcribe the text. With Australian text correctors now addicted to correcting newspaper text and looking elsewhere to sate their ample appetites, I thought it was time to compile a list specifically of text correction websites for historic newspapers. To the best of my knowledge there are now 9 sites. Who will be the 10th? If I have inadvertently missed a site, please let me know in the comments. Most of the sites are for English-language content, but it is interesting to see a few appearing for other languages. As a point of interest, several foreign-language historic newspapers were published in Australia (Chinese, Greek, Hebrew, German), but these were put in the ‘too hard basket’ for the first stage of Australian Newspapers/Trove and sadly did not make it into the second stage either. They give a very interesting perspective on sub-communities within the wider community.
 
Congratulations to all the libraries listed below, which took the first difficult step of digitising and then the more challenging step of crowdsourcing. Happy text correcting to all the amazing people who volunteer their valuable time to help libraries make old newspapers more accessible; I hope you enjoy the list. The sites all differ slightly but work on the same general basis: they show a digitised page and ask the public to correct or transcribe the OCR text created from that page. Improving the OCR text improves keyword searching of the newspapers. It particularly helps to correct people’s names, especially in family notices such as births and deaths, since names are often the first thing users search for.
 
List of historic/old digitised newspaper sites that offer public text correction/transcription: March 2013
US Newspapers
Australian Newspapers
Finnish Newspapers
Vietnamese Newspapers
Russian Newspapers
 
Useful Resource:
Frederick Zarndt’s recent PowerPoint presentation on crowdsourcing in libraries, with a particular focus on newspapers: