Tuesday, 2 April 2013

Crowdsourcing text correction and transcription of digitised historic newspapers: a list of sites


Last month two new websites were launched giving the public access to digitised historic newspapers.  The release of a new ‘old’ digitised newspaper site is now becoming a regular monthly occurrence: after what seemed like such a long wait, a library somewhere in the world completes a newspaper digitisation project with astonishing regularity.

The two new sites were Welsh Newspapers Online and the Louisville Leader.

The Welsh site has been several years in the making, and its team seriously considered using the National Library of Australia software for text correction before putting it in the ‘too hard basket’. The National Library of Wales is to be commended on making the Welsh Newspapers service free (unlike the English newspapers, which are still offered on a subscription model by the British Library).
 
The Louisville site delivers all the issues of a key African American community newspaper covering local, national, and international news, published in Louisville, Kentucky from 1917 to 1950. Unfortunately the building which housed original copies of the paper was badly damaged by a fire. The remaining issues, loaned by Kentucky State University and the widow of the publisher, were microfilmed by the University of Louisville, with the digital files created from that microfilm. The long and winding road the texts have taken toward digital representation has made them less than ideal candidates for optical character recognition (OCR), which has difficulty transcribing faded, torn, or misaligned texts, even when they are readable to the human eye. For this reason the site has enabled public transcription to help improve the accuracy and searchability of the newspaper content.
 
It’s great to see both of these new sites, and I fully understand the difficult process many libraries have gone through to get to this point, having been there and managed a newspaper digitisation project myself. I still have a particular interest in those newspaper sites which involve the public in text correction, which is another step perhaps just too challenging for many libraries to take.  After the worldwide library applause for the Australian Newspapers/Trove text correction beta five years ago, now an internationally hailed success, and the stated intent of many libraries to follow suit with public text correction, the question arises: “how many actually did?”
 
There are many libraries internationally that now offer websites to search across digitised historic newspapers. I’m not going to list all of them, just the handful that give their users the ability to correct or transcribe text. With Australian text correctors now addicted to correcting newspapers and looking elsewhere to sate their ample appetites, I thought it was time to compile a list specifically of text correction websites for historic newspapers. To the best of my knowledge there are 9 sites now.  Who will be the 10th? If I have inadvertently missed a site, please let me know in the comments. Most of the sites are for English language content, but it is interesting to see a few coming through for other languages.  As a note of interest, there were several foreign language historic newspapers published in Australia (Chinese, Greek, Hebrew, German), but these were put in the ‘too hard basket’ for the first stage of Australian Newspapers/Trove and sadly did not make it into the second stage either.  They give a very interesting perspective on sub-communities within a wider community.
 
Congratulations to all the libraries listed below who took the first difficult step to digitise and then the more challenging step to crowdsource. Happy text correcting to all the amazing people who volunteer their valuable time to help libraries make old newspapers more accessible. I hope you enjoy the list. The sites are all slightly different but work on the same general basis: they show a digitised page and ask the public to correct or transcribe the OCR text created from that page. If the OCR text is improved, then keyword searching of the newspapers is improved.  It particularly helps to correct people’s names, especially in family notices, births and deaths, since these are often the first thing that users search on.
 
List of historic/old digitised newspaper sites that offer public text correction/transcription: March 2013
US Newspapers
 
Australian Newspapers
Finnish Newspapers
Vietnamese Newspapers
Russian
 
Useful Resource:
Frederick Zarndt’s recent PowerPoint on crowdsourcing in libraries with a particular focus on newspapers:

Comments:

  1. Frederick Zarndt's presentation was one of the best I attended at RootsTech. (As usual I regretted that you aren't on Twitter, so that I couldn't give you a real-time shout-out when he mentioned you.)

  2. Great post Rose! I absolutely love the way in which libraries are going about digitising newspapers from a wide variety of years. It gives users of the libraries' content another way of accessing the information they need, whilst also creating a whole new window of discovery and understanding of changes in culture over time and the history of our nation.
    I can see you have had quite a significant amount of experience with the newspaper digitisation project for the National Library of Australia's Trove database. As an Australian myself I absolutely love Trove and can definitely see why it is so successful. Easy access and the simple-to-use search interface mean we are able to search for newspapers dating as far back as 1845 - all in the one place.
    From personal experience I can relate to the whole 'text correction/OCR' benefits that Trove has in comparison to other digitised newspaper collections. A lot of the newspapers are quite old, and it is probably quite hard to digitise them in a way that makes them totally readable. They are also very heavily text based, so the use of optical character recognition makes it so much easier for users to search the newspaper for the information they need.
