When I was working as Manager of the Australian Newspapers Digitisation Program, I led the development of two pieces of software: the Digital Newspaper Content Management System and the Australian Newspapers Delivery Service. Quite early in the development of the public delivery system, the National, State and Territory Libraries raised concerns about the poor quality of the newspaper OCR text, which negatively affected public searching. In a team brainstorming session we came up with the idea of opening up the OCR text for public correction and improvement, and we developed our own software to do this. Because we knew that what we were doing was quite groundbreaking at the time and might be very useful for other archives and libraries, and because no ‘off the shelf’ software existed to do this, we decided to share the code as open source. The code was developed by Kent Fitch, Director of Project Computing and System Architect for the National Library of Australia, ably assisted by Ninh Nguyen, a programmer at the National Library of Australia. They created a fantastic service for Australians. The public interface for newspaper delivery and text correction was designed, developed and tested on real users in a matter of weeks by Alexi Paschalidis, Creative Director of Oxide Interactive.
Over the course of the next two years I was approached by about 20 national libraries from around the world asking if they could have the code. Although we said yes every time, I was disappointed that, to the best of my knowledge, no library actually implemented public text correction by using or adapting our open-source software as we had hoped. This seemed to be mainly because other libraries were still very unsure about “allowing” user edits, crowdsourcing, public interaction, or whatever you want to call it. Giving users freedom over data, rather than retaining tight control, seemed to be a daunting prospect for libraries and a step they didn’t want to take. Several commented that they would like more time to observe whether our user activity was a ‘fad’ before committing to doing the same. For this reason, as far as we were aware, only three libraries even got to the stage of having their IT support download the open-source code from the National Library of Australia’s ‘LibraryForge’ website. True, it would have been quite difficult for a library to implement and adapt our code to hook into a content management system, but this did not seem to be the reason for the low uptake. We never shared the code for the newspaper content management system, because our philosophy was to share code only once the product was ready and usable, and the content management system was under constant development for three years. Now, after four years, the National Library of Australia has decided to remove the open-source code for Australian Newspapers from its download site.
However, all is not lost for libraries wanting software for text correction or for delivering newspapers and journals: I was recently alerted to the fact that a New Zealand company has developed software that replicates the text correction functionality of Australian Newspapers. DL Consulting is selling a product called Veridian, which has already been installed by some US libraries. The company has a background in newspaper software development, having been involved in a very early open-source digital library product called ‘Greenstone’, which was used to deliver Maori Newspapers from the University of Waikato back in 2002. I remember the service well from my time working at the University of Auckland. Its full-text search access was groundbreaking for the Maori, history and research communities.
Kent Fitch has continued to work for the National Library of Australia on system architecture, and went on to lead the development of Trove and integrate the Newspapers service into it. Kent has worked as a programmer for over 30 years. Since 1982 he has been a principal of the Canberra software development company Project Computing Pty Ltd. Kent has developed many commercial systems, communications packages and custom software for a wide range of clients. In the past ten years his work has focused on library-related systems, including AustLit, NLA Newspapers Digitisation and Trove.
Last week Kent gave a presentation to the Australian Computer Society called ‘Scaling up: the technology behind the NLA's newspaper digitisation and the Trove search service’, in which he describes some of the technical aspects in more detail.