Comments on Rose Holley's Blog - views and news on digital libraries and archives: Software for journal and newspaper text correction
Rose Holley - Digital Library Specialist

2012-03-11T15:10:50.175-07:00

Dear Hsien-min, I have three things to suggest, and two of them would require a short-term programmer to help you:

1) Utilise reCAPTCHA. reCAPTCHA is free, but you would need some programming help to be able to share your text with the programme and then feed the changes back into your system. reCAPTCHA is used by all sorts of services, so the people using it are unaware they are doing text correction for your project. At the moment Google Books and the historic versions of the New York Times are using it on their OCR text.

2) Use some open source software for text correction and integrate it into your system with the help of a programmer. The software I am aware of includes that from the National Library of Australia, SCRIBE from Galaxy Zoo (aimed more at handwritten text, though it could be adapted), and Wikisource. I'm not sure whether the IMPACT (Improving Access to Text) European project has finished the open source software they were developing for text correction yet, or how it stacks up against other software.

3) Contact Wikisource Transcription Projects to see if they want to set up a project to help you: http://en.wikisource.org/wiki/Wikisource:Transcription_Projects They have only been transcribing books so far, so I'm not sure what their capability is for newspapers.
It may be best to talk to your local chapter first.

Hope that helps.
Rose Holley - Digital Library Specialist

2012-03-07T13:04:44.200-08:00

Thanks for the post. The New Brunswick Free Public Library in New Jersey, USA has 51 years' worth of digitized newspapers, but most of it has very poor OCR results. The vendor we use cannot correct those texts, and we do not have the budget to pay for Veridian. Do you have any suggestions for us? Thanks.
Hsien-min Chen,
Principal Librarian
Posted by Rey-Rey Chen