It came to my notice last week that a publisher in the UK had
digitised both the current and back issues of their magazine, the ‘Commercial
Motor’, which in its early life was a newspaper. In the cut-throat world of
publishing there are few publishers left that are still publishing the same
title they were 100 years ago, and who also have a complete set of back
copies. If they fall into this bracket
they are in the unique position of being able to either digitise the content
themselves for their readers (usually at a loss); offer it to a library to
digitise (at no cost); or sell it to a commercial e-vendor to package with
another product for academia (and make a profit). Unfortunately most choose the last option, which
makes this type of content only really accessible to academics and students via
academic libraries. E-vendors normally
charge high subscription rates for digitised magazines and newspapers and
package them up with other content, making them viable only for large
universities and national libraries to purchase, and therefore severely
restricting readership of the content.
Not many publishers are digitising their own content
because, generally speaking, the cost of preparation, digitisation and OCR, and
building a good website to deliver the content outweighs the amount of money
they would ever recoup from reader subscriptions. Normally a subscription
to the current copy would be packaged up with old copies. And that’s where the model fails, because
often current readers have no interest in the old stuff. People who do have an interest in the old
stuff are generally a different group of people – historians, researchers
etc. The exception to this rule appears
to be anything to do with hobbies such as knitting, cooking, railways, cars, and
stamps.
The Commercial Motor Archive http://archive.commercialmotor.com/
came to my notice because it is available for free for the month of October and
I wondered why. It is a rich archive going from 1905 to the present day,
covering a complete century and two world wars, well illustrated and with
everything you ever wanted to know about commercial vehicles.
The search and browse mechanism is very impressive and works
well. For example, articles on pages have
been zoned so you can search for and find an article easily within a page. It has many similarities to the hugely
popular Australian Newspapers http://trove.nla.gov.au/newspaper. The page displays alongside the OCR text to
make it easier to read. You can browse covers, browse by date, and zoom in on
pages. Results can be filtered. Users
can add comments and tags. The quality
of the OCR text, and therefore of the search, is very good.
It has one thing that was never implemented on Australian
Newspapers (though often asked for by users) which is a little box on each page
called ‘Report an error’ and this is
the reason for its free access. The site
owner is hoping that as people use and read the pages in the archive they will
report errors as they see them, and for this they get free access to the
content. The known errors that need to
be identified are incomplete articles (where the zoning has gone wrong); OCR
text errors in headlines of articles; and OCR errors in text. However, readers
can only report them, not actually fix them.
The site states:
“the archive is beta because it isn’t perfect at the moment
and there are a few glitches to be ironed out. Every article page has a
'Noticed an error?' button you can use to report a problem. Please don’t expect
an immediate change to the error - we gather all the reports together and
prioritise them, fixing the most pressing errors first.”
It sounds like they don’t know how many errors there are,
how many people will report errors, or who will fix the errors and
how. Interesting.
Although the ‘report an error’ button was never implemented
on Australian Newspapers (it was mainly needed to report upside-down and
duplicate pages), it had already been decided that a ‘super user’ would be the
person to review these reports and take action.
In a world where some volunteer text correctors wanted to take on extra
responsibility and have special roles, like the hierarchy in the Wikipedia
editors’ community, this would have been a good thing for trusted volunteers to
do.
The Commercial Motor Archive has impressed me because they
are clearly striving for perfection, they understand that the fewer mistakes there
are, the better the search will be, and they have taken a brave step by asking
the public to help in return for free access.
This is indeed unusual for a commercial publisher; it belongs more in the
realm of libraries and archives, where it is referred to as crowdsourcing…