The National Archives of Australia (NAA) has taken a bold
step into the cultural heritage crowdsourcing arena with ‘The Hive’, which was released
two weeks ago. The brand makes a clever play on the word ‘Archive’ combined
with the idea of a hive of working bees (the public). The site encourages the public to transcribe
archive records.
Early this year, when David Fricker became Director General
of the NAA, he was quick to encourage staff to think innovatively, embrace
change, and harness opportunities such as crowdsourcing to improve access to
our collections. He spoke publicly in favour of crowdsourcing and a changing business model
for archives at the International Council on Archives Congress in August:
“Another key development in expanding access is
crowdsourcing. As many of us are now seeing, by allowing the public to
contribute to the description of archival resources we are enhancing the
ability of future generations to discover and learn from our archives. I also
think it is a wonderful opportunity for the public to be more engaged with us
as archives and to share in the work we do – preserving the memories of our
nations. There is still some work to do here, in order to maximise the value of
contributions and to maintain the integrity of our archives as authentic and
accurate. However, I do not believe these problems are insurmountable, and
indeed I believe these systems can to some extent be self-correcting.
This is a type of the co-design, citizen first activity…
drawing on the interest and enthusiasm of the community to bring more of our
archives into view – discoverable and retrievable…Access will be online and
everywhere, improved by rich new data visualisation techniques and expanded
descriptive contributions from an engaged citizenry”.
The Hive is the Archives’ pilot and experiment in the
potential of large scale transcription crowdsourcing to improve access to
records. Staff have looked closely at
other crowdsourcing sites and attempted to build on their knowledge
and techniques to provide a site that could be used as a large scale platform
for a variety of transcription crowdsourcing projects.
At present the site offers just over 800 lists for the
public to transcribe. Some of these are typed and some handwritten. They are rated in difficulty as easy, medium
or hard. Part of the difficulty with
this project is that the public need some understanding of how archives
receive and describe their records to make sense of what they are being asked
to do. In simple terms, archives receive
vast amounts of records in batches (referred to as consignments). Each consignment comes with a list of the
items in it. However, because of the
large volume of records being received, it is usual that only the consignment
record is entered into the catalogue, e.g. ‘100 boxes of plans and drawings’
from x government agency, rather than all the individual items on the consignment
list being described in the catalogue. The
ideal scenario for users of the archives is that every item, e.g. each plan and
drawing, is described in the catalogue so that it can be found. Without this, a lot of guesswork goes into
finding relevant things, or alternatively personal visits are required to view
the hard copy consignment lists.
The project that the Archives is undertaking
is to digitise consignment lists and then make them available for transcription
by the public. Once transcribed, they become searchable and the items within
them can be found more easily. Because
so many of the lists are old and handwritten, it is virtually impossible to get
good OCR on them. That’s where the
public come in: they can read the lists by eye. The public’s time is also
needed to speed up access. Projections of the time it would take
archives staff to describe the lists without public help currently stand at 210
years. It is anticipated that a member
of the public could with relative ease describe several hundred items per hour
with the Hive tool, which would make a big difference, especially if there were
a swarm.
The consignment lists in the pilot are those that have
proved most popular with researchers and contain items in the ‘open period’,
that is, older than 30 years and now open to the public. The top interest is lists of architectural
drawings and historic buildings. This is closely followed by PNG patrol officer
records, maritime incidents, personal records from the war office, prisoners of
war, meteorology and cyclones, WW1 intelligence, and oil drilling on the Great Barrier Reef.
In the first two weeks, 300 of the 800 records have been
transcribed. There is a definite preference for the lists rated hard (handwritten)
and ones that involve names.
The site is well presented and gives volunteer transcribers
things we know they want, such as a progress chart, recent activity, a points
scoring system, rewards, optional login using OpenID (e.g. their Google ID),
the ability to search and choose items or just take the next one served up, to
pick easy or difficult items, to add a marker for where they got to if they are
interrupted, and to favourite records.
The only slight drawback is the placing of the transcription window at
the bottom of the screen rather than to the right or left, which often means it is
hard to see the transcription window and the content you are transcribing at
the same time. Also, the OCR text in the transcription window and the cursor are
not linked directly to the text in the image, so it is sometimes easy to get lost whilst
transcribing. This is largely
because most of the lists are in tables, and the table rows and columns have
not been retained in the OCR, so the OCR is somewhat muddled. Further development of the site will largely
depend on feedback given by the public users, and on the ability of the Archives
to keep up a steady supply of new, interesting digitised consignment lists to
the Hive. The Archives is still considering
how it may be able to integrate the public content back into its main catalogue,
RecordSearch, or integrate the Hive into RecordSearch. In the meantime the list
content will remain searchable in the Hive.
The Archives obviously expects that
making its content more discoverable will lead to more access requests. This is why, at the point of transcription, there
is a button which enables the user to request a copy of the item. These requests are being met by digitising
the item and then uploading it into the main catalogue, ‘RecordSearch’, with
the full item description.
I congratulate the National Archives of Australia Access
Team on the development of this exciting new site, which holds so much
potential to improve access to records and engage with our citizens in new
ways.
The screenshots below show the site in action:
Medium Level Transcription - ABC Drama Scripts
Difficult level transcription - Plans
Thanks for this, Rose. I'd seen the program announced but hadn't had time to take a look at it, so the screenshots are invaluable.
Do you happen to know what software platform they're using? It looks like it may be a heavily styled version of MediaWiki with the plug-ins used for Wikisource, but I'm not sure.
Also, when are you going to join Twitter?
Wow! It’s so nice that the National Archives have decided to employ crowdsourcing to successfully transcribe these documents. It’s one way to be able to digitize and preserve these documents. Just a suggestion maybe is to store them in a media vault once they’ve saved and the transcriptions are ready for storage, in order to make sure that they won’t get destroyed over time.
Ruby Badcoe