Friday, 29 August 2014

Audiovisual achievements - National Archives of Australia


I always get a sense of achievement from a job well done, and this week the audiovisual IT project I have managed at the National Archives of Australia (NAA) over the last 2 years has reached fruition – on time and under budget, which makes the achievement even better. 
The project was a big one, costing several million, and involved the implementation of both an audiovisual asset management system and an audiovisual digital preservation system. It has been a long-held ambition of the NAA to achieve these two goals. The concept crystallised into a firm plan in 2006. Implementation commenced in 2012 and the project became the highest strategic objective of the NAA for the next two years, involving approximately half of the 400 NAA staff in some capacity. The Chester Hill office in Sydney took the lead because this office is the centre of expertise for audiovisual collections. I feel fortunate to have had the opportunity to work with such a fantastic and knowledgeable group of people.

The project, which is known internally at NAA as ‘AVAMS’ (audiovisual asset management system project), and its achievements are described in more detail in my AVAMS presentation available on SlideShare.

The software chosen and implemented is Mediaflex, from a UK-based company called TransMedia Dynamics. The National Archives is the second Archives client to install the Asset Management Software as the Collection Management System for both physical and digital audiovisual assets, and the first client in the world to install the Mediaflex digital preservation platform. Other Australian clients include the National Film and Sound Archive, and DAMsmart, an audiovisual digitisation contractor.
 
The project has been important to the NAA for two reasons: firstly, audiovisual is a significant part of the collection, amounting to nearly 1 million items, and secondly, there is a need to increase capability and capacity to ingest born digital audiovisual material from transferring agencies. One of the main agencies transferring audiovisual material to the NAA is the Australian Broadcasting Corporation (ABC), which creates all radio and TV programs digitally now and has done so for some time. Because older parts of the NAA audiovisual collection are analogue, and these formats deteriorate quickly, there has been an active and ongoing NAA audiovisual digitisation program to convert analogue to digital formats for at least the last 10 years in state-of-the-art digitisation labs onsite at Sydney.
This YouTube video gives a small glimpse behind the scenes at the Sydney Office, and a sample of a very deteriorated analogue film, now digitised, is available to view on YouTube in 'a cautionary tale'.
For these two reasons the NAA already holds a sizable store of digital AV assets. These are now being migrated into the digital preservation system ‘the AV Digital Archive’, which will replace the previous rather clunky and very slow system that was based on a system backup procedure.  It will give increased surety that important digital assets are secure and preserved into the future. It is a giant leap forward to have a robust and easy to use digital preservation system.  The screenshot below shows the console that an archivist would use to manage the digital preservation copies.  The traffic light system is particularly easy to use.






The requirements to manage audiovisual digitisation workflows, storage of physical items, ingest and digital preservation of items are much more complex than those for other format types such as photographs or paper. In order to better manage and search collection items, a multi-layered data model is needed. It is usual for libraries to have 3 data layers, and for archives to have 5 or 6 for paper formats; in the case of audiovisual, however, the ideal data model has 12 layers. This is the new model that has now been implemented at the National Archives. It has caused great excitement for those who understand the complexities of audiovisual metadata and realise the benefits this will bring long-term to the management of the collection. However, it has been a steep learning curve for staff to become familiar with the audiovisual data model.
 
An ambition for Archivists has been to expose more of the audiovisual collection to public searchers, because at the moment for various reasons it is largely invisible.  The new data model means that can be changed and improved.  In addition Mediaflex can automatically create low resolution digital access copies on the fly, which brings the potential to make more of the collection digitally available.  There is still more work to be done in this area since RecordSearch is remaining the front end for public searchers for the foreseeable future.  Therefore a fair amount of configuration work has already been undertaken to enable exchange of metadata between the Audiovisual Asset Management System Mediaflex and RecordSearch.   
An immediate benefit that Mediaflex has brought is the ability to manage storage of audiovisual items much better. These items require repositories of different temperatures, e.g. cold and cool, with conditioning rooms in between for the gradual movement of items to room temperature for access or digitisation. In addition there are a variety of different shelving configurations for different sizes and types of items. Mediaflex allows the management of all this and, in addition, offers 'capacity management': a visual interface shows where spare space is and how full shelves are in real time. This really helps to micro-manage over 30 km of audiovisual repository space in multiple locations.

It is rewarding to see how the project achievements (the implementation of an audiovisual asset management system and a digital preservation system) are already having positive benefits for the NAA. As I reflect on the last 2 years (which feel as if they have passed in the blink of an eye) I attribute the success to the fantastic project team members at both the NAA and TransMedia Dynamics, as well as to the NAA making the right choice of software. The core project teams contributed their audiovisual expertise and worked diligently under my direction with enthusiasm and total commitment towards the end result. There is no doubt it was challenging at times, but everyone rose to the challenge with tenacity, determination and persistence.
The National Archives of Australia is now strongly and ably positioned in the audiovisual digital arena. It has the capability to undertake its core business much better, as well as do groovy and amazing things with the new software. It’s very unfortunate that the current tight fiscal constraints may now hamper the capacity of the NAA to take up the new benefits as quickly as it would like, but I am assured it will happen in time. This project achievement has boosted the confidence of the National Archives of Australia and is indeed a job well done!
 Mediaflex in use in the sound preservation lab at National Archives of Australia, Sydney Office.


Tuesday, 2 April 2013

Crowdsourcing text correction and transcription of digitised historic newspapers: a list of sites


Last month two new websites were launched giving the public access to digitised historic newspapers.  The release of a new ‘old’ digitised newspaper site is becoming a regular monthly occurrence now, with a library somewhere in the world completing a newspaper digitisation project with astonishing regularity, after what seems like such a long wait. 

The two new sites were Welsh Newspapers Online and the Louisville Leader.

The Welsh site has been several years in the making, and the National Library of Wales seriously considered using the National Library of Australia software for text correction before putting it in the ‘too hard basket’. The National Library of Wales is to be commended on making the Welsh Newspapers service free (unlike the English newspapers, which are still offered under a subscription model by the British Library).
 
The Louisville site delivers all the issues of a key African American community newspaper covering local, national, and international news published in Louisville, Kentucky from 1917 to 1950. Unfortunately the building which housed original copies of the paper was badly damaged by a fire. The remaining issues, loaned by Kentucky State University and the widow of the publisher, were microfilmed by the University of Louisville, with the digital files created from that microfilm. The long and winding road the texts have taken toward digital representation has made them less than ideal candidates for optical character recognition (OCR), which has difficulty transcribing faded, torn, or misaligned texts, even when they are readable to the human eye. For this reason the site has enabled public transcription to help improve the accuracy and searchability of the newspaper content.
 
It’s great to see both of these new sites and I fully understand the difficult process many libraries have gone through to get to this point, having been there and managed a newspaper digitisation project myself. I still have a particular interest in those newspaper sites which involve the public in text correction, which is another step perhaps just too challenging for many libraries to take. After the worldwide library applause for the Australian Newspapers/Trove text correction beta five years ago, now an internationally hailed success, and the stated intent of many libraries to follow suit with public text correction, the question arises: “how many actually did?”
 
There are many libraries internationally that now offer websites to search across digitised historic newspapers and I’m not going to list all of them, just the handful that give their users the ability to correct or transcribe text. With Australian text correctors now addicted to text correction of newspapers and looking elsewhere to sate their ample appetites, I thought it was time to compile a list specifically of text correction websites for historic newspapers. To the best of my knowledge there are 9 sites now. Who will be the 10th? If I have inadvertently missed a site, please let me know in the comments. Most of the sites are for English language content but it is interesting to see a few coming through for other languages. As a note of interest, there were several foreign language historic newspapers published in Australia (Chinese, Greek, Hebrew, German) but these were put in the ‘too hard basket’ for the first stage of Australian Newspapers/Trove and sadly did not make it into the second stage either. They give a very interesting perspective on sub-communities within a wider community.
 
Congratulations to all the libraries listed below who took the first difficult step to digitise and then the more challenging step to crowdsource. Happy text correcting to all the amazing people who volunteer their valuable time to help libraries make old newspapers more accessible. I hope you enjoy the list. The sites are all slightly different but work on the general basis of showing a digitised page and asking for public correction/transcription of the OCR text created from that page. If the OCR text is improved then keyword searching of the newspapers is improved. It particularly helps to correct people’s names, especially in family notices, births and deaths, since these are often the first thing that users search on.
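To illustrate why corrected OCR text improves keyword searching, here is a minimal sketch in Python. The page texts, the page ids and the `keyword_hits` helper are all made up for illustration; none of the sites listed necessarily works this way internally.

```python
import re


def keyword_hits(pages, keyword):
    """Return the ids of pages whose text contains the keyword as a whole word."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [page_id for page_id, text in pages.items() if pattern.search(text)]


# Hypothetical raw OCR output: "Smith" misread as "Srnith", "Births" as "Birtbs".
raw_ocr = {
    "page-1": "Birtbs. SRNITH. On the 3rd inst., the wife of Mr. J. Srnith, of a son.",
    "page-2": "Shipping news. Arrived the barque Mary Ann from London.",
}

# The same pages after a volunteer has corrected the family notice.
corrected = dict(raw_ocr)
corrected["page-1"] = "Births. SMITH. On the 3rd inst., the wife of Mr. J. Smith, of a son."

print(keyword_hits(raw_ocr, "Smith"))    # [] : the notice is invisible to search
print(keyword_hits(corrected, "Smith"))  # ['page-1']
```

A family historian searching for "Smith" finds nothing until the notice is corrected, which is why names in births, deaths and marriages are such valuable lines to fix first.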
 
List of historic/old digitised newspaper sites that offer public text correction/transcription: March 2013
US Newspapers
 
Australian Newspapers
Finnish Newspapers
Vietnamese Newspapers
Russian
 
Useful Resource:
Frederick Zarndt’s recent PowerPoint on crowdsourcing in libraries with a particular focus on newspapers:

Monday, 18 March 2013

The Australian National Cultural Policy 2013 released: an overview of ‘Creative Australia’ for GLAMs (galleries, libraries, archives, museums).



After a much longer than anticipated wait Minister Simon Crean announced the release of the Australian National Cultural Policy on 13-3-13, the week of Canberra’s Centenary celebrations. The Policy named ‘Creative Australia’ is a weighty 150 pages, though happily has an online summary and search feature.

The question that Australian libraries, archives, museums and galleries will be asking is “Does the National Cultural Policy deliver all that we hoped it would for GLAMs, and how far will it help or drive forward the challenges surrounding the digital agenda?”

Back in January 2012 I wrote a post explaining what the purpose of the National Cultural Policy was intended to be, and summarised the feedback that the National Cultural Institutions had provided against the Draft Policy in October 2011. I followed this post with another which explained in more detail the Digital Deluge Challenges that GLAMs had raised in their feedback, with possible resolutions that they would like addressed in the National Cultural Policy. It was widely hoped by National Cultural Institutions such as the National Library of Australia, the National Archives of Australia, and the National Film and Sound Archive that the Policy would provide extra or contestable funding to help with the challenges of digitising, collecting born digital, and delivering collections digitally, and would address the legislation surrounding these activities. At that point Simon Crean had indicated the Policy would be released in March 2012 and would have considerable funding associated with it. However, due to constraints in Government funding, the release was delayed, since Crean said there was no point in releasing a policy which did not have the funding to back it up. This further fuelled the expectations of the GLAM sector that the policy might release significant extra funding to them.

So does the National Cultural Policy help GLAMs deal with the digital challenges? The answer in a nutshell is “not really”. The Policy is much more focused on fostering the creation of new digital cultural and artistic content than on collecting or curating it. However, there are a few exceptions which I will highlight below.

As Crean had hinted the Policy comes with considerable funding - $235 million to be exact. However the lion’s share of this (over $75 million) goes to reforming the Australia Council. Crean says:

"The Australian Government will immediately implement structural reforms to the Australia Council. These are the most significant since its creation 40 years ago at a time when the arts were only beginning to realise their potential. I will be introducing new legislation into Parliament next week, which will be backed by an investment of $75.3 million in new funding for the Australia Council over four years. The Australia Council will be a more responsive funding body with a clear mandate to support and promote a vibrant and distinctively Australian creative arts practice, and have a new emphasis on independent peer-assessed grants to recognise and build artistic excellence.”

A summary breakdown of the funding as given in Crean’s press release is below:


 

The Policy has five goals and the funding is intended to be targeted to attain the goals.

Goal 1: Recognise, respect and celebrate the centrality of Aboriginal and Torres Strait Islander cultures to the uniqueness of Australian identity.

Goal 2: Ensure that government support reflects the diversity of Australia and that all citizens, wherever they live, whatever their background or circumstances, have a right to shape our cultural identity and its expression.

Goal 3: Support excellence and the special role of artists and their creative collaborators as the source of original work and ideas, including telling Australian stories.

Goal 4: Strengthen the capacity of the cultural sector to contribute to national life, community wellbeing and the economy.

Goal 5: Ensure Australian creativity thrives here and abroad in the digitally enabled 21st century, by supporting innovation, the development of new creative content, knowledge and creative industries.

The relevant parts of the National Cultural Policy for GLAM are:

Digitising collections:

The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) finally gets a good chunk of money. They’ve been given $12.8 million for the digitisation of their Indigenous collections. This can potentially go a long way if they set up mass digitisation processes such as the National Library did. At the National Library $10 million digitised 50 million items. But if mass digitisation was not in place this money would likely only cover digitisation of up to 1 million paper items, fewer if it was AV.
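The gap between those two scenarios is easier to see as a cost per item. The following is a back-of-envelope sketch in Python using only the figures quoted above; it assumes (which may well not hold) that AIATSIS items would cost about the same to process as the National Library's, and the variable names are purely illustrative.

```python
# Figures quoted in the post (Australian dollars).
NLA_BUDGET = 10_000_000
NLA_ITEMS = 50_000_000
AIATSIS_BUDGET = 12_800_000
NON_MASS_ITEMS = 1_000_000  # the post's upper estimate without mass digitisation

# Cost per item under each scenario.
mass_rate = NLA_BUDGET / NLA_ITEMS               # National Library's mass process
non_mass_rate = AIATSIS_BUDGET / NON_MASS_ITEMS  # item-by-item digitisation

# Items the AIATSIS grant would cover at the National Library's rate
# (integer arithmetic avoids floating-point rounding).
items_at_mass_rate = AIATSIS_BUDGET * NLA_ITEMS // NLA_BUDGET

print(f"Mass digitisation:   ${mass_rate:.2f} per item")      # $0.20 per item
print(f"Without mass set-up: ${non_mass_rate:.2f} per item")  # $12.80 per item
print(f"Items at mass rate:  {items_at_mass_rate:,}")         # 64,000,000
```

On these rough numbers the same grant stretches about 64 times further with a mass digitisation process in place.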

Collecting born digital:

The Cultural Policy signals the intent of the Government to finally amend the Copyright Act 1968 to give the National Library of Australia the right to collect digital as well as hard copy published items. This is known as legal deposit. Digital legal deposit would cover content on Australian websites as well as e-books and blogs. The National Library has been campaigning for years without success to change legal deposit to include digital, so this statement of intent is a positive step forward. There is still no timeframe around the legal change and it’s likely to take some time. The Australian Law Reform Commission is reviewing copyright exceptions for the digital environment. The copyright inquiry is being led by Professor Jill McKeough. An issues paper was released in August 2012. A discussion paper is likely to be released later in 2013 with another call for responses from interested parties such as publishers, content developers and collecting institutions who commented last time round.

Crean has also stated:  "We will also work to develop a new legal deposit scheme for the National Film and Sound Archive of Australia to collect and preserve Australian audio-visual material."

This is all good but raises some questions on the specific roles and potential overlap of functions of the National Archives of Australia, the National Library of Australia and the National Film and Sound Archive. The National Library has been collecting Government websites for some time now, but this is actually a core role of the National Archives. The National Film and Sound Archive collects commercial and non-commercial AV content, whilst the National Archives collects material from Government broadcasters such as the ABC. Interestingly, although the National Library and the National Film and Sound Archive get several mentions in the policy, the National Archives hardly does. This may be due to the much stronger, more detailed responses the NLA and NFSA sent in to the draft policy.

Dealing with the Digital Deluge vs. Physical:

The cultural collecting sector clearly stated that they were appreciative of the money given to them by Government each year to build, manage and maintain their collections. The Policy states, as the draft policy did, how much this is for 2012/2013:

  • National Archives of Australia $62.6 million
  • National Library of Australia $59.6 million
  • National Gallery of Australia $46.4 million
  • National Museum of Australia $42.9 million
  • National Film and Sound Archive $26.9 million
  • Australian National Maritime Museum $23.9 million
The Policy also states:

“The Australian Government remains committed to ensuring the National Collecting Institutions can continue to facilitate access to their collections and programs. The Government also remains committed to the digitisation of the collections to preserve them for future generations and provide access to a range of culturally significant material”.

However, although the money sounds considerable, much more is required to address the digital challenges. The analogue/physical collections are not decreasing or requiring less management, but the digital is increasing exponentially. Collecting institutions do not have the infrastructure they need to deal with it and make it accessible. The Policy does not address this issue at all, though it acknowledges in the Appendices that collecting institutions raised it.

The Policy actually helps to significantly increase the amount of digital cultural artefacts that will be created and therefore require collecting, particularly in audio-visual broadcasting. Large sums of money will target creation of more audio-visual content from Screen Australia and SBS. This will no doubt exacerbate the digital deluge problem for the National Film and Sound Archive, the National Archives and the National Library.

Searching and engaging with collections and content:

The Policy waxes lyrical about Trove, the search and user engagement service developed by the National Library of Australia (which, coincidentally, I managed from 2008 to 2012), even going as far as calling it a “golden moment for the cultural economy, as the historic obstacles of distance and the size of the local market disappear.” This is all very nice and good patting on the back stuff, but no money is provided to ensure that the ‘moment’ can be sustained and the collaborative service can continue or be developed. I’m not sure if the Minister was aware that the development work on the service all but ceased in 2011 when the National Library made a decision to divert its priorities elsewhere.

National Collaboration and Networks:

An action in the policy is to “Establish a national network for museums and galleries to be managed in partnership between the National Museum of Australia and Museums Australia. The Network will work to share resources and improve access to collections across Australia, to assist industry, researchers and the public.”

I’m not quite sure what the intent of this is: whether it is a collaborative network between museums, a digital network, a shared discovery service like Trove, or simply a replacement for Collections Australia Network (CAN), which has had its funding entirely pulled on more than one occasion.

The expectation was that GLAMs would be required to work more closely and collaboratively with each other to achieve their aims and pool resources, particularly for digitisation and digital discovery/access, but this is not mentioned in the Policy. There has not been a natural propensity for Australian GLAMs to communicate, collaborate, or share openly in a formal or informal way before, so although it could be done without a policy, there was an expectation that a Policy would drive it. Within each specific sector there are good networks, especially for libraries, but cross-sector there is still some resistance to focusing on similarities rather than differences.

Leverage:

Only time will tell if the National Cultural Policy can be used as leverage to assist the work of GLAMs, or whether it is just another document/file to be put in the ‘recycle bin’. Its intended life span is 10 years, and most of the initial funding covers a 3-4 year time period. With a government election taking place this year and bets being placed on a change of government, we will have to wait and see whether it can hold its own in the years ahead.

Saturday, 26 January 2013

Freedom, Openness and Datasets: An Australia Day View


Today, January 26th, is Australia Day. This means everyone is having a day off work, and in this ‘free’ time we can reflect on how lucky we are to live in our nation and celebrate this. The benefits and privileges of living in Australia are summed up by always having a sense of freedom and openness. This comes not just from the physical landscape, the big wide open red desert spaces and blue sky, but from the day to day experience of living, and the rights Australians have.

I was very interested to read some new research last week which set out to rank countries on their level of ‘Freedom’ and give them a score out of ten. The research is published in the book ‘Towards a Worldwide Index of Human Freedom’, which was released on 8 January 2013 by the Fraser Institute. Chapter 3, by Ian Vásquez and Tanja Štumberger, presents ‘An Index of Freedom in the World’. Freedom is looked at in four areas: Security and Safety; Freedom of Movement; Freedom of Expression; and Relationship Freedoms. The authors say:

 “We have tried to capture the degree to which people are free to enjoy the major civil liberties—freedom of speech, religion, and association and assembly—in each country in our survey. In addition, we include indicators of crime and violence, freedom of movement, and legal discrimination against homosexuals. We also include six variables pertaining to women’s freedom that are found in various categories of the index”.

The categories in detail are:

I. Security and safety
  A. Government’s threat to a person
    1. Extrajudicial killings
    2. Torture
    3. Political imprisonment
    4. Disappearances
  B. Society’s threat to a person
    1. Intensity of violent conflicts
    2. Level of organized conflict (internal)
    3. Female genital mutilation
    4. Son preference
    5. Homicide
    6. Human trafficking
    7. Sexual violence
    8. Assault
    9. Level of perceived criminality
  C. Threat to private property
    1. Theft
    2. Burglary
    3. Inheritance
  D. Threat to foreigners

II. Movement
  A. Forcibly displaced populations
  B. Freedom of foreign movement
  C. Freedom of domestic movement
  D. Women’s freedom of movement

III. Expression
  A. Press killings
  B. Freedom of speech
  C. Laws and regulations that influence media content
  D. Political pressures and controls on media content
  E. Dress code in public

IV. Relationship freedoms
  A. Freedom of assembly and association
  B. Parental authority
  C. Government restrictions on religion
  D. Social hostility toward religion
  E. Male-to-male relationships
  F. Female-to-female relationships
  G. Age of consent for homosexual couples
  H. Adoption by homosexuals

The country which has the best freedom in the world and comes top in the Freedom Index is New Zealand. Australia comes 4th and the UK 18th out of 123. The table below shows the top countries. (Scores out of 10)




I feel lucky to have lived in three of the top ranked countries. Based on my own experience I think the rankings of New Zealand, Australia and the UK are right.

The countries which lack freedom and are at the bottom are Zimbabwe 123rd; Burma/Myanmar 122nd; Pakistan 121st; Sri Lanka 120th; and Syria 119th. We feel for their citizens, who often feature in our TV news. The extract of bottom countries is below:



The report is fascinating and I suggest you read it. You might be wondering why I think this study has any relevance for librarians or archivists. Being a librarian I most commonly associate Freedom with ‘Freedom and Openness of Information’. I was originally reading the study to see how Freedom of Information or Open Government had been scored and ranked. However, this was not included in the study, perhaps because it wasn’t thought of, or because it was simply too hard.

It follows that if a country is very free then a lot more information will be generated both commercially and by the Government. This is likely to be in the public sphere at time of creation and then remain in the public sphere when it gets passed on/purchased/made accessible by National Archives, Libraries and Research Institutions. 

Where information is not already publicly accessible, countries with a high Freedom Index score generally have Freedom of Information (FOI) Acts, which enable members of the public to request to see information. Of the countries discussed here, the USA was the first to pass an FOI Act, in 1966. Australia and New Zealand followed in 1982, and the UK finally passed its FOI Act in 2000.

Most of the top ranked countries in the Freedom Index are involved in a movement known as ‘Open Government’ which started in about 2009 and basically builds on the Freedom of Information Act principles. Open Government aims to make a concerted effort to release reports, research, statistics and data sets into the public domain and be transparent; to involve the citizens of the country in decision making based on the fact they would have equal access to the same information as policy decision makers; AND for citizens to help with information creation, collation, dissemination and interpretation.

In June 2009 the British Prime Minister Gordon Brown announced that Tim Berners-Lee (inventor of the World Wide Web) would work with the UK Government to help make data more open and accessible on the Web in the UK, building on the work of the Power of Information Task Force.

On his first day in Office in January 2009 Barack Obama issued a Memorandum on Transparency and Open Government, instructing the Director of the Office of Management and Budget (OMB) to issue an Open Government Directive, which would direct agencies to take specific actions regarding transparency, participation, and collaboration.

In Australia in 2009 the Government 2.0 Taskforce recommended that Australia should have an Open Government. The Australian Declaration of Open Government was made in 2010.

At this time I had a particular interest in the Australian Declaration because it was relevant to me in my day to day work at the National Library of Australia. It said among other things:

“Collaboration with citizens is to be enabled and encouraged. Agencies are to reduce barriers to online engagement, undertake social networking, crowd sourcing and online collaboration projects and support online engagement by employees…”

In 2011 the New Zealand Government made a Declaration of Open Government.

After these dramatic declarations by the USA, Australia and New Zealand, President Obama took little time to try and influence the world. In September 2011 he formed the ‘Open Government Partnership’ (OGP) and 8 governments joined: Brazil, Indonesia, Mexico, Norway, Philippines, South Africa, United Kingdom, and United States (but not Australia or New Zealand). To become a member of the OGP, participating countries must do three things:

  • embrace an agreed high-level Open Government Declaration
  • deliver a concrete action plan, developed with public consultation
  • commit to independent reporting on their progress going forward

At last check 60 countries had joined, with 47 having delivered action plans and 13 working on them. However, Australia and New Zealand are not members. Obviously it is much easier said than done to actually implement Open Government. Pia Waugh, an Australian expert on Open Government, has given many talks on the subject; to read more about the challenges and what it really means, check out her 2011 blog post ‘Open Government: What is it really?’

The UK is notably now amending its Freedom of Information Act in consultation with the public, to take into account the opening up of data sets.

Perhaps the Freedom Index had trouble ranking Open Government, so how would you do it?
Interestingly last week Craig Thomler reported in a blog post that he had attempted to rank countries by comparing the number of open data sets they had released through their national government open data sites.  He has relied on the ‘open data’ provided on the USA Open Data site to do this and notes that the results are a bit dubious. Data.gov lists 41 countries as having open data websites, out of almost 200 countries.

Government Open Data sites include:

  • Data.gov (USA) http://www.data.gov/
  • Data.gov.uk (UK) http://data.gov.uk/data
  • Data NZ http://data.govt.nz/
  • Data (Australia) http://data.gov.au/

The ranking results of countries providing Open Data via Government Data Sites in January 2013 are:

1. US (378,529 data sets)

2. France (353,226)

3. Canada (273,052)

4. Denmark (23,361)

5. United Kingdom (8,957)

6. Singapore (7,754)

7. South Korea (6,460)

8. Netherlands (5,193)

9. New Zealand (2,265)

10. Estonia (1,655)

11. Australia (1,124)

Is it really right that Australia is 11th? Perhaps not, because this is not the big picture. It is wrong to assume that all data sets are created by Government (although of course a lot are). Many more are created by researchers in academia and by commercial companies. Geospatial and mapping data is a good example of this. For example, if I was looking for Open Data Sets in Australia there are at least 8 portals I know of where I could look, as well as many more individual sites that offer their own data sets. The portal sites listed below either publicly list or actually make available Australian data sets.
 
Australian Data Set Portals
Number of data sets included as at 26 January 2013
 
53,000:  National Library of Australia Trove Service, mostly from the University sector

31,000:  Research Data Australia, from the Academic and Research sector

1,124:    Data (Department of Finance), from Federal Government Agencies.

466:      Atlas of Living Australia, from Research Institutes

250:      Data.nsw.gov.au, from State Government Departments

167:      Data.vic www.data.vic.gov.au Victoria, from State Government Departments
 
78:        Data.qld.gov.au, from State Government Departments

72:      DataACT www.data.act.gov.au, from State Government Departments

This takes the total figure of Australian open data sets to between 50,000 and 94,000 depending on the duplication, if any, between these sites, and possibly moves us up to fourth position in the rankings.  Duplication… that makes me want to put my librarian hat on again.  Wouldn’t it be good if the Australian Government took on the bigger challenge and picture for data sets by utilising the knowledge and delivery services of the National Library and National Archives of Australia? They could develop an open data set portal that co-ordinated, listed and delivered ALL Australian data sets and made them searchable, rather than each sector (Government, Research Institutes, Commercial, Academic, Libraries, Archives) attempting to develop its own portal.  This would much better serve the citizens of Australia who want to find, access and use the data sets. At the end of the day the main point of the Open Government movement is to better help, inform, engage and involve our citizens. Since both the National Library and National Archives of Australia are not only part of the Government but also professional organisations with a mandate to manage information, they have a key leadership role in Open Government and Open Data in particular.  It will be very interesting to see how this area develops for them over the next couple of years.
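For anyone who wants to check the arithmetic, here is a rough sketch in Python using only the portal counts and country totals listed above. The variable and function names are my own, and the two bounds rest on simple assumptions: complete overlap between portals (so the largest portal holds everything) or no overlap at all (so the counts simply add up). On the eight figures itemised here the range works out at about 53,000 to 86,000, a little under the upper figure quoted above, which may take in sources not itemised in the list.

```python
# Portal counts as listed above (as at 26 January 2013).
portals = {
    "Trove": 53_000,
    "Research Data Australia": 31_000,
    "data.gov.au": 1_124,
    "Atlas of Living Australia": 466,
    "data.nsw.gov.au": 250,
    "data.vic.gov.au": 167,
    "data.qld.gov.au": 78,
    "data.act.gov.au": 72,
}

# National government totals from the January 2013 ranking (Australia left out).
ranking = {
    "US": 378_529, "France": 353_226, "Canada": 273_052, "Denmark": 23_361,
    "United Kingdom": 8_957, "Singapore": 7_754, "South Korea": 6_460,
    "Netherlands": 5_193, "New Zealand": 2_265, "Estonia": 1_655,
}


def bounds(counts):
    """Return (low, high): low assumes total duplication, high assumes none."""
    values = list(counts.values())
    return max(values), sum(values)


def position(total, others):
    """1-based rank of `total` among the other countries' totals."""
    return 1 + sum(1 for n in others.values() if n > total)


low, high = bounds(portals)
print(f"Estimated Australian data sets: {low:,} to {high:,}")
print(f"Position on government site alone: {position(portals['data.gov.au'], ranking)}")
print(f"Position at low estimate: {position(low, ranking)}")
print(f"Position at high estimate: {position(high, ranking)}")
```

Either bound sits between Canada and Denmark, so the fourth-place conclusion does not depend on how much duplication there turns out to be.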

This evening the televised 5-minute 2013 Australia Day Address from the Governor-General talked about the importance of looking for answers to big questions, saying the internet is often our first stop. She spoke about significant research and how changes in technology and access to information can assist with ideas and innovation, which often translate into economic growth. Everything she said applied to opening up data sets.

The take-home messages for Australian and New Zealand Librarians and Archivists about the implications of being up there at the top of the Freedom Index and Open Government rankings are:

·         Our digital collections will grow rapidly with this explosion of open and free digital data. 

·         We must further develop our search and discovery and delivery platforms to keep up with Google and ensure we maintain our relevance in digital society.

·         We need to take a lead in the Open Data movement – most especially by being involved in development of open data portals.

·         We must campaign for Digital Legal Deposit and make it a reality for Australia as it is in New Zealand, to help Libraries and Archives collect published Digital Material from the Commercial and Government sectors at point of creation.

·         Libraries and Archives are founded on freedom of information, equal access and openness; this is our tour de force.

Happy Australia Day!

Useful Extra Reading:

UK Government- Open Data White Paper: Unleashing the Potential, June 2012 http://www.cabinetoffice.gov.uk/resource-library/open-data-white-paper-unleashing-potential

 

Saturday, 10 November 2012

National Archives of Australia embraces crowdsourcing and releases ‘The Hive’.


 
The National Archives of Australia (NAA) has taken a bold step into the cultural heritage crowdsourcing arena with ‘The Hive’, which was released two weeks ago. The brand makes a clever play on the word ‘Archive’ combined with the idea of a hive of working bees (the public).  The site encourages the public to transcribe archive records.

Early this year when David Fricker became Director General of the NAA he was quick to encourage staff to think innovatively, embrace change, and harness opportunities such as crowdsourcing to improve access to our collections. He publicly spoke in favour of crowdsourcing and a changing business model for archives at the International Council on Archives Congress in August:

“Another key development in expanding access is crowdsourcing. As many of us are now seeing, by allowing the public to contribute to the description of archival resources we are enhancing the ability of future generations to discover and learn from our archives. I also think it is a wonderful opportunity for the public to be more engaged with us as archives and to share in the work we do – preserving the memories of our nations. There is still some work to do here, in order to maximise the value of contributions and to maintain the integrity of our archives as authentic and accurate. However, I do not believe these problems are insurmountable, and indeed I believe these systems can to some extent be self-correcting.

This is a type of the co-design, citizen first activity… drawing on the interest and enthusiasm of the community to bring more of our archives into view – discoverable and retrievable…Access will be online and everywhere, improved by rich new data visualisation techniques and expanded descriptive contributions from an engaged citizenry”.

The Hive is the Archives’ pilot experiment into the potential of large scale transcription crowdsourcing to improve access to records.  Staff have looked closely at other crowdsourcing sites and attempted to build on their knowledge and techniques, to provide a site that could be used as a large scale platform for a variety of transcription crowdsourcing projects.

At present the site offers just over 800 lists for the public to transcribe. Some of these are typed and some handwritten.  They are rated in difficulty as easy, medium or hard.  Part of the difficulty with this project is that the public need to have some understanding of how archives receive and describe their records to make sense of what they are being asked to do.  In simple terms archives receive vast amounts of records (referred to as consignments).  Each consignment comes with a list of the items in it.  However, because of the large volume of records being received, it is usual that only the consignment record is entered into the catalogue, e.g. ‘100 boxes of plans and drawings’ from x government agency, rather than all the individual items on the consignment list being described in the catalogue.  The ideal scenario for users of the archives is that every item, e.g. each plan and drawing, is described in the catalogue so that it can be found.  Without this a lot of guesswork goes into finding relevant things, or alternatively personal visits are required to view the hard copy consignment lists.
 
The project that the archives is undertaking is to digitise consignment lists and then make them available for transcription by the public. Once transcribed they become searchable and the items within them can be found more easily.  Because so many of the lists are old and handwritten it is virtually impossible to get good OCR on them.  That’s where the public come in, because they can read the lists with the human eye. The time of the public is also needed to speed up access. Projections of the time it would take archives staff to describe the lists without public help currently stand at 210 years.  It is anticipated that a member of the public could with relative ease describe several hundred items per hour with the Hive tool, which would make a big difference, especially if there was a swarm.

The consignment lists in the pilot are those that have proved most popular with researchers and contain items in the ‘open period’, that is older than 30 years and now open to the public.  The top interest is lists of architectural drawings and historic buildings. This is closely followed by PNG patrol officer records, maritime incidents, personal records from the war office, prisoners of war, meteorology and cyclones, WW1 intelligence, and oil drilling on the Great Barrier Reef.

In the first 2 weeks 300 of the 800 lists have been transcribed. There is a definite preference for the lists rated hard (handwritten) and ones that involve names.

The site is well presented and gives volunteer transcribers things we know they want, such as a progress chart, recent activity, a points scoring system, rewards, optional login using Open ID (e.g. their Google ID), the ability to search and choose items or just take the next one served up, to pick easy or difficult items, to add a marker for where they got to if they are interrupted, and to favourite records.

The only slight drawback is the placing of the transcription window at the bottom of the screen rather than right or left, which often means it is hard to see the transcription window and the content you are transcribing at the same time. Also, the OCR text in the transcription window and the cursor are not hooked directly to the text in the image, so it is easy to get lost whilst transcribing.  This is largely because most of the lists are in tables, and the table rows and columns have not been retained in the OCR, so the OCR is somewhat muddled.

Further development of the site will largely depend on feedback given by the public users, and the ability of the archives to keep up a steady supply of new, interesting digitised consignment lists to the Hive.  The Archives is still considering how it may be able to integrate the public content back into its main catalogue RecordSearch, or integrate the Hive into RecordSearch. In the meantime the list content will remain searchable in the Hive.

There is obviously an expectation from the Archives that making its content more discoverable will lead to more access requests.  This is why at point of transcription there is a button which enables the user to request a copy of the item.  These requests are being met by digitising the item, and then uploading it into the main catalogue ‘RecordSearch’ with the full item description.

I congratulate the National Archives of Australia Access Team on the development of this exciting new site, which holds so much potential to improve access to records and engage with our citizens in new ways.

The screenshots below show the site in action:



Easy level transcription - Archived drawings

Medium level transcription - ABC Drama Scripts

Difficult level transcription - Plans
 


Tuesday, 2 October 2012

Digital Motor Archive available free for the month of October in return for…...


It came to my notice last week that a publisher in the UK had digitised both the current and back issues of their magazine the ‘Commercial Motor’, which in its early life was a newspaper. In the cut throat world of publishing there are few publishers left that are still publishing the same title they were 100 years ago, and who also have a complete set of back copies.  If they fall into this bracket they are in the unique position of being able to either digitise the content themselves for their readers (usually at a loss); offer it to a library to digitise (at no cost); or sell it to a commercial e-vendor to package with another product for academia (and make a profit).  Unfortunately most choose the last option, which makes this type of content only really accessible to academics and students via academic libraries.  E-vendors normally charge high subscription rates for digitised magazines and newspapers and package them up with other content, making it only viable for large universities and national libraries to purchase, and therefore severely restricting readership of the content.

Not a lot of publishers are digitising their own content because, generally speaking, the cost of preparation, digitisation and OCR, and of building a good website to deliver the content, outweighs the amount of money they would ever recoup from reader subscriptions. Normally a subscription to the current copy would be packaged up with old copies.  And that’s where the model fails, because often current readers have no interest in the old stuff.  People who do have an interest in the old stuff are generally a different group of people: historians, researchers etc.  The exception to this rule appears to be anything to do with hobbies such as knitting, cooking, railways, cars, and stamps.

The Commercial Motor Archive http://archive.commercialmotor.com/ came to my notice because it is available for free for the month of October and I wondered why. It is a rich archive going from 1905 to the present day, covering a complete century and two world wars, well illustrated and with everything you ever wanted to know about commercial vehicles.   

The search and browse mechanism is very impressive and works well.  For example articles on pages have been zoned so you can search and find the article easily within a page.  It has many similarities to the hugely popular Australian Newspapers http://trove.nla.gov.au/newspaper.  The page displays alongside the OCR text to make it easier to read. You can browse covers, browse by date, and zoom in on pages.  Results can be filtered. Users can add comments and tags.  The quality of the OCR text and therefore search is very good.

It has one thing that was never implemented on Australian Newspapers (though often asked for by users), which is a little box on each page called ‘Report an error’, and this is the reason for its free access.  The site owner is hoping that as people use and read the pages in the archive they will report errors as they see them, and for this they get free access to the content.  The known errors that need to be identified are incomplete articles (where the zoning has gone wrong); OCR text errors in headlines of articles; and OCR errors in text. However, readers can only report them, not actually fix them.  The site states:

“the archive is beta because it isn’t perfect at the moment and there are a few glitches to be ironed out. Every article page has a 'Noticed an error?' button you can use to report a problem. Please don’t expect an immediate change to the error - we gather all the reports together and prioritise them, fixing the most pressing errors first.”

It sounds like they don’t know how many errors there are, how many people will report errors, or who will fix the errors and how.  Interesting.

Although the ‘report an error’ button was never implemented on Australian Newspapers (it was mainly needed to report upside down and duplicate pages), it had already been decided that a ‘super user’ would be the person to review these reports and take action.  In a world where some volunteer text correctors wanted to take on extra responsibility and have special roles, like the hierarchy in the Wikipedia editors’ community, this would have been a good thing for trusted volunteers to do.

The Commercial Motor Archive has impressed me because they are clearly striving for perfection, they understand that the fewer mistakes there are the better the search will be and they have taken a brave step by asking the public to help in return for free access.  This is indeed unusual for a commercial publisher, belonging more in the realms of libraries and archives and referred to as crowdsourcing……