Tuesday 2 April 2013

Crowdsourcing text correction and transcription of digitised historic newspapers: a list of sites


Last month two new websites were launched giving the public access to digitised historic newspapers.  The release of a new ‘old’ digitised newspaper site is becoming a regular monthly occurrence now, with a library somewhere in the world completing a newspaper digitisation project with astonishing regularity, after what seems like such a long wait. 

The two new sites were Welsh Newspapers Online and the Louisville Leader.

The Welsh site has been several years in progress, and the National Library of Wales seriously considered using the National Library of Australia software for text correction before putting it in the ‘too hard basket’. The National Library of Wales is to be commended on making the Welsh Newspapers service free (unlike the English newspapers, which are still offered on a subscription model by the British Library).
 
The Louisville site delivers all the surviving issues of a key African American community newspaper covering local, national, and international news, published in Louisville, Kentucky from 1917 to 1950. Unfortunately the building which housed original copies of the paper was badly damaged by a fire. The remaining issues, loaned by Kentucky State University and the widow of the publisher, were microfilmed by the University of Louisville, with the digital files created from that microfilm. The long and winding road the texts have taken toward digital representation has made them less than ideal candidates for optical character recognition (OCR), which has difficulty transcribing faded, torn, or misaligned texts, even when they are readable to the human eye. For this reason the site has enabled public transcription to help improve the accuracy and searchability of the newspaper content.
 
It’s great to see both of these new sites, and I fully understand the difficult process many libraries have gone through to get to this point, having been there and managed a newspaper digitisation project myself. I still have a particular interest in those newspaper sites which involve the public in text correction, which is another step perhaps just too challenging for many libraries to take. After the worldwide applause from libraries for the Australian Newspapers/Trove text correction beta five years ago, now an internationally hailed success, and the stated intent of many libraries to follow suit with public text correction, the question arises: “how many actually did?”
 
There are many libraries internationally that now offer websites to search across digitised historic newspapers, and I’m not going to list all of them, just the handful that give their users the ability to correct or transcribe text. With Australian text correctors now addicted to text correction of newspapers and looking elsewhere to sate their ample appetites, I thought it was time to compile a list specifically of text correction websites for historic newspapers. To the best of my knowledge there are 9 sites now. Who will be the 10th? If I have inadvertently missed a site, please let me know in the comments. Most of the sites are for English language content but it is interesting to see a few coming through for other languages. As a note of interest, there were several foreign language historic newspapers published in Australia (Chinese, Greek, Hebrew, German) but these were put in the ‘too hard basket’ for the first stage of Australian Newspapers/Trove and sadly did not make it into the second stage either. They give a very interesting perspective on sub-communities within a wider community.
 
Congratulations to all the libraries listed below who took the first difficult step to digitise and then the more challenging step to crowdsource. Happy text correcting to all the amazing people who volunteer their valuable time to help libraries make old newspapers more accessible. I hope you enjoy the list. The sites are all slightly different but work on the general basis of showing a digitised page and asking for public correction/transcription of the OCR text created from that page. If the OCR text is improved then keyword searching of the newspapers is improved. It particularly helps to correct people’s names, especially in family notices, births and deaths, since these are often the first thing that users search on.
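To illustrate why correcting names matters for searching, here is a minimal Python sketch. The family notice, the OCR errors in it, and the `keyword_match` helper are all made up for illustration; real newspaper sites use full search engines rather than this simple whole-word check, but the effect of an uncorrected name is the same.

```python
# Hypothetical family notice: raw OCR output versus a volunteer's corrected text.
# The OCR has misread "I" as "l" in SMITH and "y" as "v" in Henry.
ocr_text = "DEATHS. SMlTH. On the 3rd inst., at his residence, John Henrv SMlTH, aged 72."
corrected_text = "DEATHS. SMITH. On the 3rd inst., at his residence, John Henry SMITH, aged 72."


def keyword_match(query: str, text: str) -> bool:
    """Return True if every word of the query appears as a whole word in the text."""
    # Split on whitespace, trim surrounding punctuation, and ignore case.
    words = {w.strip(".,;:").lower() for w in text.split()}
    return all(q.lower() in words for q in query.split())


print(keyword_match("John Henry Smith", ocr_text))        # False
print(keyword_match("John Henry Smith", corrected_text))  # True
```

A family historian searching for "John Henry Smith" would never find the uncorrected notice, even though the name is perfectly readable on the page image.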
 
List of historic/old digitised newspaper sites that offer public text correction/transcription: March 2013
US Newspapers
 
Australian Newspapers
Finnish Newspapers
Vietnamese Newspapers
Russian
 
Useful Resource:
Frederick Zarndt’s recent PowerPoint on crowdsourcing in libraries with a particular focus on newspapers:

Monday 18 March 2013

The Australian National Cultural Policy 2013 released: an overview of ‘Creative Australia’ for GLAM’s (galleries, libraries, archives, museums).



After a much longer than anticipated wait, Minister Simon Crean announced the release of the Australian National Cultural Policy on 13 March 2013, the week of Canberra’s Centenary celebrations. The Policy, named ‘Creative Australia’, is a weighty 150 pages, though happily it has an online summary and search feature.

The question that Australian libraries, archives, museums and galleries will be asking is “Does the National Cultural Policy deliver all that we hoped it would for GLAM’s, and how far will it help or drive forward the challenges surrounding the digital agenda?” 

Back in January 2012 I wrote a post explaining what the National Cultural Policy was intended to achieve, and summarising the feedback that the National Cultural Institutions had provided on the Draft Policy in October 2011. I followed this with another post which explained in more detail the Digital Deluge challenges that GLAM’s had raised in their feedback, along with the possible resolutions they wanted addressed in the National Cultural Policy. It was widely hoped by National Cultural Institutions such as the National Library of Australia, the National Archives of Australia, and the National Film and Sound Archive that the Policy would provide extra or contestable funding to help with the challenges of digitising, collecting born digital material, and delivering collections digitally, and would address the legislation surrounding these activities. At that point Simon Crean had indicated the Policy would be released in March 2012 and would have considerable funding associated with it. However, due to constraints in Government funding the release was delayed, since Crean said there was no point in releasing a policy which did not have the funding to back it up. This further fuelled the expectations of the GLAM sector that the Policy might release significant extra funding to them.

So does the National Cultural Policy help GLAM’s deal with the digital challenges?  The answer in a nutshell is “not really”. The Policy is much more focused on fostering the creation of new digital cultural and artistic content rather than collecting or curating it. However there are a few exceptions which I will highlight below.

As Crean had hinted the Policy comes with considerable funding - $235 million to be exact. However the lion’s share of this (over $75 million) goes to reforming the Australia Council. Crean says:

"The Australian Government will immediately implement structural reforms to the Australia Council. These are the most significant since its creation 40 years ago at a time when the arts were only beginning to realise their potential. I will be introducing new legislation into Parliament next week, which will be backed by an investment of $75.3 million in new funding for the Australia Council over four years. The Australia Council will be a more responsive funding body with a clear mandate to support and promote a vibrant and distinctively Australian creative arts practice, and have a new emphasis on independent peer-assessed grants to recognise and build artistic excellence.”

A summary breakdown of the funding as given in Crean’s press release is below:


 

The Policy has five goals and the funding is intended to be targeted to attain the goals.

Goal 1: Recognise, respect and celebrate the centrality of Aboriginal and Torres Strait Islander cultures to the uniqueness of Australian identity.

Goal 2: Ensure that government support reflects the diversity of Australia and that all citizens, wherever they live, whatever their background or circumstances, have a right to shape our cultural identity and its expression.

Goal 3: Support excellence and the special role of artists and their creative collaborators as the source of original work and ideas, including telling Australian stories.

Goal 4: Strengthen the capacity of the cultural sector to contribute to national life, community wellbeing and the economy.

Goal 5: Ensure Australian creativity thrives here and abroad in the digitally enabled 21st century, by supporting innovation, the development of new creative content, knowledge and creative industries.

The relevant parts of the National Cultural Policy for GLAM are:

Digitising collections:

The Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) finally gets a good chunk of money. They’ve been given $12.8 million for the digitisation of their Indigenous collections. This can potentially go a long way if they set up mass digitisation processes as the National Library did. At the National Library $10 million digitised 50 million items. But if mass digitisation were not in place this money would likely only cover digitisation of up to 1 million paper items, and fewer if the material was AV.
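The difference mass digitisation makes can be worked through with some back-of-envelope arithmetic. The Python sketch below uses only the figures quoted above; the assumption that AIATSIS could match the National Library's cost per item is mine, for illustration only, and real costs will vary with the type of material.

```python
# Figures quoted in the text above (whole dollars and item counts).
nla_budget = 10_000_000         # National Library spend
nla_items = 50_000_000          # items digitised for that spend
aiatsis_budget = 12_800_000     # AIATSIS allocation in the Policy
items_without_mass = 1_000_000  # upper estimate without mass digitisation

# Cost per item under mass digitisation at the National Library.
cost_per_item_mass = nla_budget / nla_items

# ASSUMPTION: AIATSIS achieves the same cost per item as the National Library.
# Integer arithmetic is used so the result is exact.
items_at_mass_rate = aiatsis_budget * nla_items // nla_budget

# Implied cost per item if only 1 million items can be digitised.
cost_per_item_without = aiatsis_budget / items_without_mass

print(f"Mass digitisation: ${cost_per_item_mass:.2f} per item")             # $0.20
print(f"Items at that rate: {items_at_mass_rate:,}")                        # 64,000,000
print(f"Without mass digitisation: ${cost_per_item_without:.2f} per item")  # $12.80
```

On these figures the same money stretches to 64 times as many items when mass digitisation processes are in place.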

Collecting born digital:

The Cultural Policy signals the intent of the Government to finally amend the Copyright Act 1968 to give the National Library of Australia the right to collect digital as well as hard copy published items. This right is known as legal deposit. Digital legal deposit would cover content on Australian websites as well as e-books and blogs. The National Library has been campaigning for years without success to change legal deposit to include digital, so this statement of intent is a positive step forward. There is still no timeframe for the legal change and it’s likely to take some time. The Australian Law Reform Commission is reviewing copyright exceptions for the digital environment. The copyright inquiry is being led by Professor Jill McKeough. An issues paper was released in August 2012. A discussion paper is likely to be released later in 2013, with another call for responses from interested parties such as the publishers, content developers and collecting institutions who commented last time round.

Crean has also stated:  "We will also work to develop a new legal deposit scheme for the National Film and Sound Archive of Australia to collect and preserve Australian audio-visual material."

This is all good but raises some questions about the specific roles and potential overlap of functions of the National Archives of Australia, the National Library of Australia and the National Film and Sound Archive. The National Library has been collecting Government websites for some time now, but this is actually a core role of the National Archives. The National Film and Sound Archive collects commercial and non-commercial AV content, whilst the National Archives collects material from Government broadcasters such as the ABC. Interestingly, although the National Library and the National Film and Sound Archive get several mentions in the Policy, the National Archives hardly does. This may be due to the much stronger, more detailed responses the NLA and NFSA sent in to the draft policy.

Dealing with the Digital Deluge vs. Physical:

The cultural collecting sector clearly stated that they were appreciative of the money given to them by Government each year to build, manage and maintain their collections. The Policy states, as the draft policy did, how much this is for 2012/2013:

  • National Archives of Australia $62.6 million
  • National Library of Australia $59.6 million
  • National Gallery of Australia $46.4 million
  • National Museum of Australia $42.9 million
  • National Film and Sound Archive $26.9 million
  • Australian National Maritime Museum $23.9 million

The Policy also states:

“The Australian Government remains committed to ensuring the National Collecting Institutions can continue to facilitate access to their collections and programs. The Government also remains committed to the digitisation of the collections to preserve them for future generations and provide access to a range of culturally significant material”.

However, although the money sounds considerable, much more is required to address the digital challenges. The analogue/physical collections are not decreasing or requiring less management, but the digital collections are increasing exponentially. Collecting institutions do not have the infrastructure they need to deal with digital material and make it accessible. The Policy does not address this issue at all, though it acknowledges in the Appendices that collecting institutions raised it.

The Policy actually helps to significantly increase the number of digital cultural artefacts that will be created and will therefore require collecting, particularly in audio-visual broadcasting. Large sums of money will target creation of more audio-visual content from Screen Australia and SBS. This will no doubt exacerbate the digital deluge problem for the National Film and Sound Archive, the National Archives and the National Library.

Searching and engaging with collections and content:

The Policy waxes lyrical about Trove, the search and user engagement service developed by the National Library of Australia (which, coincidentally, I managed from 2008 to 2012), even going as far as calling it a “golden moment for the cultural economy, as the historic obstacles of distance and the size of the local market disappear.” This is all very nice pat-on-the-back stuff, but no money is provided to ensure that the ‘moment’ can be sustained and the collaborative service can continue or be developed. I’m not sure if the Minister was aware that the development work on the service all but ceased in 2011 when the National Library made a decision to divert its priorities elsewhere.

National Collaboration and Networks:

An action in the policy is to “Establish a national network for museums and galleries to be managed in partnership between the National Museum of Australia and Museums Australia. The Network will work to share resources and improve access to collections across Australia, to assist industry, researchers and the public.”

I’m not quite sure what the intent of this is: whether it is a collaborative network between museums, a digital network, a shared discovery service like Trove, or simply a replacement for Collections Australia Network (CAN), which has had its funding entirely pulled on more than one occasion.

The expectation was that GLAM’s would be required to work more closely and collaboratively with each other to achieve their aims and pool resources, particularly for digitisation and digital discovery/access but this is not mentioned in the Policy.  There has not been a natural propensity for Australian GLAM’s to communicate, collaborate, or share openly in a formal or informal way before, so although it could be done without a policy, there was an expectation that a Policy would drive it.  Within each specific sector there are good networks, especially for libraries, but cross sector there is still some resistance to focusing on similarities rather than differences.

Leverage:

Only time will tell if the National Cultural Policy can be used as leverage to assist the work of GLAM’s, or whether it is just another document/file to be put in the ‘recycle bin’. Its intended life span is 10 years, and most of the initial funding covers a 3-4 year time period.  With a government election taking place this year and bets being placed on a change of government we will have to wait and see whether it can hold its own in the years ahead.

Saturday 26 January 2013

Freedom, Openness and Datasets: An Australia Day View


Today, January 26th, is Australia Day. This means everyone is having a day off work, and in this ‘free’ time we can reflect on how lucky we are to live in our nation and celebrate this. The benefits and privileges of living in Australia are summed up by a constant sense of freedom and openness. This comes not just from the physical landscape, the big wide open red desert spaces and blue sky, but from the day to day experience of living, and the rights Australians have.

I was very interested to read some new research last week which set out to rank countries on their level of ‘Freedom’ and give them a score out of ten. The research is published in the book ‘Towards a Worldwide Index of Human Freedom’, which was released on 8 January 2013 by the Fraser Institute. Chapter 3, by Ian Vásquez and Tanja Štumberger, gives ‘An Index of Freedom in the World’. Freedom is looked at in four areas: Security and Safety; Freedom of Movement; Freedom of Expression; and Relationship Freedoms. The authors say:

 “We have tried to capture the degree to which people are free to enjoy the major civil liberties—freedom of speech, religion, and association and assembly—in each country in our survey. In addition, we include indicators of crime and violence, freedom of movement, and legal discrimination against homosexuals. We also include six variables pertaining to women’s freedom that are found in various categories of the index”.

The categories in detail are:

I. Security and safety

A. Government’s threat to a person

1. Extrajudicial killings

2. Torture

3. Political imprisonment

4. Disappearances

B. Society’s threat to a person

1. Intensity of violent conflicts

2. Level of organized conflict (internal)

3. Female genital mutilation

4. Son preference

5. Homicide

6. Human trafficking

7. Sexual violence

8. Assault

9. Level of perceived criminality

C. Threat to private property

1. Theft

2. Burglary

3. Inheritance

D. Threat to foreigners

II. Movement

A. Forcibly displaced populations

B. Freedom of foreign movement

C. Freedom of domestic movement

D. Women’s freedom of movement

III. Expression

A. Press killings

B. Freedom of speech

C. Laws and regulations that influence media content

D. Political pressures and controls on media content

E. Dress code in public

IV. Relationship freedoms

A. Freedom of assembly and association

B. Parental authority

C. Government restrictions on religion

D. Social hostility toward religion

E. Male-to-male relationships

F. Female-to-female relationships

G. Age of consent for homosexual couples

H. Adoption by homosexuals

The country with the most freedom in the world, coming top in the Freedom Index, is New Zealand. Australia comes 4th and the UK 18th out of 123. The table below shows the top countries. (Scores out of 10)




I feel lucky to have lived in three of the top ranked countries. Based on my own experience I think the rankings of New Zealand, Australia and the UK are right.

The countries which lack freedom and are at the bottom are Zimbabwe 123rd; Burma/Myanmar 122nd; Pakistan 121st; Sri Lanka 120th; and Syria 119th. We feel for their citizens, who often feature in our TV news. The extract of bottom countries is below:



The report is fascinating and I suggest you read it. You might be wondering why I think this study has any relevance for librarians or archivists. Being a librarian I most commonly associate Freedom with ‘Freedom and Openness of Information’. I was originally reading the study to see how Freedom of Information or Open Government had been scored and ranked. However, this was not included in the study, perhaps because it wasn’t thought of, or because it was simply too hard to measure.

It follows that if a country is very free then a lot more information will be generated both commercially and by the Government. This is likely to be in the public sphere at time of creation and then remain in the public sphere when it gets passed on/purchased/made accessible by National Archives, Libraries and Research Institutions. 

Where information is not already publicly accessible, countries with a high Freedom Index score typically have Freedom of Information (FOI) Acts, which enable members of the public to request to see it. The USA passed its FOI Act in 1966. Australia and New Zealand followed in 1982, and the UK finally passed its FOI Act in 2000.

Most of the top ranked countries in the Freedom Index are involved in a movement known as ‘Open Government’, which started in about 2009 and basically builds on the Freedom of Information Act principles. Open Government aims to make a concerted effort to release reports, research, statistics and data sets into the public domain and be transparent; to involve the citizens of the country in decision making, on the basis that they have equal access to the same information as policy decision makers; and for citizens to help with information creation, collation, dissemination and interpretation.

In June 2009 the British Prime Minister Gordon Brown announced that Tim Berners-Lee (inventor of the World Wide Web) would work with the UK Government to help make data more open and accessible on the Web in the UK, building on the work of the Power of Information Task Force.

On his first day in office in January 2009, Barack Obama issued a Memorandum on Transparency and Open Government, instructing the Director of the Office of Management and Budget (OMB) to issue an Open Government Directive, which would direct agencies to take specific actions regarding transparency, participation, and collaboration.

In Australia in 2009 the Government 2.0 Taskforce recommended that Australia should have an Open Government. The Australian Declaration of Open Government was made in 2010.

At this time I had a particular interest in the Australian Declaration because it was relevant to me in my day to day work at the National Library of Australia. It said among other things:

“Collaboration with citizens is to be enabled and encouraged. Agencies are to reduce barriers to online engagement, undertake social networking, crowd sourcing and online collaboration projects and support online engagement by employees…”

In 2011 the New Zealand Government made a Declaration of Open Government.

After these dramatic declarations by the USA, Australia and New Zealand, President Obama took little time to try to influence the world. In September 2011 he formed the ‘Open Government Partnership’ (OGP) and 8 governments joined: Brazil, Indonesia, Mexico, Norway, Philippines, South Africa, United Kingdom, and United States (but not Australia or New Zealand). To become a member of the OGP, participating countries must do three things:

  • embrace an agreed high-level Open Government Declaration
  • deliver a concrete action plan, developed with public consultation
  • commit to independent reporting on their progress going forward

At last check 60 countries had joined, with 47 having delivered action plans and 13 working on them. However, Australia and New Zealand are not members. Obviously it is much easier to declare Open Government than to actually implement it. Pia Waugh, an Australian expert on Open Government, has given many talks on the subject; to read more about the challenges and what it really means, check out her 2011 blog post ‘OpenGovernment: What is it really?’

The UK is notably now amending its Freedom of Information Act in consultation with the public, to take into account the opening up of data sets.

Perhaps the Freedom Index had trouble ranking Open Government, so how would you do it?

Interestingly, last week Craig Thomler reported in a blog post that he had attempted to rank countries by comparing the number of open data sets they had released through their national government open data sites. He has relied on the ‘open data’ provided on the USA Open Data site to do this and notes that the results are a bit dubious. Data.gov lists 41 countries as having open data websites, out of almost 200 countries.

Government Open Data sites include:

  • Data.gov (USA) http://www.data.gov/
  • Data.gov.uk (UK) http://data.gov.uk/data
  • Data NZ http://data.govt.nz/
  • Data (Australia) http://data.gov.au/

The ranking results of countries providing Open Data via Government Data Sites in January 2013 are:

1. US (378,529 data sets)

2. France (353,226)

3. Canada (273,052)

4. Denmark (23,361)

5. United Kingdom (8,957)

6. Singapore (7,754)

7. South Korea (6,460)

8. Netherlands (5,193)

9. New Zealand (2,265)

10. Estonia (1,655)

11. Australia (1,124)

Is it really right that Australia is 11th? Perhaps not, because this is not the big picture. It is wrong to assume that all data sets are created by Government (although of course a lot are). Many more are created by researchers in academia and by commercial companies. Geospatial and mapping data is a good example of this. For example, if I was looking for Open Data Sets in Australia there are at least 8 portals I know of where I could look. There are also many more individual sites that offer their own data sets. The portal sites listed below either publicly list or actually make available Australian data sets.
 
Australian Data Set Portals
Number of data sets included as at 26 January 2013
 
53,000:  National Library of Australia Trove Service, mostly from the University sector
31,000:  Research Data Australia, from the Academic and Research sector
1,124:   Data (Department of Finance), from Federal Government Agencies
466:     Atlas of Living Australia, from Research Institutes
250:     Data.nsw.gov.au, from State Government Departments
167:     Data.vic (Victoria) www.data.vic.gov.au, from State Government Departments
78:      Data.qld.gov.au, from State Government Departments
72:      DataACT www.data.act.gov.au, from Territory Government Departments

This takes the total figure of Australian open data sets to somewhere between roughly 53,000 and 86,000, depending on the duplication, if any, between these sites, and possibly moves us up to fourth position in the rankings. Duplication… that makes me want to put my librarian hat on again. Wouldn’t it be good if the Australian Government took on the bigger challenge and picture for data sets by utilising the knowledge and delivery services of the National Library and National Archives of Australia? They could develop an open data set portal that co-ordinated, listed and delivered ALL Australian data sets and made them searchable, rather than each sector (Government, Research Institutes, Commercial, Academic, Libraries, Archives) attempting to develop its own portal. This would much better serve the citizens of Australia who want to find, access and use the data sets. At the end of the day the main point of the Open Government movement is to better help, inform, engage and involve our citizens. Since the National Library and National Archives of Australia are not only part of the Government but also professional organisations with a mandate to manage information, they have a key leadership role in Open Government, and in Open Data in particular. It will be very interesting to see how this area develops over the next couple of years for them.
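The possible range for the Australian total, and where it would sit in the rankings, can be checked from the figures listed above. This Python sketch is illustrative only: the dictionary keys are my own short labels, and it compares all eight Australian portals against other countries' government-only counts, so it is not a like-for-like comparison.

```python
# Data set counts for the eight Australian portals listed above (26 January 2013).
portal_counts = {
    "Trove": 53_000,
    "Research Data Australia": 31_000,
    "Data (Department of Finance)": 1_124,
    "Atlas of Living Australia": 466,
    "Data.nsw.gov.au": 250,
    "Data.vic": 167,
    "Data.qld.gov.au": 78,
    "DataACT": 72,
}

# Government open data site counts for the ten countries ranked above Australia.
country_counts = {
    "US": 378_529, "France": 353_226, "Canada": 273_052, "Denmark": 23_361,
    "United Kingdom": 8_957, "Singapore": 7_754, "South Korea": 6_460,
    "Netherlands": 5_193, "New Zealand": 2_265, "Estonia": 1_655,
}

# Upper bound: no duplication at all between portals.
upper = sum(portal_counts.values())
# Lower bound: every smaller portal's data sets already appear in the largest one.
lower = max(portal_counts.values())


def rank_for(total: int) -> int:
    """Position Australia would take given its total (1 = most data sets)."""
    return 1 + sum(1 for count in country_counts.values() if count > total)


print(f"Between {lower:,} and {upper:,} data sets")  # Between 53,000 and 86,157
print(rank_for(1_124), rank_for(lower), rank_for(upper))  # 11 4 4
```

Whatever the level of duplication, counting the non-government portals lifts Australia from 11th to 4th on this list, ahead of Denmark but well behind Canada.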

This evening the televised 5 minute 2013 Australia Day Address from the Governor-General talked about the importance of looking for answers to big questions, saying the internet is often our first stop. She spoke about significant research and how changes in technology and access to information can assist with ideas and innovation, which often translate into economic growth. Everything she said applied to opening up data sets.

The take-home messages for Australian and New Zealand librarians and archivists about being up there at the top of the Freedom Index and Open Government rankings are:

  • Our digital collections will grow rapidly with this explosion of open and free digital data.
  • We must further develop our search and discovery and delivery platforms to keep up with Google and ensure we maintain our relevance in digital society.
  • We need to take a lead in the Open Data movement, most especially by being involved in development of open data portals.
  • We must campaign for Digital Legal Deposit and make it a reality for Australia as it is in New Zealand, to help Libraries and Archives collect published Digital Material from the Commercial and Government sectors at point of creation.
  • Libraries and Archives are founded on freedom of information, equal access and openness; this is our tour de force.

Happy Australia Day!

Useful Extra Reading:

UK Government, Open Data White Paper: Unleashing the Potential, June 2012 http://www.cabinetoffice.gov.uk/resource-library/open-data-white-paper-unleashing-potential