Rose Holley's Blog - views and news on digital libraries and archives: Australian Newspapers

Thursday, 26 October 2017

National Digital Library of India

I was recently invited to travel to UNESCO HQ in New Delhi, India to give a keynote presentation at the UNESCO-National Digital Library of India International Workshop ‘Knowledge Engineering for Digital Library Design’. A small group of international experts had been invited to share knowledge from their areas of expertise, mine being crowdsourcing in libraries, newspaper text correction and user-led digital library design.

The Government of India along with the Ministry of Human Resource Development and the Indian Institute of Technology Kharagpur (IITK) are working on a project to develop the National Digital Library (NDL). This is going to be an important part of their national academic infrastructure and is being led by Professor Partha Chakrabarti from IITK. The question they really wanted me to answer for Trove and Australian Newspapers was “if you were doing it all again, starting now, what would you do differently?” They asked this so that they can apply what Australia learnt to the Indian Digital Library project.

Unlike many other countries, India has a growing young population rather than an ageing one, and it cannot build and expand its universities quickly enough to meet educational demand. The government therefore sees the National Digital Library as an academic network for individual and community-based learning. It mainly contains books and courses, and a module to deliver historic newspapers will be developed shortly. Academic libraries are also having their subscription resources harvested into it, since it is intended to be the backbone of the academic learning network. It shares some similarities with its Australian equivalent, Trove. This interesting overview explains the National Digital Library of India.

I was really pleased to find out that India is now utilising its technology expertise for itself in this way. If it were not for India we would not have Trove and Australian Newspapers. The National Library of Australia has been sending the more technical workflow aspects of the historic newspaper digitisation out to Indian contractors for the last ten years. In 2008 I was overseeing this and I had the pleasure of visiting the digitisation facilities and meeting the hundreds of staff in Hyderabad, Chennai and Delhi, as well as some of the call centres and technology companies in Kolkata. It was an eye-opening experience for me to see such cutting-edge technology and bright young people filled with hope and career aspirations, alongside such extreme poverty and slums. India is an experience that lets you see the whole of humanity in a single day, which can be quite overwhelming.

UNSW has just started to make a series of targeted and highly strategic investments in developing transformative partnerships in India. India represents a major priority for UNSW as part of its 2025 Strategy under the Global Impact pillar. Building successful research and knowledge exchange partnerships in India will be key to the success of the UNSW India Strategy. India is a growing source of innovation and is home to some of the world’s most dynamic and innovative companies who are at the forefront of digital disruption, social enterprise and inclusive development. India’s research system is also growing as the Government of India considers its investment and capacity building strategy in higher education and research.

This week for Diwali the UNSW campus is being transformed and ‘The Festival of India 2017’ will be a stimulating, event-packed week that celebrates and promotes Australia’s partnership and friendship with the rapidly emerging global powerhouse – India. As the campus grounds transform into a little India, this unique festival will showcase not only the country’s rich cultural offerings but also its groundbreaking developments in innovation, finance, scientific research and economic growth. In November there will be an inaugural Research Roadshow:

To enable UNSW researchers to travel to India to make new connections and/or strengthen existing relationships;
To showcase UNSW’s capabilities to prospective Indian partners, especially in the following research areas: smart cities, energy, water, climate, health and social enterprise;
To initiate and nurture strategic industry partnerships that will lead to knowledge exchange outcomes.

Expected outcomes will be the identification or consolidation of opportunities that will lead to future collaborative research partnerships with academic, industry and government partners.

In the meantime, as I contemplated preparing my presentation, I was unfortunately one of the many Australians struck down with the virulent strain of influenza in the recent Australia-wide outbreak of flu. As the time approached to travel I realised I really was still not well enough, so I asked the Creative Media Unit at UNSW Canberra for help. John Carroll used his wonderful skill and technologies to create a 40-minute video of my presentation, which was delivered to New Delhi on video screen, link below.

I discuss the findings of my nine years of research into crowdsourcing-based curation in libraries. Using the digitised historic Australian Newspapers as an example, I look at how the functionality and interface were developed in close relationship with the users, and how this led on to text correction of newspaper articles. It is nearly ten years since this pioneering project began, and the motivations and achievements of the 50,000 volunteers are examined over this time. I question how successfully the goal of improving text quality, and therefore search, has been achieved. I propose that if a similar project were begun now, artificial intelligence software such as the OverProof post-OCR correction tool would be used to improve the quality of the text. OverProof has been trained on the manual corrections of the Australian newspaper corpus, and trials demonstrate that it is able to dramatically improve the quality of the corpus. Volunteer text correction could still continue afterwards for difficult text, but the software would do the main donkey work, giving users a better quality search.
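To make the idea of software learning from volunteer corrections a little more concrete, here is a minimal Python sketch. It is not how OverProof works internally, which is not described in this post; it is a deliberately simple stand-in that assumes we have pairs of (raw OCR line, volunteer-corrected line) and learns the most common one-word fixes from them. The function names and the sample lines are made up for illustration.

```python
from collections import Counter, defaultdict
from difflib import SequenceMatcher

# Hypothetical sample data: (raw OCR line, volunteer-corrected line).
SAMPLE_PAIRS = [
    ("tbe heavy rain", "the heavy rain"),
    ("tbe locust plague", "the locust plague"),
]


def learn_corrections(pairs):
    """Count word-for-word substitutions seen in (ocr, corrected) line pairs."""
    counts = defaultdict(Counter)
    for ocr_line, corrected_line in pairs:
        ocr_words = ocr_line.split()
        fixed_words = corrected_line.split()
        matcher = SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only replacements where the runs have equal length,
            # so each misread word lines up with exactly one corrected word.
            if tag == "replace" and (i2 - i1) == (j2 - j1):
                for wrong, right in zip(ocr_words[i1:i2], fixed_words[j1:j2]):
                    counts[wrong][right] += 1
    # For every misread word, keep its most frequently seen correction.
    return {wrong: seen.most_common(1)[0][0] for wrong, seen in counts.items()}


def apply_corrections(line, table):
    """Replace any word that has a learned correction; leave the rest alone."""
    return " ".join(table.get(word, word) for word in line.split())


if __name__ == "__main__":
    table = learn_corrections(SAMPLE_PAIRS)
    print(apply_corrections("tbe rain fell", table))
```

A lookup table like this ignores context, so it would also "fix" a word that happened to be right. That is one reason a real tool needs something far more sophisticated, and why volunteers would still be useful for the difficult text.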

The PowerPoint is on my slideshare account.

Sunday, 26 February 2012

Crowdsourcing Australian Climate Change

In my last blog post I described how knitters and yarn enthusiasts were crowdsourcing knitting patterns from digitised Australian newspapers in Trove for use in a crowdsourcing site called Ravelry http://www.ravelry.com. This week I wanted to give another example of the Australian Newspapers in Trove giving leverage to yet another crowdsourcing site. This time it’s for research into climate change and the site is called OzDocs. http://ozdocs.climatehistory.com.au/

Australian newspapers hold unique content, for example convict records and climate records. In Australia official weather records only began in 1908 when the Bureau of Meteorology was established. However there are weather tables and forecasts appearing in Australian newspapers from 1803 onwards. The newspapers therefore provide 200 years of weather records. Newspapers not only give tables with statistics of temperature, rainfall, winds etc, but also eye witness accounts of weather conditions such as floods, droughts and fire.

The citizens and politicians of Australia have a high and ongoing level of interest in climate change and how it is affecting our nation. A project investigating climate change is SEARCH: SouthEastern Australian Recent Climate History. It spans the sciences and the humanities, drawing together a team of leading climate scientists, water managers and historians in Australia to better understand south-eastern Australian climate history over the past 200–500 years. The digital newspapers in Trove are a fundamental part of SEARCH’s research process. However even though the newspapers are full-text searchable it is still a challenge to find and bring together in context the eye-witness accounts and the weather tables so that the temperatures, rainfall and other statistics can be transcribed into a research database. This is why the SEARCH project has established this month the OzDocs citizen science project, with a $10,000 grant from the University of Melbourne so that the public can help them. Basically the public are asked to find and tag historic newspaper articles on weather conditions, and transcribe useful weather statistics from historic newspapers into a database.

In 2010 Joelle Gergis, the lead SEARCH investigator, spoke to me and said:

“Having all this information online and being able to quickly access it has been amazing. Being able to find weather tables from 1803 onwards in the Sydney Gazette is crucial to our research. Official records from the Bureau of Meteorology only began 100 years ago so being able to access the newspaper records which are earlier than this is really useful. The sources in Trove also show how weather events have affected society, with eye witness accounts of floods and bushfires. For example we have been researching the 1851 Black Thursday bushfires in Victoria.”

In 2010 as a locust plague swept across the south-eastern side of Australia the pilot volunteers for the project working at the State Library of New South Wales noted that weather conditions in 1825 were very similar (heavy rainfall followed by nice weather then a terrible locust plague) and found eye-witness accounts in The Sydney Gazette and New South Wales Advertiser, 24 March 1825 of a similar locust plague.

“Prior to the late rains the caterpillar, that old enemy to the agriculturist interest of the Colony, made its appearance ; but, upon the visitation of the heavy showers, their ranks were considerably thinned. However, since the present enchanting fine weather has again set in, the number of these destructive insects has increased to an unparalleled extent, covering whole fields in their course, which in some spots seemed to be towards the South, in a line from East to West. Wherever they make their appearance, the most complete destruction immediately follows. Upon Captain Campbell’s estate, in the district of Cooke, they were supposed to be at least two inches in height.”

This month I caught up with Joelle again to see how the newly released crowdsourcing part of the project is going. She said:

“We now have over 100 volunteers who have contributed over 4000 articles. The database will be searchable in our next release. It will be the country’s first publicly searchable database of climate information using a diverse collection of pre-20th century historical records. The database will give easy access to information for researchers, organisations, government departments and the public.”

The scope of OzDocs work has now been expanded to include not only the digitised newspapers but other resources that are held in the State Library of Victoria, State Library of New South Wales and the National Library of Australia. These pre-date the newspapers by another 100 years and go back to the 1700s. Joelle said:

“Our OzDocs volunteers will be working their way through logbooks of the first European explorers, governors’ correspondence, early settlers’ diaries, newspapers and the works of 18th and 19th century scholars.”

One of the questions the SEARCH project hopes to answer is what the South East region of Australia’s ‘natural’ climate has been like since 1788. This may ultimately help to refine current climate models, allowing more accurate climate change estimates to be developed for the future. The lack of records in a consistent accessible format before 1900 is currently making this difficult.

For the project to be successful the volunteer numbers really need to increase a lot. At the moment this is still quite a small scale effort compared to the knitting project Ravelry that I reported on in my last blog. Volunteers can join by accessing the OzDocs site. http://ozdocs.climatehistory.com.au/

Friday, 24 February 2012

Crowdsourcing knitting patterns

Wordle showing the most popular search terms in Trove.

I have been project managing the Australian Newspapers Digitisation Program, the Australian Newspapers service and Trove at the National Library of Australia for the last 5 years. The content of Trove, a free discovery service for Australian content, is now massive, with a total of 250 million items from different organisations around Australia. This includes archives, pictures, books, music and newspapers. There are 62 million full-text articles coming from Australian Newspapers 1803-1954 and the Australian Women’s Weekly 1932-1982, which the National Library of Australia has digitised. About 100,000 new newspaper articles are being added each week to Trove.

Despite the content of Trove being varied, over 80% of Trove usage and engagement still revolves around digitised historic Australian newspapers. They are the most used content the National Library has ever had, eclipsing everything else, with usage continuing to increase. One fifth of the Australian population are regular users of Trove (4 million people). There are about 50,000 searches every hour. Over 40,000 online volunteers have corrected over 58 million lines of newspaper text to help improve the searching. This is the crowdsourcing aspect. Today there will be more than 100,000 lines of newspaper text corrected by users, this week more than 10,000 items tagged by users and this month 2,000 comments added to items by users.

I’m often asked what the most accessed items are in Trove. Unfortunately I cannot answer this because it isn’t logged on our servers. However I do know that newspapers are used more than any other content. I keep an eye on the most used search terms using Google Analytics to get a feel for what people are looking for. In fact anyone can look at search terms as they happen, second by second, by clicking the link immediately above the Trove search box. For a very long time the top search terms have been Smith, George, death, birth, hanging, suicide, murder, cricket and gold, and also anything topical, e.g. Lionel Logue (The King’s Speech). The most popular articles seem to be births, deaths and marriages and articles on murders. However I have seen a recent trend whereby the terms “knitting pattern” and “knit + cast on” have knocked “death” off the top spot. I decided to look into this a bit further. I was fascinated to discover that the crowdsourcing in Trove is inter-connecting with another crowdsourcing project. It’s for knitters and is called Ravelry http://www.ravelry.com

Ravelry is a place for knitters, crocheters, designers, spinners, weavers and dyers to keep track of their yarn, tools, project and pattern information, and look to others for ideas and inspiration. The content on the site is user-driven and created by the knitting community. Ravelry lets you keep notes about your projects, see what other people are making, find the perfect pattern and connect with people who love to play with yarn from all over the world in forums.

The site was started by Jess who had been a knitter and a blogger for a while. She knew that there was all this great information out there from other fiber lovers – but with the growing number of crochet and knitting blogs, finding that information just kept getting harder. It was getting frustrating for her to try and find information about the patterns and yarns that she was interested in using. Her programmer partner Casey thought that he would be able to build a website that could solve her problems, so they started working on it together, introducing it to a few friends at a time.

A key part of the site is the database of knitting patterns, gathered together by the community, described and catalogued by them, and then knitted by them. The user community can favourite them, add comments, add patterns to projects and lists to do. They often photograph the end results and add these to the database. They can seek help from other knitters on patterns, yarns, techniques and designs.

The site is free but does require a login to look at the patterns. It is proving immensely popular. In the first weekend 15,000 knitters had signed up. Apparently quite a lot of these happened to be librarians. * In July 2010 Ravelry appealed to the community to both find and describe patterns. In one week 23,500 users categorised and assigned metadata to 160,000 patterns. The advanced search which draws on these fields for faceted searching is quite amazing, and quite frankly leaves most library catalogues for dead. Facets include availability, category, yardage, gender, source, fibre, needle size, rating, difficulty, language and more. I was quite stunned by this because it is one of the few crowdsourcing projects I have seen that has very successfully engaged a crowd to help assign metadata to records to the highest possible level. Cataloguing is a task that most cataloguers and librarians think cannot be done well by anyone except themselves, besides which it would be far too boring to attract people’s interest. However the knitters can clearly see the value in adding descriptive metadata. For example, by adding the yardage or meterage required they can easily find out, by searching on that field, what patterns they can knit when they only have x yards of wool left.
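For readers who have not met faceted searching before, the yardage example can be sketched in a few lines of Python. This is only an illustration of the general idea and not Ravelry’s actual implementation; the field names, the sample patterns and their values are all made up.

```python
from collections import Counter

# Hypothetical pattern records with user-supplied descriptive metadata.
PATTERNS = [
    {"name": "Sample cardigan", "category": "cardigan", "difficulty": "medium", "yardage": 900},
    {"name": "Sample toy", "category": "toy", "difficulty": "easy", "yardage": 150},
    {"name": "Sample tea cosy", "category": "homeware", "difficulty": "easy", "yardage": 120},
]


def search(patterns, max_yardage=None, **facets):
    """Return patterns matching every chosen facet and needing no more
    than max_yardage yards of yarn (if a limit is given)."""
    results = []
    for pattern in patterns:
        if max_yardage is not None and pattern["yardage"] > max_yardage:
            continue
        if all(pattern.get(field) == value for field, value in facets.items()):
            results.append(pattern)
    return results


def facet_counts(patterns, field):
    """Count how many patterns fall under each value of one facet,
    which is what a search screen shows beside each facet option."""
    return Counter(pattern[field] for pattern in patterns)


if __name__ == "__main__":
    # "What can I knit with the 200 yards I have left, if I want something easy?"
    for pattern in search(PATTERNS, max_yardage=200, difficulty="easy"):
        print(pattern["name"])
    print(facet_counts(PATTERNS, "difficulty"))
```

The point is that the search can only be as good as the metadata: without someone having typed in the yardage for each pattern, there would be nothing to filter on.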

So how does Trove come into all this? Well, as knitting regains popularity and we see the resurgence in ‘retro’ fashion from yesteryear, the knitting community are falling with glee on digitised historic Australian newspapers and the Australian Women’s Weekly, particularly from the 1950s. Someone has helpfully added the instructions into Ravelry for how to find vintage knitting patterns in Trove, which is to search for knit+"cast on" or knitting patterns (now one of the top search terms – see the wordle above). If you do this in Trove you get nearly 73,000 results for knitting patterns. Most newspapers included at least one pattern a week. Of all those patterns the community has chosen to add some of the more popular ones into Ravelry so more community engagement can happen. So far 290 have been added from Australian newspapers and the Australian Women’s Weekly.

The two screenshots from Ravelry below show firstly a classic number ‘a cosy cardigan’ which appeared in the Sydney Morning Herald of 1953. It has been favourited by 145 people, one has knitted it and 93 people have added it to their queue of things to knit next. The person who has knitted it has added notes and instructions on how they did it with a colour picture of the finished garment.

The second shot shows that the most favourited pattern added to Ravelry from the National Library of Australia’s digitised Australian Women’s Weekly collection is… wait for it…… the ‘Elegant Elephant’. It has been favourited by 690 people, rated 4 out of 5 and easy to knit, knitted by 21 people in a variety of colours, and 174 more people intend to knit it soon. If you click on the pattern you will see uploaded photos of finished knitted elephants….

If you don’t want to log in to Ravelry, get an overview by watching this 6-minute video on Ravelry.

This is a very interesting example of re-use of material from old newspapers, one that was not even considered when newspapers were digitised. Ravelry is an outstanding site offering community engagement and crowdsourcing that has really impressed me. I love the advanced pattern search by facets. It clearly shows that for some items users don’t want a dumbed-down simple search box. They want ADVANCED SEARCHING, MORE DESCRIPTIVE METADATA AND FACETS! They are prepared to add the descriptive metadata themselves.

The only thing I have ever knitted myself was a pink and blue tea-cosy for my mother as a present when I was 14. My mother was a teapot collector then, but interestingly only had 2 tea cosies. Knitted tea cosies are becoming popular again. However I am contemplating knitting the Australian Women’s Weekly ‘Elegant Elephant’, maybe in pink, my favourite colour? At least if I get stuck I know I will be able to get online help in Ravelry, and it looks easier than a tea cosy!

* I acknowledge the use of Nyssa Parkes’ article ‘Fibre FRBRisation’ in the November 2011 issue of Incite Magazine. The statistics on Ravelry user engagement come from this article.

Saturday, 11 February 2012

Crowdsourcing: more cool sites to give libraries, archives and museums inspiration

Many people know of my interest in the relevance and application of online digital crowdsourcing for libraries, archives and museums, due to an article I wrote in 2010 called ‘Crowdsourcing: how and why should libraries do it?’, and my initiation of the Australian Newspapers public text correction. People therefore often send me links to sites they think may interest me. This is really great. Sometimes sites which are nothing to do with libraries or archives may give us ideas. There is a ‘List of Crowdsourcing Projects’ in Wikipedia (which is separate to the main article on crowdsourcing). This is a useful starting point to get an overview of the sorts of activities going on. It goes without saying that Wikipedia is of course the greatest crowdsourcing project ever!

In this post I wanted to mention some newish crowdsourcing projects that I have been looking at that interest me, and that I haven’t written about before.

1. Star Wars Uncut (SWU) Released August 2011

About the project: In 2009, Casey Pugh, a web developer, asked thousands of Internet users to remake "Star Wars: A New Hope" into a fan film, 15 seconds at a time. Contributors were allowed to recreate scenes from Star Wars however they wanted. Multiple submissions were received for each scene, and votes were held to determine which ones would be added to the final film. Although the scenes reflect the dialogue and imagery of the original film, each scene is created in a separate distinct style, such as live-action, animation and stop-motion. Within just a few months SWU grew into a wild success. The creativity that poured into the project was unimaginable. SWU has been featured in documentaries, news features and conferences around the world for its unique appeal. In 2010 it won a Primetime Emmy for Outstanding Creative Achievement in Interactive Media. Now the crowdsourced project has been stitched together and put online on YouTube and Vimeo. The "Director's Cut" is a feature-length film that contains hand-picked scenes from the entire StarWarsUncut.com collection.

Relevance for libraries and archives: In the world of film, TV and radio, fans and consumers are the subject experts. They not only have in-depth knowledge, but also have the motivation and interest to share their knowledge with others in creative ways. This project really shows that. The fans apparently had no trouble identifying specific seconds in a very long film. This type of knowledge and interest is really useful for librarians and archivists when you want to open up discovery of audio items. It is much more likely that a fan will know which series, episode, minute and second a subject came up, or a thing was said, than the librarian who created the catalogue record. The knowledge could be used to help with the discovery process. At the moment most audio is still catalogued and described at item level, for example “it’s an interview with x”. It is still a costly and difficult process to convert speech from audio into text, and to manually add subject tags. Most of our historic audio collections do not have this level of discoverability. A crowdsourcing project which taps into the crowd to help make films more discoverable by use of public tags is ‘Waisda’. We know that the public like to consume by watching and listening, but they also want to create and share. There is potential for crowdsourcing to improve accessibility of historic digitised audio, especially that which has a fan base or is iconic.

2. What’s on the Menu (New York Public Library) Launched April 2011.

About the project: With approximately 40,000 menus dating from the 1840s to the present, The New York Public Library’s restaurant menu collection is one of the largest in the world, used by historians, chefs, novelists and everyday food enthusiasts. But the menus cannot be searched for specific information about the dishes and prices. To solve this problem the NYPL is appealing for the public to transcribe the menus, dish by dish. Doing this will enable the collection to be accessed and researched in new ways, opening the door to new kinds of discoveries. The site was launched in late April 2011 and the original aim was to transcribe the 9,000 menus photographed several years before for inclusion in the NYPL Digital Gallery. Volunteers transcribed all of these in the first three months, so more items have been scanned from the collection and are now awaiting transcription. As of 5 February 2012, there have been 758,748 dishes transcribed from 12,167 menus. The ultimate goal is to get the whole collection transcribed and to turn it into a powerful research tool. NYPL are also looking into partnering with other libraries and archives with menu collections.

Researchers who use the collection, for example historians, chefs, nutritional scientists and novelists, are looking for juicy period detail. They often have very specific questions they’re trying to answer, for example:

“Where were oysters served in 19th century New York and how did their varieties and cost change over time?”

“When did apple pie first appear on a menu? What about pizza?”

“What was the price of a cup of coffee in 1907?”

To find out these sorts of things more easily, the text on the cards needs to be transcribed. Quotes on their website about the usefulness of the project:

Rich Torrisi, New York Chef:

“What’s on the Menu is a tremendous educational resource that breathes life into our city’s most beloved restaurants and dishes. It has been an indispensable and hugely inspirational tool in the ongoing development of my restaurant…”

Mario Batali, New York Chef, Author, Entrepreneur:

“Menu writing is an art form seldom appreciated. In our restaurants, we put an incredible amount of time and thought into crafting menus. It’s remarkable to see menus being preserved and documented, for them to become a resource for future chefs, sociologists, historians and everyone who loves food. It’s not just What’s on the Menu, it reveals so much more.”

Relevance for libraries and archives: Libraries love to collect and keep stuff and that includes things like menus, tickets, pamphlets, posters, invitations, theatre programs and greeting cards. We call this stuff ‘ephemera’. Ephemera is a Greek word and it means printed matter that is intended to be transitory, short-lived, or to last only a day. When the item is created it is not intended that it will be retained or preserved. However I haven’t encountered a single library that did not have a large ‘ephemera’ collection and intend to keep it long-term. The National Library of Australia is no exception and collects ephemera because it is “a record of Australian life and social customs, popular culture, national events, and issues of national concern”. There are 2.3 million items of ephemera in the collection at the NLA. Nearly 170,000 of them have been digitised and are browsable by title.

However their full potential has still not been unlocked. Ephemera is printed on a few pages which usually contain both words and pictures. When ephemera is digitised it is scanned or photographed as an image file, and therefore the text is not indexed or searchable. It would be very hard to apply OCR on the text because of the varying and usually fancy typefaces used. The only way to make the text searchable, thereby unlocking the full discoverability potential is to manually transcribe it. Librarians don’t have time for this, but an interested public do. Give them a really interesting or topical ephemera collection like the menu cards and watch them go!

3. Historypin Launched July 2011

About the project: Historypin was launched in July 2011. It allows people to upload historic and contemporary photos, videos and sounds to a specific geo location on a map of the world. Well it’s actually not just any map, it’s a Google map and this is likely to make all the difference. It’s a combination of a crowdsourcing project (they want organisations and individuals to load content), a useful educational site, and a service that libraries and archives can hook into to expose their content and collections to new audiences (similar to Flickr Commons). I’ve seen quite a few sites like this before, but on a small scale for specific locations. For example Sydney Sidetracks was launched in 2008 by the ABC in partnership with The Dictionary of Sydney, The National Film and Sound Archive, The City of Sydney, The Powerhouse Museum, The State Library of New South Wales and the Museum of Contemporary Art. There is a website and mobile app from which historic images, videos and sound are available for locations in central Sydney overlaid on a map.

The big difference with Historypin is that it has been developed by ‘We Are What We Do’ (a not-for-profit organisation that creates ways for millions of people to do more small, good things) in partnership with Google. Google is the main technology partner on the project and has helped with Google tools, including Google Maps, Google Street View, Picasa, Google App Engine and Android. Google has supported the development costs of the project with donations and sponsorship. It has also given marketing support and created the video to promote the service: a one minute introduction to Historypin. This means this is not some small scale project that may suffer from lack of budget, development, maintenance or marketing. It is something likely to be around for a while and perhaps rival Flickr Commons. Google says “We share ‘We Are What We Do’s commitment to Historypin as a non-commercial, collaborative project that delivers social impact and contributes to digital inclusion.”

The marketing blurb says “Historypin is a way for millions of people to come together, from across different generations, cultures and places, to share small glimpses of the past and to build up the huge story of human history through a well-known medium - picture.”

Relevance for libraries and archives: Interestingly although the initial crowd Historypin were trying to attract was the public to contribute their photos and stories, it now appears that the crowd may actually be the libraries and archives community. This community has massive amounts of digitised content in image, video and sound format, and they want it more widely exposed, tagged, and used. A service in which libraries and archives can do this, which they don’t have to develop and support themselves, and has no geographical boundaries is certainly a drawcard. Batch upload has already been enabled, as has ‘make your own collection’ and ‘view slideshow’. You can pin your content on any Google Street View scene, in any country of the world. If you happen to be somewhere that Street View hasn’t yet been – don’t worry you can still pin your content down. It is a service that will be more valuable the more content there is. I only wonder if they have under-estimated the interest that libraries and archives will have in joining, and the volume of content they will have. If so it is advisable to get in early in case there is a three year waiting list like Flickr Commons had when it started. This is a crowdsourcing project that has a direct relevance to libraries and archives, no matter what their size or where they are located.

TEDx video: Nick Stanhope on mapping history

4. Ancient Lives – Decoding Papyri Launched July 2011

About the project: The Ancient Lives project presents you with fragments of 1,000-year-old papyri to decode. The papyri were discovered by researchers from Oxford University over a century ago in Oxyrhynchus (the city of the long-nosed fish).

“With about 100 men from the local village, Grenfell and Hunt dug in the high winds roaring across the desert. In early January of 1897 a papyrus containing the apocryphal Gospel of Thomas was unearthed, and then a fragment of St. Matthew’s Gospel. The flow of papyri began. Within a few years not only Thucydides and Plato were delicately pulled from the sand, but also Greek lyric poetry that had not been seen or read in about 1000 years. Further, the private documents of this vanished city were collected en masse: private letters, accounts, wills, marriage certificates, land leases, etc. Ancient garbage became a modern treasure. By 1907 the digging ceased. 700 boxes of papyri, potentially carrying about 500,000 fragments, made the long journey back to Oxford University, where Grenfell and Hunt opened up a new branch of study: papyrology. A little over a century later, only a small percentage has been translated by scholars. The Oxyrhynchus collection is owned and overseen by the Egypt Exploration Society.”

The papyrus can be decoded easily by volunteers who match known characters from a grid to the unknown characters on the fragment. Fragments can be matched by adding measurements of the fragments and the columns within them. The task is mammoth and before the arrival of the online tool could only be undertaken by scholars who were familiar with the code. A very difficult task has been effectively simplified, whilst retaining the challenge that is found in crosswords or code-breaking.

The project was launched in July 2011 and is part of the Citizen Science Alliance, a transatlantic collaboration of universities and museums that are dedicated to involving everyone in the process of science. Growing out of the wildly successful Galaxy Zoo project, the Alliance builds and maintains the Zooniverse network of crowdsourcing projects, of which Ancient Lives is one of the newest. Nearly half a million people are contributing to the Zooniverse crowdsourcing projects.

Relevance for libraries and archives: This is a good example of a task that appears on the surface to be too difficult and extensive for a crowd to undertake. By cleverly breaking down the task and designing a simple user interface, it becomes achievable. It also demonstrates that private information about people is of eternal interest to the public. This project, along with all the other Zooniverse projects, has extensive public discussion forums, firstly to foster the volunteer community and secondly to let volunteers know how their work helps new discoveries and knowledge grow and develop. We can learn much from how Zooniverse treats its volunteer community.

5. Duolingo - translate the web and learn a new language Launched November 2011

About the project: Luis von Ahn of Carnegie Mellon University is the creator of CAPTCHA and reCAPTCHA. Google bought both and reCAPTCHA has effectively helped Google Books improve the OCR in its digitised books word by word. Each year 750 million people are unwittingly converting the equivalent of 2.5 million books by using reCAPTCHA. This is a crowdsourcing project where people don’t realise they are in a crowd or what they are doing. Luis is now working on a new project: Duolingo. Luis says “Before the internet the biggest projects had 100,000 people involved and with that you could for example put a man on the moon. My question is what can you achieve with the internet when you can have 100 million people working together on something?” A good question, especially when you combine the number of people with all that ‘cognitive surplus’ that Clay Shirky is always talking about.

Duolingo will help people learn a new language and simultaneously (unwittingly) translate the Web. He says “It is estimated that there are over 1 billion people learning a foreign language at any given time”. OK so this means a big potential crowd. The Google translator tool is quite good at translating websites but not as good as he thinks the new project Duolingo will be. The site went live in beta mode in November 2011, but only a few road testers have been accepted. There is a waiting list of 100,000 who want to join the site already. Luis says “Duolingo is a 100% free language learning site in which people learn by helping to translate the Web. That is, they learn by doing.” The difference to reCAPTCHA is that people will know what they are doing and consciously want to do it. Watch this space.

Relevance for libraries and archives: I’m not sure what the relevance for libraries and archives will be. Although reCAPTCHA is a free program that is obviously very relevant for libraries and archives, it has only been utilised by commercial companies so far, namely the New York Times historic newspaper archive and Google Books. No library has utilised it. I thought I should mention the new project Duolingo since the potential also seems big. It’s a good idea to translate the web, but I also like the idea of something Luis didn’t mention, which is translating books and newspapers into different languages. A question that the National Library of Australia was thinking about last week was “will our volunteer newspaper text correctors be as keen to correct Australian newspapers in foreign languages as they are the English ones? Will they correct them even if they don’t speak the language?” We are asking this because we will soon be adding Australian newspapers in foreign languages to Trove. If this content is classed as ‘part of the web waiting to be translated’, then I guess Duolingo holds big relevance for all national libraries. Duolingo is at an early stage of development so we will have to wait and see. That is, unless libraries want to be really pro-active and actually make suggestions to the development team for things that would help them make their content more widely accessible and used…

The TEDx video: Luis talking on CAPTCHA, reCAPTCHA and Duolingo

I hope you find some inspiration from these five crowdsourcing sites for your library, archive or museum. If there is a newish site of relevance to libraries and archives that you think I’ve missed please add a comment to this post and share. Crowdsourcing sites I have previously reviewed are:

· Picture Australia (National Library of Australia)

· FamilySearch Indexing (The Church of Jesus Christ of Latter-day Saints)

· Distributed Proofreaders (contributes to Project Gutenberg)

· Wikipedia

· UK MPs' Expenses (The Guardian)

· Galaxy Zoo (Citizen Science Alliance)

· BBC World War 2 People's War (BBC)

· Digitalkoot (National Library of Finland)

· Old Weather (National Maritime Museum and Citizen Science Alliance)

· Remember Me: Displaced Children of the Holocaust (United States Holocaust Memorial Museum)

· Trove Australian Newspapers (National Library of Australia)

· Transcribe Bentham (University College London)

· Waisda (Netherlands Institute for Sound and Vision)

Read more - related posts by Rose Holley on crowdsourcing:

· Gold star to text correctors for e-books, 13 December 2011

· Software for journal and newspaper text correction, 18 December 2011

· Digital cultural heritage awards for crowdsourcing, 4 February 2012

In March 2011 images of the digitised Australian Women's Weekly 1932-1984 were projected onto the National Library of Australia building as part of the ‘Enlighten’ Festival in Canberra. Nearly 395,000 articles from the Australian Women's Weekly can be improved by public text correction in Trove. Photograph by Paul Hagon.