Sunday, 26 February 2012

Crowdsourcing Australian Climate Change

In my last blog post I described how knitters and yarn enthusiasts were crowdsourcing knitting patterns from digitised Australian newspapers in Trove for use in a crowdsourcing site called Ravelry. This week I want to give another example of the Australian newspapers in Trove being put to work by yet another crowdsourcing site. This time it’s for research into climate change, and the site is called OzDocs.

Australian newspapers hold unique content, for example convict records and climate records. In Australia official weather records only began in 1908, when the Bureau of Meteorology was established. However, weather tables and forecasts appear in Australian newspapers from 1803 onwards. The newspapers therefore provide 200 years of weather records. They give not only tables with statistics on temperature, rainfall, winds etc., but also eye-witness accounts of weather conditions such as floods, droughts and fire.

The citizens and politicians of Australia have a high and ongoing level of interest in climate change and how it is affecting our nation. One project investigating climate change is SEARCH: SouthEastern Australian Recent Climate History. It spans the sciences and the humanities, drawing together a team of leading climate scientists, water managers and historians in Australia to better understand south-eastern Australian climate history over the past 200–500 years. The digital newspapers in Trove are a fundamental part of SEARCH’s research process. However, even though the newspapers are full-text searchable, it is still a challenge to find the eye-witness accounts and the weather tables and bring them together in context so that the temperatures, rainfall and other statistics can be transcribed into a research database. This is why the SEARCH project has this month established the OzDocs citizen science project, with a $10,000 grant from the University of Melbourne, so that the public can help. Basically, the public are asked to find and tag historic newspaper articles on weather conditions, and to transcribe useful weather statistics from historic newspapers into a database.

In 2010 Joelle Gergis, the lead SEARCH investigator, spoke to me and said:

“Having all this information online and being able to quickly access it has been amazing. Being able to find weather tables from 1803 onwards in the Sydney Gazette is crucial to our research. Official records from the Bureau of Meteorology only began 100 years ago so being able to access the newspaper records which are earlier than this is really useful. The sources in Trove also show how weather events have affected society, with eye witness accounts of floods and bushfires. For example we have been researching the 1851 Black Thursday bushfires in Victoria.”

In 2010, as a locust plague swept across south-eastern Australia, the pilot volunteers for the project, working at the State Library of New South Wales, noted that weather conditions in 1825 had been very similar (heavy rainfall followed by fine weather, then a terrible locust plague). They found eye-witness accounts of a similar locust plague in The Sydney Gazette and New South Wales Advertiser of 24 March 1825.

“Prior to the late rains the caterpillar, that old enemy to the agriculturist interest of the Colony, made its appearance; but, upon the visitation of the heavy showers, their ranks were considerably thinned. However, since the present enchanting fine weather has again set in, the number of these destructive insects has increased to an unparalleled extent, covering whole fields in their course, which in some spots seemed to be towards the South, in a line from East to West. Wherever they make their appearance, the most complete destruction immediately follows. Upon Captain Campbell’s estate, in the district of Cooke, they were supposed to be at least two inches in height.”

This month I caught up with Joelle again to see how the newly released crowdsourcing part of the project is going. She said:

“We now have over 100 volunteers who have contributed over 4000 articles. The database will be searchable in our next release. It will be the country’s first publicly searchable database of climate information using a diverse collection of pre-20th century historical records. The database will give easy access to information for researchers, organisations, government departments and the public.”

The scope of OzDocs’ work has now been expanded to include not only the digitised newspapers but also other resources held in the State Library of Victoria, the State Library of New South Wales and the National Library of Australia. These pre-date the newspapers by another 100 years and go back to the 1700s. Joelle said:

“Our OzDocs volunteers will be working their way through logbooks of the first European explorers, governors’ correspondence, early settlers’ diaries, newspapers and the works of 18th and 19th century scholars.”

One of the questions the SEARCH project hopes to answer is what the ‘natural’ climate of south-eastern Australia has been like since 1788. This may ultimately help to refine current climate models, allowing more accurate climate change estimates to be developed for the future. The lack of records in a consistent, accessible format before 1900 is currently making this difficult.

For the project to be successful the number of volunteers needs to increase substantially. At the moment this is still quite a small-scale effort compared to the knitting project Ravelry that I reported on in my last blog post. Volunteers can join by accessing the OzDocs site.

Friday, 24 February 2012

Crowdsourcing knitting patterns

Wordle showing the most popular search terms in Trove.
I have been project managing the Australian Newspapers Digitisation Program, the Australian Newspapers service and Trove at the National Library of Australia for the last 5 years. Trove, a free discovery service for Australian content, is now massive, with a total of 250 million items from different organisations around Australia. This includes archives, pictures, books, music and newspapers. There are 62 million full-text articles from Australian newspapers 1803-1954 and the Australian Women’s Weekly 1932-1982, which the National Library of Australia has digitised. About 100,000 new newspaper articles are being added to Trove each week.

Despite the content of Trove being varied, over 80% of Trove usage and engagement still revolves around digitised historic Australian newspapers. They are the most used content the National Library has ever had, eclipsing everything else, with usage continuing to increase. One fifth of the Australian population (4 million people) are regular users of Trove. There are about 50,000 searches every hour. Over 40,000 online volunteers have corrected over 58 million lines of newspaper text to help improve the searching. This is the crowdsourcing aspect. Today more than 100,000 lines of newspaper text will be corrected by users, this week more than 10,000 items will be tagged and this month 2,000 comments will be added to items.
I’m often asked what the most accessed items are in Trove. Unfortunately I cannot answer this because it isn’t logged on our servers. However, I do know that newspapers are used more than any other content. I keep an eye on the most used search terms using Google Analytics to get a feel for what people are looking for. In fact anyone can look at search terms as they happen, second by second, by clicking the link immediately above the Trove search box. For a very long time the top search terms have been Smith, George, death, birth, hanging, suicide, murder, cricket and gold, and also anything topical, e.g. Lionel Logue (The King’s Speech). The most popular articles seem to be births, deaths and marriages and articles on murders. However, I have seen a recent trend whereby the terms “knitting pattern” and “knit + cast on” have knocked “death” off the top spot. I decided to look into this a bit further. I was fascinated to discover that the crowdsourcing in Trove is interconnecting with another crowdsourcing project. It’s for knitters and is called Ravelry.

Ravelry is a place for knitters, crocheters, designers, spinners, weavers and dyers to keep track of their yarn, tools, project and pattern information, and look to others for ideas and inspiration. The content on the site is user-driven and created by the knitting community. Ravelry lets you keep notes about your projects, see what other people are making, find the perfect pattern and connect with people who love to play with yarn from all over the world in forums.
The site was started by Jess who had been a knitter and a blogger for a while. She knew that there was all this great information out there from other fiber lovers – but with the growing number of crochet and knitting blogs, finding that information just kept getting harder. It was getting frustrating for her to try and find information about the patterns and yarns that she was interested in using. Her programmer partner Casey thought that he would be able to build a website that could solve her problems, so they started working on it together, introducing it to a few friends at a time.

A key part of the site is the database of knitting patterns, gathered together by the community, described and catalogued by them, and then knitted by them. The user community can favourite them, add comments, and add patterns to projects and to-do lists. They often photograph the end results and add these to the database. They can seek help from other knitters on patterns, yarns, techniques and designs.
The site is free but does require a login to look at the patterns. It is proving immensely popular. In the first weekend 15,000 knitters signed up. Apparently quite a lot of these happened to be librarians.* In July 2010 Ravelry appealed to the community to both find and describe patterns. In one week 23,500 users categorised and assigned metadata to 160,000 patterns. The advanced search, which draws on these fields for faceted searching, is quite amazing, and quite frankly leaves most library catalogues for dead. Facets include availability, category, yardage, gender, source, fibre, needle size, rating, difficulty, language and more. I was quite stunned by this because it is one of the few crowdsourcing projects I have seen that has very successfully engaged a crowd to help assign metadata to records to the highest possible level. Cataloguing is a task that most cataloguers and librarians think cannot be done well by anyone except themselves, and besides, they assume it would be far too boring to attract people’s interest. However, the knitters can clearly see the value in adding descriptive metadata. For example, by adding the yardage or meterage required, they can easily find out, by searching on that field, what patterns they can knit when they only have x yards of wool left.
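To make the yardage example concrete, here is a minimal sketch in Python of what a faceted filter does. It is only an illustration: the field names (name, yardage, difficulty) and the sample records are made up for this post and are not Ravelry's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    name: str
    yardage: int      # yards of yarn the pattern needs (hypothetical field)
    difficulty: str   # community-assigned facet value (hypothetical field)


# Made-up sample records, for illustration only.
PATTERNS = [
    Pattern("Pattern A", 120, "easy"),
    Pattern("Pattern B", 450, "medium"),
    Pattern("Pattern C", 200, "easy"),
]


def knittable(patterns, yards_left, difficulty=None):
    """Return patterns needing no more yarn than is left, optionally narrowed by difficulty."""
    return [
        p for p in patterns
        if p.yardage <= yards_left
        and (difficulty is None or p.difficulty == difficulty)
    ]


if __name__ == "__main__":
    # Which easy patterns can I knit with 250 yards left?
    for p in knittable(PATTERNS, yards_left=250, difficulty="easy"):
        print(p.name, p.yardage)
```

The point is that the filter only works because someone recorded the yardage for every pattern in the first place, which is exactly the descriptive metadata the knitters have been adding.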

So how does Trove come into all this? Well, as knitting regains popularity and we see the resurgence of ‘retro’ fashion from yesteryear, the knitting community are falling with glee on digitised historic Australian newspapers and the Australian Women’s Weekly, particularly from the 1950s. Someone has helpfully added instructions to Ravelry on how to find vintage knitting patterns in Trove: search for knit+"cast on" or knitting patterns (now one of the top search terms; see the wordle above). If you do this in Trove you get nearly 73,000 results for knitting patterns. Most newspapers included at least one pattern a week. Of all those patterns the community has chosen to add some of the more popular ones to Ravelry so that more community engagement can happen. So far 290 have been added from Australian newspapers and the Australian Women’s Weekly.
The two screenshots from Ravelry below show firstly a classic number ‘a cosy cardigan’ which appeared in the Sydney Morning Herald of 1953. It has been favourited by 145 people, one has knitted it and 93 people have added it to their queue of things to knit next. The person who has knitted it has added notes and instructions on how they did it with a colour picture of the finished garment.

The second shot shows that the most favourited pattern added to Ravelry from the National Library of Australia’s digitised Australian Women’s Weekly collection is… wait for it…… the ‘Elegant Elephant’. It has been favourited by 690 people,  rated 4 out of 5 and easy to knit, knitted by 21 people in a variety of colours, and 174 more people intend to knit it soon.  If you click on the pattern you will see uploaded photos of finished knitted elephants….

If you don’t want to log in to Ravelry, you can get an overview by watching this 6-minute video on Ravelry.
This is a very interesting example of re-use of material from old newspapers, one that was not even considered when the newspapers were digitised. Ravelry is an outstanding site offering community engagement and crowdsourcing that has really impressed me. I love the advanced pattern search by facets. It clearly shows that for some items users don’t want a dumbed-down, simple search box. They want ADVANCED SEARCHING, MORE DESCRIPTIVE METADATA AND FACETS! They are prepared to add the descriptive metadata themselves.
The only thing I have ever knitted myself was a pink and blue tea-cosy for my mother as a present when I was 14. My mother was a teapot collector then, but interestingly only had 2 tea cosies.  Knitted tea cosies are becoming popular again.  However I am contemplating knitting the Australian Women’s Weekly ‘Elegant Elephant’, maybe in pink, my favourite colour? At least if I get stuck I know I will be able to get online help in Ravelry, and it looks easier than a tea cosy!
* I acknowledge the use of Nyssa Parkes’ article ‘Fibre FRBRisation’ in the November 2011 issue of Incite Magazine. The statistics on Ravelry user engagement come from this article.

Sunday, 19 February 2012

Digital protection of indigenous knowledge

The governments of New Zealand and Australia are unfortunately not as advanced as India with respect to the protection of indigenous knowledge. It is hardly believable, but India has not only tackled the issue but also partially solved it digitally with an online database. This post looks at this important topic in more detail. It is relevant for libraries, archives, galleries and museums because they collect, store, describe, handle, loan, digitise and display items which are classified as indigenous knowledge, such as:
  • moveable cultural property
  • literary and artistic works (including music, dance, song, ceremonies, symbols and designs, narratives and poetry)
  • scientific, agricultural, technical and ecological knowledge
  • human remains
  • sacred sites, burials and sites of historical significance
  • documents of Indigenous peoples' heritage (including film, photographs, video and audio recordings, and archival collections).
The problem with protecting indigenous knowledge is that most countries in the world have difficulty reconciling local indigenous traditions, laws and cultural norms with predominantly western legal systems, effectively leaving indigenous peoples' individual and communal intellectual property rights unprotected. Also, most countries do not have specific legislation or systems in place to protect indigenous knowledge and so prevent its misuse or commercial use. It is usually up to individuals or tribal groups to take costly, long-running and often unsuccessful court action to protect their knowledge and intellectual property. This is not how it should be.
The Wikipedia entry on Indigenous Intellectual property gives a brief overview of the subject which is a topic of international concern.  It notes two important declarations:

New Zealand: Mataatua Declaration on Cultural and Intellectual Property Rights of Indigenous Peoples (June 1993)
150 delegates from fourteen countries, including indigenous representatives from Japan, Australia, Cook Islands, Fiji, India, Panama, Peru, Philippines, Surinam, USA and New Zealand:
  • Affirmed indigenous peoples' knowledge is of benefit to all humanity.
  • Recognised indigenous peoples are willing to offer their knowledge to all humanity provided their fundamental rights to define and control this knowledge is protected by the international community.
  • Insisted the first beneficiaries of indigenous knowledge must be the direct indigenous descendants of such knowledge.
  • Declared all forms of exploitation of Indigenous knowledge must cease.
Section 2 of the declaration asks State, National and International Agencies to:
  • Recognise that Indigenous peoples are the guardians of their customary knowledge and have the right to protect and control dissemination of that knowledge.
  • Recognise that indigenous peoples also have the right to create new knowledge based on cultural tradition.
  • Accept that the cultural and intellectual property rights of Indigenous peoples are vested with those who created them.

Australia: Julayinbul Statement on Indigenous Intellectual Property Rights (November 1993)
A meeting of indigenous and non-indigenous specialists agreed that indigenous intellectual property rights are best determined from within the customary laws (Aboriginal common laws) of the indigenous groups themselves.  These laws must be acknowledged and treated as equal to any other systems of law.
  • Indigenous Peoples and Nations reaffirm their right to define for themselves their own intellectual property, acknowledging the uniqueness of their own particular heritage
  • Indigenous Peoples and Nations declare that we are willing to share [our intellectual property] with all humanity provided that our fundamental rights to define and control this property are recognised by the international community
  • Aboriginal intellectual property, within Aboriginal Common Law, is an inherent, inalienable right which cannot be terminated, extinguished, or taken ... Any use of the intellectual property of Aboriginal Nations and Peoples may only be done in accordance with Aboriginal Common Law, and any unauthorised use is strictly prohibited.

Examples of indigenous knowledge being protected
1. New Zealand: The haka
Māori have been trying to defend their rights to the haka dance for over 10 years. Ka Mate is the most widely known haka because it has traditionally been performed by the All Blacks rugby team at the opening of international games. It is agreed that Ka Mate was composed by Te Rauparaha, war leader of the Ngāti Toa tribe (iwi) of the North Island of New Zealand.
Between 1998 and 2006, the Ngāti Toa iwi attempted to trademark Ka Mate to prevent its use by commercial organisations without their permission, but in 2006 the Intellectual Property Office of New Zealand turned their claim down on the grounds that Ka Mate had achieved wide recognition in New Zealand and abroad as representing New Zealand as a whole and not a particular trader. However, in 2009, as a part of a wider settlement of grievances, the New Zealand government agreed to:

"...record the authorship and significance of the haka Ka Mate to Ngāti Toa and ... work with Ngāti Toa to address their concerns with the haka... [but] does not expect that redress will result in royalties for the use of Ka Mate or provide Ngāti Toa with a veto on the performance of Ka Mate...".
In March 2011, a few months before the Rugby World Cup started in New Zealand, the NZ Rugby Union came to an amicable agreement with Ngāti Toa not to bring the mana of Ka Mate into disrepute. In one of the final games France were fined $10,000 by the International Rugby Board for advancing towards the haka, but I think this was more to do with protecting rugby rules than the haka itself.
Matiu Rei, head of the Ngāti Toa Māori tribe, said in October 2011:
“We are not seeking compensation, we are seeking recognition.”
2. India: Yoga
For more than 10 years India watched as western governments granted patents, trademarks, and copyrights to what was India’s indigenous knowledge, for example yoga and herbal cures. The U.S. Patent and Trademark Office alone issued 150 yoga-related copyrights, 134 patents on yoga accessories, and 2,315 yoga trademarks. Yoga is big business around the world and is estimated to make $3 billion a year in America alone. The Indian Government did not stand by and do nothing; it decided to take action in the form of the Traditional Knowledge Digital Library (TKDL).

The aim of the TKDL is to make the indigenous knowledge of India digitally available in multiple languages, so that patent offices can search the knowledge and then reject patent applications that are actually traditional knowledge. India is quite lucky because things like yoga and herbal medicine are well described in ancient texts. But these are in languages such as Sanskrit, Urdu, Arabic, Persian and Tamil (usually not English), hard to get hold of in hard copy, and not widely understood by the western world or patent examiners. TKDL breaks the language and format barrier and makes this information available in English, French, Spanish, German and Japanese in patent application format, which is easily understandable by patent examiners. TKDL is thus a tool providing defensive protection to the rich traditional knowledge of India. In June 1999 the World Intellectual Property Organization (WIPO) and the Standing Committee on Information Technology (SCIT) recognised the need for developing countries to create Traditional Knowledge (TK) databases. The concept of the Indian TKDL was formed in 2001.

By August 2011, 150 books on yoga, ayurveda, unani, siddha and natural medicines from multiple Indian languages had been digitised and translated into 4 European languages and Japanese. In addition 1,300 yoga ‘asanas’ had been documented, making them public knowledge. Around 250 of these ‘asanas’ have also been made into video clips with an expert performing them.

But this hasn’t stopped self-styled yoga gurus such as Bikram Choudhury in the USA from trying to patent ‘Hot Yoga’, a set of 26 sequences practised in a heated room. Bikram and the TKDL made headline news last year because of this. Boing Boing reported in March 2011 that Mr. Choudhury was apparently threatening to sue people teaching a popular style of yoga he claims to have invented and copyrighted. He also reputedly said "Because I have balls like atom bombs, two of them, 100 megatons each. Nobody fucks with me." Perhaps he has not finished reading the full library of indigenous yoga knowledge, since I’m sure this is an attitude not encouraged by yogis?
Dr V P Gupta, who created TKDL, said "All the 26 sequences which are part of Hot Yoga have been mentioned in Indian yoga books written thousands of years ago." He added, "However, we will not legally challenge Choudhury. By putting the information in the public domain, TKDL will be a one-stop reference point for patent offices across the world. Every time, somebody applies for a patent on yoga, the office can check which ancient Indian book first mentioned it and cancel the application."

These two examples of defending indigenous knowledge and the quotes by the leaders reinforce the statements in the declarations, namely:

“indigenous peoples are willing to offer their knowledge to all humanity provided their fundamental rights to define and control this knowledge is protected by the international community.”
Libraries, archives, galleries and museums are part of that community and are often seen as gatekeepers of knowledge. We should be pro-actively supporting, encouraging and enabling this outcome.

What is happening in Australia?

In Australia there are many examples of non-indigenous Australians inappropriately using indigenous knowledge and artworks without permission. Most recently this has revolved around sacred rock art.  In April 2009, the Australian Government adopted the UN’s Declaration on the Rights of Indigenous Peoples. It states that Indigenous people have the right to maintain, control, protect and develop cultural heritage, traditional knowledge and traditional cultural expressions including oral traditions, literature, designs, visual and performing arts. It also includes the right for Indigenous people to maintain, control, protect and develop their intellectual property over such cultural heritage, traditional knowledge and traditional cultural expressions. However we currently don’t have the infrastructure or systems in Australia to do this easily or effectively.

In April 2008 at the Australia 2020 Summit Terri Janke proposed the establishment of a National Indigenous Knowledge Centre (NIKC). This idea was followed through with a feasibility study into a NIKC, which was submitted to FaHCSIA in October 2011. In 2009 Terri Janke wrote her own report, ‘Beyond Guarding Ground: A vision for a National Indigenous Cultural Authority’. Both of these reports require legislation and a system to be established for the protection of indigenous intellectual and cultural knowledge. Part of this infrastructure could be a digital library database.
Some of the suggestions for the NIKC are that it would build on the existing role of the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) and:
  • Become a reference point for Aboriginal and Torres Strait Islander culture.
  • Engage in research to harness traditional knowledge to support sustainable management of country.
  • Support the education and understanding of indigenous culture and affairs across Australia and preserve indigenous heritage.
  • Become a national gathering place for the celebration and discussion of indigenous culture in a physical or virtual sense.

The AIATSIS response to the National Cultural Policy quite rightly questions if and how protecting indigenous knowledge will fit into the proposed NCP.
Further Reading:
Michael Davis, Indigenous Peoples and Intellectual Property Rights, 1996 Research Report

ATSILERN Protocols: Guidance for libraries, archives and information services in appropriate ways to interact with Aboriginal and Torres Strait Islander people, their culture and heritage.  
The Australian Wattle, photo by Rose Holley

Saturday, 11 February 2012

Crowdsourcing: more cool sites to give libraries, archives and museums inspiration

Many people know of my interest in the relevance and application of online digital crowdsourcing for libraries, archives and museums, due to an article I wrote in 2010 called ‘Crowdsourcing: how and why should libraries do it?’, and my initiation of the Australian Newspapers public text correction. People therefore often send me links to sites they think may interest me. This is really great. Sometimes sites which have nothing to do with libraries or archives may give us ideas. There is a ‘List of Crowdsourcing Projects’ in Wikipedia (which is separate from the main article on crowdsourcing). This is a useful starting point to get an overview of the sorts of activities going on. It goes without saying that Wikipedia is of course the greatest crowdsourcing project ever!
In this post I wanted to mention some newish crowdsourcing projects that I have been looking at that interest me, and that I haven’t written about before. 
1. Star Wars Uncut (SWU). Released August 2011
About the project: In 2009, Casey Pugh, a web developer, asked thousands of Internet users to remake "Star Wars: A New Hope" into a fan film, 15 seconds at a time. Contributors were allowed to recreate scenes from Star Wars however they wanted. Multiple entries were submitted for each scene, and votes were held to determine which ones would be added to the final film. Although the scenes reflect the dialogue and imagery of the original film, each scene is created in its own distinct style, such as live-action, animation or stop-motion. Within just a few months SWU grew into a wild success. The creativity that poured into the project was unimaginable. SWU has been featured in documentaries, news features and conferences around the world for its unique appeal. In 2010 it won a Primetime Emmy for Outstanding Creative Achievement in Interactive Media. Now the crowdsourced project has been stitched together and put online on YouTube and Vimeo. The "Director's Cut" is a feature-length film that contains hand-picked scenes from the entire collection.

Relevance for libraries and archives: In the world of film, TV and radio, fans and consumers are the subject experts. They not only have in-depth knowledge, but also have the motivation and interest to share their knowledge with others in creative ways. This project really shows that. The fans apparently had no trouble identifying specific seconds in a very long film. This type of knowledge and interest is really useful for librarians and archivists who want to open up discovery of audio items. It is much more likely that a fan will know in which series, episode, minute and second a subject came up or a thing was said than the librarian who created the catalogue record. That knowledge could be used to help with the discovery process. At the moment most audio is still catalogued and described at item level, for example “it’s an interview with x”. It is still a costly and difficult process to convert speech from audio into text, and to manually add subject tags. Most of our historic audio collections do not have this level of discoverability. One crowdsourcing project which taps into the crowd to help make films more discoverable through public tags is ‘Waisda’. We know that the public like to consume by watching and listening, but they also want to create and share. There is potential for crowdsourcing to improve accessibility of historic digitised audio, especially that which has a fan base or is iconic.

2. What’s on the Menu (New York Public Library). Launched April 2011.
About the project:  With approximately 40,000 menus dating from the 1840s to the present, The New York Public Library’s restaurant menu collection is one of the largest in the world, used by historians, chefs, novelists and everyday food enthusiasts. But the menus cannot be searched for specific information about the dishes and prices. To solve this problem the NYPL is appealing for the public to transcribe the menus, dish by dish. Doing this will enable the collection to be accessed and researched in new ways, opening the door to new kinds of discoveries. The site was launched in late April 2011 and the original aim was to transcribe the 9,000 menus photographed several years before for inclusion in the NYPL Digital Gallery.   Volunteers transcribed all of these in the first three months, so more items have been scanned from the collection and are now awaiting transcription. As of 5 February 2012, there have been 758,748 dishes transcribed from 12,167 menus. The ultimate goal is to get the whole collection transcribed and to turn it into a powerful research tool.  NYPL are also looking into partnering with other libraries and archives with menu collections.

Researchers who use the collection, for example historians, chefs, nutritional scientists and novelists, are looking for a juicy period detail. They often have very specific questions they’re trying to answer, for example:

“Where were oysters served in 19th century New York and how did their varieties and cost change over time?”
“When did apple pie first appear on a menu? What about pizza?”
“What was the price of a cup of coffee in 1907?”
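Once each dish has been transcribed together with its menu year and printed price, questions like these become simple lookups. Here is a minimal sketch in Python, under stated assumptions: the record layout (dish, year, price) and the sample entries are invented for illustration and are not the NYPL's actual data or export format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DishEntry:
    dish: str
    year: int
    price: Optional[float]  # price as printed on the menu; None if not shown


# Made-up transcriptions, for illustration only; not real NYPL data.
ENTRIES = [
    DishEntry("Apple pie", 1895, 0.10),
    DishEntry("Coffee, cup", 1907, 0.05),
    DishEntry("Apple pie", 1880, 0.05),
    DishEntry("Blue Point oysters", 1899, 0.25),
]


def first_appearance(entries, term):
    """Earliest year in which a dish name containing `term` was transcribed, or None."""
    years = [e.year for e in entries if term.lower() in e.dish.lower()]
    return min(years) if years else None


def prices_in_year(entries, term, year):
    """All printed prices for matching dishes on menus from the given year."""
    return [
        e.price for e in entries
        if term.lower() in e.dish.lower()
        and e.year == year
        and e.price is not None
    ]


if __name__ == "__main__":
    print(first_appearance(ENTRIES, "apple pie"))
    print(prices_in_year(ENTRIES, "coffee", 1907))
```

None of this is possible while a menu is only a scanned image, which is why the dish-by-dish transcription matters.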

To find out these sorts of things more easily, the text on the cards needs to be transcribed.  Quotes on their website about the usefulness of the project:
Rich Torrisi, New York Chef:

“What’s on the Menu is a tremendous educational resource that breathes life into our city’s most beloved restaurants and dishes.  It has been an indispensable and hugely inspirational tool in the ongoing development of my restaurant…”

Mario Batali, New York Chef, Author, Entrepreneur:

“Menu writing is an art form seldom appreciated. In our restaurants, we put an incredible amount of time and thought into crafting menus. It’s remarkable to see menus being preserved and documented, for them to become a resource for future chefs, sociologists, historians and everyone who loves food.  It’s not just What’s on the Menu, it reveals so much more.”

Relevance for libraries and archives:  Libraries love to collect and keep stuff, and that includes things like menus, tickets, pamphlets, posters, invitations, theatre programs and greeting cards. We call this stuff ‘ephemera’. Ephemera is a word of Greek origin and it means printed matter that is intended to be transitory, short lived, or to last only a day.  When the item is created it is not intended that it will be retained or preserved.  However, I haven’t encountered a single library that did not have a large ‘ephemera’ collection and intend to keep it long-term. The National Library of Australia is no exception and collects ephemera because it is “a record of Australian life and social customs, popular culture, national events, and issues of national concern”. There are 2.3 million items of ephemera in the collection at the NLA. Nearly 170,000 of them have been digitised and are browsable by title.
However, their full potential has still not been unlocked.  Ephemera is printed on a few pages which usually contain both words and pictures.  When ephemera is digitised it is scanned or photographed as an image file, and therefore the text is not indexed or searchable.  It would be very hard to apply OCR to the text because of the varying and usually fancy typefaces used.  The only way to make the text searchable, thereby unlocking the full discoverability potential, is to manually transcribe it.  Librarians don’t have time for this, but an interested public does.  Give them a really interesting or topical ephemera collection like the menu cards and watch them go!
3.       Historypin Launched July 2011
About the project:  Historypin was launched in July 2011.  It allows people to upload historic and contemporary photos, videos and sounds to a specific geo location on a map of the world.  Well it’s actually not just any map, it’s a Google map and this is likely to make all the difference. It’s a combination of a crowdsourcing project (they want organisations and individuals to load content), a useful educational site, and a service that libraries and archives can hook into to expose their content and collections to new audiences (similar to Flickr Commons).  I’ve seen quite a few sites like this before, but on a small scale for specific locations. For example Sydney Sidetracks was launched in 2008 by the ABC in partnership with The Dictionary of Sydney, The National Film and Sound Archive, The City of Sydney, The Powerhouse Museum, The State Library of New South Wales and the Museum of Contemporary Art. There is a website and mobile app from which historic images, videos and sound are available for locations in central Sydney overlaid on a map.
The big difference with Historypin is that it has been developed by ‘We Are What We Do’ (a not-for-profit organisation that creates ways for millions of people to do more small, good things) in partnership with Google. Google is the main technology partner on the project and has helped with Google tools, including Google Maps, Google Street View, Picasa, Google App Engine and Android. Google has supported the development costs of the project with donations and sponsorship.  It has also given marketing support and created the video to promote the service:  a one minute introduction to Historypin. This means it is not some small scale project that may suffer from lack of budget, development, maintenance or marketing.  It is something likely to be around for a while and perhaps rival Flickr Commons. Google says “We share ‘We Are What We Do’s commitment to Historypin as a non-commercial, collaborative project that delivers social impact and contributes to digital inclusion.”
The marketing blurb says “Historypin is a way for millions of people to come together, from across different generations, cultures and places, to share small glimpses of the past and to build up the huge story of human history through a well-known medium - picture.”
Relevance for libraries and archives: Interestingly, although the crowd Historypin was initially trying to attract was the public, who would contribute their photos and stories, it now appears that the crowd may actually be the libraries and archives community. This community has massive amounts of digitised content in image, video and sound format, and they want it more widely exposed, tagged, and used.  A service in which libraries and archives can do this, which they don’t have to develop and support themselves, and which has no geographical boundaries, is certainly a drawcard.  Batch upload has already been enabled, as has ‘make your own collection’ and ‘view slideshow’.  You can pin your content on any Google Street View scene, in any country of the world.  If you happen to be somewhere that Street View hasn’t yet been – don’t worry, you can still pin your content down. It is a service that will be more valuable the more content there is.  I only wonder if they have under-estimated the interest that libraries and archives will have in joining, and the volume of content they will have.  If so, it is advisable to get in early in case there is a three year waiting list like Flickr Commons had when it started. This is a crowdsourcing project that has a direct relevance to libraries and archives, no matter what their size or where they are located.
TEDx video: Nick Stanhope on mapping history  

4.       Ancient Lives  – Decoding Papyri Launched July 2011
About the project: The Ancient Lives project presents you with fragments of 1,000-year-old papyri to decode. The papyri were discovered by researchers from Oxford University over a century ago in Oxyrhynchus (the city of the long-nosed fish).
“With about 100 men from the local village, Grenfell and Hunt dug in the high winds roaring across the desert. In early January of 1897 a papyrus containing the apocryphal Gospel of Thomas was unearthed, and then a fragment of St. Matthew’s Gospel. The flow of papyri began. Within a few years not only Thucydides and Plato were delicately pulled from the sand, but also Greek lyric poetry that had not been seen or read in about 1000 years. Further, the private documents of this vanished city were collected en masse: private letters, accounts, wills, marriage certificates, land leases, etc. Ancient garbage became a modern treasure. By 1907 the digging ceased. 700 boxes of papyri, potentially carrying about 500,000 fragments, made the long journey back to Oxford University, where Grenfell and Hunt opened up a new branch of study: papyrology. A little over a century later, only a small percentage has been translated by scholars. The Oxyrhynchus collection is owned and overseen by the Egypt Exploration Society.”
The papyri can be decoded easily by volunteers, who match known characters from a grid to the unknown characters on the fragment.  Fragments can be matched by adding measurements of the fragments and the columns within them. The task is mammoth and, before the arrival of the online tool, could only be undertaken by scholars who were familiar with the code. A very difficult task has been effectively simplified, whilst retaining the challenge that is found in crosswords or code-breaking.
The project was launched in July 2011 and is part of the Citizen Science Alliance, which is a transatlantic collaboration of universities and museums who are dedicated to involving everyone in the process of science. Growing out of the wildly successful Galaxy Zoo project, it builds and maintains the Zooniverse network of crowdsourcing projects, of which Ancient Lives is one of the newest. Nearly half a million people are contributing to the Zooniverse crowdsourcing projects.
Relevance for libraries and archives: This is a good example of a task that appears on the surface to be too difficult and extensive for a crowd to undertake.  By clever breaking down of the task and designing a simple user interface it becomes achievable.  It also demonstrates that private information about people is of eternal interest to the public. This project along with all the other Zooniverse projects has extensive public discussion forums to firstly foster the volunteer community and secondly let them know how their work helps new discoveries and knowledge grow and develop. We can learn much from how Zooniverse treats its volunteer community.
5.       Duolingo -  translate the web and learn a new language Launched November 2011
About the project: Luis von Ahn of Carnegie Mellon University is the creator of CAPTCHA and reCAPTCHA. Google acquired reCAPTCHA, which has effectively helped Google Books improve the OCR in its digitised books word by word. Each year 750 million people are unwittingly converting the equivalent of 2.5 million books by using reCAPTCHA.  This is a crowdsourcing project where people don’t realise they are in a crowd or what they are doing. Luis is now working on a new project: Duolingo.  Luis says “Before the internet the biggest projects had 100,000 people involved and with that you could for example put a man on the moon.  My question is what can you achieve with the internet when you can have 100 million people working together on something?”  A good question.  Especially when you combine the number of people with all that ‘cognitive surplus’ that Clay Shirky is always talking about.
Duolingo will help people learn a new language and simultaneously (unwittingly) translate the Web.  He says “It is estimated that there are over 1 billion people learning a foreign language at any given time”. OK so this means a big potential crowd. The Google translator tool is quite good at translating websites but not as good as he thinks the new project Duolingo will be.  The site went live in beta mode in November 2011, but only a few road testers have been accepted.  There is a waiting list of 100,000 who want to join the site already. Luis says “Duolingo is a 100% free language learning site in which people learn by helping to translate the Web. That is, they learn by doing.” The difference to reCAPTCHA is that people will know what they are doing and consciously want to do it. Watch this space.
Relevance for libraries and archives: I’m not sure what the relevance for libraries and archives will be.  Although reCAPTCHA is a free program that is obviously very relevant for libraries and archives, it has only been utilised by commercial companies so far, namely the New York Times historic newspaper archive and Google Books. No library has utilised it. I thought I should mention the new project Duolingo since the potential also seems big.  It’s a good idea to translate the web, but I also like the idea of something Luis didn’t mention, which is translating books and newspapers into different languages. A question that the National Library of Australia was thinking about last week was “will our volunteer newspaper text correctors be as keen to correct Australian newspapers in foreign languages as they are the English ones? Will they correct them even if they don’t speak the language?” We are asking this because we will soon be adding Australian newspapers in foreign languages to Trove. If this content is classed as ‘part of the web waiting to be translated’, then I guess Duolingo holds big relevance for all national libraries. Duolingo is at an early stage of development so we will have to wait and see. That is, unless libraries want to be really pro-active and actually make suggestions to the development team for things that would help them make their content more widely accessible and used.
The TEDx video:  Luis talking on CAPTCHA, reCAPTCHA and Duolingo

I hope you find some inspiration from these five crowdsourcing sites for your library, archive or museum.  If there is a newish site of relevance to libraries and archives that you think I’ve missed please add a comment to this post and share. Crowdsourcing sites I have previously reviewed are:
·         Picture Australia (National Library of Australia)
·         FamilySearchIndexing (Church of Latter Day Saints)
·         Distributed Proofreaders (contributes to Project Gutenberg)
·         Wikipedia  
·         UK MPs' Expenses (The Guardian)
·         Galaxy Zoo  (Citizen Science Alliance)
·         BBC WorldWar2 Peoples War (BBC)
·         Digitalkoot (National Library of Finland)
·         Old Weather (National Maritime Museum and Citizen Science Alliance)
·         Remember Me: Displaced Children of the Holocaust (United States Holocaust Memorial Museum)
·         Trove Australian Newspapers (National Library of Australia)
·         Transcribe Bentham (University College London)
·         Waisda (Netherlands Institute for Sound and Vision)

Read more - related posts by Rose Holley on crowdsourcing:
·         Gold star to text correctors for e-books, 13 December 2011
·         Software for journal and newspaper text correction, 18 December 2011
·         Digital cultural heritage awards for crowdsourcing, 4 February 2012

In March 2011 images of the digitised Australian Women's Weekly 1932-1984 were projected onto the National Library of Australia building as part of the ‘Enlighten’ Festival in Canberra. Nearly 395,000 articles from the Australian Women's Weekly can be improved by public text correction in Trove.  Photograph by Paul Hagon.