Thursday 26 October 2017

National Digital Library of India

I was recently invited to travel to UNESCO HQ in New Delhi, India to give a keynote presentation at the UNESCO-National Digital Library of India International Workshop ‘Knowledge Engineering for Digital Library Design’. A small group of international experts had been invited to share their knowledge in their areas of expertise, mine being crowdsourcing in libraries, newspaper text correction and user-led digital library design.

The Government of India, along with the Ministry of Human Resource Development and the Indian Institute of Technology Kharagpur (IITK), is working on a project to develop the National Digital Library (NDL). This is going to be an important part of India’s national academic infrastructure and is being led by Professor Partha Chakrabarti from IITK. The question they really wanted me to answer for Trove and Australian Newspapers was “if you were doing it all again, starting now, what would you do differently?” This is so they can apply the lessons Australia learnt to the Indian Digital Library project.

Unlike many other countries, India has a growing young, rather than ageing, population, and it is not able to build and expand its universities quickly enough to meet educational demand. The government therefore sees the National Digital Library as an academic network for individual and community-based learning. It mainly contains books and courses, and a module to deliver historic newspapers will be developed shortly. Academic libraries are also having their subscription resources harvested into it, since it is intended to be the backbone of the academic learning network. It shares some similarities with its Australian equivalent, Trove. This interesting explains the National Digital Library of India.


I was really pleased to find out that India is now using its technology expertise for itself in this way. If it were not for India we would not have Trove and Australian Newspapers. The National Library of Australia has been sending the more technical workflow aspects of historic newspaper digitisation out to Indian contractors for the last ten years. In 2008 I was overseeing this and had the pleasure of visiting the digitisation facilities and meeting the hundreds of staff in Hyderabad, Chennai and Delhi, as well as some of the call centres and technology companies in Kolkata. It was an eye-opening experience for me to see such cutting-edge technology and bright young people filled with hope and career aspirations, alongside such extreme poverty and slums. India is an experience that lets you see the whole of humanity in a single day, which can be quite overwhelming.

UNSW has just started to make a series of targeted and highly strategic investments in developing transformative partnerships in India. India represents a major priority for UNSW as part of its 2025 Strategy under the Global Impact pillar. Building successful research and knowledge exchange partnerships in India will be key to the success of the UNSW India Strategy. India is a growing source of innovation and is home to some of the world’s most dynamic and innovative companies, which are at the forefront of digital disruption, social enterprise and inclusive development. India’s research system is also growing as the Government of India considers its investment and capacity-building strategy in higher education and research.

This week, for Diwali, the UNSW campus is being transformed, and ‘The Festival of India 2017’ will be a stimulating, event-packed week that celebrates and promotes Australia’s partnership and friendship with the rapidly emerging global powerhouse, India. As the campus grounds transform into a little India, this unique festival will showcase not only the country’s rich cultural offerings but also its groundbreaking developments in innovation, finance, scientific research and economic growth. In November there will be an inaugural Research Roadshow, with these aims:


  • To enable UNSW researchers to travel to India to make new connections and/or strengthen existing relationships;
  • To showcase UNSW’s capabilities to prospective Indian partners, especially in the following research areas: smart cities, energy, water, climate, health and social enterprise;
  • To initiate and nurture strategic industry partnerships that will lead to knowledge exchange outcomes.


Expected outcomes will be the identification or consolidation of opportunities that will lead to future collaborative research partnerships with academic, industry and government partners.

In the meantime, as I contemplated preparing my presentation, I was unfortunately one of the many Australians struck down by the virulent strain of influenza in the recent Australia-wide outbreak. As the time to travel approached I realised I was still not well enough, so I asked the Creative Media Unit at UNSW Canberra for help. John Carroll used his wonderful skill and technologies to create a 40-minute video of my presentation, which was delivered to New Delhi on a video screen (link below).

I discuss the findings of my nine years of research into crowdsourcing-based curation in libraries. Using the digitised historic Australian Newspapers as an example, I look at how the functionality and interface were developed in close relationship with the users, and how this led on to text correction of newspaper articles. It is nearly ten years since this pioneering project began, and the motivations and achievements of the 50,000 volunteers are examined over this time. I question how successfully the goal of improving text quality, and therefore search, has been achieved. I propose that if a similar project were begun now, artificial intelligence software such as the OverProof post-OCR correction tool would be used to improve the quality of the text. OverProof has been trained on the manual corrections of the Australian newspaper corpus, and trials demonstrate it is able to dramatically improve the quality of the corpus. Volunteer text correction could still continue afterwards for difficult text, but the software would do the main donkey work, giving users a better quality search.



The PowerPoint is on my SlideShare account.