Rose Holley's Blog - views and news on digital libraries and archives: August 2012

Since I was attending the International Congress of Archives (ICA 2012) in Brisbane last week I had the opportunity to visit the Queensland Art Gallery which is a stone’s throw from the Brisbane Convention Centre. As an added bonus all conference attendees got a discounted entry to the ‘Portrait of Spain’ Exhibition, which has 100 paintings from the Prado, Madrid on show.

I was surprised to see that visitors were encouraged to get their iPhones out at the start of the tour. The primary reason was to take a photo of yourself against a backdrop of a Prado gallery, so that you could pretend to friends you had actually been to the Prado. The ticket collector obliged and took my photo:

As I approached the first painting a security guard warned me that photos were not allowed, with or without flash, and that I should put my iPhone away. At the next painting, when I commented to another visitor on how little description was provided beside the paintings, another security guard overheard and said ‘oh you can use your iPhone’. Now somewhat perplexed, I asked again if I could take a photo. ‘No, but you can scan the QR codes beside the paintings which tell you more information about them’. [If you don't know what QR codes are and how galleries and museums use them read this short explanation].

I couldn’t be bothered with that. That is, until I got to a breathtaking painting that had absolutely no explanation about why it took your breath away. The painting appeared to be of a man (with hairy forearms, moustache, thick neck) but dressed in formal women’s court clothing with a bust. The description made no mention of this surprising phenomenon. At that point I got my iPhone out and checked the QR code. Disappointingly there was still no info on the surprise, simply that the artist had ‘a good eye for detail and had painted the hair well. The woman was graceless and had a less than feminine appearance’. I made a note of the painting to look it up afterwards and see if anyone else had more information on the person in it that they were willing to candidly share. Perhaps a Wikipedia entry? The painting is called Señora de Delicado de Imaz by Vicente López Portaña from the Spanish Court of 1833.

On leaving the exhibition I felt in dire need of a cup of tea so headed towards what I thought was the café, only to be blocked by a guard who demanded to see my entry ticket (yes there are a lot of guards). I queried why I should need to show my entry ticket to partake of tea and was told that this café was themed with Spanish food (?). I was about to turn away when she also mentioned that the photo booths were included in the ticket price. Being a sucker for a photo I couldn’t resist so passed through the checkpoint having no idea what the photo booth would do.

Well – talk about surprise!! You are absolutely not allowed to take a photo of a painting in an art gallery. The reason usually given is copyright, or misuse or inappropriate use of images. Clearly the images in the Prado exhibition are out of copyright, dating from 1500-1800. Images of them are sold in the gift shop. However the photo booth allowed me to stick my own face into a selection of the Prado portraits (in a similar way to sticking your head through a funfair cardboard cut-out of Popeye and Olive) and create my own digital image. After all the highbrow gallery poppycock I have heard over the years about galleries digitising images and the rules they have made up around this, I was staggered and thrilled to be able to do this fun activity (which I think is probably aimed at children!). It was the most fun I have had for a while and my first foray into what is commonly called ‘digital vandalism’ or inappropriate use of digital artworks.

It made me recall the battle that Wikipedia had with art galleries and use of images over a long duration. On one particular occasion Liam Wyatt the VP of Wikimedia Australia spoke about how the public were not allowed to take or share photos of artworks for invalid reasons of copyright ownership or inappropriate or commercial use of images, but then the galleries or museums in question would use the same images themselves to make things like ties or mugs for a profit in the gift shop. This was such a case exactly, but more extreme than any I have yet seen. In case you are in doubt – I am endorsing the activity offered in the photo booths at Queensland Art Gallery. The e-mail I received with my bastardised portrait also had an animated version… where things like my hand and head moved – crikey! The experience led me to formally write to Queensland Art Gallery (via their online form) and ask them why if I can do this I was not allowed to take a photo without flash of the same painting in the exhibition? That was 7 days ago and I still have not had a reply….

Incidentally I cannot find a Wikipedia entry for the portrait of Señora de Delicado de Imaz, or anything about her life and circumstances, which I am sure are most interesting. I could also not find a really good digital print online that matched the real life experience of seeing the painting. I did manage however to take a photo of the painting myself via the photo booth. For some strange reason the photo booth kept offering me this painting as the perfect match to put my face into.

However I preferred to go with a much more regal match.

Me with my head digitally stuck into the portrait of La infanta Isabel Clara Eugenia y Magdalena Ruiz by Alonso Sánchez Coello, 1588.
The animation had me fiddling with my miniature and the monkeys...

Last week I attended the International Congress of Archives (ICA 2012) which was held in Brisbane. Over 1,000 Archivists from 93 countries attended.

The much anticipated opening keynote on the first day was given by David Ferriero, head of the US National Archives. He is the first librarian to become a National Archivist, previously being in charge of New York Public Library and known for promoting use of social media and relationships with Google and Wikipedia. His talk was called ‘A world of social media’. I was looking forward to hearing what the US National Archives are doing with social media and crowdsourcing. People were generally of the opinion that this organisation will lead, or is already leading, by example in this field.

David Ferriero took to the stage and took us by surprise. He only used 20 minutes of his 40 minute slot and gave no presentation, instead reading from his notes at breakneck speed and bombarding us with statistics that were largely out of context. At the end he took no questions and dashed off the stage. He left a surprised and bewildered audience behind. I for one was immensely disappointed not to see and hear more about some of the exciting US Archives activities. He of course may have had mitigating circumstances that I am totally unaware of. He did however give small tasters of what his organisation is doing. There was brief mention of large scale crowdsourcing on unspecified projects, a citizen archivists dashboard, and a relationship with Wikipedia, which piqued my interest.

So I decided to follow up online and find out for myself what may be happening at NARA. It took me quite some time to search the internet and blogs and get the information I had hoped David would give in his keynote, but it was worth it. Here is what I found:

1. Citizen Archivist Dashboard Webpage http://www.archives.gov/citizen-archivist/

In January 2012 the US National Archives launched the Citizen Archivist Dashboard. This is a great webpage bringing all the online and physical social engagement and crowdsourcing activities together. It is easy for someone to see what options they may have to help the US National Archives. It is very clearly designed and I like it a lot.

2. Transcription Projects

US National Archives Tool http://transcribe.archives.gov/

There are two transcription projects going on for handwritten records. Firstly the National Archives Transcription Pilot Project. It appears still to be in ‘pilot’ mode (started in January 2012) since only 300 documents (about 1,000 pages) are available for transcription. They have been very carefully selected from a collection of billions of pages and graded by colour codes according to how difficult the handwriting is to read. This pre-selection must have taken very valuable staff time. You can browse or search by difficulty of transcription, year, and the status of transcription: “Not Yet Started,” “Partially Transcribed,” and “Completed.” You then choose a page to work on and then that page is blocked to other users, so it’s not being edited by multiple users at the same time. The interface is very simple, much like the Australian Newspapers. In a free text box beside the image you can transcribe what you see. No login is required, though you do have to complete a captcha.

The missing part is that I can’t see how many people have transcribed what. It’s not clear if the documents disappear from here when fully transcribed, and how and where they become full text searchable in the collection. It also seems to be a time consuming process for NARA staff to do the pre-selection and difficulty rating of the documents. This is of course a very small pilot and hopefully lessons will be learnt and the site will be developed further to reach its full potential. Also it would be good if more documents became available for transcription. This is one of the easiest handwritten transcription tools I have seen. I could not find any information about who developed the tool and if it is available open source.

Interestingly David Ferriero says that many US school children are no longer taught cursive handwriting and therefore cannot read handwriting. He says ‘Help us transcribe records and guarantee that school children can make use of our documents’. I’m not quite clear if he thinks this is a potential crowdsourcing exercise for school children to learn handwriting and become better educated, or if adults are supposed to do it so that school children can just read the finished text.

Wikisource Tool for Wikiproject NARA. http://en.wikisource.org/wiki/Wikisource:WikiProject_NARA

The National Archives have developed a relationship with the Wikipedia Community and currently have a Wikipedian in residence. As part of that program they have shared some primary handwritten national documents into ‘Wikisource’ for transcription via the Wikisource Tool. These documents are mostly at the beginner level in terms of difficulty. I’m not clear if they are the same ones being used in the Archives’ own pilot, or different documents. I’m also not clear why they are piloting two different methods for transcription, or how the initial results compare with each other. Wikisource offers more than transcription, however: Wikipedians (if they can get access to original documents or copies) can also scan documents and OCR them.

3. Scanning Projects

Scanathons

For reasons I don’t understand the US National Archives has only digitised 750,000 of its 40 million images. This is a very low figure for an organisation like this. They seem to be focusing quite a lot of effort on getting physical volunteers to come in person to the Archives to digitise/scan images for them at ‘Scanathons’. This started in 2011. In January 2012 there was a 4 day Wikipedia ExtravaSCANza. Over the 4 days a group of Wikipedians met in the Still Pictures Research Room and scanned 500 images on desktop scanners. Each day there was a theme: NASA, women’s history, Chile, and battleships.

Photograph it yourself http://www.flickr.com/groups/citizenarchivist/

NARA encourages readers to take their own photos of records in the reading rooms and upload them to a special group in Flickr. The important thing here is that they should also be described with title, series, and record group if possible so they can be found. So far only 20 people have joined the group and 133 photos have been uploaded (most of these by the same person). I’m not clear how NARA intends to link these digital images back to the item descriptions in their collections but this is a great idea to tackle large scale digitisation of images.

4. Tag it Tuesdays http://blogs.archives.gov/online-public-access/?cat=260

The tagging facility, unlike the other pilots, seems to me unlikely to succeed in its objectives. This is perhaps because of the tight controls that have been placed around it and the isolation of the activity from normal search and browse behaviour. Whilst anyone can easily transcribe a record without needing to log in, the process for tagging is difficult.

The activity is focused on Tuesdays and themed around a topic. Records for the topic are pre-selected by the Archives and available in an online group e.g. Elvis, Titanic. Volunteers must register and follow a set of guidelines. Tags will be reviewed by NARA staff before being accepted and going live on the database. I looked at the topics and it was unclear to me why, if the Archives had already identified the items as being about Elvis, they couldn’t simply generate an automatic tag for ‘Elvis’. In my opinion tagging is not actually a crowdsourcing activity, because individuals are motivated to add tags to help themselves find things; it is a by-product of search. Research shows it is rare for users to have consensus on tag terms and use. Crowdsourcing activities achieve a big clear goal that could not be achieved by individuals alone, and everyone in the crowd should be aware of how they are helping the ultimate goal.

5. Indexing the 1940 Census

On April 2, 2012, NARA released the digital images of the 1940 United States Federal Census after a 72 year embargo. The census images will be uploaded and made available on Archives.com, FindMyPast.com, National Archives, ProQuest, and FamilySearch.org. The entire 1940 census data will be indexed by a community of volunteers and made available for free. The free index of the census records and corresponding images will be available to the public in perpetuity.

6. Useful Links

I found a recent presentation given this year by Pamela Wright – Chief Digital Access Strategist at NARA which gives screenshots of what I have talked about above. ‘From access to engagement’

7. Social Media

NARA are active users of social media channels and they have started to monitor their activity. The Social media statistics from NARA May 2012 may be interesting reading for some.

I would be interested in reading more presentations or articles about the citizen archivist pilot projects from NARA and finding out what they have achieved and learnt so far. I hope this information is made available to the archives and library community soon. Please reply in the comments if you have any more information on the pilot activities.

Sunday, 26 August 2012

Digital vandalism or just good fun? The Prado comes to Brisbane

Saturday, 25 August 2012

Crowdsourcing and Social Media at US National Archives (NARA). The Citizen Archivist Dashboard