Monday, January 27, 2014

Crowdsourcing and online archives

In honor of the World War I centenary, the British National Archives has opened a great new database: The British Army War Diaries 1914-1922.

This database contains over 1.5 million pages of war diaries, and over the next few years the Archives will be digitizing more appeal tribunal records and the service records of the Household Cavalry. The Archives already has thousands of appeals against conscription, POW interviews, nurses' service records, Women's Army Auxiliary Corps service records, images, and Durham Home Guard records (WWII) digitized and searchable on its website.

In and of itself, the database is a fantastic resource for researchers, but the Archives is making it even better by tagging the 1.5 million pages with names, dates, locations, events, and more. If an archive team did this alone, it would take years and millions of dollars (well, pounds), but the project is already almost halfway done thanks to crowdsourcing.

Crowdsourcing, as defined by Wikipedia, is "the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers." Computers can easily read typed text in an image, such as a PDF of a typed document, but they have a hard time accurately reading handwriting. Humans, unlike computers, can read handwriting and use context to figure out most words that aren't immediately legible. To safeguard against bad transcriptions, the project has many people work on the same page. Only when multiple users apply the same tag or transcription to a document is it officially added to the record.
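For the curious, the "multiple users must agree" rule can be sketched in a few lines of Python. This is only a hypothetical illustration, not the actual code used by the Archives or Zooniverse: the `accepted_tags` function, the threshold of three matching volunteers, and the lowercase/whitespace normalization are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical threshold: how many volunteers must submit the same tag
# before it is accepted. The value the real project uses is not stated.
AGREEMENT_THRESHOLD = 3


def accepted_tags(submissions, threshold=AGREEMENT_THRESHOLD):
    """Return the tags that at least `threshold` volunteers agreed on.

    `submissions` is a list of tag lists, one list per volunteer,
    all for the same diary page.
    """
    counts = Counter()
    for volunteer_tags in submissions:
        # Normalize case and stray spaces so "Ypres" and "ypres " match,
        # and use a set so one volunteer can't vote twice for the same tag.
        counts.update({tag.strip().lower() for tag in volunteer_tags})
    # Keep only tags that reached the agreement threshold, in sorted order.
    return sorted(tag for tag, n in counts.items() if n >= threshold)


# Made-up example: four volunteers tagging the same page.
page_submissions = [
    ["Ypres", "12 May 1915", "2nd Battalion"],
    ["ypres", "12 May 1915"],
    ["Ypres ", "12 May 1915", "shelling"],
    ["Ypres", "13 May 1915"],
]

print(accepted_tags(page_submissions))  # ['12 may 1915', 'ypres']
```

Tags that only one volunteer entered (like "13 May 1915", which may be a misreading) never make it into the record, which is the whole point of having several people look at each page.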

The leading organization behind scientific and historical crowdsourcing is Zooniverse, which currently has twenty different projects going on. If you aren't interested in WWI history, maybe you'd like to analyze cancer data, listen to whales communicate, match black holes to their jets, or explore the surface of the moon?

Zooniverse offers a wide range of projects, all with an easy-to-use interface. I highly recommend taking a look at their offerings at zooniverse.org.

Crowdsourcing isn't just for processing data for researchers; it has also been used to help people directly. After Typhoon Haiyan, organizations drew on volunteers from 82 countries to create and compare before-and-after maps and identify the areas of greatest destruction. These maps also helped volunteers on the ground reach the areas of greatest need. See this article for more details on their efforts.

The website fold.it has harnessed gaming as a form of crowdsourcing. In 10 days, gamers solved the structure of an enzyme that plays a key part in the spread of AIDS, a mystery that had eluded scientists for years. The Leo Tolstoy Museum was able to proofread 46,800 pages in two weeks, and these pages were then digitized for fans worldwide.

I have worked on a few crowdsourcing projects. They are fun because you get to look at historical documents, the solar system, far-away countries, etc., and everyone's small contribution adds up to a huge advance in knowledge. No matter what you're interested in, I'm sure there's a way you can contribute through crowdsourcing.

Get started by taking a look at the following sites:
zooniverse.org
scistarter.com
birds.cornell.edu/citsci/ 
National Geographic's Fieldscope Project