Software used to
fight spammers is now helping university researchers preserve old books and
manuscripts.
According to the BBC, CAPTCHA, an automated test that tells computers and
humans apart when someone signs up for an account or logs in, is being used
by libraries to scan millions of books into their databases. Many of the
books are so old that OCR software cannot read about one in 10 words, owing
to the poor quality of the original documents.
To fix this problem, the research team takes images of the words that the
OCR software can't read and uses them as CAPTCHAs, called reCAPTCHAs. These
are distributed to websites around the world to be used in place of
conventional CAPTCHAs.
When visitors decipher the reCAPTCHAs to gain access to a website, their
answers are sent back to Carnegie Mellon University (CMU). The system is
helping to decipher about one million words every day for CMU's book
archiving project.
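
The flow described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the OcrWord and AnswerPool names, the confidence threshold, and the rule that a word counts as deciphered once several visitors give the same answer are all assumptions, since the article does not say how answers are checked.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One scanned word (hypothetical structure, not from the article)."""
    word_id: str
    text: str          # best guess from the OCR software
    confidence: float  # assumed OCR confidence score, 0.0 to 1.0


def select_challenges(words, threshold=0.5):
    """Pick the words the OCR software could not read reliably.

    The threshold is an assumption made for this sketch.
    """
    return [w for w in words if w.confidence < threshold]


class AnswerPool:
    """Collects visitors' answers sent back from participating websites."""

    def __init__(self, min_votes=3):
        # Assumed rule: accept a reading once min_votes visitors agree on it.
        self.min_votes = min_votes
        self._answers = defaultdict(Counter)

    def submit(self, word_id, answer):
        """Record one visitor's answer, ignoring case and outer whitespace."""
        self._answers[word_id][answer.strip().lower()] += 1

    def resolved(self, word_id):
        """Return the agreed reading, or None if there is no agreement yet."""
        counts = self._answers.get(word_id)
        if not counts:
            return None
        answer, votes = counts.most_common(1)[0]
        return answer if votes >= self.min_votes else None
```

In use, a page of OCR output would be passed through select_challenges, the images of the selected words would be shown to website visitors, and each typed answer would go to AnswerPool.submit until resolved returns a reading for that word.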