Spam software protects old books




Software used to fight spammers is now helping university researchers preserve old books and manuscripts.

According to the BBC, CAPTCHA, an automated test that tells computers and humans apart when someone signs up for an account or logs in, is being used by libraries to help scan millions of books into their databases. Many of the books are so old that OCR software cannot read about one in 10 words because of the poor quality of the original documents.

To fix this problem, the researchers take images of the words that the OCR software can't read and use them as CAPTCHAs. These are distributed to websites around the world to be used in place of conventional CAPTCHAs.

When visitors decipher the reCAPTCHAs to gain access to a website, their answers are sent back to Carnegie Mellon University (CMU). The system is helping to decipher about one million words every day for CMU's book archiving project.
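
The report does not say how the returned answers are turned into a final transcription. As a rough illustration only, the Python sketch below assumes a simple majority vote: several visitors type the same unreadable word, and a reading is accepted once enough of them agree. The function names and the thresholds `min_votes` and `min_agreement` are hypothetical and are not taken from the project itself.

```python
from collections import Counter
from typing import Iterable, Optional


def normalize(answer: str) -> str:
    """Trim whitespace and lowercase so trivially different answers match."""
    return answer.strip().lower()


def resolve_word(
    answers: Iterable[str],
    min_votes: int = 3,          # hypothetical: fewest answers needed to decide
    min_agreement: float = 0.6,  # hypothetical: share that must agree
) -> Optional[str]:
    """Return the agreed reading of one scanned word, or None if undecided."""
    # Drop empty submissions, then normalize the rest.
    cleaned = [normalize(a) for a in answers if a.strip()]
    if len(cleaned) < min_votes:
        return None  # too few answers so far; keep showing the word
    word, count = Counter(cleaned).most_common(1)[0]
    if count / len(cleaned) >= min_agreement:
        return word
    return None  # visitors disagree too much to accept any reading


if __name__ == "__main__":
    print(resolve_word(["Morning", "morning ", "moming", "morning"]))
    print(resolve_word(["cat", "cot"]))
```

In this sketch a word that returns `None` would simply stay in circulation until more visitors have answered it.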

