Posts tagged ‘ReCAPTCHA’

April 3, 2011

reCAPTCHA definition and history

reCAPTCHA example

reCAPTCHA and OCR for digitization projects

What does a CAPTCHA do?

Humans can read the distorted text in CAPTCHA challenges* but current computer programs cannot.

A CAPTCHA is a program that protects websites against bots by generating and grading tests that humans can pass but current computer programs cannot.

What does CAPTCHA mean?

CAPTCHA is an acronym for Completely Automated Public Turing test to tell Computers and Humans Apart. The term was coined in 2000 by computer science researchers at Carnegie Mellon University, who invented the original CAPTCHA.

What is the difference between CAPTCHA and reCAPTCHA?

This is how the reCAPTCHA Project explains the difference:

ReCAPTCHA helps prevent automated abuse of your site (such as comment spam or bogus registrations) by using a CAPTCHA to ensure that only humans perform certain actions.

Generally, a CAPTCHA presents a single word, whereas a reCAPTCHA presents two words. The reCAPTCHA project page explains this in greater detail. Research papers in PDF format are available for download on the Google reCAPTCHA website.

Google acquired reCAPTCHA in 2009 and describes its usage and further background in the reCAPTCHA FAQs:

ReCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.

ReCAPTCHA is free

Although reCAPTCHA, including its API, is free to use, be aware that it is not open source software.

Other uses

reCAPTCHA is best known for digitizing historic texts and for filtering spam, the latter being an information security measure.

Answers to reCAPTCHA challenges are used to digitize textual documents… a combination of multiple OCR programs, probabilistic language models, and the answers from millions of humans on the internet, reCAPTCHA is able to achieve over 99.5% transcription accuracy at the word level….
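The quote mentions combining the answers of many humans. As a rough illustration only, here is a minimal Python sketch of one way such answers could be pooled: a simple majority vote with a threshold. The function name, the thresholds and the normalization step are my own assumptions for the example; this is not the actual reCAPTCHA algorithm, which also draws on multiple OCR programs and language models.

```python
from collections import Counter


def agree_on_word(answers, min_votes=3, min_share=0.75):
    """Return the transcription most people agreed on, or None.

    Hypothetical helper: min_votes and min_share are illustrative
    thresholds, not values taken from reCAPTCHA.
    """
    # Normalize so that "Morning " and "morning" count as the same answer.
    # Lowercasing is a simplification; a real system would preserve case.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None

    word, votes = Counter(normalized).most_common(1)[0]

    # Accept only when enough people answered and most of them agree.
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None


if __name__ == "__main__":
    print(agree_on_word(["Morning", "morning ", "morning", "mourning"]))
    print(agree_on_word(["cat", "cot"]))
```

With these example thresholds, a word is accepted only once at least three people have typed it and they make up three quarters of all answers; anything less is left undecided.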

OCR is an acronym for Optical Character Recognition. Compare the accuracy of standard OCR versus reCAPTCHA transcriptions of a medium quality scanned document on the reCAPTCHA digitization accuracy website. See some humorous reCAPTCHA examples from the official Google reCAPTCHA blog. Google announced an audio version of reCAPTCHA in 2009.

MailHide is another application, where potential for spam is reduced by requiring a reCAPTCHA challenge in order to disclose an otherwise partially obscured email address. More details are available in my post about MailHide from last month.

Recent developments

Recent research in the area of computer security led to some surprising discoveries about CAPTCHA and spam. Initially, it appeared that the CAPTCHA challenge had been defeated on a large scale, although only in a few specific regions. That was not the case. Human interaction of an unanticipated sort was still required to get past the CAPTCHA for each and every spam comment and email that got through.

*Work continues on the original CAPTCHA project.

March 13, 2011

Mailhide

If you’ve ever looked at an open-source development project hosted on Google servers, usually on http://code.google.com sites, Mailhide will be familiar. It is a less well-known application of the reCAPTCHA challenge.

reCAPTCHA now owned by Google

reCAPTCHA Turing test

Mailhide conceals part of an email address

This is how it prevents spammers from harvesting email addresses with automated programs. Typically, the first few letters or numbers of the username part of the address are visible, followed by an ellipsis (three dots) and then the domain name.
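To make the display format concrete, here is a minimal Python sketch of that kind of masking. The function name and the choice of three visible characters are my own assumptions for illustration; the real Mailhide service decides how much to show, and it also requires solving a reCAPTCHA before revealing the full address, which this sketch does not do.

```python
def mask_email(address, visible=3):
    """Show the first few characters of the username, then three dots,
    then the domain. Hypothetical helper, not part of the Mailhide API."""
    # Split on the last "@" so the domain is always the final part.
    user, sep, domain = address.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError("not an email address: %r" % address)

    # Keep at least one character, but try to hide at least one as well.
    keep = min(visible, max(1, len(user) - 1))
    return "%s...@%s" % (user[:keep], domain)


if __name__ == "__main__":
    print(mask_email("johndoe@example.com"))
```

A one-character username cannot be hidden this way, so in that case the sketch shows it in full.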

Most Google employees* use Mailhide. Mailhide is offered as an option to developers using Google Code sites.

Mailhide-type functionality is also offered by Slashdot for user accounts. Slashdot is not necessarily using Google reCAPTCHA for encryption, however. There are other Turing tests besides reCAPTCHA.

reCAPTCHA is a Google product, although it was not developed by Google. It originated at Carnegie Mellon University, and Google acquired it in 2009.

reCAPTCHA Mailhide API

Are you running a web application that lists users’ email addresses? Do your users a favor by shielding them from spam with reCAPTCHA Mailhide.

Google will give you an API (cryptographic) key. Use it to encrypt user email addresses. Google supplies full documentation for the Mailhide protocol. Everything is free of charge.

I am uncertain whether API usage restrictions apply. Such restrictions are familiar to application developers relying on the Twitter API. They should not be a binding constraint in this case, as Mailhide is far less transactional than Twitter. Unless one is very, very popular!

reCAPTCHA comes in many flavors!

Libraries are available for PHP, Perl, Ruby and Python programs.

*Google employee accounts in the U.S.A., and many but not all other countries, have the format userid@google.com. Non-employee Google mail accounts are userid@gmail.com.

October 27, 2010

reCAPTCHA

Another Google product. Wish I could grab a screenshot of the site. There are no identifying marks indicating a Google product other than the copyright 2010 at the bottom of the page:

reCAPTCHA is a free anti-bot that helps digitize books

via reCAPTCHA: Stop Spam, Read Books.
