Generating CAPTCHA images

Although this is not strictly data visualization in the usual sense, the ability to generate images with Python comes in handy in many cases, and this is one of them.

In this recipe, we will cover the generation of random images that tell humans and computers apart: CAPTCHA images.

Getting ready

CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart, and is trademarked by Carnegie Mellon University. This test is used to challenge computer programs (usually referred to as bots) that automatically fill in web forms that are meant for humans and should not be automated. Typical examples are sign-up forms, login forms, surveys, and similar.

CAPTCHA itself can take various forms, but the most common one is a challenge in which a human must read an image of distorted letters and digits and type the result into the corresponding response field.

In this recipe, you will learn how to use the Python Imaging Library (PIL) to generate images, render lines and points, and render text.

How to do it...

We will show you what is involved in creating a personal and simple CAPTCHA generator by performing the following steps:

  1. Define size, text, font size, background color, and CAPTCHA length.
  2. Pick random characters from the English alphabet.
  3. Draw those on the image using defined font and colors.
  4. Add some noise in the form of lines and arcs.
  5. Return the image object to the caller together with the CAPTCHA challenge.
  6. Show the generated image to the user.

The following code shows how to create a personal and simple CAPTCHA generator:

from PIL import Image, ImageDraw, ImageFont
import random
import string


class SimpleCaptchaException(Exception):
    pass


class SimpleCaptcha(object):
    def __init__(self, length=5, size=(200, 100), fontsize=36,
                 random_text=None, random_bgcolor=None):
        self.size = size
        self.text = "CAPTCHA"
        self.fontsize = fontsize
        self.bgcolor = 255
        self.length = length

        self.image = None  # current captcha image

        if random_text:
            self.text = self._random_text()

        if not self.text:
            raise SimpleCaptchaException("Field text must not be empty.")

        if not self.size:
            raise SimpleCaptchaException("Size must not be empty.")

        if not self.fontsize:
            raise SimpleCaptchaException("Font size must be defined.")

        if random_bgcolor:
            self.bgcolor = self._random_color()

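    def _center_coords(self, draw, font):
        # textsize() was removed in Pillow 10; textbbox() returns the
        # (left, top, right, bottom) box of the text drawn at (0, 0)
        left, top, right, bottom = draw.textbbox((0, 0), self.text, font=font)
        width, height = right - left, bottom - top
        # Subtract the box offsets so the visible glyphs end up centered
        xy = ((self.size[0] - width) / 2. - left,
              (self.size[1] - height) / 2. - top)
        return xy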

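    def _add_noise_dots(self, draw):
        size = self.image.size
        # Scatter white dots over roughly 10% of the pixels; randint() is
        # inclusive, so the upper bounds are width - 1 and height - 1
        for _ in range(int(size[0] * size[1] * 0.1)):
            draw.point((random.randint(0, size[0] - 1),
                        random.randint(0, size[1] - 1)),
                       fill="white")
        return draw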

    def _add_noise_lines(self, draw):
        size = self.image.size
        # Eight lines running from the left edge to the right edge
        for _ in range(8):
            width = random.randint(1, 2)
            start = (0, random.randint(0, size[1] - 1))
            end = (size[0], random.randint(0, size[1] - 1))
            draw.line([start, end], fill="white", width=width)
        # Eight arcs whose bounding boxes start off-image at (-50, -50)
        for _ in range(8):
            start = (-50, -50)
            end = (size[0] + 10, random.randint(0, size[1] + 10))
            draw.arc(start + end, 0, 360, fill="white")
        return draw

    def get_captcha(self, size=None, text=None, bgcolor=None):
        if text is not None:
            self.text = text
        if size is not None:
            self.size = size
        if bgcolor is not None:
            self.bgcolor = bgcolor

        self.image = Image.new('RGB', self.size, self.bgcolor)
        # Note that the font file must be present or point to your OS's
        # system font, e.g. '/Library/Fonts/Tahoma.ttf' on a Mac
        try:
            font = ImageFont.truetype('fonts/Vera.ttf', self.fontsize)
        except OSError:
            # Fall back to Pillow's built-in font if the file is missing
            font = ImageFont.load_default()
        draw = ImageDraw.Draw(self.image)
        xy = self._center_coords(draw, font)
        draw.text(xy=xy, text=self.text, font=font)
        
        # Add some dot noise
        draw = self._add_noise_dots(draw)
        
        # Add some random lines
        draw = self._add_noise_lines(draw)

        self.image.show()
        return self.image, self.text


    def _random_text(self):
        letters = string.ascii_lowercase + string.ascii_uppercase
        random_text = ""
        for _ in range(self.length):
            random_text += random.choice(letters)
        return random_text

    def _random_color(self):
        r = random.randint(0, 255)
        g = random.randint(0, 255)
        b = random.randint(0, 255)
        return (r, g, b)


if __name__ == "__main__":
    sc = SimpleCaptcha(length=7, fontsize=36, random_text=True,
                       random_bgcolor=True)
    sc.get_captcha()

This produces an image similar to the following:


How it works...

This example shows how to use the Python Imaging Library to generate images programmatically and build a simple yet effective CAPTCHA generator.

We wrapped the functionality in a single class, SimpleCaptcha, because it gives us room for future development. We also created a custom SimpleCaptchaException to accommodate future exception hierarchies.

Tip

If you are writing anything more than simple, quick scripts, it is always a good idea to design custom exception hierarchies for your domain rather than relying on Python's generic standard exceptions. You will gain a lot in the readability and maintainability of the software.
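As an illustration of such a hierarchy, the sketch below derives two more specific errors from SimpleCaptchaException. CaptchaConfigError, CaptchaFontError, and validate_size are hypothetical names, not part of the recipe's code; the point is that callers can catch either one specific failure or every generator failure through the shared base class.

```python
class SimpleCaptchaException(Exception):
    """Base class for every error raised by the CAPTCHA generator."""


class CaptchaConfigError(SimpleCaptchaException):
    """Hypothetical: raised when constructor or get_captcha arguments are invalid."""


class CaptchaFontError(SimpleCaptchaException):
    """Hypothetical: raised when the requested TrueType font cannot be loaded."""


def validate_size(size):
    # Hypothetical helper: reject missing or non-positive image dimensions
    if not size or len(size) != 2 or min(size) <= 0:
        raise CaptchaConfigError("Size must be a (width, height) pair of positive integers.")
    return size


# Catch one specific failure...
try:
    validate_size((0, 100))
except CaptchaConfigError as exc:
    print("bad config:", exc)

# ...or every generator failure through the shared base class
try:
    validate_size(None)
except SimpleCaptchaException as exc:
    print("captcha error:", exc)
```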

Start reading from the main section at the end of the code listing. There, we instantiate the class, passing the settings for our future image as arguments to the constructor. We then call the get_captcha method on the sc object. For this recipe's purposes, get_captcha displays the image, but it also returns the image object so that a potential caller can make use of the result. The usage can vary: the caller could save the image to a file or, in a web application, return the image stream to the client that requested the CAPTCHA while keeping the written challenge on the server for the later comparison.
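The two caller-side options might look like the following sketch. It assumes Pillow is installed; captcha_to_png_bytes is a hypothetical helper name, and a blank image plus a fixed string stand in for the pair that get_captcha returns.

```python
import io
import os
import tempfile

from PIL import Image


def captcha_to_png_bytes(image):
    # Serialize the PIL image into an in-memory PNG, e.g. for an HTTP response
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


# Stand-in for the (image, text) pair returned by get_captcha()
image, challenge = Image.new("RGB", (200, 100), (30, 30, 30)), "aBcDeFg"

# Web case: send these bytes with Content-Type image/png and keep
# `challenge` on the server (for example, in the user's session)
png_bytes = captcha_to_png_bytes(image)

# File case: the format is inferred from the .png extension
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "captcha.png")
    image.save(path)
    saved_ok = os.path.getsize(path) > 0
```

Keeping the challenge text server-side matters: if it traveled to the client alongside the image, a bot could simply read it back.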

The important thing to note is that, to complete the challenge-response process of the CAPTCHA test, we must also return the string drawn on the image as plain text, so that the caller can compare the user's response with the expected value.
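A minimal sketch of that comparison step follows, assuming the caller kept the expected text from get_captcha. The normalization (trimming whitespace and ignoring letter case) is an assumption that trades some difficulty for usability; check_captcha and normalize are hypothetical names. hmac.compare_digest is used because it avoids short-circuiting on the first mismatching character.

```python
import hmac


def normalize(answer):
    # Assumption: ignore surrounding whitespace and letter case
    return answer.strip().casefold()


def check_captcha(expected, response):
    # compare_digest avoids content-based short-circuiting, so response
    # timing does not reveal how many leading characters were correct
    return hmac.compare_digest(normalize(expected).encode("utf-8"),
                               normalize(response).encode("utf-8"))


print(check_captcha("aBcDeFg", " abcdefg"))  # True
print(check_captcha("aBcDeFg", "abcdefh"))   # False
```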

The get_captcha method first checks its input arguments and overrides the class defaults with any custom values the caller provides. After that, Image.new creates a new image object, which is stored in self.image and used for drawing and writing text. Having written the text to the image, we add noise in the form of randomly placed points and lines, as well as some arc segments.
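Those argument checks compare against None rather than testing truthiness. The small sketch below, with hypothetical helpers pick and pick_truthy, shows why that matters: a falsy but meaningful value, such as a color given as the integer 0, would otherwise be silently replaced by the default.

```python
def pick(override, default):
    # Same test get_captcha uses: only a missing argument (None) falls back
    return default if override is None else override


def pick_truthy(override, default):
    # A truthiness test would also discard legitimate falsy values
    return override or default


# An integer bgcolor of 0 is falsy but is still a meaningful color
print(pick(0, 255))         # 0
print(pick_truthy(0, 255))  # 255, the caller's choice is lost
print(pick(None, 255))      # 255
```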

These tasks are carried out by the _add_noise_dots and _add_noise_lines methods. The first one loops once for every ten pixels of the image (2,000 times for the default 200x100 size) and draws a white point at a random location anywhere on the image. The second one draws eight lines from the left-hand side of the image to the right-hand side, followed by eight arcs whose bounding boxes start outside the top-left corner.
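The point-count arithmetic and the coordinate bounds can be checked in isolation. In this sketch, noise_points is a hypothetical stand-alone version of the dot loop; note that random.randint includes both endpoints, so the largest valid pixel index is width - 1 (or height - 1), and randint(0, width) could produce a coordinate just outside the image.

```python
import random


def noise_points(size, density=0.1, rng=random):
    width, height = size
    # One point per 1/density pixels: 200 * 100 * 0.1 = 2000
    count = int(width * height * density)
    # randint is inclusive at both ends, hence the - 1 on each bound
    return [(rng.randint(0, width - 1), rng.randint(0, height - 1))
            for _ in range(count)]


points = noise_points((200, 100), rng=random.Random(42))
print(len(points))  # 2000
```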

There's more...

We constructed this class using some assumptions about its use. We assumed that the user will just want to accept our default settings (that is, seven random characters on a random background color) and receive the result. That is the reasoning behind calling the helper methods that set random text and a random background color from the constructor. If the most frequent usage is to override the configuration every time, we would want to move these operations out of the constructor and into separate calls.
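One way to move that randomness out of the constructor is to let the caller generate the values explicitly and pass them to get_captcha. The module-level helpers random_text and random_color below are hypothetical. This sketch also swaps the recipe's random module for the standard secrets module, which draws from the operating system's cryptographically secure source and is therefore harder for an attacker to predict.

```python
import secrets
import string


def random_text(length=7, alphabet=string.ascii_letters):
    # secrets uses the OS CSPRNG, unlike the seedable random module
    return "".join(secrets.choice(alphabet) for _ in range(length))


def random_color():
    # One independent 0-255 value per RGB channel
    return tuple(secrets.randbelow(256) for _ in range(3))


# The caller decides explicitly instead of relying on constructor flags, e.g.
# sc = SimpleCaptcha(); sc.get_captcha(text=random_text(), bgcolor=random_color())
text, color = random_text(), random_color()
print(text, color)
```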

For example, a user may always want English words as the CAPTCHA challenge. In that case, we want a single method that returns such a result. This method could be called get_english_captcha and, instead of the random-letter logic used in the constructor, it would pick a random word from a provided English dictionary. On Unix systems, a common English word list lives in /usr/share/dict/words, which we could use for this:

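def get_english_captcha(self):
    # Word list present on many Unix systems (not installed everywhere)
    words_path = '/usr/share/dict/words'
    with open(words_path, 'r') as wf:
        # Strip newlines and spaces; skip entries such as "aardvark's"
        words = [w.strip() for w in wf if w.strip().isalpha()]
    aword = random.choice(words)
    return self.get_captcha(text=aword)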
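A more defensive word picker might also limit word length (long words may not fit the image) and cope with a missing dictionary file. In this sketch, pick_english_word, FALLBACK_WORDS, and the 4 to 8 letter limits are illustrative assumptions; the returned word could then be passed to get_captcha(text=...).

```python
import os
import random

WORDS_PATH = "/usr/share/dict/words"
# Hypothetical fallback for systems without a dictionary file
FALLBACK_WORDS = ["python", "imaging", "captcha", "library", "pillow"]


def pick_english_word(path=WORDS_PATH, min_len=4, max_len=8, rng=random):
    words = []
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="ignore") as wf:
            for line in wf:
                word = line.strip()
                # Keep plain ASCII words of a length that fits the image
                if word.isascii() and word.isalpha() and min_len <= len(word) <= max_len:
                    words.append(word)
    return rng.choice(words or FALLBACK_WORDS)


print(pick_english_word())
```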

Overall, this CAPTCHA generator is not production quality and should not be used without additional protection and randomness, such as letter rotation.

If you need to protect your web forms from bots, there are already third-party Python modules and libraries that you could use. There are even specialized modules built for the existing web frameworks.

There are even web services such as reCAPTCHA (http://www.google.com/recaptcha), with an established Python module, recaptcha-client (https://pypi.python.org/pypi/recaptcha-client), that you can sign up for and use. It does not require any imaging libraries because the image is pulled directly from the reCAPTCHA web service, but it has other dependencies, such as pycrypto. By using this web service and library, you also help digitize books from the Google Books project and old editions of The New York Times whose scanned text Optical Character Recognition (OCR) could not read. Read more on the reCAPTCHA website.
