Identifying the gender

Identifying the gender of a name is an interesting task in NLP. We will use the heuristic that the last few characters of a name are its defining characteristic. For example, if the name ends with "la", it's most likely a female name, such as "Angela" or "Layla". On the other hand, if the name ends with "im", it's most likely a male name, such as "Tim" or "Jim". As we are not sure of the exact number of characters to use, we will experiment with this. Let's see how to do it.

How to do it…

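Create a new Python file, and import the following packages (the NLTK names corpus must be available locally, for example, by running nltk.download('names') once):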

import random
from nltk.corpus import names
from nltk import NaiveBayesClassifier
from nltk.classify import accuracy as nltk_accuracy

We need to define a function to extract features from input words:

# Extract features from the input word
def gender_features(word, num_letters=2):
    return {'feature': word[-num_letters:].lower()}
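
To see what the extractor produces, the following minimal sketch calls it on the names mentioned in the introduction. The function is repeated so that the snippet runs on its own; the sample names and the short_name_features variable are only for illustration:

```python
# Same extractor as in the recipe, repeated so this snippet is self-contained
def gender_features(word, num_letters=2):
    return {'feature': word[-num_letters:].lower()}

# Names taken from the introduction of this recipe
for sample in ['Angela', 'Layla', 'Tim', 'Jim']:
    print('%s -> %s' % (sample, gender_features(sample)))

# A name shorter than num_letters simply yields the whole (lowercased) name,
# because slicing past the start of a string does not raise an error
short_name_features = gender_features('Al', 3)
print(short_name_features)
```

Note that the feature dictionary has a single key, so the classifier only ever sees the lowercased suffix of each name.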

Let's define the main function. We need some labeled training data:

if __name__ == '__main__':
    # Extract labeled names
    labeled_names = ([(name, 'male') for name in names.words('male.txt')] +
                     [(name, 'female') for name in names.words('female.txt')])

Seed the random number generator, and shuffle the training data:
```
    random.seed(7)
    random.shuffle(labeled_names)
```
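
Seeding matters because it makes the shuffle, and therefore the train/test split, repeatable from run to run. The following sketch illustrates this on a tiny, made-up list; shuffled_copy and toy_names are hypothetical names that are not part of the recipe:

```python
import random

def shuffled_copy(items, seed):
    """Return a shuffled copy of items, using a fixed seed."""
    random.seed(seed)
    result = list(items)
    random.shuffle(result)
    return result

# Made-up stand-in for labeled_names
toy_names = [('Tim', 'male'), ('Angela', 'female'),
             ('Jim', 'male'), ('Layla', 'female')]

first = shuffled_copy(toy_names, 7)
second = shuffled_copy(toy_names, 7)

# The same seed gives the same order both times
print(first == second)
```

Without the call to random.seed, the order would generally differ between runs, and so would the reported accuracy.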

Define some input names to play with:

    input_names = ['Leonardo', 'Amy', 'Sam']

As we don't know how many ending characters we need to consider, we will sweep the parameter space from 1 to 4 (range(1, 5) stops before 5). Each time, we will extract the features, as follows:

    # Sweeping the parameter space
    for i in range(1, 5):
        print('\nNumber of letters: %d' % i)
        featuresets = [(gender_features(n, i), gender) for (n, gender) in labeled_names]
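
As a quick check of what the sweep covers, the following sketch lists the suffix extracted from one of the input names for each value that range(1, 5) produces. The suffixes variable is only for illustration:

```python
# Same extractor as in the recipe, repeated so this snippet is self-contained
def gender_features(word, num_letters=2):
    return {'feature': word[-num_letters:].lower()}

# range(1, 5) yields 1, 2, 3, and 4; the upper bound is excluded
suffixes = [gender_features('Leonardo', i)['feature'] for i in range(1, 5)]
print(suffixes)
```

To include five-letter suffixes as well, the loop would need range(1, 6).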

Divide this into train and test datasets, holding out the first 500 feature sets for testing and training on the rest:

        train_set, test_set = featuresets[500:], featuresets[:500]
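
The two slices never overlap, so no name is used for both training and testing. The following sketch shows this on a made-up list of 2,000 numbered items; split_holdout and toy_featuresets are hypothetical names, and the list size is arbitrary:

```python
def split_holdout(items, test_size=500):
    """Use the first test_size items for testing and the rest for training."""
    return items[test_size:], items[:test_size]

# Made-up stand-in for featuresets
toy_featuresets = list(range(2000))

toy_train, toy_test = split_holdout(toy_featuresets)
print('%d training items, %d test items' % (len(toy_train), len(toy_test)))
```

Because the data was shuffled beforehand, taking the first 500 items amounts to drawing a random test sample.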

Train a Naive Bayes classifier on the training set:

        classifier = NaiveBayesClassifier.train(train_set)
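
To get a feel for what a Naive Bayes classifier does with a single suffix feature, here is a minimal pure-Python sketch. It is not NLTK's implementation: it assumes add-one (Laplace) smoothing, whereas NLTK's own probability estimate differs in detail, and train_suffix_nb, classify_suffix, and the toy data are all made up for this illustration:

```python
from collections import Counter, defaultdict

def train_suffix_nb(labeled_suffixes):
    """Count how often each label and each (label, suffix) pair occurs."""
    label_counts = Counter(label for _, label in labeled_suffixes)
    suffix_counts = defaultdict(Counter)
    for suffix, label in labeled_suffixes:
        suffix_counts[label][suffix] += 1
    vocab = set(suffix for suffix, _ in labeled_suffixes)
    return label_counts, suffix_counts, vocab

def classify_suffix(model, suffix):
    """Pick the label that maximizes P(label) * P(suffix | label)."""
    label_counts, suffix_counts, vocab = model
    total = float(sum(label_counts.values()))
    scores = {}
    for label, count in label_counts.items():
        prior = count / total
        # Add-one smoothing (an assumption of this sketch) avoids zero probabilities
        likelihood = (suffix_counts[label][suffix] + 1.0) / (count + len(vocab))
        scores[label] = prior * likelihood
    return max(scores, key=scores.get)

# Made-up training suffixes, not taken from the NLTK corpus
toy_data = [('la', 'female'), ('la', 'female'), ('na', 'female'),
            ('im', 'male'), ('im', 'male'), ('am', 'male')]

model = train_suffix_nb(toy_data)
prediction_la = classify_suffix(model, 'la')
prediction_im = classify_suffix(model, 'im')
print('la ==> %s, im ==> %s' % (prediction_la, prediction_im))
```

With only one feature, the "naive" independence assumption plays no role; the classifier simply weighs how common each suffix is among male and female names.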

Evaluate the classifier for each value in the parameter space:

        # Print classifier accuracy
        print('Accuracy ==> ' + str(100 * nltk_accuracy(classifier, test_set)) + '%')

        # Predict outputs for new inputs
        for name in input_names:
            print(name + ' ==> ' + classifier.classify(gender_features(name, i)))
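
The accuracy reported here is the fraction of test items for which the predicted label matches the true label. The following sketch computes the same kind of score by hand; fraction_correct, ends_with_a_rule, and the toy test set are hypothetical and stand in for the trained classifier and the NLTK data:

```python
def fraction_correct(predict, labeled_items):
    """Share of items whose predicted label equals the true label."""
    correct = sum(1 for item, label in labeled_items if predict(item) == label)
    return float(correct) / len(labeled_items)

# Made-up rule used only for this illustration
def ends_with_a_rule(name):
    return 'female' if name.lower().endswith('a') else 'male'

# Hand-labeled toy test set, not taken from the NLTK corpus
toy_test_set = [('Angela', 'female'), ('Tim', 'male'),
                ('Jim', 'male'), ('Amy', 'female')]

toy_accuracy = fraction_correct(ends_with_a_rule, toy_test_set)
print('Accuracy ==> ' + str(100 * toy_accuracy) + '%')
```

The rule misclassifies "Amy", so it gets three of the four toy names right.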

The full code is in the gender_identification.py file. If you run this code, it prints, for each number of letters, the classifier accuracy followed by the predicted gender of each input name.
