Building a face recognition Java application

In this section, we will walk through the code for building a Java face recognition application, and by the end of it, we will be able to run a live demo of the application.

Let's begin exploring the code by creating a basic network:

Training a model for face recognition from scratch is time-consuming and hard. To take care of this, we shall use transfer learning and obtain pre-trained weights. The model we choose is based on the Inception network (GoogLeNet), and it will be used to obtain the encodings, that is, the activations of the last layer. After this, instead of calculating the distance between the encodings directly, we shall normalize them using the L2-norm, and only then compute the distance between the images.

Notice that we are not using the squared distance but rather just the Euclidean distance between the images, which is different from what we have seen previously.
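As a minimal sketch of these two steps (the class and method names here are illustrative, not the book's exact code), normalizing an encoding with the L2-norm and comparing two encodings with the plain Euclidean distance could look as follows using ND4J:

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Illustrative sketch: L2-normalize 128-dimensional encodings and compare
// them with the plain Euclidean distance (not the squared distance).
public class EncodingDistanceSketch {

    // Scale the encoding so that its L2-norm becomes 1
    static INDArray l2Normalize(INDArray encoding) {
        return encoding.div(encoding.norm2Number());
    }

    // Euclidean (L2) distance between two normalized encodings
    static double distance(INDArray first, INDArray second) {
        return first.distance2(second);
    }

    public static void main(String[] args) {
        INDArray a = l2Normalize(Nd4j.rand(1, 128));
        INDArray b = l2Normalize(Nd4j.rand(1, 128));
        System.out.println("distance = " + distance(a, b));
    }
}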

Usually, we have a certain number of people. For the sake of this example, let us consider them to be the employees of a company:

We want to gauge whether the person entering the premises is someone new or one of the employees. For each employee, we calculate the encoding and save it in our database, or in memory. Each employee image is fed to the pre-trained model, which gives us the last-layer activations shown in the previous diagram. More specifically, we save a normalized version of these activations in the database.
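As a hedged sketch of this step (the method name and map initialization here are illustrative; read, normalize, and forwardPass are the same helpers used later in the whoIs method), building the in-memory database of employee encodings could look like this:

private final Map<String, INDArray> memberEncodingsMap = new HashMap<>();

// For every employee, run the image through the pre-trained model and store
// the normalized last-layer activations (the encoding) under the member's name.
public void buildMembersDatabase(Map<String, String> memberImagePaths) throws IOException {
    for (Map.Entry<String, String> member : memberImagePaths.entrySet()) {
        INDArray image = read(member.getValue());
        INDArray encoding = forwardPass(normalize(image));
        memberEncodingsMap.put(member.getKey(), encoding);
    }
}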

When a new person arrives, we need to check whether their picture matches anything in the database. We start by calculating the normalized last-layer activations for this new photo:

The new person is labeled with a question mark. Moving ahead, we calculate the distance between this image and all the images in the database.

Among all of these, we choose the image that has the smallest distance from the new image. We can set a threshold of 0.5 or 0.6, beyond which the person is classified as unknown:

Here, hopefully, the minimum distance, which is between our new image and image four, stays below the 0.5 threshold, implying that this new person is actually someone from our database.

Let's look at the code to do this: 

public class FaceNetSmallV2Model {
    private int numClasses = 0;
    private final long seed = 1234;
    private int[] inputShape = new int[]{3, 96, 96};   // 3-channel, 96 x 96 input images
    private IUpdater updater = new Adam(0.1, 0.9, 0.999, 0.01);
    private int encodings = 128;                        // size of the face encoding vector
    public static int reluIndex = 1;
    public static int paddingIndex = 1;

    public ComputationGraphConfiguration conf() {
        ComputationGraphConfiguration.GraphBuilder graph = new NeuralNetConfiguration.Builder()
                .seed(seed)
                .activation(Activation.IDENTITY)
                .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
                .updater(updater)
                .weightInit(WeightInit.RELU)
                .l2(5e-5)
                .miniBatch(true)
                .graphBuilder();

First, we have the model. As we mentioned, it is quite similar to the Inception network, but a bit smaller. Although the code looks somewhat large, it adds the layers according to the open source Keras implementation of this GitHub project:

buildBlock3a(graph);
buildBlock3b(graph);
buildBlock3c(graph);

buildBlock4a(graph);
buildBlock4e(graph);

buildBlock5a(graph);
buildBlock5b(graph);

The first layers are built quite manually, up to the three 3x blocks (3a, 3b, and 3c): we merge their branches together, append the block to the previous one, and so on. The blocks 4a, 4e, and onward are a bit more automated; we use a utility method to append these blocks more easily, as sketched below.
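As an illustrative sketch of such a utility (the method name and parameters are assumptions, not the project's exact helper), a method that appends a convolution, batch normalization, and ReLU group to the graph could look like this:

// Appends conv -> batch norm -> ReLU under a common block name and returns
// the name of the last layer, so that the next block can attach to it.
static String convBnRelu(ComputationGraphConfiguration.GraphBuilder graph,
                         String name, int nIn, int nOut, int kernel, String input) {
    graph.addLayer(name + "_conv",
            new ConvolutionLayer.Builder(kernel, kernel).nIn(nIn).nOut(nOut).build(), input)
         .addLayer(name + "_bn",
            new BatchNormalization.Builder().nOut(nOut).build(), name + "_conv")
         .addLayer(name + "_relu",
            new ActivationLayer.Builder().activation(Activation.RELU).build(), name + "_bn");
    return name + "_relu";
}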

At the end of the code, we have the final layers: a dense layer of fully connected neurons. We use 128 of these neurons because this has proven to give good results, and as we already mentioned, instead of using those activations directly, we use their L2-normalized version, which is why we add an L2 normalization vertex here.
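As a sketch of how those final layers might be added with Deeplearning4j (the layer names, input size, and epsilon value here are illustrative assumptions), the dense layer and the L2 normalization vertex could be appended like this:

// Dense layer producing the 128 encodings, followed by an L2NormalizeVertex
// so that the model outputs unit-length encoding vectors.
graph.addLayer("dense",
        new DenseLayer.Builder()
                .nIn(736)                  // size of the flattened previous layer (assumed)
                .nOut(encodings)           // encodings = 128
                .activation(Activation.IDENTITY)
                .build(),
        "avg_pool")                        // name of the preceding pooling layer (assumed)
     .addVertex("encodings",
        new L2NormalizeVertex(new int[]{}, 1e-12),
        "dense");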

Usually, loading pre-trained weights with Deeplearning4j is quite easy. This time it is different, because the weights we have are spread across several CSV files. The way we are going to load these weights is with this method:

static void loadWeights(ComputationGraph computationGraph) throws IOException {
    Layer[] layers = computationGraph.getLayers();
    for (Layer layer : layers) {
        List<double[]> all = new ArrayList<>();
        String layerName = layer.conf().getLayer().getLayerName();
        if (layerName.contains("bn")) {
            // Batch normalization layers need four parameter arrays
            all.add(readWightsValues(BASE + layerName + "_w.csv"));
            all.add(readWightsValues(BASE + layerName + "_b.csv"));
            all.add(readWightsValues(BASE + layerName + "_m.csv"));
            all.add(readWightsValues(BASE + layerName + "_v.csv"));
            layer.setParams(mergeAll(all));
        } else if (layerName.contains("conv")) {
            // Convolution layers expect the bias first, then the weights
            all.add(readWightsValues(BASE + layerName + "_b.csv"));
            all.add(readWightsValues(BASE + layerName + "_w.csv"));
            layer.setParams(mergeAll(all));
        } else if (layerName.contains("dense")) {
            // Dense layers expect the weights first, then the bias
            double[] w = readWightsValues(BASE + layerName + "_w.csv");
            all.add(w);
            double[] b = readWightsValues(BASE + layerName + "_b.csv");
            all.add(b);
            layer.setParams(mergeAll(all));
        }
    }
}

We build the model so that its layer names match those of the weight files, and then we look up each layer name. Based on the layer name, we find the corresponding CSV files, merge their values together, and set them as the layer parameters.

Notice that there is a slight difference between the convolution and dense layers. In Deeplearning4j, a convolution layer expects the bias first, whereas a dense layer expects the weights first. Switching this order would cause the model to not work properly.
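The helpers readWightsValues and mergeAll are referenced above but not shown; as a rough sketch of what they might do (the real implementations may differ), they read one CSV file of comma-separated values into a double array and flatten several such arrays into a single INDArray:

// Read a CSV file containing comma-separated weight values into a double[]
static double[] readWightsValues(String path) throws IOException {
    String content = new String(Files.readAllBytes(Paths.get(path)));
    return Arrays.stream(content.split(","))
            .mapToDouble(Double::parseDouble)
            .toArray();
}

// Concatenate all parameter arrays into one flat INDArray for setParams()
static INDArray mergeAll(List<double[]> all) {
    INDArray[] arrays = all.stream()
            .map(Nd4j::create)
            .toArray(INDArray[]::new);
    return Nd4j.toFlattened(arrays);
}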

Then we have the prediction phase, where we define the whoIs method. The code is as follows:

public String whoIs(String imagePath) throws IOException {
    INDArray read = read(imagePath);
    // Normalized activations of the last layer for the new image
    INDArray encodings = forwardPass(normalize(read));
    double minDistance = Double.MAX_VALUE;
    String foundUser = "";
    // Compare the new encoding against every encoding in our "database"
    for (Map.Entry<String, INDArray> entry : memberEncodingsMap.entrySet()) {
        INDArray value = entry.getValue();
        double distance = distance(value, encodings);
        log.info("distance of " + entry.getKey() + " with "
                + new File(imagePath).getName() + " is " + distance);
        if (distance < minDistance) {
            minDistance = distance;
            foundUser = entry.getKey();
        }
    }

This iterates through the map that serves as our database, or in-memory store. For each entry, we get the stored encodings and calculate the distance to the new encoding.

This is the L2 distance. If the distance is smaller than the current minimum, we record that user as the best match. But remember, this is not the end; we also apply a final check: if the minimum distance is greater than the threshold, the user is an unknown user:

    if (minDistance > THRESHOLD) {
        foundUser = "Unknown user";
    }
    log.info(foundUser + " with distance " + minDistance);
    return foundUser;
}

Only when the minimum distance is smaller than the threshold do we know that this person exists in the database. Let's now see what the application looks like. Here, we have the images that exist in our database, and at the top left, we can load new images and compare them against what we have:

For example, here we load this person, Gerhard Schroeder, and it actually matches one of the images in our database.

Let's choose another one, such as Arnold Schwarzenegger, whom we saw previously; the application is able to detect this image as well. Now, let's choose another one, as follows:

The application also offers a feature for registering new people: we can select a new person and click on the Register New Member button:
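As a hypothetical sketch of what the Register New Member button could do behind the scenes (the method name and signature are assumptions), it simply computes the encoding for the selected image and adds it to the in-memory map, just as we did for the existing members:

// Compute the encoding for the new member's image and store it in the map
public void registerNewMember(String memberName, String imagePath) throws IOException {
    INDArray image = read(imagePath);
    INDArray encoding = forwardPass(normalize(image));
    memberEncodingsMap.put(memberName, encoding);
}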

The application could be tuned and trained further, so it has some room for improvement, especially in the way it consolidates the data, as we have not applied the standard approach.
