10

Working with Plain Old Java Objects (POJOs)

Companies often combine several strategies to deliver services that meet the expected standards. For services that use Machine Learning (ML), they need to consider how they can quickly and easily build, extract, and deploy their models in production without affecting their ongoing service.

Hence, the portability of trained models is very important. How do you take a model object created by your training pipeline built with a certain technology and use that in your prediction pipeline, which might be built using a different technology? Ideally, the model object should be an object that is self-contained and easily distributable.

In the world of software engineering, Java is known as one of the most widely used platform-independent programming languages. The Java compiler converts a program into platform-independent bytecode that can run on any machine that has a Java Virtual Machine (JVM) installed. Building on this feature, you have Plain Old Java Objects (POJOs).

POJOs are ordinary objects that can be run by any Java program, irrespective of any framework. This makes POJOs very portable when deployed to different kinds of machines. H2O also has provisions to extract trained models in the form of POJOs, which can then be used for deployment in production.

In this chapter, we shall dive deep into understanding what POJOs are and how we can download them after successfully training a model in Python, R, and H2O Flow. Then, we’ll learn how to load a POJO into a simple Java program to make predictions.

In this chapter, we will cover the following topics:

  • Introduction to POJOs
  • Extracting H2O models as POJOs
  • Using a H2O model as a POJO

By the end of this chapter, you should be able to extract trained models in the form of POJOs using Python, R, or H2O Flow and then load these POJO models into your ML program to make predictions.

Technical requirements

For this chapter, you will require the following:

  • The latest version of your preferred web browser.
  • An Integrated Development Environment (IDE) of your choice.
  • (Optional) Jupyter Notebook by Project Jupyter (https://jupyter.org/)

All the experiments conducted in this chapter are performed on Jupyter notebooks to provide you with better visual examples of outputs. You are free to follow along using the same setup or perform the same experiments in environments specific to the language you are using. All the code examples for this chapter can be found on GitHub at https://github.com/PacktPublishing/Practical-Automated-Machine-Learning-on-H2O/tree/main/Chapter%2010.

Introduction to POJOs

POJO is a term coined by Martin Fowler, Rebecca Parsons, and Josh Mackenzie in September 2000. It is an ordinary Java object, but what makes it plain old is not what it should do but rather what it should not do.

A Java object can be a POJO in the following circumstances:

  • The Java object does not extend from any class.
  • The Java object does not implement any interfaces.
  • The Java object does not use any annotations from outside.

What these three restrictions lead to is a Java object that does not depend on any library or object outside of itself and is self-contained enough to perform its logic on its own. You can easily embed POJOs in any Java environment due to their portability, and because of Java’s platform independence, they can be run on any machine.
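To make the three restrictions concrete, here is a minimal sketch of a class that satisfies them, together with a small reflective check. The names IrisMeasurement, PojoCheck, and looksLikePojo are hypothetical and chosen only for illustration. The check sees only annotations that are retained at runtime, so treat it as a rough test rather than a formal definition.

```java
// A plain class: it extends nothing (beyond the implicit Object), implements
// no interfaces, and carries no annotations.
final class IrisMeasurement {
    private final double sepalLength;
    private final double sepalWidth;
    private final double petalLength;
    private final double petalWidth;

    IrisMeasurement(double sepalLength, double sepalWidth,
                    double petalLength, double petalWidth) {
        this.sepalLength = sepalLength;
        this.sepalWidth = sepalWidth;
        this.petalLength = petalLength;
        this.petalWidth = petalWidth;
    }

    double getSepalLength() { return sepalLength; }
    double getSepalWidth() { return sepalWidth; }
    double getPetalLength() { return petalLength; }
    double getPetalWidth() { return petalWidth; }
}

class PojoCheck {
    // Checks the three restrictions with reflection. Only annotations that
    // are retained at runtime are visible to getAnnotations().
    static boolean looksLikePojo(Class<?> type) {
        return type.getSuperclass() == Object.class
                && type.getInterfaces().length == 0
                && type.getAnnotations().length == 0;
    }

    public static void main(String[] args) {
        // prints true: IrisMeasurement meets all three restrictions
        System.out.println(looksLikePojo(IrisMeasurement.class));
        // prints false: ArrayList extends AbstractList and implements List
        System.out.println(looksLikePojo(java.util.ArrayList.class));
    }
}
```

Because such a class needs nothing beyond the Java runtime, it can be compiled and run wherever a JVM is available.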

H2O can export trained models in the form of POJOs. These POJO models can then be deployed and used to make predictions on inbound data. The only dependency for using POJO models is the h2o-genmodel.jar file, a JAR file that is needed to compile and run H2O model POJOs. This library contains the base classes, including GenModel, a helper class that supports Java-generated models and from which the model POJOs are derived. The same library is also responsible for supporting scoring with the model POJOs.

When working with model POJOs in production, you will need the h2o-genmodel.jar file to compile, deploy, and run your model POJOs. POJOs are simple Java code that is not tied to any particular version of H2O. However, it is still recommended to use the latest version of h2o-genmodel.jar since it can load the current version, as well as older versions, of your POJO. You can find detailed documentation regarding h2o-genmodel.jar at https://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html.

Now that we know what POJOs are and how H2O model POJOs work, let’s use some simple examples to learn how to extract models trained by H2O AutoML as POJOs.

Extracting H2O models as POJOs

Models trained using H2O’s AutoML can also be extracted as POJOs so that they can be deployed to your production systems.

In the following sub-sections, we shall learn how to extract the model POJOs using the Python and R programming languages, as well as how we can extract model POJOs using H2O Flow.

Downloading H2O models as POJOs in Python

Let’s see how we can extract H2O models as POJOs using a simple example in Python. We shall use the same Iris flower dataset we have been using so far. This dataset can be found at https://archive.ics.uci.edu/ml/datasets/iris.

Follow these steps to train models using H2O AutoML in Python. After doing this, you will extract the leader model and download it as a POJO:

  1. Import the h2o module and start your H2O server:

    import h2o

    h2o.init()

  2. Import the dataset by passing the location of the dataset in your system. Execute the following command:

    data_frame = h2o.import_file("Dataset/iris.data")

  3. Set the feature and label names by executing the following commands:

    features = data_frame.columns

    label = "C5"

    features.remove(label)

  4. Initialize the H2O AutoML object and set the max_models parameter to 10 and the seed value to 5 by executing the following command:

    aml = h2o.automl.H2OAutoML(max_models=10, seed=5)

  5. Trigger AutoML by passing the training dataset, the feature columns, and the label column as the parameters, as follows:

    aml.train(x = features, y = label, training_frame = data_frame)

  6. Once the training has finished, H2O AutoML should have trained a few models and ranked them based on a default ranking performance metric on a leaderboard. The highest ranking model on the leaderboard is called a leader and can be accessed directly by using the aml.leader command. Using this reference, you can download the leader model as a POJO by running the following command:

    h2o.download_pojo(aml.leader, path="~/Downloads/", jar_name="AutoMLModel")

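This should download the leader model’s POJO, a Java source file named after the model’s ID, to the directory specified in the path parameter. By default, H2O also downloads the h2o-genmodel.jar library to the same location, and the jar_name parameter sets the filename under which that JAR is saved (AutoMLModel in this example). If the path parameter is not set, then H2O will print the POJO’s source code on the console instead of saving it to a file.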

You can also view the contents of the POJO by opening the file in any editor. The file will contain a single public class that is named after your leader model and extends the GenModel class, which is a part of h2o-genmodel.jar.

Now that we know how we can extract a POJO model using Python, let’s see a similar example in the R programming language.

Downloading H2O models as POJOs in R

Similar to how we can extract a model from the AutoML leaderboard in Python, we can do the same in the R programming language. We shall use the same Iris flower dataset in this section. Follow these steps to train models using H2O AutoML and then extract the leader model to download it as a POJO:

  1. Load the h2o library and spin up your H2O server:

    library(h2o)

    h2o.init()

  2. Import the dataset by passing the location of the dataset in your system. Execute the following command:

    data_frame <- h2o.importFile("Dataset/iris.data")

  3. Set the feature and label names by executing the following commands:

    label <- "C5"

    features <- setdiff(names(data_frame), label)

  4. Trigger AutoML by passing the training dataset, the feature columns, and the label columns as parameters. Also, set max_models to 10 and the seed value to 5:

    aml <- h2o.automl(x = features, y = label, training_frame = data_frame, max_models=10, seed = 5)

  5. Once training is finished and you have the leaderboard, you can access the leader model using aml@leader. We can also download the leader model as a POJO by executing the following command:

    h2o.download_pojo(aml@leader, path="~/Downloads/", jar_name="AutoMLModel")

This will download the leader model’s POJO to your device at the specified path, along with the h2o-genmodel.jar library, which is saved under the name given in jar_name.

Now that we know how we can extract a POJO model in the R programming language, let’s see how we can do this in H2O Flow.

Downloading H2O models as POJOs in H2O Flow

Downloading model POJOs in H2O Flow is very easy. H2O allows models to be downloaded as POJOs by simply clicking on a button. In Chapter 2, Working with H2O Flow (H2O’s Web UI), in the Working with Model Training Functions in H2O Flow section, you learned how to access a specific model’s information.

For every model’s information output in H2O Flow, in the Actions subsection, you have an interactive button titled Download POJO, as shown in the following screenshot:

Figure 10.1 – Gathering model information with the Download POJO button

You can simply click the Download POJO button to download the model as a POJO. You can download all the models that have been trained by H2O using this interactive button in H2O Flow.

Now that we have explored how we can download models as POJOs in Python, R, and H2O Flow, let’s learn how to use this model POJO to make predictions.

Using a H2O model as a POJO

As mentioned in the previous section, a model POJO can be used on any platform that has a JVM installed. The only dependency is the h2o-genmodel.jar file, a JAR file that’s needed to compile and run the model POJO to make predictions.

So, let’s complete an experiment that uses the model POJO along with the h2o-genmodel.jar file to understand how model POJOs can be used in any environment with a JVM. We shall write a Java program that imports classes from the h2o-genmodel.jar file and uses them to load the model POJO into the program. Once the model POJO has been loaded, we will use it to make predictions on the sample data.

So, let’s start by creating a folder where we can keep the H2O POJO file needed for the experiment and then write some code that uses it. Follow these steps:

  1. Open your terminal and create an empty folder by executing the following command:

    mkdir H2O_POJO

    cd H2O_POJO

  2. Now, move your model POJO file to the folder by executing the following command:

    mv {path_to_download_location}/{name_of_model_POJO} .

Replace the placeholders with the path where you downloaded your model POJO file and the name of that file.

  3. Then, you need to download the h2o-genmodel.jar file. There are two ways you can do this:
    1. You can download the h2o-genmodel.jar file from your currently running local H2O server by running the following command:

    curl http://localhost:54321/3/h2o-genmodel.jar > h2o-genmodel.jar

Keep in mind you will need an actively running H2O server present on localhost:54321. If your server is running on a different port, then edit the command with the appropriate port number.

    2. Alternatively, the h2o-genmodel.jar file is available as a Maven dependency if you plan to use it in a Maven project. Apache Maven is a project management tool that automates dependency management. Add the following lines to your Maven pom.xml file inside its dependencies tag, preferably with the latest version:

    <dependency>
        <groupId>ai.h2o</groupId>
        <artifactId>h2o-genmodel</artifactId>
        <version>3.35.0.2</version>
    </dependency>

The Maven repository for this can be found here: https://mvnrepository.com/artifact/ai.h2o/h2o-genmodel.

  4. Now, let’s create a sample Java program that uses the model POJO and the h2o-genmodel.jar file to make predictions on random data values. Create a Java program called main.java by executing the following command in your terminal:

    vim main.java

This should open the vim editor for you to write your program in.

  5. Let’s start writing our Java program:
    1. First, import the necessary dependencies, as follows:

    import hex.genmodel.easy.RowData;

    import hex.genmodel.easy.EasyPredictModelWrapper;

    import hex.genmodel.easy.prediction.*;

    2. Then, create the main class, as follows:

    public class main { }

    3. Inside the main class, declare our model POJO’s class name, as follows:

    private static final String modelPOJOClassName = "{name_of_model_POJO}";

    4. Then, create a main function inside the main class, as follows:

    public static void main(String[] args) throws Exception { }

    5. Inside this main function, declare the rawModel variable as a GenModel object and initialize it by creating an instance of your model POJO from modelPOJOClassName, as follows:

    hex.genmodel.GenModel rawModel;

    rawModel = (hex.genmodel.GenModel) Class.forName(modelPOJOClassName).getDeclaredConstructor().newInstance();

    6. Now, let’s wrap this rawModel object in an EasyPredictModelWrapper class. This class comes with easy-to-use functions that will make it easy for us to make predictions. Add the following code to your file:

    EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);

    7. Now that we have our modelPOJO object loaded and wrapped in EasyPredictModelWrapper, let’s create some sample data for making predictions. Since we are using a model trained using the Iris dataset, let’s create a RowData that contains C1, C2, C3, and C4 as features and some appropriate values. Add the following code to your file:

    RowData row = new RowData();

    row.put("C1", 5.1);

    row.put("C2", 3.5);

    row.put("C3", 1.4);

    row.put("C4", 0.2);

    8. Now, we need to create a prediction handler object that we can use to store the prediction results. Since the Iris dataset is for a multinomial classification problem, we will create an appropriate multinomial prediction handler object, as follows:

    MultinomialModelPrediction predictionResultHandler = model.predictMultinomial(row);

For different types of problems, you will need to use the appropriate types of prediction handler objects. You can find more information about this at https://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html.

  6. Now, let’s add a print statement so that we can get a clean and easy-to-understand output. Add the following print statement:

System.out.println("Predicted Class of Iris flower is: " + predictionResultHandler.label);

predictionResultHandler.label will contain the predicted label value.

  7. Let’s also print out the different class probabilities so that we have an idea of the probability with which the label was predicted:

System.out.println("Class probabilities are: ");

for (int labelClassIndex = 0; labelClassIndex < predictionResultHandler.classProbabilities.length; labelClassIndex++) {

        System.out.println(predictionResultHandler.classProbabilities[labelClassIndex]);

}

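  8. Finally, as the most important step, make sure all your braces are closed correctly and save the file.
  9. Once your file is ready, compile it together with the model POJO by executing the following command, replacing DRF_1_AutoML_1_20220619_210236.java with the name of your own model POJO file:

    javac -cp h2o-genmodel.jar -J-Xmx2g -J-XX:MaxPermSize=128m DRF_1_AutoML_1_20220619_210236.java main.java

The -J-XX:MaxPermSize=128m option only applies to Java 7 and earlier. Java 8 removed the permanent generation, so you can omit this option on newer JDKs.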

  10. Once compilation is successful, execute the compiled file by running the following command in your terminal:

    java -cp .:h2o-genmodel.jar main

You should get the following output:

Figure 10.2 – Prediction results from the H2O model POJO implementation
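The loop in the program prints the class probabilities without saying which class each value belongs to. The following self-contained sketch shows one way to pair each probability with its label and to pick the most likely class. It runs without H2O: the label array and the probability values are made-up assumptions, and ProbabilityReport, argMax, and format are hypothetical names. In a real program, the labels would need to be in the same order as the model’s response classes, which you should confirm against the h2o-genmodel.jar documentation.

```java
import java.util.Locale;

class ProbabilityReport {
    // Returns the index of the largest probability.
    static int argMax(double[] probabilities) {
        if (probabilities.length == 0) {
            throw new IllegalArgumentException("No probabilities given");
        }
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return best;
    }

    // Builds one "label: probability" line per class.
    static String format(String[] labels, double[] probabilities) {
        if (labels.length != probabilities.length) {
            throw new IllegalArgumentException("Labels and probabilities differ in length");
        }
        StringBuilder report = new StringBuilder();
        for (int i = 0; i < labels.length; i++) {
            report.append(String.format(Locale.ROOT, "%s: %.4f\n", labels[i], probabilities[i]));
        }
        return report.toString();
    }

    public static void main(String[] args) {
        // Assumed label order and made-up probabilities, for illustration only.
        String[] labels = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
        double[] probabilities = {0.97, 0.02, 0.01};

        System.out.print(format(labels, probabilities));
        System.out.println("Most likely class: " + labels[argMax(probabilities)]);
    }
}
```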

As you can see, using the model POJO is very easy: you just need to compile the POJO and use it in any regular Java program with the h2o-genmodel.jar library on the classpath.
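The loading step in the program relies on Java reflection: the class is looked up by name at run time, instantiated through its no-argument constructor, and cast to a known base type. The sketch below reproduces that pattern in a form that runs without H2O. StandInModel and TinyIrisRule are hypothetical stand-ins that are not part of h2o-genmodel.jar, and the hand-written rule is a toy, not a trained model.

```java
// Hypothetical base type that plays the role of a common model superclass.
abstract class StandInModel {
    abstract String[] featureNames();

    // Returns the index of the predicted class for one row of feature values.
    abstract int predictClassIndex(double[] row);
}

// Hypothetical "generated" model: its logic is ordinary compiled Java code.
class TinyIrisRule extends StandInModel {
    @Override
    String[] featureNames() {
        return new String[] {"C1", "C2", "C3", "C4"};
    }

    @Override
    int predictClassIndex(double[] row) {
        // Toy rule on the third feature; not learned from data.
        return row[2] < 2.5 ? 0 : 1;
    }
}

class LoadByNameDemo {
    // Looks up a class by name, creates an instance, and casts it to the base type.
    static StandInModel load(String className) {
        try {
            Object instance = Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
            return (StandInModel) instance;
        } catch (ReflectiveOperationException e) {
            // Typically means the class is not on the classpath.
            throw new IllegalArgumentException("Cannot load model class " + className, e);
        }
    }

    public static void main(String[] args) {
        StandInModel model = load(TinyIrisRule.class.getName());
        int predicted = model.predictClassIndex(new double[] {5.1, 3.5, 1.4, 0.2});
        System.out.println("Predicted class index: " + predicted);
    }
}
```

Loading by name means the calling program only needs the model’s class name as a string, so a different compiled model class can be used by changing that string and the classpath.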

Tip

If you plan on using model POJOs in production, then it is highly recommended that you understand the h2o-genmodel.jar library in detail. This library can provide you with lots of features and functionality that can make your deployment experience easy. You can find out more about this library here: https://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html.

Congratulations! This chapter has helped you understand how to build, extract, and deploy model POJOs to make predictions on inbound data. You are now one step closer to using H2O in production.

Summary

In this chapter, we started by understanding what the usual problems are when working with an ML service in production. We understood how the portability of software, as well as ML models, plays an important role in seamless deployments. We also understood how Java’s platform independence makes it good for deployments and how POJOs play a role in it.

Then, we explored what POJOs are and how they are independently functioning objects in the Java domain. We also learned that H2O has provisions to extract models trained by AutoML in the form of POJOs, which we can use as self-contained ML models capable of making predictions.

Building on top of this, we learned how to extract ML models in H2O as POJOs in Python, R, and H2O Flow. Once we understood how to download H2O ML models as POJOs, we learned how to use them to make predictions.

First, we understood that we need the h2o-genmodel.jar library and that it is responsible for interpreting the model POJO in Java. Then, we created an experiment where we downloaded the H2O model POJO and h2o-genmodel.jar and created a simple Java program that uses both of these files to make predictions on some sample data; this gave us some practical experience in working with model POJOs.

In the next chapter, we shall explore MOJOs, objects similar to POJOs but with some special benefits that can also be used in production.
