Companies combine several strategies to deliver services that meet the expected standards. For services that use Machine Learning (ML), they must also consider how to build, extract, and deploy models to production quickly and easily, without disrupting the running service.
Hence, the portability of trained models is very important. How do you take a model object created by your training pipeline built with a certain technology and use that in your prediction pipeline, which might be built using a different technology? Ideally, the model object should be an object that is self-contained and easily distributable.
In software engineering, Java is one of the most widely used platform-independent programming languages. The Java compiler converts a program into platform-independent bytecode that can run on any machine that has a Java Virtual Machine (JVM) installed. Building on this feature, you have Plain Old Java Objects (POJOs).
POJOs are ordinary objects that can be run by any Java program, irrespective of any framework. This makes POJOs very portable when deployed to different kinds of machines. H2O also has provisions to extract trained models in the form of POJOs, which can then be used for deployment in production.
In this chapter, we shall dive deep into understanding what POJOs are and how we can download them after successfully training a model in Python, R, and H2O Flow. Then, we’ll learn how to load a POJO into a simple Java program to make predictions.
In this chapter, we will cover the following topics:
By the end of this chapter, you should be able to extract trained models in the form of POJOs using Python, R, or H2O Flow and then load these POJO models into your ML program to make predictions.
For this chapter, you will require the following:
All the experiments conducted in this chapter are performed on Jupyter notebooks to provide you with better visual examples of outputs. You are free to follow along using the same setup or perform the same experiments in environments specific to the language you are using. All the code examples for this chapter can be found on GitHub at https://github.com/PacktPublishing/Practical-Automated-Machine-Learning-on-H2O/tree/main/Chapter%2010.
POJO is a term coined by Martin Fowler, Rebecca Parsons, and Josh Mackenzie in September 2000. It is an ordinary Java object, but what makes it plain old is not what it should do but rather what it should not do.
A Java object can be a POJO in the following circumstances:
What these three restrictions lead to is a Java object that does not depend on any library or object outside of itself and is self-contained enough to perform its logic on its own. You can easily embed POJOs in any Java environment due to their portability, and because of Java’s platform independence, they can be run on any machine.
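As a minimal sketch of what such an object can look like, the following class holds one row of Iris measurements. The class name IrisMeasurement, its fields, and the petalArea method are hypothetical and chosen only for illustration. The point is that the class extends no framework class, implements no framework interface, and carries no framework annotations, so it compiles with nothing but the JDK.

```java
// Hypothetical example: a plain old Java object with no framework ties.
// It extends no framework class, implements no interface, and uses no annotations.
final class IrisMeasurement {
    private final double sepalLength;
    private final double sepalWidth;
    private final double petalLength;
    private final double petalWidth;

    IrisMeasurement(double sepalLength, double sepalWidth,
                    double petalLength, double petalWidth) {
        this.sepalLength = sepalLength;
        this.sepalWidth = sepalWidth;
        this.petalLength = petalLength;
        this.petalWidth = petalWidth;
    }

    double getSepalLength() { return sepalLength; }
    double getSepalWidth() { return sepalWidth; }
    double getPetalLength() { return petalLength; }
    double getPetalWidth() { return petalWidth; }

    // Self-contained logic that relies only on the object's own fields.
    double petalArea() {
        return petalLength * petalWidth;
    }
}
```

Because nothing in the class refers to code outside the JDK, it can be copied into any Java project and used as is.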
H2O can export trained models in the form of POJOs. These POJO models can then be deployed and used to make predictions on inbound data. The only dependency for using POJO models is the h2o-genmodel.jar file, which is needed to compile and run H2O model POJOs. This library contains the base classes, including GenModel, a helper class that supports Java-generated models and from which the model POJOs are derived. The same library is also responsible for scoring with the model POJOs.
When working with model POJOs in production, you will need the h2o-genmodel.jar file to compile, deploy, and run your model POJOs. A POJO is simple Java code that is not tied to any particular version of H2O. However, it is still recommended to use the latest version of h2o-genmodel.jar since it can load the current version, as well as older versions, of your POJO. You can find detailed documentation regarding h2o-genmodel.jar at https://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html.
Now that we know what POJOs are and how H2O model POJOs work, let’s use some simple examples to learn how to extract models trained with H2O AutoML as POJOs.
Models trained using H2O’s AutoML can also be extracted as POJOs so that they can be deployed to your production systems.
In the following sub-sections, we shall learn how to extract the model POJOs using the Python and R programming languages, as well as how we can extract model POJOs using H2O Flow.
Let’s see how we can extract H2O models as POJOs using a simple example in Python. We shall use the same Iris flower dataset we have been using so far. This dataset can be found at https://archive.ics.uci.edu/ml/datasets/iris.
Follow these steps to train models using H2O AutoML in Python. After doing this, you will extract the leader model and download it as a POJO:
import h2o
h2o.init()
data_frame = h2o.import_file("Dataset/iris.data")
features = data_frame.columns
label = "C5"
features.remove(label)
aml=h2o.automl.H2OAutoML(max_models=10, seed = 5)
aml.train(x = features, y = label, training_frame = data_frame)
h2o.download_pojo(aml.leader, path="~/Downloads/", jar_name="AutoMLModel")
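This should download the leader model’s POJO as a Java source file, named after the model ID, to the directory specified in the path parameter. By default, download_pojo also fetches h2o-genmodel.jar into the same directory; the jar_name parameter sets a custom name for that JAR file, not for the POJO. If the path parameter is not set, then H2O will print the POJO’s source on the console instead of saving it to a file.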
You can also view the contents of the POJO by opening the file in any editor. The file will contain a single public class that is named after your leader model and extends the GenModel class, which is a part of h2o-genmodel.jar.
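To make the shape of that file more concrete, here is a rough sketch of the pattern. BaseModel is a hypothetical stand-in for GenModel, and ToyIrisModel with its single hard-coded threshold is invented for illustration. A real generated POJO is far larger, and its scoring logic comes from the trained model. The sketch only shows the idea of a library base class combined with a generated subclass that hard-codes the scoring logic in plain Java.

```java
// Hypothetical stand-in for the library base class (GenModel in h2o-genmodel.jar).
abstract class BaseModel {
    private final String[] featureNames;
    private final String[] classLabels;

    BaseModel(String[] featureNames, String[] classLabels) {
        this.featureNames = featureNames;
        this.classLabels = classLabels;
    }

    String[] getFeatureNames() { return featureNames; }
    String[] getClassLabels() { return classLabels; }

    // Subclasses fill preds with one probability per class label.
    abstract double[] score(double[] row, double[] preds);
}

// Hypothetical "generated" class: all model logic is hard-coded in plain Java.
final class ToyIrisModel extends BaseModel {
    ToyIrisModel() {
        super(new String[] {"C1", "C2", "C3", "C4"},
              new String[] {"Iris-setosa", "other"});
    }

    @Override
    double[] score(double[] row, double[] preds) {
        // Invented rule for illustration only: a short petal length (C3) suggests setosa.
        boolean shortPetal = row[2] < 2.5;
        preds[0] = shortPetal ? 0.9 : 0.1;
        preds[1] = 1.0 - preds[0];
        return preds;
    }
}
```

Since the subclass carries its own logic, the only thing it needs at compile time and at runtime is the library that provides the base class, which is the role h2o-genmodel.jar plays for real model POJOs.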
Now that we know how we can extract a POJO model using Python, let’s see a similar example in the R programming language.
Similar to how we can extract a model from the AutoML leaderboard in Python, we can do the same in the R programming language. We shall use the same Iris flower dataset in this section. Follow these steps to train models using H2O AutoML and then extract the leader model to download it as a POJO:
library(h2o)
h2o.init()
data_frame <- h2o.importFile("Dataset/iris.data")
label <- "C5"
features <- setdiff(names(data_frame), label)
aml <- h2o.automl(x = features, y = label, training_frame = data_frame, max_models=10, seed = 5)
h2o.download_pojo(aml@leader, path="~/Downloads/", jar_name="AutoMLModel")
This will download the leader model’s POJO to the specified path on your device.
Now that we know how we can extract a POJO model in the R programming language, let’s see how we can do this in H2O Flow.
Downloading model POJOs in H2O Flow is very easy. H2O allows models to be downloaded as POJOs by simply clicking on a button. In Chapter 2, Working with H2O Flow (H2O’s Web UI), in the Working with Model Training Functions in H2O Flow section, you learned how to access a specific model’s information.
For every model’s information output in H2O Flow, in the Actions subsection, you have an interactive button titled Download POJO, as shown in the following screenshot:
Figure 10.1 – Gathering model information with the Download POJO button
You can simply click the Download POJO button to download the model as a POJO. You can download all the models that have been trained by H2O using this interactive button in H2O Flow.
Now that we have explored how we can download models as POJOs in Python, R, and H2O Flow, let’s learn how to use this model POJO to make predictions.
As mentioned in the previous section, a model POJO can be used on any platform that has a JVM installed. The only dependency is the h2o-genmodel.jar file, a JAR file that’s needed to compile and run the model POJO to make predictions.
Let’s run an experiment that uses the model POJO along with the h2o-genmodel.jar file to see how model POJOs can be used in any environment with a JVM. We shall write a Java program that imports classes from h2o-genmodel.jar and uses them to load the model POJO into the program. Once the model POJO has been loaded, we will use it to make predictions on some sample data.
So, let’s start by creating a folder where we can keep the H2O POJO file needed for the experiment and then write some code that uses it. Follow these steps:
mkdir H2O_POJO
cd H2O_POJO
mv {path_to_download_location}/{name_of_model_POJO} .
Keep in mind that you need to replace {path_to_download_location} with the directory where you downloaded the model POJO and {name_of_model_POJO} with the name of the model POJO file.
curl http://localhost:54321/3/h2o-genmodel.jar > h2o-genmodel.jar
Keep in mind you will need an actively running H2O server present on localhost:54321. If your server is running on a different port, then edit the command with the appropriate port number.
<dependency>
<groupId>ai.h2o</groupId>
<artifactId>h2o-genmodel</artifactId>
<version>3.35.0.2</version>
</dependency>
The Maven repository for this can be found here: https://mvnrepository.com/artifact/ai.h2o/h2o-genmodel.
This should open the vim editor for you to write your program in.
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.prediction.*;
public class main { }
private static final String modelPOJOClassName = "{name_of_model_POJO}";
public static void main(String[] args) throws Exception { }
hex.genmodel.GenModel rawModel;
rawModel = (hex.genmodel.GenModel) Class.forName(modelPOJOClassName).getDeclaredConstructor().newInstance();
EasyPredictModelWrapper model = new EasyPredictModelWrapper(rawModel);
RowData row = new RowData();
row.put("C1", 5.1);
row.put("C2", 3.5);
row.put("C3", 1.4);
row.put("C4", 0.2);
MultinomialModelPrediction predictionResultHandler = model.predictMultinomial(row);
For different types of problems, you will need to use the appropriate types of prediction handler objects. You can find more information about this at https://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html.
System.out.println("Predicted Class of Iris flower is: " + predictionResultHandler.label);
predictionResultHandler.label will contain the predicted label value.
System.out.println("Class probabilities are: ");
for (int labelClassIndex = 0; labelClassIndex < predictionResultHandler.classProbabilities.length; labelClassIndex++) {
System.out.println(predictionResultHandler.classProbabilities[labelClassIndex]);
}
javac -cp h2o-genmodel.jar -J-Xmx2g DRF_1_AutoML_1_20220619_210236.java main.java
java -cp .:h2o-genmodel.jar main
You should get the following output:
Figure 10.2 – Prediction results from the H2O model POJO implementation
As you can see, using the model POJO is straightforward: you download the POJO and use it in any regular Java program, with h2o-genmodel.jar on the classpath.
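The loading step in the preceding program relies on plain Java reflection, which is why the model class name can be supplied as a string. The following sketch reproduces that pattern without h2o-genmodel.jar so that it compiles on its own. Scorer and StubIrisScorer are hypothetical stand-ins for GenModel and the downloaded POJO class, and the probabilities are made up. The sketch also shows one common way to derive a label from class probabilities, namely picking the most probable class; this is an illustration and not a description of how EasyPredictModelWrapper is implemented.

```java
// Hypothetical stand-in for the GenModel base type.
interface Scorer {
    String[] labels();
    double[] classProbabilities(double[] row);
}

// Hypothetical stand-in for the downloaded model POJO class.
class StubIrisScorer implements Scorer {
    @Override
    public String[] labels() {
        return new String[] {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
    }

    @Override
    public double[] classProbabilities(double[] row) {
        // Made-up probabilities; a real model would compute them from the row.
        return new double[] {0.98, 0.01, 0.01};
    }
}

class ReflectiveLoadingSketch {
    // Instantiate a class from its name, as the program above does for the POJO.
    static Scorer load(String className) {
        try {
            return (Scorer) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    // Return the label whose probability is highest.
    static String mostProbableLabel(Scorer scorer, double[] row) {
        double[] probabilities = scorer.classProbabilities(row);
        int best = 0;
        for (int i = 1; i < probabilities.length; i++) {
            if (probabilities[i] > probabilities[best]) {
                best = i;
            }
        }
        return scorer.labels()[best];
    }

    public static void main(String[] args) {
        // getName() keeps the sketch independent of the package it is pasted into.
        Scorer scorer = load(StubIrisScorer.class.getName());
        double[] row = {5.1, 3.5, 1.4, 0.2};
        System.out.println("Predicted class: " + mostProbableLabel(scorer, row));
    }
}
```

Loading by name keeps the calling program free of any compile-time reference to a specific model class, so swapping in a newly trained model only requires changing the class name string and recompiling the new POJO file.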
Tip
If you plan on using model POJOs in production, then it is highly recommended that you understand the h2o-genmodel.jar library in detail. This library can provide you with lots of features and functionality that can make your deployment experience easy. You can find out more about this library here: https://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/index.html.
Congratulations! This chapter has helped you understand how to build, extract, and deploy model POJOs to make predictions on inbound data. You are now one step closer to using H2O in production.
In this chapter, we started by understanding what the usual problems are when working with an ML service in production. We understood how the portability of software, as well as ML models, plays an important role in seamless deployments. We also understood how Java’s platform independence makes it good for deployments and how POJOs play a role in it.
Then, we explored what POJOs are and how they are independently functioning objects in the Java domain. We also learned that H2O has provisions to extract models trained by AutoML in the form of POJOs, which we can use as self-contained ML models capable of making predictions.
Building on top of this, we learned how to extract ML models in H2O as POJOs in Python, R, and H2O Flow. Once we understood how to download H2O ML models as POJOs, we learned how to use them to make predictions.
First, we understood that we need the h2o-genmodel.jar library and that it is responsible for interpreting the model POJO in Java. Then, we created an experiment where we downloaded the H2O model POJO and h2o-genmodel.jar and created a simple Java program that uses both of these files to make predictions on some sample data; this gave us some practical experience in working with model POJOs.
In the next chapter, we shall explore MOJOs, objects similar to POJOs but with some special benefits that can also be used in production.