In this chapter, you will learn about the features for model management in MLflow. We will cover the model life cycle in MLflow, explain how to integrate it with your regular development workflow, and show how to create custom models that are not natively available in MLflow. The model life cycle will be introduced alongside the Model Registry feature of MLflow.
Specifically, we will look at the following sections in this chapter:
From a workbench perspective, we would like to use MLflow to manage our models and implement a clear model life cycle. Adding managed model features to our workbench by leveraging MLflow will step up the quality and operations of our machine learning engineering solution.
For this chapter, you will need the following:
On the MLflow platform, you have two main components available to manage models:
An MLflow model is, at its core, a packaging format for models. The main goal of MLflow model packaging is to decouple the model type from the environment that executes it. A good analogy is that an MLflow model is a bit like a Dockerfile for a model: you describe the model's metadata, and upstream deployment tools are able to interact with the model based on that specification.
As can be seen in the diagram in Figure 5.1, on one side you have your model library, for instance, TensorFlow or sklearn. At the core of MLflow sits the MLflow model format, which can be served in a multitude of flavors (model formats) to cater to different types of inference tools, on-premises and in the cloud:
Figure 5.1 was extracted from the URL https://www.infoq.com/presentations/mlflow-databricks/#.
The central piece of the definition of MLflow models is the MLflow model file, as depicted in the next screenshot:
An MLmodel example can be seen in Figure 5.2 and provides the following information:
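For reference, a minimal MLmodel file for a scikit-learn model might look like the following; the field values here are illustrative, not taken from a specific run:

```yaml
artifact_path: model
flavors:
  python_function:
    env: conda.yaml
    loader_module: mlflow.sklearn
    model_path: model.pkl
    python_version: 3.8.10
  sklearn:
    pickled_model: model.pkl
    serialization_format: cloudpickle
    sklearn_version: 0.24.2
run_id: 132e6fa332f2412d85f3cb9e6d6bc933
utc_time_created: '2021-05-01 12:00:00.000000'
```

Note how the same model is described under two flavors, python_function (the generic pyfunc interface) and sklearn (the native library format).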
The MLflow Models module provides you with the ability to deploy your models either in the native environment of your model's library or in a generic, interoperable MLflow flavor called pyfunc. The pyfunc flavor is supported in any environment that supports Python, giving the deployer of the model flexibility in how best to run the model once it is logged in MLflow:
make
The model in Figure 5.3 should be very similar to the one used in Chapter 4, Experiment Management in MLflow. Using mlflow.start_run, you can start logging your model in MLflow and use the innate capabilities of the platform to capture relevant details of the model being developed.
import mlflow
logged_model = '/data/artifacts/1/132e6fa332f2412d85f3cb9e6d6bc933/artifacts/model'
# Load model as a PyFuncModel.
loaded_model = mlflow.pyfunc.load_model(logged_model)
# Predict on a Pandas DataFrame.
import pandas as pd
loaded_model.predict(pd.DataFrame(X_test))
Alternatively, the model can be loaded in the native H5 Keras format into a completely different application, as shown in Figure 5.4, by using the /data/model/model.h5 file.
Having introduced the concept of models in MLflow in this section, we will next delve a bit deeper into the different types of models available in MLflow.
Model flavors in MLflow are the model formats of the different libraries supported by MLflow. This functionality allows MLflow to handle each model type with the native libraries of that specific model and to support some of the models' native functionality. The following list presents a selection of representative flavors to describe and illustrate the support available in MLflow:
mlflow.h2o.load_model(...)
mlflow.h2o.log_model(...)
A very comprehensive list of flavors/formats is supported by MLflow; their usage and support can be read about here: https://www.mlflow.org/docs/latest/python_api/index.html.
Let's delve into the next excerpt of code and the custom RandomPredictor model. As long as you provide a class that implements the predict method (and, optionally, fit), you can have your own custom MLflow model:
import random

import mlflow.pyfunc

class RandomPredictor(mlflow.pyfunc.PythonModel):
    def __init__(self):
        pass

    def fit(self):
        pass

    def predict(self, context, model_input):
        # Return a random 0/1 prediction for each input row
        return model_input.apply(
            lambda row: random.randint(0, 1), axis=1)
In the preceding class, the model simply returns a random 0 or 1 prediction for each input. It can be used as a baseline in a system where you want to make sure that your model performs better than a random model.
In this section, we introduced the different types of model flavors and the creation of a custom model. We will next look at the schema and signature features of MLflow.
An important feature of MLflow is to provide an abstraction for input and output schemas of models and the ability to validate model data during prediction and training.
MLflow throws an error if your input does not match the schema and signature of the model during prediction:
from sklearn import datasets, svm, metrics
from sklearn.model_selection import train_test_split
import mlflow
digits = datasets.load_digits()
n_samples = len(digits.images)
# flatten the images
data = digits.images.reshape((n_samples, -1))
clf = svm.SVC(gamma=0.001)
X_train, X_test, y_train, y_test = train_test_split(
data, digits.target, test_size=0.5, shuffle=False)
mlflow.sklearn.autolog()
with mlflow.start_run():
clf.fit(X_train, y_train)
from mlflow.models.signature import infer_signature
with mlflow.start_run(run_name='untuned_random_forest'):
…
signature = infer_signature(X_train,
wrappedModel.predict(None, X_train))
mlflow.pyfunc.log_model("random_forest_model",
python_model=wrappedModel,
signature=signature)
In the previous code block, the signature of the model is inferred by the infer_signature method and provided when the model is logged through log_model. One important advantage of logging signatures alongside the model is that they serve as documentation and metadata for it: third-party systems can consume this metadata and interact with the model by validating its data or generating documentation for it.
In this section, we introduced the model schema and signature features of MLflow models. We will now move on to the other critical module in this space, namely the Model Registry.
MLflow Model Registry is a module in MLflow that comprises a centralized store for models and an API that allows you to manage the life cycle of a model in a registry.
A typical workflow for a machine learning model developer is to acquire training data; clean, process, and train models; and from there on, hand over to a system or person that deploys the models. In very small settings, where you have one person responsible for this function, it is quite trivial. Challenges and friction start to arise when the variety and quantity of models in a team start to scale. A selection of common friction points raised by machine learning developers with regards to storing and retrieving models follows:
The main idea behind MLflow Model Registry is to provide a central model store in an organization, where all the relevant models are stored and can be accessed by humans and systems. A good analogy would be a Git repository for models, with associated relevant metadata and centralized state management for models.
In the MLflow UI (available in your local environment), you should click on the tab on the right side of Experiments with the label Models as indicated by the arrow:
When you add a new model, MLflow automatically increments the version and labels it as the latest version, so everyone in the organization can query the registry for the latest version of a model for a given problem.
Everything that can be done in the UI in MLflow can also be implemented through the MLflow API.
We can quickly go back to our use case of stock market prediction and add our first baseline model to Model Registry. Run the hyperopt_optimization_logistic_regression_mlflow.ipynb notebook, available in the repo of this chapter, and sort the runs by the F1 score metric in descending order, as represented in Figure 5.10:
From there, you should be able to register the best model with the name BTC StockPrediction as represented in Figure 5.11:
By returning to the models module, you will notice, as represented in Figure 5.12, your newly created model under Version 1:
Having introduced the functionalities of Model Registry, in the next section, we will describe a model development life cycle to help organize the management of your models.
Managing the model life cycle is quite important when working in a team of more than one model developer. It's common for multiple model developers to try different models within the same project, so having a reviewer decide which model ends up going to production is essential:
A model in its life cycle can undergo the following stages if using a life cycle similar to the one represented in Figure 5.13:
For instance, a reviewer or supervisor, as represented in Figure 5.14, can move a model from the Development state to Staging for deployment in a test environment, and the model can be transitioned to Production if approved by reviewers:
When transitioning a model between states in MLflow, you have the option to move the model version currently in the target state on to the next state:
The transitions from the Staging to Production stages in a mature environment are meant to be done automatically, as we will demonstrate in the upcoming chapters of the book.
With this section, we have concluded the description of the features related to models in MLflow.
In this chapter, we first introduced the Models module in MLflow and its support for different algorithms, from tree-based to linear to neural. We looked at the support for logging models and their metrics, and at the creation of custom models.
In the last two sections, we introduced the Model Registry module and how to use it to implement a model life cycle to manage our models.
In the next chapters and sections of the book, we will focus on applying the concepts learned so far to real-life systems, and we will architect a machine learning system for production environments.
In order to solidify your knowledge and dive deeper into the concepts introduced in this chapter, you should look at the following links: