Chapter 9: Building an Enterprise ML Architecture with AWS ML Services

To support a large number of fast-moving machine learning (ML) initiatives, many organizations decide to build enterprise ML platforms that support the full ML life cycle and a wide range of usage patterns, and that are automated and scalable. As a practitioner, I have often been asked to provide architecture guidance on how to build enterprise ML platforms. In this chapter, we will discuss the core requirements for enterprise ML platform design and implementation. We will cover topics such as workflow automation, infrastructure scalability, and system monitoring. You will learn about architecture patterns for building technology solutions that automate the end-to-end ML workflow and deployment at scale. We will also dive deep into other core enterprise ML architecture components, such as model training, model hosting, the feature store, and the model registry at enterprise scale.

Specifically, we will cover the following topics:

  • Key requirements for an ML platform
  • Enterprise ML architecture pattern
  • Adopting ML Operations (MLOps) for an ML workflow
  • Hands-on exercise – building an MLOps pipeline on AWS

Governance and security are also important topics for enterprise ML, and we will cover them in greater detail in Chapter 11, ML Governance, Bias, Explainability, and Privacy. To get started, let's discuss the key requirements for an enterprise ML platform.

Technical requirements

We will continue to use the AWS environment for the hands-on portion of this chapter. All the source code mentioned in this chapter can be found at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/tree/main/Chapter09.

Key requirements for an enterprise ML platform

To deliver business value with ML at scale, organizations need to be able to experiment quickly with different scientific approaches, ML technologies, and datasets. Once the ML models have been trained and validated, they need to be deployed to production with minimal friction. While there are similarities between a traditional enterprise software system and an ML platform, such as scalability and security, an enterprise ML platform poses many unique challenges, such as integration with the data platform and with high-performance computing infrastructure for large-scale model training. Now, let's talk about some specific enterprise ML platform requirements:

  • Support for the end-to-end ML life cycle: An enterprise ML platform needs to support both data science experimentation and production-grade operations/deployments. In Chapter 8, Building a Data Science Environment Using AWS ML Services, we learned about the key architecture components that are needed to build a data science experimentation environment. To enable production-grade operations and deployment, an enterprise ML platform also needs to have architecture components for large-scale model training, model management, feature management, and model hosting with high availability and scalability.
  • Support for continuous integration (CI), continuous training (CT), and continuous deployment (CD): An enterprise ML platform provides CI capabilities beyond just testing and validating code and components – it also provides such capabilities for data and models. The CD capability for ML is also more than just deploying a single piece of software; it involves deploying the combination of an ML model and its inference engine. CT is unique to ML, whereby a model is monitored continuously, and automated model retraining can be triggered when data drift or model drift is detected, or when the training data changes. Data drift is a change in data whereby the characteristics of the data in production are statistically different from the model training data. Model drift is a change in model performance whereby the model's performance degrades from the performance that was achieved during the model training stage.
  • MLOps support: An enterprise ML platform provides capabilities for monitoring the statuses, errors, and metrics of different pipeline workflows, processing/training jobs, and model serving engines. It also monitors infrastructure-level statistics and resource usage. Automated alerting is also a key component of MLOps. Where possible, automated failure recovery mechanisms should be implemented.
  • Support for different languages and ML frameworks: An enterprise ML platform allows data scientists and ML engineers to work on different ML problems using the programming language and ML libraries of their choice. It needs to support popular languages such as Python and R, as well as ML packages such as TensorFlow, PyTorch, and scikit-learn.
  • Computing hardware resource management: Depending on model training and inference needs and cost considerations, an enterprise ML platform needs to support different types of compute hardware, such as CPUs and GPUs. Where applicable, it should also support specialized ML hardware such as AWS Inferentia chips.
  • Integration with other third-party systems and software: An enterprise ML platform seldom works in isolation. It needs to provide integration capabilities with other third-party software or platforms, such as workflow orchestration tools, container registries, and code repositories.
  • Authentication and authorization: An enterprise ML platform needs to provide different levels of authentication and authorization control to govern secure access to data, artifacts, and ML platform resources. This authentication and authorization can be a built-in capability of the ML platform or it can be provided by an external authentication and authorization service.
  • Data encryption: For regulated industries, such as financial services and healthcare, data encryption is a key requirement. An enterprise ML platform needs to provide capabilities for encrypting data at rest and in transit, often with customer-managed encryption keys (see the sketch after this list).
  • Artifacts management: An enterprise ML platform processes datasets and produces different artifacts at the different phases of the ML life cycle. To establish reproducibility and meet governance and compliance requirements, an enterprise ML platform needs to be able to track, manage, and version-control these artifacts.
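
The following is a minimal sketch of how the encryption requirement can translate into practice with the SageMaker Python SDK, which is covered later in this chapter. A customer-managed AWS KMS key is passed to an estimator so that the model artifacts written to S3 and the training instances' storage volumes are encrypted; the key ID, container image, and other values are placeholders, not values from this chapter's exercise:

from sagemaker.estimator import Estimator

# Hypothetical customer-managed KMS key used for encryption at rest
kms_key_id = "<customer-managed KMS key ID>"

encrypted_estimator = Estimator(
    image_uri="<training container image URI>",
    role="<AWS IAM role>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_kms_key=kms_key_id,      # encrypts model artifacts written to S3
    volume_kms_key=kms_key_id,      # encrypts the ML storage volumes attached to the training instances
    enable_network_isolation=True,  # optional: blocks outbound network calls from the training container
)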

With that, we have talked about the key requirements of an enterprise ML platform. Next, let's discuss how AWS ML and DevOps services, such as SageMaker, CodePipeline, and Step Functions, can be used to build an enterprise-grade ML platform.

Enterprise ML architecture pattern overview

Building an enterprise ML platform on AWS starts with creating different environments to enable different data science and operations functions. The following diagram shows the core environments that normally make up an enterprise ML platform. From an isolation perspective, in the context of the AWS cloud, each environment in the following diagram is a separate AWS account:

Figure 9.1 – Enterprise ML architecture environments

As we discussed in Chapter 8, Building a Data Science Environment Using AWS ML Services, data scientists use the data science environment for experimentation, model building, and tuning. Once these experiments are completed, the data scientists commit their work to the proper code and data repositories. The next step is to train and tune the ML models in a controlled and automated environment using the algorithms, data, and training scripts that were created by the data scientists. This controlled and automated model training process will help ensure consistency, reproducibility, and traceability for model building at scale. The following are the core functionalities and technology options provided by the training, hosting, and shared services environments:

  • The model training environment manages the full life cycle of model training, from computing and storage infrastructure resource provisioning to training job monitoring and model persistence. From a technology option perspective, you can build out your training infrastructure using proprietary or open source technology, or you can choose a fully managed ML service, such as the SageMaker training service.
  • The model hosting environment is used for serving the trained models behind web service endpoints or in batch inference mode. Model hosting environments can have services such as the SageMaker hosting service, Kubernetes/Kubeflow container-based model serving, Lambda, or EC2-based model serving running different model inference engines. Other supporting services such as the online feature store and API management service can also run in the model hosting environment.
  • The shared services environment hosts common services tooling such as workflow orchestration tools, CI/CD tools, code repositories, Docker image repositories, and private library package tools. A central model registry can also run in the shared services environment for model registration and model life cycle management. Service provisioning capabilities, such as creating resources in different environments through Infrastructure as Code (IaC) or APIs, also run out of this environment. Any service ticketing tools, such as ServiceNow, and service provisioning tools, such as Service Catalog, can also be hosted in this environment.

In addition to the core ML environments, there are other dependent environments, such as security, governance, monitoring, and logging, that are required in the enterprise ML platform:

  • The security and governance environment centrally manages authentication services, user credentials, and data encryption keys. Security audit and reporting processes also run in this environment. Native AWS services, such as AWS IAM, AWS KMS, and AWS Config, can be used for various security and governance functions.
  • The monitoring and logging environment centrally aggregates monitoring and logging data from other environments for further processing and reporting. Custom dashboarding and alerting mechanisms are normally developed to provide easy access to key metrics and alerts from the underlying monitoring and logging data.

With that, you have had an overview of the core building blocks of an enterprise ML platform. Next, let's dive deep into several core areas. Note that there are different patterns and services that can be used to build an ML platform on AWS; in this chapter, we will cover one enterprise pattern.

Model training environment

Within an enterprise, a model training environment is a controlled environment with well-defined processes and policies governing how it is used and who can use it. Normally, it should be an automated environment that's managed by an MLOps team, though it can be self-service enabled for direct use by data scientists.

Automated model training and tuning are the core capabilities of the model training environment. To support a broad range of use cases, a model training environment needs to support different ML and deep learning frameworks, training patterns (such as single-node and distributed training), and hardware (different CPUs and GPUs).

The model training environment manages the life cycle of the model training process. This can include authentication and authorization, infrastructure provisioning, data movement, data preprocessing, ML library deployment, training loop management and monitoring, model persistence and registration, training job management, and lineage tracking. From a security perspective, the training environment needs to provide security capabilities for different isolation requirements, such as network isolation, job isolation, and artifact isolation. To assist with operational support, a model training environment also needs to support training status logging, metrics reporting, and training job monitoring and alerting.

Next, let's learn how the SageMaker training service can be used in a controlled model training environment in an enterprise setting.

Model training engine

The SageMaker training service provides built-in model training capabilities for a range of ML/DL libraries. In addition, you can bring your own Docker containers for customized model training needs. The following are a subset of the options supported by the SageMaker Python SDK:

  • Training TensorFlow models: SageMaker provides a built-in training container for TensorFlow models. The following code sample shows how to train a TensorFlow model using the built-in container through the TensorFlow estimator API:

    from sagemaker.tensorflow import TensorFlow

    tf_estimator = TensorFlow(
      entry_point="<Training script name>",
      role="<AWS IAM role>",
      instance_count=<Number of instances>,
      instance_type="<Instance type>",
      framework_version="<TensorFlow version>",
      py_version="<Python version>",
    )

    tf_estimator.fit("<Training data location>")

  • Training PyTorch models: SageMaker provides a built-in training container for PyTorch models. The following code sample shows how to train a PyTorch model using the PyTorch estimator:

    from sagemaker.pytorch import PyTorch

    pytorch_estimator = PyTorch(
      entry_point="<Training script name>",
      role="<AWS IAM role>",
      instance_count=<Number of instances>,
      instance_type="<Instance type>",
      framework_version="<PyTorch version>",
      py_version="<Python version>",
    )

    pytorch_estimator.fit("<Training data location>")

  • Training XGBoost models: XGBoost training is also supported via a built-in container. The following code shows the syntax for training an XGBoost model using the XGBoost estimator:

    from sagemaker.xgboost.estimator import XGBoost

    xgb_estimator = XGBoost(
      entry_point="<Training script name>",
      hyperparameters=<Dictionary of hyperparameters>,
      role="<AWS IAM role>",
      instance_count=<Number of instances>,
      instance_type="<Instance type>",
      framework_version="<XGBoost version>",
    )

    xgb_estimator.fit("<Training data location>")

  • Training scikit-learn models: The following code sample shows how to train a scikit-learn model using the built-in container:

    from sagemaker.sklearn.estimator import SKLearn

    sklearn_estimator = SKLearn(
      entry_point="<Training script name>",
      hyperparameters=<Dictionary of hyperparameters>,
      role="<AWS IAM role>",
      instance_count=<Number of instances>,
      instance_type="<Instance type>",
      framework_version="<scikit-learn version>",
    )

    sklearn_estimator.fit("<Training data location>")

  • Training models using custom containers: You can also build a custom training container and use the SageMaker training service for model training. See the following code for an example:

    from sagemaker.estimator import Estimator

    custom_estimator = Estimator(
      image_uri="<Custom training container image URI>",
      role="<AWS IAM role>",
      instance_count=<Number of instances>,
      instance_type="<Instance type>",
    )

    custom_estimator.fit("<Training data location>")

In addition to using the SageMaker Python SDK to kick off training, you can also use the boto3 library and SageMaker CLI commands to start training jobs.
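As a rough sketch of the CLI option (the job name, image URI, role ARN, S3 paths, and instance settings below are placeholders you would replace with your own values), starting a training job from the AWS CLI looks like this:

aws sagemaker create-training-job \
  --training-job-name <job name> \
  --algorithm-specification TrainingImage=<training image URI>,TrainingInputMode=File \
  --role-arn <AWS IAM role ARN> \
  --input-data-config '[{"ChannelName":"training","DataSource":{"S3DataSource":{"S3DataType":"S3Prefix","S3Uri":"s3://<bucket>/<training data prefix>","S3DataDistributionType":"FullyReplicated"}}}]' \
  --output-data-config S3OutputPath=s3://<bucket>/<output prefix> \
  --resource-config InstanceType=ml.m5.xlarge,InstanceCount=1,VolumeSizeInGB=50 \
  --stopping-condition MaxRuntimeInSeconds=3600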

Automation support

The SageMaker training service is exposed through a set of APIs and can be automated by integrating with external applications or workflow tools, such as Airflow and AWS Step Functions. For example, it can be one of the steps in an Airflow-based pipeline for an end-to-end ML workflow. Some workflow tools also provide SageMaker-specific connectors to interact with the SageMaker training service more seamlessly. The SageMaker training service also provides Kubernetes operators, so it can be integrated and automated as part of a Kubernetes application flow. The following sample code shows how to kick off a training job using the low-level API via the AWS boto3 SDK:

import boto3

client = boto3.client('sagemaker')

response = client.create_training_job(
    TrainingJobName='<job name>',
    HyperParameters={<dictionary of hyperparameter names and values>},
    AlgorithmSpecification={...},
    RoleArn='<AWS IAM Role>',
    InputDataConfig=[...],
    OutputDataConfig={...},
    ResourceConfig={...},
    ...
)

When using Airflow as the workflow tool, the following sample shows how to use the Airflow SageMaker operator as part of the workflow definition. Here, train_config contains the training configuration details, such as the training estimator, the training instance type and count, and the training data location:

import airflow
from airflow import DAG
from airflow.contrib.operators.sagemaker_training_operator import SageMakerTrainingOperator

default_args = {
    'owner': 'myflow',
    'start_date': '2021-01-01'
}

dag = DAG('tensorflow_training', default_args=default_args,
          schedule_interval='@once')

train_op = SageMakerTrainingOperator(
    task_id='tf_training',
    config=train_config,
    wait_for_completion=True,
    dag=dag)

SageMaker also has a built-in workflow automation tool called SageMaker Pipelines. A training step can be created using the SageMaker TrainingStep API and become part of the larger SageMaker Pipelines workflow.
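As a rough sketch of how this fits together, a training step can be wrapped into a pipeline definition along the following lines. This reuses the tf_estimator object defined earlier, and the pipeline name, role ARN, and data location are placeholders:

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.pipeline import Pipeline

# Wrap an existing estimator (such as tf_estimator from earlier) in a pipeline training step
step_train = TrainingStep(
    name="ModelTraining",
    estimator=tf_estimator,
    inputs={"training": TrainingInput(s3_data="s3://<bucket>/<training data prefix>")},
)

pipeline = Pipeline(name="<pipeline name>", steps=[step_train])
pipeline.upsert(role_arn="<AWS IAM role ARN>")  # create or update the pipeline definition
execution = pipeline.start()                    # kick off a pipeline execution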

Model training life cycle management

SageMaker training manages the life cycle of the model training process. It uses AWS IAM as the mechanism to authenticate and authorize access to its functions. Once authorized, it provisions the desired infrastructure, deploys the software stacks for the different model training requirements, moves the data from the sources to the training nodes, and kicks off the training job. Once the training job has been completed, the model artifacts are saved to an S3 output bucket and the infrastructure is torn down. For lineage tracking, model training metadata such as the source datasets, model training containers, hyperparameters, and model output locations are captured. Any logging from the training job runs is saved in CloudWatch Logs, and system metrics such as CPU and GPU utilization are captured as CloudWatch metrics.

Depending on the overall end-to-end ML platform architecture, a model training environment can also host services for data preprocessing, model validation, and model training postprocessing, as those are important steps in an end-to-end ML flow. There are multiple technology options available for this, such as the SageMaker Processing service and AWS Lambda.
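For example, a data preprocessing step could be run with the SageMaker Processing service along the following lines; this is a minimal sketch, and the script name and S3 paths are placeholders:

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

processor = SKLearnProcessor(
    framework_version="0.23-1",
    role="<AWS IAM role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

processor.run(
    code="preprocessing.py",  # hypothetical preprocessing script
    inputs=[ProcessingInput(source="s3://<bucket>/<raw data prefix>",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://<bucket>/<processed data prefix>")],
)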

Model hosting environment deep dive

An enterprise-grade model hosting environment needs to support a broad range of ML frameworks in a secure, performant, and scalable way. It should come with a list of pre-built inference engines that can serve common models out of the box behind a RESTful API or via the gRPC protocol. It also needs to provide flexibility to host custom-built inference engines for unique requirements. Users should also have access to different hardware devices, such as CPU, GPU, and purpose-built chips, for the different inference needs.

Some model inference patterns demand more complex inference graphs, such as traffic split, request transformations, or model ensemble support. A model hosting environment can provide this capability as an out-of-the-box feature or provide technology options for building custom inference graphs. Other common model hosting capabilities include concept drift detection and model performance drift detection. Concept drift occurs when the statistical characteristics of the production data deviate from the data that's used for model training. An example of concept drift is the mean and standard deviation of a feature changing significantly in production from that of the training dataset.

Components in a model hosting environment can participate in an automation workflow through its API, scripting, or IaC deployment (such as AWS CloudFormation). For example, a RESTful endpoint can be deployed using a CloudFormation template or by invoking its API as part of an automated workflow.

From a security perspective, the model hosting environment needs to provide authentication and authorization control to manage access to both the control plane (management functions) and data plane (model endpoints). The accesses and operations that are performed against the hosting environments should be logged for auditing purposes. For operations support, a hosting environment needs to enable status logging and system monitoring to support system observability and problem troubleshooting.

The SageMaker hosting service is a fully managed model hosting service. Similar to KFServing and Seldon Core, which we reviewed earlier in this book, the SageMaker hosting service is also a multi-framework model serving service. Next, let's take a closer look at its various capabilities for enterprise-grade model hosting.

Inference engine

SageMaker provides built-in inference engines for multiple ML frameworks, including scikit-learn, XGBoost, TensorFlow, PyTorch, and Spark ML. SageMaker supplies these built-in inference engines as Docker containers. To stand up an API endpoint to serve a model, you just need to provide the model artifacts and infrastructure configuration. The following is a list of model serving options:

  • Serving TensorFlow models: SageMaker uses TensorFlow Serving as the inference engine for TensorFlow models. The following code sample shows how to deploy a TensorFlow Serving model using the SageMaker hosting service:

    from sagemaker.tensorflow.serving import Model

    tensorflow_model = Model(
        model_data=<S3 location of the TensorFlow model artifacts>,
        role=<AWS IAM role>,
        framework_version=<TensorFlow version>
    )

    tensorflow_model.deploy(
        initial_instance_count=<instance count>, instance_type=<instance type>
    )

  • Serving PyTorch models: SageMaker hosting uses TorchServe under the hood to serve PyTorch models. The following code sample shows how to deploy a PyTorch model:

    from sagemaker.pytorch.model import PyTorchModel

    pytorch_model = PyTorchModel(
        model_data=<S3 location of the PyTorch model artifacts>,
        role=<AWS IAM role>,
        framework_version=<PyTorch version>
    )

    pytorch_model.deploy(
        initial_instance_count=<instance count>, instance_type=<instance type>
    )

  • Serving Spark ML models: For Spark ML-based models, SageMaker uses MLeap as the backend to serve Spark ML models. These Spark ML models need to be serialized into MLeap format. The following code sample shows how to deploy a Spark ML model using the SageMaker hosting service:

    import sagemaker
    from sagemaker.sparkml.model import SparkMLModel

    sparkml_model = SparkMLModel(
        model_data=<S3 location of the Spark ML model artifacts>,
        role=<AWS IAM role>,
        sagemaker_session=sagemaker.Session(),
        name=<Model name>,
        env={"SAGEMAKER_SPARKML_SCHEMA": <schema_json>}
    )

    sparkml_model.deploy(
        initial_instance_count=<instance count>, instance_type=<instance type>
    )

  • Serving XGBoost models: SageMaker provides an XGBoost model server for serving trained XGBoost models. Under the hood, it uses Nginx, Gunicorn, and Flask as part of the model serving architecture. The entry Python script loads the trained XGBoost model and can optionally perform pre- and post-processing of the data:

    from sagemaker.xgboost.model import XGBoostModel

    xgboost_model = XGBoostModel(
        model_data=<S3 location of the XGBoost model artifacts>,
        role=<AWS IAM role>,
        entry_point=<entry Python script>,
        framework_version=<XGBoost version>
    )

    xgboost_model.deploy(
        instance_type=<instance type>,
        initial_instance_count=<instance count>
    )

  • Serving scikit-learn models: SageMaker provides a built-in serving container for serving scikit-learn-based models. The technology stack is similar to the one for the XGBoost model server:

    from sagemaker.sklearn.model import SKLearnModel

    sklearn_model = SKLearnModel(
        model_data=<S3 location of the scikit-learn model artifacts>,
        role=<AWS IAM role>,
        entry_point=<entry Python script>,
        framework_version=<scikit-learn version>
    )

    sklearn_model.deploy(instance_type=<instance type>, initial_instance_count=<instance count>)

  • Serving models with custom containers: For custom-built inference containers, you can follow similar syntax to deploy the model. The main difference is that the URI of the custom inference container image needs to be provided. You can find detailed documentation on building a custom inference container at https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-inference-container.html:

    from sagemaker.model import Model

    custom_model = Model(
        image_uri=<custom model inference container image URI>,
        model_data=<S3 location of the ML model artifacts>,
        role=<AWS IAM role>
    )

    custom_model.deploy(instance_type=<instance type>, initial_instance_count=<instance count>)

SageMaker hosting provides an inference pipeline feature that allows you to create a linear sequence of containers (up to 15) to perform custom data processing before and after invoking a model for predictions. SageMaker hosting can support traffic splits between multiple versions of a model for A/B testing.
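For illustration, the following minimal sketch chains the sparkml_model and xgboost_model objects defined previously into a single inference pipeline endpoint; the pipeline model name and endpoint name are placeholders:

from sagemaker.pipeline import PipelineModel

# Containers are invoked in order: Spark ML preprocessing first, then the XGBoost model
inference_pipeline = PipelineModel(
    name="<pipeline model name>",
    role="<AWS IAM role>",
    models=[sparkml_model, xgboost_model],
)

inference_pipeline.deploy(
    initial_instance_count=<instance count>,
    instance_type=<instance type>,
    endpoint_name="<endpoint name>",
)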

SageMaker hosting can be provisioned using an AWS CloudFormation template. There is also support for the AWS CLI for scripting automation, and it can be integrated into custom applications via its API. The following are some code samples for different endpoint deployment automation methods:

  • The following is a CloudFormation code sample for SageMaker endpoint deployment. You can find the complete code at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/blob/main/Chapter09/sagemaker_hosting.yaml:

    Description: "Model hosting cloudformation template"

    Resources:

      Endpoint:

        Type: "AWS::SageMaker::Endpoint"

        Properties:

          EndpointConfigName:

            !GetAtt EndpointConfig.EndpointConfigName

      EndpointConfig:

        Type: "AWS::SageMaker::EndpointConfig"

        Properties:

          ProductionVariants:

            - InitialInstanceCount: 1

              InitialVariantWeight: 1.0

              InstanceType: ml.t2.large

              ModelName: !GetAtt Model.ModelName

              VariantName: !GetAtt Model.ModelName

      Model:

        Type: "AWS::SageMaker::Model"

        Properties:

          PrimaryContainer:

            Image: <container uri>

          ExecutionRoleArn: !GetAtt ExecutionRole.Arn

    ...  

  • The following is an AWS CLI sample for SageMaker endpoint deployment:

    aws sagemaker create-model --model-name <value> --execution-role-arn <value>

    aws sagemaker create-endpoint-config --endpoint-config-name <value> --production-variants <value>

    aws sagemaker create-endpoint --endpoint-name <value> --endpoint-config-name <value>

If the built-in inference engines do not meet your requirements, you can also bring your own Docker container to serve your ML models.

Authentication and security control

The SageMaker hosting service uses AWS IAM as the mechanism to control access to its control plane APIs (for example, the API for creating an endpoint) and data plane APIs (for example, the API for invoking a hosted model endpoint). If you need to support other authentication methods for the data plane API, such as OpenID Connect (OIDC), you can put a proxy service in front of the endpoint to manage user authentication. A common pattern is to use Amazon API Gateway in front of the SageMaker endpoint for custom authentication management, as well as for other API management features such as metering and throttling.
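For instance, a client that has been granted the appropriate IAM permissions can invoke the data plane API directly with the SageMaker runtime client; this is a minimal sketch with a placeholder endpoint name and payload:

import boto3

runtime = boto3.client("sagemaker-runtime")

# The caller's IAM identity must be allowed to perform sagemaker:InvokeEndpoint
response = runtime.invoke_endpoint(
    EndpointName="<endpoint name>",
    ContentType="text/csv",
    Body="<serialized request payload>",
)
prediction = response["Body"].read()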

Monitoring and logging

SageMaker provides out-of-the-box monitoring and logging capabilities to assist with support operations. It monitors both system resource metrics (for example, CPU/GPU utilization) and model invocation metrics (for example, the number of invocations, model latencies, and failures). These monitoring metrics and any model processing logs are captured by AWS CloudWatch metrics and CloudWatch Logs.

Adopting MLOps for ML workflows

Similar to the DevOps practice, which has been widely adopted for the traditional software development and deployment process, the MLOps practice is intended to streamline the building and deployment of ML pipelines and improve collaboration between data scientists/ML engineers, data engineers, and the operations team. Specifically, an MLOps practice is intended to deliver the following main benefits in an end-to-end ML life cycle:

  • Process consistency: The MLOps practice aims to create consistency in the ML model building and deployment process. A consistent process improves the efficiency of the ML workflow and ensures a high degree of certainty in the input and output of the ML workflow.
  • Tooling and process reusability: One of the core objectives of the MLOps practice is to create reusable technology tooling and templates for faster adoption and deployment of new ML use cases. These can include common tools such as code and library repositories, package and image building tools, pipeline orchestration tools, the model registry, as well as common infrastructure for model training and model deployment. From a reusable template perspective, these can include common reusable scripts for Docker image builds, workflow orchestration definitions, and CloudFormation scripts for model building and model deployment.
  • Model building reproducibility: ML is highly iterative and can involve a large number of experimentations and model training runs using different datasets, algorithms, and hyperparameters. An MLOps process needs to capture all the data inputs, source code, and artifacts that are used to build an ML model and establish model lineage from this input data, code, and artifacts for the final models. This is important for both experiment tracking as well as governance and control purposes.
  • Delivery scalability: An MLOps process and the associated tooling enable a large number of ML pipelines to run in parallel for high delivery throughput. Different ML project teams can use the standard MLOps processes and common tools independently without creating conflicts from a resource contention, environment isolation, or governance perspective.
  • Process and operations auditability: MLOps enables greater auditability of the process and of ML pipelines. This includes capturing the details of pipeline executions, dependencies and lineage across different steps, job execution statuses, model training and deployment details, approval tracking, and actions that are performed by human operators.

Now that we are familiar with the intended goals and benefits of the MLOps practice, let's look at the specific operational process and concrete technology architecture of MLOps on AWS.

Components of the MLOps architecture

One of the most important MLOps concepts is the automation pipeline, which executes a sequence of tasks, such as data processing, model training, and model deployment. This pipeline can be a linear sequence of steps or a more complex DAG with parallel execution for multiple tasks. An MLOps architecture also has several repositories for storing different assets and metadata as part of pipeline executions. The following diagram shows the core components and tasks involved in an MLOps operation:

Figure 9.2 – MLOps components

A code repository in an MLOps architecture not only serves as a source code control mechanism for data scientists and engineers – it is also the triggering mechanism to kick off different pipeline executions. For example, when a data scientist checks an updated training script into the code repository, a model training pipeline execution can be triggered.

A feature repository stores reusable ML features and can be the target of a data processing/feature engineering job. The features in the feature repository can become part of training datasets where applicable, and they can also be fetched at inference time as part of a model inference request.

A container repository stores the container images that are used for data processing tasks, model training jobs, and model inference engines. It is usually the target of the container building pipeline.

A model registry keeps an inventory of trained models, along with all the metadata associated with the model, such as its algorithm, hyperparameters, model metrics, and training dataset location. It also maintains the status of the model life cycle, such as its deployment approval status.

A pipeline repository maintains the definition of automation pipelines and the statuses of different pipeline job executions.

In an enterprise setting, a task ticket also needs to be created when different tasks, such as model deployment, are performed, so that these actions can be tracked in a common enterprise ticketing management system. To support audit requirements, the lineage of different pipeline tasks and their associated artifacts need to be tracked.

Another critical component of the MLOps architecture is monitoring. In general, you want to monitor items such as the pipeline's execution status, model training status, and model endpoint status. Model endpoint monitoring can also include system/resource performance monitoring, model statistical metrics monitoring, drift and outlier monitoring, and model explainability monitoring. Alerts can be triggered on certain execution statuses to invoke human or automated actions as needed.

AWS provides multiple technology options for implementing an MLOps architecture. The following diagram shows where these technology services fit in an enterprise MLOps architecture:

Figure 9.3 – MLOps architecture using AWS services

As we mentioned earlier, the shared service environment hosts common tools for pipeline management and execution, as well as common repositories such as code repositories and model registries.

Here, we use AWS CodePipeline to orchestrate the overall CI/CD pipeline. AWS CodePipeline is a continuous delivery service that integrates natively with different code repositories, such as AWS CodeCommit and Bitbucket. It can source files from the code repository and make them available to downstream tasks, such as building containers using the AWS CodeBuild service or training models in the model training environment. A pipeline can be triggered on demand via an API or the CodePipeline management console, or it can be triggered by code changes in a code repository. Depending on your requirements, you can create different pipelines. In the preceding diagram, we can see four example pipelines:

  • A container build pipeline for building different container images.
  • A model training pipeline for training a model for release.
  • A model deployment pipeline for deploying trained models to production.
  • A development, training, and testing pipeline for model training and deployment testing in a data science environment.

A code repository is one of the most essential components in an MLOps environment. It is not only used by data scientists/ML engineers and other engineers to persist code artifacts, but it also serves as a triggering mechanism for CI/CD pipelines. This means that when a data scientist/ML engineer commits a code change, it can automatically kick off a CI/CD pipeline. For example, if a data scientist makes a change to the model training script and wants to test the automated training pipeline in the development environment, they can commit the code to a development branch to kick off a model training pipeline in the dev environment. When it is ready for production release, the data scientist can commit/merge the code to a release branch to kick off the production release pipelines.

In this MLOps architecture, we use AWS Elastic Container Registry (ECR) as the central container registry service. ECR is used to store containers for data processing, model training, and model inference. You can tag the container images to indicate different life cycle statuses, such as development or production.

The SageMaker model registry is used as the central model repository. The central model repository can reside in the shared service environment, so it can be accessed by different projects. All the models that go through the formal training and deployment cycles should be managed and tracked in the central model repository.
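As an illustration of how a trained model might be registered, the following is a minimal boto3 sketch that creates a model package group and registers a model version with a pending approval status; the group name, container image, and artifact location are placeholders:

import boto3

sm = boto3.client("sagemaker")

# Create a group that holds all versions of a given model
sm.create_model_package_group(
    ModelPackageGroupName="<model package group name>",
    ModelPackageGroupDescription="Models for the <project name> project",
)

# Register a model version produced by a training pipeline run
sm.create_model_package(
    ModelPackageGroupName="<model package group name>",
    ModelPackageDescription="Candidate model from a training pipeline run",
    ModelApprovalStatus="PendingManualApproval",
    InferenceSpecification={
        "Containers": [{"Image": "<inference container image URI>",
                        "ModelDataUrl": "s3://<bucket>/<model artifact prefix>/model.tar.gz"}],
        "SupportedContentTypes": ["text/csv"],
        "SupportedResponseMIMETypes": ["text/csv"],
    },
)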

SageMaker Feature Store provides a common feature repository for reusable features to be used by different projects. It can reside in the shared services environment or be part of the data platform. Features are normally pre-calculated in a data management environment and sent to SageMaker Feature Store for offline model training in the model training environment, as well as online inferences by the different model hosting environments.
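A minimal sketch of creating a feature group and ingesting pre-calculated features with the SageMaker Python SDK might look like the following; the feature group name, the features_df pandas DataFrame, and the column names are placeholders:

import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()

feature_group = FeatureGroup(name="<feature group name>", sagemaker_session=session)

# Infer feature definitions from a pandas DataFrame of pre-calculated features
feature_group.load_feature_definitions(data_frame=features_df)

feature_group.create(
    s3_uri="s3://<bucket>/<offline store prefix>",  # offline store location for training
    record_identifier_name="<record id column>",
    event_time_feature_name="<event time column>",
    role_arn="<AWS IAM role ARN>",
    enable_online_store=True,                       # also serve features for online inference
)

feature_group.ingest(data_frame=features_df, max_workers=3, wait=True)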

SageMaker Experiments is used to track experiments and trials. The metadata and artifacts that are generated by the different components in a pipeline execution can be tracked in SageMaker Experiments. For example, the processing step in a pipeline can contain metadata such as the locations of the input data and processed data, while the model training step can contain metadata such as the algorithm and hyperparameters for training, model metrics, and the location of the model artifacts. This metadata can be used to compare different model training runs, and it can also be used to establish model lineage.
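As a sketch of how a training run might be associated with an experiment, assuming the smexperiments package that accompanies this generation of SageMaker Experiments, the flow looks roughly like this (the experiment and trial names are placeholders, and tf_estimator is the estimator defined earlier):

from smexperiments.experiment import Experiment
from smexperiments.trial import Trial

experiment = Experiment.create(
    experiment_name="<experiment name>",
    description="Model training experiments for this project",
)
trial = Trial.create(trial_name="<trial name>", experiment_name=experiment.experiment_name)

# Associate a training job with the experiment/trial via experiment_config
tf_estimator.fit(
    "<Training data location>",
    experiment_config={
        "ExperimentName": experiment.experiment_name,
        "TrialName": trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
)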

Monitoring and logging

The ML platform presents some unique challenges in terms of monitoring. In addition to monitoring common software system-related metrics and statuses, such as infrastructure utilization and processing status, an ML platform also needs to monitor model- and data-specific metrics and performance. Also, unlike traditional system-level monitoring, which is fairly straightforward to interpret, the opaqueness of ML models makes model behavior inherently harder to understand and monitor. Now, let's take a closer look at the three main areas of monitoring for an ML platform.

Model training monitoring

Model training monitoring provides visibility into the training progress and helps identify training bottlenecks and error conditions during the training process. It enables operational processes such as training job progress reporting, model training performance evaluation, training problem troubleshooting, data and model bias detection, and model interpretability reporting. Specifically, we want to monitor the following key metrics and conditions during model training:

  • General system and resource utilization and error metrics: These provide visibility into how the infrastructure resources (such as CPU, GPU, disk I/O, and memory) are utilized for model training. These can help with making decisions on provisioning infrastructure for the different model training needs.
  • Training job events and status: This provides visibility into the progress of a training job, such as job starting, running, completion, and failure details.
  • Model training metrics: These are model training metrics such as loss curve and accuracy reports to help you understand the model's performance.
  • Bias detection metrics and model explainability reporting: These metrics help you understand if there is any bias in the training datasets or machine learning models. Model explainability can also be monitored and reported to help you understand high-importance features versus low-importance features.
  • Model training bottlenecks and training issues: These provide visibility into training issues such as vanishing gradients, poor weights initialization, and overfitting to help determine the required data, algorithmic, and training configuration changes. Metrics such as CPU and I/O bottlenecks, uneven load balancing, and low GPU utilization can help determine infrastructure configuration changes for more efficient model training.

There are multiple native AWS services for building out a model training monitoring architecture on AWS. The following diagram shows an example architecture for building a monitoring solution for a SageMaker-based model training environment:

Figure 9.4 – Model training monitoring architecture

This architecture lets you monitor training and system metrics and perform log capture and processing, training event capture and processing, and model training bias and explainability reporting. It helps enable operational processes such as training progress and status reporting, model metric evaluation, system resource utilization reporting and response, training problem troubleshooting, bias detection, and model decision explainability.

During model training, SageMaker can emit model training metrics, such as training loss and accuracy, to AWS CloudWatch to help with model training evaluation. AWS CloudWatch is the AWS monitoring and observability service. It collects metrics and logs from other AWS services and provides dashboards for visualizing and analyzing these metrics and logs. System utilization metrics (such as CPU/GPU/memory utilization) are also reported to CloudWatch for analysis to help you understand any infrastructure constraints or under-utilization. CloudWatch alarms can be created for a single metric or composite metrics to automate notifications or responses. For example, you can create alarms on low CPU/GPU utilization to help proactively identify sub-optimal hardware configurations for a training job. When an alarm is triggered, it can send automated notifications (such as SMS and email) to support staff for review via Amazon Simple Notification Service (SNS).
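To illustrate, the following is a minimal boto3 sketch of an alarm on low GPU utilization for a SageMaker training job; the job name, threshold, and SNS topic ARN are placeholder assumptions:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="low-gpu-utilization-<training job name>",
    Namespace="/aws/sagemaker/TrainingJobs",
    MetricName="GPUUtilization",
    Dimensions=[{"Name": "Host", "Value": "<training job name>/algo-1"}],
    Statistic="Average",
    Period=300,                      # evaluate 5-minute averages
    EvaluationPeriods=3,
    Threshold=30.0,                  # alert if average GPU utilization stays below 30%
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:<region>:<account>:<topic name>"],
)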

You can use CloudWatch Logs to collect, monitor, and analyze the logs that are emitted by your training jobs. You can use these captured logs to understand the progress of your training jobs and identify errors and patterns to help troubleshoot any model training problems. For example, the logs might contain errors such as insufficient GPU memory for model training or permission issues when accessing specific resources. CloudWatch Logs provides a UI tool called CloudWatch Logs Insights for interactively analyzing logs using a purpose-built query language. Alternatively, these logs can be forwarded to an Elasticsearch cluster for analysis and querying. The logs can also be aggregated in a designated logging and monitoring account to centrally manage log access and analysis.

SageMaker training jobs can also emit events, such as a training job's status changing from running to completed. You can create automated notification and response mechanisms based on these different events. For example, you can send notifications to data scientists when a training job completes successfully or fails, along with a failure reason. You can also automate responses to the different statuses, such as triggering model retraining on a particular failure condition.
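A sketch of such an event-driven notification, using an EventBridge rule that matches failed SageMaker training jobs and forwards them to an SNS topic (the rule name and topic ARN are placeholders), could look like this:

import json
import boto3

events = boto3.client("events")

# Match SageMaker training jobs that transition to the Failed status
events.put_rule(
    Name="sagemaker-training-job-failed",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Training Job State Change"],
        "detail": {"TrainingJobStatus": ["Failed"]},
    }),
)

# Forward matching events to an SNS topic for notification
events.put_targets(
    Rule="sagemaker-training-job-failed",
    Targets=[{"Id": "notify-data-scientists",
              "Arn": "arn:aws:sns:<region>:<account>:<topic name>"}],
)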

The SageMaker Clarify component can detect data and model bias and provide model explainability reporting on the trained model. You can access the bias and model explainability reports through the SageMaker Studio UI or the SageMaker APIs.

The SageMaker Debugger component can detect model training issues such as non-converging conditions, resource utilization bottlenecks, overfitting, and vanishing gradients (where the gradients become too small for efficient parameter updates). Alerts can be sent when training anomalies are found.

Model endpoint monitoring

Model endpoint monitoring provides visibility into the performance of the model serving infrastructure, as well as model-specific metrics such as data drift, model drift, and inference explainability. The following are some of the key metrics for model endpoint monitoring:

  • General system and resource utilization and error metrics: These provide visibility into how the infrastructure resources (such as CPU, GPU, and memory) are utilized for model serving. These can help with making decisions on provisioning infrastructure for the different model serving needs.
  • Data statistics monitoring metrics: The statistical nature of data can change over time, which can result in degraded ML model performance compared to the original benchmarks. These metrics can include deviations in basic statistics, such as changes in the mean and standard deviation, as well as changes in the data distribution.
  • Model quality monitoring metrics: These metrics provide visibility into model performance deviation from the original benchmark. They can include regression metrics (such as MAE and RMSE) and classification metrics (such as the confusion matrix, F1 score, precision, recall, and accuracy).
  • Model inference explainability: This provides model explainability on a per prediction basis to help you understand what features had the most influence on the decision that was made by the prediction.
  • Model bias monitoring metrics: Similar to bias detection for training, the bias metrics help us understand model bias at inference time.

The model monitoring architecture relies on many of the same AWS services, including CloudWatch, EventBridge, and SNS. The following diagram shows an architecture pattern for a SageMaker-based model monitoring solution:

Figure 9.5 – Model endpoint monitoring architecture

This architecture works similarly to the model training monitoring architecture. CloudWatch captures endpoint metrics such as CPU/GPU utilization, model invocation metrics (the number of invocations and errors), and model latencies. These metrics help with operations such as hardware optimization and endpoint scaling.

CloudWatch Logs captures logs that are emitted by the model serving endpoint to help us understand the status and troubleshoot technical problems.

Similarly, endpoint events, such as the status changing from Creating to InService, can help you build automated notification pipelines to kick off corrective actions or provide status updates.

In addition to system and status-related monitoring, this architecture also supports data and model-specific monitoring through a combination of SageMaker Model Monitor and SageMaker Clarify. Specifically, SageMaker Model Monitor can help you monitor data drift and model quality.

For data drift, SageMaker Model Monitor can use the training dataset to create baseline statistics, such as the standard deviation, mean, max, min, and data distribution of the dataset features. It uses these metrics and other data characteristics, such as data types and completeness, to establish constraints. Then, it captures the input data in the production environment, calculates the same metrics, compares them with the baseline metrics/constraints, and reports baseline drift. Model Monitor can also report data quality issues such as incorrect data types and missing values. Data drift metrics can be sent to CloudWatch metrics for visualization and analysis, and CloudWatch alarms can be configured to trigger a notification or automated response when a metric crosses a predefined threshold.
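A minimal sketch of setting up such data drift monitoring with the SageMaker Python SDK is shown below, assuming data capture has already been enabled on the endpoint; the dataset locations, endpoint name, and schedule name are placeholders:

from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="<AWS IAM role>",
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Compute baseline statistics and constraints from the training dataset
monitor.suggest_baseline(
    baseline_dataset="s3://<bucket>/<training data prefix>/train.csv",
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://<bucket>/<baseline output prefix>",
)

# Compare captured production traffic against the baseline on an hourly schedule
monitor.create_monitoring_schedule(
    monitor_schedule_name="<schedule name>",
    endpoint_input="<endpoint name>",
    output_s3_uri="s3://<bucket>/<monitoring report prefix>",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)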

For model quality monitoring, Model Monitor creates baseline metrics (such as MAE for regression and accuracy for classification) using a baseline dataset that contains both predictions and true labels. Then, it captures the predictions in production, ingests ground truth labels, and merges the ground truth with the predictions to calculate various regression and classification metrics before comparing those with the baseline metrics. As with data drift metrics, model quality metrics can be sent to CloudWatch metrics for analysis and visualization, and CloudWatch alarms can be configured for automated notifications and/or responses. The following diagram shows how SageMaker Model Monitor works:

Figure 9.6 – SageMaker Model Monitor process flow

For bias detection, SageMaker Clarify can monitor the bias metrics of deployed models continuously and raise alerts through CloudWatch when a metric crosses a threshold. We will cover bias detection in detail in Chapter 11, ML Governance, Bias, Explainability, and Privacy.

ML pipeline monitoring

The ML pipeline's execution needs to be monitored for statuses and errors, so corrective actions can be taken as needed. During a pipeline execution, there are pipeline-level statuses/events as well as stage-level and action-level statuses/events. You can use these events and statuses to understand the progress of each pipeline and stage and get alerted when something is wrong. The following diagram shows how AWS CodePipeline, CodeBuild, and CodeCommit can work with CloudWatch, CloudWatch Logs, and EventBridge for general status monitoring and reporting, as well as problem troubleshooting:

Figure 9.7 – ML CI/CD pipeline monitoring architecture

CodeBuild can send metrics, such as SucceededBuilds, FailedBuilds, and Duration, to CloudWatch. These CodeBuild metrics can be accessed through both the CodeBuild console and the CloudWatch dashboard.

CodeBuild, CodeCommit, and CodePipeline can all emit events to EventBridge to report detailed status changes and trigger custom event processing, such as notifications, or log the events to another data repository for event archiving. All three services can send detailed logs to CloudWatch Logs to support operations such as troubleshooting or detailed error reporting.

Step Functions also sends a list of monitoring metrics to CloudWatch, such as execution metrics (for example, execution failures, successes, aborts, and timeouts) and activity metrics (for example, activities started, scheduled, and succeeded). You can view these metrics in the management console and set thresholds on them to configure alerts.
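For example, an alarm on failed executions of a specific state machine might be sketched as follows; the state machine ARN and notification topic ARN are placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ml-training-workflow-failures",
    Namespace="AWS/States",
    MetricName="ExecutionsFailed",
    Dimensions=[{"Name": "StateMachineArn",
                 "Value": "arn:aws:states:<region>:<account>:stateMachine:<state machine name>"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1.0,                   # alert on any failed execution in the period
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:<region>:<account>:<topic name>"],
)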

Service provisioning management

Another key component of enterprise-scale ML platform management is service provisioning management. For large-scale service provisioning and deployment, an automated and controlled process should be adopted. Here, we will focus on provisioning the ML platform itself, not provisioning AWS accounts and networking, which should be established in advance as the base environment for ML platform provisioning. For ML platform provisioning, there are two main provisioning tasks:

  • Data science environment provisioning: Provisioning the data science environment for data scientists mainly includes provisioning data science and data management tools, storage for experimentation, as well as access entitlement for data sources and pre-built ML automation pipelines.
  • ML automation pipeline provisioning: ML automation pipelines need to be provisioned in advance for data scientists and MLOps engineers to use them to automate different tasks such as container build, model training, and model deployment.

There are multiple technical approaches to automating service provisioning on AWS, such as using provisioning shell scripts, CloudFormation scripts, and AWS Service Catalog. With shell scripts, you can sequentially call the different AWS CLI commands in a script to provision different components, such as creating a SageMaker notebook. CloudFormation is the IaC service for infrastructure deployment on AWS. With CloudFormation, you create templates that describe the desired resources and dependencies that can be launched as a single stack. When the template is executed, all the resources and dependencies specified in the stack will be deployed automatically. The following code shows the template for deploying a SageMaker Studio domain:

Type: AWS::SageMaker::Domain
Properties:
  AppNetworkAccessType: String
  AuthMode: String
  DefaultUserSettings:
    UserSettings
  DomainName: String
  KmsKeyId: String
  SubnetIds:
    - String
  Tags:
    - Tag
  VpcId: String

AWS Service Catalog allows you to create different IT products to be deployed on AWS. These IT products can include SageMaker notebooks, CodeCommit repositories, and CodePipeline workflow definitions. AWS Service Catalog uses CloudFormation templates to describe IT products. With Service Catalog, administrators create IT products with CloudFormation templates, organize these products by product portfolio, and entitle end users with access. The end users then access the products from the Service Catalog product portfolio. The following diagram shows the flow of creating a Service Catalog product and launching the product from the Service Catalog service:

Figure 9.8 – Service Catalog workflow

For large-scale and governed IT product management, Service Catalog is the recommended approach. Service Catalog supports multiple deployment options, including single AWS account deployments and hub-and-spoke cross-account deployments. A hub-and-spoke deployment allows you to centrally manage all the products and make them available in different accounts. In our enterprise ML reference architecture, we use the hub-and-spoke architecture to support the provisioning of data science environments and ML pipelines, as shown in the following diagram:

Figure 9.9 – The hub-and-spoke Service Catalog architecture for enterprise ML product management

In the preceding architecture, we set up the central portfolio in the shared services account. All the products, such as creating new Studio domains, new Studio user profiles, CodePipeline definitions, and training pipeline definitions, are centrally managed in the central hub account. Some products are shared with the different data science accounts to create data science environments for data scientists and teams. Some other products are shared with model training accounts for standing up ML training pipelines.
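As a rough sketch of how such products might be described, a Service Catalog portfolio and a product backed by a CloudFormation template can themselves be defined in CloudFormation; the portfolio and product names, provider name, and template URL below are placeholder assumptions:

Resources:
  MLPlatformPortfolio:
    Type: AWS::ServiceCatalog::Portfolio
    Properties:
      DisplayName: "<portfolio name>"
      ProviderName: "<platform team name>"
  DataScienceEnvironmentProduct:
    Type: AWS::ServiceCatalog::CloudFormationProduct
    Properties:
      Name: "<product name>"
      Owner: "<platform team name>"
      ProvisioningArtifactParameters:
        - Info:
            LoadTemplateFromURL: "https://<bucket>.s3.amazonaws.com/<template>.yaml"
  PortfolioProductAssociation:
    Type: AWS::ServiceCatalog::PortfolioProductAssociation
    Properties:
      PortfolioId: !Ref MLPlatformPortfolio
      ProductId: !Ref DataScienceEnvironmentProduct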

With that, we have talked about the core components of an enterprise-grade ML platform. Next, let's get hands-on and build a pipeline to automate model training and deployment.

Hands-on exercise – building an MLOps pipeline on AWS

In this hands-on exercise, you will build a simplified version of the enterprise MLOps pipeline. For simplicity, we will not be using the multi-account architecture of the enterprise pattern. Instead, we will build several core functions in a single AWS account. The following diagram shows what you will be building:

Figure 9.10 – Architecture of the hands-on exercise

At a high level, you will create two pipelines using CloudFormation: one for model training and one for model deployment.

Creating a CloudFormation template for the ML training pipeline

In this section, we will create two CloudFormation templates that do the following:

  • The first template creates AWS Step Functions for an ML model training workflow that performs data processing, model training, and model registration. This will be a component of the training pipeline.
  • The second template creates a CodePipeline ML model training pipeline definition with two stages:
    1. A source stage, which listens to changes in a CodeCommit repository to kick off the execution of the Step Functions workflow that we created
    2. A deployment stage, which kicks off the execution of the ML model training workflow

Now, let's get started with the CloudFormation template for the Step Functions workflow:

  1. Create a Step Functions workflow execution role called AmazonSageMaker-StepFunctionsWorkflowExecutionRole. Then, create and attach the following IAM policy to it. This role will be used by the Step Functions workflow to provide permission to invoke the various SageMaker APIs. Take note of the ARN of the newly created IAM role as you will need it for the next step. You can find the complete code sample at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/blob/main/Chapter09/AmazonSageMaker-StepFunctionsWorkflowExecutionRole-policy.json:

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateModel",
                    "sagemaker:DeleteEndpointConfig",
                    "sagemaker:DescribeTrainingJob",
                    "sagemaker:CreateEndpoint",
                    "sagemaker:StopTrainingJob",
                    "sagemaker:CreateTrainingJob",
                    "sagemaker:UpdateEndpoint",
                    "sagemaker:CreateEndpointConfig",
                    "sagemaker:DeleteEndpoint"
                ],
                "Resource": [
                    "arn:aws:sagemaker:*:*:*"
                ]
            },
    ...
    }

  2. Copy and save the following code block to a file locally and name it training_workflow.yaml. You can find the complete file at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/blob/main/Chapter09/training_workflow.yaml. This CloudFormation template will create a Step Functions state machine with a training step and model registration step. The training step will train the same BERT model we trained in Chapter 8, Building a Data Science Environment Using AWS ML Services. For simplicity, we will reuse the same source data and training script as well to demonstrate the MLOps concepts we have learned about in this chapter. Note that we are using CloudFormation here to demonstrate managing IaC. Data scientists also have the option to use the Step Functions Data Science SDK to create the pipeline using a Python script:

    AWSTemplateFormatVersion: 2010-09-09
    Description: 'AWS Step Functions sample project for training a model and save the model'
    Parameters:
        StepFunctionExecutionRoleArn:
            Type: String
            Description: Enter the role for Step Function Workflow execution
            ConstraintDescription: requires a valid arn value
            AllowedPattern: 'arn:aws:iam::\w+:role/.*'
    Resources:
      TrainingStateMachine2:
        Type: AWS::StepFunctions::StateMachine
        Properties:
            RoleArn: !Ref StepFunctionExecutionRoleArn
            DefinitionString: !Sub |
                   {
                      "StartAt": "SageMaker Training Step",
                      "States": {
                        "SageMaker Training Step": {
                          "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
    ...

  3. Launch the newly created CloudFormation template in the CloudFormation console. Make sure that you provide a value for the StepFunctionExecutionRoleArn parameter when prompted; this is the ARN of the role you created in Step 1. Once the CloudFormation execution is complete, go to the Step Functions console to test it.
  4. Test the workflow in the Step Functions console to make sure it works. Navigate to the newly created Step Functions state machine and click Start execution (or start the execution programmatically, as shown in the boto3 sketch after this list). When prompted for input, copy and paste the following JSON as the input for the execution; these are the input values that will be used by the Step Functions workflow. Make sure that you replace the placeholder values with the values for your environment. For the AWS account that hosts the training images, you can look up the account number at https://github.com/aws/deep-learning-containers/blob/master/available_images.md:

    {
      "TrainingImage": "<aws hosting account>.dkr.ecr.<aws region>.amazonaws.com/pytorch-training:1.3.1-gpu-py3",
      "S3OutputPath": "s3://<your s3 bucket name>/sagemaker/pytorch-bert-financetext",
      "SageMakerRoleArn": "arn:aws:iam::<your aws account>:role/service-role/<your sagemaker execution role>",
      "S3UriTraining": "s3://<your s3 bucket name>/sagemaker/pytorch-bert-financetext/train.csv",
      "S3UriTesting": "s3://<your s3 bucket name>/sagemaker/pytorch-bert-financetext/test.csv",
      "InferenceImage": "<aws hosting account>.dkr.ecr.<aws region>.amazonaws.com/pytorch-inference:1.3.1-cpu-py3",
      "SAGEMAKER_PROGRAM": "train.py",
      "SAGEMAKER_SUBMIT_DIRECTORY": "s3://<your s3 bucket name>/berttraining/source/sourcedir.tar.gz",
      "SAGEMAKER_REGION": "<your aws region>"
    }

  5. Check the processing status in the Step Functions console and make sure that the model has been trained and registered correctly. Once everything has completed, save the input JSON from Step 4 to a file called sf_start_params.json. Launch the SageMaker Studio environment you created in Chapter 8, Building a Data Science Environment Using AWS ML Services, navigate to the folder where you cloned the CodeCommit repository, and upload the sf_start_params.json file into it. Commit the change and verify that the file appears in the repository. We will use this file from the CodeCommit repository in the next section of the lab.
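
If you prefer to start the training workflow programmatically rather than from the Step Functions console, the following minimal boto3 sketch does the same thing as Step 4; the state machine ARN is a placeholder, and sf_start_params.json is the input file you just saved:

    import json
    import time

    import boto3

    stepfunctions = boto3.client("stepfunctions")

    # Placeholder: copy the ARN of the state machine created by training_workflow.yaml
    # from the Step Functions console.
    STATE_MACHINE_ARN = "arn:aws:states:<region>:<account>:stateMachine:<state machine name>"

    # Load the same input JSON you used in Step 4.
    with open("sf_start_params.json") as f:
        workflow_input = json.load(f)

    response = stepfunctions.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(workflow_input),
    )
    execution_arn = response["executionArn"]

    # Poll until the execution finishes; model training can take a while.
    while True:
        status = stepfunctions.describe_execution(executionArn=execution_arn)["status"]
        print("Execution status:", status)
        if status != "RUNNING":
            break
        time.sleep(60)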

Now, we are ready to create the CloudFormation template for the CodePipeline training pipeline. This pipeline will listen to changes to a CodeCommit repository and invoke the Step Functions workflow we just created:

  1. Copy and save the following code block to a file called mlpipeline.yaml. This is the template for building the training pipeline. You can find the complete file at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/blob/main/Chapter09/mlpipeline.yaml. When you launch the template, override the MlOpsStepFunctionArn parameter's default value with the ARN of the Step Functions state machine you created earlier:

    Parameters:
      BranchName:
        Description: CodeCommit branch name
        Type: String
        Default: master
      RepositoryName:
        Description: CodeCommit repository name
        Type: String
        Default: MLSA-repo
      ProjectName:
        Description: ML project name
        Type: String
        Default: FinanceSentiment
      MlOpsStepFunctionArn:
        Description: Step Function Arn
        Type: String
        Default: arn:aws:states:ca-central-1:300165273893:stateMachine:TrainingStateMachine2-89fJblFk0h7b
    Resources:
      CodePipelineArtifactStoreBucket:
        Type: 'AWS::S3::Bucket'
        DeletionPolicy: Delete
      Pipeline:
        Type: 'AWS::CodePipeline::Pipeline'
    ...

  2. Similarly, launch this CloudFormation template in the CloudFormation console to create the pipeline definition. Once the CloudFormation execution has completed, navigate to the CodePipeline management console to verify that the pipeline has been created. The CloudFormation execution also runs the newly created pipeline automatically, so you should see that it has already run once. You can test it again by clicking the Release change button in the CodePipeline console, or trigger it programmatically, as sketched below.
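
A minimal boto3 sketch for triggering and inspecting the pipeline looks like the following; confirm the exact pipeline name in the CodePipeline console before running it, as the name below may differ in your stack:

    import boto3

    codepipeline = boto3.client("codepipeline")

    # The training pipeline created by mlpipeline.yaml; verify the exact name
    # in the CodePipeline console.
    PIPELINE_NAME = "codecommit-events-pipeline"

    # Equivalent to clicking Release change in the console.
    codepipeline.start_pipeline_execution(name=PIPELINE_NAME)

    # Print the latest status of each stage.
    state = codepipeline.get_pipeline_state(name=PIPELINE_NAME)
    for stage in state["stageStates"]:
        latest = stage.get("latestExecution", {})
        print(stage["stageName"], latest.get("status", "NOT_RUN"))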

We want to be able to kick off the CodePipeline execution whenever a change (such as a code commit) is made in the CodeCommit repository. To enable this, we need to create a CloudWatch Events rule that monitors the repository and starts the pipeline when a change is detected. Let's get started:

  1. Add the following code block to the mlpipeline.yaml file, just before the Outputs section, and save the file as mlpipeline_1.yaml. You can find the complete file at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/blob/main/Chapter09/mlpipeline_1.yaml:

    AmazonCloudWatchEventRole:
        Type: 'AWS::IAM::Role'
        Properties:
          AssumeRolePolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Principal:
                  Service:
                    - events.amazonaws.com
                Action: 'sts:AssumeRole'
          Path: /
          Policies:
            - PolicyName: cwe-pipeline-execution
              PolicyDocument:
    ...

  2. Now, launch this CloudFormation template to create a new pipeline (you can delete the previously created pipeline by deleting its CloudFormation stack). Creating the new stack will run the pipeline once automatically. Wait until this execution is complete before you start the next step.
  3. Now, let's test the automatic execution of the pipeline by committing a change to the code repository. In Studio, go to your cloned code repository directory, create a new file called pipelinetest.txt, and commit the change to the code repository. Navigate to the CodePipeline console; you should see the codecommit-events-pipeline pipeline start to run.
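
The template addition above wires this trigger up in CloudFormation. For reference, the same idea expressed directly with boto3 looks roughly like the following sketch, which creates a CloudWatch Events (EventBridge) rule that starts the pipeline when the monitored branch changes; the ARNs and role are placeholders for the resources created by mlpipeline_1.yaml:

    import json

    import boto3

    events = boto3.client("events")

    # Placeholders: your CodeCommit repository ARN, the pipeline ARN, and an IAM role
    # that allows events.amazonaws.com to start the pipeline (similar to the
    # AmazonCloudWatchEventRole defined in mlpipeline_1.yaml).
    REPOSITORY_ARN = "arn:aws:codecommit:<region>:<account>:MLSA-repo"
    PIPELINE_ARN = "arn:aws:codepipeline:<region>:<account>:codecommit-events-pipeline"
    EVENT_ROLE_ARN = "arn:aws:iam::<account>:role/<cloudwatch event role>"

    # Fire when a branch reference in the repository is created or updated.
    event_pattern = {
        "source": ["aws.codecommit"],
        "detail-type": ["CodeCommit Repository State Change"],
        "resources": [REPOSITORY_ARN],
        "detail": {
            "event": ["referenceCreated", "referenceUpdated"],
            "referenceType": ["branch"],
            "referenceName": ["master"],
        },
    }

    events.put_rule(
        Name="codecommit-master-change",
        EventPattern=json.dumps(event_pattern),
        State="ENABLED",
    )

    events.put_targets(
        Rule="codecommit-master-change",
        Targets=[{"Id": "codepipeline", "Arn": PIPELINE_ARN, "RoleArn": EVENT_ROLE_ARN}],
    )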

Congratulations! You have successfully used CloudFormation to build a CodePipeline-based ML training pipeline that runs automatically when there is a file change in a CodeCommit repository. Next, let's build the ML deployment pipeline for the model.

Creating a CloudFormation template for the ML deployment pipeline

To start creating a deployment, perform the following steps:

  1. Copy the following code block to create a file called mldeployment.yaml. This CloudFormation template will deploy a model using the SageMaker hosting service. Make sure that you enter the correct model name for your environment:

    Description: Basic Hosting of registered model
    Parameters:
      ModelName:
        Description: Model Name
        Type: String
        Default: <model name>
    Resources:
      Endpoint:
        Type: AWS::SageMaker::Endpoint
        Properties:
          EndpointConfigName: !GetAtt EndpointConfig.EndpointConfigName
      EndpointConfig:
        Type: AWS::SageMaker::EndpointConfig
        Properties:
          ProductionVariants:
            - InitialInstanceCount: 1
              InitialVariantWeight: 1.0
              InstanceType: ml.m4.xlarge
              ModelName: !Ref ModelName
              VariantName: !Ref ModelName
    Outputs:
      EndpointId:
        Value: !Ref Endpoint
      EndpointName:
        Value: !GetAtt Endpoint.EndpointName

  2. Create a CloudFormation stack using this file and verify that a SageMaker endpoint has been created. Now, upload the mldeployment.yaml file to the code repository directory and commit the change to CodeCommit. Note that this file will be used by the CodePipeline deployment pipeline, which we will create in the following steps.
  3. Before we create the deployment pipeline, we need a template configuration file for passing parameters to the deployment template when it is executed. Here, we need to pass the model name to the pipeline. Copy the following code block, save it to a file called mldeploymentconfig.json (the name must match the ProdStackConfig parameter of the deployment pipeline template we will create shortly), upload it to the code repository directory in Studio, and commit the change to CodeCommit:

    {
      "Parameters" : {
        "ModelName" : "<name of the financial sentiment model you have trained>"
      }
    }

  4. Now, we can create a CloudFormation template for a CodePipeline pipeline that automates model deployment. This pipeline has two main stages:
    1. The first stage fetches the source code (the configuration file we just created and the mldeployment.yaml template) from the CodeCommit repository.
    2. The second stage creates a CloudFormation change set (a change set is the difference between a new template and an existing CloudFormation stack) for the mldeployment.yaml file we created earlier, adds a manual approval step, and then executes the change set to deploy mldeployment.yaml (see the sketch after the next paragraph).

This CloudFormation template also creates supporting resources, including an S3 bucket for storing the CodePipeline artifacts, an IAM role for CodePipeline to run with, and another IAM role for CloudFormation to use to create the stack for mldeployment.yaml.
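
To make the change set mechanics concrete, here is a minimal boto3 sketch of creating, reviewing, and executing a change set for mldeployment.yaml outside of the pipeline; the stack and change set names reuse the template defaults, and the model name is a placeholder. The deployment pipeline performs the equivalent steps, with the manual approval sitting between review and execution:

    import boto3

    cloudformation = boto3.client("cloudformation")

    STACK_NAME = "FinanceSentimentMLStack1"        # matches the ProdStackName default
    CHANGE_SET_NAME = "FinanceSentimentchangeset"  # matches the ChangeSetName default

    with open("mldeployment.yaml") as f:
        template_body = f.read()

    # Create a change set describing what would change relative to the existing stack
    # (use ChangeSetType="CREATE" if the stack does not exist yet).
    cloudformation.create_change_set(
        StackName=STACK_NAME,
        TemplateBody=template_body,
        ChangeSetName=CHANGE_SET_NAME,
        ChangeSetType="UPDATE",
        Parameters=[
            {"ParameterKey": "ModelName", "ParameterValue": "<your model name>"}
        ],
    )
    cloudformation.get_waiter("change_set_create_complete").wait(
        ChangeSetName=CHANGE_SET_NAME, StackName=STACK_NAME
    )

    # Review the proposed changes; in the pipeline, this is what you approve manually.
    changes = cloudformation.describe_change_set(
        ChangeSetName=CHANGE_SET_NAME, StackName=STACK_NAME
    )
    for change in changes["Changes"]:
        resource_change = change["ResourceChange"]
        print(resource_change["Action"], resource_change["LogicalResourceId"])

    # Apply the change set.
    cloudformation.execute_change_set(ChangeSetName=CHANGE_SET_NAME, StackName=STACK_NAME)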

  1. Copy the following code block and save the file as mldeployment-pipeline.yaml. You can find the complete code sample at https://github.com/PacktPublishing/The-Machine-Learning-Solutions-Architect-Handbook/blob/main/Chapter09/mldeployment-pipeline.yaml:

    Parameters:
      BranchName:
        Description: CodeCommit branch name
        Type: String
        Default: master
      RepositoryName:
        Description: CodeCommit repository name
        Type: String
        Default: MLSA-repo
      ProjectName:
        Description: ML project name
        Type: String
        Default: FinanceSentiment
      CodePipelineSNSTopic:
        Description: SNS topic for NotificationArn
        Default: arn:aws:sns:ca-central-1:300165273893:CodePipelineSNSTopicApproval
        Type: String
      ProdStackConfig:
        Default: mldeploymentconfig.json
        Description: The configuration file name for the production model deployment stack
        Type: String
      ProdStackName:
        Default: FinanceSentimentMLStack1
        Description: A name for the production model deployment stack
        Type: String
      TemplateFileName:
        Default: mldeployment.yaml
        Description: The file name of the model deployment template
        Type: String
      ChangeSetName:
        Default: FinanceSentimentchangeset
        Description: A name for the production stack change set
        Type: String
    Resources:
      CodePipelineArtifactStoreBucket:
        Type: 'AWS::S3::Bucket'
        DeletionPolicy: Delete
      Pipeline:
    ...

  2. Now, let's launch the newly created mldeployment-pipeline.yaml template in the CloudFormation console to create the deployment pipeline, and then run the pipeline from the CodePipeline console. Remember to approve the manual approval action in the CodePipeline console so that the change set is executed and the model endpoint is deployed.
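
Once the deployment pipeline has finished and the SageMaker endpoint is in service, you can send it a quick test request. The following is a minimal sketch; the endpoint name is a placeholder, and the ContentType and payload format are assumptions that you must adapt to the inference handler in the Chapter 8 training and serving script:

    import json

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    # Placeholder: the endpoint name created by the mldeployment.yaml stack
    # (check the SageMaker console or the stack outputs).
    ENDPOINT_NAME = "<your endpoint name>"

    # Assumption: adjust the ContentType and payload structure to match the
    # inference handler in your Chapter 8 script.
    payload = json.dumps({"text": "The company reported record quarterly revenue."})

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=payload,
    )
    print(response["Body"].read().decode("utf-8"))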

Congratulations! You have successfully created and run a CodePipeline deployment pipeline to deploy a model from the SageMaker model registry.

Summary

In this chapter, we discussed the key requirements for building an enterprise ML platform to meet needs such as end-to-end ML life cycle support, process automation, and separating different environments. We also talked about architecture patterns and how to build an enterprise ML platform on AWS using AWS services. We discussed the core capabilities of different ML environments, including training, hosting, and shared services. You should now have a good understanding of what an enterprise ML platform could look like, as well as the key considerations for building one using AWS services. You have also developed some hands-on experience in building the components of the MLOps architecture and automating model training and deployment. In the next chapter, we will discuss advanced ML engineering by covering large-scale distributed training and the core concepts for achieving low-latency inference.
