2

Deep Learning Frameworks and Containers on SageMaker

Amazon SageMaker supports many popular ML and DL frameworks. Framework support in SageMaker is achieved using prebuilt Docker containers for inference and training tasks. Prebuilt SageMaker containers provide a great deal of functionality, and they allow you to implement a wide range of use cases with minimal coding. There are also real-life scenarios where you need to have a custom, runtime environment for training and/or inference tasks. To address these cases, SageMaker provides a flexible Bring-Your-Own (BYO) container feature.

In this chapter, we will review key supported DL frameworks and corresponding container images. Then, we will focus our attention on the two most popular DL frameworks, TensorFlow and PyTorch, and learn how to use them in Amazon SageMaker. Additionally, we will review a higher-level, state-of-the-art framework, Hugging Face, for NLP tasks, and its implementation for Amazon SageMaker.

Then, we will understand how to use and extend prebuilt SageMaker containers based on your use case requirements, as well as learning about the SageMaker SDK and toolkits, which simplify writing training and inference scripts that are compatible with Amazon SageMaker.

In later sections, we will dive deeper into how to decide whether to use prebuilt SageMaker containers or BYO containers. Then, we will develop a SageMaker-compatible BYO container.

These topics will be covered in the following sections:

  • Exploring DL frameworks on SageMaker
  • Using SageMaker DL containers
  • Developing BYO containers

By the end of this chapter, you will be able to decide which container strategy to choose based on your specific problem requirements and chosen DL framework. Additionally, you will understand the key aspects of training and inference script development, which are compatible with Amazon SageMaker.

Technical requirements

In the Using SageMaker DL containers and Developing BYO containers sections, we will provide walk-through code samples, so you can develop practical skills. Full code examples are available at https://github.com/PacktPublishing/Accelerate-Deep-Learning-Workloads-with-Amazon-SageMaker/blob/main/chapter2/.

To follow along with this code, you will need the following:

Exploring DL frameworks on SageMaker

At the time of writing this book, Amazon SageMaker supports the following frameworks, where DL frameworks are marked with an asterisk:

  • scikit-learn
  • SparkML Serving
  • Chainer*
  • Apache MXNet*
  • Hugging Face*
  • PyTorch*
  • TensorFlow*
  • Reinforcement learning containers – including TensorFlow- and PyTorch-enabled containers
  • XGBoost

The preceding list of supported frameworks could change in the future. Be sure to check the official SageMaker documentation at https://docs.aws.amazon.com/sagemaker/latest/dg/frameworks.html.

In this book, we will primarily focus on the two most popular choices: TensorFlow and PyTorch. Both are open source frameworks with large and vibrant communities. Depending on the specific use case or model architecture, one framework might have a slight advantage over the other. However, it’s safe to assume that both are comparable in terms of features and performance. In many practical scenarios, the choice between TensorFlow and PyTorch is made based on historical precedent or individual preference.

Another framework that we will discuss in this book is Hugging Face. This is a high-level framework that provides access to SOTA models, training, and inference facilities for NLP tasks (such as text classification, translation, and more). Hugging Face is a set of several libraries (transformers, datasets, tokenizers, and accelerate) designed to simplify building SOTA NLP models. Under the hood, Hugging Face libraries use TensorFlow and PyTorch primitives (collectively known as “backends”) to perform computations. Users can choose which backend to use based on specific runtime requirements. Given its popularity, Amazon SageMaker has recently added support for the Hugging Face libraries in separate prebuilt containers for training and inference tasks.
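To give a feel for how the backend choice surfaces in code, here is a minimal sketch of running a Hugging Face pipeline on either backend (the model name and sample text are purely illustrative):

from transformers import pipeline

# framework="pt" selects the PyTorch backend; "tf" selects TensorFlow
classifier = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # illustrative model from the Hugging Face Hub
    framework="pt",
)
print(classifier("SageMaker makes training deep learning models easier."))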

Container sources

Sources of SageMaker DL containers are available in the public GitHub repository at https://github.com/aws/deep-learning-containers. In certain cases, it can be helpful to review the relevant Dockerfiles to understand the runtime configuration of prebuilt containers. The container images themselves are hosted in AWS public registries; the full list of available images is maintained at https://github.com/aws/deep-learning-containers/blob/master/available_images.md.

For each of the supported frameworks, SageMaker provides separate training and inference containers. We have separate containers for these two tasks because of the following considerations:

  • Training and inference tasks might have different runtime requirements. For example, you might choose to run your training and inference tasks on different compute platforms. This will result in different sets of accelerators and performance optimization tweaks in your container, depending on your specific task.
  • Training and inference tasks require different sets of auxiliary scripts; for instance, standing up a model server in the case of inference tasks. Not separating training and inference containers could result in bloated container sizes and intricate APIs.

For this reason, we will always explicitly identify the container we are using depending on the specific task.

Specific to DL containers, AWS also defines separate GPU-based and CPU-based containers. GPU-based containers require additional software (such as the CUDA toolkit) to be installed in order to run computations on GPU devices.

Model requirements

When choosing a SageMaker DL container, always consider the model requirements for compute resources. For the majority of SOTA models, it’s recommended that you use GPU-based compute instances to achieve acceptable performance. Choose your DL container accordingly.

TensorFlow containers

TensorFlow has two major versions: 1.x (in maintenance mode) and 2.x (the current version). Amazon SageMaker supports both and provides separate inference and training containers for each. In this book, all code examples and commentary assume TensorFlow 2.x.

AWS frequently adds support for new minor TensorFlow versions. At the time of writing, the latest supported version is 2.10.0.

PyTorch containers

Amazon SageMaker provides inference and training containers for PyTorch. At the time of writing, the latest supported version is 1.12.1.

Hugging Face containers

AWS provides Hugging Face containers in two flavors: PyTorch and TensorFlow backends. Each backend has separate training and inference containers.

Using SageMaker Python SDK

AWS provides a convenient Python SDK that simplifies interactions with supported DL frameworks via the Estimator, Model, and Predictor classes. Each supported framework has a separate module with the implementation of the respective classes. For example, here is how you import the Estimator, Model, and Predictor classes for the PyTorch framework:

from sagemaker.pytorch.estimator import PyTorch
from sagemaker.pytorch.model import PyTorchModel, PyTorchPredictor

The following diagram shows the SageMaker Python SDK workflow:

Figure 2.1 – How SageMaker Python SDK works with image URIs

To build a better intuition, let’s do a quick example of how to run a training job using a PyTorch container with a specific version using SageMaker Python SDK. For a visual overview, please refer to Figure 2.1:

  1. First, we decide which framework to use and import the respective PyTorch estimator class:

    from sagemaker.pytorch.estimator import PyTorch

When instantiating the PyTorch estimator object, we need to provide several more parameters including the framework version and the Python version:

estimator = PyTorch(
    entry_point="training_script.py",
    framework_version="1.8",
    py_version="py3",
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge"
)

  2. When executing this code, SageMaker Python SDK automatically validates user input, including the framework version and the Python version. If the requested container exists, SageMaker Python SDK retrieves the appropriate container image URI. If there is no container with the requested parameters, SageMaker Python SDK will throw an exception.
  3. During the fit() call, the correct container image URI is provided to the SageMaker API, so the training job will run inside the SageMaker container with PyTorch v1.8 and Python 3 installed. Since we are requesting a GPU-based instance, a training container with the CUDA toolkit installed will be used:

    estimator.fit()
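If you want to confirm which container image the SDK resolved for your configuration, you can inspect it directly on the estimator. The following one-liner is a small sketch of this:

# Prints the ECR URI of the prebuilt training container resolved from
# framework_version, py_version, and instance_type
print(estimator.training_image_uri())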

Using custom images

Please note that if you prefer to provide a direct URI to your container image, you can do so using the image_uri parameter, which is supported by the Model and Estimator classes.
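For example, a prebuilt container URI can be looked up programmatically and then passed in explicitly. The following sketch (the region and versions are illustrative) shows this pattern:

import sagemaker

# Look up the prebuilt PyTorch training image for a given configuration
image_uri = sagemaker.image_uris.retrieve(
    framework="pytorch",
    region="us-east-1",            # illustrative region
    version="1.8",
    py_version="py3",
    instance_type="ml.p2.xlarge",
    image_scope="training",
)

estimator = PyTorch(
    entry_point="training_script.py",
    image_uri=image_uri,           # bypasses the automatic image resolution
    role=role,
    instance_count=1,
    instance_type="ml.p2.xlarge",
)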

Now, let’s take a deep dive into SageMaker DL containers, starting with the available prebuilt containers for the TensorFlow, PyTorch, and Hugging Face frameworks.

Using SageMaker DL containers

Amazon SageMaker supports several container usage patterns. Also, it provides you with Training and Inference Toolkits that simplify using prebuilt containers and developing BYO containers.

In this section, we will learn how to choose the most efficient container usage pattern for your use case and how to use the available SageMaker toolkits to implement it.

Container usage patterns

Amazon SageMaker provides you with the flexibility to choose whether to use prebuilt containers “as is” (known as Script Mode), BYO containers, or modify prebuilt containers.

Typically, the choice of approach is driven by specific model runtime requirements, available resources, and engineering expertise. In the next few subsections, we will discuss when to choose one approach over another.

Script Mode

In Script Mode, you define which prebuilt container you’d like to use and then provide one or more scripts with the implementation of your training or inference logic. Additionally, you can provide any other dependencies (proprietary or public) that will be copied to the container.

Both training and inference containers in Script Mode come with preinstalled toolkits that provide common functionality, such as downloading data and model artifacts to the container, starting jobs, and more. We will look at the SageMaker Training and Inference Toolkits in more detail later in this chapter.

Script Mode is suitable for the following scenarios:

  • Prebuilt containers satisfy your runtime requirements, or you can install any dependencies without needing to rebuild the container
  • You want to minimize the time spent on developing and testing your containers or you don’t have the required expertise to do so

In the following sections, we will review how to prepare your first training and inference scripts and run them on SageMaker in script mode.

Modifying prebuilt containers

Another way to use SageMaker’s prebuilt containers is to modify them. In this case, you will use one of the prebuilt containers as a base image for your custom container.

Modifying prebuilt containers can be beneficial in the following scenarios:

  • You need to add additional dependencies (for instance, ones that need to be compiled from sources) or reconfigure the runtime environment
  • You want to minimize the development and testing efforts of your container and rely for the most part on the functionality of the base container tested by AWS

Please note that when you extend a prebuilt container, you will be responsible for the following aspects:

  • Creating the Dockerfile with the implementation of your runtime environment
  • Building your container image and storing it in a container registry such as Amazon Elastic Container Registry (ECR) or a private Docker registry

Later in this chapter, we will see an example of how to extend a prebuilt PyTorch container for a training task.

BYO containers

There are many scenarios in which you might need to create a custom container, such as the following:

  • You have unique runtime requirements that cannot be addressed by extending the prebuilt container
  • You want to compile frameworks and libraries from sources for specific hardware platforms
  • You are using DL frameworks that are not supported natively by SageMaker (for instance, JAX)

Building a custom container compatible with SageMaker inference and training resources requires development efforts, an understanding of Docker containers, and specific SageMaker requirements. Therefore, it’s usually recommended that you consider script mode or extending a prebuilt container first and choose to use a BYO container only if the first options do not work for your particular use case.

SageMaker toolkits

To simplify the development of custom scripts and containers that are compatible with Amazon SageMaker, AWS created Python toolkits for training and inference tasks.

Toolkits provide the following benefits:

  • Establish consistent runtime environments and locations for storing code assets
  • Provide ENTRYPOINT scripts that run tasks when the container is started

Understanding these toolkits helps to simplify and speed up the development of SageMaker-compatible containers, so let’s review them in detail.

The Training Toolkit

The SageMaker Training Toolkit has several key functions:

  • It establishes a consistent runtime environment, setting environment variables and a directory structure to store the input and output artifacts of model training:
Figure 2.2 – The directory structure in SageMaker-compatible containers

The Training Toolkit sets up the following directories in the training container:

  • The /opt/ml/input/config directory with the model hyperparameters and the network layout used for distributed training, stored as JSON files.
  • The /opt/ml/input/data directory with the input data when Amazon S3 is used as data storage.
  • The /opt/ml/code/ directory, containing the code assets used to run the training job.
  • The /opt/ml/model/ directory, containing the resulting model; SageMaker automatically copies its content to Amazon S3 once training completes.
  • It executes the entrypoint script and handles success and failure statuses. If the training job fails, the failure output is written to /opt/ml/output/failure. For successful executions, the toolkit writes output to the /opt/ml/output/success directory.
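Inside the training container, these standard locations are also exposed to your script through environment variables. The following minimal sketch (the variable names follow the toolkit conventions; the fallback paths are the defaults listed above) shows how a training script might consume them:

import json
import os

# Standard environment variables set by the SageMaker Training Toolkit
model_dir = os.environ.get("SM_MODEL_DIR", "/opt/ml/model")                  # where to save the trained model
train_dir = os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")   # the "train" input channel
hyperparams = json.loads(os.environ.get("SM_HPS", "{}"))                     # hyperparameters as a JSON document

print(f"Training data is in {train_dir}, the model will be saved to {model_dir}")
print(f"Hyperparameters: {hyperparams}")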

By default, all prebuilt training containers already have the Training Toolkit installed. If you wish to use it in your own container, you will need to install it by adding the following to your Dockerfile:

RUN pip install sagemaker-training

Also, you will need to copy your code dependencies into the container and define a special environment variable that points to your main training script, as follows:

COPY train_script.py /opt/ml/code/train_script.py
ENV SAGEMAKER_PROGRAM train_script.py

The Training Toolkit package is available on PyPI (pypi.org) and in the SageMaker GitHub repository (https://github.com/aws/sagemaker-training-toolkit).

Inference Toolkit

The Inference Toolkit implements a model serving stack that is compatible with SageMaker inference services. It comes together with an open source Multi-Model Server (MMS) to serve models. It has the following key functions:

  • It establishes the runtime environment, such as environment variables and the directories used to store the input and output artifacts of inference. The directory structure follows the layout of the training container.
  • It implements a handler service that is called by the model server to load the model into memory and to handle model inputs and outputs.
  • It implements default serializers and deserializers to handle inference requests.

The Inference Toolkit package is available on PyPI (pypi.org) and in the GitHub repository (https://github.com/aws/sagemaker-inference-toolkit).

Developing for script mode

Now that we have an understanding of SageMaker’s container ecosystem, let’s implement several learning projects to build practical skills. In this first example, we will use SageMaker script mode to train our custom NLP model and deploy it for inference.

Problem overview

In this example, we will learn how to develop training and inference scripts using the Hugging Face framework. We will leverage prebuilt SageMaker containers for Hugging Face (with the PyTorch backend).

We chose to solve a typical NLP task: text classification. We will use the 20 Newsgroups dataset, which assembles ~20,000 newsgroup documents across 20 different newsgroups (categories). There are a number of model architectures that can address this task. Usually, current SOTA models are based on the Transformer architecture. Autoencoding models such as BERT and its various derivatives are well suited to this task. We will use a concept known as transfer learning, where a model that is pretrained for one task is reused for a new task with minimal modifications.

As a baseline model, we will use a model architecture known as DistilBERT, which provides high accuracy on a wide variety of tasks and is considerably smaller than other models (for instance, the original BERT model). To adapt the model for a classification task, we need to add a classification layer, which will be trained to assign articles to categories:

Figure 2.3 – The model architecture for the text classification task

The Hugging Face Transformers library simplifies model selection and modification for fine-tuning in the following ways:

  • It provides a rich model zoo with a number of pretrained models and tokenizers
  • It has a simple model API to modify the baseline model for fine-tuning a specific task
  • It implements inference pipelines, combining data preprocessing and actual inference together

The full source code of this learning project is available at https://github.com/PacktPublishing/Accelerate-Deep-Learning-Workloads-with-Amazon-SageMaker/blob/main/chapter2/1_Using_SageMaker_Script_Mode.ipynb.

Developing a training script

When running SageMaker training jobs, we need to provide a training script. Additionally, we might provide any other dependencies. We can also install or modify Python packages that are installed on prebuilt containers via the requirements.txt file.

In this example, we will fine-tune a multicategory classifier using the Hugging Face Trainer API. Let’s make sure that the training container has a recent enough version of the Hugging Face Transformers library installed. For this, we create a requirements.txt file and specify the minimal compatible version. Later, we will provide this file to our SageMaker training job:

transformers >= 4.10

Next, we need to develop the training script. Let’s review some key components of it.

At training time, SageMaker starts training by calling user_training_script --arg1 value1 --arg2 value2 .... Here, arg1..N are the training hyperparameters and other miscellaneous parameters provided by the user as part of the training job configuration. To correctly kick off the training process, our script needs to include a __main__ guard:

  1. To correctly capture the parameters, the training script needs to be able to parse command-line arguments. We use the Python argparse library to do this:

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument("--epochs", type=int, default=1)
        parser.add_argument("--per-device-train-batch-size", type=int, default=16)
        parser.add_argument("--per-device-eval-batch-size", type=int, default=64)
        parser.add_argument("--warmup-steps", type=int, default=100)
        parser.add_argument("--logging-steps", type=float, default=100)
        parser.add_argument("--weight-decay", type=float, default=0.01)
        args, _ = parser.parse_known_args()
        train(args)

  2. The train() method is responsible for running end-to-end training jobs. It includes the following components:
    • Calling _get_tokenized_data to load and tokenize the datasets using a pretrained DistilBERT tokenizer from the Hugging Face library.
    • Loading and configuring the DistilBERT model from the Hugging Face model zoo. Please note that we update the default configuration for classification tasks to adjust for our chosen number of categories.
    • Configuring Hugging Face Trainer and starting the training process.
    • Once the training is done, we save the trained model:

      def train(args):
          train_enc_dataset, test_enc_dataset = _get_tokenized_data()
          training_args = TrainingArguments(
              # output directory; defaults to the current directory if the
              # SM_OUTPUT_DIR variable is not set in the runtime environment
              output_dir=os.getenv("SM_OUTPUT_DIR", "./"),
              num_train_epochs=args.epochs,
              per_device_train_batch_size=args.per_device_train_batch_size,
              per_device_eval_batch_size=args.per_device_eval_batch_size,
              warmup_steps=args.warmup_steps,
              weight_decay=args.weight_decay,
              logging_steps=args.logging_steps,
          )
          config = DistilBertConfig()
          config.num_labels = NUM_LABELS
          model = DistilBertForSequenceClassification.from_pretrained(
              MODEL_NAME, config=config
          )
          trainer = Trainer(
              model=model,  # model to be trained
              args=training_args,  # training arguments, defined above
              train_dataset=train_enc_dataset,  # training dataset
              eval_dataset=test_enc_dataset,  # evaluation dataset
          )
          trainer.train()
          model.save_pretrained(os.environ["SM_MODEL_DIR"])
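For completeness, here is a minimal sketch of what the _get_tokenized_data() helper could look like; the exact implementation in the chapter repository may differ (for instance, it reads data from the SageMaker input channels rather than downloading it directly):

import torch
from sklearn.datasets import fetch_20newsgroups
from transformers import DistilBertTokenizerFast

MODEL_NAME = "distilbert-base-uncased"

class NewsgroupsDataset(torch.utils.data.Dataset):
    """Wraps tokenized encodings and labels for the Hugging Face Trainer."""
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

def _get_tokenized_data():
    # Load the raw 20 Newsgroups articles and labels
    train_raw = fetch_20newsgroups(subset="train")
    test_raw = fetch_20newsgroups(subset="test")

    # Tokenize with the same pretrained tokenizer used by the model
    tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)
    train_enc = tokenizer(train_raw.data, truncation=True, padding=True, max_length=512)
    test_enc = tokenizer(test_raw.data, truncation=True, padding=True, max_length=512)

    return (
        NewsgroupsDataset(train_enc, train_raw.target),
        NewsgroupsDataset(test_enc, test_raw.target),
    )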

So far in our script, we have covered key aspects: handling configuration settings and model hyperparameters, loading pretrained models, and starting training using the Hugging Face Trainer API.

Starting the training job

Once we have our training script and dependencies ready, we can schedule a training job via the SageMaker Python SDK. We start by importing the HuggingFace estimator class and getting the IAM execution role for our training job:

from sagemaker.huggingface.estimator import HuggingFace
from sagemaker import get_execution_role
role=get_execution_role()

Next, we need to define the hyperparameters of our model and training processes. These variables will be passed to our script at training time:

hyperparameters = {
    "epochs":1,
    "per-device-train-batch-size":16, 
    "per-device-eval-batch-size":64,
    "warmup-steps":100,
    "logging-steps":100,
    "weight-decay":0.01    
}
estimator = HuggingFace(
    py_version="py36",
    entry_point="train.py",
    source_dir="1_sources",
    pytorch_version="1.7.1",
    transformers_version="4.6.1",
    hyperparameters=hyperparameters,
    instance_type="ml.p2.xlarge",
    instance_count=1,
    role=role
)
estimator.fit({
    "train":train_dataset_uri,
    "test":test_dataset_uri
})
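Here, train_dataset_uri and test_dataset_uri point to the prepared datasets in Amazon S3. A minimal sketch of how such URIs might be produced (the local paths and key prefixes are illustrative) is the following:

import sagemaker

session = sagemaker.Session()

# Upload locally prepared dataset files to the default SageMaker S3 bucket
train_dataset_uri = session.upload_data(path="data/train", key_prefix="newsgroups/train")
test_dataset_uri = session.upload_data(path="data/test", key_prefix="newsgroups/test")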

After that, the training job will be scheduled and executed. It will take 10–15 minutes for it to complete, then the trained model and other output artifacts will be added to Amazon S3.

Developing an inference script for script mode

Now that we have a trained model, let’s deploy it as a SageMaker real-time endpoint. We will use the prebuilt SageMaker Hugging Face container and only provide our inference script. The inference requests will be handled by MMS, which exposes an HTTP endpoint.

When using prebuilt inference containers, SageMaker automatically recognizes our inference script. According to SageMaker convention, the inference script has to contain the following methods:

  • model_fn(model_dir) is executed at container start time to load the model into memory. This method takes the model directory as an input argument. You can use model_fn() to initialize other components of your inference pipeline, such as the tokenizer in our case. Note that Hugging Face Transformers provides a convenient Pipeline API that allows us to combine data preprocessing (in our case, text tokenization) and actual inference in a single object. Hence, instead of a loaded model, we return an inference pipeline:

    MODEL_NAME = "distilbert-base-uncased"

    NUM_LABELS = 6 # number of categories

    MAX_LENGTH = 512 # max number of tokens model can handle

    def model_fn(model_dir):

        device_id = 0 if torch.cuda.is_available() else -1

        tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)

        config = DistilBertConfig()

        config.num_labels = NUM_LABELS

        model = DistilBertForSequenceClassification.from_pretrained(

            model_dir, config=config

        )

        inference_pipeline = pipeline(

            model=model,

            task="text-classification",

            tokenizer=tokenizer,

            framework="pt",

            device=device_id,

            max_length=MAX_LENGTH,

            truncation=True

        )

        return inference_pipeline

  • transform_fn(inference_pipeline, data, content_type, accept_type) is responsible for running the actual inference. Since we are communicating with the end client via HTTP, we also need to handle payload deserialization and response serialization. In our example, we expect a JSON payload and return a JSON payload; however, this can be extended to other formats based on your requirements (for example, CSV and Protobuf):

    def transform_fn(inference_pipeline, data, content_type, accept_type):
        # Deserialize payload
        if "json" in content_type:
            deser_data = json.loads(data)
        else:
            raise NotImplementedError("Only 'application/json' content type is implemented.")

        # Run inference
        predictions = inference_pipeline(deser_data)

        # Serialize response
        if "json" in accept_type:
            return json.dumps(predictions)
        else:
            raise NotImplementedError("Only 'application/json' accept type is implemented.")

Sometimes, combining deserialization, inference, and serialization in a single method can be inconvenient. Alternatively, SageMaker supports a more granular API:

  • input_fn(request_body, request_content_type) runs deserialization
  • predict_fn(deser_input, model) performs predictions
  • output_fn(prediction, response_content_type) runs the serialization of predictions

Note that the transform_fn() method is mutually exclusive with the input_fn(), predict_fn(), and output_fn() methods.
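For illustration, here is a minimal sketch of how the JSON-only logic from our transform_fn() could be split across these granular methods (the model argument received by predict_fn() is the object returned by model_fn(), that is, our inference pipeline):

import json

def input_fn(request_body, request_content_type):
    # Deserialize the request payload
    if "json" in request_content_type:
        return json.loads(request_body)
    raise NotImplementedError("Only 'application/json' content type is implemented.")

def predict_fn(deser_input, model):
    # model is the inference pipeline returned by model_fn()
    return model(deser_input)

def output_fn(prediction, response_content_type):
    # Serialize predictions back to the client
    if "json" in response_content_type:
        return json.dumps(prediction)
    raise NotImplementedError("Only 'application/json' accept type is implemented.")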

Deploying a Text Classification endpoint

Now we are ready to deploy and test our Newsgroup Classification endpoint. We can use the estimator.create_model() method to configure our model deployment parameters, specifically the following:

  1. Define the inference script and other dependencies that will be uploaded by SageMaker to an endpoint.
  2. Identify the inference container. If you provide the transformers_version, pytorch_version, and py_version parameters, SageMaker will automatically find an appropriate prebuilt inference container (if it exists). Alternatively, you can provide image_uri to directly specify the container image you wish to use:

    from sagemaker.huggingface.model import HuggingFaceModel

    model = estimator.create_model(role=role,
                                   entry_point="inference.py",
                                   source_dir="1_sources",
                                   py_version="py36",
                                   transformers_version="4.6.1",
                                   pytorch_version="1.7.1"
                                  )

  3. Next, we define the parameters of our endpoint such as the number and type of instances behind it. The model.deploy() method starts the inference deployment (which, usually, takes several minutes) and returns a Predictor object to run inference requests:

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge"
    )
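Once the endpoint is in service, we can send a test request through the returned Predictor object. Here is a minimal sketch (the sample text is illustrative; depending on the predictor class, JSON serialization may already be the default):

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Match the JSON handling implemented in transform_fn()
predictor.serializer = JSONSerializer()
predictor.deserializer = JSONDeserializer()

prediction = predictor.predict("NASA announced a new mission to study the outer planets.")
print(prediction)  # the label and score returned by the inference pipeline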

Next, let’s explore how to extend pre-built DL containers.

Extending the prebuilt containers

We will reuse the code assets from the Script Mode example. However, unlike in the previous example, we will modify the runtime environment and install the latest Hugging Face Transformers version directly from the GitHub master branch. This modification will be implemented in our custom container image.

First off, we need to identify which base image we will use. AWS has published all of the available DL containers at https://github.com/aws/deep-learning-containers/blob/master/available_images.md.

Since we plan to reinstall the Hugging Face Transformers library from scratch anyway, we can choose a PyTorch base image. At the time of writing, the latest PyTorch SageMaker training container was 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.9.0-gpu-py38-cu111-ubuntu20.04. Note that this container URI is for the us-east-1 AWS region and will be different for other AWS regions. Please consult the AWS page referenced above for the correct URI for your region.

To build a new container, we will need to perform the following steps:

  • Create a Dockerfile with runtime instructions.
  • Build the container image locally.
  • Push the new container image to the container registry. In this example, we will use ECR as a container registry: a managed service from AWS, which is well integrated into the SageMaker ecosystem.

First, let’s create a Dockerfile for our extended container.

Developing a Dockerfile for our extended container

To extend the prebuilt SageMaker container, we need to do at least the following:

  • Use a SageMaker PyTorch image as the base.
  • Install the required dependencies, such as the Hugging Face Transformers library built from the latest Git master branch.
  • Copy our training script from the previous example into the container.
  • Define the SAGEMAKER_SUBMIT_DIRECTORY and SAGEMAKER_PROGRAM environment variables, so SageMaker knows which training script to execute when the container starts:

    FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.9.0-gpu-py38-cu111-ubuntu20.04
    RUN pip3 install git+https://github.com/huggingface/transformers
    ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
    ENV SAGEMAKER_PROGRAM train.py
    COPY 1_sources/train.py $SAGEMAKER_SUBMIT_DIRECTORY/$SAGEMAKER_PROGRAM

Now we are ready to build and push this container image to ECR. You can find the bash script to do this in the chapter repository.

Scheduling a training job

Once we have our extended PyTorch container in ECR, we are ready to execute a SageMaker training job. The training job configuration will be similar to the script mode example with one notable difference: instead of the HuggingFaceEstimator object, we will use a generic SageMaker Estimator object that allows us to work with custom images. Note that you need to update the image_uri parameter with reference to the image URI in your ECR instance. You can find it by navigating to the ECR service on your AWS Console and finding the extended container there:

from sagemaker.estimator import Estimator
estimator = Estimator(
    image_uri="<UPDATE WITH YOUR IMAGE URI FROM ECR>",
    hyperparameters=hyperparameters,
    instance_type="ml.p2.xlarge",
    instance_count=1,
    role=role
)
estimator.fit({
    "train":train_dataset_uri,
    "test":test_dataset_uri
})

After completing the training job, we should expect similar training outcomes as those shown in the script mode example.

Developing a BYO container for inference

In this section, we will learn how to build a SageMaker-compatible inference container using an official TensorFlow image, prepare an inference script and model server, and deploy it for inference on SageMaker Hosting.

Problem overview

We will develop a SageMaker-compatible container for inference. We will use the latest official TensorFlow container as a base image and AWS MMS as a model server. Please note that MMS is only one of many ML model serving options. SageMaker places few restrictions on the model server, other than that it must listen on port 8080 and respond to /invocations and /ping requests.

Developing the serving container

When deploying a serving container to the endpoint, SageMaker runs the following command:

docker run <YOUR BYO IMAGE> serve

To comply with this requirement, it’s recommended that you use the exec format of the ENTRYPOINT instruction in your Dockerfile.

Let’s review our BYO Dockerfile:

  • We use the latest TensorFlow container as a base
  • We install general and SageMaker-specific dependencies
  • We copy our model serving scripts to the container
  • We specify ENTRYPOINT and the CMD instructions to comply with the SageMaker requirements

Now, let’s put it into action:

  1. Use the latest official TensorFlow container:

    FROM tensorflow/tensorflow:latest

  2. Install Java (required by MMS) and any other common dependencies.
  3. Copy the entrypoint script to the image:

    COPY 3_sources/src/dockerd_entrypoint.py /usr/local/bin/dockerd-entrypoint.py
    RUN chmod +x /usr/local/bin/dockerd-entrypoint.py

  4. Copy the default custom service file to handle incoming data and inference requests:

    COPY 3_sources/src/model_handler.py /opt/ml/model/model_handler.py
    COPY 3_sources/src/keras_model_loader.py /opt/ml/model/keras_model_loader.py

  5. Define an entrypoint script and its default parameters:

    ENTRYPOINT ["python3", "/usr/local/bin/dockerd-entrypoint.py"]

    CMD ["serve"]

In this example, we don’t intend to cover MMS and the development of inference scripts in detail. However, it’s worth highlighting some key script aspects:

  • dockerd_entrypoint.py is an executable that starts the MMS server when the serve argument is passed to it.
  • model_handler.py implements the model-loading and model-serving logic. Note that the handle() method checks whether the model is already loaded into memory. If it’s not, it loads the model into memory once and then proceeds to handle the serving request (see the sketch after this list), which includes the following:
    • Deserializing the request payload
    • Running predictions
    • Serializing predictions
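The following is a minimal sketch of what such an MMS handler might look like; the helper functions load_model() and preprocess() are illustrative placeholders, and the full implementation is available in the chapter repository:

import json

class ModelHandler:
    """Illustrative MMS handler: loads the model once, then serves requests."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # Load the model into memory once, when the serving worker starts;
        # load_model() is an illustrative placeholder (see keras_model_loader.py)
        self.model = load_model(context)
        self.initialized = True

    def handle(self, data, context):
        if not self.initialized:
            self.initialize(context)

        # 1. Deserialize the request payload (raw image bytes in our case)
        image_bytes = data[0].get("body")

        # 2. Run predictions; preprocess() is an illustrative placeholder
        predictions = self.model.predict(preprocess(image_bytes))

        # 3. Serialize predictions; MMS expects a list with one entry per request
        return [json.dumps(predictions.tolist())]

_service = ModelHandler()

def handle(data, context):
    # Module-level entry point invoked by MMS for each request
    if data is None:
        return None
    return _service.handle(data, context)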

Deploying the SageMaker endpoint

To schedule the deployment of the inference endpoint, we use the generic Model class from SageMaker Python SDK. Note that since we downloaded the model from a public model zoo, we don’t need to provide a model_data parameter (hence, its value is None):

from sagemaker import Model
mms_model = Model(
    image_uri=image_uri,
    model_data=None,
    role=role,
    name=model_name,
    sagemaker_session=session
)
mms_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge", 
    endpoint_name=endpoint_name
)

It might take several minutes to fully deploy the endpoint and start the model server. Once it’s ready, we can call the endpoint using the boto3 sagemaker-runtime client, which allows us to construct an HTTP request and send the inference payload (an image, in our case) to a specific SageMaker endpoint:

import boto3
client = boto3.client('sagemaker-runtime')
accept_type = "application/json"
content_type = 'image/jpeg'
headers = {'content-type': content_type}
payload = open(test_image, 'rb')
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType=content_type,
    Accept = accept_type
)
most_likely_label = response['Body'].read()
print(most_likely_label)

This code returns the most likely label for the object in the image, based on the model’s predictions.

Summary

In this chapter, we reviewed how SageMaker supports ML and DL frameworks using Docker containers. After reading this chapter, you should know how to select the most appropriate DL container usage pattern for your specific use case requirements. We learned about the SageMaker toolkits, which simplify developing SageMaker-compatible containers. You also gained practical knowledge of how to develop custom containers and scripts for training and inference tasks on Amazon SageMaker.

In the next chapter, we will learn about the SageMaker development environment and how to efficiently develop and troubleshoot your DL code. Additionally, we will learn about DL-specific tools and interfaces that the SageMaker development environment provides to simplify the building, deploying, and monitoring of your DL models.
