Amazon SageMaker supports many popular ML and DL frameworks. Framework support in SageMaker is achieved using prebuilt Docker containers for inference and training tasks. Prebuilt SageMaker containers provide a great deal of functionality, and they allow you to implement a wide range of use cases with minimal coding. There are also real-life scenarios where you need a custom runtime environment for training and/or inference tasks. To address these cases, SageMaker provides a flexible Bring-Your-Own (BYO) container feature.
In this chapter, we will review key supported DL frameworks and corresponding container images. Then, we will focus our attention on the two most popular DL frameworks, TensorFlow and PyTorch, and learn how to use them in Amazon SageMaker. Additionally, we will review a higher-level, state-of-the-art framework, Hugging Face, for NLP tasks, and its implementation for Amazon SageMaker.
Then, we will learn how to use and extend prebuilt SageMaker containers based on your use case requirements, and review the SageMaker SDK and toolkits, which simplify writing training and inference scripts that are compatible with Amazon SageMaker.
In later sections, we will dive deeper into how to decide whether to use prebuilt SageMaker containers or BYO containers. Then, we will develop a SageMaker-compatible BYO container.
These topics will be covered in the following sections:
By the end of this chapter, you will be able to decide which container strategy to choose based on your specific problem requirements and chosen DL framework. Additionally, you will understand the key aspects of developing training and inference scripts that are compatible with Amazon SageMaker.
In the Using SageMaker DL containers and Developing BYO containers sections, we will provide walk-through code samples, so you can develop practical skills. Full code examples are available at https://github.com/PacktPublishing/Accelerate-Deep-Learning-Workloads-with-Amazon-SageMaker/blob/main/chapter2/.
To follow along with this code, you will need the following:
At the time of writing this book, Amazon SageMaker supports the following frameworks, where DL frameworks are marked with an asterisk:
The preceding list of supported frameworks could change in the future. Be sure to check the official SageMaker documentation at https://docs.aws.amazon.com/sagemaker/latest/dg/frameworks.html.
In this book, we will primarily focus on the two most popular choices: TensorFlow and PyTorch. Both are open source frameworks with large and vibrant communities. Depending on the specific use case or model architecture, one framework or the other might have a slight advantage. However, it’s safe to assume that both frameworks are comparable in terms of features and performance. In many practical scenarios, the choice between TensorFlow and PyTorch is made based on historical precedent or individual preference.
Another framework that we will discuss in this book is Hugging Face. This is a high-level framework that provides access to SOTA models, training, and inference facilities for NLP tasks (such as text classification, translation, and more). Hugging Face is a set of several libraries (transformers, datasets, tokenizers, and accelerate) designed to simplify building SOTA NLP models. Under the hood, Hugging Face libraries use TensorFlow and PyTorch primitives (collectively known as “backends”) to perform computations. Users can choose which backend to use based on specific runtime requirements. Given its popularity, Amazon SageMaker has recently added support for the Hugging Face libraries in separate prebuilt containers for training and inference tasks.
Container sources
Sources of SageMaker DL containers are available in the public GitHub repository at https://github.com/aws/deep-learning-containers. In certain cases, it can be helpful to review the relevant Dockerfiles to understand the runtime configuration of prebuilt containers. The list of container images available in AWS public registries can be found at https://github.com/aws/deep-learning-containers/blob/master/available_images.md.
For each of the supported frameworks, SageMaker provides separate training and inference containers. We have separate containers for these two tasks because of the following considerations:
For this reason, we will always explicitly identify the container we are using depending on the specific task.
Specific to DL containers, AWS also provides separate GPU-based and CPU-based containers. GPU-based containers include additional libraries (such as the CUDA toolkit) that are required to run computations on GPU devices.
Model requirements
When choosing a SageMaker DL container, always consider the model requirements for compute resources. For the majority of SOTA models, it’s recommended that you use GPU-based compute instances to achieve acceptable performance. Choose your DL container accordingly.
TensorFlow containers come in two major versions: 1.x (maintenance mode) and 2.x (the latest version). Amazon SageMaker supports both versions and provides inference and training containers for each. In this book, all of the code examples and general commentary assume TensorFlow v2.x.
AWS frequently updates the containers with newly supported minor TensorFlow versions. At the time of writing, the latest supported version is 2.10.0.
Amazon SageMaker provides inference and training containers for PyTorch. At the time of writing, the latest supported version is 1.12.1.
AWS provides Hugging Face containers in two flavors: PyTorch and TensorFlow backends. Each backend has separate training and inference containers.
AWS provides a convenient Python SDK that simplifies interactions with supported DL frameworks via the Estimator, Model, and Predictor classes. Each supported framework has a separate module with the implementation of the respective classes. For example, here is how you import the Estimator, Model, and Predictor classes for the PyTorch framework:
from sagemaker.pytorch.estimator import PyTorch
from sagemaker.pytorch.model import PyTorchModel, PyTorchPredictor
The following diagram shows the SageMaker Python SDK workflow:
Figure 2.1 – How SageMaker Python SDK works with image URIs
To build better intuition, let’s walk through a quick example of running a training job on a specific version of the PyTorch container using the SageMaker Python SDK. For a visual overview, please refer to Figure 2.1:
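from sagemaker import get_execution_role
from sagemaker.pytorch.estimator import PyTorch
role = get_execution_role()  # IAM role that the training job will assume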
When instantiating the PyTorch estimator object, we need to provide several more parameters including the framework version and the Python version:
estimator = PyTorch(
entry_point="training_script.py",
framework_version="1.8",
py_version="py3",
role=role,
instance_count=1,
instance_type="ml.p2.xlarge"
)
estimator.fit()
Using custom images
Please note that if, for some reason, you would prefer to provide a direct URI to your container image, you can do it using the image_uri parameter that is supported by the model and estimator classes.
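To make the image URI more concrete, here is a minimal sketch of how such a URI is put together. The build_dlc_image_uri() helper is hypothetical and only illustrates the layout <account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>, using the PyTorch training image quoted later in this chapter as the reference; the registry account ID may differ in some AWS Regions. In practice, you would let the SageMaker Python SDK resolve the URI (for example, with sagemaker.image_uris.retrieve()) rather than assembling it by hand.

```python
# Hypothetical helper: illustrates the layout of a SageMaker DL container
# image URI. It is not part of the SageMaker SDK; prefer
# sagemaker.image_uris.retrieve() in real projects.
def build_dlc_image_uri(
    account: str,
    region: str,
    repository: str,
    framework_version: str,
    processor: str,
    py_version: str,
    cuda_version: str = "",
    os_version: str = "",
) -> str:
    # Tag layout modeled on published tags such as
    # 1.9.0-gpu-py38-cu111-ubuntu20.04 (the CUDA part applies to GPU images)
    tag_parts = [framework_version, processor, py_version]
    for optional_part in (cuda_version, os_version):
        if optional_part:
            tag_parts.append(optional_part)
    registry = f"{account}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repository}:{'-'.join(tag_parts)}"


uri = build_dlc_image_uri(
    account="763104351884",
    region="us-east-1",
    repository="pytorch-training",
    framework_version="1.9.0",
    processor="gpu",
    py_version="py38",
    cuda_version="cu111",
    os_version="ubuntu20.04",
)
print(uri)
```

The resulting string is the kind of value you would pass as the image_uri parameter of an estimator or model.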
Now, let’s take a deep dive into SageMaker DL containers, starting with the available prebuilt containers for the TensorFlow, PyTorch, and Hugging Face frameworks.
Amazon SageMaker supports several container usage patterns. Also, it provides you with Training and Inference Toolkits that simplify using prebuilt containers and developing BYO containers.
In this section, we will learn how to choose the most efficient container usage pattern for your use case and how to use the available SageMaker toolkits to implement it.
Amazon SageMaker provides you with the flexibility to choose whether to use prebuilt containers “as is” (known as Script Mode), BYO containers, or modify prebuilt containers.
Typically, the choice of approach is driven by specific model runtime requirements, available resources, and engineering expertise. In the next few subsections, we will discuss when to choose one approach over another.
In script mode, you define which prebuilt container you’d like to use and then provide one or more scripts with the implementation of your training or inference logic. Additionally, you can provide any other dependencies (proprietary or public) that will be copied into the containers.
Both training and inference containers in script mode come with preinstalled toolkits that provide common functionality, such as downloading data and model artifacts to the containers, starting jobs, and others. We will look at the SageMaker Inference Toolkit and Training Toolkit in more detail later in this chapter.
Script Mode is suitable for the following scenarios:
In the following sections, we will review how to prepare your first training and inference scripts and run them on SageMaker in script mode.
Another way to use SageMaker’s prebuilt containers is to modify them. In this case, you will use one of the prebuilt containers as a base image for your custom container.
Modifying prebuilt containers can be beneficial in the following scenarios:
Please note that when you extend a prebuilt container, you will be responsible for the following aspects:
Later in this chapter, we will see an example of how to extend a prebuilt PyTorch container for a training task.
There are many scenarios in which you might need to create a custom container, such as the following:
Building a custom container compatible with SageMaker inference and training resources requires development effort, an understanding of Docker containers, and knowledge of specific SageMaker requirements. Therefore, it’s usually recommended that you consider script mode or extending a prebuilt container first, and choose a BYO container only if neither of these options works for your particular use case.
To simplify the development of custom scripts and containers that are compatible with Amazon SageMaker, AWS created Python toolkits for training and inference tasks.
Toolkits provide the following benefits:
Understanding these toolkits helps to simplify and speed up the development of SageMaker-compatible containers, so let’s review them in detail.
The SageMaker Training Toolkit has several key functions:
Figure 2.2 – The directory structure in SageMaker-compatible containers
The Training Toolkit sets up the following directories in the training container:
All prebuilt training containers already have the Training Toolkit installed. If you wish to use it in your own BYO container, you will need to install it by adding the following instruction to your Dockerfile:
RUN pip install sagemaker-training
Also, you will need to copy all of your code dependencies into the container and define a special environment variable, SAGEMAKER_PROGRAM, that points to your main training script, as follows:
COPY train_script.py /opt/ml/code/train_script.py
ENV SAGEMAKER_PROGRAM train_script.py
The Training Toolkit is available as a package on PyPI (pypi.org) and in the SageMaker GitHub repository (https://github.com/aws/sagemaker-training-toolkit).
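As a rough illustration of what this looks like from inside a training script, the following sketch reads the locations that the Training Toolkit exposes through environment variables (SM_MODEL_DIR, SM_OUTPUT_DATA_DIR, SM_CHANNEL_<NAME>, and SM_HPS) and falls back to the default /opt/ml paths when a variable is absent, for example, during a local test. The helper names resolve_sagemaker_paths() and read_hyperparameters() are hypothetical, and the fake_env dictionary stands in for a real container environment.

```python
import json
import os
from typing import Dict, Iterable, Mapping, Optional


def resolve_sagemaker_paths(
    channels: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Collect the directories a training script usually needs."""
    env = os.environ if env is None else env
    paths = {
        # Files saved here are packaged and uploaded to S3 as the model artifact
        "model_dir": env.get("SM_MODEL_DIR", "/opt/ml/model"),
        # Non-model outputs, such as evaluation results
        "output_data_dir": env.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data"),
    }
    for name in channels:
        # Each input channel ("train", "test", ...) is downloaded to its own folder
        paths[name] = env.get(
            f"SM_CHANNEL_{name.upper()}", f"/opt/ml/input/data/{name}"
        )
    return paths


def read_hyperparameters(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return all hyperparameters, which SM_HPS stores as one JSON document."""
    env = os.environ if env is None else env
    return json.loads(env.get("SM_HPS", "{}"))


# Simulated environment for a local run; a real job sets these variables itself
fake_env = {"SM_CHANNEL_TRAIN": "/tmp/train", "SM_HPS": '{"epochs": 1}'}
print(resolve_sagemaker_paths(["train", "test"], env=fake_env))
print(read_hyperparameters(fake_env))
```

Passing env explicitly keeps the helpers easy to test; inside a real training container you would simply call them without it and let them read os.environ.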
The Inference Toolkit implements a model serving stack that is compatible with SageMaker inference services. It comes together with an open source Multi-Model Server (MMS) to serve models. It has the following key functions:
The Inference Toolkit is available as a package on PyPI (pypi.org) and in the GitHub repository (https://github.com/aws/sagemaker-inference-toolkit).
Now that we have an understanding of SageMaker’s container ecosystem, let’s implement several learning projects to build practical skills. In this first example, we will use SageMaker script mode to train our custom NLP model and deploy it for inference.
In this example, we will learn how to develop training and inference scripts using the Hugging Face framework. We will leverage prebuilt SageMaker containers for Hugging Face (with the PyTorch backend).
We chose to solve a typical NLP task: text classification. We will use the 20 Newsgroups dataset, which contains approximately 20,000 newsgroup documents across 20 different newsgroups (categories). There are a number of model architectures that can address this task. Current SOTA models are usually based on the Transformer architecture. Encoder-based (autoencoding) models such as BERT and its various derivatives are well suited to this task. We will use a concept known as transfer learning, where a model that is pretrained for one task is adapted to a new task with minimal modifications.
As a baseline model, we will use a model architecture known as DistilBERT, which provides high accuracy on a wide variety of tasks and is considerably smaller than other models (for instance, the original BERT model). To adapt the model for a classification task, we need to add a classification layer, which will be trained to classify articles into categories:
Figure 2.3 – The model architecture for the text classification task
The Hugging Face Transformers library simplifies model selection and modification for fine-tuning in the following ways:
The full source code of this learning project is available at https://github.com/PacktPublishing/Accelerate-Deep-Learning-Workloads-with-Amazon-SageMaker/blob/main/chapter2/1_Using_SageMaker_Script_Mode.ipynb.
When running SageMaker training jobs, we need to provide a training script. Additionally, we might provide any other dependencies. We can also install or modify Python packages that are installed on prebuilt containers via the requirements.txt file.
In this example, we will use a new feature of the Hugging Face framework to fine-tune a multicategory classifier using the Hugging Face Trainer API. Let’s make sure that the training container has a newer version of the Hugging Face Transformers library installed. For this, we create the requirements.txt file and specify the minimal compatible version. Later, we will provide this file to our SageMaker training job:
transformers >= 4.10
Next, we need to develop the training script. Let’s review some key components of it.
At training time, SageMaker starts training by calling user_training_script --arg1 value1 --arg2 value2 .... Here, arg1..N are training hyperparameters and other miscellaneous parameters provided by users as part of the training job configuration. To correctly kick off the training process, we need to include a main guard in our script:
import argparse
# In train.py, this guard sits at the end of the file, after train() is defined
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--per-device-train-batch-size", type=int, default=16)
    parser.add_argument("--per-device-eval-batch-size", type=int, default=64)
    parser.add_argument("--warmup-steps", type=int, default=100)
    parser.add_argument("--logging-steps", type=int, default=100)
    parser.add_argument("--weight-decay", type=float, default=0.01)
    # parse_known_args() ignores any extra arguments the script doesn't declare
    args, _ = parser.parse_known_args()
    train(args)
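To see what SageMaker’s --name value convention means for this parser, the sketch below feeds it a simulated command line instead of the real sys.argv. Two details are worth noting: argparse converts hyphenated flags such as --per-device-train-batch-size into attribute names with underscores, and parse_known_args() returns any undeclared arguments separately instead of failing. The build_parser() wrapper and the --some-extra-flag argument are illustrative only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Same flags as the main guard in the training script
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    parser.add_argument("--per-device-train-batch-size", type=int, default=16)
    parser.add_argument("--per-device-eval-batch-size", type=int, default=64)
    parser.add_argument("--warmup-steps", type=int, default=100)
    parser.add_argument("--logging-steps", type=int, default=100)
    parser.add_argument("--weight-decay", type=float, default=0.01)
    return parser


# Simulated "--name value" command line; "--some-extra-flag" stands in for
# any argument that the script does not declare
simulated_argv = [
    "--epochs", "3",
    "--per-device-train-batch-size", "32",
    "--some-extra-flag", "ignored",
]
args, unknown = build_parser().parse_known_args(simulated_argv)
# Hyphens become underscores; unspecified flags keep their defaults
print(args.epochs, args.per_device_train_batch_size, args.weight_decay)
print(unknown)
```

Using parse_known_args() rather than parse_args() is a defensive choice: if the job passes an argument the script doesn’t know about, training still starts instead of exiting with a usage error.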
import os
from transformers import (
    DistilBertConfig,
    DistilBertForSequenceClassification,
    Trainer,
    TrainingArguments,
)
# MODEL_NAME and NUM_LABELS are module-level constants, and _get_tokenized_data()
# is a helper defined elsewhere in the script (see the chapter repository)
def train(args):
    train_enc_dataset, test_enc_dataset = _get_tokenized_data()
    training_args = TrainingArguments(
        # Write outputs to SM_OUTPUT_DIR on SageMaker, or to ./ when run locally
        output_dir=os.getenv("SM_OUTPUT_DIR", "./"),
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.per_device_train_batch_size,
        per_device_eval_batch_size=args.per_device_eval_batch_size,
        warmup_steps=args.warmup_steps,
        weight_decay=args.weight_decay,
        logging_steps=args.logging_steps,
    )
    config = DistilBertConfig()
    config.num_labels = NUM_LABELS
    model = DistilBertForSequenceClassification.from_pretrained(
        MODEL_NAME, config=config
    )
    trainer = Trainer(
        model=model,  # model to be trained
        args=training_args,  # training arguments, defined above
        train_dataset=train_enc_dataset,  # training dataset
        eval_dataset=test_enc_dataset,  # evaluation dataset
    )
    trainer.train()
    # Save to SM_MODEL_DIR so that SageMaker uploads the model to Amazon S3
    model.save_pretrained(os.environ["SM_MODEL_DIR"])
So far in our script, we have covered key aspects: handling configuration settings and model hyperparameters, loading pretrained models, and starting training using the Hugging Face Trainer API.
Once we have our training script and dependencies ready, we can proceed with the training and schedule a training job via SageMaker Python SDK. We start with the import of the Hugging Face Estimator object and get the IAM execution role for our training job:
from sagemaker.huggingface.estimator import HuggingFace
from sagemaker import get_execution_role
role = get_execution_role()
Next, we need to define the hyperparameters of our model and training processes. These variables will be passed to our script at training time:
hyperparameters = {
    "epochs": 1,
    "per-device-train-batch-size": 16,
    "per-device-eval-batch-size": 64,
    "warmup-steps": 100,
    "logging-steps": 100,
    "weight-decay": 0.01,
}
estimator = HuggingFace(
    py_version="py36",
    entry_point="train.py",
    source_dir="1_sources",
    pytorch_version="1.7.1",
    transformers_version="4.6.1",
    hyperparameters=hyperparameters,
    instance_type="ml.p2.xlarge",
    instance_count=1,
    role=role,
)
estimator.fit({
    "train": train_dataset_uri,
    "test": test_dataset_uri,
})
After that, the training job will be scheduled and executed. It takes about 10–15 minutes to complete, after which the trained model and other output artifacts are uploaded to Amazon S3.
Now that we have a trained model, let’s deploy it as a SageMaker real-time endpoint. We will use the prebuilt SageMaker Hugging Face container and will only provide our inference script. The inference requests will be handled by the AWS MMS, which exposes the HTTP endpoint.
When using prebuilt inference containers, SageMaker automatically recognizes our inference script. According to SageMaker convention, the inference script has to contain the following methods:
import json
import torch
from transformers import (
    DistilBertConfig,
    DistilBertForSequenceClassification,
    DistilBertTokenizerFast,
    pipeline,
)
MODEL_NAME = "distilbert-base-uncased"
NUM_LABELS = 6  # number of categories
MAX_LENGTH = 512  # max number of tokens model can handle
def model_fn(model_dir):
    # Use the first GPU if available (device=0); otherwise run on CPU (device=-1)
    device_id = 0 if torch.cuda.is_available() else -1
    tokenizer = DistilBertTokenizerFast.from_pretrained(MODEL_NAME)
    config = DistilBertConfig()
    config.num_labels = NUM_LABELS
    # Load the fine-tuned weights that SageMaker unpacked into model_dir
    model = DistilBertForSequenceClassification.from_pretrained(
        model_dir, config=config
    )
    inference_pipeline = pipeline(
        model=model,
        task="text-classification",
        tokenizer=tokenizer,
        framework="pt",
        device=device_id,
        max_length=MAX_LENGTH,
        truncation=True,
    )
    return inference_pipeline
def transform_fn(inference_pipeline, data, content_type, accept_type):
    # Deserialize payload
    if "json" in content_type:
        deser_data = json.loads(data)
    else:
        raise NotImplementedError("Only 'application/json' content type is implemented.")
    # Run inference
    predictions = inference_pipeline(deser_data)
    # Serialize response
    if "json" in accept_type:
        return json.dumps(predictions)
    else:
        raise NotImplementedError("Only 'application/json' accept type is implemented.")
Sometimes, combining deserialization, inference, and serialization in a single method can be inconvenient. Alternatively, SageMaker supports a more granular API:
Note that the transform_fn() method is mutually exclusive with the input_fn(), predict_fn(), and output_fn() methods.
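A minimal sketch of the granular variant follows. It splits the work of transform_fn() into input_fn(), predict_fn(), and output_fn(), and exercises them with fake_pipeline(), a hypothetical stand-in for the Hugging Face pipeline returned by model_fn(), so that the data flow can be checked without loading a model. The JSON handling mirrors the transform_fn() example; the argument names are our own choice.

```python
import json


def input_fn(request_body, request_content_type):
    """Deserialize the request payload."""
    if "json" in request_content_type:
        return json.loads(request_body)
    raise NotImplementedError("Only 'application/json' content type is implemented.")


def predict_fn(input_data, model):
    """Run inference; `model` is whatever model_fn() returned."""
    return model(input_data)


def output_fn(prediction, accept):
    """Serialize predictions for the HTTP response."""
    if "json" in accept:
        return json.dumps(prediction)
    raise NotImplementedError("Only 'application/json' accept type is implemented.")


def fake_pipeline(texts):
    # Hypothetical stand-in for the text-classification pipeline:
    # returns one fixed prediction per input text
    return [{"label": "LABEL_0", "score": 0.9} for _ in texts]


payload = json.dumps(["My GPU driver keeps crashing", "Who won the game last night?"])
data = input_fn(payload, "application/json")
response = output_fn(predict_fn(data, fake_pipeline), "application/json")
print(response)
```

Splitting the stages this way lets you reuse, for example, a custom input_fn() with different models, at the cost of defining three functions instead of one.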
Now we are ready to deploy and test our Newsgroup Classification endpoint. We can use the estimator.create_model() method to configure our model deployment parameters, specifically the following:
from sagemaker.huggingface import HuggingFaceModel
model = estimator.create_model(role=role,
entry_point="inference.py",
source_dir="1_sources",
py_version="py36",
transformers_version="4.6.1",
pytorch_version="1.7.1"
)
predictor = model.deploy(
initial_instance_count=1,
instance_type="ml.m5.xlarge"
)
Next, let’s explore how to extend prebuilt DL containers.
We will reuse code assets from the script mode example. However, unlike the previous example, we will modify our runtime environment and install the latest development version of the Hugging Face Transformers library directly from its GitHub repository. This modification will be implemented in our custom container image.
First off, we need to identify which base image we will use. AWS has published all of the available DL containers at https://github.com/aws/deep-learning-containers/blob/master/available_images.md.
Since we plan to reinstall the Hugging Face Transformers library from scratch anyway, we can choose the PyTorch base image. At the time of writing, the latest PyTorch SageMaker training container was 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.9.0-gpu-py38-cu111-ubuntu20.04. Note that this container URI is for the us-east-1 AWS Region and will be different for other AWS Regions. Please consult the list of available images referenced above for the correct URI for your Region.
To build a new container, we will need to perform the following steps:
First, let’s create a Dockerfile for our extended container.
To extend the prebuilt SageMaker container, we need to have at least the following components:
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.9.0-gpu-py38-cu111-ubuntu20.04
RUN pip3 install git+https://github.com/huggingface/transformers
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/code
ENV SAGEMAKER_PROGRAM train.py
COPY 1_sources/train.py $SAGEMAKER_SUBMIT_DIRECTORY/$SAGEMAKER_PROGRAM
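Conceptually, the two ENV instructions tell the Training Toolkit which script to run and where it lives. The following simplified sketch, built around a hypothetical build_training_command() helper, approximates how these variables and the job’s hyperparameters turn into a command line of the --name value form described in the script mode example. The real toolkit does considerably more than this, so treat the sketch only as a mental model.

```python
import posixpath
import sys
from typing import List, Mapping


def build_training_command(
    hyperparameters: Mapping[str, object], env: Mapping[str, str]
) -> List[str]:
    """Simplified model of how the entry point gets launched (hypothetical)."""
    # Container paths are POSIX paths, regardless of the host OS
    script = posixpath.join(
        env.get("SAGEMAKER_SUBMIT_DIRECTORY", "/opt/ml/code"),
        env["SAGEMAKER_PROGRAM"],
    )
    command = [sys.executable, script]
    # Every hyperparameter becomes a "--name value" pair, as the script expects
    for name, value in hyperparameters.items():
        command.extend([f"--{name}", str(value)])
    return command


container_env = {
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/code",
    "SAGEMAKER_PROGRAM": "train.py",
}
print(build_training_command({"epochs": 1, "weight-decay": 0.01}, container_env))
```

Because the script name comes from the environment rather than from the job configuration, the same extended image can be reused across jobs with different hyperparameters.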
Now we are ready to build and push this container image to ECR. You can find the bash script to do this in the chapter repository.
Once we have our extended PyTorch container in ECR, we are ready to execute a SageMaker training job. The training job configuration will be similar to the script mode example, with one notable difference: instead of the HuggingFace estimator object, we will use a generic SageMaker Estimator object that allows us to work with custom images. Note that you need to update the image_uri parameter with the URI of the image in your ECR repository. You can find it by navigating to the ECR service in the AWS Console and locating the extended container there:
from sagemaker.estimator import Estimator
estimator = Estimator(
    image_uri="<UPDATE WITH YOUR IMAGE URI FROM ECR>",
    hyperparameters=hyperparameters,
    instance_type="ml.p2.xlarge",
    instance_count=1,
    role=role,
)
estimator.fit({
    "train": train_dataset_uri,
    "test": test_dataset_uri,
})
After completing the training job, we should expect similar training outcomes as those shown in the script mode example.
In this section, we will learn how to build a SageMaker-compatible inference container using an official TensorFlow image, prepare an inference script and model server, and deploy it for inference on SageMaker Hosting.
We will develop a SageMaker-compatible container for inference. We will use the latest official TensorFlow container as a base image and AWS MMS as a model server. Please note that MMS is one of many ML model serving options that can be used. SageMaker doesn’t impose any restrictions on the model server other than that it must listen on port 8080 and respond to /ping (health check) and /invocations (inference) requests.
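To make this contract concrete, the following sketch uses only Python’s standard http.server module (not MMS) to answer GET /ping and POST /invocations. It is a toy: the class and function names are hypothetical, and the placeholder handler reports the payload size instead of running a model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class MinimalSageMakerHandler(BaseHTTPRequestHandler):
    """Toy model server that honors SageMaker's /ping and /invocations routes."""

    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Health check: SageMaker expects HTTP 200 once the container is ready
        if self.path == "/ping":
            self._send_json(200, {"status": "healthy"})
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        if self.path != "/invocations":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = self.rfile.read(length)
        # Placeholder "model": report the payload size instead of predicting
        self._send_json(200, {"received_bytes": len(payload)})

    def log_message(self, *args):
        pass  # silence per-request logging in this example


def make_server(port=8080):
    # Bind on all interfaces so that requests from outside the container arrive
    return HTTPServer(("", port), MinimalSageMakerHandler)
```

Inside a container, make_server().serve_forever() would listen on port 8080; passing port=0 lets the operating system pick a free port, which is convenient for local testing. A production model server such as MMS adds worker management, batching, and model loading on top of this basic contract.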
When deploying a serving container to the endpoint, SageMaker runs the following command:
docker run <YOUR BYO IMAGE> serve
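To handle this command, it’s recommended that you use the exec form of the ENTRYPOINT instruction in your Dockerfile, so that the serve argument is passed directly to your entrypoint executable. With the shell form, command-line arguments such as serve are ignored.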
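The following sketch shows what the exec form buys us. With an ENTRYPOINT such as ["python3", "/usr/local/bin/dockerd-entrypoint.py"], the command docker run <image> serve runs the entrypoint script with serve as its first argument, and if the Dockerfile also sets CMD ["serve"], the same argument is supplied by default. The main() function and its start_model_server callback are hypothetical; in a real image, the callback would start the model server.

```python
import sys
from typing import Callable, List


def main(argv: List[str], start_model_server: Callable[[], None]) -> int:
    """Hypothetical entrypoint logic for a BYO serving container.

    argv holds the arguments after the script name, so for
    `docker run <image> serve` it is ["serve"].
    """
    if argv and argv[0] == "serve":
        start_model_server()  # in a real image, this would launch MMS
        return 0
    # This sketch rejects anything else; a real entrypoint could instead run
    # the given arguments as a command
    print(f"Unsupported command: {argv}", file=sys.stderr)
    return 1


started = []
exit_code = main(["serve"], start_model_server=lambda: started.append(True))
print(exit_code, started)  # 0 [True]
```

In an actual entrypoint script, you would call main(sys.argv[1:], ...) and pass its return value to sys.exit().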
Let’s review our BYO Dockerfile:
Now, let’s put it into action:
FROM tensorflow/tensorflow:latest
COPY 3_sources/src/dockerd_entrypoint.py /usr/local/bin/dockerd-entrypoint.py
RUN chmod +x /usr/local/bin/dockerd-entrypoint.py
COPY 3_sources/src/model_handler.py /opt/ml/model/model_handler.py
COPY 3_sources/src/keras_model_loader.py /opt/ml/model/keras_model_loader.py
ENTRYPOINT ["python3", "/usr/local/bin/dockerd-entrypoint.py"]
CMD ["serve"]
In this example, we don’t intend to cover MMS and the development of inference scripts in detail. However, it’s worth highlighting some key script aspects:
To schedule the deployment of the inference endpoint, we use the generic Model class from SageMaker Python SDK. Note that since we downloaded the model from a public model zoo, we don’t need to provide a model_data parameter (hence, its value is None):
from sagemaker import Model
mms_model = Model(
    image_uri=image_uri,
    model_data=None,
    role=role,
    name=model_name,
    sagemaker_session=session,
)
mms_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name=endpoint_name,
)
It might take several minutes to fully deploy the endpoint and start the model server. Once it’s ready, we can call the endpoint using the boto3 sagemaker-runtime client, which allows you to construct the HTTP request and send the inference payload (an image, in our case) to a specific SageMaker endpoint:
import boto3
client = boto3.client("sagemaker-runtime")
accept_type = "application/json"
content_type = "image/jpeg"
# Read the test image as raw bytes to send as the request body
with open(test_image, "rb") as f:
    payload = f.read()
response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=payload,
    ContentType=content_type,
    Accept=accept_type,
)
most_likely_label = response["Body"].read()
print(most_likely_label)
This code returns the label of the most likely object in the image, based on the model’s predictions.
In this chapter, we reviewed how SageMaker provides support for ML and DL frameworks using Docker containers. After reading this chapter, you should now know how to select the most appropriate DL container usage pattern according to your specific use case requirements. We learned about SageMaker toolkits, which simplify developing SageMaker-compatible containers. In later sections, you gained practical knowledge of how to develop custom containers and scripts for training and inference tasks on Amazon SageMaker.
In the next chapter, we will learn about the SageMaker development environment and how to efficiently develop and troubleshoot your DL code. Additionally, we will learn about DL-specific tools and interfaces that the SageMaker development environment provides to simplify the building, deploying, and monitoring of your DL models.