Using Amazon SageMaker for machine learning

Typically, the machine learning process comprises the following steps:

  • Data wrangling: This involves setting up and managing notebook environments so that data can be brought into notebooks securely.
  • Experimentation: This involves setting up and managing training clusters so that ML algorithms can be scaled and distributed.
  • Deployment: This involves setting up and managing inference clusters, along with managing and autoscaling inference APIs with testing, versioning, and monitoring.

The preceding cycle can take anywhere from 6 to 18 months. However, given the potential of machine learning and AI to empower and improve businesses, the effort is often worthwhile. The challenges in large-scale machine learning include storing and processing large volumes of data, model selection, production deployments, and model updates. Hyperparameter tuning is expensive, and incremental training is a problem because models are repeatedly retrained on data they have already seen. As the investment required grows with increasing data and model size, businesses tend to limit the amount of data used for training, and older data is incrementally dropped from the training set over time; this unused data represents a wasted opportunity.

Amazon SageMaker aims to make the machine learning process easier. It is a managed service that provides the quickest and easiest way for data scientists and developers to get ML models from idea to production. It provides an end-to-end machine learning platform by supporting data exploration, model training, and hosting with minimal setup, and you pay by the second.
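
For instance, the later snippets in this section assume a session and execution role created with the SageMaker Python SDK, along the lines of this minimal sketch (get_execution_role() resolves the notebook's IAM role when run inside SageMaker; elsewhere you would supply a role ARN of your own, shown here as a placeholder):

    import sagemaker

    # Create a session that wraps the AWS API calls SageMaker makes on your behalf
    sess = sagemaker.Session()

    # Inside a SageMaker notebook this returns the notebook's IAM role;
    # outside a notebook, replace it with your own IAM role ARN
    role = sagemaker.get_execution_role()
    # role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder ARN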

Amazon SageMaker requires minimal setup for data exploration, and its environments can be resized as your needs change. Common tools come pre-installed, with easy access to your data sources. There are no servers to manage, and a modular architecture lets you use only what you need. It follows a pay-as-you-go model, and a free trial is available so that you can get started quickly.
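
As an illustration of how little setup is involved, a managed notebook environment can be created with a single boto3 API call; the instance name and role ARN below are placeholders:

    import boto3

    sm = boto3.client("sagemaker")

    # Launch a managed Jupyter notebook instance; SageMaker provisions the
    # server and pre-installs common data science tools
    sm.create_notebook_instance(
        NotebookInstanceName="my-exploration-notebook",  # placeholder name
        InstanceType="ml.t3.medium",                     # resize later as needed
        RoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder
    )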

Amazon-optimized algorithms can be applied for distributed training through the AWS SDK, Apache Spark SageMaker Estimators, deep learning scripts (using TensorFlow and Gluon), or your own custom algorithm Docker image. These algorithms are designed for huge datasets: they can stream data for cheaper training, train faster in a single pass, and remain reliable on extremely large datasets.
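
The following is a sketch of distributed training with a built-in algorithm through the SDK; the S3 paths are placeholders, and the XGBoost container version is only an example:

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    sess = sagemaker.Session()
    role = sagemaker.get_execution_role()

    # Look up the container image for the built-in XGBoost algorithm
    image_uri = sagemaker.image_uris.retrieve(
        "xgboost", sess.boto_region_name, version="1.5-1"
    )

    xgb = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=2,                 # distribute training over two instances
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/xgb-output",  # placeholder bucket
        sagemaker_session=sess,
    )
    xgb.set_hyperparameters(objective="binary:logistic", num_round=100)

    # Stream the training data from S3 (placeholder prefix)
    xgb.fit({"train": TrainingInput("s3://my-bucket/train/", content_type="text/csv")})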

There is a choice of several ML algorithms: XGBoost, Factorization Machines, and Linear Learner for classification and regression; K-means and PCA for clustering and dimensionality reduction; convolutional neural networks for image classification; LDA and NTM for topic modeling; and seq2seq for translation, with more algorithms to follow.
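
For example, the first-party K-means estimator can be trained directly from a NumPy array; the data here is synthetic and the bucket path is a placeholder:

    import numpy as np
    import sagemaker
    from sagemaker import KMeans

    role = sagemaker.get_execution_role()

    # Synthetic data: 1,000 samples with 16 features, as float32
    train_array = np.random.rand(1000, 16).astype("float32")

    kmeans = KMeans(
        role=role,
        instance_count=1,
        instance_type="ml.c5.xlarge",
        k=10,                                        # number of clusters
        output_path="s3://my-bucket/kmeans-output",  # placeholder bucket
    )

    # record_set() uploads the array to S3 in the RecordIO-protobuf format
    # that the built-in algorithms expect
    kmeans.fit(kmeans.record_set(train_array))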

Amazon SageMaker simplifies a wide range of machine learning tasks, from managing notebook environments to easy data exploration in Jupyter notebooks. It also supports one-step deployment to quickly move ML models into production, providing a low-latency, high-throughput, and highly reliable service.
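
One-step deployment means a trained estimator can be turned into a managed HTTPS endpoint with a single call; continuing the hypothetical XGBoost estimator from the earlier sketch:

    from sagemaker.serializers import CSVSerializer

    # Create a real-time inference endpoint from the trained estimator
    predictor = xgb.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",
        serializer=CSVSerializer(),  # send requests as CSV
    )

    # Score a single record (feature values are illustrative)
    result = predictor.predict([0.5, 1.2, 3.4])

    # Tear the endpoint down when finished to stop incurring charges
    predictor.delete_endpoint()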

You can get started using Amazon SageMaker with notebook samples. You will need to modify the code to access your own data sources. 
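
One common modification is pointing the sample code at your own data. As a sketch, Session.upload_data() copies a local file to S3 (the session's default bucket, unless you pass one) and returns an S3 URI you can plug into a training job; the file path and prefix below are placeholders:

    import sagemaker

    sess = sagemaker.Session()

    # Upload a local file to the session's default S3 bucket and get back
    # an s3:// URI suitable for an estimator's fit() call
    s3_uri = sess.upload_data(
        path="local_data/train.csv",      # placeholder local path
        key_prefix="my-experiment/data",  # placeholder S3 prefix
    )
    print(s3_uri)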
