Chapter 9
Network Performance

The AWS Certified Advanced Networking – Specialty Exam objectives covered in this chapter may include, but are not limited to, the following:

  • Domain 2.0: Design and Implement AWS Networks
  • 2.4 Determine network requirements for a specialized workload
  • 2.5 Derive an appropriate architecture based on customer and application requirements

Modern applications depend on the network for communication between services or between other application components. Since the network connects all application components, it can have major impacts, both positive and negative, on application performance and behavior. There are also applications that are heavily dependent on network performance, such as High Performance Computing (HPC), where deep network understanding is important to increasing cluster performance.

This chapter focuses on network performance characteristics that are important to applications, examples of network-dependent applications, options to improve network performance, and how to optimize network performance on Amazon Elastic Compute Cloud (Amazon EC2).

Network Performance Basics

It’s common for end users and developers simply to describe their network performance as fast or slow. The user experience of the network is a combination of several networking aspects that likely span multiple different networks and applications. For example, a user at a coffee shop accessing a website hosted on Amazon EC2 must pass through the local coffee shop wireless network, the service provider network, and the AWS network. The website application likely also has multiple dependencies within AWS. In this context, it’s important to separate network performance into multiple, more accurate, and measurable terms.

Bandwidth

Bandwidth is the maximum rate of transfer over the network, typically defined in bits per second (abbreviated bps), with Mbps denoting one million bits per second and Gbps one billion bits per second. Bandwidth defines the maximum possible transfer rate, but the actual user or application transfer rate will also be affected by latency, protocol behavior, and packet loss. In comparison, throughput is the successful transfer rate over the network.

Latency

Latency is the delay between two points in a network. Latency can be measured in one-way delay or Round-Trip Time (RTT) between two points. Ping is a common way to test RTT delay. Delays include propagation delays for signals to travel across different mediums such as copper or fiber optics, often at speeds close to the speed of light. There are also processing delays for packets to move through physical or virtual network devices, such as the Amazon Virtual Private Cloud (Amazon VPC) virtual router. Network drivers and operating systems can be optimized to minimize processing latency on the host system as well.
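If ICMP is blocked by a security group, you can still approximate RTT from an instance by timing a TCP handshake. The following is a minimal Python sketch; the hostname and port are placeholder assumptions.

  import socket
  import time

  def tcp_rtt_ms(host: str, port: int = 443) -> float:
      """Approximate RTT by timing a TCP three-way handshake."""
      start = time.perf_counter()
      with socket.create_connection((host, port), timeout=5):
          pass  # connect() returns once the handshake completes
      return (time.perf_counter() - start) * 1000

  # Probe a few times and report the minimum, which filters out
  # transient processing delays on either host.
  samples = [tcp_rtt_ms("example.com") for _ in range(5)]
  print(f"min RTT: {min(samples):.2f} ms")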

Jitter

Jitter is the variation in inter-packet delays, caused by a variance in delay over time between two points in the network. Jitter often results from variations in processing delays and queueing delays in the network, which increase with higher network load. For example, if the one-way delay between two systems varies from 10 ms to 100 ms, then there is 90 ms of jitter. This varying delay causes issues for voice and other real-time systems that process media because the systems must decide whether to buffer data longer or continue without it.
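As a quick illustration of the definition, the following sketch (with made-up delay samples) computes both the peak-to-peak jitter from the example above and a mean inter-packet jitter, similar in spirit to the RFC 3550 interarrival jitter estimate.

  # Minimal sketch: summarize jitter from a list of one-way delays
  # (in milliseconds). The sample values are illustrative only.
  delays_ms = [10, 12, 100, 11, 55, 10]

  peak_jitter = max(delays_ms) - min(delays_ms)  # 90 ms, as in the text
  # Mean deviation between consecutive packets.
  interpacket = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
  mean_jitter = sum(interpacket) / len(interpacket)

  print(f"peak-to-peak jitter: {peak_jitter} ms")
  print(f"mean inter-packet jitter: {mean_jitter:.1f} ms")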

Throughput

Throughput is the rate of successful data transfer, measured in bits per second. Bandwidth, latency, and packet loss affect the throughput rate. The bandwidth defines the maximum rate possible. Latency affects the throughput of protocols like Transmission Control Protocol (TCP) that use round-trip handshakes. TCP uses congestion windows to control throughput. One side of the TCP connection will send a single segment of data and then wait for confirmation from the other side before sending more data. If the exchange continued at the single-segment rate, throughput would be heavily limited by the latency of each round-trip confirmation. This is why TCP uses scaling window sizes to increase throughput. Conversely, User Datagram Protocol (UDP) doesn't acknowledge packets with handshakes, though some applications built on top of UDP implement their own acknowledgment or retransmission logic. UDP will not adaptively throttle traffic rates when there is loss unless application logic decides to back off.
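To see why latency caps single-flow TCP throughput, note that a flow can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by roughly the window size divided by the RTT. A quick illustration in Python, with placeholder window and RTT values:

  # Upper bound on single-flow TCP throughput: window_size / RTT.
  def tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
      return (window_bytes * 8) / (rtt_ms / 1000) / 1_000_000

  # A 64 KB window (no window scaling) over a 70 ms cross-region path:
  print(f"{tcp_throughput_mbps(64 * 1024, 70):.1f} Mbps")   # ~7.5 Mbps
  # The same window over a 1 ms intra-VPC path:
  print(f"{tcp_throughput_mbps(64 * 1024, 1):.1f} Mbps")    # ~524 Mbps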

Packet Loss

Packet loss is typically stated in terms of the percentage of packets that are dropped in a flow or on a circuit. Packet loss will affect applications differently. TCP applications are generally sensitive to loss due to congestion control. For instance, TCP Reno, a common implementation of TCP, halves its congestion window on a single packet loss.

Packets per Second

Packets per second refers to how many packets are processed in one second. Packet-per-second limits are a common bottleneck in network performance testing. Every processing point in the network must process each packet, which requires computing resources. Particularly for small packets, per-packet processing can limit throughput before bandwidth limits are reached. You can monitor the packets per second on AWS Direct Connect ports with Amazon CloudWatch metrics.

Maximum Transmission Unit

The Maximum Transmission Unit (MTU) defines the largest packet that can be sent over the network. The maximum on most Internet and Wide Area Networks (WANs) is 1,500 bytes. Jumbo frames are packets larger than 1,500 bytes. AWS supports 9,001-byte jumbo frames within a VPC. VPC peering and traffic leaving a VPC, including Internet and AWS Direct Connect traffic, support packets up to 1,500 bytes. Increasing the MTU increases throughput when the packet-per-second processing rate is the performance bottleneck.
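To see the effect, assume a hypothetical per-instance limit of one million packets per second (the actual limit varies by instance type); the achievable throughput then scales directly with packet size:

  # When packets per second (pps) is the bottleneck, larger packets
  # carry more data at the same packet rate.
  def throughput_gbps(pps: int, packet_bytes: int) -> float:
      return pps * packet_bytes * 8 / 1e9

  PPS_LIMIT = 1_000_000  # hypothetical per-instance packet rate limit
  print(f"1,500-byte MTU: {throughput_gbps(PPS_LIMIT, 1500):.1f} Gbps")  # 12.0
  print(f"9,001-byte MTU: {throughput_gbps(PPS_LIMIT, 9001):.1f} Gbps")  # 72.0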

Amazon Elastic Compute Cloud (Amazon EC2) Instance Networking Features

Amazon EC2 offers a set of networking features directly to instances in addition to the features available in Amazon VPC and Amazon Route 53. Networking capabilities vary by instance type and include features such as placement groups, Amazon Elastic Block Store (Amazon EBS)-optimized instances, and enhanced networking.

Instance Networking

AWS offers instance families for different use cases, each with different network, computing, and memory resources.

Examples of Instance Families

  • General Purpose (M4)
  • Compute Optimized (C5)
  • Memory Optimized (R4)
  • Accelerated Computing (P2)

These families have different networking speeds and capabilities, such as enhanced networking and Amazon EBS-optimized networking. Each instance type’s network performance is documented as low, medium, high, 10 Gigabit, up to 10 Gigabit, or 20 Gigabit. Within a family, network bandwidth generally scales with the vCPU count of the larger instance sizes.

Placement Groups

A placement group is a logical grouping of instances within a single Availability Zone. Placement groups are recommended for applications that benefit from low network latency, high network throughput, or both. Use a placement group to provide the lowest latency and the highest packet-per-second network performance.

Placement groups enable higher bandwidths for instances. When instances are documented with 10 Gigabit network performance, those numbers refer to placement group performance. Newer instance types that have up to 10 Gigabit or 25 Gigabit network performance can achieve those bandwidths outside of a placement group, however. The instance and flow bandwidth capabilities are important for network-bound applications that require high network throughput to resources outside of the local VPC, such as Amazon Simple Storage Service (Amazon S3). A single flow inside a placement group is limited to 10 Gbps and flows outside a placement group are limited to 5 Gbps. Multiple flows can be used to achieve higher aggregate throughput.

Placement groups are ideal for distributed applications that require low latency, such as HPC. HPC cluster performance is dependent on network latency, and the communication is kept within the cluster.
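As a minimal sketch of putting this into practice, the following uses boto3 to create a cluster placement group and launch instances into it; the region, group name, AMI ID, instance type, and counts are placeholder assumptions.

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Create a cluster placement group and launch instances into it.
  ec2.create_placement_group(GroupName="hpc-cluster", Strategy="cluster")
  ec2.run_instances(
      ImageId="ami-0123456789abcdef0",   # hypothetical AMI ID
      InstanceType="c5.18xlarge",
      MinCount=4,
      MaxCount=4,
      Placement={"GroupName": "hpc-cluster"},
  )

Launching all cluster members in a single request, as above, reduces the chance that capacity for part of the group is unavailable.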

Amazon Elastic Block Store (Amazon EBS)-Optimized Instances

Amazon EBS provides persistent block storage volumes for use with Amazon EC2 instances. You can launch selected Amazon EC2 instance types as Amazon EBS-optimized instances. Amazon EBS input and output affect network performance because the storage is network-attached. Amazon EBS optimization enables Amazon EC2 instances to fully use the Input/Output Per Second (IOPS) provisioned on an Amazon EBS volume. Amazon EBS-optimized instances deliver dedicated throughput between Amazon EC2 and Amazon EBS, with options between 500 and 4,000 Mbps depending on the instance type. The dedicated throughput minimizes contention between Amazon EBS Input/Output (I/O) and other traffic from your Amazon EC2 instance, providing the best performance for your Amazon EBS volumes.

Amazon EBS-optimized instances are designed for use with both Standard and Provisioned IOPS Amazon EBS volumes. When attached to Amazon EBS-optimized instances, Provisioned IOPS volumes can achieve single-digit millisecond latencies. We recommend using Provisioned IOPS volumes with Amazon EBS-optimized instances or with instances that support cluster networking for applications with high storage I/O requirements and low latency.
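A minimal boto3 sketch of launching an Amazon EBS-optimized instance alongside a Provisioned IOPS volume might look like the following; the region, AMI ID, instance type, and volume parameters are placeholder assumptions.

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Launch an EBS-optimized instance (IDs are placeholders) ...
  ec2.run_instances(
      ImageId="ami-0123456789abcdef0",
      InstanceType="m4.xlarge",
      MinCount=1,
      MaxCount=1,
      EbsOptimized=True,
  )
  # ... and create a Provisioned IOPS (io1) volume to attach to it.
  ec2.create_volume(
      AvailabilityZone="us-east-1a",
      Size=100,          # GiB
      VolumeType="io1",
      Iops=5000,
  )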

Network Address Translation (NAT) Gateways

You can use Network Address Translation (NAT) gateways to enable outbound access to the Internet while preventing inbound connectivity. NAT gateways offer better network performance than operating your own NAT instance. A NAT gateway is horizontally scalable within an Availability Zone and can forward up to 10 Gbps of traffic. NAT gateways increase availability and remove the bottleneck that a single NAT instance creates.
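A minimal boto3 sketch of creating a NAT gateway and pointing a private subnet's default route at it follows; all resource IDs are placeholders, and the script waits for the gateway to become available before adding the route.

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Allocate an Elastic IP and create the NAT gateway in a public subnet.
  eip = ec2.allocate_address(Domain="vpc")
  natgw = ec2.create_nat_gateway(
      SubnetId="subnet-0aaa1111",          # public subnet (placeholder)
      AllocationId=eip["AllocationId"],
  )
  natgw_id = natgw["NatGateway"]["NatGatewayId"]
  ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[natgw_id])

  # Point the private subnet's default route at the NAT gateway.
  ec2.create_route(
      RouteTableId="rtb-0bbb2222",         # private route table (placeholder)
      DestinationCidrBlock="0.0.0.0/0",
      NatGatewayId=natgw_id,
  )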

Enhanced Networking

Enhanced networking uses Single Root I/O Virtualization (SR-IOV) and Peripheral Component Interconnect (PCI) passthrough to provide high-performance networking capabilities on supported instance types for Linux, Windows, and FreeBSD. SR-IOV and PCI passthrough are methods of device virtualization that provide higher I/O performance and lower CPU utilization when compared to traditional virtualized network interfaces. Enhanced networking provides higher bandwidth, higher packet-per-second performance (over one million packets per second), and consistently lower inter-instance latencies. Combined with placement groups, it provides full bi-section bandwidth without bandwidth oversubscription for the largest instance types. Enhanced networking requires both operating system driver support and the Amazon Machine Image (AMI) or instance to be flagged for enhanced networking.

Network Drivers

Depending on the instance type, enhanced networking can be enabled with one of two drivers: the Intel 82599 Virtual Function interface and the Amazon Elastic Network Adapter (ENA) driver. The ENA driver was built for newer instance families to support speeds up to 400 Gbps, with current instances using up to 25 Gbps. Each instance family supports either the Intel or the ENA driver but not both. In Linux, the ixgbevf module provides 82599 Virtual Function driver support.

Enabling Enhanced Networking

There are two methods to enable enhanced networking for an instance. The first method is to set the enhanced networking attribute on the AMI. The second method is to set the instance attribute to enable enhanced networking. The latest Amazon Linux Hardware Virtual Machine (HVM) AMI launches with enhanced networking support by default.
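For instances that use the ENA driver, the second method can be applied with boto3, as in the following sketch; the region and instance ID are placeholders, and the instance must be stopped before the attribute is changed.

  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")
  instance_id = "i-0123456789abcdef0"  # placeholder

  # The instance must be stopped before changing the ENA attribute.
  ec2.stop_instances(InstanceIds=[instance_id])
  ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

  # Flag the instance for enhanced networking with the ENA driver.
  ec2.modify_instance_attribute(InstanceId=instance_id, EnaSupport={"Value": True})
  ec2.start_instances(InstanceIds=[instance_id])

  # Verify the flag (the OS must also have a supported driver).
  resp = ec2.describe_instances(InstanceIds=[instance_id])
  print(resp["Reservations"][0]["Instances"][0].get("EnaSupport"))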

Operating System Support

Support for the Intel 82599 Virtual Function interface is available for Linux, Windows Server 2008 R2, Windows Server 2012, Windows Server 2016, and FreeBSD. Enhanced networking is not available on Windows Server 2008 or Windows Server 2003.

Support for the ENA driver is available for Linux, Windows Server 2008 R2, Windows Server 2012, Windows Server 2016, and FreeBSD. The driver code is hosted on GitHub and is included in the Linux 4.9 kernel.

Additional Tuning and Driver Support

Enhanced networking is a fundamental component to increasing networking performance on AWS. We recommend enabling enhanced networking for all instances that support it. There are additional tuning and optimization techniques available for applications that require the highest performance available.

The Intel Data Plane Development Kit (DPDK) is a set of libraries and drivers for fast packet processing. It supports Linux and Windows, with a subset of features available for FreeBSD. DPDK extends the packet processing capabilities of enhanced networking with support for both the Intel 82599 Virtual Function interface and the ENA driver. Because this level of control is application-specific, enabling DPDK's benefits involves more complexity than enabling enhanced networking.

Enhanced networking and SR-IOV reduce the overhead of packet processing between an instance and the hypervisor. DPDK reduces the overhead of packet processing inside the operating system, giving applications more control of network resources such as ring buffers, memory, and poll-mode drivers. Combining DPDK and enhanced networking provides higher packets per second, lower latency, less jitter, and more control over packet queueing. This combination is most common in packet processing devices that are highly impacted by networking performance, such as firewalls, real-time communication processing, HPC, and network appliances.

There are additional operating system-specific enhancements, such as TCP settings, driver settings, and Non-Uniform Memory Access (NUMA) tuning, that can further increase performance. Since these are not AWS-specific concepts, they are not covered in depth in this study guide.

Optimizing Performance

It’s important to learn how to use the concepts just discussed to tune and optimize your network performance. This section reviews some of those concepts and methods.

Enhanced Networking

If your application requires high network performance, we suggest using an instance type that supports enhanced networking. This is a fundamental step in reducing latency, packet loss, and jitter and in increasing bandwidth for instances. Remember that enhanced networking requires both operating system driver support and the instance or AMI to be flagged for enhanced networking support.

Jumbo Frames

For applications that require high throughput, such as bulk data transfer, increasing the MTU can increase throughput. In scenarios where the performance bottleneck is packets per second, increasing the MTU can increase overall throughput by sending more data per packet.

The most common MTU found on the Internet is 1,500 bytes, which is what AWS supports across AWS Direct Connect and Internet gateways. Any Ethernet frame larger than 1,500 bytes is called a jumbo frame. Certain instance families support an MTU of 9,001 bytes within a VPC. If you have a cluster in a placement group, enabling jumbo MTUs can increase the cluster performance. To enable jumbo MTUs, you will need to change the operating system network parameters. For example, on Linux this is a parameter of the ip command.
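For example, here is a minimal sketch that applies the setting from Python by shelling out to the ip command; the interface name is assumed to be eth0, and root privileges are required.

  import subprocess

  # Raise the interface MTU to 9,001 bytes on Linux, then read it back.
  IFACE = "eth0"  # placeholder interface name
  subprocess.run(["ip", "link", "set", "dev", IFACE, "mtu", "9001"], check=True)

  with open(f"/sys/class/net/{IFACE}/mtu") as f:
      print(f"{IFACE} MTU is now {f.read().strip()}")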

Network Credits

Instance families such as R4 and C5 use a network I/O credit mechanism. Most applications do not consistently need a high level of network performance, but can benefit from having access to increased bandwidth when they send or receive data. For example, the smaller R4 instance sizes offer peak throughput of 10 Gbps. These instances use a network I/O credit mechanism to allocate network bandwidth to instances based on average bandwidth utilization. These instances accrue credits when their network throughput is below their baseline limits and can then use these credits when they perform network data transfers.

If you plan on running performance baselines with instances that support network credits, we recommend accounting for built-up credits during the test. One approach is to test with freshly launched instances. You can also send a large amount of traffic until you reach a steady state of throughput after all credits have been exhausted. Currently, there is no metric to track network credits for instances.

Instance Bandwidth

Each instance type has a bandwidth definition ranging from low to 20 Gigabit. Larger instance types have more bandwidth and packet-per-second capability. There are no explicit bandwidth limits for any single VPC, VPC peering connection, or Internet gateway. We suggest trying a larger instance type if your application has a bandwidth bottleneck. If you are not sure about your performance bottleneck, trying a larger instance size is the easiest way to determine whether the bandwidth supported by the instance is your bottleneck. The instance’s allowed bandwidth is roughly proportional to the size of the instance. Different instance families, such as C3 and C4, use different hardware and potentially different networking implementations, so they may have slightly different performance characteristics as traffic approaches your networking limits. The Compute-Optimized (C family) and General-Purpose (M family) instances are common choices for network-bound applications. Instance bandwidth is also dependent on the network driver in use. Instance types using the Intel 82599 interface with the ixgbevf module have both an aggregate and flow-based bandwidth limit of 5 Gbps. Instances with the AWS ENA driver have a 5 Gbps flow limit outside of a placement group but can achieve an aggregate bandwidth of 25 Gbps within a VPC or a peered VPC with multiple flows.

Flow Performance

In addition to instance bandwidth, the number of flows that your application uses also affects throughput. In a placement group, any single flow is limited to 10 Gbps. This is important to understand: to use the full bandwidth of an instance rated above 10 Gbps, you must spread traffic across multiple flows.

Outside of a placement group, the maximum throughput for a single flow is 5 Gbps. Examples include traffic between Availability Zones in the same VPC, a flow between an Amazon EC2 instance and Amazon S3, and traffic between an instance and an on-premises resource.

Load Balancer Performance

If your application will be using Elastic Load Balancing, you have multiple choices for load balancing. The Application Load Balancer has many HTTP and Layer 7 features. The Network Load Balancer has more TCP and Layer 4 features. These options are covered in more depth in Chapter 6, “Domain Name System and Load Balancing.”

The advantages of the Network Load Balancer are performance and scale. Since it is less computationally complex to forward packets without looking inside them, the Network Load Balancer scales faster and has lower latency. If your application does not require HTTP or Layer 7 features, you can improve performance with a Network Load Balancer. The additional latency introduced by Network Load Balancer packet processing is measured in microseconds.

Virtual Private Network (VPN) Performance

The Virtual Private Gateway (VGW) is the AWS managed Virtual Private Network (VPN) service. When a VPN connection is created, AWS provides tunnels to two different VPN endpoints. These VPN endpoints are capable of approximately 1.25 Gbps per tunnel depending on packet size.

To increase bandwidth into AWS, you can forward traffic to both endpoints. This design requires that on-premises equipment support Equal Cost Multipath (ECMP) to load balance traffic across both links, or that you advertise more-preferred prefixes through each VPN endpoint to split traffic between them. It is also possible to set different route preferences so that traffic leaves from both VPN endpoints for egress diversity.

In addition to the AWS VGW, you can install a VPN endpoint on your own Amazon EC2 instances. This approach allows more options for routing, performance tuning, and encryption overhead. Note that AWS does not manage the availability of this option. You should either test the VPN endpoint performance in your own account or work with the software provider to obtain their performance evaluations.

AWS Direct Connect Performance

One of the primary reasons for using AWS Direct Connect is to obtain more predictable performance than can be obtained using a VPN. Using a dedicated circuit or existing network allows you to control the quality of the network between on-premises infrastructure and AWS. For example, you can use a dedicated fiber between your data center and the AWS Direct Connect facility to reduce latency.

Another advantage that AWS Direct Connect offers is high bandwidth. While the VGW service is multi-gigabit, it is not suitable for throughputs of 10 Gbps and higher. AWS Direct Connect allows customers to provision multiple 10 Gbps connections and aggregate those connections into a single 40 Gbps circuit.

Quality of Service (QoS) in a VPC

On-premises networks often support Quality of Service (QoS) with the Differentiated Services Code Point (DSCP) in order to have more control over which traffic is prioritized in case of network congestion. The DSCP is the six-bit field in the IP header used to identify the priority of traffic. All traffic is treated equally inside of a VPC: the DSCP is not used to modify traffic forwarding in AWS networks, but the header remains as it was received.

You can use AWS Direct Connect in conjunction with QoS to improve application performance and reliability. When packets leave on-premises equipment and traverse service provider networks that honor DSCP bits, QoS can be applied normally. The goal is to use service provider networks that honor QoS so that performance is improved from on-premises infrastructure to AWS, even though packets are not differentiated at the AWS edge of the connection. This approach is common for real-time communications packets and other flows that are sensitive to packet loss.
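For traffic that you originate, the DSCP marking is set by the sending host. A minimal Python sketch that marks UDP packets with DSCP EF (Expedited Forwarding, value 46, commonly used for voice) follows; the destination address and port are placeholders.

  import socket

  # The DSCP occupies the upper six bits of the IP TOS byte, hence
  # the shift by 2. AWS preserves but does not act on the marking;
  # on-premises and provider networks that honor DSCP can use it.
  DSCP_EF = 46

  sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
  sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
  sock.sendto(b"rtp-payload", ("203.0.113.10", 5004))  # placeholder endpoint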

Example Applications

The majority of applications on AWS perform adequately without advanced tuning or optional networking features. This is a study guide for advanced networking, though, so this section reviews some of AWS's more complex networking configurations.

High Performance Computing

High Performance Computing (HPC) allows scientists and engineers to solve complex, compute-intensive, and data-intensive problems. HPC applications often require high network performance, fast storage, large amounts of memory, very high compute capabilities, or all of these. HPC performance can be bound by network latency, so it is important to minimize latency within a cluster.

Using placement groups with HPC enables access to a low-latency, high-bandwidth network for tightly coupled, I/O-intensive, and storage-intensive workloads. For faster Amazon EBS I/O, we recommend using Amazon EBS-optimized instances and Provisioned IOPS volumes.

Real-Time Media

Real-time media services include applications like Voice over IP (VoIP), media streaming using the Real-time Transport Protocol (RTP) or Real-time Messaging Protocol (RTMP), and other video and audio applications. Real-time media use cases include enterprise migrations of existing communications infrastructure as well as service provider telephony and video services.

Media streams can have different requirements on the network depending on the implementation and architecture. Video workloads can have varying bandwidth requirements during an existing flow, which can be dependent on the complexity of movement in the video. Audio flows may also change their bandwidth requirements if the audio stream supports redundancy or adaptive changes. In most cases, both audio and video streams are highly sensitive to packet loss and jitter. Packet loss and jitter can cause distortion and gaps in the media, which are easily detected by end users. We recommend taking steps to reduce loss and jitter.

The first step to reducing loss and jitter on AWS is to make sure that enhanced networking is enabled for real-time media applications. This feature provides a smoother packet delivery. If AWS Direct Connect is used, you can use QoS on the circuit if the provider or equipment supports it, reducing the chance of packet loss.

Detailed monitoring and proactive routing control can also mitigate network congestion and other network challenges. For highly sensitive media with multiple potential network paths, you can configure monitoring probes on Amazon EC2 instances to report on link health. That information can be used centrally to modify routes onto alternative network paths that are healthy.

Some media applications support buffering traffic before the media is played to the user. This buffering can help guard against jitter and varying network latencies. For media streams that can buffer audio or video, decreasing jitter is more important than reducing the average latency.

Data Processing, Ingestion, and Backup

When you want to move, process, or back up terabytes of data in AWS, the network is an important consideration. Data transfer can occur from on premises or within a VPC. It may also include different storage services such as Amazon EBS and Amazon S3, which have different networking characteristics.

You should understand potential performance limitations for data transfers, particularly if the transfer rate is important for your operation. Data processing and transfer in AWS generally follows this flow:

  1. Read data and potentially metadata from storage.
  2. Encapsulate the data in a transfer protocol, such as File Transfer Protocol (FTP), Secure Copy (SCP), or HTTP.
  3. Transfer the data over a network, such as VPN, the Internet, or AWS Direct Connect.
  4. Decapsulate the data and perform validation or other processes.
  5. Write the data to storage.

Network transfer is one part of the overall performance equation. The other performance components are storage IOPS, read performance, metadata processing, and write performance. It is possible for storage performance to be the primary bottleneck. You can benchmark different network configurations, test an entirely local transfer, and monitor storage performance rates to determine the relationship between the network and the storage transfer rate. If Amazon EC2 instances are involved, such as with VM Import/Export, you can also try different instance sizes.

On-Premises Data Transfer

On-premises use cases may include migration to AWS, on-premises processing, or backups. In addition to the storage components mentioned previously, on-premises networking affects data movement performance.

One primary performance aspect is the existing Internet or private circuits available between transfer points. For Internet transfers, the existing Internet connection bandwidth and utilization can be a bottleneck. If there is a single 20 Mbps Internet connection that’s 50 percent utilized, that provides 10 Mbps of available bandwidth. For large transfers, AWS Direct Connect can provide a dedicated circuit with less latency and more predictable throughput. Provisioning AWS Direct Connect takes more time than configuring a VPN, however, so timing is a consideration.

Security is an important factor for data transfer. If the data transfer is over insecure protocols, we suggest using encryption over any untrusted connections. The techniques for increasing performance through VPN mentioned in the VPN Performance section of this chapter can be used in this scenario. Note that IP Security (IPsec) can limit performance due to the encapsulation involved.

There are many additional services and concepts that are outside the scope of the AWS Certified Advanced Networking – Specialty Exam. Consider using services like AWS Snowball, AWS Snowmobile, or AWS Storage Gateway for transferring datasets larger than 1 TB. Amazon S3 has additional optimizations such as Transfer Acceleration and multipart uploads. There may also be additional optimizations at the operating system level to tune window scaling, interrupts, and Direct Memory Access (DMA) channels.
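As one example of the multipart approach, the boto3 transfer utilities can split a large upload into concurrent parts, which spreads the transfer across multiple flows; the bucket, key, file path, and tuning values below are placeholder assumptions.

  import boto3
  from boto3.s3.transfer import TransferConfig

  s3 = boto3.client("s3")

  # Multipart upload with several concurrent parts (multiple flows).
  config = TransferConfig(
      multipart_threshold=64 * 1024 * 1024,  # switch to multipart above 64 MB
      multipart_chunksize=64 * 1024 * 1024,
      max_concurrency=10,                    # parallel part uploads
  )
  s3.upload_file("/data/backup.tar", "example-bucket", "backup.tar", Config=config)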

Network Appliances

Routers, VPN appliances, NAT instances, firewalls, intrusion detection and prevention systems, web proxies, email proxies, and other network services have historically been hardware-based solutions in the network. On AWS, these solutions are implemented virtually on Amazon EC2. Since these solutions are deployed on operating systems on Amazon EC2, they begin to look like applications themselves, even if they participate in routing.

Some, or even all, of your VPC traffic can be forwarded through these instances, so it’s important to optimize their performance. You should size the instance correctly by testing your required throughput on different instance types. Enhanced networking is highly important for performance as well. It’s common for the network appliance to connect to external networks; in that case, placement groups will not increase performance because the destinations are outside of the placement group.

These network appliances may require multiple interfaces in different subnets to achieve different routing policies. Be aware that there are maximum interface counts and a maximum number of IP addresses that you can apply per interface. As of this writing, a c4.8xlarge can have 8 interfaces with 30 IP addresses per interface. Remember that the number of network interfaces does not change an instance's networking performance, provided the instance type supports enhanced networking.

For Amazon EC2 VPN instances, you should understand that the additional IPsec headers reduce the overall throughput because there is less data in each 1,500-byte frame. It’s important to reduce the MTU to allow additional room for headers for protocols like IPsec. Most applications will have mixed packet sizes that are less than the MTU, so Amazon EC2 VPN endpoints are likely to be bound by packets per second rather than CPU, memory, or network bandwidth.

One of the benefits of operating on AWS is scalability and the ability to build fault-tolerant applications—this concept applies to network appliances. If possible, network appliances should be able to use Auto Scaling and interact with Elastic Load Balancing to scale and be fault tolerant. The Network Load Balancer supports long-lived sessions based on source IP address, destination IP address, source port, destination port, and protocol (5-tuple hash), making it well suited for network applications that use TCP.

Routing traffic through instances is accomplished by modifying the routing table of subnets. Each subnet can have a route for a certain prefix (for example, the default route) to the elastic network interface of a network appliance instance. When routing traffic to an elastic network interface, remember that you are responsible for the fault tolerance and availability of that route. By default, this route or elastic network interface requires additional configuration to be fault tolerant. Some approaches include Amazon EC2 instances with AWS Identity and Access Management (IAM) roles that allow them to modify routes or attach the elastic network interface to a new instance when a failure is detected. AWS has published example scripts to accomplish failover in NAT instances (see https://aws.amazon.com/articles/2781451301784570). Another approach could include using AWS Lambda to monitor and provide fault tolerance.

Performance Testing

Running performance tests and establishing a baseline is important for applications with high network performance requirements, as mentioned previously in this chapter.

Amazon CloudWatch Metrics

Amazon CloudWatch metrics make it easy to observe and collect data about your networks. Amazon CloudWatch metrics are available for many AWS Cloud services, including Amazon EC2. Amazon EC2 instances have a variety of CPU, memory, disk, and networking metrics available by default in five-minute increments. To receive one-minute metrics, detailed monitoring can be enabled for an additional cost. For the exam, you should understand which metrics are available, but you do not need to memorize specific details. Table 9.1 lists available instance networking metrics in Amazon CloudWatch.

TABLE 9.1 Instance Networking Amazon CloudWatch Metrics

NetworkIn: The number of bytes received on all network interfaces by the instance. This metric identifies the volume of incoming network traffic to a single instance.
NetworkOut: The number of bytes sent out on all network interfaces by the instance. This metric identifies the volume of outgoing network traffic from a single instance.
NetworkPacketsIn: The number of packets received on all network interfaces by the instance.
NetworkPacketsOut: The number of packets sent out on all network interfaces by the instance.

In addition to instance metrics, Amazon CloudWatch metrics are available for VPN connections. Table 9.2 lists available Amazon EC2 VPN metrics in Amazon CloudWatch.

TABLE 9.2 Amazon EC2 VPN Amazon CloudWatch Metrics

TunnelState: The state of the tunnel. 0 indicates DOWN and 1 indicates UP.
TunnelDataIn: The bytes received through the VPN tunnel.
TunnelDataOut: The bytes sent through the VPN tunnel.

Table 9.3 lists available AWS Direct Connect metrics in Amazon CloudWatch.

TABLE 9.3 AWS Direct Connect Amazon CloudWatch Metrics

ConnectionState: The state of the connection. 0 indicates DOWN and 1 indicates UP.
ConnectionBpsEgress: The bit rate for outbound data from the AWS side of the connection.
ConnectionBpsIngress: The bit rate for inbound data to the AWS side of the connection.
ConnectionPpsEgress: The packet rate for outbound data from the AWS side of the connection.
ConnectionPpsIngress: The packet rate for inbound data to the AWS side of the connection.
ConnectionCRCErrorCount: The number of times Cyclic Redundancy Check (CRC) errors are observed for the data received at the connection.
ConnectionLightLevelTx: Indicates the health of the fiber connection for egress (outbound) traffic from the AWS side of the connection.
ConnectionLightLevelRx: Indicates the health of the fiber connection for ingress (inbound) traffic to the AWS side of the connection.
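A minimal boto3 sketch that retrieves one of the Table 9.1 metrics follows; the region and instance ID are placeholders.

  import boto3
  from datetime import datetime, timedelta

  cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

  # Pull an hour of five-minute NetworkPacketsIn data for one instance.
  resp = cloudwatch.get_metric_statistics(
      Namespace="AWS/EC2",
      MetricName="NetworkPacketsIn",
      Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
      StartTime=datetime.utcnow() - timedelta(hours=1),
      EndTime=datetime.utcnow(),
      Period=300,
      Statistics=["Sum"],
  )
  for point in sorted(resp["Datapoints"], key=lambda p: p["Timestamp"]):
      print(point["Timestamp"], point["Sum"])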

Testing Methodology

Application performance is a combination of memory, CPU, networking, storage, application architecture, latency, and other factors. Testing different configurations and settings is a cost-effective way to determine how to increase performance. There are a wide array of tools and methods available, so this section will focus on testing as it relates to AWS networking.

Throughput Testing

It is possible to baseline network performance for an instance type so that you know where the performance boundaries are. Here are some considerations for testing network throughput on AWS; a minimal multi-flow test sketch follows the list.

  • The instance type and size will largely determine maximum throughput and packets per second.
  • Test the right scenario. If your application will be communicating between Availability Zones or outside of the VPC, test that flow. Testing traffic within an Availability Zone or placement group will provide different performance than between Availability Zones.
  • Enable enhanced networking for optimal performance.
  • Test over multiple flows, test over multiple copies of the application, and distribute the test over multiple instances. Tools that are single-threaded or use a single TCP flow are not likely to maximize network throughput.
  • Test with jumbo MTUs within your VPC to maximize throughput.
  • TCP and UDP will react differently to congestion and latency, so test the protocols that your application will be using.
  • With high-latency connections, you can try tuning different TCP parameters, such as TCP implementation, congestion window sizes, and timers.
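The following minimal sketch applies several of these considerations: it opens multiple parallel TCP flows to a listener (for example, nc -lk 5201 running on the target instance) and reports aggregate throughput. The target address, port, flow count, and duration are placeholder assumptions; dedicated tools such as iperf are more full-featured.

  import socket
  import threading
  import time

  TARGET, PORT, FLOWS, SECONDS = "10.0.1.50", 5201, 8, 10  # placeholders
  sent = [0] * FLOWS
  payload = b"\x00" * (1024 * 1024)

  def flow(i: int) -> None:
      # Each thread drives one TCP flow for a fixed duration.
      with socket.create_connection((TARGET, PORT)) as s:
          end = time.time() + SECONDS
          while time.time() < end:
              s.sendall(payload)
              sent[i] += len(payload)

  threads = [threading.Thread(target=flow, args=(i,)) for i in range(FLOWS)]
  for t in threads: t.start()
  for t in threads: t.join()
  print(f"aggregate: {sum(sent) * 8 / SECONDS / 1e9:.2f} Gbps")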

Solution Testing

Testing network throughput and connectivity is helpful, but ultimately you care about the application’s performance over the network. We suggest performing end-to-end testing of application metrics. This may involve using tools to simulate web requests, transactions, or data processing. With the data from your network tests, you can identify network bottlenecks using Amazon CloudWatch reports and operating system statistics. This is where trying different approaches, such as placement groups, larger instance sizes, and more distributed systems, can increase application performance. Tools like Bees with Machine Guns can run a distributed load test.

One helpful tool for further investigation is a packet capture on the network. A packet capture is a raw dump of the traffic sent or received over the network. The easiest way to do this in a VPC is to run a packet capture locally on the instance with tools such as tcpdump. External tools can run packet loss analysis, or you can look for TCP retransmissions that indicate packet loss. These captures are also an effective way to determine whether latency originates in the network or in the application: the packet timing shows when the host receives network packets and how quickly the application responds.
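For example, here is a minimal capture of 1,000 packets driven from Python; the interface name and output path are placeholders, and root privileges are required on the instance.

  import subprocess

  # Capture 1,000 packets on eth0 with tcpdump and write them to a
  # file for offline analysis (e.g., retransmission counts in Wireshark).
  subprocess.run(
      ["tcpdump", "-i", "eth0", "-c", "1000", "-w", "/tmp/capture.pcap"],
      check=True,
  )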

Summary

In this chapter, you learned about the different aspects of network performance in AWS networks. Each application and network topology can interact differently, so understanding the relationship between features, instances, technologies, and applications is core to improving performance. Every modern application is dependent on the network for performance, so every improvement can make many applications more responsive, run faster, or be more reliable.

It is important to understand core concepts such as bandwidth, latency, packet loss, jitter, and throughput. After you understand core performance as a concept, the next step is to understand what features AWS offers to increase performance. In addition to the available features, it is important to understand the differences between instances and enhanced networking support. With that knowledge, you can start to apply AWS features to different types of specialized applications and networking scenarios. To validate the concepts, you should run both networking and application performance tests to further tweak and tune networking configuration.

To provide higher network performance, AWS offers a variety of features such as placement groups, Provisioned IOPS, enhanced networking, and jumbo frames. We reviewed the network performance characteristics that instances can have, such as enhanced networking, which reduces jitter, increases throughput, and improves reliability.

Network performance is a critical component of applications for HPC, high-throughput data processing, and real-time media, as well as for network appliances. Each of these use cases has differing requirements for latency, jitter, packet loss, and throughput. Those differences will change the outcome of the network architecture and networking features required for optimal performance.

Theory and features are helpful, but ensuring that performance is achieved for your applications to function in an efficient manner is paramount. It is important to know how to test your network and understand the baseline and peak capabilities of your environment. Experiencing a full deployment and testing it will drive further validation and opportunities for tweaks and improvements.

Resources to Review

For further review, check out the following URLs:

Exam Essentials

Understand latency, jitter, bandwidth, and packets per second. Latency is the time delay between two points of the network. Jitter is the variance in delay between two points of the network. Bandwidth is the maximum amount of data that can be transferred at one point of the network. Packets per second is the rate of packets that a point of the network is transmitting or receiving.

Understand throughput and the relationship to latency. Throughput is the successful transfer rate between two points in a network. This is different from bandwidth, which is simply the maximum possible transfer rate. Throughput is affected by packet loss, protocol, latency, MTU, and other components such as storage and application processing.

Know the relevance of the MTU and jumbo frames. The MTU is the largest Ethernet frame that can be sent on a network. Most networks, including the Internet, use a 1,500-byte MTU. This is the maximum in AWS, except within a VPC, where the MTU is 9,001 bytes. Any MTU over 1,500 bytes is considered a jumbo frame. Increasing the MTU increases throughput because each packet can carry more data while maintaining the same packets per second.

Understand the relationship between instance size and bandwidth. Each instance is given a certain bandwidth, ranging from low to 25 Gigabit. As you increase the instance size within a family, you generally get higher bandwidths.

Understand Amazon EBS-optimized instances. Amazon EBS-optimized instances have provisioned network bandwidth that allows instances to fully utilize the IOPS available on the network-attached storage.

Understand when and why to use a placement group. A placement group is a logical grouping of instances within an Availability Zone that is designed to provide instances with the highest bandwidth, lowest latency, and highest packets per second. Placement groups are useful for high-performance applications that communicate within a single Availability Zone.

Understand what enhanced networking offers and what is required to support it. Enhanced networking enables instances to use SR-IOV to provide lower latency, more reliability, and higher packet per second performance for instances. Instances must support either the Intel 82599 Virtual Function driver or the ENA driver on a variety of Linux, Windows, and FreeBSD operating systems. In addition to driver support, the AMI or instance must be flagged for enhanced networking support.

Understand some of the steps required to optimize performance for an instance. Important steps include enabling enhanced networking, configuring the operating system for jumbo MTUs, and trying larger instance types. You should also understand the benefits of using multiple instances and flows for performance.

Understand the limitations of instances, placement groups, and flows. Instances will have bandwidth and packets per second limitations that differ based on enhanced networking support, the instance family, the instance size, whether the traffic flow is within a placement group, and how many flows are used. Any single instance or flow leaving an Availability Zone is limited to 5 Gbps for instances using the ixgbevf driver and 25 Gbps for ENA enabled instances. Any single flow inside a placement group is limited to 10 Gbps, even for 20 Gigabit instance types.

Understand network credits and the benefit they offer. Certain instance families, such as the Memory-Optimized R4 and Compute-Optimized C5, have network I/O credits. This feature allows instances to utilize higher network bandwidths if they have accrued credits. They accrue more credits the longer they run while remaining under their bandwidth allocation. You should understand that this network credit may cause variance in load testing, depending on how long the instance has been running and its throughput.

Understand the impact that AWS Direct Connect can have on performance. AWS Direct Connect allows for control over network paths, which can decrease latency, reduce jitter, increase bandwidth, and increase reliability. Bandwidth is only limited by the port speeds. Even though traffic is treated equally on AWS, it is possible to use QoS on networks connected to AWS Direct Connect that support QoS.

Understand the performance advantages of the Network Load Balancer. The Network Load Balancer is a Layer 4 load balancer that can reduce latency and scale for incoming traffic. It has lower latency than other Elastic Load Balancing options, measured in microseconds.

Understand how to apply networking features to specialized workloads. HPC requires low latency and high bandwidth, so we recommend using placement groups. Enabling enhanced networking and reducing jitter are important for applications like real-time media. Applications that require heavy data processing should spread flows over multiple instances for higher throughput. Network appliances like proxies should support enhanced networking and have an appropriate instance size for their required bandwidth.

Learn how to investigate network performance through testing. You should understand the Amazon CloudWatch metrics that are available for both Amazon EC2 and VPN. End-to-end system testing differs from load testing the networking capacity of a single instance because applications have many other dependencies such as storage, application logic, and latency that will affect overall performance.

Exercises

Increasing network performance requires both understanding concepts and putting them into practice. The best way to grasp these concepts and interconnected relationships is to measure performance, change settings, and measure again.

For assistance completing these exercises, refer to the Amazon VPC and Amazon EC2 documentation located at http://aws.amazon.com/documentation/vpc/ and https://aws.amazon.com/documentation/ec2/.

Note that using the same instance type for the following exercises can help you better compare network performance across the different scenarios.

Review Questions

  1. In order to decrease the number of instances that have inbound web access, your team has recently placed a Network Address Translation (NAT) instance on Amazon Linux in the public subnet. The private subnet has a 0.0.0.0/0 route to the elastic network interface of the NAT instance. Users are complaining that web responses are slower than normal. What are practical steps to fix this issue? (Choose two.)

    1. Replace the NAT instance with a NAT gateway.
    2. Enable enhanced networking on the NAT instance.
    3. Create another NAT instance and add another 0.0.0.0/0 route in the private subnet.
    4. Try a larger instance type for the NAT instance.
  2. Voice calls to international numbers from inside your company must go through an open-source Session Border Controller (SBC) installed on a custom Linux Amazon Machine Image (AMI) in your Virtual Private Cloud (VPC) public subnet. The SBC handles the real-time media and voice signaling. International calls often have garbled voice, and it is difficult to understand what people are saying. What may increase the quality of international voice calls?

    1. Place the SBC in a placement group to reduce latency.
    2. Add additional network interfaces to the instance.
    3. Use an Application Load Balancer to distribute load to multiple SBCs.
    4. Enable enhanced networking on the instance.
  3. Your big data team is trying to determine why their proof of concept is running slowly. For the demo, they are trying to ingest 1 TB of data from Amazon Simple Storage Service (Amazon S3) on their c4.8xl instance. They have already enabled enhanced networking. What should they do to increase Amazon S3 ingest rates?

    1. Run the demo on-premises and access Amazon S3 from AWS Direct Connect to reduce latency.
    2. Split the data ingest on more than one instance, such as two c4.4xl instances.
    3. Place the instance in a placement group and use an Amazon S3 endpoint.
    4. Place a Network Load Balancer between the instance and Amazon S3 for more efficient load balancing and better performance.
  4. Your database instance running on an r4.large instance seems to be dropping Transmission Control Protocol (TCP) packets based on a packet capture from a host with which it was communicating. During initial performance baseline tests, the instance was able to handle peak load twice as high as its current load. What could be the issue? (Choose two.)

    1. The r4.large instance may have accumulated network credits before load testing, which would allow higher peak values.
    2. There may be additional database processing errors causing connection timeouts.
    3. The read replica database should be placed in a separate Availability Zone.
    4. The Virtual Private Network (VPN) session should be configured for dynamic Border Gateway Protocol (BGP) routing for higher availability.
  5. Your development team is testing the performance of a new application using enhanced networking. They have updated the kernel to the latest version that supports the Elastic Network Adapter (ENA) driver. What are the other two requirements for support? (Choose two.)

    1. Use an instance that supports the ENA driver.
    2. Support the Intel Virtual Function driver in addition to the ENA driver.
    3. Flag the Amazon Machine Image (AMI) for enhanced networking support.
    4. Enable enhanced networking on the elastic network interface.
  6. The new architecture for your application involves replicating your stateful application data from your Virtual Private Cloud (VPC) in US East (Ohio) to Asia Pacific (Tokyo). The replication instances are in public subnets in each region and communicate with public addresses over Transport Layer Security (TLS). Your team is seeing much lower replication throughput than they see within a single VPC. Which steps can you take to improve throughput?

    1. Increase the application’s packets per second.
    2. Configure the Maximum Transmission Unit (MTU) to 9,001 bytes on each instance’s eth0 to support jumbo frames.
    3. Create a Virtual Private Network (VPN) connection between the regions and enable jumbo frames on each instance.
    4. None of the above
  7. Which networking feature will provide the most benefits to support a clustered computing application that requires very low latency and high network throughput?

    1. Enhanced networking
    2. Network Input/Output (I/O) credits
    3. Placement groups
    4. Amazon Route 53 performance groups
  8. What would you recommend to make a scalable architecture for performing very high throughput data transfers?

    1. Use enhanced networking.
    2. Configure the Amazon Virtual Private Cloud (Amazon VPC) routing table to have a single hop between every instance in the VPC.
    3. Distribute the flows across many instances.
    4. Advertise routes to external networks with Border Gateway Protocol (BGP) to increase routing scale.
  9. One of the applications that you want to migrate to AWS has high disk performance requirements. You need to guarantee certain baseline performance with low latency. Which feature can help meet the performance requirements of this application?

    1. Amazon Elastic Block Store (Amazon EBS) Provisioned Input/Output Per Second (IOPS)
    2. Amazon Elastic File System (Amazon EFS)
    3. Dedicated network bandwidth
    4. Quality of Service (QoS)
  10. Your application developers are facing a challenge relating to network performance. Their application creates a buffer to accept network data so that it can be analyzed and displayed in real time. It seems that packets have delays of between 2 milliseconds and 120 milliseconds, however. Which network characteristic do you need to improve?

    1. Bandwidth
    2. Latency
    3. Jitter
    4. Maximum Transmission Unit (MTU)
  11. The operations group at your company has migrated one of your application components from C4 instances to C5 instances. The networking performance is not as high as expected, however. What could be this issue? (Choose two.)

    1. Instance routes have become more specific, creating network latency.
    2. The operating system does not have the ixgbevf module installed.
    3. The instance type does not support the Elastic Network Adapter (ENA) driver.
    4. The instance or Amazon Machine Image (AMI) is no longer flagged for enhanced networking.
  12. Your application is having a slower than expected transfer rate between application tiers. What is the best option for increasing throughput?

    1. Use a single Network Load Balancer in front of each instance.
    2. Enable Quality of Service (QoS).
    3. Reduce the jitter in the network.
    4. Increase the Maximum Transmission Unit (MTU).
  13. Your company has an application that it would like to share with a business partner, but the performance of the application is business-critical. The network architects are discussing using AWS Direct Connect to increase performance. Which of the following are performance advantages of AWS Direct Connect compared to a Virtual Private Network (VPN) or Internet connectivity? (Choose three.)

    1. Lower latency
    2. Ability to use jumbo frames
    3. Ability to configure Quality of Service (QoS) on the AWS Direct Connect provider’s circuits
    4. Lower egress costs
    5. Ability to perform detailed monitoring of the AWS Direct Connect connections
  14. What information is most efficient to determine whether a workload is CPU bound, bandwidth bound, or packets per second bound? (Choose four.)

    1. Amazon CloudWatch CPU metrics
    2. Packet captures
    3. Elastic network interface count
    4. Amazon CloudWatch network bytes metrics
    5. Amazon CloudWatch packets per second metrics
    6. Kernel version
    7. Host CPU information
  15. Your organization is planning on connecting to AWS. The organization has decided to use a specific Virtual Private Network (VPN) technology for the first phase of the project. You are tasked with implementing the VPN server in a Virtual Private Cloud (VPC) and optimizing it for performance. What are important considerations for Amazon Elastic Compute Cloud (Amazon EC2) VPN performance? (Choose two.)

    1. The VPN instance should support enhanced networking.
    2. Because all VPN connections use the Virtual Private Gateway (VGW), it’s important to scale the VGW horizontally.
    3. IP Security (IPsec) VPNs should use a Network Load Balancer to create a more scalable VPN service.
    4. Investigate packet per second limitations and bandwidth limitations.
  16. Your research and development organization has created a mission-critical application that requires low latency and high bandwidth. The application needs to support AWS best practices for high availability. Which of the following is not a best practice for this application?

    1. Deploy the application behind a Network Load Balancer for scale and availability.
    2. Use a placement group for the application to guarantee the lowest latency possible.
    3. Enable enhanced networking on all instances.
    4. Deploy the application across multiple Availability Zones.
  17. Your security department has mandated that all traffic leaving a Virtual Private Cloud (VPC) must go through a specialized security appliance. This security appliance runs on a bespoke operating system that users cannot access. What considerations are the most important for this operating system performance on AWS? (Choose two.)

    1. Driver support for the Intel Virtual Function and Elastic Network Adapter (ENA)
    2. Support for Amazon Linux
    3. Instance family and size support
    4. Domain Name System (DNS) resolution speed
  18. Your company has deployed a bursty web application to AWS and would like to improve the user experience. It is important for only the web host to have the private key for Transport Layer Security (TLS), so the Classic Load Balancer has a listener on Transmission Control Protocol (TCP) port 443. What are some approaches that you can use to reduce latency and improve the scale-out process for the application?

    1. Use an Application Load Balancer in front of the application, enabling better utilization of multiple target groups with different HTTP paths and hosts.
    2. Configure enhanced networking on the Classic Load Balancer for lower latency load balancing.
    3. Use AWS Certificate Manager (ACM) to distribute new certificates to Amazon CloudFront to accomplish handling content at the edge.
    4. Use a Network Load Balancer in front of your application to increase network performance.
  19. You are in charge of creating a network architecture for a development group that is interested in running a real-time exchange on AWS. The participants of the exchange expect very low latency but do not operate on AWS. Which description most accurately describes the networking and security tradeoffs for potential network designs?

    1. Use AWS Direct Connect to connect to the exchange application. This allows for lower latency and native encryption but requires additional configuration to support multi-tenancy and agreements from participants.
    2. Configure a separate Virtual Private Network (VPN) connection on the Virtual Private Gateway (VGW) for each participant. This will allow individual scaling per participant and the lowest latency but requires customers to support VPN devices.
    3. Use AWS Direct Connect to connect to the exchange application. This allows for more control of the latency, but it requires organizing connectivity to each of the participants and provides no security guarantees.
    4. Allow participants to connect directly via the Internet. This allows for customers to come in freely but does not guarantee security. Latency can be managed with Transmission Control Protocol (TCP) tuning and network performance appliances.
  20. Which statement about Maximum Transmission Units (MTUs) on AWS is true?

    1. MTUs define the maximum throughput on AWS.
    2. You must configure a Virtual Private Cloud (VPC) to support jumbo frames.
    3. You must configure a placement group to support jumbo frames.
    4. Increasing the MTU is most beneficial for applications limited by packets per second.
  21. What is the advantage of the Data Plane Development Kit (DPDK) over enhanced networking?

    1. DPDK decreases the overhead of Hypervisor networking.
    2. Enhanced networking only increases bursting capacity, whereas DPDK increases steady-state performance.
    3. DPDK decreases operating system overhead for networking.
    4. DPDK allows deeper access to AWS infrastructure to enable new networking features that enhanced networking does not provide.
  22. What is the optimal performance configuration to enable high-performance networking for an Amazon Elastic Compute Cloud (Amazon EC2) instance operating as a firewall?

    1. One elastic network interface for all traffic.
    2. One elastic network interface for management traffic and one elastic network interface for each subnet the firewall operates in.
    3. Configure as many elastic network interfaces as possible and use operating system routing to split traffic over all interfaces.
    4. None of the above.
  23. Your team uses an application to receive information quickly from other parts of your infrastructure. It leverages low-latency multicast feeds to receive information from other applications and displays analysis. Which approach could help satisfy the application’s low latency requirements in AWS?

    1. Maintain the same multicast groups in AWS because the application will work in a Virtual Private Cloud (VPC).
    2. Work with the application owners to find another delivery system such as a message queue or broker. Place the applications in a placement group for low latency.
    3. Move the multicast application to AWS and enable enhanced networking. Configure the other applications to send their multicast feed to the application over AWS Direct Connect.
    4. Use the VPC routing table to route 224.0.0.0/8 traffic to the instance elastic network interface. Enable enhanced networking and jumbo frames for low latency and high throughput.
  24. What is bandwidth?

    1. Bandwidth is the number of bits that an instance can store in memory over a network.
    2. Bandwidth is the amount of data transferred from one point in the network to another point.
    3. Bandwidth is a measurement of the largest capacity of handling network traffic in any given path in a network.
    4. Bandwidth is the maximum data transfer rate at any point in the network.
  25. Why does User Datagram Protocol (UDP) react to performance characteristics differently than Transmission Control Protocol (TCP)?

    1. UDP requires more packet overhead than TCP.
    2. UDP supports less resilient applications.
    3. UDP is not a stateful protocol, so it reacts differently to latency and jitter.
    4. UDP lacks traffic congestion awareness.