© Shimon Ifrah 2021
S. Ifrah, Getting Started with Containers in Google Cloud Platform, https://doi.org/10.1007/978-1-4842-6470-6_10

10. Troubleshooting

Shimon Ifrah
Melbourne, VIC, Australia

In this chapter, we will learn how to troubleshoot the core services covered in this book, with more focus on using Cloud SDK and gcloud. The main goal of this chapter is to help you avoid common misconfiguration issues with GCP container services.

In this chapter, we will focus on the following topics:
  • Basic gcloud commands

  • Troubleshooting Google Kubernetes Engine (GKE)

  • Troubleshooting Cloud Run and Cloud Build deployments

  • Troubleshooting GCP Container Registry

  • Troubleshooting the Compute Engine resource

Basic gcloud Commands

Let’s start with a review of the most basic commands of gcloud and how to get started after installing it.

Install Cloud SDK

To install Cloud SDK and gcloud, open the following URL and select your platform:

https://cloud.google.com/sdk/install

As of the time of writing, you can install Cloud SDK on the following platforms:
  • Linux

  • macOS

  • Windows

  • Docker container image

Initialize Cloud SDK

After installing Cloud SDK, start with the following command, which will initialize and authorize your account and configure gcloud with the right project:
$ gcloud init
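To install additional components, such as kubectl for GKE, use the following command, replacing COMPONENT_ID with the name of the component (for example, kubectl):
$ gcloud components install COMPONENT_ID
To update all installed components to the latest version, use the following command:
$ gcloud components update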
To check the version of your Cloud SDK, run the following command:
$ gcloud version
To get detailed information about your gcloud environment, type the following:
$ gcloud info
To access help, type the following:
$ gcloud help
If your session has expired, or if you would like to log in with another account, you can use the following command and follow the prompt:
$ gcloud auth login
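After initializing, it helps to confirm which account and project gcloud is actually using before you start troubleshooting anything else. The following is a minimal sketch of such a check; the function name gcloud_summary is my own, and it assumes Cloud SDK is installed and gcloud init has already been run.

```shell
# Print the account and project that gcloud is currently configured to use.
# gcloud_summary is a hypothetical helper name, not part of Cloud SDK.
gcloud_summary() {
    # Fail early with a readable message if gcloud is not available.
    if ! command -v gcloud >/dev/null 2>&1; then
        echo "gcloud not found; install Cloud SDK first" >&2
        return 1
    fi
    # get-value prints only the property value, which is convenient in scripts.
    echo "account: $(gcloud config get-value account 2>/dev/null)"
    echo "project: $(gcloud config get-value project 2>/dev/null)"
}
```

Call gcloud_summary at the top of a troubleshooting session; an empty account or project line usually means you need to run gcloud auth login or set a project.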

Work with Projects

To get detailed information about a GCP project, type the following command, replacing PROJECT_ID with the ID of your project:
$ gcloud projects describe PROJECT_ID
To change the output of the command to a table format, list the fields you want as columns:
$ gcloud projects describe PROJECT_ID --format="table(projectId,name,projectNumber)"
To set a project and start working with it, use the following:
$ gcloud config set project PROJECT_ID
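A mistyped project ID is an easy mistake to make when switching projects. The sketch below checks the ID locally against the documented project ID format (6 to 30 lowercase letters, digits, or hyphens, starting with a letter and not ending with a hyphen) before calling gcloud. The function names are hypothetical, and the check only validates the format; it does not confirm that the project exists.

```shell
# Return 0 if $1 looks like a valid GCP project ID: 6-30 characters,
# lowercase letters, digits, or hyphens, starting with a letter and not
# ending with a hyphen. The body runs in a subshell so that forcing the
# C locale (plain ASCII ranges) does not leak into the caller.
is_valid_project_id() (
    LC_ALL=C
    id="$1"
    [ "${#id}" -ge 6 ] && [ "${#id}" -le 30 ] || return 1
    case "$id" in
        *[!a-z0-9-]*) return 1 ;;  # contains a disallowed character
        [!a-z]*)      return 1 ;;  # does not start with a letter
        *-)           return 1 ;;  # ends with a hyphen
    esac
    return 0
)

# Switch the active project only if the ID passes the local format check.
use_project() {
    if ! is_valid_project_id "$1"; then
        echo "invalid project ID: $1" >&2
        return 1
    fi
    gcloud config set project "$1"
}
```

For example, use_project web-project-269903 passes the check and sets the project, while use_project Web_Project is rejected before gcloud is called.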

Troubleshoot Google Kubernetes Engine (GKE)

In this section, we will start by going over the steps needed to connect to a GKE cluster using Cloud SDK.

Connect to a GKE Cluster

To connect to a GKE cluster using Cloud SDK, use the following process.

From the GKE cluster console, click on Connect, as shown in Figure 10-1.
Figure 10-1. Connect to GKE

From the Connect to Cluster page, you have the following two options:
  • Copy the command into your terminal where you installed Cloud SDK, and authenticate to the cluster.

  • Click on Run in Cloud Shell, and, using the browser, connect to the cluster with Cloud Shell.

The only disadvantage of Cloud Shell is that sometimes it can take a few minutes to start and connect. Figure 10-2 shows the Connect to Cluster page.
Figure 10-2. Connect to the cluster

The following command is an example of the gcloud command:
$ gcloud container clusters get-credentials cluster-1 --zone us-central1-c --project web-project-269903
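Once the credentials are fetched, it is worth confirming that kubectl is pointing at the cluster you expect. The following sketch assumes kubectl is installed and relies on the context name that get-credentials writes to your kubeconfig, which takes the form gke_PROJECT_LOCATION_CLUSTER. The function names are my own.

```shell
# Build the kubeconfig context name that get-credentials creates:
# gke_<project>_<zone or region>_<cluster>.
gke_context_name() {
    printf 'gke_%s_%s_%s\n' "$1" "$2" "$3"
}

# Fetch credentials, then confirm kubectl points at the expected cluster.
# Arguments: project ID, zone, cluster name.
connect_and_verify() {
    local project="$1"
    local zone="$2"
    local cluster="$3"
    local expected current
    gcloud container clusters get-credentials "$cluster" \
        --zone "$zone" --project "$project" || return 1
    expected="$(gke_context_name "$project" "$zone" "$cluster")"
    current="$(kubectl config current-context)"
    if [ "$current" != "$expected" ]; then
        echo "kubectl context is $current, expected $expected" >&2
        return 1
    fi
    # Listing the nodes proves the credentials work against the cluster.
    kubectl get nodes
}
```

With the example values used above, you would run connect_and_verify web-project-269903 us-central1-c cluster-1.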

Overloading

Many performance issues in GKE are caused by overloading the cluster with too many deployments. To avoid overloading, I recommend enabling auto-scaling on the cluster.

To check which workloads are running on your cluster, you can use the Workloads console in the left navigation menu, as shown in Figure 10-3.
Figure 10-3. Workloads

If you click on a workload that is running on the cluster, you will see its resource utilization, as shown in Figure 10-4.
Figure 10-4. Resource usage

If you click on the Details tab, you will get a detailed view of the deployment that includes the following details:
  • Cluster name

  • Namespace

  • When it was created

Figure 10-5 shows the Details tab.
Figure 10-5. Details tab
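The same workload and resource information is available from the command line. The sketch below assumes kubectl is already connected to the cluster and that the metrics API is available, which kubectl top needs; the function name is hypothetical.

```shell
# Show what is running on the cluster and how much it consumes.
cluster_load_report() {
    echo "== Deployments =="
    kubectl get deployments --all-namespaces
    echo "== Node usage =="
    kubectl top nodes
    echo "== Pod usage, highest CPU first =="
    kubectl top pods --all-namespaces --sort-by=cpu
    echo "== Pods that are not running =="
    kubectl get pods --all-namespaces --field-selector='status.phase!=Running'
}
```

Nodes that sit close to full CPU or memory, or pods stuck outside the Running phase, are a good sign that the cluster is overloaded.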

Auto-scaling

If your workloads are slow and you feel that the performance is not where it should be, I recommend you check if auto-scaling is enabled, and, if not, enable it. Running a GKE cluster without auto-scaling is not recommended since the cluster can run out of resources if no one is actively checking the resource utilization of the cluster. Auto-scaling takes the guesswork away and lets GKE manage the resource utilization.

To enable auto-scaling on your GKE cluster, edit your cluster and enable the Auto-provisioning options, as shown in Figure 10-6.
Figure 10-6. Auto-scaling

To prevent issues with your cluster, I strongly recommend you enable all the cluster and pod automation features GKE has to offer. By using automation, your GKE cluster will auto-scale and fix issues that arise as a result of large deployments, updates, and workloads.
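Auto-scaling can also be turned on from gcloud. Note that the sketch below enables the cluster autoscaler on a single node pool, which is a related but different setting from the Auto-provisioning options in the console. The function name, the node pool name default-pool, and the node counts are assumptions you should adapt to your cluster.

```shell
# Enable the cluster autoscaler on one node pool.
# Arguments: cluster name, zone, minimum nodes, maximum nodes.
# Assumes the node pool is named default-pool.
enable_node_autoscaling() {
    local cluster="$1"
    local zone="$2"
    local min="$3"
    local max="$4"
    # Catch an inverted range before sending the request.
    if [ "$min" -gt "$max" ]; then
        echo "min nodes ($min) must not exceed max nodes ($max)" >&2
        return 1
    fi
    gcloud container clusters update "$cluster" \
        --zone "$zone" \
        --node-pool default-pool \
        --enable-autoscaling --min-nodes "$min" --max-nodes "$max"
}
```

For example, enable_node_autoscaling cluster-1 us-central1-c 1 5 lets the pool grow from one to five nodes as the load changes.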

Troubleshoot Cloud Run and Cloud Build Deployments

In this section, we will cover some common practices that will help you troubleshoot and prevent issues with your Cloud Run service. Because Cloud Run is a fully managed, serverless service, most of the issues that arise are performance issues. I have the following recommendations:
  • Keep your container images in the same region as your Cloud Run deployment for maximum performance. Pulling a Docker image from a GCR registry in a different region adds latency to the container startup time.

  • Size your deployment appropriately and don’t just accept the default options. In the Capacity section of the wizard, you have the option to set the memory and CPU allocation, as shown in Figure 10-7.
    Figure 10-7. Capacity

  • By default, Cloud Run is configured with an auto-scaling feature that will scale the number of containers if the load is high, so make sure you review the settings before deploying your Cloud Run service.
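The capacity and scaling recommendations above can be applied at deployment time from gcloud instead of the wizard. In the sketch below, the function name and the values for memory, CPU, and maximum instances are illustrative assumptions, not recommended settings.

```shell
# Deploy a Cloud Run service with explicit capacity and scaling limits.
# Arguments: service name, container image, region.
deploy_service() {
    local service="$1"
    local image="$2"
    local region="$3"
    gcloud run deploy "$service" \
        --image "$image" \
        --region "$region" \
        --platform managed \
        --memory 512Mi \
        --cpu 1 \
        --max-instances 10
}
```

Keeping the image in a registry location close to the region you pass here follows the first recommendation in the list.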

Console Logs

To troubleshoot your Cloud Run application, you can access the console logs from the Logs tab and see what is going on inside the container. Figure 10-8 shows the console logs’ output.
Figure 10-8. Logs

The console also shows real-time logs; in my case, every time someone accesses the application, an entry appears in the console. In the case of an application issue, these logs can be very useful.
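The same logs can be read from the command line with gcloud logging read. The sketch below builds a filter for a single Cloud Run service; the function names are my own, and the table columns are one possible choice. Container output typically appears in textPayload, while request entries carry their details in other fields.

```shell
# Build a Cloud Logging filter that matches one Cloud Run service.
cloud_run_log_filter() {
    printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s"\n' "$1"
}

# Print the most recent log entries for a service (20 by default).
# Arguments: service name, optional number of entries.
recent_service_logs() {
    gcloud logging read "$(cloud_run_log_filter "$1")" \
        --limit "${2:-20}" \
        --format 'table(timestamp,severity,textPayload)'
}
```

For example, recent_service_logs hello 50 prints the last 50 entries for a service named hello.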

If you need to troubleshoot role-based (IAM) permission issues with Cloud Run, you can use the Permissions tab to see who has access and which access level they have. In Figure 10-9, you can see the permission levels.
Figure 10-9. Permissions
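The information on the Permissions tab can also be retrieved with gcloud. The following is a small sketch; the function name is hypothetical.

```shell
# Print the IAM policy (roles and their members) attached to one service.
# Arguments: service name, region.
show_service_access() {
    gcloud run services get-iam-policy "$1" \
        --region "$2" --platform managed
}
```

In the output, a binding of the role roles/run.invoker to allUsers means the service can be invoked without authentication.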

Cloud Build Triggers

The most common issue that I have seen with Cloud Build deployments is the trigger configuration. Make sure your Cloud Build trigger is enabled by checking that the status is set to “Enabled” in the console, as shown in Figure 10-10.
Figure 10-10. Trigger status

To check Cloud Build logs and history, use the History section on the left navigation menu of the Cloud Build console. Figure 10-11 shows the build history.
Figure 10-11. History

In the case of a failed deployment, clicking on the deployment will show all the steps taken and where the deployment failed. Figure 10-12 shows the build steps of a failed deployment.
Figure 10-12. Build steps
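The trigger status, build history, and build logs are also available from gcloud. The sketch below uses hypothetical function names; on older Cloud SDK versions, the triggers command may only be available as gcloud beta builds triggers list.

```shell
# List the most recent failed builds (5 by default).
show_failed_builds() {
    gcloud builds list --filter 'status=FAILURE' --limit "${1:-5}" \
        --format 'table(id,createTime,status)'
}

# Print the full log of one build, including the step that failed.
# Argument: build ID taken from the list above.
show_build_log() {
    gcloud builds log "$1"
}

# List triggers with their disabled flag; a disabled trigger never fires.
list_triggers() {
    gcloud builds triggers list --format 'table(name,disabled)'
}
```

A typical sequence is to run show_failed_builds, copy the ID of the build you are interested in, and pass it to show_build_log.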

Troubleshoot GCP Container Registry

The most common issue I have seen with Google Container Registry (GCR) is that the wrong region is set to host the images. Hosting images far from the workloads that pull them can cause performance issues.

GCR stores images in three locations, each with its own registry hostname:
  • gcr.io — United States

  • eu.gcr.io — Europe

  • asia.gcr.io — Asia

When you tag your image, make sure you tag it with the correct location. If your apps are running in the United States, use the gcr.io hostname so that the images are stored there as well.
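To make the tagging step less error-prone, you can derive the registry hostname from the location instead of typing it by hand. The sketch below uses hypothetical function names and short location keys (us, eu, asia) of my own choosing. It assumes Docker is installed, the local image already exists, and Docker has been authorized for GCR with gcloud auth configure-docker.

```shell
# Map a location key to the matching GCR hostname from the list above.
gcr_host_for() {
    case "$1" in
        us)   echo "gcr.io" ;;
        eu)   echo "eu.gcr.io" ;;
        asia) echo "asia.gcr.io" ;;
        *)    echo "unknown location: $1" >&2; return 1 ;;
    esac
}

# Tag a local image for the chosen location and push it.
# Arguments: location key, project ID, image name, image tag.
push_to_gcr() {
    local host target
    host="$(gcr_host_for "$1")" || return 1
    target="$host/$2/$3:$4"
    docker tag "$3:$4" "$target" && docker push "$target"
}
```

For example, push_to_gcr eu web-project-269903 web v1 tags the local image web:v1 as eu.gcr.io/web-project-269903/web:v1 and pushes it.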

Troubleshoot Compute Engine Resource

In this section, I will cover a couple of known issues that you need to pay attention to when working with Compute Engine VM resources.

Select the Right Machine Family

Many performance issues in public clouds are related to a poor selection of instance type. It is very easy to make a mistake and select a general-purpose instance for running a database server. When selecting your instance, make sure you select a VM instance that is suitable for your workloads.

GCP offers the following three main machine family types:
  • General-purpose

  • Memory-optimized

  • Compute-optimized

Based on the family type and your application type, you can select the best series that will fit your needs. Figure 10-13 shows the main machine family types.
Figure 10-13. Machine family types
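You can also check which family a machine type belongs to, and which types a zone offers, from the command line. The sketch below maps a machine type name to its family by series prefix. The mapping is partial and covers only series I am sure of at the time of writing (E2, N1, N2, and N2D as general-purpose, M1 and M2 as memory-optimized, C2 as compute-optimized); the function names are hypothetical.

```shell
# Classify a machine type by its series prefix. Anything not listed,
# including newer series, is reported as unknown.
machine_family() {
    case "$1" in
        e2-*|n1-*|n2-*|n2d-*) echo "general-purpose" ;;
        m1-*|m2-*)            echo "memory-optimized" ;;
        c2-*)                 echo "compute-optimized" ;;
        *)                    echo "unknown" ;;
    esac
}

# List the machine types of one series that are available in a zone.
# Arguments: zone, series prefix (for example, n2).
list_machine_types() {
    gcloud compute machine-types list \
        --filter "zone:( $1 ) AND name~'^$2-'"
}
```

For example, machine_family m1-ultramem-40 prints memory-optimized, and list_machine_types us-central1-c c2 lists the compute-optimized C2 types in that zone.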

Firewall Ports

Another very common issue is related to exposing your application to external access over the internet. By default, all incoming ports are closed except RDP (for Windows machines) and SSH (for Linux machines). The HTTP and HTTPS ports can be opened from the Firewalls section of the VM during setup or by editing an existing VM. Figure 10-14 shows the Firewalls options.
Figure 10-14. Firewalls options

Open Non-standard Ports

If you need to open ports other than HTTP and HTTPS, you will need to open them from the Firewalls section of your VPC network. By default, your GCP workloads are protected by a virtual firewall that is attached to your Virtual Private Cloud (VPC) network. To access your VPC firewall, search for VPC network or Firewall from the GCP management console search bar.

Figure 10-15 shows the Firewall console located in the VPC network page.
Figure 10-15. Firewall page

To open a port other than HTTP and HTTPS, add a network tag to your virtual machine so that the firewall rule can target that machine.

Add Network Tag

To add a network tag, edit your VM and scroll down to the Firewalls section. In the Network Tags section, type a name that describes your host. For this demonstration, I will type dockerhost and save the VM configuration.

Figure 10-16 shows the Network Tags section.
Figure 10-16. Network tags

After tagging the VM, open the Firewall console from the VPC network console. Click on Create a Firewall Rule, as shown in Figure 10-17. Fill in the details and make sure you type the name of your VM network tag in the Target Tags section.
Figure 10-17. Create a firewall rule

Add the source address range; to allow traffic from any address, use 0.0.0.0/0. In the Protocols and Ports section, type the TCP port number and click Save.
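The tagging and firewall rule steps can be scripted with gcloud as well. In the sketch below, the function name, the rule naming scheme, and the use of the default network are assumptions; the dockerhost tag matches the demonstration above.

```shell
# Tag a VM and open one TCP port to it from any source address.
# Arguments: VM name, zone, network tag, TCP port.
# Assumes the VM is attached to the network named default.
open_tcp_port() {
    local vm="$1"
    local zone="$2"
    local tag="$3"
    local port="$4"
    # Accept only a whole number between 1 and 65535.
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number: $port" >&2; return 1 ;;
    esac
    if [ "$port" -lt 1 ] || [ "$port" -gt 65535 ]; then
        echo "port out of range: $port" >&2
        return 1
    fi
    # Step 1: add the network tag to the VM.
    gcloud compute instances add-tags "$vm" --zone "$zone" --tags "$tag" || return 1
    # Step 2: create an ingress rule that applies only to VMs with that tag.
    gcloud compute firewall-rules create "allow-$tag-tcp-$port" \
        --network default \
        --direction INGRESS \
        --allow "tcp:$port" \
        --source-ranges 0.0.0.0/0 \
        --target-tags "$tag"
}
```

For example, open_tcp_port docker-vm us-central1-c dockerhost 8080 tags a VM named docker-vm (a made-up name) and opens TCP port 8080 to it. Replace 0.0.0.0/0 with a narrower range if only known addresses need access.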

Summary

In this last chapter of the book, we covered a few common issues and troubleshooting strategies that you might need when using the following services:
  • gcloud

  • GKE

  • Cloud Run

  • Cloud Build

  • Compute Engine

I recommend that you learn how to use the gcloud command-line tools and build a library of scripts that will help you redeploy workloads and save time.
