Troubleshooting
This chapter provides information about how to fix some common issues with IBM Cloud Private. It shows you how to collect log information and open a request with the IBM Support team.
This chapter has the following sections:
8.1 Common errors during the IBM Cloud Private installation
This section gives you some tips on how to troubleshoot IBM Cloud Private installation problems.
8.1.1 Customizing the config.yaml file
While installing IBM Cloud Private, you need to customize the config.yaml file located at /<installation_directory>/cluster/config.yaml.
The list of required parameters to configure in the config.yaml file is available at https://www.ibm.com/support/knowledgecenter/en/SSBS6K_3.1.2/installing/install_containers.html (see step 3, “Customize your cluster”).
The first parameter that you need to configure in the config.yaml file is the admin user and password. The admin password policy has been changed in IBM Cloud Private version 3.1.2 and now requires, by default, at least 32 characters. If the password does not match the requirements, the installation log will show an error as shown in Example 8-1.
Example 8-1 Password problem
TASK [Checking if setting password or not] *************************************
fatal: [localhost]: FAILED! => changed=false
msg: 'The password is not set. You must specify a password that meets the following criteria: ''^([a-zA-Z0-9-]{32,})$'''
 
NO MORE HOSTS LEFT *************************************************************
To fix this issue, edit /<installation_directory>/cluster/config.yaml and define a password of at least 32 alphanumeric characters that matches the required pattern.
If you want to change the policy, you can use the regular expression that best fits your company policy as described at the following link:
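Before editing config.yaml, you can pre-check a candidate password against the default pattern shown in the error message. The password below is only a placeholder for illustration:

```shell
# Hypothetical candidate password (placeholder); the default policy requires
# 32 or more characters from [a-zA-Z0-9-].
PASSWORD='Redbooks0123456789abcdefghijklmnopqr'

# Same regular expression that the installer enforces by default.
if printf '%s' "$PASSWORD" | grep -Eq '^([a-zA-Z0-9-]{32,})$'; then
  result="password OK"
else
  result="password does not meet the default policy"
fi
echo "$result"
```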
If your server has more than one Ethernet adapter and you are installing the IBM Cloud Private Enterprise Edition, you also need to configure the following parameters:
cluster_lb_address: <external address>
proxy_lb_address: <external address>
The external address is the IP address of the adapter that receives the incoming traffic to the server.
For the full set of config.yaml configuration options, see the following URL:
8.1.2 Customizing the /cluster/hosts file
The hosts file in the cluster directory is used to define the cluster architecture and configures how the workers, masters, and management servers are distributed.
When configuring persistent storage, you need to define the corresponding host groups in the hosts file. See the full specification of the hosts file configuration at the following link:
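As a minimal illustration, a hosts file for a small cluster might look like the following sketch. The group names follow the product documentation; the IP addresses are placeholders that you replace with your node addresses (shown here as a heredoc so the layout is explicit):

```shell
# Sketch of a minimal /<installation_directory>/cluster/hosts file; the IP
# addresses are placeholders, and your topology may use different groups.
hosts_file=$(cat <<'EOF'
[master]
<master node IP>

[worker]
<worker node 1 IP>
<worker node 2 IP>

[proxy]
<proxy node IP>

[management]
<management node IP>
EOF
)
printf '%s\n' "$hosts_file"
```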
If the hosts file configuration is not done, the error described in Example 8-2 will be displayed.
Example 8-2 The hosts file configuration error
fatal: [...]: UNREACHABLE! => changed=false
msg: |-
Failed to connect to the host via ssh: ssh: Could not resolve hostname ...: Name does not resolve
unreachable: true
 
Tip: When planning for the installation of IBM Cloud Private, it is highly advised that you define all of the functions that you want your server to run, because some of the customizations on the hosts file require IBM Cloud Private to be uninstalled and installed again.
If you plan to use storage for persistent data, you might need to add the host group in the hosts file for that particular type of storage. Read the documentation about the storage system you are using and make sure that the prerequisites are met before running the IBM Cloud Private installation. For more information, see Chapter 4, “Managing persistence in IBM Cloud Private” on page 115.
8.1.3 SSH key error
During the installation preparation, it is required to generate and exchange the SSH key between the nodes. In addition, you need to copy the SSH key to the IBM Cloud Private cluster installation folder (/<installation_directory>/cluster). If this step is not performed, you will receive the error message in Example 8-3 during the installation.
Example 8-3 SSH key error
fatal: [9.46.67.246]: UNREACHABLE! => changed=false
msg: |-
Failed to connect to the host via ssh: Load key "/installer/cluster/ssh_key": invalid format
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
unreachable: true
To correct the error, copy ~/.ssh/id_rsa to /<installation_directory>/cluster/ssh_key.
This procedure is described in “Step 2: Set up the installation environment”, item 9, at the following link:
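The key generation and exchange steps can be sketched as follows. This is illustrative only: <node> and <installation_directory> are placeholders, and it assumes root SSH access from the boot node to every cluster node.

```shell
# Run on the boot node. Generate a key pair without a passphrase, distribute
# the public key to every node, and copy the private key into the cluster
# installation directory.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""
ssh-copy-id -i ~/.ssh/id_rsa.pub root@<node>    # repeat for every cluster node
cp ~/.ssh/id_rsa /<installation_directory>/cluster/ssh_key
chmod 400 /<installation_directory>/cluster/ssh_key
```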
8.1.4 Missing the IBM Cloud Private binary files in the installation folder
To install IBM Cloud Private, you need to copy the binary files to the /<installation_directory>/cluster/images folder.
If this step is not performed, the error shown in Example 8-4 is displayed during the installation.
Example 8-4 Missing binary files error
TASK [icp-registry-image : Aborting installation process] **********************
fatal: [9.46.67.246]: FAILED! => changed=false
msg: Unable to find offline package under images directory
This procedure is described in “Step 2: Set up the installation environment”, item 10, at the following link:
8.1.5 Missing the minimum system requirements
Missing the minimum system requirements can cause random errors during the installation. Even if the installation completes, errors might appear later during a new Helm chart deployment, when accessing an existing chart, or when running a function in your IBM Cloud Private environment.
To avoid these kinds of errors, ensure that the system meets at least the minimum system requirements. You can see the system requirements at this link:
It is also suggested that you evaluate the sizing of the cluster before the installation. See this link for information about how to size your cluster:
8.1.6 Perform the system cleanup when the installation fails
If the installation fails, you need to uninstall IBM Cloud Private with the uninstall command before trying a new installation.
Run the uninstall command:
sudo docker run --net=host -t -e LICENSE=accept -v "$(pwd)":/installer/cluster ibmcom/icp-inception-amd64:3.1.2-ee uninstall
After running the uninstaller, you need to run the commands described in Example 8-5 to make sure that the system is clean and ready for a new installation.
Example 8-5 Making sure the system is ready for a new installation
sudo systemctl stop kubelet docker
sudo systemctl start docker
sudo docker rm $(sudo docker ps -qa)
if sudo mount | grep /var/lib/kubelet/pods; then sudo umount $(sudo mount | grep /var/lib/kubelet/pods | awk '{print $3}'); fi
sudo rm -rf /opt/cni /opt/ibm/cfc /opt/kubernetes
sudo rm -rf /etc/cfc /etc/cni /etc/docker/certs.d
sudo rm -rf /var/lib/etcd/* /var/lib/etcd-wal/*
sudo rm -rf /var/lib/mysql/*
sudo rm -rf /var/lib/kubelet/* /var/lib/icp/* /var/lib/calico
echo done
sudo rm -rf /<installation_directory>/cluster/cfc-certs /<installation_directory>/cluster/cfc-components /<installation_directory>/cluster/cfc-keys /<installation_directory>/cluster/.addon /<installation_directory>/cluster/.misc ; echo done
After completing the previous steps, perform the installation again.
8.2 Network configuration errors
This section describes how to troubleshoot the IBM Cloud Private networking components, such as Calico and IPsec.
8.2.1 Calico troubleshooting
Calico network issues might show up during or after an IBM Cloud Private installation. During the installation, the installer runs checks to ensure seamless pod-to-pod connectivity in the cluster. However, if there are issues, the following information might help to identify the possible causes and resolve the issues.
Problems during the installation
To avoid Calico network issues during the installation, ensure that the following settings are correctly configured.
The calico_ipip_enabled parameter must be set to true if the nodes in the cluster do not all belong to the same subnet. It must also be set to true if the nodes are deployed in a cloud environment, such as OpenStack, where source and destination checks prevent IP traffic from unknown IP ranges, even if all the nodes belong to the same subnet. This configuration enables encapsulation of pod-to-pod traffic over the underlying network infrastructure.
The calico_ip_autodetection_method parameter must be set so that Calico uses the correct interface on the node. If there are multiple interfaces, aliases, logical interfaces, bridge interfaces, or any other type of interfaces on the nodes, use either of the following settings to ensure that the auto-detect mechanism chooses the correct interface:
calico_ip_autodetection_method: can-reach= (This is the default setting.)
calico_ip_autodetection_method: interface=
The calico_tunnel_mtu parameter must be set based on the MTU of the interface that is configured to be used by Calico.
If the calico_ipip_enabled parameter is set to true, 20 bytes are used for IP-IP tunnel header. It is required to set the calico_tunnel_mtu parameter to be at least 20 bytes less than the actual MTU of the interface.
If IPsec is enabled, 40 bytes are needed for the IPsec packet header. Because enabling IPsec also sets calico_ipip_enabled to true, you additionally need the 20 bytes for the IP-IP tunnel header. Therefore, you must set the calico_tunnel_mtu parameter to at least 60 bytes less than the actual MTU of the interface.
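The overhead arithmetic above can be sketched directly. The 1450-byte interface MTU is only an example value:

```shell
IFACE_MTU=1450     # example MTU of the data plane interface (assumption)
IPIP_OVERHEAD=20   # IP-in-IP tunnel header
IPSEC_OVERHEAD=40  # IPsec (ESP) packet header

ipip_only=$((IFACE_MTU - IPIP_OVERHEAD))
with_ipsec=$((IFACE_MTU - IPIP_OVERHEAD - IPSEC_OVERHEAD))
echo "calico_tunnel_mtu (IP-in-IP only): $ipip_only"
echo "calico_tunnel_mtu (with IPsec):    $with_ipsec"
```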
The network CIDR (Classless Inter-Domain Routing), existing host network, and the service cluster IP range must not be in conflict with each other.
Problems after the installation of IBM Cloud Private
After the cluster is installed, it could present IP connectivity issues across the pods. Service name resolution issues are a symptom of pods not being able to reach the DNS service. These problems are not always related to Calico networks.
In these situations, gather the following information from the cluster for support:
1. Get the node list:
kubectl get nodes -o wide
2. Get the logs:
Collect logs from the calico-node-* pod running on the node which is experiencing the mesh problem. Example 8-6 shows how to get the logs from calico-node-* running on node 10.10.25.71.
Example 8-6 Getting the logs
# kubectl get pods -o wide | grep calico-node
 
calico-node-amd64-2cbjh 2/2 Running 0 7h 10.10.25.70 10.10.25.70
calico-node-amd64-48lf9 2/2 Running 0 7h 10.10.25.71 10.10.25.71
calico-node-amd64-75667 2/2 Running 0 7h 10.10.25.7 10.10.25.7
3. Retrieve the logs from the calico-node container in the pod:
# kubectl logs calico-node-amd64-48lf9 -c calico-node-amd64
4. Get the routing table and interface details. Run the following commands on the master nodes and on the nodes where pods are experiencing connectivity issues:
route -n
ifconfig -a
5. Get Calico node list running the command on the master node:
calicoctl get nodes
6. Get all the pods and endpoints on Calico mesh by running the command on the IBM Cloud Private master node:
calicoctl get workloadendpoints
7. Get calico node status and diagnostics. Run the commands on the IBM Cloud Private master node and the nodes on which pods are experiencing connectivity problem:
calicoctl node status
calicoctl node diags
8. Provide the config.yaml and hosts files from the boot node.
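The per-node pod lookup in step 2 can be scripted. The following sketch parses the sample listing from Example 8-6 (hard-coded here for illustration) to find the calico-node pod running on a given node; in a live cluster you would pipe the real kubectl output instead:

```shell
NODE=10.10.25.71   # node experiencing the mesh problem

# Sample 'kubectl get pods -o wide' rows from Example 8-6. Column 7 is the
# node; column 1 is the pod name.
pod=$(printf '%s\n' \
  'calico-node-amd64-2cbjh 2/2 Running 0 7h 10.10.25.70 10.10.25.70' \
  'calico-node-amd64-48lf9 2/2 Running 0 7h 10.10.25.71 10.10.25.71' \
  'calico-node-amd64-75667 2/2 Running 0 7h 10.10.25.7 10.10.25.7' \
  | awk -v node="$NODE" '$7 == node {print $1}')
echo "$pod"
```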
Configuring calicoctl
Perform the following steps:
1. Log in to the node. Find the icp-inception Docker image and copy calicoctl to the node, as shown in Example 8-7.
Example 8-7 Copy the calicoctl to node
# docker images | grep "icp-inception"
 
ibmcom-amd64/icp-inception 3.1.0-ee c816bd4546f9 2 days ago 746MB
# docker run -v /usr/local/bin:/data -t --rm -e LICENSE=accept ibmcom-amd64/icp-inception:3.1.0-ee cp /usr/local/bin/calicoctl /data
# ls /usr/local/bin/calicoctl
/usr/local/bin/calicoctl
2. Configure calicoctl to authenticate to the etcd cluster. Copy the etcd cert, key, and ca files to a node from boot node’s cluster directory:
 – cert file: cluster/cfc-certs/etcd/client.pem
 – key file: cluster/cfc-certs/etcd/client-key.pem
 – ca file: cluster/cfc-certs/etcd/ca.pem
3. Create a calicoctl.cfg file at /etc/calico/calicoctl.cfg with the contents shown in Example 8-8.
Example 8-8 The calicoctl.cfg file
apiVersion: projectcalico.org/v3
kind: CalicoAPIConfig
metadata:
spec:
datastoreType: "etcdv3"
etcdEndpoints: "https://<master node IP>:4001"
etcdKeyFile: <File path of client-key.pem>
etcdCertFile: <File path of client.pem>
etcdCACertFile: <file path of ca.pem>
4. Replace the values between < > with the actual values for your cluster.
8.2.2 IPsec troubleshooting
To configure IPsec on IBM Cloud Private, every node in the cluster must have at least two network interfaces. The first is a management interface; the second provides secure networking for the pods. Specify the IP address of the management interface in cluster/hosts, and the name of the other interface (the data plane interface) in the Calico and IPsec configurations in cluster/config.yaml.
Calico networks must be enabled in IP-in-IP mode. Calico tunnel MTU must be set correctly.
The IPsec package used for encryption must be installed on all the nodes in the cluster. The IPsec package used for RHEL is libreswan. On Ubuntu and SLES, it is strongswan.
 
Note: All nodes in the cluster must run the same operating system.
Configuration
When configuring IPsec, ensure that the following Calico configurations are provided in the config.yaml file (Example 8-9).
Example 8-9 The config.yaml file
network_type: calico
calico_ipip_enabled: true
calico_tunnel_mtu: 1390
calico_ip_autodetection_method: interface=eth0
In Example 8-9 on page 279, the following components have these attributes:
calico_ipip_enabled must be true. IPIP tunnelling must be enabled for IPsec.
calico_tunnel_mtu must be at least 60 bytes less than the interface MTU. If the eth0 interface MTU is 1450 bytes, calico_tunnel_mtu must be set to at most 1390 bytes.
calico_ip_autodetection_method must be configured to choose the data plane interface.
Then, verify the IPsec configuration in config.yaml, as shown in Example 8-10.
Example 8-10 Check the IPsec configuration
ipsec_mesh:
enable: true
interface: eth0
subnets: [10.24.10.0/24]
exclude_ips: [10.24.10.1/32, 10.24.10.2, 10.24.10.192/28]
cipher_suite: aes128gcm16!
Where:
interface must be the same interface that was set in the calico_ip_autodetection_method parameter.
subnets are the address ranges. The packets destined for such subnet ranges are encrypted. The IP address of the data plane interface must fall in one of the provided subnet ranges.
exclude_ips are the IP addresses that are excluded from the IPsec subnet. Traffic to these IP addresses is not encrypted.
cipher_suite: aes128gcm16! is the list of Encapsulating Security Payload (ESP) encryption/authentication algorithms to be used. The default cipher suite that is used is aes128gcm16!. Ensure that this module is available and loaded in the operating system on all the hosts. It is also possible to change it to another cipher suite.
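As a rough sanity check before enabling IPsec, you can verify that the data plane IP falls inside one of the configured subnets. The following bash sketch is IPv4 only; 10.24.10.55 is a hypothetical data plane address, and the subnet mirrors Example 8-10:

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return success when <ip> falls inside <cidr>.
in_subnet() {
  local ip=$1 net=${2%/*} bits=${2#*/} mask
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# 10.24.10.55 is a hypothetical data plane IP; 10.24.10.0/24 is from Example 8-10.
if in_subnet 10.24.10.55 10.24.10.0/24; then covered=yes; else covered=no; fi
echo "covered=$covered"
```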
Post installation
For the RHEL installation, perform the following steps:
1. Check the libreswan configuration:
cat /etc/ipsec.conf
cat /etc/ipsec.d/ipsec-libreswan.conf
2. Check the status of the ipsec process:
ipsec status
3. If the ipsec status does not display the established connections, check /var/log/messages for errors related to IPsec. Enable the libreswan logging by enabling plutodebug in the /etc/ipsec.conf file, as shown in Example 8-11.
Example 8-11 Enable libreswan logging
# /etc/ipsec.conf - libreswan IPsec configuration file
 
config setup
...
...
plutodebug = all # <<<<<<<<<<<<
For the Ubuntu/SLES installations, perform the following steps:
1. Check the strongswan configuration:
cat /etc/ipsec.conf
2. Check the status of the ipsec process:
ipsec status
service strongswan status
3. If the ipsec status does not display the established connections, check /var/log/syslog for errors related to IPsec.
4. Enable strongswan logging by enabling charondebug in the /etc/ipsec.conf file, as shown in Example 8-12.
Example 8-12 Enable strongswan logging
# /etc/ipsec.conf - strongswan IPsec configuration file
 
config setup
...
...
charondebug="ike 2, knl 2, cfg 2" # <<<<<<<<<<<<
If the problem persists, you can open a support ticket as described in 8.5, “Opening a support case” on page 287.
8.3 Common errors when installing a Helm chart
This section describes some common errors that occur when installing a Helm chart.
8.3.1 Getting the 504 error when accessing an application
When you try to access an application and get a 504 error or the message that the page cannot be displayed, as seen in Figure 8-1, the best approach is to check the pod description and pod logs (if the pod was already running and stopped).
Figure 8-1 Gateway timeout
To access the pod description, you can check the pod status and pod information:
kubectl describe pods <pod> -n <namespace>
See the sample output in Example 8-13.
Example 8-13 Pod description
Name: mydatapower-ibm-datapower-dev-d95f656dd-rjk5x
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: <none>
Labels: app=mydatapower-ibm-datapower-dev
chart=ibm-datapower-dev-2.0.4
heritage=Tiller
pod-template-hash=d95f656dd
release=mydatapower
Annotations: kubernetes.io/psp: ibm-privileged-psp
productID: IBMDataPowerGatewayVirtualEdition_2018.4.1.2.306098_Developers
productName: IBM DataPower Gateway Virtual Edition for Developers
productVersion: 2018.4.1.2.306098
prometheus.io/module: dpStatusMIB
prometheus.io/path: /snmp
prometheus.io/port: 63512
prometheus.io/scrape: true
prometheus.io/target: 127.0.0.1:1161
Status: Pending
IP:
Controlled By: ReplicaSet/mydatapower-ibm-datapower-dev-d95f656dd
Containers:
ibm-datapower-dev:
Image: ibmcom/datapower:2018.4.1.2.306098
Port: 8443/TCP
Host Port: 0/TCP
Command:
sh
-c
exec /start.sh --log-format json-icp
 
Limits:
cpu: 8
memory: 64Gi
Requests:
cpu: 4
memory: 8Gi
Liveness: http-get http://:service/ delay=120s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:service/ delay=120s timeout=5s period=10s #success=1 #failure=3
Environment:
DATAPOWER_ACCEPT_LICENSE: true
DATAPOWER_INTERACTIVE: true
DATAPOWER_LOG_COLOR: false
DATAPOWER_WORKER_THREADS: 4
Mounts:
/drouter/config from mydatapower-ibm-datapower-dev-config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j82nq (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
mydatapower-ibm-datapower-dev-config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: mydatapower-ibm-datapower-dev-config
Optional: false
default-token-j82nq:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j82nq
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 52s (x2 over 52s) default-scheduler 0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.
As shown in Example 8-13 on page 282, the pod could not be scheduled due to insufficient CPU and memory, which causes the error. To fix this issue, make sure that the worker node has sufficient memory and CPU to run the pod.
8.3.2 No CPU available
When looking at the pod description, sometimes a message is displayed stating that there is not enough CPU to run the pod.
To fix the issue, add more CPU, then restart Docker and the kubelet to get the pod running.
To determine the amount of CPU being used, run the following command:
kubectl describe node <node>
The output of the command is similar to that shown in Example 8-14.
Example 8-14 Check the amount of CPU being used
Name: 9.46.73.206
Roles: etcd,management,master,proxy,worker
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
etcd=true
kubernetes.io/hostname=9.46.73.206
management=true
master=true
node-role.kubernetes.io/etcd=true
node-role.kubernetes.io/management=true
node-role.kubernetes.io/master=true
node-role.kubernetes.io/proxy=true
node-role.kubernetes.io/worker=true
proxy=true
role=master
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Wed, 20 Feb 2019 14:58:20 -0800
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Mon, 25 Feb 2019 14:49:08 -0800 Wed, 20 Feb 2019 14:58:20 -0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Mon, 25 Feb 2019 14:49:08 -0800 Wed, 20 Feb 2019 14:58:20 -0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Mon, 25 Feb 2019 14:49:08 -0800 Wed, 20 Feb 2019 14:58:20 -0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Mon, 25 Feb 2019 14:49:08 -0800 Wed, 20 Feb 2019 14:58:20 -0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Mon, 25 Feb 2019 14:49:08 -0800 Wed, 20 Feb 2019 15:49:34 -0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 9.46.73.206
Hostname: 9.46.73.206
Capacity:
cpu: 8
ephemeral-storage: 244194820Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 16265924Ki
pods: 80
Allocatable:
cpu: 7600m
ephemeral-storage: 241995268Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 15114948Ki
pods: 80
System Info:
Machine ID: cbb00030e5204543a0474ffff17ec26f
System UUID: 79E65241-2145-4307-995A-B3A5C6401F48
Boot ID: c8e1b505-e5cf-4da4-ab04-63eb5ad2d360
Kernel Version: 3.10.0-957.el7.x86_64
OS Image: Red Hat Enterprise Linux Server 7.6 (Maipo)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.3.1
Kubelet Version: v1.12.4+icp-ee
Kube-Proxy Version: v1.12.4+icp-ee
Non-terminated Pods: (58 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
cert-manager ibm-cert-manager-cert-manager-7dbc9c8db6-5d84q 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system audit-logging-fluentd-ds-dzswg 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system auth-apikeys-sc4k8 200m (2%) 1 (13%) 300Mi (2%) 1Gi (6%)
kube-system auth-idp-j457s 300m (3%) 3200m (42%) 768Mi (5%) 3584Mi (24%)
kube-system auth-pap-thf7x ...
[...]
kube-system unified-router-n5v2f 20m (0%) 0 (0%) 64Mi (0%) 0 (0%)
kube-system web-terminal-6488cfff5d-mgzgw 10m (0%) 100m (1%) 64Mi (0%) 512Mi (3%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 6049m (79%) 10506m (138%)
memory 19087040Ki (126%) 23525056Ki (155%)
Events: <none>
To solve this issue, you need to add more CPUs to the instance or remove some of the unused pods.
 
Attention: Be careful if removing a pod on a kube-system, because this action could impact the whole system.
After adding more CPUs, check the node description again.
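Using the sample values from Example 8-14 (Allocatable cpu of 7600m and total cpu Requests of 6049m), the remaining schedulable CPU can be computed directly:

```shell
allocatable_m=7600  # Allocatable cpu from the node description (7600m)
requested_m=6049    # total cpu Requests from the node description (6049m)

headroom_m=$((allocatable_m - requested_m))
echo "schedulable CPU headroom: ${headroom_m}m"
```

A pod whose CPU request exceeds this headroom stays Pending with the Insufficient cpu event shown earlier.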
8.3.3 The required port is in use
When deploying a Helm chart, you might see the message shown in Example 8-15 on the pod description.
Example 8-15 Required port is in use
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 32s (x2 over 32s) default-scheduler 0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
In this case, you can check the full list of conflicting ports with the following command:
kubectl describe pods <pod>
To fix the problem, remove the deployment, change the port that is being used so that there are no conflicts, and deploy again.
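A quick way to see whether a host port is already bound on the node is the following bash sketch. The port 8443 is only an example, and /dev/tcp is a bash-specific feature:

```shell
PORT=8443   # example hostPort requested by the chart (assumption)

# Try to open a TCP connection to the port on the local host; success means
# something is already listening there.
if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
  status="in use"
else
  status="free"
fi
echo "port $PORT is $status"
```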
8.3.4 Deployment fails due to a missing permission
When deploying a pod with a missing permission, the error message described in Figure 8-2 is displayed.
Figure 8-2 Pod security missing
To fix the problem, you need to grant the pod security policy to the namespace by creating a role binding. Run the following command:
kubectl -n appsales create rolebinding ibm-anyuid-clusterrole-rolebinding --clusterrole=ibm-anyuid-clusterrole --group=system:serviceaccounts:appsales
After the command completion, try to deploy the pod again.
See the following URL for more information about this troubleshooting:
8.4 Common errors when running applications
The following sections describe some of the common errors when running applications on IBM Cloud Private, and their solutions.
8.4.1 Getting the 504 or 500 errors when trying to access the application
After deploying a pod or during the execution of the pod, you might get the error message 504 or 500 when trying to access the application from a browser as shown in Figure 8-3.
Figure 8-3 Connection time out error
There are some common cases where this error is displayed, such as the pod entering a CrashLoopBackOff state or not starting. These errors are discussed in the next sections.
Pod in CrashLoopBackOff
When a pod enters CrashLoopBackOff, it means that the pod tries to start, crashes, and then tries to restart again. At the server console, run the kubectl get pods --all-namespaces command and observe the output, as shown in Example 8-16.
Example 8-16 The kubectl get pods --all-namespaces command
kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
[..]
my-server-demo myapp-web-55-84kkm 0/1 CrashLoopBackOff 3774 9h
In Example 8-16, you can see that the pod has restarted 3774 times in the last 9 hours. Usually this error happens when the pod starts and fails in a loop.
To try to understand where the error is, you can run the following commands:
kubectl logs <pod>
kubectl describe pod <pod>
With the output of both commands you can determine where the error is and how to solve it.
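To spot crash-looping pods quickly, you can filter on the restart count column. The following sketch parses the sample row from Example 8-16 (hard-coded for illustration, with a threshold of 100 restarts chosen arbitrarily); live usage would pipe the real kubectl output instead:

```shell
# Sample 'kubectl get pods --all-namespaces' row.
# Columns: namespace name ready status restarts age
flagged=$(printf '%s\n' \
  'my-server-demo myapp-web-55-84kkm 0/1 CrashLoopBackOff 3774 9h' \
  | awk '$5 + 0 > 100 {print $2 " restarted " $5 " times"}')
echo "$flagged"
```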
Pod not starting
When there is an issue with starting the pod, you can run the kubectl get pods -n <namespace> command. If you see the status of the pod as ErrImagePull or ImagePullBackOff, this means that there is a problem with the deployment. Possible problems include pointing to an image that does not exist, having an incorrect image tag, or not giving Kubernetes permission to run the image.
You can find details about the error in the pod description (the kubectl describe pod <pod> command).
8.5 Opening a support case
When the troubleshooting methods that are discussed in this chapter do not fix the issue, you can open a request with the IBM Support team. You need to be an IBM customer with a valid product ID and license to do this.
 
Note: If you are using the IBM Cloud Private Community Edition, you can get support through the Slack channel and community forum, or ask the Watson Chatbot. See the following addresses:
IBM Watson Chatbot is at https://ibm.biz/icpsupport.
Follow the directions at https://ibm.biz/icpsupport for opening a support ticket and sending the related data to IBM Support:
When opening the ticket, it is suggested that you include the following information:
Title: High level description of the issue
Product version 
Platform (architecture): Specify whether it is a PowerPC®, x86, or other
Operating system (OS)
Virtualization platform: Where it is installed (VMWARE, Azure, other)
High availability (HA) environment or not
Problem area 
Severity  
Detailed error description 
Business impact: Is this a PoC, development environment, or production cluster?
Collect the following general troubleshooting data, along with any other data for your problem area:
hosts file (located at /<installation_directory>/cluster): This provides IBM the server topology details 
config.yaml file (located at /<installation_directory>/cluster): This provides details about customization, including the load balancer details. 
Run the following command and attach the output to the case:
sudo docker run --net=host -t -e LICENSE=accept -v "$(pwd)":/installer/cluster ibmcom/icp-inception-<architecture>:<version> healthcheck -v
 
Tip: You can find the cheat sheet in “Cheat sheet for production environment” on page 361 useful when troubleshooting the IBM Cloud Private problems.