Troubleshooting VMs

The last component you may need to troubleshoot is the VM itself. Typical issues are related to powering on or deleting VMs, misconfiguration, and resources.

To list the files that belong to a specific VM, go to the VM's folder on the datastore and run the command ls -lh:

[root@esxdell1:/vmfs/volumes/55c2fd45-d88e90e6-ae16-e41f13b3b2d0/WEB_Server_01] ls -lh
total 5899280
-rw------- 1 root root 1.0G Aug 12 13:47 WEB_Server_01-ecc960d0.vswp
-rw------- 1 root root 20.0G Oct 13 06:46 WEB_Server_01-flat.vmdk
-rw------- 1 root root 8.5K Aug 12 13:47 WEB_Server_01.nvram
-rw------- 1 root root 533 Aug 12 13:47 WEB_Server_01.vmdk
-rw-r--r-- 1 root root 0 Nov 3 2015 WEB_Server_01.vmsd
-rwxr-xr-x 1 root root 2.7K Aug 12 13:47 WEB_Server_01.vmx
-rw------- 1 root root 0 Aug 12 13:47 WEB_Server_01.vmx.lck
-rw------- 1 root root 150 Nov 18 2015 WEB_Server_01.vmxf
-rwxr-xr-x 1 root root 2.7K Aug 12 13:47 WEB_Server_01.vmx~
-rw-r--r-- 1 root root 226.8K Nov 18 2015 vmware-1.log
-rw-r--r-- 1 root root 203.6K Feb 9 2016 vmware-2.log
-rw-r--r-- 1 root root 180.8K Feb 9 2016 vmware-3.log
-rw-r--r-- 1 root root 152.1K Feb 12 2016 vmware-4.log
-rw-r--r-- 1 root root 177.0K May 29 23:16 vmware-5.log
-rw-r--r-- 1 root root 196.7K Aug 11 20:02 vmware-6.log
-rw-r--r-- 1 root root 167.8K Aug 12 13:47 vmware.log
-rw------- 1 root root 110.0M Aug 12 13:47 vmx-WEB_Server_01-3972620496-1.vswp

During troubleshooting (TRBL), vmware.log is the virtual machine's log file and the main source for understanding the problem, because it records the details of what happened to the VM. The current log has the same name, vmware.log, for every VM; older logs are kept as vmware-1.log, vmware-2.log, and so on, as the listing above shows.
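Instead of reading the whole log, you can filter it for the lines that matter. The following is a minimal sketch, assuming you are in the VM's folder on the datastore and a POSIX shell such as the BusyBox shell on ESXi (verify on your version). The function name scan_vm_log and the search pattern are hypothetical choices, not VMware commands.

```shell
# scan_vm_log: hypothetical helper (not a VMware command).
# Prints, with line numbers, the lines of a VM log that mention
# "error" or "lock" (case-insensitive). Defaults to vmware.log,
# the current log in the VM's folder.
scan_vm_log() {
  log_file="${1:-vmware.log}"
  if [ ! -f "$log_file" ]; then
    echo "log file not found: $log_file" >&2
    return 1
  fi
  grep -n -i -E 'error|lock' "$log_file"
}

# Usage, from the VM's folder on the datastore:
#   scan_vm_log                 # current log
#   scan_vm_log vmware-6.log    # an older, rotated log
```

Adjust the pattern to the message you are chasing; a broad pattern such as this one is a starting point, not a diagnosis.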

A problem that occurs quite often with VMs is an error message about a locked file: when you select the Power On option for a VM, the system displays the error "the file is locked". In this situation, use the command vmkfstools -D to display the lock information of the file. The important lock modes are as follows:

  • mode 0: No lock
  • mode 1: Exclusive lock
  • mode 2: Read-only lock
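In the vmkfstools -D output, the lock mode appears as "mode N, owner ..." on the lock line, while the later "mode 600" on the Addr line is the file permission. As a minimal sketch, assuming the output format shown in the example of step 1, a hypothetical helper lock_mode can read that output on standard input and print the mode with its meaning from the list above.

```shell
# lock_mode: hypothetical helper (not a VMware command).
# Reads "vmkfstools -D <file>" output on stdin, extracts the lock mode
# from the "mode N, owner ..." field and prints it with its meaning.
# Requiring ", owner" after the number skips the "mode 600" permission field.
lock_mode() {
  mode=$(sed -n 's/.*mode \([0-9][0-9]*\), owner.*/\1/p' | head -n 1)
  case "$mode" in
    0) echo "mode 0: No lock" ;;
    1) echo "mode 1: Exclusive lock" ;;
    2) echo "mode 2: Read-only lock" ;;
    "") echo "no lock mode found" >&2; return 1 ;;
    *) echo "mode $mode: other lock mode" ;;
  esac
}

# Usage (2>&1 in case your version writes the dump to stderr):
#   vmkfstools -D WEB_Server_01-flat.vmdk 2>&1 | lock_mode
```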

To solve the locked file error, proceed as follows:

  1. Run the command vmkfstools -D <locked disk>. In the output, you can identify the lock mode in use (for example, mode 1, exclusive lock) and the MAC address of the ESXi host that holds the lock: look at the last part of the owner field, 0010868226e2 in the following example.
vmkfstools -D WEB_Server_01-flat.vmdk
Lock [type 10c00001 offset 226893824 v 301, hb offset 3751936
gen 635, mode 1, owner 598e133a-a788d32f-337a-0010868226e2 mtime 17138
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 526, 148>, gen 282, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 21474836480, nb 4619 tbz 0, cow 0, newSinceEpoch 4619, zla 3, bs 1048576

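To find the host, compare this value with the MAC addresses of the physical NICs of your ESXi hosts, for example with the command esxcli network nic list. On the host that holds the lock (esxdell2 in this example), the output contains the matching address 00:10:86:82:26:e2:
vmnic4 0000:04:00.0 ixgbe Up Up 10000 Full 00:10:86:82:26:e2 9000 Intel(R) 82599 10 Gigabit Dual Port Network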
  2. When you know which host holds the lock, connect to that ESXi host through SSH to remove the lock.
  3. Run the command lsof | grep <name_of_locked_file> to obtain the PID of the process that holds the file open.
  4. Finally, run kill -9 <PID> to kill the process. If this does not solve the problem, try rebooting the host.
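The owner field from step 1 ends with the 12 hexadecimal digits of the MAC address, without separators. As a minimal sketch, assuming the owner format shown in the example (dash-separated groups, the last one being the MAC), a hypothetical helper owner_to_mac converts it to the colon notation shown for vmnic4, so you can search the NIC listing of each host directly.

```shell
# owner_to_mac: hypothetical helper (not a VMware command).
# Takes the owner field from "vmkfstools -D" output, e.g.
# 598e133a-a788d32f-337a-0010868226e2, and prints the last
# dash-separated group as a lowercase, colon-separated MAC address.
owner_to_mac() {
  hex=${1##*-}   # keep only the part after the last dash
  if [ "${#hex}" -ne 12 ] || ! echo "$hex" | grep -q -E '^[0-9a-fA-F]+$'; then
    echo "unexpected owner format: $1" >&2
    return 1
  fi
  # Insert a colon after every pair of digits, drop the trailing colon.
  echo "$hex" | sed 's/\(..\)/\1:/g; s/:$//' | tr 'A-F' 'a-f'
}

# Usage, on each candidate host:
#   esxcli network nic list | grep -i "$(owner_to_mac 598e133a-a788d32f-337a-0010868226e2)"
```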

In the following example, we get the PID of the process that holds the file open (step 3), which here is a less process, and then use this PID to kill that process (step 4):

[root@esxdell2:~] lsof | grep WEB_Server_01-flat.vmdk
5065291 less FILE 4 WEB_Server_01-flat.vmdk
[root@esxdell2:~] kill -9 5065291
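The lsof and grep step can be made more precise. As a minimal sketch, assuming lsof prints the PID in the first column and the file name in the last column, as in the example above, a hypothetical helper pids_holding prints each matching PID once. It deliberately does not kill anything, so you can check which processes the PIDs belong to before running kill -9 yourself.

```shell
# pids_holding: hypothetical helper (not an ESXi command).
# Reads lsof output on stdin and prints, once each, the PIDs (first
# column) of the lines whose last field is the given file name, either
# exactly or as the final component of a full path ("/<name>").
pids_holding() {
  awk -v f="$1" '$NF == f || substr($NF, length($NF) - length(f)) == "/" f { print $1 }' | sort -u
}

# Usage on the host that holds the lock (check the PIDs before killing):
#   lsof | pids_holding WEB_Server_01-flat.vmdk
#   kill -9 <PID>
```

Matching the whole last field, rather than any substring as grep does, avoids picking up processes that hold a different file whose name merely contains the one you are looking for.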

Many problems are caused by the resource settings of VMs, resource pools, and vApps, so configure them carefully. Unless it is a requirement, don't use reservations or limits for VMs.

Although troubleshooting can fix most problems, in some situations restoring the VM from a backup is the only possible solution. Backup is an essential part of vSphere management.

Great! You are now ready for basic TRBL operations.
