As you migrate your environment from VMware ESX to ESXi, what was your stable ESX environment will become your stable ESXi environment. Your management methods may change, but you’ll slowly grow accustomed to the nuances of managing ESXi.
If you have to perform low-level troubleshooting of your ESXi hosts, the differences in architecture will become quite obvious and you’ll be in a situation where some aspects are familiar, but others quite different. In this chapter, you explore the following aspects:
Accessing Tech Support Mode
Auditing access to Tech Support Mode
Exploring the boot process and filesystem for ESXi
Understanding system backups and repairs
Using Tech Support Mode to troubleshoot your hosts
The Direct Console User Interface (DCUI) provides a local user interface to the console of an ESXi host. As discussed in Chapter 3, “Management Tools,” the DCUI is a simple, menu-driven interface that you can use to configure and manage components, including performing the following actions:
Configuring the password for the root account
Configuring the Internet Protocol (IP) settings and network interface cards (NICs) used for management access
Viewing system logs
Restarting management services
With the DCUI, you can control access to Tech Support Mode (TSM), whether that be for local access or remote access via Secure Shell (SSH). To enable TSM with the DCUI, follow these steps:
Access the DCUI for the host and press F2 to open the System Customization menu.
Select Troubleshooting Options and press Enter.
Select either the Enable Local Tech Support Mode or Enable Remote Tech Support Mode (SSH). If either option starts instead with the word “Disable,” this indicates that the option has already been enabled.
Press Enter to enable the service.
Select the Modify Tech Support Timeout option and press Enter.
Enter a value between 0 and 1440 to set the timeout value for TSM in minutes. Setting the timeout value to 0 disables the timeout option.
Once you have enabled TSM, you can press Alt+F1 to access Local TSM or use an SSH client to access Remote TSM. You need a login that has been granted the Administrator role on the host or a local user that is a member of the root group. If Local TSM is not enabled when you press Alt+F1, you receive the error message Tech Support Mode Has Been Disabled by the Administrator. For Remote TSM, the connection is refused, as ESXi does not open the SSH port unless the Remote TSM service is running. If you have Lockdown Mode enabled on the ESXi host, you are prevented from using TSM. In that case, the error message is Login Incorrect for Local TSM and Access Denied for Remote TSM.
The timeout value sets the number of minutes that can elapse before you must log in to TSM. If the timeout expires, you must enable TSM again before you can log in. If you have an existing session open to TSM when the timeout value expires, that session is not closed.
TSM may also be enabled with the vSphere client, both manually and using Host Profiles. To enable TSM manually, use the following process:
Log in to your ESXi host or vCenter Server with the vSphere client.
Select the Configuration tab for the host and click Security Profile.
Click the Properties link to access the Services Properties screen.
Select either Local Tech Support or Remote Tech Support (SSH) and click Options.
On the Options screen, click Start. You can optionally configure the startup policy for the service.
After the service has started and is showing a Status of Running, click OK to close the Options screen.
Click OK to close the Services Properties screen.
You can also set the TSM timeout value in the vSphere client. On the Configuration tab, select Advanced Settings in the Software section. Find the UserVars.TSMTimeOut parameter and set it to a value between 0 and 86400 seconds. A value of 0 disables the TSM timeout.
Setting a short timeout value of one to two minutes gives you ample time to connect to TSM while ensuring that the service is not left enabled after you have connected.
If you’re using host profiles, you can configure the settings for TSM as shown in Figure 11.1. In this example, Remote TSM is set to have a startup policy of Off and the Advanced Setting UserVars.TSMTimeOut has been set to 120 seconds. Note that Host Profiles does not check the status of the TSM services to verify whether they are stopped or running.
You can also configure the TSM services with PowerCLI. The next example connects to each ESXi host and stops the Local or Remote TSM services if they are running. The startup policy for the services is set to Start and Stop Manually, and the timeout value is set to 2 minutes.
$VMhosts = Get-VMHost
ForEach ($VMhost in $VMhosts) {
    Set-VMHostService -HostService (Get-VMHostService -VMHost $VMhost | `
        Where {$_.Key -eq "TSM-SSH"}) -Policy "Off"
    Set-VMHostService -HostService (Get-VMHostService -VMHost $VMhost | `
        Where {$_.Key -eq "TSM"}) -Policy "Off"
    $status = Get-VMHostService -VMHost $VMhost | Where {$_.Key -eq "TSM-SSH"}
    If ($status.Running -eq "True") {
        Stop-VMHostService -HostService `
            (Get-VMHostService -VMHost $VMhost | Where {$_.Key -eq "TSM-SSH"})
    }
    $status = Get-VMHostService -VMHost $VMhost | Where {$_.Key -eq "TSM"}
    If ($status.Running -eq "True") {
        Stop-VMHostService -HostService `
            (Get-VMHostService -VMHost $VMhost | Where {$_.Key -eq "TSM"})
    }
    If ((Get-VMHostAdvancedConfiguration -VMHost $VMhost `
            -Name UserVars.TSMTimeOut).Values -ne 120) {
        Set-VMHostAdvancedConfiguration -VMHost $VMhost `
            -Name UserVars.TSMTimeOut -Value 120 -Confirm:$False
    }
}
TSM can be used when testing and debugging the pre-boot, post-boot, or first boot portions of the automated installation scripts for ESXi. This is not recommended for production environments. To enable Remote TSM during your installation script, you can add the following lines to the appropriate section for when you want to enable Remote TSM:
vim-cmd hostsvc/enable_remote_tsm vim-cmd hostsvc/start_remote_tsm
To enable Local TSM with your installation script, you can add the following lines:
vim-cmd hostsvc/enable_local_tsm vim-cmd hostsvc/start_local_tsm
You can also set the TSM timeout value within your installation script. The following command sets the timeout value to 300 seconds:
vim-cmd hostsvc/advopt/update UserVars.TSMTimeOut long 300
If you’re managing a large environment, you may wish to take steps to ensure that TSM is not improperly used. Although TSM provides an important tool for troubleshooting, problems can result if it is used for other management tasks that can be performed with the vSphere client or automated with the vCLI or PowerCLI.
One line of defense to ensure that TSM is not casually accessed is to enable Lockdown Mode. As discussed in Chapter 7, “Securing ESXi,” Lockdown Mode prevents all user access to both Local and Remote TSM. However, if users have access to log in to the DCUI or can access the Configuration tab for the host, they can disable Lockdown Mode, then access TSM. To monitor for a change in Lockdown Mode, you can use the following process to enable a vCenter Server alert:
Start the vSphere client and connect to vCenter Server.
At the appropriate level, select the Alarms tab and click Definitions.
Select File > New > Alarm.
Enter an Alarm Name and Description for the new alarm.
Select an Alarm Type of Host and check the option Monitor for Specific Events Occurring on This Object.
Select the Triggers tab.
Click Add to create a new trigger.
Set the Event to Host Administrator Access Enabled.
On the Action tab, configure an appropriate Action for your alarm.
Click OK to create the new alarm.
When you disable Lockdown Mode in the DCUI, a corresponding event is not currently logged with vCenter Server. If you are sending syslog data to a management server, you can look for a rapid succession of log entries containing vim.AuthorizationManager.setEntityPermissions. You can also monitor for a number of login events by the dcui account, as shown in Figure 11.2. When you enable Lockdown Mode, you should be aware that it does not terminate existing TSM sessions to the host.
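As an illustration of that kind of syslog check, the following sketch counts setEntityPermissions entries in an exported log file and flags a burst. The log file path, the sample entries, and the threshold of three are arbitrary choices for the example, not values from a real host:

```shell
#!/bin/sh
# Sketch: flag a burst of permission-change entries in an exported syslog file.
# File path, sample log lines, and threshold are illustrative assumptions.
LOG=/tmp/esxi-syslog-sample.log
cat > "$LOG" <<'EOF'
Oct  1 23:10:01 Hostd: vim.AuthorizationManager.setEntityPermissions
Oct  1 23:10:01 Hostd: vim.AuthorizationManager.setEntityPermissions
Oct  1 23:10:02 Hostd: vim.AuthorizationManager.setEntityPermissions
Oct  1 23:10:05 Hostd: unrelated entry
EOF

COUNT=$(grep -c 'vim.AuthorizationManager.setEntityPermissions' "$LOG")
# Three or more entries in one export is treated as suspicious in this sketch.
if [ "$COUNT" -ge 3 ]; then
  echo "possible Lockdown Mode change: $COUNT setEntityPermissions entries"
fi
```

In practice you would point the filter at the files written by your syslog collector and tune the threshold to your environment.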
If you prefer using PowerCLI rather than vCenter Server alarms, you can use Get-VIEvent to monitor for the event that indicates that Lockdown Mode has been disabled. The following script checks all hosts connected to your vCenter Server for this event. When Lockdown Mode is enabled, the event HostAdminDisableEvent is generated instead.

Get-VMHost | ForEach {
    Get-VIEvent -Entity $_.Name | Where { $_.GetType().Name `
        -eq "HostAdminEnableEvent" } `
        | Select CreatedTime, UserName, FullFormattedMessage
}

PowerCLI may also be used to query for the events related to enabling Local and Remote TSM. The following script queries vCenter Server for events of the type LocalTSMEnabledEvent and RemoteTSMEnabledEvent:

Get-VMHost | ForEach {
    Get-VIEvent -Entity $_.Name | Where { ($_.GetType().Name `
        -eq "LocalTSMEnabledEvent") -Or ($_.GetType().Name `
        -eq "RemoteTSMEnabledEvent") } `
        | Select CreatedTime, UserName, FullFormattedMessage
}
If you enable TSM in the DCUI or when connected directly to the host with the vSphere client, vCenter Server records the LocalTSMEnabledEvent or RemoteTSMEnabledEvent type of event. If you start the TSM services via vCenter Server, a Service Start task is recorded instead of the TSM events. You will also note some differences in how the user is recorded when TSM is enabled. When TSM is enabled in the DCUI, the event is recorded by ESXi and vCenter Server as having been initiated by the dcui user. When the change is made by a user connected directly to the ESXi host, both ESXi and vCenter Server record the actual user that initiated the change. Lastly, if the change is made when connected to vCenter Server, vCenter Server records the actual user, but ESXi records the task as initiated by the vpxuser account.
The last option for auditing TSM and the actions taken in TSM is to use syslog. Setup of syslog is discussed in Chapter 6, “System Monitoring and Management.” The following events correspond with TSM being enabled. You could create an alert that would be triggered based on the event text VMware Tech Support Mode available:
Oct 1 23:13:17 Hostd: [2010-10-01 23:13:17.147 23506B90 verbose 'ServiceSystem' opID=E3CD7415-0000012C] Invoking command /bin/ash /etc/init.d/TSM start
Oct 1 23:13:17 root: TSM Displaying TSM login: runlevel =
Oct 1 23:13:17 init: init: process '/sbin/initterm.sh TTY1 /sbin/techsupport.sh ++min=0,swap' (pid 579733) exited. Scheduling it for restart.
Oct 1 23:13:17 init: init: starting pid 584732, tty '/dev/tty1': '/bin/sh'
Oct 1 23:13:17 root: techsupport VMware Tech Support Mode available
The following events record a user’s session with TSM. The initial dropbear events record a remote SSH session initiated from a remote host. This corresponds with the warning message that the user receives when accessing TSM, shown in Figure 11.3. Note that the commands executed by the user in TSM are logged from a source of shell. The last event records the end of the user’s TSM session.
Figure 11.3. When TSM is accessed, a warning message is issued and the event is logged to the VMkernel log file.
Oct 1 22:46:44 dropbear[582158]: Child connection from 192.168.1.225:62951
Oct 1 22:46:51 dropbear[582158]: pam_per_user: create_subrequest_handle(): doing map lookup for user "root"
Oct 1 22:46:51 dropbear[582158]: pam_per_user: create_subrequest_handle(): creating new subrequest (user="root", service="system-auth-generic")
Oct 1 22:46:51 dropbear[582158]: PAM password auth succeeded for 'root' from 192.168.1.225:62951
Oct 1 22:46:51 shell[582159]: Interactive shell session started
Oct 1 22:50:13 shell[582378]: esxcfg-nics -l
Oct 1 22:50:21 shell[582378]: vdf -h
Oct 1 23:47:35 dropbear[582158]: exit after auth (root): Exited normally
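If you export these logs to a management server, the per-command audit trail can be recovered by filtering on the shell source. A small sketch, run here against a sample file rather than a live host log (the path and entries are made up for the example):

```shell
#!/bin/sh
# Sketch: pull the commands a user ran in TSM out of an exported syslog
# file by selecting entries logged from the "shell" source.
# The file path and its contents are illustrative assumptions.
LOG=/tmp/tsm-session.log
cat > "$LOG" <<'EOF'
Oct  1 22:46:51 shell[582159]: Interactive shell session started
Oct  1 22:50:13 shell[582378]: esxcfg-nics -l
Oct  1 22:50:21 shell[582378]: vdf -h
Oct  1 23:47:35 dropbear[582158]: exit after auth (root): Exited normally
EOF

# Entries from the shell source are the session start plus each command run.
grep 'shell\[' "$LOG"
```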
ESXi was designed to be easily deployed to thousands of nodes and at the same time to enable the deployment of very small turnkey installations. As you’ve seen in prior chapters, ESXi can be preinstalled on a flash device or installed to a small local or remote hard drive. In the future, you can expect to see ESXi booting on completely diskless systems. These design goals required a change in the way that the boot files are stored, and ESXi differs significantly from how a traditional operating system (OS) is installed or even how ESX is installed and boots.
VMware Auto Deploy is an experimental product from VMware Labs that enables automatic Preboot Execution Environment (PXE) boot and customization for VMware ESXi. With Auto Deploy, you can use completely diskless and stateless ESXi hosts. Each time an ESXi host boots from the PXE server, it is automatically configured by Auto Deploy using Host Profiles and other information stored within vCenter Server. The host can then join a cluster and begin to handle your virtual machine workload. VMware Auto Deploy is not supported in production environments, but provides a preview of how you might deploy your ESXi hosts in the future. It is available for download from http://labs.vmware.com.
The system partitions for ESXi are summarized in the following list. The partition layout is the same whether you’re using ESXi Embedded or Installable. The system partitions include the following:
Bootloader partition. This 4MB partition contains SYSLinux, which is used as a bootloader to start ESXi.
Boot bank partition. This 250MB partition stores the files required to boot VMware ESXi. The partition is also referred to as Hypervisor1.
Alt boot bank partition. This 250MB partition is initially empty. The first time you patch ESXi, the new system image is stored here. The partition is also referred to as Hypervisor2.
Core dump partition. This 110MB partition is normally empty, but the VMkernel will store a memory dump image if a system failure occurs. The partition can be managed with vicfg-dumppart from the vCLI.
Store partition. This 286MB partition is used to store system utilities such as the ISO images for VMware Tools and floppy disk images for virtual device drivers. The partition is also referred to as Hypervisor3.
When you are installing ESXi Installable on a disk with at least 5GB of storage, ESXi also creates a 4GB scratch partition. This partition is mounted as /scratch and is used to store the output from vm-support and to store upgrade files, and is set as the default location for the advanced parameter Syslog.Local.DatastorePath. If you are installing ESXi to boot from a storage area network (SAN), you can allocate a 5GB logical unit number (LUN) for the boot disk. Note that if you’re using a scripted installation, the installer requires an additional 1GB of space as it attempts to create a datastore.
When your ESXi host first starts, SYSLinux is loaded. SYSLinux looks at the file boot.cfg, which is located on both Hypervisor1 and Hypervisor2. SYSLinux uses the parameters build, updated, and bootstate to determine which partition to use to boot ESXi. The following is a typical boot.cfg file:
kernel=b.z
kernelopt=
modules=k.z --- s.z --- c.z --- oem.tgz --- license.tgz --- m.z --- vpxa.vgz --- state.tgz --- aam.vgz
build=4.1.0-235786
updated=2
bootstate=0
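The bootstate value is the key parameter here. As a minimal sketch of how it could be read programmatically, the following creates a sample file locally and parses it; on a host the real files are /bootbank/boot.cfg and /altbootbank/boot.cfg, and the sample path is an assumption for the example:

```shell
#!/bin/sh
# Sketch: read the bootstate value from a boot.cfg-style key=value file.
# A sample file is created here for illustration.
CFG=/tmp/boot.cfg.sample
cat > "$CFG" <<'EOF'
build=4.1.0-235786
updated=2
bootstate=0
EOF

# Print only the value following "bootstate=".
BOOTSTATE=$(sed -n 's/^bootstate=//p' "$CFG")
echo "bootstate=$BOOTSTATE"
```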
After a new installation, ESXi boots from Hypervisor1, which is mounted by ESXi as /bootbank. Hypervisor2 is mounted as /altbootbank and is initially empty. When you patch or upgrade your ESXi host, a completely new firmware image for ESXi is loaded on the host and stored in /altbootbank. The boot parameters are updated so that when ESXi next starts, it will recognize Hypervisor2 as containing the version of ESXi that should be booted. That partition is then mounted as /bootbank, whereas Hypervisor1 becomes the new /altbootbank. The partition roles are reversed the next time you patch your ESXi host. With this design, your boot partitions always contain two complete images that can be used to boot ESXi. If you patch a host and a problem is detected when ESXi starts to boot, the system automatically reverts to the previously installed version of ESXi.
Update activities on ESXi generate a log file called esxupdate.log. This file can be found in /store/db.
If you experience problems with ESXi after installing a patch, you can manually revert to the prior version with the following process. Begin by rebooting the ESXi host. At the initial Loading VMware Hypervisor screen, press Shift+R. The warning message shown in Figure 11.4 is displayed, where you can press Shift+Y to revert to the prior version. The screen then displays an option to view the log for the event. Press Esc to view the log or press Enter to continue booting. If the operation was successful, the log displays the message Fallback hypervisor restored successfully. You can then press Esc to exit the log screen or press Enter to continue the boot process. The version of ESXi that you experienced problems with becomes /altbootbank and will be overwritten the next time you apply a patch. If you attempt to repeat the process to revert to that boot version, you will receive the error No valid fallback hypervisor found, because the bootstate parameter is set to 3. If you need to boot that version of ESXi again, you can reapply the ESXi patch or change the bootstate parameter to 0. If you’re applying customizations from a third party, each patch causes a change in the boot bank used. If you apply two or more patches, you won’t be able to revert to the prepatch state using this method. You instead need to run a repair installation and restore a system configuration backup.
After SYSLinux determines which system image to boot, boot.cfg is read to determine the files that are used to boot the VMkernel. The files that ESXi uses are loaded into memory and are not accessed again on storage until the host is rebooted. It is possible, although not recommended, to remove the boot device from an ESXi host that has completed the boot process. For the most part, the host will function properly, only having difficulty with system processes that access the boot media, such as system backups and access to the VMware Tools ISO images. Likewise, changes made to the ESXi memory filesystem are lost when a host reboots. As you will see in the following section, the ESXi system backup process backs up the necessary system state files, but if the random access memory (RAM) disk fills up due to a technical issue, that problem will not persist after a reboot.
There are three significant file types that ESXi uses to boot. First are the Executive files tboot.gz (Trusted Platform Module files), vmkboot.gz (small core), and vmkernel.gz, which make up the VMkernel and are loaded into memory as executables. These files exhibit no presence within the ESXi memory filesystem. Second, a series of Archive files with the extension vgz, called tardisks, are mounted and extracted to form the filesystem. Those files include system.vgz, vpxa.vgz, aam.vgz, extmod.tgz, and oem.tgz. These packages use the vSphere Installation Bundle (VIB) format. system.vgz contains core system files, vpxa.vgz contains files for the vCenter Agent, and aam.vgz contains the High Availability system files. VIB updates from third parties may also be listed. The files in these VIBs are extracted in a progressive manner. If a duplicate file is found in both system.vgz and oem.tgz, the copy in oem.tgz is extracted later and is the version of the file that ESXi uses. The last file type is the State archive file. This file is called state.tgz with ESXi Installable and local.tgz with ESXi Embedded. The contents of both versions are the same, and the archive contains a backup of the files necessary for the configuration of your ESXi host to persist between reboots. This file is discussed further in the following section.
When the ESXi filesystem has been extracted into the RAM disk, the end result is similar to the directory listing of the root folder shown in Figure 11.5. The root of the filesystem and most folders, such as bin, etc, and sbin, are stored in memory. Note that ESXi does mount the disk partitions that correspond to bootbank, altbootbank, scratch, and store. As you browse the filesystem, it will appear similar to what you would experience with ESX. /sbin contains a number of esxcfg executables that can be used to configure and manage the host if you are unable to do so via the vSphere client or another vSphere application programming interface (API) client.
If you run df -h, you get another view of the filesystem showing the disks that ESXi has mounted, as shown in Figure 11.6. Listed first is visorfs, which is the RAM disk that ESXi has created. The four vfat partitions are bootbank, altbootbank, scratch, and store. In this case, the boot disk for ESXi also contains a Virtual Machine File System (VMFS) datastore.
The command vdf is new to ESXi 4.1. It provides some valuable data about the RAM disk. The listing that follows first shows the tardisks that ESXi has extracted to create the filesystem. These entries correspond to the Archive and State file types in boot.cfg, discussed earlier. The Space value listed represents the extracted size of the tardisk, not the compressed size within /bootbank. The first tardisk listed is using 199MB of memory on the host. If the root filesystem is running low on space, you can use vdf to check whether one of the tardisks is using too much memory. The command also displays information about the mounts that are available on the RAM disk. The following output shows four mounts: MAINSYS, tmp, updatestg, and hoststats. MAINSYS is the root folder, whereas tmp is /tmp. hoststats is used to store real-time performance data on the host, and updatestg is used as storage space for staging patches and updates. These four mounts and the tardisk mounts correspond to the 1.3GB size of visorfs, shown in Figure 11.6.
~ # vdf -h
tardisk      Space  Used
SYS1          199M  199M
SYS2           55M   55M
SYS3           12K   12K
SYS4           12K   12K
SYS5            4K    4K
SYS6           42M   42M
SYS7           20K   20K
SYS8           12M   12M
-----
Ramdisk      Size  Used  Available  Use%  Mounted on
MAINSYS       32M    1M        30M    3%  --
tmp          192M    0B       192M    0%  --
updatestg    750M    8K       749M    0%  --
hoststats     53M    1M        51M    3%  --
In the output from vdf, you can observe the amount of memory that has been allocated to the mounts MAINSYS, tmp, updatestg, and hoststats. updatestg has been allocated 750MB, but is currently using only 8KB of actual memory. You can also view the resource allocation to the visorfs components and other system processes for ESXi using the System Resource Allocation screen, as shown in Figure 11.7.
Figure 11.7. You can view resource allocation for the ESXi RAM disk filesystem with the System Resource Allocation screen.
One last command that is useful to explore is vdu. The following example shows the output of that command for /etc. The command summarizes the source of files within a folder structure.
~ # vdu -h -s /etc
For '/etc':
  tardisk SYS1:       4M (221 inodes)
  heap:              84K ( 43 inodes)
  ramdisk MAINSYS:   60K (  6 inodes)
  tardisk SYS6:       6K (  5 inodes)
  tardisk SYS8:       4K (  2 inodes)
  tardisk SYS2:      10K ( 22 inodes)
As you navigate the ESXi filesystem, similarities to the ESX service console will be evident, but you will note that some of the Linux commands that you may have used, such as nano, are missing. The command interface to the VMkernel is based on BusyBox. BusyBox, a single executable designed for use with a Linux kernel, provides many of the standard tools that you would find in Linux, including cp (copy), kill (kill process), tar, and tail. Given the small size of BusyBox, it is typically used with embedded devices. BusyBox uses the ash shell. If you are monitoring your ESXi host with resxtop, you can observe a single process for BusyBox and one ash process for each Local or Remote TSM session that is in progress.
It is important to note that TSM is intended to be the last method of access to your ESXi host. The first level of management should be via the vSphere API and tools such as the vSphere client, the vCLI, and PowerCLI. The DCUI provides the next level of support; with the DCUI, you can restart the management agents, troubleshoot the management network, and reset the configuration of your ESXi host. The TSM provides the last resort for access, and misconfiguration at this level can have significant consequences on the host. Ideally, TSM access should be made under the guidance of VMware Support. In practice, you may find that you need to perform configuration with TSM, as the vSphere API tools may not include specific capabilities, or problems such as those related to the management network may be difficult to troubleshoot without TSM access. Some of the commands that you may use in TSM are discussed later in the section “Troubleshooting with Tech Support Mode.”
If you’re developing a complex installation script that employs the %firstboot section, you can use TSM to work through your script in a test environment. The system tardisk that is loaded for the installation process is the same one used to boot ESXi normally. The installation merely adds a specific tardisk to include the necessary files for the installation process. The esxcfg and similar commands that you’ll find in TSM also exist when the ESXi installer is booted.
As discussed in the previous section, ESXi employs a State tardisk to ensure that configuration changes made to the ESXi host persist across a reboot. For ESXi Installable, that tardisk is called state.tgz, whereas for Embedded it is local.tgz. The State tardisk consists of any files in /etc that have been marked with the sticky bit. The initial copy of state.tgz is empty, but files extracted from the other tardisks have the sticky bit enabled and these files are subsequently backed up into state.tgz. For example, the vCenter Agent tardisk vpxa.vgz contains a number of system files that are extracted to /opt/vmware/vpxa, but also the configuration files dasConfig.xml and vpxa.cfg found in /etc/opt/vmware/vpxa. Both files have the sticky bit enabled and are thus backed up into state.tgz. On a subsequent boot of ESXi, these files are extracted from both vpxa.vgz and state.tgz, but as state.tgz is extracted second, those versions of the files overwrite the copies from vpxa.vgz. If you make any changes to files in /etc that do not have the sticky bit enabled, those files are changed only in the RAM disk filesystem and are gone when ESXi is rebooted. The same applies to any files changed, added, or deleted outside of /etc, with the exception of mounts that are made to physical partitions such as /bootbank.
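The sticky-bit convention can be demonstrated outside of ESXi. The sketch below marks one file in a scratch directory and then finds it the way a backup scan would; the directory and file names are invented for the example:

```shell
#!/bin/sh
# Sketch: files marked with the sticky bit are the ones a State-tardisk
# style backup would pick up. Demonstrated in a scratch directory
# rather than a live /etc; all names here are illustrative.
DIR=/tmp/statedemo
rm -rf "$DIR"
mkdir -p "$DIR"
touch "$DIR/persistent.conf" "$DIR/volatile.conf"

# Mark one file for "backup" by setting its sticky bit.
chmod +t "$DIR/persistent.conf"

# Equivalent of scanning for sticky-bit files: only the marked file matches.
find "$DIR" -type f -perm -1000
```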
To view the contents of state.tgz, you can issue the following commands in TSM or copy the file to a management server for extraction:
~ # cd tmp
/tmp # mkdir state
/tmp # cd state
/tmp/state # cp /bootbank/state.tgz state.tgz
/tmp/state # gzip -d state.tgz
/tmp/state # tar -xvf state.tar
local.tgz
/tmp/state # gzip -d local.tgz
/tmp/state # tar -xvf local.tar
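The same nesting can be simulated locally to see why two extraction passes are needed: state.tgz is a wrapper around local.tgz, which holds the actual /etc files. The paths and file contents below are illustrative, not taken from a real host:

```shell
#!/bin/sh
# Sketch: build a mock state.tgz (outer archive) wrapping local.tgz
# (inner archive of /etc files), then unpack it in two passes, as on
# a host. Paths and contents are illustrative assumptions.
WORK=/tmp/statepack
rm -rf "$WORK"
mkdir -p "$WORK/etc"
echo "sample config" > "$WORK/etc/hosts"

cd "$WORK"
tar -czf local.tgz etc           # inner archive of the /etc files
tar -czf state.tgz local.tgz     # outer wrapper, as on ESXi Installable

mkdir extract && cd extract
tar -xzf ../state.tgz            # first pass yields local.tgz
tar -xzf local.tgz               # second pass yields etc/hosts
cat etc/hosts
```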
At every minute past the hour, a backup job defined in /var/spool/cron/crontabs/root is executed. The backup job runs the script /sbin/autobackup.sh. This script creates a new state.tgz file and copies it to the appropriate location. The script does check the parameters in both boot.cfg files to see whether a reboot is pending from a patch or update installation. If that is the case, the new state.tgz file is copied to both /bootbank and /altbootbank. autobackup.sh calls another script, /sbin/backup.sh. In part, the function of that script is to make sure that the backups are made in a consistent manner and to ensure file integrity.
Given that there is a period of time between any configuration changes and the scheduled backup, there is a risk of loss of changes should the host experience an unexpected failure. If this occurs, this loss typically affects the registration of virtual machines, as host configuration tends to be more static. The time period between backups originated from the need to minimize write operations to flash devices. Excessive write operations can limit the life of those devices. If you restart or shut down an ESXi host, part of the shutdown process includes updating state.tgz
. Configuration changes should not be lost when you restart or shut down a host.
The vSphere API allows you to make a configuration backup of your ESXi host. In Chapter 6 the process was shown with the vCLI command vicfg-cfgbackup. You can also back up your host’s configuration with the PowerCLI cmdlet Set-VMHostFirmware, as shown in the following example:
Get-VMHost esx06.mishchenko.net | Set-VMHostFirmware -BackupConfiguration -DestinationPath c:\backups\esx06
The backup file generated has a file name similar to configBundle-esx06.mishchenko.net.tgz. With either method, the vSphere API merely transfers the state.tgz file from the host to the management server. If you extract the backup file that is generated, you can view the same files that ESXi bundles into state.tgz to preserve the host’s configuration. To restore the configuration backup with PowerCLI, you would issue the following command:
Get-VMHost esx06.mishchenko.net | Set-VMHostFirmware -Restore -SourcePath c:\backups\esx06\configBundle-esx06.mishchenko.net.tgz
The host has to be in maintenance mode before you start the restore and it is rebooted after the restore process has completed.
In some cases, you may not be able to boot your ESXi host due to a corruption of the system partitions, or you may have to revert to a version of ESXi that does not exist in /altbootbank. In such cases, you want to perform a repair installation followed by the restore of a configuration backup. As state.tgz contains the entire configuration of your host, you do not need any further configuration files to restore your ESXi host. To repair your ESXi installation, use the following process:
Insert the ESXi 4.1 installation CD into the host’s CD-ROM drive.
Restart the host and select to boot from the CD.
At the installation Welcome screen, shown in Figure 11.8, press R to begin the repair process.
Read the VMware end-user license agreement and press F11 to continue.
Select the disk that contains the original installation of ESXi and press Enter.
The ESXi repair process preserves any VMFS datastores that exist on the installation disk as long as those partitions do not exist within the first 900MB of storage. If you do not select the original installation disk, then a new system image is installed and any existing partitions on that disk will be lost.
If the disk contains an existing partition, you are prompted to confirm your choice of disk. Press Enter to confirm your choice.
Press F11 to begin the repair process.
When the repair process has completed, eject the installation CD and press Enter to reboot.
After the host has rebooted, make a note of the IP address assigned to the host. You may need to assign an IP address manually if you do not have a Dynamic Host Configuration Protocol (DHCP) server for that network subnet.
Restore the host configuration file with the following commands:
Get-VMHost esx06.mishchenko.net | Set-VMHost -State "Maintenance"
Get-VMHost esx06.mishchenko.net | Set-VMHostFirmware -Restore -SourcePath c:\backups\esx06\configBundle-esx06.mishchenko.net.tgz
Start the vSphere client and connect to your vCenter Server. The host will have a state of Not Responding, as the vCenter Agent software does not exist on the host at this point.
Right-click on the host and select Connect. This initiates the installation of the vCenter Agent on the host. After the process is complete, the host should show a normal status and any registered virtual machines should no longer have a status of Orphaned.
If you’re connecting via Remote TSM, you need a client that supports SSH, and if you want to transfer files, your client should support Secure Copy (SCP). Most Linux distributions include a client with these capabilities. For a Windows computer, you can download PuTTY from http://www.putty.org/ to use for your SSH sessions and WinSCP from http://winscp.net/ to transfer and edit files. ESXi includes only vi for editing files, so it is worthwhile to get a client that includes an easy-to-use file editor.
If you start in /bin, you find a number of useful utilities that you can use to manage files and processes within TSM. If you run ls -l, you will note that many of the program names are symbolic links to other programs. kill, gzip, and tail link back to the busybox executable. BusyBox also includes wget, which you can use to download files from a Web server. This command can prove useful to download patch files directly to your host. If you use wget, make sure that you’re in a location, such as /scratch or a datastore, with sufficient space to store your downloads.
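Before pulling a large patch bundle down with wget, it is worth confirming that the target location actually has room. The helper below is a minimal sketch; the function name, the URL, and the 500MB threshold in the usage comment are placeholders, and it assumes BusyBox-style `df -k` output with available space in the fourth column:

```shell
# Report free space, in 1KB blocks, for the filesystem backing a path
# (column 4 of `df -k` output on BusyBox and GNU coreutils alike).
free_kb() {
    df -k "$1" | awk 'NR==2 {print $4}'
}

# On an ESXi host you might then guard a download like this (the URL
# and the 512000KB threshold are placeholders, not a real patch source):
#   [ "$(free_kb /scratch)" -ge 512000 ] && \
#       cd /scratch && wget http://webserver.example.com/patches/patch.zip
```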
scp, which links to /sbin/dropbearmulti, is a basic SCP client that you can use to transfer files to and from an SCP-capable server. dropbearmulti is also used to provide SSH connectivity to the Remote TSM. Both ping and ping6 link to /sbin/vmkping. If you’re troubleshooting vMotion network issues, you can use either of these commands to check for network connectivity. There is no separate networking stack as there is with the ESX Service Console, so any network traffic is sent by the VMkernel and thus both ping and vmkping work the same way on ESXi.
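For a quick connectivity check, you can wrap a single ping in a small helper like the following sketch. The function name is my own; since ping on ESXi resolves to /sbin/vmkping, the check exercises the VMkernel network stack:

```shell
# Return "up" if a single ping to the given vMotion peer address
# answers within one second, "down" otherwise.
vmotion_reachable() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1 && echo up || echo down
}

# Example (10.0.0.12 is a placeholder vMotion address):
#   vmotion_reachable 10.0.0.12
```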
There have been a number of cautions about working with TSM. You should further note that not all commands operate in a consistent manner. For the most part, if you don’t know the command options, you can simply issue the command without any options to see a list of available options. With /sbin/reboot, which links to BusyBox, issuing the command without any options will restart your host. To see the options, you have to run reboot --help. Likewise, if you execute /sbin/techsupport.sh, your screen clears, you see the message Tech Support Mode Has Been Disabled by the Administrator, and your TSM session is effectively over.
Given that ESXi can be hosted in a virtual machine, even on ESXi itself, there’s no significant reason not to set up a test host to use for exploring TSM. Instructions for running ESXi in a virtual machine on VMware Workstation were provided in Chapter 4, “Installation Options.” A similar process can be used to host an ESXi virtual machine within your current vSphere environment. Hosting your training environment for ESXi in this manner provides a method to learn and explore ESXi with no impact on your production environment. With a change to the virtual machine configuration file, documented at http://communities.vmware.com/docs/DOC-8970, it is even possible to run a nested virtual machine on your virtual ESXi host.
This caution isn’t intended to totally dissuade use of TSM. TSM is supported for troubleshooting, remediation, and in some cases configuration purposes. The official support policy for TSM can be found at http://kb.vmware.com/kb/1017910.
There are a few other commands in /bin that are noteworthy. Both vdf and vdu link to /sbin/vmkvsitools. The vsi in vmkvsitools stands for VMware Sysinfo Interface. As shown in the following output, a number of commands link to vmkvsitools. This command can be used to provide a wealth of information about the host’s hardware, running processes, and memory. As shown in the Knowledge Base article at http://kb.vmware.com/kb/1024632, the command may be used to gain information that you would have obtained from /proc on an ESX host.
/bin # vmkvsitools
Usage: 'vmkvsitools [-c/--cache vsicache] cmd args'
 where cmd is one of amldump, bootOption, hwclock, hwinfo, lsof,
 lspci, pci-info, pidof, ps, vdf, vdu, vmksystemswap, vmware
A symlink to vmkvsitools with an above command name can also be used.
Lastly in /bin is vim-cmd, which can be used to control and configure a wide range of aspects of ESXi. vim-cmd links to /sbin/hostd. When you execute the command without options, the following is displayed:
~ # vim-cmd
Commands available under /:
hostsvc/       proxysvc/   supportsvc_cmds/   vmsvc/
internalsvc/   solo/       vimsvc/            help
You can add the commands listed to vim-cmd to see additional commands. vmsvc/ deals with the management of virtual machines. That command displays the following options:
~ # vim-cmd vmsvc
Commands available under vmsvc/:
acquiremksticket          get.configoption          power.on
acquireticket             get.datastores            power.reboot
connect                   get.disabledmethods       power.reset
convert.toTemplate        get.environment           power.shutdown
convert.toVm              get.filelayout            power.suspend
createdummyvm             get.guest                 power.suspendResume
destroy                   get.guestheartbeatStatus  queryftcompat
device.connection         get.managedentitystatus   reload
device.connusbdev         get.networks              setscreenres
device.disconnusbdev      get.runtime               snapshot.create
device.diskadd            get.snapshotinfo          snapshot.dumpoption
device.diskaddexisting    get.summary               snapshot.get
device.diskremove         get.tasklist              snapshot.remove
device.getdevices         getallvms                 snapshot.removeall
device.toolsSyncSet       gethostconstraints        snapshot.revert
device.vmiadd             login                     snapshot.setoption
device.vmiremove          logout                    tools.cancelinstall
devices.createnic         message                   tools.install
get.capability            power.getstate            tools.upgrade
get.config                power.hibernate           unregister
get.config.cpuidmask      power.off                 upgrade
As you can see, vim-cmd provides access to manage all aspects of your virtual machines. The options to manage and configure your hosts are equally numerous. Although this is not the tool to use for day-to-day management, you can leverage vim-cmd in your scripted installs to configure the networking and storage aspects of your hosts.
Within /etc, you find the configuration files for your host. Because the system state backup contains files only from /etc, any configuration changes that you make outside of this folder are temporary. Changes to your host’s configuration files within this folder are permanent if the file has the sticky bit enabled. Some of the configuration files within this folder structure are listed in Table 11.1. As the permissions on those files allow you to overwrite the files with vifs, you could also edit these files in TSM. In either case, you would want to ensure that you have made a system backup first. If you use either TSM or vifs to update these configuration files, the changes are not verified by the vSphere API and a misconfiguration could render the host unusable.
In the /opt folder structure, you’ll find the binaries for any agents you install on the host, such as for the vCenter Agent or for High Availability. Configuration files for these agents are found in /etc. If you change the IP address of your vCenter Server host, you can update the file /etc/opt/vmware/vpxa/vpxa.cfg on your hosts and then issue the command /sbin/services.sh restart to restart the management services on your host. If you need to uninstall any agents from your host manually, you may find an uninstall script in /opt/vmware/uninstallers.
Within /var/log, you’ll find the log files for your host. /var/log/messages is the main VMkernel log file. It includes events from the VMkernel, any agents running on the host, the hostd daemon, and commands issued in TSM. ESXi maintains a rotation of 10 copies of the messages file and rotates the files when the current file reaches 1MB. The prior copies are compressed to save space within the ESXi RAM disk. If the ESXi installation created a scratch partition on disk, this log file is mirrored to /scratch/log/ and the log files in that folder persist across a reboot. However, if you require long-term storage of your host’s log files, you should enable the syslog service or use vi-logger from the vSphere Management Assistant (vMA) to capture this log file.
Also in /var/log is sysboot.log. This file captures the boot process from the time the VMkernel initializes to the completion of the boot process, and it is useful for troubleshooting any problems you experience when your host boots. In /var/log/vmware, you’ll find the hostd log files. Also within subfolders are the logs for agents such as the vCenter Agent service.
There are numerous other folders that you can examine and that provide valuable insight into the inner workings of ESXi. The last folder that this section examines is /sbin. Within /sbin, you’ll find the commands that will be the most useful to your troubleshooting sessions with TSM. To begin with, a number of esxcfg binaries are included. These closely match the functionality you would find with the vCLI or in the Service Console for ESX. A version of esxcli is also included that tends to be more feature-rich than the version found in the vCLI. For this reason, some Knowledge Base articles will direct you to TSM to perform advanced configuration changes. esxupdate is also found in /sbin and can be used to manage patches and updates on the host.
If you’re having problems with performance, you can use esxtop. This functions the same as resxtop from the vCLI, but also adds replay mode. With replay mode, you can record and then replay esxtop statistics for a specific period of time. This can be helpful if you need to send performance data to VMware Support. To record data, you can use the command vm-support -S -i 5 -d 120. The -i parameter sets the query interval in seconds, and -d sets the duration of the capture in seconds. This generates a support bundle within /var/tmp that contains the performance data and the log files from the host. The output from vm-support includes instructions to extract the file, as shown in this example:
To see the files collected, run: tar -tzf '/var/tmp/esx-2010-10-02--02.02.13217.tgz'
After you have extracted the tar file, you can issue the following command to replay the data that was captured:
esxtop -R vm-support-esx-2010-10-02--02.02.13217
The vm-support command can also be used to generate support bundles. While the log files messages, hostd.log, and vpxa.log are accessible via a number of methods, the log files for High Availability and other services are not accessible through the vSphere API. All log files for ESXi are available within the support bundle, as well as core dumps, configuration files, and log files for virtual machines. You can generate a support bundle with the vSphere client by selecting File > Export > Export System Logs to display the screen shown in Figure 11.9. Both methods create a support bundle within /var/tmp. With vm-support, you then need to copy that bundle to another computer. If you need to provide VMware Support with a core dump file, such a file can be found in /var/core.
Generally, the ESXi RAM disk mounts should not exceed 90% usage and thus should not run out of space. However, a software bug in a driver, for example, could cause one of the mounts to fill, in which case an event like the following would be generated. If disk space is a concern on your host, you can set up your syslog receiver to monitor for similar events.
Oct 1 01:22:08 vmkernel: 0:00:55:09.839 cpu0:4149)BC: 3837: Failed to flush 52 buffers of size 8192 each for object '1.z' 1 4 1167 0 0 0 0 0 0 0 0 0 0 0: No free memory for file data
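If you forward your logs, a filter as simple as the following sketch can flag these events; the function name is my own:

```shell
# Count occurrences of the VMkernel "out of RAM disk space" symptom
# in a log stream read from stdin.
ramdisk_full_count() {
    grep -c 'No free memory for file data'
}

# Example: check the live VMkernel log on a host:
#   ramdisk_full_count < /var/log/messages
```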
From the Spiderman movies comes the quote, “With great power comes great responsibility.” This applies not only to your “spidey senses,” but also to your use of ESXi’s TSM. When you access TSM, you have complete and unrestricted access to the VMkernel. The commands you issue through TSM are not necessarily checked for problems or mistakes the way the same commands issued through the vSphere API are. Properly used, TSM is a great tool and it can certainly save the day. However, improperly used, it can have significant negative consequences.
Getting to know TSM means spending time in it, exploring the various aspects of the ESXi filesystem and the commands that it contains. Given the ease of setting up a virtual ESXi host, that’s the best way to get into TSM without needing to worry about any mistakes. For production use, TSM should be accessed only under the appropriate circumstances, and that access should be audited.