Chapter 12

Collaborative Roles in Quick Boot

Every sin is the result of a collaboration.

—Lucius Annaeus Seneca

Collaboration between hardware, firmware, and software is vital to achieving a fast boot experience. If the hardware chosen is incapable of sub-second initialization, or if the software is not optimized to take advantage of a fast firmware boot, then the investment in firmware optimization, whether custom tuning or a systematic Fast Boot architecture, is wasted. Below are several techniques that can improve boot times by picking the right device, optimizing the way that device is initialized, or doing the minimum work needed so that a later driver can take full advantage of a particular subsystem once its driver loads.

Power Hardware Role

Before the first instruction is fetched, hundreds of milliseconds have already elapsed.

Power Sequencing

If measuring time from the power button, then the motherboard power sequencing can itself be a source of delay. If a simplified power design merges the Manageability Engine's power plane with other devices, then an additional 70 ms may be required.

Power Supply Specification

If you’re using PCI or PCIe add-in cards on a desktop system, then the PC AT power supply (silver box) will have an additional 100 ms delay for the PCI add-in card onboard firmware to execute. If you’re using a system with an embedded controller (mobile client or an all-in-one desktop), then there can be an additional power button debounce in embedded controller firmware that can be as much as 100 ms.

Flash Subsystem

The system should use the highest speed SPI flash components supported by the chipset.

High Speed SPI Bus for Flash

When selecting an SPI component, note that a 33 MHz part can slow the system boot time by 50 percent compared with a 50 MHz part. Single-byte reads versus multiple-byte reads also affect performance. Designers should select at least a 50 MHz component. Intel 6 series and 7 series chipsets support 50 MHz Dual Output Fast Read (DOFR), and there are also components that support Quad Fast Read. As the SPI interface is normally a bottleneck on boot speed, this small change can mean a lot to overall performance at a relatively small difference in the bill of materials.

Flash Component Accesses

Besides configuring faster read access and enabling prefetch, further optimizations can reduce the number of flash accesses. For example, BIOS setup option data can be a few kilobytes in size; each time a setup option is referenced it can cost about 1 ms on a 33 MHz SPI part, and there may be several references. One optimization is to reinstall the read-only variable PPI with a new version that keeps a copy of the setup data in CAR/memory, so the setup data is read from SPI only once. It can also be cached in memory for S3 needs, preventing unnecessary SPI accesses during S3 resume.
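A minimal PEI-phase sketch of this idea follows, assuming a hypothetical SETUP_DATA structure, a setup variable named "Setup", and platform GUIDs that are illustrative only: the variable is read from SPI once and republished in a HOB so later modules reference the cached copy.

```c
//
// Sketch: read the "Setup" variable from SPI once and publish it in a GUID
// HOB so later PEIMs reference the cached copy instead of re-reading flash.
// SETUP_DATA, gSetupVariableGuid, and gSetupDataHobGuid are hypothetical.
//
#include <PiPei.h>
#include <Library/PeiServicesLib.h>
#include <Library/HobLib.h>
#include <Ppi/ReadOnlyVariable2.h>

typedef struct {
  UINT8  Data[0x400];   // placeholder for the platform's real setup structure
} SETUP_DATA;

extern EFI_GUID  gSetupVariableGuid;   // hypothetical setup variable GUID
extern EFI_GUID  gSetupDataHobGuid;    // hypothetical HOB GUID for the cache

EFI_STATUS
EFIAPI
CacheSetupDataEntry (
  IN EFI_PEI_FILE_HANDLE     FileHandle,
  IN CONST EFI_PEI_SERVICES  **PeiServices
  )
{
  EFI_STATUS                       Status;
  EFI_PEI_READ_ONLY_VARIABLE2_PPI  *Variable;
  SETUP_DATA                       SetupData;
  UINTN                            Size;

  Status = PeiServicesLocatePpi (
             &gEfiPeiReadOnlyVariable2PpiGuid, 0, NULL, (VOID **)&Variable);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  //
  // Single SPI-backed read of the setup option data.
  //
  Size   = sizeof (SETUP_DATA);
  Status = Variable->GetVariable (
             Variable, L"Setup", &gSetupVariableGuid, NULL, &Size, &SetupData);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  //
  // Publish the copy in a HOB; consumers use GetFirstGuidHob() instead of
  // touching SPI flash again. The HOB also survives into the S3 resume path.
  //
  BuildGuidDataHob (&gSetupDataHobGuid, &SetupData, Size);
  return EFI_SUCCESS;
}
```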

SPI Prefetch and Buffer

It is possible to enable buffers on the chipset to prefetch data from the flash device. If the chipset is capable, set up SPI prefetch as early as the SEC phase. It is recommended to do performance profiling comparing the prefetch-enabled boot time of each individual UEFI module to determine the potential impact. The time when firmware volumes are shadowed can be particularly interesting to examine.
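One way to collect that per-module timing is with the EDK II PerformanceLib macros. The sketch below brackets an FV shadow operation with a performance record; ShadowFirmwareVolume() is a hypothetical helper used only for illustration.

```c
//
// Sketch: bracket an FV shadow/decompress operation with PerformanceLib
// records so prefetch-enabled and prefetch-disabled boots can be compared
// per module with the build's performance tooling.
//
#include <Base.h>
#include <Library/PerformanceLib.h>

VOID ShadowFirmwareVolume (VOID *FvBase, UINTN FvSize);   // hypothetical helper

VOID
ShadowAndMeasureFv (
  IN VOID   *FvBase,
  IN UINTN  FvSize
  )
{
  PERF_START (NULL, "FvShadow", NULL, 0);     // begin timed region
  ShadowFirmwareVolume (FvBase, FvSize);      // copy FV from SPI to memory
  PERF_END   (NULL, "FvShadow", NULL, 0);     // end timed region
}
```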

SPI Flash Reads and Writes

BIOS can minimize the number of flash writes by combining several writes into one. Any content in SPI flash should be read at most once during each boot and then stored in memory or a variable for future reference. Refer to the PCH BIOS Writer's Guide for the optimal SPI prefetch setting. The PCH prefetching algorithm is optimized in hardware for serial access; if the firmware is not laid out sequentially and the read addresses are effectively random, the prefetch feature may need to be turned off (see also "EDK II Performance Optimization Guide – Section 8.3").

SPI flash access latency can be greatly improved with hardware and firmware codesign. Table 12.1 presents some guidelines that, when followed, can provide improvements of several seconds off a boot.

Table 12.1: Improving SPI Flash Access Latency with Hardware and Firmware Co-Design

Slow Interface and Device Access

Interface and device accesses can be time consuming, either due to the nature of the interfaces and/or devices, or the necessity of issuing multiple accesses to the particular controller, bus, and device.

DMI Optimizations

If the PCH and CPU are both capable of flexible interface timings, then faster DMI settings minimize I/O latency to the PCH and onboard devices. The BIOS should use the highest DMI link speed, as the default link speed may not be optimal. The BIOS should enable Gen2 (or the highest supported) DMI link speed as early as possible, in the SEC phase, if this does not conflict with hardware design constraints. The reason for the conservative default is predictable survivability: the DMI link comes up at 2.5 GT/s (Gen1). A faster DMI link improves I/O configuration speed by 6 to 14 percent, so the link should be trained to 5 GT/s (Gen2) early in the SEC phase. There may be reasons why the link should not run at top speed all the time; if a BIOS setup option controls the DMI link speed, that option may only be readable later in the boot, requiring the link to be trained back down at that point.

Processor Optimizations

The following section describes processor optimizations.

CPU Turbo Enabling

Starting with the Sandy Bridge generation of CPUs, the CPU frequency at reset is limited to the lowest frequency supported by the processor. To enable CPU performance state (P-state) transitions, a list of registers must be configured in the correct order. For Intel Fast Boot, the following is expected:

  1. Every full boot shall save the necessary register settings as nonvolatile UEFI variables.
  2. On Fast Boot, the registers needed to enable P-states will be restored as soon as possible. By the time the DXE phase is entered, the boot processor shall be in P0 with turbo enabled (if applicable). A minimal sketch of this save/restore pattern follows.
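The sketch below shows only the pattern, not the actual register list: the MSRs and their programming order are CPU specific and come from the BIOS Writer's Guide, and in a real flow the restore would be done in PEI via the read-only variable PPI rather than the DXE runtime services used here for brevity. MSR_IA32_PERF_CTL and the variable GUID are illustrative.

```c
//
// Sketch of the save/restore pattern for P-state enabling registers.
// The real MSR list and ordering are CPU specific (see the BWG).
//
#include <Uefi.h>
#include <Library/UefiRuntimeServicesTableLib.h>
#include <Library/BaseLib.h>

#define MSR_IA32_PERF_CTL  0x199
#define PSTATE_VAR_NAME    L"FastBootPstate"

extern EFI_GUID  gPlatformFastBootVariableGuid;   // hypothetical vendor GUID

typedef struct {
  UINT64  PerfCtl;
  // ... additional CPU-specific registers captured on full boot
} PSTATE_RESTORE_DATA;

// Full boot: capture the settings once turbo/P-states are fully configured.
VOID
SavePstateSettings (
  VOID
  )
{
  PSTATE_RESTORE_DATA  Data;

  Data.PerfCtl = AsmReadMsr64 (MSR_IA32_PERF_CTL);
  gRT->SetVariable (
         PSTATE_VAR_NAME, &gPlatformFastBootVariableGuid,
         EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS,
         sizeof (Data), &Data);
}

// Fast boot: restore as early as possible so the BSP reaches DXE in P0.
VOID
RestorePstateSettings (
  VOID
  )
{
  PSTATE_RESTORE_DATA  Data;
  UINTN                Size = sizeof (Data);

  if (!EFI_ERROR (gRT->GetVariable (
                    PSTATE_VAR_NAME, &gPlatformFastBootVariableGuid,
                    NULL, &Size, &Data))) {
    AsmWriteMsr64 (MSR_IA32_PERF_CTL, Data.PerfCtl);
  }
}
```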

Streamline CPU Reset and Initial CPU Microcode Update

Precise time budgeting has been done for the following sequence of events—from platform reset to initial BIOS code fetch at the reset vector (memory address). The key steps within this sequence are:

  1. CPU timestamp counter zeroed and counting started at maximum nonturbo frequency.
  2. DMI initialization completed.
  3. Soft strap and dynamic fusing of CPU completed.
  4. CPU patch microcode update read from SPI flash (only once).
  5. All logical processors (APs) within the CPU package patch-updated (using the cached copy of the CPU patch microcode update).
  6. BSP (bootstrap processor) starts fetching the BIOS at reset vector for execution.

Efficient APs Initialization

In addition to the microcode update, the BIOS needs to replicate memory range (MTRR) and other CPU settings to all APs (application processors) in a multicore, multithreaded CPU. The best optimized algorithm may be CPU specific but, in general, the following guidelines apply (a minimal sketch using the MP Services protocol follows the list):

  1. Parallelize microcode updating, MTRR programming, and other operations across the logical cores.
  2. Minimize synchronization overhead using the best method for the particular CPU microarchitecture.
  3. Execute from memory rather than XIP from SPI.
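The following DXE-level sketch shows the dispatch pattern using the PI EFI_MP_SERVICES_PROTOCOL; the body of ApInitProcedure() is illustrative only, since the real per-AP sequence is defined by the CPU-specific BWG.

```c
//
// Sketch: run per-AP initialization (microcode load, MTRR replication) in
// parallel on all APs using the PI MP Services protocol.
//
#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Protocol/MpService.h>

STATIC
VOID
EFIAPI
ApInitProcedure (
  IN VOID  *Buffer
  )
{
  // Per-AP work: apply the cached microcode update, program MTRRs to match
  // the BSP, set other per-thread MSRs. Runs concurrently on every AP.
}

EFI_STATUS
ParallelApInit (
  VOID
  )
{
  EFI_STATUS                Status;
  EFI_MP_SERVICES_PROTOCOL  *Mp;

  Status = gBS->LocateProtocol (&gEfiMpServiceProtocolGuid, NULL, (VOID **)&Mp);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  //
  // SingleThread = FALSE dispatches the procedure to all APs at once rather
  // than serializing them, minimizing synchronization overhead.
  //
  return Mp->StartupAllAPs (
               Mp,
               ApInitProcedure,
               FALSE,   // SingleThread: run APs concurrently
               NULL,    // WaitEvent: NULL = blocking call
               0,       // TimeoutInMicroseconds: 0 = wait indefinitely
               NULL,    // ProcedureArgument
               NULL     // FailedCpuList
               );
}
```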

Caching Code and Data

All BIOS code must execute with cache enabled, and the data (stack) must be in cache. This applies to all phases of the BIOS, regardless of memory availability. Unless an intentional cache flush is done (for security or other reasons), a second access to the same SPI address should hit the CPU cache (see also "EDK II Performance Optimization Guide – Section 8.2," included here for completeness).

Main Memory Subsystem

The following section describes the main memory subsystem.

Memory Configuration Complexity

When looking at memory configuration, higher frequency memory will provide faster boots. As with the processor, the simpler the physical makeup of the memory, the faster the boot: fewer banks of memory will initialize faster than a larger number of banks. If the memory's overall size is too small, however, runtime performance will be limited. Balance a small number of banks against high-density memory technology to allow for a small but agile memory footprint.

Fast and Safe Memory Initialization

Since 2010, with the Intel® Core™ series CPUs, a fast memory initialization flow has been available for the typical boot. On the first boot with a new DIMM and/or a new processor, an involved and time-consuming training algorithm is executed to tune the DDR3 parameters. If the same CPU and DIMM combination is booted subsequently, major portions of the memory training can be bypassed.

In fast memory initialization, the MRC is expected to support three primary flows:

  1. Full (Slow) Memory Initialization. Creates the memory timing point when the CPU and memory are new and have never been seen before.
  2. Fast Memory Initialization. If the CPU and DIMMs have not changed since the previous boot, this flow restores the previous settings.
  3. Warm Reset. Power was not removed from the DIMMs. This flow is used during platform reset and S3 resume.

The three flows can be used in conjunction with the Fast Boot states; however, they may operate independently of the main Fast Boot UEFI flag setting.

Hardware-Based Memory Clearing

On some SKUs of memory controllers offered by Intel, the hardware can be set to zero memory for security or ECC requirements. This is not part of the two-second BIOS boot time budgeting. Traditionally, a complete software-based memory overwrite is a very time-consuming process, adding seconds to every boot.

Efficient Memory Operations Instruction Usage

Starting with the Sandy Bridge generation CPU, new CPU instructions have been added to speed up string operations. For memory operations, such as clearing large buffers, using the optimized instructions helps. For more information, please see EDK II Performance Optimization Guide – Section 8.5.
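A short sketch of the idea: rather than clearing a buffer byte by byte, route the operation through BaseMemoryLib, whose platform-selected instance (for example, BaseMemoryLibRepStr) uses the CPU's optimized REP string instructions.

```c
//
// Sketch: clearing a large buffer. The naive loop below executes one store
// per byte; ZeroMem() lets the build select a library instance that uses
// optimized string instructions instead.
//
#include <Uefi.h>
#include <Library/BaseMemoryLib.h>

VOID
ClearBufferSlow (
  OUT UINT8  *Buffer,
  IN  UINTN  Length
  )
{
  UINTN  Index;

  for (Index = 0; Index < Length; Index++) {
    Buffer[Index] = 0;              // one store per byte
  }
}

VOID
ClearBufferFast (
  OUT VOID   *Buffer,
  IN  UINTN  Length
  )
{
  ZeroMem (Buffer, Length);         // library maps to optimized string ops
}
```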

SMBus Optimizations (Which Apply to Memory Init)

The PCH SMBus controller has one data/address lane and a clock at 100 kHz. There are three ways to read data:

SMBus Byte Read: A single SMBus byte read transaction is 39 bits long, so at minimum one byte read takes 0.39 ms.

SMBus Word Read: An SMBus word read is 48 bits, or 0.48 ms for two bytes (0.24 ms per byte). Word reads are about 40 percent more efficient than byte reads, but the bytes we need to read are not always sequential on full boots.

I2C Read: I2C is an alternate mode of operation supported by the PCH SMBus controller that can be used for sequential reads.

With today's MRC code, on fast boots we do not read all the SPD bytes every time; we read only the serial number of the DIMMs, unless the DIMMs have changed. The serial number read can be performed with sequential reads. Experiments show that using the I2C read saves a few milliseconds, which count toward the overall budget.
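A minimal sketch of the serial-number comparison follows, shown with simple byte reads for clarity; the sequential I2C optimization described above would replace the loop with a single multi-byte transfer. The SPD byte offsets assume the DDR3 SPD layout, and the saved serial number and slave address handling are platform specific.

```c
//
// Sketch: on a fast boot, read only the DIMM serial number from SPD and
// compare it against the copy saved on the previous boot, instead of
// reading the full SPD table.
//
#include <Base.h>
#include <Library/SmbusLib.h>
#include <Library/BaseMemoryLib.h>

#define SPD_DDR3_SERIAL_OFFSET  122   // DDR3 SPD module serial number bytes
#define SPD_DDR3_SERIAL_LENGTH  4

BOOLEAN
DimmSerialNumberChanged (
  IN UINTN        SpdSlaveAddress,   // e.g. 0x50 for DIMM 0
  IN CONST UINT8  *SavedSerial       // serial captured on the last full boot
  )
{
  UINT8          Serial[SPD_DDR3_SERIAL_LENGTH];
  UINTN          Index;
  RETURN_STATUS  Status;

  for (Index = 0; Index < SPD_DDR3_SERIAL_LENGTH; Index++) {
    Serial[Index] = SmBusReadDataByte (
                      SMBUS_LIB_ADDRESS (SpdSlaveAddress,
                                         SPD_DDR3_SERIAL_OFFSET + Index, 0, 0),
                      &Status);
    if (RETURN_ERROR (Status)) {
      return TRUE;   // treat a read failure as "changed": fall back to full init
    }
  }

  return (BOOLEAN)(CompareMem (Serial, SavedSerial, SPD_DDR3_SERIAL_LENGTH) != 0);
}
```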

Minimize BIOS Shadowing Size, Dual DXE Paths for Fast Path versus Full Boot

A UEFI BIOS is packaged into multiple firmware volumes. Fast Boot is enhanced when there are several DXE firmware volumes instead of one monolithic volume. That means the DXE phase of the BIOS should be split into two or more DXE firmware volumes; for example, a fast one and a slow (full) one. The fast-boot-capable DXE firmware volume contains the minimum subset of modules needed for the Intel Fast Boot typical boot, and the other DXE firmware volume contains the rest of the modules needed for full boot.

This requirement affects only the firmware volumes that have to be decompressed from SPI flash into memory before execution. To optimize decompression speed, the BIOS needs to avoid decompressing unnecessary modules that will not be executed.

This may be done at the DXE driver boundary; however, there is no restriction preventing module owners from creating a smaller fast boot module and a separate full boot module for the two different volumes.
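One hedged sketch of how the split can be acted on: a PEIM that reports the "full boot" DXE firmware volume to the dispatcher only when the boot mode is not the minimal fast path, so only the fast-boot FV is decompressed on a typical boot. The FV base address and size constants are hypothetical.

```c
//
// Sketch: report the full-boot DXE FV only when not on the fast-boot path.
//
#include <PiPei.h>
#include <Library/PeiServicesLib.h>

#define FULL_BOOT_DXE_FV_BASE  0xFFA00000   // hypothetical flash-mapped address
#define FULL_BOOT_DXE_FV_SIZE  0x00200000   // hypothetical size

EFI_STATUS
EFIAPI
PublishDxeFvByBootMode (
  IN EFI_PEI_FILE_HANDLE     FileHandle,
  IN CONST EFI_PEI_SERVICES  **PeiServices
  )
{
  EFI_BOOT_MODE  BootMode;

  PeiServicesGetBootMode (&BootMode);

  //
  // The fast-boot FV is always reported elsewhere. Report the slow/full FV
  // only when the platform is not on the minimal fast-boot path, so the DXE
  // dispatcher never decompresses modules that will not execute.
  //
  if (BootMode != BOOT_WITH_MINIMAL_CONFIGURATION) {
    PeiServicesInstallFvInfoPpi (
      NULL,
      (VOID *)(UINTN)FULL_BOOT_DXE_FV_BASE,
      FULL_BOOT_DXE_FV_SIZE,
      NULL,
      NULL
      );
  }

  return EFI_SUCCESS;
}
```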

PCIe Port Disable Algorithm

Several Mini PCIe detection flows ultimately lead to function-disabling a particular PCIe port, including a PCIe port with no card detected and the NAND-over-PCIe initialization flow. All of these must be combined and completed in a single Mini PCIe enumeration pass.

Manageability Engine

The Manageability Engine (ME) is a security processor subsystem and offload engine inside the PCH. There are two SKUs of the firmware that runs on the device: a 1.5 MB SKU and a 5.0 MB SKU. The 5.0 MB SKU is associated with advanced security features, such as Intel® Active Management Technology. The 5.0 MB firmware requires a setup option ROM called the Manageability Engine BIOS Extension (MEBx), which up until 2011 ran on every boot and took time. There are also ME/BIOS interactions during boot, regardless of the firmware SKU.

Eliminating MEBx

Starting with the 2012 platform controller hubs, the Intel PCH 5.0 MB ME firmware eliminates the need to execute MEBx on Fast Boots. Instead of running MEBx on every boot, MEBx is run only as needed within the framework, per the UEFI boot mode flag on a typical boot.

Reducing Manageability Engine and BIOS Interactions

In addition to the 2012 MEBx optimization, starting in 2012 platforms, during normal boot there are only two architecturally defined sync-points between ME and BIOS remaining:

  1. DRAM Init Done (DID). This happens as soon as memory is available for ME use following MRC initialization. The time is estimated to be between 50 ms and 120 ms after TSC starts, depending on the MRC and CPU requirements.
  2. End of POST (EOP). This happens before the BIOS processes the boot list (at the end of the DXE phase). It is estimated to be 700 ms after TSC starts.

All other ME-BIOS communication will happen asynchronously outside of these two sync-points (that is, no waiting for the other execution engine). The MEBx (ME BIOS extension) module is not expected to be called in a typical BIOS boot. If it is needed, it can be called via the exception handling methodology defined in Intel Fast Boot framework.

Hardware Asset Reporting for Intel® Active Management Technology (Intel AMT)

Within the Fast Boot framework, the SMBIOS, PCI Asset, and ASF tables are always updated and passed to the Intel AMT firmware (the 5.0 MB ME SKU) regardless of boot mode.

For the media table, the BIOS will enumerate all storage controllers and attached devices during full boot and upon request by the Intel AMT firmware. Upon detecting an Intel AMT firmware request, the BIOS will enumerate all media devices (except USB) to generate and pass the media table. The heuristic on how frequently Intel AMT requests this is the responsibility of the Intel AMT design.

USB Flash Drive Provisioning for Intel® AMT

Instead of supporting USB provisioning for Intel AMT in a typical fast boot, a BIOS following the Fast Boot framework supports USB flash drive provisioning only in full boot mode. By definition, any type 2 or type 3 exception causes the BIOS to fall back into full boot mode. For example, when a user interrupts boot processing with a hot key, stalling the boot, an exception is triggered. Because the BDS phase is then in full boot mode, USB stick provisioning of Intel AMT will function as expected.

Graphics Subsystem

The following section describes the graphics subsystem.

Graphics Device Selection

When looking at video and graphics devices, the panel timings were mentioned above. The controller timing and speed are also important to boot speed; the faster the better. The timing numbers can be modified, if required, with the BMP utility applied to the UEFI GOP driver to help achieve this. A UEFI Graphics Output Protocol driver will provide faster boot speeds than a legacy video BIOS. Finally, a single graphics solution will boot faster than a multiple display/controller configuration.

Graphics Output Protocol (GOP) Support for CSM-Free Operating Systems

For operating systems that support a CSM-free boot, the GOP driver is loaded by the BIOS instead of a CSM legacy video option ROM. This eliminates the time spent creating legacy VGA mode display services (INT 10h). The benefits, in microseconds, can be seen in Figure 12.1 for different port/monitor selections.

Figure 12.1: Benefits of Graphics Output Protocol Support

Panel Specification

If you are using an Embedded DisplayPort (eDP) panel with the panel startup time set per the industry specification, then 250 ms is required during the boot just to reset power on the panel. This is a ridiculously long time to wait for hardware; an entire PC motherboard takes about that long to come up through power sequencing. If the timing is adjusted to what the hardware actually requires to cycle power, then eDP may prove adequate.

Start Panel Power Early

Like the disk drives, the panel must now be started early to parallelize the hardware delays during boot. A PEI module to power up the eDP panel is needed if the target display panel has a noticeable delay in its power-up sequence. For example, if a panel requires 300 ms to power up, a PEI module that enables (powers up) the eDP port needs to execute at least 300 ms before the video module is reached in the BDS phase of the BIOS.
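A minimal sketch of such a PEIM follows; the GPIO helper that actually asserts the panel power rail is board specific and purely hypothetical here.

```c
//
// Sketch: a tiny PEIM dispatched as early as possible that only asserts the
// eDP panel power rail, so the panel's power-up delay elapses in parallel
// with the rest of PEI/DXE. PlatformAssertPanelPowerGpio() is a hypothetical
// board-specific helper.
//
#include <PiPei.h>

VOID PlatformAssertPanelPowerGpio (VOID);   // board-specific GPIO write (assumed)

EFI_STATUS
EFIAPI
EarlyPanelPowerEntry (
  IN EFI_PEI_FILE_HANDLE     FileHandle,
  IN CONST EFI_PEI_SERVICES  **PeiServices
  )
{
  //
  // Assert panel VDD now and return immediately; do not wait here. By the
  // time the GOP/video module runs in BDS, the panel power sequence has
  // already completed in the background.
  //
  PlatformAssertPanelPowerGpio ();
  return EFI_SUCCESS;
}
```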

Storage Subsystems

The following section describes storage subsystems.

Spinning Media

For spinning media storage devices, the spin-up time for a hard drive is 2 seconds minimum. Even if the BIOS starts the initialization a short time after it gets control of the system, the drive may not be ready to provide an OS.

Utilizing Nonblocking Storage I/O

The Intel PCH integrated SATA controller supports Native Command Queuing (NCQ) in native AHCI mode operation. Unless required by the target operating system, the BIOS should access the storage subsystem in the most efficient way possible. For example, in Windows operating systems since Windows XP, AHCI mode should be the default SATA storage mode.

Early SATA COMRESETs: Drive Spin-Up

Generally, in client platforms, the disk's power-up-in-standby (PUIS) feature is disabled, so the hard disk automatically spins up once it receives a COMRESET, which is sent when the BIOS enables the related SATA port. Spinning up the hard drive as early as possible in the PEI phase is achieved by enabling the ports right after setting the SATA DFT.

While SATA SSD NAND drives do not literally spin up, the wear-leveling algorithms and databases required to track the bits take hundreds of milliseconds before data is ready to be fetched (including identifying drive data). While this can be mitigated with SSD firmware enhancements or controller augmentation to store such data, numbers in the range of 800 ms are probable with the latest SATA3 SSDs at the time of this writing.

CSM-Free Intel® Raid Storage Technology (Intel RST) UEFI Driver

The Intel RST UEFI driver is required to allow for SSD caching of a slower hard drive. The SSD performance far outweighs the HDD in both read/write and spin-up readiness. This newly optimized UEFI driver is needed to support the CSM-free Class Two and Class Three UEFI boot mechanism. Elimination of CSM is the primary time saving; however, the optimizations made to the new UEFI driver over the legacy Intel RST option ROM are dramatic. As with the MEBx UEFI driver, the Intel RST driver will follow the recommendation of the UEFI flag for Fast Boot. One of the fallback conditions for Fast Boot must also be that for any drive configuration change, the BIOS must inform the UEFI option ROMs via the UEFI boot mode flag.

In SDR0 (Single Disk RAID 0) Intel RST caching mode, the Intel RST driver does not wait for HDD to spin up. It needs to allow data access (read) to OS boot loader as soon as cached data in SSD are available.

The fastest HDD (at the writing of this chapter) takes about 1.4 to 3 seconds from power-up to data availability. That is far slower than the 800 ms power-up to data availability of an Intel SSD (X25-M).

Minimizing USB Latency

The Intel integrated USB host controller and integrated hubs have much smaller latencies than the generic host controller and hub settings spelled out in the USB specifications. The BIOS can be optimized for the integrated components (as they are always present in the platform) by replacing the default USB specification timings with the Intel PCH USB timings published in the PCH USB BIOS Writer's Guide.

For example, the minimum time needed to enumerate all the USB ports on PCH (as if they are empty) is approximately 450 ms. Using Intel PCH silicon-specific timing guideline can cut that down by more than half.

Power Management

The following section describes power management.

Minimizing Active State Power Management Impact

On several buses in the platform, there is a recommendation for active state power management (ASPM). The ASPM setting is important in extending battery life during runtime; however, there is nonzero latency in ASPM state transition. The impact can be seen in Table 12.2 for both boot and resume times.

The Intel DMI bus supports a PCIe ASPM-style low link power scheme. To eliminate potential side effects, enabling DMI ASPM should be done at the last possible moment in the BIOS initialization flow; in fact, it can be done after POST is completed.

To delay the setting of the DMI ASPM link states (L0s/L1) to the last possible moment in the boot, there are three possible options:

  1. At ExitBootServices()
  2. In ACPI
  3. One-shot SMI timer, heuristically determined by experiment, say 8 seconds after ExitBootServices(), to cover the OS booting period

Option 1 should be selected if we are interested only in BIOS boot time improvement. Options 2 and 3 can be explored for OS boot time improvement. If we aim to improve the OS load time as well, we could use the SMI timer for the S4/S5 path and use the ACPI method _BFS to set DMI ASPM for the S3 path, assuming an ACPI-compliant OS.
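A minimal sketch of option 1 follows: a DXE driver registers a callback on the ExitBootServices event group and only then programs DMI ASPM. The chipset-specific register programming is hidden behind a hypothetical EnableDmiAspm() helper.

```c
//
// Sketch of option 1: defer DMI ASPM enabling to an ExitBootServices()
// callback so the link stays out of low-power states for all of POST.
//
#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Guid/EventGroup.h>

VOID EnableDmiAspm (VOID);   // chipset-specific register programming (assumed)

STATIC
VOID
EFIAPI
OnExitBootServices (
  IN EFI_EVENT  Event,
  IN VOID       *Context
  )
{
  EnableDmiAspm ();          // runs only when the OS loader exits boot services
}

EFI_STATUS
RegisterLateAspmEnable (
  VOID
  )
{
  EFI_EVENT  Event;

  return gBS->CreateEventEx (
                EVT_NOTIFY_SIGNAL,
                TPL_NOTIFY,
                OnExitBootServices,
                NULL,
                &gEfiEventExitBootServicesGuid,
                &Event
                );
}
```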

Table 12.2: Active State Power Management Impact

Responsiveness Phase (Microsoft VTS tool,
Intel Sandy Bridge CRB board/BIOS baseline)    ASPM On    ASPM Off
BIOS POST (seconds)                            8.89       8.87
Boot to Desktop (seconds)                      5.39       5.14
Boot Complete (seconds)                        7.31       6.97
Resume (seconds)                               0.60       0.57

Security

Security at a high level is often a tradeoff against boot speed and responsiveness. Trusted Platform Modules and measured boots add noticeable time to a boot flow. Single-threaded boot ROMs, a hardware root of trust, and the subsequent daisy-chaining of authentication take a very long time if not architected for speed (and security). You need to look at the platform requirements carefully and balance security and responsiveness. There are some things we can do to mitigate the security impact on platform boot times.

Intel® Trusted Execution Technology (Intel TXT)

Intel Trusted Execution Technology includes additional BIOS binary modules that execute to assist in authenticating subsequent code execution and to provide a secure environment for that activity. It takes time to execute these modules, and it takes a small amount of time to perform authentication prior to executing code in these environments. Other authentication schemes have similar setup and execution penalties.

TPM Present Detect and Early Start

Trusted platform modules hash code and data and save the results during a secure or measured boot. The delay associated with a TPM can range from 300 ms to upwards of 1 second, depending on the TPM vendor, the TPM firmware revision, and the size of the BIOS firmware volumes being hashed. There are several techniques that can save time when using a TPM:

1. Use the fastest SPI flash part available.

2. Use the fastest TPM within the budget.

3. Where possible, execute longer latency TPM commands in parallel with other BIOS code. TPM_Startup and TPM_ContSelfTest are usually the slowest commands. This allows continued BIOS execution and access to the TPM while the diagnostics complete. Specifically:

Finish measurement of the last FV in SEC/PEI before executing TPM_ContSelfTest in PEI.

Delay checking for TPM_ContSelfTest completion until the next TPM command in DXE, and delay that next TPM command if possible. Interrupting the self test in some cases causes additional delay.

4. Measure only what is needed. Do not measure free space or the boot block if it cannot be modified.

5. If the TPM supports legacy I/O, turn off I/O ports 0x4E/0x4F.

6. Depending on the BIOS settings and configuration algorithm, there can be several access attempts to 0xFED40000 to detect whether a TPM exists on the platform, and each access can cost about 7 ms. Adding a HOB (a UEFI hand-off block) to save the result of the very first access to 0xFED40000 can be used to indicate whether a TPM is present on the board. The rest of the components should then reference this HOB instead of re-checking for the presence of the TPM; read the same information across I/O just one time (a minimal sketch follows this list).

7. Copying data into memory before hashing will save time over hashing in place.
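The sketch below illustrates item 6: probe the TPM register block at 0xFED40000 once in PEI, then publish the result in a GUID HOB so later modules check the HOB rather than re-reading the slow register. The HOB GUID and the presence heuristic used here are illustrative assumptions.

```c
//
// Sketch for item 6: detect TPM presence once and record it in a HOB.
//
#include <PiPei.h>
#include <Library/HobLib.h>
#include <Library/IoLib.h>

#define TPM_TIS_BASE_ADDRESS  0xFED40000

extern EFI_GUID  gPlatformTpmPresenceHobGuid;   // hypothetical platform GUID

VOID
DetectAndRecordTpmPresence (
  VOID
  )
{
  UINT8    Access;
  BOOLEAN  TpmPresent;

  //
  // Single MMIO access; an undecoded address typically reads back 0xFF when
  // no TPM is present (presence heuristic only, assumed for illustration).
  //
  Access     = MmioRead8 (TPM_TIS_BASE_ADDRESS);
  TpmPresent = (BOOLEAN)(Access != 0xFF);

  //
  // Consumers locate this HOB (GetFirstGuidHob / GET_GUID_HOB_DATA) instead
  // of touching 0xFED40000 again.
  //
  BuildGuidDataHob (&gPlatformTpmPresenceHobGuid, &TpmPresent, sizeof (TpmPresent));
}
```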

Operating System Interactions

The following section describes operating system interactions.

Compatibility Support Module and Legacy Option ROMs

In a UEFI BIOS, a Class 3 UEFI solution will normally be more than 100 ms faster than a legacy-OS-supporting solution; that is, it saves the time the CSM takes to execute (without counting additional delay due to legacy option ROMs). Again, this is a tradeoff between compatibility with older operating systems and boot speed. Setup menu options can disable the CSM if it is not required.

OS Loader

If the OS is being examined, then OS loader times can also be improved by looking at the OS image size. Limiting the OS requirement for a pre-OS keyboard can speed up boot by tens to hundreds of milliseconds. Loading the user interface sooner in the kernel's boot flow will make a noticeable difference to the end user. Device driver load and start times and the usage of services can be streamlined to positively affect boot performance.

During runtime, UEFI capabilities are very limited, and not all the UEFI drivers that were used to boot the platform are available for the OS to call. Once ExitBootServices() is called by the OS loader and it assumes control of the platform, much information is lost.

The OS loader can collect quite a bit of data about the platform, above and beyond the typical ACPI table standard set of information, by accessing the BIOS through UEFI function calls. Before exiting boot services, the OS loader can both get data from and give data directly to the BIOS.

An example of this OS-level interaction is setting the graphics resolution of the splash screen so that it matches what the OS will use, via a hint provided during OS loading.
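A minimal sketch of that interaction, assuming an OS loader that knows its target resolution: before calling ExitBootServices(), query the Graphics Output Protocol modes and select the one that matches, so no mode switch is needed at OS handoff. The target resolution values are illustrative.

```c
//
// Sketch: an OS loader matching the firmware display mode to the OS's
// native resolution before ExitBootServices().
//
#include <Uefi.h>
#include <Library/UefiBootServicesTableLib.h>
#include <Protocol/GraphicsOutput.h>

EFI_STATUS
MatchOsResolution (
  IN UINT32  TargetWidth,    // e.g. the native panel width the OS will use
  IN UINT32  TargetHeight
  )
{
  EFI_STATUS                            Status;
  EFI_GRAPHICS_OUTPUT_PROTOCOL          *Gop;
  EFI_GRAPHICS_OUTPUT_MODE_INFORMATION  *Info;
  UINTN                                 InfoSize;
  UINT32                                Mode;
  BOOLEAN                               Match;

  Status = gBS->LocateProtocol (&gEfiGraphicsOutputProtocolGuid, NULL, (VOID **)&Gop);
  if (EFI_ERROR (Status)) {
    return Status;
  }

  for (Mode = 0; Mode < Gop->Mode->MaxMode; Mode++) {
    Status = Gop->QueryMode (Gop, Mode, &InfoSize, &Info);
    if (EFI_ERROR (Status)) {
      continue;
    }
    Match = (BOOLEAN)(Info->HorizontalResolution == TargetWidth &&
                      Info->VerticalResolution   == TargetHeight);
    gBS->FreePool (Info);
    if (Match) {
      return Gop->SetMode (Gop, Mode);   // firmware and OS now agree on mode
    }
  }

  return EFI_NOT_FOUND;
}
```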

Legacy OS Interface

Windows 7 and other legacy operating systems that require a CSM in the BIOS to provide Int 10h (and other legacy software interrupts) boot hundreds of milliseconds to several seconds slower due to the nature of their boot flow. Serially initializing each and every legacy option ROM is just one reason why a legacy boot may be many seconds slower than UEFI boot flows. If the OS was not optimized for boot during kernel and driver loading, then any reasonable amount of BIOS improvement is going to be lost anyway (even a zero-second boot is too long if the OS takes more than ten seconds to boot).

Reducing Replication of Enumeration Between Firmware and OS

The OS often repeats enumeration of buses in the post-boot space that the BIOS firmware has performed in the pre-boot. Ideally this would be a source of timing savings. However, upon further inspection, there are multiple reasons for this replication, including but not limited to the following:

  1. Firmware may not have done a complete job of enumerating the entire enumerable subsystem, expecting software to repeat the enumeration when the OS level drivers load. This may be due to the BIOS not requiring use of that portion of the system in the pre-boot space.
  2. Virtualization: the firmware can perform a full enumeration of a bus, then expose a different set or a subset of hardware to the operating system through virtualization technology.
  3. The firmware may not have done an accurate job.
  4. The initial enumeration may not work well with the kernel or device driver stack designed by the operating system developers.

At the end of the day, the BIOS must enumerate only just enough of the design to boot the operating system. Assuming the operating system has proper enumeration support for the system hardware, the enumeration will be repeated, and in a more complete manner than in the BIOS. Standard enumerable bus architectures allow for this replication, and the system may require it; PCI and USB enumeration are examples. The whole USB tree under a port may not need to be enumerated five-plus hubs deep by the BIOS. The BIOS really needs to initialize only the hardware that cannot be enumerated through industry standards (such as devices on I2C). The coordination can be made tighter in an embedded design where an RTOS and custom firmware have minimal overlap in enumeration.

Other Factors Affecting Boot Speed

Certain devices or configurations are known to extend the boot times, including but not limited to the following items.

No Duplication in Hardware Enumeration within UEFI

While replication of enumeration may be required between the BIOS and the OS, it is not required within the UEFI domain itself. If necessary, the BIOS can pass information between modules via UEFI variables or HOBs. For example, a HOB can pass CPU BIST information from SEC to PEI, and memory information from the MRC to the SMBIOS module. It is recommended not to access the same hardware I/O twice unless the data is expected to change.

Minimize Occurrences of Hardware Resets

Most hardware has a long power reset sequence. Question whether a hardware reset is necessary, or whether the condition can be handled in software without reinitializing hardware. CPU initialization, memory initialization, or ME initialization may each require an extra system or CPU reset, which adds time, as part of the boot is replicated. Fast Boot eliminates most causes of extra system resets.

Intel Architecture Coding Efficiency

Intel architecture performance can be sensitive to coding arrangement (just like any other computer architecture). Following the coding optimization guide on the Intel Software portal is the best-known method. At a minimum, code and data structure alignment should be optimized as described in the optimization guide. (See also EDK II Performance Optimization Guide – Section 8.10.)

Network Boot Feature

A network boot (booting an OS image over the LAN) takes several seconds to negotiate with the DHCP server for an IP address. Fast Boot is not really an option here.

Value-Add, But Complex Features

Complexity and robust feature sets will likely result in a flexible but slower boot than a simple configuration. RAID is a feature that adds a lot of value, but it can decrease boot speed due to its option ROM execution requirement. Here UEFI drivers can help with some of the boot speed, but they cannot completely compensate for the tradeoffs.

Tools and the User Effect

Tools used to measure speed can produce an observer effect if not properly implemented. Using file I/O, serial output, POST codes, or other slow recording mechanisms adds to the boot flow, and the more precise the data collection, the greater the effect. Methods vary broadly per tool, but the best tools use memory to store the data during the boot flow and then read it off the platform afterwards. For a complete picture of the boot flow (into the OS level), the best tools come from OS vendors that have incorporated the Firmware Performance Data Table (FPDT), where the BIOS reports the data into a memory location exposed through the ACPI tables. Runtime tools can read the data after the fact.

Human Developer’s Resistance to Change

As Confucius said, “Only the wisest and stupidest of men never change.” The developer’s attitudes toward the challenge of boot speeds can have a huge impact on the results. “It’s only a few milliseconds” can add up quickly. “S3 is fast enough” will leave many milliwatts and milliseconds on the table. “It’s a systemic problem, what can I do?” will leave others to solve the problem if they choose to. “Even if the BIOS disappeared entirely, the OS is still too slow,” but that cannot be said any more.

Intel architecture platforms have not all been optimized with this technology to date. Customers need to work with their independent BIOS vendors to see if the capability has been included with their BIOS release to achieve Fast Boot optimization.

Motherboards can be developed to encompass the right sequencing with the right parts. Tradeoffs can be worked through for the right reasons. Tools can be obtained and code modules instrumented properly. And with the right approach, developers can streamline the entire boot path into something truly responsive.

Summary

When combined with a systematic Fast Boot framework and policy decisions around major subsystems, hardware selection and initialization nuances complete the picture of a quick boot solution.

The list discussed is far from complete, focusing on today’s Intel® Core™ based platforms. Similar activities are feasible on any platform with due diligence and time.
