Chapter 11
Intel’s Fast Boot Technology

A little simplification would be the first step toward rational living, I think.

—Eleanor Roosevelt

One of the key objectives of computing platforms is responsiveness. BIOS boot time is a key factor in responsiveness that OEMs, end users, and OS vendors ask firmware developers to deliver. Here the focus is on system startup time and resume times.

Traditional Intel architecture BIOS has labored through the years to design itself to boot on any configuration that the platform can discover. Each time it boots, it works to rediscover anything it can about a change to the machine configuration that may have happened while it was off. This provides the most robust, flexible experience possible for dynamic open boxes with a single binary. However, this has also resulted in a bloated boot time upwards of 30 seconds for the BIOS alone. As we have covered in other chapters, when properly tuned and equipped, a closed-box consumer electronics or embedded computing device can easily boot in under two seconds after power-on. This normally requires a customized hard-coded solution or policy-based decisions, which can add several months of optimization work to an embedded design. By following the methods contained in this chapter, even open-box PCs can be as fast as embedded devices and consumer electronics, booting in less than two seconds. And the benefit to embedded designs is that they need not spend weeks and months of analysis per design to get the desired effect on boot speed.

The Human Factor

Assuming the device being created will interact with people, it is important to understand how the human brain reacts to stimulus, and what it expects in response with regard to timing.

Figure 11.1: Response Time in Man-Computer Conversational Transactions

Back in 1968, a psychologist named Robert B. Miller, working at IBM in Poughkeepsie, New York, conducted experiments between people and computers using the latest input devices at the time. While the devices have changed over the last few decades, the human brain has not (Miller, 1968).

In Miller’s experiments, he showed that a person would believe that, in terms of response time, less than 200 milliseconds (ms) is considered to be immediate, greater than two seconds makes people start to get impatient, and after four seconds, communication between human and machine is broken (see Figure 11.1).

Four decades later, another psychologist, Steven Seow, authored a book with a similar set of experimental results in his responsiveness chapter.1 While Seow’s experiments were geared toward a software UI instead of simple input device responses, the results are strikingly similar to Miller’s, as illustrated in Figure 11.2.

Figure 11.2: Seow’s Responsiveness Time Definitions

Seow broke up responsiveness into four categories:

Instantaneous was measured in the range 100–200 ms, where the maximum was 100 ms for a key press and a maximum of 200 ms for a menu draw.

Immediate implied 500–1000 ms, where the end user perceives information is available and being displayed.

Continuous was coined for the range of 2000–5000 ms, where the machine is reacting, but the user expects feedback on progress at longer durations.

Captive lasted from 7000 through 10000 ms, the maximum time a user will maintain focus before switching to some other task (abandon as background).

The next similarity concerns how well people can judge, without a stopwatch, the difference between two durations. Small variations may not be noticeable to the untimed eye.

Seow suggested a basic “20 percent rule” for perceptible differences. Miller’s data was a bit more refined (as Figure 11.3 shows):

75 percent of people cannot detect a change of ±8 percent between 2 and 4 seconds

From 0.6 to 0.8 seconds, there was 10 percent variation

From 6 to 30 seconds, a 20–30 percent variation

Figure 11.3: Miller’s Results

For typical system boot times where the BIOS executes in 6 seconds to 15 seconds, the times would have to be improved between 1.2 and 4.5 seconds for someone to take notice and appreciate the work of a developer trying to make improvements at the millisecond level.

It should be noted that many little changes can add up to a lot. Even in the sub-second range, where Miller’s results imply a need for approximately 80 milliseconds to see results, a series of small changes of 5 to 10 ms can easily reach a noticeable timing result.
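The arithmetic behind these thresholds can be sketched as follows. This is a minimal illustration, not part of Miller's or Seow's work: the band table simply encodes the three ranges quoted above, the function name `noticeable_delta_range` is hypothetical, and falling back to Seow's 20 percent rule outside the quoted ranges is an assumption.

```python
# Ranges quoted above from Miller's results:
# (lower bound in s, upper bound in s, min fraction, max fraction)
JND_BANDS = [
    (0.6, 0.8, 0.10, 0.10),
    (2.0, 4.0, 0.08, 0.08),
    (6.0, 30.0, 0.20, 0.30),
]

# Assumption: outside the quoted ranges, fall back to Seow's "20 percent rule".
FALLBACK_FRACTION = 0.20


def noticeable_delta_range(duration_s):
    """Return (low, high) seconds of change needed before most users notice."""
    for low_s, high_s, min_frac, max_frac in JND_BANDS:
        if low_s <= duration_s <= high_s:
            return duration_s * min_frac, duration_s * max_frac
    return duration_s * FALLBACK_FRACTION, duration_s * FALLBACK_FRACTION


if __name__ == "__main__":
    for boot_s in (0.8, 3.0, 6.0, 15.0):
        low, high = noticeable_delta_range(boot_s)
        print(f"{boot_s:5.1f} s boot: improve by {low:.2f} to {high:.2f} s to be noticed")
```

For a 6-second boot the lower bound works out to about 1.2 seconds, and for a 15-second boot the upper bound is about 4.5 seconds, which is where the figures in the text come from.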

Responsiveness

Looking across the system, in order to achieve what your brain thinks is the right timing, many things must be aligned for speed and performance, not just during runtime, but during startup and resume operations, as well as during sleep and shutdown. Every millisecond wasted burns milliwatts, costing time and energy.

We need responsiveness across all levels of the platform, as shown in Figure 11.4:

Figure 11.4: The Platform Level Stack Responsiveness

Low latency hardware. Today’s silicon and hardware provide much greater speed than they are normally tuned for. While people think of a PC as booting in 45 seconds or more, the same hardware, properly tuned, can execute a viable boot path in less than 1 second.

Along these lines, power sequencing of the various power rails on the platform is an area that needs to be examined. Between 300 and 700 ms can be absorbed by just the power sequencing. This is before the CPU comes out of reset and the system firmware starts operating.

Fast boot BIOS and firmware. System BIOS and embedded microcontroller firmware components are moving forward in ways that they didn’t in years past, or didn’t need to. Silicon vendors often provide sample boot code that works to enable the platform through initialization. What is needed is more responsive silicon reference code.

Operating system. Intel has been working with OS vendors to optimize the user experience. For developers, it is easy to see where a very modular and diverse mix of components can work if architected correctly, but many solutions are not fully optimized. Linux teams in the Intel Open Source Technology Center (OTC) are engaged to speed up the experience from kernel load and driver startup times. And other operating systems are not being ignored.

Driver optimizations. Intel OS drivers are measured using various tools, depending on the OS. Intel is reducing the load and execution times for all of our drivers. We are continually investigating any optimizations possible to improve the responsiveness of our ingredients, including the graphics and wireless device drivers.

Middleware. This is a topic that you will have to consider: the applications that use it and how it can be altered to take advantage of offload engines on the system.

Applications. Like other companies, Intel is investing in its application store, as well as working with others in the industry to streamline their applications with various tools from the Intel tools suites. There are a variety of applications we provide which assist in the debug and monitoring of the system that we provide to all of our customers.

Depending on the use of solid-state drives, there are fundamental things that can be done differently at the OS and application levels that can have a profound impact on application responsiveness. This is an area we need to work on, this is where our customers work, and this is where it will count the most. The user experience is here, if we did the rest of our jobs correctly.

And let’s not forget that responsiveness doesn’t end at the platform level; it extends into the local network and into the ubiquitous cloud.

The (Green) Machine Factor

Assuming the device being created will interact with other machines, it is important to understand how the delays in the system will affect the overall effectiveness. Timing requirements between machines can be much tighter or much looser than when dealing with people.

Mission-critical systems may have single-digit millisecond responsiveness requirements in certain responses where the system cannot be put into a lower power idle state without significant risk or limitations to the overall network. Other systems require only low data amounts and infrequent access, and can stand to wait several seconds or minutes or hours for a system to resume from slumber and complete a task.

Real-time systems allow for setting priority on devices as well as execution threads so they can predetermine if there is any millisecond wait at all for anything in the system. The more responsive the system is, the more flexibility the real-time system designer may have.

The faster the response times can be, the deeper the sleep state can be and the less power is required to keep the system active over time. Example: If the system can be made to boot in less than two seconds from an OFF (S4) state, where power is normally 2 mW, then why put the system into S3, where the resume time is about 1 second and the power is several hundred mW? Obviously, it depends on the resume times, power requirements, and usage model tradeoffs. But the faster response times from lower power states can prove extremely beneficial. Another example is Enhanced Intel SpeedStep® Technology, where the CPU can dynamically enter a lower power state and run at a lower frequency until the performance is required. Average power is a key factor when working on more ecologically sensitive, green infrastructures, from servers to sensors. Responsiveness capabilities provide an ability to lower power overall and create more easily sustainable technology.
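The S3 versus S4 tradeoff can be made concrete with a rough energy model. The sketch below is illustrative only: the 2 mW S4 figure and the 1-second and 2-second wake times come from the text, while the 300 mW S3 figure (standing in for "several hundred mW") and the 10 W active power during wake are assumptions, as are the function names.

```python
def sleep_cycle_energy_j(sleep_power_w, wake_time_s, idle_s, active_power_w):
    """Energy for one idle period plus the wake-up that ends it, in joules."""
    return sleep_power_w * idle_s + active_power_w * wake_time_s


def break_even_idle_s(shallow_power_w, shallow_wake_s,
                      deep_power_w, deep_wake_s, active_power_w):
    """Idle time beyond which the deeper sleep state uses less total energy."""
    extra_wake_energy = active_power_w * (deep_wake_s - shallow_wake_s)
    sleep_power_saved = shallow_power_w - deep_power_w
    return extra_wake_energy / sleep_power_saved


# Values from the text: S4 at 2 mW with a 2 s boot, S3 with a 1 s resume.
# Assumed values: S3 at 300 mW, 10 W drawn while booting or resuming.
S3_POWER_W, S3_WAKE_S = 0.300, 1.0
S4_POWER_W, S4_WAKE_S = 0.002, 2.0
ACTIVE_POWER_W = 10.0

if __name__ == "__main__":
    t = break_even_idle_s(S3_POWER_W, S3_WAKE_S, S4_POWER_W, S4_WAKE_S, ACTIVE_POWER_W)
    print(f"S4 wins for idle periods longer than about {t:.0f} s")
```

Under these assumed numbers, the deeper state pays off once the system idles for more than roughly half a minute; different power figures shift the break-even point, which is the usage-model tradeoff the text refers to.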

Boot Time Analysis

In order to properly measure the responsiveness of a system, a stopwatch doesn’t work. Counting aloud “one Mississippi, two Mississippi, three Mississippi…” doesn’t work either. For the right level of granularity, there are timers on both the Intel processors (Time Stamp Counter, TSC, with millisecond timing) and in the chipsets (High Performance Event Timers, HPET, microsecond timing) that can be implemented as part of a logging mechanism in the firmware. While the firmware or hardware architecture may limit the direct use of either, it is possible to incorporate a mechanism with reasonable accuracy for the needs of boot time analysis. Logging of the various milestones in the boot flow can be added by outputting to a serial port, as an example. A temporary storage location in local memory, at a known location and reserved from the OS, is preferred, though, for two reasons: no port requirements and no additive delay creating an observer effect. This data can also be dropped into an ACPI table for the log to be retrieved and reviewed later.

If using Tiano implementations, EFI PERF Monitor functions can be added quickly to various EFI code modules in all phases of the Tiano boot flow.

In the PEI phase, we use PEI_PERF_START(), PEI_PERF_END(), PEI_SER_PERF_START(), and PEI_SER_PERF_END().

In the DXE, BDS, and Shell phases, we use PERF_ENABLE(), PERF_START(), PERF_END(), and finally PERF_UPDATE().

These logging routines will dump data into a reserved memory location from a cold boot for retrieval later.

You also need to know how to get around a few limitations:

  1. The first limitation is that when you are doing CPU or memory initialization, a reset is required to either the CPU or the platform. When this happens, the timers may also get reset. Finding a scratchpad region that is “sticky,” which means it maintains its data across a warm reset or cold boot, is required such that you can save/restore the TSC for accurate measurement/logging across the entire boot path, instead of from just the last reset executed by the firmware.
  2. Some basic framework processing overhead can happen outside of the instrumented routines that will not be counted. Between main PEI or DXE cores operating, the processes in between may not be fully instrumented in a particular code base. While this should not be the case, a few milliseconds here or there could slip through the cracks.
  3. Over the course of a system S3 sleep/resume cycle, all the timers are reset and you must reserve a memory region from the OS that you can access when memory is “active.” As S3 has been on the order of several hundred milliseconds versus tens of seconds, some people choose to use this state as their low-power fast boot option.
  4. The TSC or HPET timers may not be set up by default at power on. The firmware will have to initialize them and some tens of milliseconds may be lost in the setup before the first logging can occur.
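The first limitation, carrying a timestamp across a reset, can be modeled in a few lines. This is a host-side simulation only, not firmware code: the class `BootTimeline`, its fields, and the tick values are all hypothetical, with `sticky_ticks` standing in for a scratchpad location that survives reset and `counter` standing in for a timer that restarts from zero.

```python
class BootTimeline:
    """Simulates logging boot milestones when the timer is cleared by a reset."""

    def __init__(self):
        self.sticky_ticks = 0  # stands in for a "sticky" scratchpad that survives reset
        self.counter = 0       # stands in for the TSC, which restarts at each reset
        self.log = []          # assumed to live in reserved memory that is preserved

    def advance(self, ticks):
        """Model time passing while firmware executes."""
        self.counter += ticks

    def milestone(self, name):
        """Record a milestone against the whole boot path, not just the last reset."""
        self.log.append((name, self.sticky_ticks + self.counter))

    def reset(self):
        """Save elapsed ticks to sticky storage before the reset clears the timer."""
        self.sticky_ticks += self.counter
        self.counter = 0


if __name__ == "__main__":
    boot = BootTimeline()
    boot.advance(100)
    boot.milestone("memory init done")
    boot.reset()                      # e.g. a reset required after CPU or memory init
    boot.advance(50)
    boot.milestone("DXE entry")
    print(boot.log)
```

Without the save step in `reset()`, the second milestone would be logged at 50 ticks rather than 150, hiding everything that ran before the reset.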

One way to overcome the software logging issue is to have the hardware instrumented with a logic analyzer. Depending on the motherboard layout and test points available, you should be able to probe the motherboard for different signals that will respond to the initialization as it happens. If no known good test point exists, a GPIO can be set up as the start and end point. Sending the I/O commands takes some time, so it is not ideal.

Using hardware measuring techniques brings further complications. It is likely that the hardware power sequencing alone takes upwards of 100 ms to execute before the processor is taken out of reset and the BIOS or bootloader code can begin to execute. From a user’s perspective, this time counts: they “pushed the button” and their eyes are expecting a response within a few seconds. So from a BIOS point of view, this hardware power sequencing is a required handicap.

With the addition of any large number of experimental test points, it is possible to incur an observer effect or to change the boot environment, slowing it down with the extra cycles being added. Example: if you turn on debug mode, or if you do an excessive number of I/Os, the performance can be heavily affected by up to 30 percent in some cases. Be aware of this effect and don’t chase ghosts. This concept is outlined in Mytkowicz et al. (2008).

Once we have the data, then the fun truly begins. A quick Pareto chart summarizing the timing data per block can help developers focus on the top 20 percent of items with the longest latency, which may total up to 80 percent of the boot time. These items can be reduced first; then dig into the shorter portions. When looking at attacking this problem, it is a good idea to step back and look at the bigger picture before charging ahead and tackling the problem feature by feature.
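A Pareto cut over logged timings might look like the following sketch. The block names and millisecond values are made-up sample data, not measurements from any platform, and the function name `pareto_top` is hypothetical.

```python
def pareto_top(timings_ms, share=0.80):
    """Return the longest blocks that together reach `share` of total time."""
    total = sum(timings_ms.values())
    ranked = sorted(timings_ms.items(), key=lambda item: item[1], reverse=True)
    selected = []
    running = 0
    for name, ms in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += ms
    return selected


# Hypothetical per-block timings in milliseconds, for illustration only.
SAMPLE_TIMINGS_MS = {
    "memory_init": 900,
    "video_init": 600,
    "usb_enumeration": 300,
    "storage_spinup": 100,
    "cpu_init": 60,
    "misc": 40,
}

if __name__ == "__main__":
    print(pareto_top(SAMPLE_TIMINGS_MS))
```

With this sample data, three of the six blocks account for 1800 of the 2000 ms, so those three are the place to start.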

First Boot versus Next Boot Concept

In ACPI system state description, the system starts up from G3, G2/S5, or G1/S4, and ends in a G0/S0 working system state. Orthogonal to Global and Sleep states of the ACPI, Intel has defined Fast Boot states that can be thought of as:

B0. First boot, in order to robustly boot on any configuration, firmware dynamically scans all enumerable buses and configures all applicable peripherals required to boot the OS.

B1. Full boot, similar to first boot, whenever a configuration change is noted.

BN. Typical subsequent boot, which reuses data from previous scans to initialize the platform. This results in a sub-two-second user experience.

Figure 11.5 is a diagram of the Fast Boot State.

After a full boot and with no configuration changes, a fast path is taken in the subsequent BIOS boot, resulting in faster boot time. Fast Boot can be implemented without missing any platform features, assuming that the idea is to hand off to the OS loader. This fast path is a “normal” or “typical” boot path.

Figure 11.5: Boot State Diagram

Some environments may not be suitable for such a Fast Boot scenario: IT activity, development activity, or a lab environment may need more flexibility than the fast path provides. For those cases, the atypical full boot path or first boot path continues to be supported. The decision can be made automatically or triggered by the user or another external event.

Boot Mode UEFI Configuration Setting

In general, the B-states can be aligned to the UEFI-defined boot modes listed in Table 11.1.

Table 11.1: UEFI-Defined Boot Modes

Full Boot Configuration Fast Boot
B0 B1 B(n)
BOOT_WITH_DEFAULT_SETTINGS BOOT_WITH_FULL_CONFIGURATION BOOT_WITH_MINIMAL_CONFIGURATION
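The choice between the three B-states can be sketched as a small decision function. This is a simplified model rather than Intel's implementation: the enum reuses the boot mode names from Table 11.1, while the function `select_boot_mode` and its three inputs are hypothetical stand-ins for whatever detection logic a real platform uses.

```python
from enum import Enum


class BootMode(Enum):
    """Boot mode names from Table 11.1, valued by their B-state."""
    BOOT_WITH_DEFAULT_SETTINGS = "B0"
    BOOT_WITH_FULL_CONFIGURATION = "B1"
    BOOT_WITH_MINIMAL_CONFIGURATION = "Bn"


def select_boot_mode(first_boot_done, config_changed, fast_boot_enabled=True):
    """Pick the boot path for this boot from three (assumed) platform inputs."""
    if not first_boot_done:
        # B0: nothing is cached yet, so scan and configure everything.
        return BootMode.BOOT_WITH_DEFAULT_SETTINGS
    if config_changed or not fast_boot_enabled:
        # B1: a change was noted (or fast boot is overridden), so do a full boot.
        return BootMode.BOOT_WITH_FULL_CONFIGURATION
    # Bn: reuse data from the previous scan.
    return BootMode.BOOT_WITH_MINIMAL_CONFIGURATION


if __name__ == "__main__":
    print(select_boot_mode(first_boot_done=True, config_changed=False).name)
```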

The following list shows a high-level breakdown of the two-second budget. It does not assume the scripted boot method, which may be faster in certain scenarios:

1. SEC/PEI phase budget – 500 ms, where:

Memory is configured, initialized, and scrubbed (if needed).

BIOS has been shadowed to memory and is currently executing from memory.

ME-BIOS DID handshake is done and no exception or fall back request was raised.

Long latency hardware (such as HDD, eDP panel) has been powered up.

CPU is completely patch-updated, including the second CPU patch microcode update.

2. DXE phase budget – 500 ms:

Greater than 5 ms each for about 100 existing modules (most are in 1-ms range).

Each DXE module has a time budget of 10 ms, unless explicitly waived otherwise.

A DXE module that issues a proper delay to allow other modules to run will not have the delay time counted against it.

CPU DXE and PCH DXE each have an extended time budget – 100 ms.

ME DXE budget – 10 ms (no MEBx during Bn).

Option ROM – 0 ms, no legacy option ROMs are allowed.

Rapid Storage Technology (RST) ROM – 0 ms, as AHCI mode does not need the RST option ROM.

3. BDS phase budget – 500 ms – only one boot target:

GOP does not display dynamic text – 100 ms.

4. TSL (Transient System Load) – 500 ms:

OS bootloader time is after the BIOS boot ended. However, it does affect the overall end-to-end boot time.
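A budget like this is easy to check mechanically against logged timings. The sketch below is one possible way to do so: the phase budgets and the 10 ms and 100 ms module budgets come from the list above, while the module names (`CpuDxe`, `PchDxe`, and so on), the sample measurements, and both function names are hypothetical.

```python
# Phase budgets from the list above, in milliseconds.
PHASE_BUDGET_MS = {"SEC/PEI": 500, "DXE": 500, "BDS": 500, "TSL": 500}

# Per-module DXE budgets: 10 ms by default, 100 ms for CPU and PCH DXE.
# The module names used as keys are hypothetical.
DEFAULT_DXE_MODULE_BUDGET_MS = 10
EXTENDED_DXE_BUDGET_MS = {"CpuDxe": 100, "PchDxe": 100}


def phases_over_budget(measured_ms):
    """Return {phase: overrun in ms} for each phase that exceeded its budget."""
    return {
        phase: ms - PHASE_BUDGET_MS[phase]
        for phase, ms in measured_ms.items()
        if phase in PHASE_BUDGET_MS and ms > PHASE_BUDGET_MS[phase]
    }


def modules_over_budget(module_ms):
    """Return {module: overrun in ms} for DXE modules that exceeded their budget."""
    over = {}
    for name, ms in module_ms.items():
        budget = EXTENDED_DXE_BUDGET_MS.get(name, DEFAULT_DXE_MODULE_BUDGET_MS)
        if ms > budget:
            over[name] = ms - budget
    return over


if __name__ == "__main__":
    print("total budget:", sum(PHASE_BUDGET_MS.values()), "ms")
    print(phases_over_budget({"SEC/PEI": 450, "DXE": 620, "BDS": 300, "TSL": 500}))
    print(modules_over_budget({"CpuDxe": 80, "PchDxe": 130, "UsbDxe": 25, "RtcDxe": 1}))
```

Time that a module spends in a deliberate delay to let other modules run would need to be subtracted before this check, since the budget does not count it.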

Fallback Mechanisms

There are several events that could cause a full boot to transpire following the next system reset, including hardware, BIOS, or software level changes.

Certain hardware changes that are relatively easily detected by BIOS include CPU, memory, display, input, boot target, and RTC battery errors.

BIOS setting changes that will cause an event may include changes to the boot target and console input/output changes.

Software changes include UEFI updates.

Not every exception event is likely to trigger a full boot on the next boot sequence or involve interrupting the current boot with a system reset. Table 11.2 depicts the different types of exceptions that could occur.

Table 11.2: Types of Exceptions

A Type 1 Exception can be completely handled within an EFI module and the remainder of the BIOS boot flow will continue with the BOOT_WITH_MINIMAL_CONFIGURATION flag. Optionally, the exception can be logged for diagnostic purposes.

A Type 2 Exception occurs if a BIOS module encounters an issue that will prevent the rest of the BIOS from continuing the Fast Boot framework. The remainder of the boot sequence may execute slower than anticipated but no reset is needed. The boot will continue and it is recommended that the event be logged.

A Type 3 Exception requires an interruption of the current boot flow with a system reset. The next boot will be a full boot (without the BOOT_WITH_MINIMAL_CONFIGURATION flag) to handle the exception. It is unlikely, unless the error was generated during memory initialization, that a full MRC training would be required on top of the full boot.

A Type 4 Exception is similar to Type 3, except that after the system reset, the BIOS is in Full Boot mode, and Memory Initialization needs to perform full training.
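The four exception types can be summarized as a lookup from type to response. This is a sketch of the descriptions above, not code from any BIOS: the names `ExceptionType`, `Response`, `RESPONSES`, and `respond` are hypothetical, and treating the highest-numbered exception as the one that governs when several occur is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class ExceptionType(IntEnum):
    TYPE_1 = 1  # handled inside the module
    TYPE_2 = 2  # Fast Boot framework abandoned for this boot, no reset
    TYPE_3 = 3  # reset, then full boot
    TYPE_4 = 4  # reset, then full boot with full memory training


@dataclass(frozen=True)
class Response:
    stay_on_fast_path: bool      # keep BOOT_WITH_MINIMAL_CONFIGURATION for this boot
    reset_now: bool              # interrupt the current boot; next boot is a full boot
    full_memory_training: bool   # memory initialization must fully retrain


RESPONSES = {
    ExceptionType.TYPE_1: Response(True, False, False),
    ExceptionType.TYPE_2: Response(False, False, False),
    ExceptionType.TYPE_3: Response(False, True, False),
    ExceptionType.TYPE_4: Response(False, True, True),
}


def respond(exceptions):
    """Return the response for the most severe exception seen, or None if none."""
    if not exceptions:
        return None
    # Assumption: the highest-numbered type seen during a boot decides the outcome.
    return RESPONSES[max(exceptions)]


if __name__ == "__main__":
    print(respond([ExceptionType.TYPE_1, ExceptionType.TYPE_3]))
```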

Table 11.3 lists a series of exceptions and the probable type casting. Several of these can be different, depending on the policy decisions of the designer.

Table 11.3: Exceptions and Probable Type Casting

Type  Exception          Example
3     BIOS Setup         Fast boot setting override, setup option changes
4     Boot Failure       No successful first/full boot, watch dog timeout
4     Hardware Change    ConOut changed, RTC power loss, CMOS clear, 4-second power button override is pressed
4     Hardware Override  Recovery jumper, MFG jumper, critical CPU error
2     Software Override  UEFI capsule
4     User/Setup         User interrupt, separate from dock
4     Memory Changed     DIMM change, no DIMM ID, CPU changed?

Baseline Assumptions for Enabling Intel Fast Boot

The following assumptions are made in the Fast Boot feature:

1. A stable platform configuration is reachable. Following the initial provisioning boot, small modifications to platforms are allowed, but the majority of systems boot continuously with the same configuration, from the same drive for the life of the system to the same operating system for the life of the system, in a very limited number of environments (example: at home, at work, or in a single industrial application).

There are no reconfigurations allowed after the first boot. This means that after the platform is provisioned out of the box, the configuration is not changing.

Boot device list does not change, the BIOS setup menu does not change, and the non-PCI devices on external buses/ports do not require BIOS discovery and initialization. Finally, device initialization needs do not change.

Minimum configuration boot when the boot target is a nonstandard or user-defined OS. Provide only a static splash screen display, as opposed to dynamic video prior to the OS.

2. No UEFI shell or serial redirection debug console, as any user interaction or configuration change will negate Fast Boot times.

3. UEFI-only boot. A legacy boot may be necessary for older OS loaders and operating systems, but this will inevitably delay the boot times as well as open the system to potential rootkit attacks.

4. Setup Menu or other user entry (hot keys) during boot is not required. If it is entered, boot speed is no longer a key requirement.

5. When referring to sub-two-second timing, the start and finish lines are described in the following manner:

The Start Line is when the processor comes out of reset and starts to fetch code from the SPI. While the starting line can be from the power button, depending on the scope of the parameters, the BIOS is not responsible for the approximately 300 ms of power sequencing time between the button press and the CPU coming out of reset.

The Finish Line is when the UEFI BIOS calls LoadImage() for the OS loader image. On loading of the image, the system enters a handoff phase where the OS loader is responsible much more so than the BIOS.

Intel Fast Boot Timing Results

How effective is it? Based on experiments with a variety of systems from 2010 through 2012, system boot times were decreased from:

seconds (full boot)

As low as 1 second (Fast Boot) in some cases

Typically, a 2-second Fast Boot is achievable for PCs. Imagine what a true embedded system can do.

Summary

This chapter introduced a very powerful Fast Boot mechanism that can be applied to UEFI solutions. While this mechanism provides dynamic and reliable time savings over a monotonic long, slow boot, other optimizations must still be performed for optimum results. As we have seen in other chapters, hardware selection, OS loader optimizations, and OS interface come into play. Due diligence of the development and design teams is vital to a successful Fast Boot user experience.

Developers should read Chapter 12, and then reread Chapters 10, 11, and 12. Then ask yourself and others the right questions, including: “What will it take to do this?”
