© The Author(s), under exclusive license to APress Media, LLC, part of Springer Nature 2022
S. Banik, V. Zimmer, Firmware Development, https://doi.org/10.1007/978-1-4842-7974-8_6

6. Looking at the Future of System Firmware

Subrata Banik (1) and Vincent Zimmer (2)
(1) Bangalore, Karnataka, India
(2) Issaquah, WA, USA

“If everything you try works, you aren’t trying hard enough.”

—Gordon Moore

The evolution of system firmware over the last three decades has involved inheriting a great deal of complexity to support the underlying hardware and to compensate for the limitations of legacy operating systems in a way that is flexible for device manufacturers. This has resulted in unnecessary complexity across the entire system firmware boot process, and restructuring the firmware design now requires significant development cost and time. The trend of designing complex system firmware has continued without taking into account current end-user demands, the offerings of modern hardware, and more capable operating systems that have overcome such legacy limitations. Boot firmware does not necessarily need to do as much work as it did in the past. Rather, future system firmware should have boundaries defined by end-user and industry needs.

System firmware goals will evolve in the future, and if we could foresee those goals and align the designs of all available boot firmware (BIOS), then this would improve the ecosystem of future firmware. There are several key areas inside the system firmware premises where the industry is looking to improve firmware design. The fundamental principles that future system firmware will focus on are as follows:

Performance: In the modern computing era, billions of devices are connected to the Internet and doing trillions of data transfers per second. Turning a device off to perform a system update is too much to ask for. For example, if a Facebook or Google server has a scheduled reboot and the system restoration time is much higher than expected, it would significantly impact users and business. Hence, the minimum ask from future firmware would be an instant boot.

Simplicity: For any solution to get accepted by the wider community, it has to be simple enough that it doesn’t expect too much from its users. Since the origin of the boot firmware, it has been maintained by a closed set of communities. Hence, it’s easy to implement system firmware with a high-level programming language and build a specification around it. But this results in resistance in the open community to accept such complex solutions and adapt to them. Also, the community wants to explore the best methodology to get rid of such complex solutions; hence, the idea is to use a basic programming language and software engineering methods, rather than being attached to something that needs specific skill sets.

Security: Over the years system firmware has been the method to provide access from the operating system (OS) to the hardware layer due to its operational privilege level. ODM/OEMs are using legacy techniques like System Management Mode (SMM) and Option ROM (OpRom) to perform platform device initialization, which remains unnoticed by the high-level monitoring layer like OS-driven security policies. This might expose security risks. The expectation from future hardware and SoC would be to define security into its core so that firmware can avoid having such legacy implementations, which not only increases the firmware footprint but also might increase the security risk.

Open source: Any future firmware development and maintenance is expected to be in an open and inclusive environment, rather than limited to a few companies to make the decision on behalf of everyone. Having visibility into open source is kind of a blessing in many ways. It helps to resolve the trust issue that is normally there with any proprietary system firmware. Open source generally helps users adapt to a specification and provides a continuous feedback path for improvements. Reducing development cost is also another key benefit of open source for device manufacturers.

Exploring hardware: In the past, firmware has been designed to ease the communication with the underlying hardware. Hence, when it comes to designing an efficient platform with reduced firmware space and an instant boot experience, device manufacturers usually pick the easiest solution by introducing a pre-initialized hardware controller. This results in a higher bill of material (BoM) cost and puts an extra burden on end users. But the real exploration would be on utilizing the existing CPU capabilities and offerings in hardware or refactoring the hardware capabilities to design a better platform with combined hardware-firmware innovations, without increasing the platform BoM cost.

This chapter focuses on such key forward-looking firmware initiatives that have been built around the principles discussed earlier.
  • Designing LITE firmware: The real need of firmware is to perform essential hardware initialization to boot the platform to the operating system. But the firmware boundary has grown so much in the last decade that sometimes it’s referred to as “beyond BIOS.” This chapter will cover how to design a LITE boot firmware to shrink the firmware boundary.

  • Designing a feature kernel: The book System Firmware: An Essential Guide to Open Source and Embedded Solutions provides details about the payload and its usage model. The only purpose that a typical payload serves is to pick the correct boot services like console input, console output, and block device or network device to boot to the OS (and additionally perform some crypto-related operations to verify the kernel partitions prior to loading). In addition, there could be some ODM/OEM-specific customization to allow users to configure the BIOS. The feature kernel concept is to utilize the kernel as part of the payload to reduce the firmware boundary and use early OS-like environments to further boot to the OS.

  • Designing multithreaded boot firmware: In the past, boot firmware has been designed to work in a single-threaded environment. Modern and future processor designs support an increasing number of logical processors. But because of its legacy design, system firmware never runs in a multithreaded environment, which misses an opportunity for future boot firmware to reduce platform boot time.

  • Innovative hardware design: Today system firmware looks more complicated because of the underlying SoC and/or hardware design. Ideally, the system firmware should just be responsible for performing basic CPU and chipset initialization and then hand over control to the operating system for the rest. But several factors, such as not having enough memory to access hardware resources early in the boot process and the need to set up temporary memory to continue the hardware access, create cyclic dependencies in hardware design that limit innovation in firmware and push system firmware to act as legacy firmware. This chapter will discuss possible innovations in hardware or SoC architecture that would let future firmware get rid of this legacy.

Designing LITE Firmware

Basic Input/Output System (BIOS) was originally meant to perform basic hardware (CPU and chipset) initialization during the boot process and when booting to the operating system. Over time, to support complex SoC and platform designs, the BIOS has also become reasonably complex and mammoth in nature. Today, the de facto successor to BIOS is Unified Extensible Firmware Interface (UEFI), which is also known as “beyond BIOS.” Having said that, there are some ecosystem concerns due to its closed source nature.

Because of the increasing concerns of security, complexity, and closed firmware, the industry is heading toward platform development under the open source umbrella, by an initiative driven by the Open Compute Project (OCP, https://www.opencompute.org/). This effort has led market leaders to also adopt open source firmware development approaches.

This provides an opportunity for the OEM/ODM to pick a suitable BIOS for their platform from a wide range. As discussed earlier in this book, there are currently three main successors to the BIOS: coreboot, Slimboot, and UEFI.

Typically closed source BIOSs for client and server platforms have the following shortcomings:
  • The firmware has become an operating system.

  • The system firmware is archaic, complex, and often quite buggy.

  • Closed source firmware is hard to maintain and can’t forward/backport features and fixes.

  • Vendor-specific tools are used in the case of closed source firmware.

  • Closed source firmware has a large number of features and complexity required to support shrinkwrap operating systems and the vagaries of ‘compatibility’ therein.

  • Closed source firmware has more challenges in robustness, ability to debug, and flexibility in both build and deployment.

OEM/ODMs are looking forward to overcoming these barriers by adopting open source firmware approaches. Open source firmware provides the opportunity to achieve feature parity, support many generations of equipment, and curate both unified and adaptable toolkits.

coreboot is the most popular of these. It is an extended firmware platform built on the principles of open source software, and it provides key advantages, such as support for various CPU architectures that is available by default to all OxMs.

As a firmware developer, what matters most is how to initialize various hardware intellectual property (IP) blocks in order to boot a SoC and hand it over to the operating system (OS). This process involves writing initialization code for various components and IP blocks on different SoCs. But today any system firmware, be it open source or closed source, has unnecessary, redundant, and complex blocks to perform hardware initialization. The majority of those complex and redundant hardware initialization blocks were introduced when operating systems were not as advanced as they are today and had little hardware knowledge with which to perform platform initialization. Here are some examples:
  • PCI enumeration and resource allocation: All boot firmware does PCI enumeration and resource allocation before booting to the operating system. In reality, PCI enumeration was required in the BIOS space only when the operating system was not capable of doing it. In late 1999, Linux still had limitations in performing PCI device initialization, and the BIOS was responsible for the PCI configuration.

  • Multiprocessor initialization: Boot firmware does CPU core initialization, brings APs out of reset, and runs some basic tasks such as range register programming with DRAM-based resources. But because of the BIOS topology of running in a single-threaded environment, it never uses those APs to run tasks in parallel. In the past, operating systems were unable to perform SMP initialization. Hence, it was kind of a requirement to perform those initializations in the boot firmware space. But today Linux-like kernels are capable of doing SMP setup at an early stage of OS booting.

  • Provide runtime services: System firmware has provided runtime hooks to operating systems using SMI handlers. Recently, many security researchers have strongly discouraged this practice of providing runtime services via SMI.

  • Storage block initialization: At the end of boot firmware initialization, the firmware is expected to initialize block devices such as UFS/eMMC, NVMe/SATA, or USB to boot to the kernel. This means that system firmware must contain the storage drivers required to perform those initializations. Having those advanced drivers inside the firmware space makes it more complex and increases the maintenance costs as well. For example, to boot from NVMe, one has to add NVMe driver support in the firmware space. But in general that support has already been added into Linux-like kernels by default.

With the previous examples, it’s clear that there are many redundant blocks that exist inside system firmware today. Different individuals writing the same initialization code for different firmware blocks will increase the enabling time, as well as the validation time and the review time. All SoC vendors have to support all possible BIOS solutions; hence, we’re not gaining anything by doing repeated work in system firmware. Rather, it is increasing the liability of validating it across platforms.

Design Principle

Let’s focus on implementing solutions with respect to the open source firmware space, i.e., coreboot. (Similar designs can be made for UEFI and Slimboot as well in the future.) coreboot is an extended firmware platform that delivers a lightning-fast and secure boot experience on modern computers and embedded systems. Figure 6-1 provides a simplistic view of the coreboot flow, alongside other popular boot firmware, so you can understand the size impact of each BIOS phase and where this philosophy of LITE firmware can fit in.

Figures 6-1 and 6-2 provide the typical boot flow of the boot firmware for any SoC platform and the estimated size for each stage. Without a detailed understanding of the boot flow and of where the majority of the boot time and boot firmware footprint lies, it would be difficult to design LITE firmware solutions.

Note

The boot flows of coreboot and the Slim bootloader are similar: the bootblock can be referred to as Stage 1A, romstage as Stage 1B, and ramstage as Stage 2. Hence, the estimated sizes of those stages are almost identical.

A flow chart has three parts, Coreboot, payload, and operating system. 1, coreboot starts from boot block, romstage, postcar, ramstage. 2, Payload continues with payload. 3, the operating system ends with a boot to OS.

Figure 6-1

Typical coreboot flow with size of each boot stage

A flow chart has two panels, UEFI and the operating system. UEFI starts with SEC, PEI, DXE, and BDS, and the flow ends in the operating system panel with a boot to OS.

Figure 6-2

Typical UEFI flow with size of each boot stage

The takeaway from these figures is that ramstage as an individual stage is consuming around 80 percent of the total coreboot volume, and DXE is similar in the case of UEFI.

Today both firmware and the OS have an equal share of complexity in their domains. As we are interested in open source firmware development, let’s discuss coreboot and the possibilities for designing LITE firmware.
  • Ramstage has grown over time from a simple PCI configurator to a complex firmware programming block that does something beyond its basic needs and thus creates redundancy.

  • Operating systems have also grown in capacity over the years. Hence, the things that were not possible decades ago in the OS and relied on the system firmware are now very much possible to perform using the kernel layer itself.

  • The OS for sure can handle more than what ramstage does.

  • Hence, we are now at an “intersection point” of the ramstage and operating systems.

  • So, the real question is, do we really need a ramstage-like programming block?

The answer is simple: no, we can adopt a “LITE” model.

The system firmware needs to explore the possibility of being LITE, where the proposal is to have a minimum functional firmware block that follows the original boot firmware methodology to boot to the OS. The basic agenda of a LITE system firmware would be removing redundant firmware initialization and making as much use of the operating system as possible.

Over the years ramstage in coreboot and DXE in UEFI have grown beyond their boundaries, and system firmware space has been a bit of a dumping ground to put more OS-like services and applications.

Many features provided by ramstage are not required to be explicitly added into ramstage and don’t have a real product need. For example:
  • SMM: The limited usage model should depend on CPU vendors and product design requirements.

  • Support S3/Sleep: Modern computing systems have support for connected standby, lucid sleep, or runtime suspend where such underlying legacy Sleep/S3 support can be moved from being mandatory to optional based on the platform design.

  • Runtime services: This depends on the targeted operating system; hence, there is no point in publishing more runtime services than the OS is actually able to consume.

Using the LITE model, the system firmware can limit itself to initializing the chipset components that are used “only” in the firmware space, which reduces the firmware boundary.

Mandatory PCI device enumeration and resource allocation: Figure 6-3 illustrates the current PCI device enumeration and resource allocation flow. In coreboot, for example, ramstage is responsible for picking all PCI devices from the available PCI tree and performing the pre-device initialization and early chipset initialization, specifically to compute and assign the bus resources, enable devices on the bus, and finally initialize the devices on the bus. This iterative process of parsing the PCI tree, assigning resources, and finally initializing devices can take a significant amount of time in the boot process. The scenario is the same with UEFI and Slimboot, which enumerate the entire PCI tree irrespective of whether a device is used in the firmware space.

A flow diagram depicts the reflection process of ram stage PCI tree, ramstage, execute boot state machine, pre device initialization, device enumeration, assign device bus resource, finalize devices initialization, initialize devices enable devices, write tables, load payload, boot payload, PIRQ table, MP table, ACPI table.

Figure 6-3

Typical coreboot PCI enumeration process in ramstage

In the LITE firmware design principle, the system firmware only needs to initialize and enable PCI devices that are getting used in the firmware space and perform the minimum operations prior to transferring the control to the payload or OS.

To adhere to the LITE firmware development strategy on coreboot, a new tag of “mandatory” was added into the PCI tree generation process to skip all the unnecessary device initialization in the firmware space and save a significant amount of system firmware boot time.

During static parsing of the PCI tree structure in coreboot in the LITE firmware model, all PCI device initialization will be skipped unless the “mandatory” keyword is tagged with that PCI device.

The following is the pseudocode from a coreboot reference where a minimum PCI enumeration can be achieved by adding the additional “mandatory” keyword checks to save on boot time. CONFIG_LITE_FIRMWARE is the token being used in the coreboot open source firmware to identify the platform using the LITE system firmware development model.
/*
 * Probe all devices/functions on this bus with some optimization for
 * non-existence and single function devices.
 */
for (devfn = min_devfn; devfn <= max_devfn; devfn++) {
        if (CONFIG(LITE_FIRMWARE)) {
                /* Skip any device not tagged "mandatory" in the static tree. */
                dev = pcidev_path_behind(bus, devfn);
                if (!dev || !dev->mandatory)
                        continue;
        }
        /* First thing, set up the device structure. */
        dev = pci_scan_get_dev(bus, devfn);
        ....
        ....
}

The following example from x86-based QEMU emulation shows a significant reduction in PCI enumeration effort in coreboot when only the mandatory PCI devices are initialized (the host bridge and LPC in this example), which is the minimum system firmware requirement to boot to the payload and further to the OS. The mandatory PCI device list might differ between platform designs, so platform owners have the flexibility to define the minimum device initialization list as required.

A set of two divisions explains the PCI tree view in the existing model and the PCI tree view in the lite firmware development model.

Figure 6-4 explains the proposed LITE firmware-based PCI enumeration to avoid complexity and further reduce the firmware footprint and improve system boot time.

In the following example, the system firmware will perform the initialization and resource allocation for two devices alone (the devices are tagged with the “mandatory” keyword), compared to all possible devices in the existing model.

A flow diagram depicts the flow starting from the PCI tree with mandatory devices, through ramstage, execute boot state machine, boot state device enumeration, write tables, load payload, and boot payload, with the ACPI table as output.

Figure 6-4

Adopted LITE firmware model in coreboot for PCI enumeration
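The effect of the “mandatory” tag can be sketched outside of coreboot with a small stand-alone model. This is only an illustration under stated assumptions: struct lite_dev, lite_enumerate(), and the example tree are hypothetical names invented here and are not coreboot APIs, and the device numbers other than the host bridge (00.0) and LPC (1f.0) are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a static device tree entry. */
struct lite_dev {
	const char *name;   /* human-readable label, used for logging only */
	unsigned int devfn; /* encoded as (device << 3) | function */
	bool mandatory;     /* true when the entry carries the "mandatory" tag */
	bool initialized;   /* set once the firmware has initialized the device */
};

/*
 * Walk the static tree and initialize a device only if the LITE model is
 * disabled or the device is tagged mandatory. Returns the number of devices
 * that were initialized by the firmware.
 */
static size_t lite_enumerate(struct lite_dev *devs, size_t count,
			     bool lite_firmware)
{
	size_t done = 0;

	assert(devs != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (lite_firmware && !devs[i].mandatory)
			continue; /* left for the OS kernel to enumerate */

		devs[i].initialized = true;
		done++;
		printf("init %02x.%x %s\n", devs[i].devfn >> 3,
		       devs[i].devfn & 7, devs[i].name);
	}
	return done;
}

/* Illustrative tree: only the host bridge and LPC are tagged mandatory. */
static struct lite_dev lite_example_tree[] = {
	{ "host bridge", (0x00 << 3) | 0, true,  false },
	{ "graphics",    (0x02 << 3) | 0, false, false },
	{ "xhci",        (0x14 << 3) | 0, false, false },
	{ "sata",        (0x17 << 3) | 0, false, false },
	{ "lpc",         (0x1f << 3) | 0, true,  false },
};
```

Calling lite_enumerate() on this tree with lite_firmware set to true initializes two of the five entries, whereas passing false initializes all five, which mirrors the difference between the existing model and the LITE model.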

Minimum CPU initialization: In a multiprocessor environment, the system firmware brings up only a single bootstrap processor after the power-on reset. Later in the boot process, the system firmware also needs to bring up all the applicable logical processors so that operations can run in parallel. Because previous-generation operating systems had limited knowledge of how to bring up all of the logical processors, they needed to rely on the system firmware. Typically, an advanced stage such as ramstage (in coreboot) is responsible for performing this multiprocessor (MP) initialization. Note also that each operation that brings up another logical processor has its own latency, as per the CPU vendor’s design guide. Hence, the possible solutions to ensure minimum CPU initialization are as follows:
  • Based on the real needs, design the system firmware to perform operations in a multithreaded environment.

  • Early initialization of the processors during boot.

  • Deferring initialization of all of the processors in the kernel.

Reduced ACPI table creation: ACPI stands for Advanced Configuration and Power Interface. The purpose of ACPI is to describe the underlying hardware and its interface to the operating system so that the OS knows which hardware is present and how to configure it. It also controls hardware actions such as the power button behavior, system sleep states, etc.

In the existing model, the creation of ACPI tables has been tied to PCI enumeration through the dynamic ASL generation method; hence, the BIOS may take either of the following two options:

Option 1: Attach the required ACPI dynamic generation process to the PCI tree device marked as mandatory as follows, where ChromeOS needs an ACPI device for the embedded controller and hence attaches the device to the LPC interface:
device domain 0 mandatory
     device pci 1f.0 mandatory
          chip ec/google/chromeec
               device pnp 0c09.0 mandatory  end
          end
     end # LPC Interface
end

Option 2: Let the system firmware completely get rid of the ACPI creation process and try to utilize the kernel driver rather than relying on the underlying runtime firmware services. There is a kernel command-line parameter named acpi.

acpi: Many hardware platforms ship with buggy or out-of-specification ACPI firmware, which may cause unspecified problems. If the platform is randomly powering off or failing to boot due to potential ACPI-related issues, disabling ACPI is recommended. To avoid the additional complexity of pulling the required ACPI infrastructure into the earlier boot stage, one could also explore the acpi=off kernel command-line parameter and skip ACPI creation in the system firmware. The downside of this approach is that the system loses its ability to communicate with the system firmware, and user space applications or drivers need direct access to the underlying hardware to retrieve key information such as battery status or to perform actions such as power-off and shutdown.
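As a minimal sketch of Option 2, a bootloader that did not publish ACPI tables could append acpi=off to the kernel command line before handing off. The helper names cmdline_append() and cmdline_finalize() and the fixed-size buffer are assumptions made for this illustration; only the acpi=off parameter itself comes from the Linux kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Append one parameter to a NUL-terminated kernel command line held in a
 * fixed-size buffer. Returns false, leaving the buffer untouched, if the
 * result would not fit.
 */
static bool cmdline_append(char *cmdline, size_t size, const char *param)
{
	assert(cmdline != NULL && param != NULL);

	size_t used = strlen(cmdline);
	size_t extra = strlen(param) + (used > 0 ? 1 : 0); /* +1 for the space */

	if (used + extra + 1 > size) /* +1 for the terminating NUL */
		return false;

	if (used > 0)
		cmdline[used++] = ' ';
	memcpy(cmdline + used, param, strlen(param) + 1);
	return true;
}

/*
 * Hypothetical policy hook: when the firmware did not publish ACPI tables,
 * ask the kernel not to use ACPI at all.
 */
static bool cmdline_finalize(char *cmdline, size_t size,
			     bool acpi_tables_published)
{
	if (acpi_tables_published)
		return true;
	return cmdline_append(cmdline, size, "acpi=off");
}
```

Starting from a command line such as "console=ttyS0,115200" and calling cmdline_finalize() with acpi_tables_published set to false yields "console=ttyS0,115200 acpi=off". Keeping the decision in one place makes it easy for a platform to switch between Option 1 and Option 2.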

Figure 6-5 illustrates the modified coreboot boot flow using the LITE firmware development model.

A flow chart has three parts, Coreboot, payload, and operating system. 1, coreboot starts from boot block, romstage, and postcar. 2, payload continues with payload. 3, the operating system ends with a boot to OS.

Figure 6-5

coreboot boot flow with adapted LITE firmware model

Figure 6-5 shows how this LITE model benefits the open source system firmware development approach. This approach to LITE system firmware development on coreboot could be applied to UEFI firmware development as well. In the case of UEFI, it would reduce the DXE stage by keeping only the required DXE modules to boot to the BDS stage.

Conclusion

  • Reducing ramstage (in coreboot) eventually reduces the code size by 50 percent.

  • Boot performance improves, with the boot time reduced by an additional ~500ms.

  • This effort might help the OEM/ODM to reduce the SPI Flash size and eventually optimize the platform bill of material cost.

  • Given that operating systems are more sophisticated and feature-rich, this approach of moving more of the traditional firmware flows into the OS kernel will provide a more balanced approach long term.

  • From a platform engineering team perspective, there will be very minimal firmware support required if this proposal is implemented successfully. There is a good amount of resource savings.

  • People do not have to learn a specific boot state machine/PCI enumeration process/payload or complex protocol/services-oriented firmware development tricks. Instead, they can focus on the early kernel boot process.

Designing a Feature Kernel

The payload is an additional firmware entity used in system firmware to boot to the operating system. Boot firmware can be used with various payloads to provide a complete system firmware solution where ideally a payload’s job is to find the required boot services like console input, console output, and block device or network device to boot to the OS. In addition, there could be some ODM/OEM-specific customization like a pre-boot environment to launch an application to certify the bootloader or diagnose the underlying hardware.

But the problem is that all payloads are different in nature from each other, and they have their own expectations from the boot firmware. Hence, there is no unification possible to boot to the operating system, although the targeted OS might be the same in all these cases.

Having a different payload for the same boot firmware creates various problems while developing system firmware:
  • The underlying boot firmware needs to provide various interfaces as expected by different payloads. Some of these interfaces end up unused, with no consumer on the payload side, which results in higher development and validation time.

  • A storage device is required to boot the OS from firmware; therefore, the payload is likely to need such hardware support. For example, support for next-generation block devices like UFS and NVMe has been added recently. Often this support is backported from the equivalent upstream kernel driver into the more limited firmware environment.

  • There is a need for custom hardware initialization flows in the payload prior to the OS due to the lack of full OS system services and features. For example, the payload requires a boot beep for error reporting in case of faulty hardware. To implement this requirement, either a dedicated hardware circuit or an audio driver in the payload is required to generate an audio tone.

  • The maintenance of the payload infrastructure is also difficult due to limited open source community support.

  • The typical payload size is from 1.5MB (compressed) to about 6MB, which is eventually sitting in SPINOR and will result in additional BoM cost.

Figure 6-6 provides the overview of modern system firmware with different payloads, booting to the OS.

A flow diagram has two panels. It starts from the first panel, titled SPI mapped layout system firmware, with boot block, romstage, postcar, ramstage, and payload. Then the second panel, titled block device mapper layout OS, ends the flow with OS boot partitions.

Figure 6-6

coreboot boot flow in a current scenario

Look at the “Boot Partitions” block 1, highlighted in Figure 6-6: it is the initial kernel block, typically about 4MB in size. During the platform boot, the payload tries to locate the initial kernel block on the boot media (NVMe, eMMC, SATA, UFS, etc.) and then verifies it prior to loading it into main memory. The initial kernel block then runs from memory to mount the root file system, after which the boot kernel picks up the runtime kernel block to complete the boot sequence.

Design Principle

The idea of a “feature kernel” is to avoid having a dedicated payload attached to the BIOS to boot to the OS and possibly simplify the system firmware boot flow by using the LITE firmware model plus the feature kernel to further reduce the firmware boundary and improve the system booting time.

Figure 6-7 illustrates the proposed “feature kernel” boot flow with an open source firmware model, where the initial kernel block (about 4MB) would be part of SPI Flash. The bootloader would load the boot kernel from SPINOR and run it from memory to mount the root filesystem, after which the boot kernel would pick up the runtime kernel from the bootable media. In that process, the payload dependencies can be removed as well.

A flow diagram has two panels. It starts from the first panel, titled SPI mapped layout system firmware, with boot block, romstage, postcar, and boot kernel or initial kernel block. Then the second panel, titled block device mapper layout OS, ends the flow with OS boot partitions.

Figure 6-7

coreboot boot flow with adapted LITE firmware and feature kernel
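Moving the initial kernel block into SPI Flash changes the flash budget, which can be checked with simple arithmetic. The following sketch assumes a hypothetical region table: the bootblock, romstage, and postcar sizes are placeholders rather than measured values, and only the “about 4MB” size of the initial kernel block comes from the text. The names flash_region, layout_total(), and layout_fits() are invented for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KIB 1024u
#define MIB (1024u * KIB)

/* Hypothetical flash region descriptor; names and sizes are illustrative. */
struct flash_region {
	const char *name;
	uint32_t size; /* bytes */
};

/* Placeholder layout for the feature kernel model (not measured values). */
static const struct flash_region feature_kernel_layout[] = {
	{ "bootblock",             64 * KIB },
	{ "romstage",             256 * KIB },
	{ "postcar",               64 * KIB },
	{ "initial kernel block",   4 * MIB }, /* "about 4MB" per the text */
};

/* Sum the sizes of all regions in a layout. */
static uint64_t layout_total(const struct flash_region *regions, size_t count)
{
	uint64_t total = 0;

	assert(regions != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		total += regions[i].size;
	return total;
}

/* True if the layout fits in the flash space reserved for system firmware. */
static bool layout_fits(const struct flash_region *regions, size_t count,
			uint64_t budget)
{
	return layout_total(regions, count) <= budget;
}
```

With these placeholder sizes the layout totals 4MB plus 384KB, so it would fit an 8MB firmware budget but not a 4MB one. A real platform would substitute its own stage sizes and account for any other regions that share the SPI Flash.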

Conclusion

The key benefits of this idea are as follows:
  • In the past, the system firmware booting from SPINOR was considered the Trusted Computing Boundary (TCB), and the OS partitions stored on block devices were always outside of this computing boundary. But this approach would bring the kernel within the TCB, which means:
    • The firmware update implementation using the power of the boot kernel could be more efficient than ever in this model.

    • It is more scalable for future usages (i.e., support for advanced boot devices/specs is native).

    • There is no need to create a special hardware interface or driver support in the payload/bootloader to bypass the audio codec to implement special solutions like boot beep, etc.

    • There is a reduction in system firmware complexity to create dedicated interfaces for various payloads.

    • We can avoid the additional porting effort of any new controller and interface in the payload from the kernel as and when required.

    • There is a possible reduction of SPINOR size and dedicated development effort to create support for a newer SoC in the payload.

Design Multithreaded System Firmware

In the modern-day world, where usage of IoT devices (like automotive and navigation devices and home appliance devices) and client devices is rapidly increasing, performance is key. Users expect the device to be operational as soon as they press the power button or pick up the device.

The increase in the complexity of compute, software updates, and the I/O subsystem has created new challenges to meet customer expectations, such as a better user experience with a faster boot to the OS, providing an instant-on experience.

As part of the enhanced user experience (UX), many applications using advanced computer systems now demand an instant system bootup time. A faster system response time is a key performance indicator (KPI) used by OEMs/ODMs in their product requirements for almost all computing sectors today, such as personal devices like modern smartphones, tablets, and laptops; healthcare equipment (ultrasound, defibrillators, and patient monitor devices); industrial devices (robots change arms); MAG systems (firing a missile, fail-safe redundancy on airplanes, or similar single function devices); and office/home automation devices.

Figure 6-8 shows the typical client platform (x86 architecture based) boot path, where the entire boot process is in sequential order. The average system boot time is expected to be less than 500ms from the G3 system state (no power applied) until the operating system (OS) hand-off, which includes the pre-power (all rails and clock stabilization), pre-reset (power sequencing), and post-CPU-reset (boot firmware and payload) boot path components. But in reality, the system boot time today is well beyond 500ms (average ~2sec).

The flow diagram represents the client platform reset flow: pre-reset phase, pre-memory phase, memory initialization phase, post-memory phase, payload, and operating system. Below that, coreboot, UEFI, and Slim Bootloader have their respective blocks.

Figure 6-8

Typical x86 based client platform reset flow with all BIOS

It is important to note that the most time-consuming phase of the total boot path is the execution of the system firmware as mentioned in Figure 6-8, hence making it a critical phase to optimize to provide a fast boot experience.

Another point to consider is that an increased number of I/O subsystems attached to the motherboard, and every subsystem having its own device firmware, poses increasing challenges for product manufacturers to ensure periodic firmware updates with an instant system power-on experience.

Figure 6-9 shows a typical OEM platform design in which more than 15 independent IP/device FW updates take place when the OS initiates the FW update. Because the FW update takes place during the boot path, where the entire boot process is in sequential order, it’s impossible for the system firmware to complete all device (SoC and platform) firmware updates (measuring FW components, verifying FW components, loading FW into the device, and reading the firmware version back to ensure a successful FW update) within the regular time window, which is expected to be less than 500ms to 1sec from the G3 system state (no power applied) to the operating system (OS) hand-off.

An illustration depicts the design of the OEM client platform. It shows more than 15 independent IP/device FW updates that take place when the OS initiates the FW update.

Figure 6-9

Typical OEM client platform design with possible firmware update requirement

As the entire BIOS boot takes place in a single-threaded environment, running only on the Boot Strap Processor (BSP) (even after multiple cores become available, typically at ~650ms after CPU reset), it results in idle wait time on the BIOS side. Eventually this results in discrete and serial platform initialization, where each independent IP/device initialization/update waits for its turn to execute. This entire process makes the platform slower while performing firmware updates and creates a bad user experience. Subsequently, this makes users afraid of accepting firmware updates, meaning that users will postpone firmware updates without knowing the criticality of an update that might fix platform security issues.

Another use case is Windows 10X where the OS is moving to an AB servicing model and the operating system will update itself while running (i.e., while running OS version A, it will provision a new version B), versus today’s scenario with the blue screen and percent countdown. Microsoft is now constraining the preboot time for doing BIOS/capsule updates, so having a speedier reboot is necessary to meet these emergent UX requirements.

All device manufacturers are looking toward an instant platform boot, regardless of whether the platform has mundane or bulky devices attached to it or whether the boot process is going through a firmware update initiated by the OS. But the legacy boot process always runs on a single core, even though modern CPUs are multicore processors in which two or more cores can execute tasks in parallel. This forces the firmware code to run sequentially, which makes ineffective use of processor power and system resources when initializing or updating SoC components and platform devices, and thereby results in a higher platform boot time and ultimately a bad user experience.

Design Principles

The proposal is to enhance the boot process by adding concurrency to it by isolating boot functions and platform configurations, which will be executed with the boot firmware as the context master. Additionally, the method proposes configuring the platform components with additional cores for running concurrent processes. Finally, this method will also be applied in the process of firmware updates during the boot phase.

This section will explain the necessary changes in system firmware flow to create a multithreaded system firmware boot solution that overcomes the limitations mentioned in the previous sections.

Multithreading is the ability of a CPU to execute multiple processes concurrently. In the multicore platform, it is the responsibility of the BSP to perform multiprocessor initialization to bring application processors (APs) out from reset. In a multiprocessor (MP) environment, each processor can perform independent execution of assigned tasks.

The design principle is to provide an option to ensure a multithreaded environment where the BIOS can perform its tasks concurrently. Designing a multicore environment might also require some hardware or CPU/SoC architecture changes; the details can be found in the section “Innovation in Hardware Design.”

The assumption is that the platform has made some hardware changes to support this system firmware design change proposal.
  • This method decouples unidirectional communication flow in the boot firmware to allow independent BIOS tasks to perform over parallel threads.

  • This method provides options for boot firmware to execute its tasks in a parallel thread-safe mechanism (without worrying about core synchronization between multiple firmware back-and-forth calls).

  • This method provides flexibility to perform multiprocessor initialization early in the boot flow to maximize CPU resource utilization by the BIOS.

  • The hardware design change proposal provides significantly larger temporary memory at the prememory phase, in terms of SRAM or LLC cache, to execute independent tasks on dedicated cores in parallel in the absence of physical memory or prior to DRAM initialization.

  • This method implements a high-level synchronization construct of the “monitor” type inside the system firmware to ensure that tasks performed on multiple cores remain in sync and avoid any duplicate access. The “monitor” construct ensures that only one processor at a time can be given access to a task.

  • Using the MONITOR/MWAIT instructions inside the system firmware reduces latency between core operations and the wake time from idle.

  • Use a semaphore to access potential shared resources inside the bootloader in normal mode.
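The semaphore mentioned in the last bullet can be illustrated with a minimal sketch. It assumes a C11 toolchain with `<stdatomic.h>`; the names `boot_sem_t`, `boot_sem_try_acquire`, `boot_sem_acquire`, and `boot_sem_release` are hypothetical and are not taken from coreboot, EDK II, or Slim Bootloader.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical binary semaphore guarding one shared boot resource
 * (for example, a console UART shared by several cores). */
typedef struct {
    atomic_flag taken;
} boot_sem_t;

#define BOOT_SEM_INIT { ATOMIC_FLAG_INIT }

/* Try to take the semaphore once; returns true on success. */
static bool boot_sem_try_acquire(boot_sem_t *sem)
{
    assert(sem != NULL);
    /* test_and_set returns the previous value: false means it was free. */
    return !atomic_flag_test_and_set_explicit(&sem->taken,
                                              memory_order_acquire);
}

/* Spin until the semaphore is free. Real firmware would likely pause
 * or use MONITOR/MWAIT here instead of busy-spinning. */
static void boot_sem_acquire(boot_sem_t *sem)
{
    while (!boot_sem_try_acquire(sem)) {
        /* busy-wait */
    }
}

/* Give the resource back so that another core can take it. */
static void boot_sem_release(boot_sem_t *sem)
{
    assert(sem != NULL);
    atomic_flag_clear_explicit(&sem->taken, memory_order_release);
}
```

A core would call `boot_sem_acquire()` before touching the shared resource and `boot_sem_release()` afterward; the acquire/release memory ordering keeps writes made while holding the semaphore visible to the next holder.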

Figure 6-10 illustrates the modified firmware boot flow of a system to leverage the new design proposal.

In the existing design, no tasks are executed on cores other than the BSP, although APs are available and active later in the boot flow. In this proposed model, the BIOS is designed to work in a multithreaded environment, where all possible cores are available and active right at the reset break or within a very short time after reset.

A block diagram flows from the pre-reset phase, pre-memory phase, memory initialization, post-memory phase, payload, and operating system for the existing design and the proposed model.

Figure 6-10

Existing versus proposed system firmware flow with multicore environment

To run multiple operations concurrently using the available hardware and CPU capacity, first the system firmware needs to split all the possible tasks required to boot to the OS into multiple subtasks and assign them over multiple cores to run in parallel. Hence, it needs a semaphore for providing a convenient and effective mechanism for core synchronization.
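The effect of splitting boot work across cores can be estimated with a small calculation. The sketch below is only an illustration: it assumes fully independent tasks, a simple greedy assignment (each task goes to the core that becomes free first), and made-up task durations; the function names and `MAX_CORES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CORES 16

/* Time to run every task back to back on the BSP only. */
static uint32_t serial_boot_time_ms(const uint32_t *task_ms, size_t n_tasks)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n_tasks; i++)
        total += task_ms[i];
    return total;
}

/* Greedy estimate: each task, in order, goes to the core that becomes
 * free first. Returns the time at which the last core finishes. */
static uint32_t parallel_boot_time_ms(const uint32_t *task_ms,
                                      size_t n_tasks, size_t n_cores)
{
    uint32_t busy_until[MAX_CORES] = {0};
    assert(n_cores >= 1 && n_cores <= MAX_CORES);

    for (size_t i = 0; i < n_tasks; i++) {
        size_t pick = 0;
        for (size_t c = 1; c < n_cores; c++)
            if (busy_until[c] < busy_until[pick])
                pick = c;
        busy_until[pick] += task_ms[i];
    }

    uint32_t finish = 0;
    for (size_t c = 0; c < n_cores; c++)
        if (busy_until[c] > finish)
            finish = busy_until[c];
    return finish;
}
```

With illustrative durations of 300, 200, 150, 100, and 50 ms, the serial path takes 800 ms, two cores take 400 ms, and four cores take 300 ms. The longest single task bounds the result, which is why splitting large tasks into subtasks matters as much as adding cores.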

Here are the design principles to build this multithreaded system firmware logic:
  • Implement monitor/mwait logic inside the system firmware to avoid idle time while resting the core, upon completion of a task and prior to assigning a new task.

  • A single task is attempted by a single core at a time, and a task is assigned to the core if both the task and the core are available and free, respectively.

  • To implement this solution, some data variables are needed, as shown here:

    a. Shared region: In the multicore environment, to avoid synchronization issues, a shared data region is needed where the interprocessor communication (IPC) variables will be stored. Task state variables will also be located in this shared data region to allow core synchronization.

    b. Task state variable: A “task” and “state” data structure is created with “n” number of planned tasks in it.

      i. Each task has a state tag that indicates whether the task is waiting for a core to be assigned, is in progress, or is completed.

    c. Initialization code: Prior to entering the critical region where each core will perform an independent hardware controller initialization task, the initialization code assigns all those tasks to their default state.

    d. Scheduler: Create a scheduler inside the system firmware to assign the waiting tasks to the available cores, where “mwait” is initially treated as “nop” and “monitor” assigns the next task instruction and immediately changes the task state to “in progress.” Once a core is done with its assigned task, it marks the task state as “done” and waits in “mwait” for the next available task assignment.

      i. The respective cores execute those assigned tasks and update the shared data variable. This step continues until all tasks have moved to the “done” state.

    e. Task assignment: The idea here is to run only the independent boot tasks in the multithreaded environment, whereas the BSP continues to perform the typical interdependent hardware controller tasks.

A schematic diagram represents the view of a monitor. It starts with the initialization code and consists of shared data and operations, with tasks waiting on the task state.

Figure 6-11

Schematic view of a monitor

  • Prior to booting to the payload or operating system, all the available cores would need to reach a synchronization point, where the BSP would monitor the shared data region to check the state of the assigned tasks and the current condition of the cores. Ideally, all the tasks should get tagged with “done,” and all the available cores should park into the “mwait” state and remain in active state.
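The shared region, task state variable, initialization code, scheduler, and final synchronization point described above can be sketched as follows. This is a simplified model under stated assumptions, not the implementation from the case study: a C11 compare-and-exchange stands in for the “monitor” construct, the MWAIT idle step is omitted, and all names (`shared_region`, `task_claim`, `task_complete`, `all_tasks_done`, `N_TASKS`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum task_state { TASK_WAITING, TASK_IN_PROGRESS, TASK_DONE };

#define N_TASKS 4

/* Shared region: one state tag per planned task. */
struct shared_region {
    _Atomic int state[N_TASKS];
    int owner[N_TASKS];   /* core that ran the task, for inspection only */
};

/* Initialization code: put every task into its default state. */
static void tasks_init(struct shared_region *sr)
{
    for (size_t i = 0; i < N_TASKS; i++) {
        atomic_init(&sr->state[i], TASK_WAITING);
        sr->owner[i] = -1;
    }
}

/* Scheduler step for one core: claim the first waiting task.
 * The compare-exchange acts as the monitor: only one core can move a
 * task from WAITING to IN_PROGRESS. Returns the task index, or -1 if
 * no task is waiting. */
static int task_claim(struct shared_region *sr, int core_id)
{
    for (int i = 0; i < N_TASKS; i++) {
        int expected = TASK_WAITING;
        if (atomic_compare_exchange_strong(&sr->state[i], &expected,
                                           TASK_IN_PROGRESS)) {
            sr->owner[i] = core_id;
            return i;
        }
    }
    return -1;
}

/* Called by the owning core once its task has finished. */
static void task_complete(struct shared_region *sr, int task)
{
    assert(task >= 0 && task < N_TASKS);
    assert(atomic_load(&sr->state[task]) == TASK_IN_PROGRESS);
    atomic_store(&sr->state[task], TASK_DONE);
}

/* Synchronization point checked by the BSP before payload hand-off. */
static bool all_tasks_done(struct shared_region *sr)
{
    for (size_t i = 0; i < N_TASKS; i++)
        if (atomic_load(&sr->state[i]) != TASK_DONE)
            return false;
    return true;
}
```

Each core would loop on `task_claim()`, run the returned task, and call `task_complete()`; a return value of -1 is the point where a real design would park the core in MWAIT. The BSP polls `all_tasks_done()` before handing off to the payload.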

The book System Firmware: An Essential Guide to Open Source and Embedded Solutions demonstrates a case study done on an x86 platform using this multithreaded system firmware design principle to perform the dynamic optimization of the system boot time without any additional hardware modification.

Conclusion

The following are the key benefits of this idea:
  • This innovation helps to nullify legacy system firmware assumptions of a serialized boot or even static multithread boot to optimize the boot time. Rather, this method proposes an opportunistic platform boot by predicting when to initiate a multithreaded boot to optimize the boot time.

  • This proposed solution might be useful for running the entire system firmware update shown in Figures 6-9 and 6-10 using the capacity of all the available cores. This will significantly improve the firmware update latency problems and ensure that the system never goes out of service.

  • A proof-of-concept trial of informed multicore boot has been produced. It demonstrates the boot performance savings compared to the normal firmware boot method (refer to the book System Firmware: An Essential Guide to Open Source and Embedded Solutions for more details).

Innovation in Hardware Design

Today system firmware is more complicated because of the underlying SoC and/or hardware design. Ideally the system firmware should just be responsible for performing the basic CPU and chipset initialization and then handing over control to the operating system. But several factors, such as not having enough memory to access hardware resources early in the boot process and the need to set up temporary memory to continue the hardware access, create a cyclic dependency in the hardware design that limits innovation in firmware and pushes the system firmware to act like legacy firmware.

This section discusses such possible hardware limitations in the existing CPU and platform design and identifies the possible solution to design a simplified version of the system firmware to either reduce the boot boundary or optimize the platform BoM cost with a more complex hardware design.

Refer to Figure 6-12 to understand that the existing system firmware boot flow on any architecture has a significant dependency on the physical memory available during the early boot phase.

Take for example the x86 platform, where the legacy CPU design or the modern SoC design doesn’t have a dedicated pre-initialization memory controller such as static RAM (SRAM), which is available in other CPU architectures like ARM. But as per the system firmware design, it needs memory after reset to perform chipset initialization using advanced firmware programming logic, which needs a basic programming infrastructure like a stack, heap, and functions to make the firmware development more modular. Rather, on the x86 platform, to mitigate the problem of not having ample memory at reset, the SoC architecture proposes using a shared cache between the various underlying hardware blocks inside CPU/SoC, as per Figure 6-12.

An architecture diagram has three levels connected to two cores and GFX. Below that, the flow starts from other devices, PCIe, system agent, MC, and DIMM. On the right, a block diagram flows from DRAM-based access to the L3, L2, and L1 caches and the CPU.

Figure 6-12

Cache architecture on x86 architecture

Because the LLC is bigger than the other available caches, it should be used in the absence of real physical memory or SRAM on x86 platforms. The process of using cache as temporary memory is known as cache as RAM (CAR). This has its own complexity, with several model-specific registers (MSRs) that need to be programmed. Also, this programming recommendation might evolve generation after generation due to improvements in the cache architecture. Because CAR or temporary memory is quite limited, the entire chipset initialization can’t really rely on this memory; rather, this limited memory serves as the minimum memory required to set up the stack and meet the initial programming requirements until physical memory is initialized. Typically in the system firmware boot process on the x86 platform, the early stages, like bootblock and romstage in coreboot, the SEC and PEI phases in UEFI, and Stage 1A and Stage 1B in Slim Bootloader, exist just for the preparatory work needed to mitigate this design limitation. Eventually this limitation also results in delaying the initialization of the security controller, which sits deep inside the SoC/CPU hardware layer and is unable to communicate with the host CPU in the absence of a good amount of physical memory.

Design Principles

The proposal is to create a platform design by combining the hardware and firmware-centric innovations. A simple system firmware design has a bottleneck on the platform hardware design. This section will highlight a futuristic system firmware design where the firmware-level complexities are being nullified by the hardware design to make a system firmware that is more generic, simpler, and robust.

Hardware Design Principles

Ideally, having a simplified SoC design will also reflect the simple system firmware design, without added complexity and so much preparatory work needed to perform basic chipset initialization even using the LITE firmware design principle.

This section will provide several hardware modification proposals for a simplified boot process.

Scenario 1

Figure 6-13 provides a SoC design with an on-chip SRAM controller and a reasonable amount of SRAM attached for initializing CPU and I/O components and allocating resources without really depending on the DRAM initialization sequence.

A block diagram connects the DIMM, memory subsystem, SRAM, input and output subsystem, system bus, cores, and LLC.

Figure 6-13

Proposed SoC design with on-chip SRAM

These kinds of SoC designs are costly compared to a DRAM-based hardware design, and at the same time the available SRAM memory is supposed to be limited. On a typical x86-based client and IoT platform, to complete the entire static device initialization (without off-board graphics or a network card), the system firmware requires about 32MB of memory, which is possible to accommodate using such hardware design. SRAM has a lower access time, so it’s faster compared to DRAM; hence, it’s efficient to meet the low-level access latency requirement on the boot firmware.

Scenario 2

Figure 6-14 provides an alternative proposal, where there is no need to increase the BoM cost by introducing costly, dedicated hardware components such as SRAM into the SoC design. Instead, it utilizes the existing SoC design and provides an additional interface that lets an auxiliary processor sitting in the SoC access the DRAM controller.

A block diagram flows from the IFWI layout, auxiliary processor, and memory subsystem to the core and DIMM.

Figure 6-14

Proposed SoC design with an auxiliary processor initializing DRAM at prereset

The auxiliary processor at the prereset stage would utilize its boot ROM to perform self-initialization and fetch the auxiliary patch firmware from the IFWI layout (present inside the SPINOR or block device) to initialize the DRAM controller and train the memory prior to the x86 cores hitting reset. After the CPU reset, the system firmware running on the host CPU won’t necessarily perform that temporary memory setup; rather, it’s able to perform flat access to physical memory.

Firmware Design Principle

The proposed hardware design changes described earlier in scenarios 1 and 2 would help to simplify the job of system firmware and also help to nullify the legacy requirement where the system firmware has to perform a few additional steps just to mitigate the problems described.

The following sections will provide the modified system firmware design to accommodate either of the hardware change proposals.

Figure 6-15 provides a high-level boot flow of the system firmware where either SRAM is available or an auxiliary processor like PSP in the AMD SoC reset architecture can be used to perform memory controller initialization.
  1. Upon powering on the system, the auxiliary processor present inside the SoC will start execution immediately from its ROM. This is followed by fetching the updatable patch from the IFWI layout. The reason for having such an updatable patch in the IFWI is so that it’s easy to provide bug fixes if required and send new patches over firmware updates during system use in the field.

  2. The auxiliary processor patch firmware has the required foundation code to initialize the DRAM memory controller and train the memory device.

  3. After DRAM controller initialization, memory is available at pre-CPU reset. The auxiliary processor will pass the available memory base and limit to the bootloader. Upon CPU release, the bootloader will use that memory range to create the system memory layout.

  4. At CPU reset, this breaks the legacy assumptions about the x86 boot flow, because setting up temporary memory is no longer required. Hence, several boot phases can be removed with these assumptions:

    a. On the coreboot side: There is no need to have a dedicated bootblock, romstage, and postcar because all these stages are just meant to do the preparatory work prior to or during DRAM initialization.

A block diagram starts from the IFWI layout of the auxiliary processor patch firmware and flows to the auxiliary processor, memory subsystem, and DIMM, and then to the IFWI layout OBB, CPU, ramstage, payload, and finally to the boot to OS.

Figure 6-15

Reduced system firmware boot flow with pre-initialized DRAM at reset

    b. On the UEFI side: The SEC and PEI phases can be eliminated as memory can be default initialized. All the necessary pre-work can be done in the DXE phase directly.

With memory available at reset, the reset vector can now be patched at DRAM mapped memory, rather than SPI mapped memory. The auxiliary processor loads the OBB image from SPINOR into DRAM prior to hitting CPU reset. The system firmware will start executing the code from the ramstage in coreboot (as per Figure 6-15) or DXE (as per Figure 6-16).

A block diagram consists of four panels: 1, driver execution environment; 2, boot device selection; 3, transient system; 4, run time.

Figure 6-16

Reduced UEFI boot flow with pre-initialized DRAM at reset

  5. This process will help to reduce the firmware boundary, and the system firmware is now responsible for doing only the recommended chipset and CPU initialization. The BIOS will create its own system memory map, as shown in Figure 6-17, and break the barrier between different boot stages with the only goal being to perform the minimum and mandatory operations to boot to the payload.


A block diagram has DXE with reset, chipset init, handoff block, and HOB. The HOB has PHIT and three layers of HOB. Each layer connects system memory, MMIO resources, and firmware volumes.

Figure 6-17

UEFI DXE being the first stage after CPU reset
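Step 3 of the flow above, where the auxiliary processor passes the memory base and limit and the bootloader builds its memory layout from that range, can be sketched as follows. The structures and the function `build_mem_layout` are hypothetical illustrations; they do not represent the HOB format, a PSP interface, or any real bootloader API, and the region sizes are assumed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hand-off record written by the auxiliary processor. */
struct dram_handoff {
    uint64_t base;   /* first usable DRAM address */
    uint64_t limit;  /* one past the last usable DRAM address */
};

/* Hypothetical bootloader view of memory after CPU release. */
struct mem_layout {
    uint64_t fw_base, fw_size;      /* where the firmware image runs */
    uint64_t heap_base, heap_size;  /* stack and heap for the firmware */
    uint64_t free_base, free_size;  /* remainder, reported to the payload */
};

/* Carve the hand-off range into firmware, heap, and free regions.
 * Returns false if the range is empty or too small. */
static bool build_mem_layout(const struct dram_handoff *h,
                             uint64_t fw_size, uint64_t heap_size,
                             struct mem_layout *out)
{
    assert(h != NULL && out != NULL);

    if (h->limit <= h->base)
        return false;

    uint64_t avail = h->limit - h->base;
    if (fw_size > avail || heap_size > avail - fw_size)
        return false;

    out->fw_base   = h->base;
    out->fw_size   = fw_size;
    out->heap_base = out->fw_base + fw_size;
    out->heap_size = heap_size;
    out->free_base = out->heap_base + heap_size;
    out->free_size = h->limit - out->free_base;
    return true;
}
```

Because the range is already trained DRAM, the firmware can place its image, stack, and heap directly, with no cache-as-RAM setup or later migration from temporary to permanent memory.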

Conclusion

The key benefits in this idea are as follows:
  • Provide flexibility in system firmware design and don’t really focus on implementing all the boot phases.

  • Reduce the system firmware boundary by one-third, which eventually results in high optimization of system boot time and reduction of the SPINOR footprint.

  • Having an auxiliary processor perform DRAM initialization prior to CPU reset and copy the OBB BIOS region into DRAM would provide a better opportunity to enforce hardware-assisted security rather than building security blocks using the system firmware.

Summary

This chapter provided an overview of the futuristic aspects of system firmware. It also explained several examples of its possible usages and the opportunities to design better system firmware considering simplicity, performance, security, and open source philosophy. After reading this chapter, you should have a general understanding of the uniqueness of system firmware and hardware design. The idea here is to make sure you understand where the industry might be heading in the future. Future system firmware design might expect such requirements as a minimum for any boot firmware working on any SoC/CPU architecture. The common trait across all these application examples of future system firmware design is the need for an instant platform boot with reduced functionality and effective use of system resources without any additional platform hardware cost. This chapter may also help to break the assumption, in any BIOS or system firmware design, that all the boot phases are mandatory, and to make it clear that those stages are flexible. Firmware designers can choose the correct boot flow as per their platform design and hardware needs, performing meaningful and minimum tasks in the system firmware to make life simple during the product development life cycle and after-life support as well.
