DSP in Embedded Systems: A Roadmap

DSP Software development for embedded systems follows the standard hardware/software co-design model for embedded systems as shown in Figure 1.

image

Figure 1: DSP Software Development Following the Embedded HW/SW Co-design Model

The development process can be divided into six phases;

Phase 1; Product Specification

Phase 2; Algorithmic Modeling

Phase 3; HW/SW Partitioning

Phase 4; Iteration and Selection

Phase 5; Real-Time SW Design

Phase 6; HW/SW Integration

This book will cover each of these important phases of DSP Software Development.

Phase 1 – Product Specification

Phase 1 is an overview of embedded and real time systems and introduces the reader to the unique aspects of this type of software development.

There are several embedded systems challenges that we need to comprehend before we can have a discussion on digital signal processing. These include a significant degree of complexity of environments as well as interactions between systems, an increasing percentage of software within embedded components, the need for software code reuse and rapid reengineering, fast innovation and release cycles driven by shifting market demands, numerous real time demands and the need for requirements management, and the increasing focus on quality and process maturity. Chapters 1 and 2 (page 1 and 15) will provide an overview of DSP as well as embedded systems in general, and review the key differences that exist in embedded systems in general and DSP specifically.

Phase 2 – Algorithmic Modeling

Phase 2 focuses on the understanding of the fundamental algorithmic nature of signal processing. Digital signal processing is about the representation of discrete time signals using a sequence of numbers or symbols and the subsequent processing of these signals. DSP includes areas like audio and speech signal processing, sonar and radar signal processing, statistical signal processing, digital image processing, communications, system control, biomedical signal processing, and many other areas. DSP algorithms are used to process these digital signals. There are a set of fundamental algorithms used in signal processing such as fourier transforms, digital filters, convolution, and correlations. Chapter 7 (page 113) will introduce and explain some of the most important and fundamental DSP algorithms in order to set the base for many of the topics later in the book.

Phase 3 – Hardware/Software Partitioning

An important phase of any embedded development project is the partitioning of the system into hardware and software components.

Much of DSP is programmable. Programmable architectures for digital signal processing take a number of forms, each having their own tradeoffs in terms of cost, power consumption, performance and flexibility. At one end of the spectrum, digital signal processing system designers can achieve extremely high levels of efficiency and performance via the use of proprietary assembly language implementations of their application. At the other end of the spectrum, system developers can implement digital signal processing software stacks using ordinary ANSI C or C++ or other domain specific languages, executing the resulting algorithm on commercial desktop computers. Chapter 4 (page 63) details tradeoffs in implementations at varying points on a continuum, with one end being utmost digital signal processing performance and the other end of the continuum being flexibility and portability of a software implementation. Tradeoffs for each solution are detailed along the way, with the goal of guiding the digital signal processing system developer to the solution that meets their particular use case needs.

DSP is also implemented using Field Programmable Gate Arrays (FPGA) . As an example of this, in chapter 5 (page 75), we discuss the architectural challenges associated with spatial multiplexing and diversity gain schemes and introduce FPGA architectures and report experimental results for the FPGA realization of these systems. Chapter 5 (page 75) will introduce a flexible architecture and implementation of a spatial multiplexing MIMO detector, Flex-sphere, and its FPGA implementation. We also present a hardware architecture for beamforming in a WiMAX system as a way to enhance the diversity and performance in the next-generation wire- less systems.

Hardware platforms for digital signal processing systems come in varying designs, each with inherent programmability, power and performance tradeoffs. What is suitable for one system designer may not be suitable for another. Chapter 6 (page 103) details various digital signal processing platform designs as they pertain to system configurability and programmability. At one end of the spectrum, application specific integrated circuits (ASICs) are detailed as a high performance, low configurability solution. At the other end of the spectrum, general purpose embedded microprocessors with SIMD style extensions are presented as a highly configurable solution with robust software programmability support. Various design points are presented throughout such as reconfigurable field programmable gate array (FPGA) based solutions, and high performance application specific integrated processors (ASIP) with varying degrees of software programmability. Chapter 6 (page 103) will present tradeoffs for each system design, as a way of guiding the system developer in choosing the right digital signal processing hardware platform and component for their immediate and future system deployment needs.

Phase 4 – Iteration and Selection

Another key concern for DSP developers is managing the embedded lifecycle. This starts with selection of the solution space for the DSP problem you are trying to solve, and defining an embedded system that meets performance as well as cost, time to market and other important system constraints. As mentioned earlier, an embedded system is a specialized computer system that is integrated as part of a larger system. Many embedded systems are implemented using digital signal processors. The DSP will interface with the other embedded components to perform a specific function. The specific embedded application will determine the specific DSP to be used. For example, if the embedded application is one that performed video processing, the system designer may choose a DSP that is customized to perform media processing including video and audio processing. Chapter 3 (page 29) will discuss the embedded lifecycle and the various options for DSP and how to determine overall system performance and capability.

Phase 5 – Real-Time Software Design

Real-time software design follows the five step process outlines in Figure 1;

1. Identify the stimuli for processing and required responses to the stimuli

2. Identify timing constraints for each stimuli and response

3. Aggregate stimulus and response processing into concurrent processes

4. Design algorithms to process stimulus and response, meeting the given timing requirements

5. Design a scheduling solution ensuring processes are scheduled in time to meet their deadlines

We will discuss details of this process in this phase of the process.

1. Part 1; Identify the stimuli for processing and required responses to the stimuli

Firstly, we need to identify the system stimuli for signal processing as well as the response to these stimuli. This must be done regardless of whether we are using hardware or software.

We introduce a simple practical but very powerful specification technique in Case Study 2: DSP for medical devices (page 493) to give the developer a set of guidelines for this level of specification. The focus is on DSP development process focusing on specifying the behavior of embedded systems using DSP. The criticality of correct, complete, testable requirements is a fundamental tenet in software engineering. Both functional and financial success is affected by the quality of requirements. This starts with good requirements. Requirements may range from a high-level abstract statement of a service or of a system constraint to a detailed mathematical functional specification. Requirements are needed to Specify external system behavior, specify implementation constraints, serve as reference tool for maintenance, record forethought about the life cycle of the system i.e. predict changes, and to characterize responses to unexpected events. This case study introduces a rigorous behavioral specification technique that can greatly reduce risk by exposing ambiguities in requirements and making explicit otherwise tacit information. Such an external, or “black box” specification can be developed from behavioral requirements in a systematic manner through the process of sequence enumeration. This process results in an arguably complete, consistent, and traceable specification of external system behavior. Sequence abstraction provides a powerful means to manage and focus the enumeration process. A simple cell phone is used to reinforce these concepts.

There are some more advanced techniques that can be used for both hardware and software. Chapter 8 (page 133) focuses on a system level methodology of designing and synthesizing real-time digital signal processing systems. This is another challenge in the area of DSP development. Current hardware designs and implementations for DSP systems have a huge time gap between the development of algorithms for new DSP applications and their hardware implementation. The high level design and synthesis tools create application-special DSP accelerators from high abstraction-level for complex DSP processing hardware, which greatly reduces the design cycle while still maintaining area and power efficiency. This chapter presents two high level design methodologies, 1) C-to-RTL high level synthesis (HLS) for ASIC/FPGA implementation of the DSP systems, and 2) System Generator for FPGA implementation of the DSP systems. In the case studies, we will present three complex DSP accelerator designs using high level design tools: 1) Low-density parity-check (LDPC) decoder accelerator design using PICO C, 2) Matrix multiplication accelerator design using Catapult C, and 3) QR decomposition accelerator design using System Generator.

2. Identify timing constraints for each stimuli and response

A key part of optimizing DSP software is being able to properly profile and benchmark a DSP kernel and a DSP system. With a solid benchmark and profiling of a DSP kernel, both best case and in system performance can be assessed. Proper profiling and benchmarking can often be an art form. It is often the case that an algorithm is tested in nearly ideal conditions, and that performance is then used within a performance budget. Truly understanding the performance of an algorithm requires being able to model system effects along with understanding an algorithms best-case performance. System effects can include changes such as a running operating system, executing out of memories with different latencies, cache overhead, and managing coherency with memory. All of these effects require a carefully crafted benchmark, which can model these behaviors in a standalone fashion. If modeled correctly the standalone benchmark can very closely replicate a DSP kernel’s execution, as it would behave in a running system. Chapter 9 (page 157) discusses how to perform this kind of benchmark.

3. Aggregate stimulus and response processing into concurrent processes

Once we understand what goes in and what comes out, we need to design and development the DSP software solution. Software development using DSPs is subject to many of the same constraints and development challenges that other types of software development face. These include a shrinking time to market, tedious and repetitive algorithm integration cycles, time intensive debug cycles for real-time applications, integration of multiple differentiated tasks running on a single DSP, as well as other real-time processing demands. Up to 80% of the development effort is involved in analysis, design, implementation, and integration of the software components of a DSP system.

Chapter 15 (page 335), will cover several topics related to managing the DSP software development effort. The first section of this chapter will discuss DSP system engineering and problem framing issues that relate to DSP and real-time software development. High level design tools are then discussed in the context of DSP application development. Integrated development environments and prototyping environments are also discussed. Specific challenges to the DSP application developer such as code tuning, profiling and optimization are reviewed. At the end of the chapter, there are several white papers that discuss related topics to DSP and real-time system development, integration, and analysis.

4. Design algorithms to process stimulus and response, meeting the given timing requirements

Writing DSP software to meet real-time constraints is challenging, so we dedicate several chapters to this area. Historically, DSP software was written in assembly languages. However, with the advent of the modern compiler, writing efficient DSP code using the C language is standard. However, since most DSPs have features that cannot be expressed fully in the C language, various alternatives exist to augment the standard languages of C and C++ such as extensions, higher-level languages and auto-vectorizing compilers. Chapter 10 (page 169) explains the high-level languages and programming models available for writing DSP software.

Optimization is the process of transforming a piece of code to make more efficient (either in terms of time, space, or power) without changing its output or side-effects. The only difference visible to the code’s user should be that it runs faster and/or consumes less memory or power. Optimization is really a misnomer in the sense that the name implies one is trying to find an “optimal” solution— in reality, optimization aims to improve, not perfect, the result.

DSP is all about optimization, and this includes optimization for performance, memory, and power.

Chapter 11 (page 181) focuses on code optimization for performance. This is a critical step in the development process as it directly impacts the ability of the system to do it’s intended job. Code that executes faster means more channels, more work performed and competitive advantage. This chapter is intended to help programmers write the most efficient code possible. It starts with an introduction to using the tool chain, covers the importance of knowing the Digital Signal Processor architecture before optimization, then moves on to cover wide range of optimization techniques. Techniques are presented which are valid on all programmable DSP architectures – C-language optimization techniques and general loop transformations. Real-world examples are presented throughout. To illustrate the concepts, DSPs from both Texas Instruments and Freescale are discussed.

Chapter 12 (page 217) focuses on memory optimization. Optimizing application code around memory system performance can often yield impressive gains in both executable code size, as well as runtime performance. This chapter illustrates methods that applications developers can utilize to improve both the static size of their executables in the presence of all too often resource constrained embedded systems. In addition, the chapter illustrates code optimization techniques that can either provide explicit performance gains of the application code, or implicit gains by improving the software build tools’ ability to optimize code at compilation, assembly and link time.

Chapter 13 (page 241) focuses on optimization for power. This chapter is intended to be a resource for programmers needing to optimize a DSP for power consumption using strictly software. In order to provide the most comprehensive source for DSP software power optimization, this chapter provides a basic introduction into power consumption background, measurement techniques, and then goes into the details of power optimization. The chapter will focus on three main areas: algorithmic optimization, software controlled hardware power optimization (low power modes, clock control, and voltage control), and data flow optimization with a discussion into the functionality and power considerations when using fast SRAM type memories (common for cache) and DDR SDRAM.

5. Design a scheduling solution ensuring processes are scheduled in time to meet their deadlines

DSP applications are very demanding in terms of data rates and real-time computational requirements. These applications also can have very diverse real-time requirements. The DSP designer must understand the real-time design issues or order to obtain maximum performance. In addition, the resulting complexity requires the use of a real-time operating system. Key characteristics of a DSP RTOS include, extremely fast context switch times, very low interrupt latency times, optimized scheduler and interrupt handler, task, event, message, circular queue, and timer management, resource and semaphore processing, fixed block and memory management, and full pre-emption and ability to also have cooperative and time slice scheduling. Chapter 14 (page 291) will cover DSP RTOS in depth in order to be able to effectively utilize a DSP RTOS in application development.

This entire phase of real-time software design and development requires the engineer to build and debug the system in iterative steps using software development tools. DSP debug technology includes both hardware and software technology. Debug hardware consists of functionality on the DSP chip, which enables the collection of data. This data provides state behavior and other system visibility. Hardware is also required to extract this information from the DSP device at high rates and format the data. Debug software provides additional higher level control and an interface with the host computer, usually in terms of an interface called a debugger. The debugger provides the development engineer an easy migration from the compilation process (compiling, assembling, and linking an application) to the execution environment. The debugger takes the output from the compilation process and loads the image into the target system. The engineer then uses the debugger to interact with the emulator to control and execute the application and find and fix problems. These problems can be hardware as well as software problems. The debugger is designed to be a complete integration and test environment. Chapter 17 (page 381) discusses a sequence of activities and techniques to debug complex DSP systems.

Case Study 1 (page 423) will pull all of this together with an excellent case study on performance engineering for DSP systems. The focus is on a case study related to Performance Engineering. Expensive disasters can be avoided when system performance evaluation takes place relatively early in the software development lifecycle. Applications will generally have better performance when alternative designs are evaluated prior to implementation. Software Performance Engineering (SPE) is a set of techniques for gathering data, constructing a system performance model, evaluating the performance model, managing risk of uncertainty, evaluating alternatives, and verifying the models and results. SPE also includes strategies for the effective use of these techniques. Software performance engineering concepts have been used on programs developing digital signal processing applications concurrently with a next generation DSP-based array processor. Algorithmic performance and an efficient implementation were driving criteria for the program. As the processor was being developed concurrently with the software application a significant amount of the system and software development would be completed prior to the availability of physical hardware. This led to incorporation of SPE techniques into the development life-cycle. The techniques were incorporated cross-functionally into both the systems engineering organization responsible for developing the signal processing algorithms and the software and hardware engineering organizations responsible for implementing the algorithms in an embedded real-time system.

Phase 6 – Hardware/Software Integration

The integration phase is where the system is integrated into a fully functional real time system. This is where we choose to describe several cases studies that reinforce many of the points discussed in the preceding chapters. We will cover the more useful and important aspects of system integration as it relates to DSP systems. Topical areas in this phase include;

Integration of Multicore DSP systems

Integration of DSP systems for base stations

Integration of DSP systems for medical

Integration of DSP systems for Voice over IP (VoIP)

Integration of DSP systems for Software Defined Radio

Multicore Digital Signal Processors have grown in importance in recent years mainly because of the emergence of data-intensive applications, such as video and high-speed Internet browsing on mobile devices. These applications demand significant computational performance and at lower cost and power consumption. In Chapter 16 (page 361), we discuss these topics and use the example of porting a single core application to a multicore environment, which entails the consideration of complex programming and process scheduling in addition to performing the required process algorithms. This chapter discusses two basic multicore programming methods: multiple-single-cores and true-multiple-cores. The true-multiple-cores model is used to port a motion JPEG application to a multicore DSP device and serves to illustrate the concepts presented. The chapter also addresses some of the concerns that arise when porting applications to a multi-core environment along with proposed solutions to address these issues.

As more and more DSP algorithm components are developed for a growing number of signal processing centric applications, the need for a programming model and standard is also emerging. Like other coding standards, a DSP programming standard can improve engineering efficiency and improve integration time and effectiveness. It can also make integration of DSP components from multiple vendors more effective.

Phase 6 also includes a case study describing a Long Term Evolution (LTE) basestation design of layer 1 and layer 2 software and some of the techniques used to develop a multicore implementation of this application (Case Study 1, page 423). It brings together many points raised earlier in the book. This case study summarizes the various challenges and their resolution when migrating single-core embedded application software to multi-core SoCs (System-on-a-Chip). The migration of a LTE eNodeB protocol stack is taken as the subject of this case-study.

This case study describes the essential software engineering practices which could be used by engineering teams for enhancing the viability of projects that involve software development over complex embedded platforms.

Thereafter the case study describes through a well defined 3-Step process, the steps required to develop the embedded applications for multicore SoCs with the help of associated examples at each step. Each of the steps is further broken down into sub steps, which seek to ensure that there is a measurable progress at the end of each phase of development. The various technical challenges and their resolution are described in detail, along with various suggestions, using the example of a high performance multicore SoC.

There is a case study on DSP for Voice over IP (VoIP) systems (Case Study 3, page 523). VoIP offers many cost related advantages over legacy copper-based telephony, on both the physical side (the equipment and wiring materials), as well as the logical side (considering the flexibility to add services and to differentiate the cost model). This case study gives an introduction on how the DSP made possible the VoIP ubiquity in the last decade.

There are many packet-based voice technologies that reshaped the telecommunication network, once the migration from twisted-pair and E1/T1 trunk wiring made room for Ethernet LAN and optical backbones. One of the workhorse systems that made the infrastructure transition possible was the VoIP Gateway. Segments of the network were replaced with IP-based technologies but still the service had to be operational when the legacy and new technologies collided. A VoIP Media Gateway handled the voice and signaling information that is required to interface two technologically different sides of the network, for example the TDM network to a packet network. Full-duplex real-time voice or fax/modem communications are compressed by the DSP engine and encoded into IP packets and then sent to the IP network. Packets received from the network are decoded and then decompressed towards TDM side. A dynamic jitter buffer automatically compensates for network delay variation enabling real-time voice communication. Voice processing includes echo cancellation, voice compression, voice-activity detection, and voice packetization. Other functions included are signaling detection and relay and fax/modem relay. DSP technology enabled all of this underlying technology. The architecture of a Media Gateway will be described in this case study.

There is also a case study on the use of DSP in the medical field (Case Study 2, page 493). The example used is a real time ultrasound system. Real time ultrasound systems have been available for more than forty years. During this time the architecture of such systems has changed significantly, bringing new capabilities, in terms of quality and operation modes.

In this case study we present an overview of some of the most common operation modes concentrating on an engineering approach to answer some of the frequent design questions. The text is focused on implementation of beamforming and B-Mode on modern DSP architecture, discussing tradeoffs and hardware capabilities.

Today’s generation of DSP bring a lot of processing power, making them more suitable for medical ultrasound applications. The use of cases and examples in this chapter (Case Study 2, page 493) will show what could be implemented on DSP, where the bottlenecks are and what are the benefits.

There is a case study on Software Defined Radio (SDR) (Case Study 6, page 599); Digital signal processing has become a powerful premise for the world of wireless communications. All the latest technologies, including OFDM, CDMA, SC-FDMA that represent the main foundation of the 3 G-4 G networks, such as HSPA, LTE or WiMAX, are now made possible by the high density of digital algorithms that can be squeezed in a small, low-power chip. On top of that, additional signal processing techniques, such as beamforming or spatial multiplexing, have contributed to the achievement of throughputs of hundreds Mbps and spectral efficiencies of tens of b/s/Hz. Also, some other complex algorithms that can be found in channel estimators, equalizers and bit decoders allow all these techniques to operate in a tough radio environment, including in non-line of sight communications with high mobility, even at large distance.

Large-scale integration allows a handheld device to include a multi-standard terminal, compliant with a large variety of wireless standards. Think about a smart phone, that now conveniently chooses the best technology for data and voice transmission, according to its service needs and to the channel conditions. This is why your smart phone can connect to the BTS through GSM, EDGE, UMTS or soon, LTE. At the same time, it has capabilities for Wi-Fi, Bluetooth, GPS connections, all these in a very small terminal, with decent battery life. In the future, this service selection and multiplexing will only be defined through software. Without digital signal processing, relying solely on analog components, there would be no possible way to achieve such performance. Right now, the analog part in a transceiver is losing its pace in front of the digital part. More and more operations are performed in the digital part, including filtering, up/down conversion, compensations of the distortions produced in the analog chain, at a smaller price and with better performances. Moreover, it is expected that the analog part will be completely conquered in the future, with the help of high-speed A/D and D/A converters, so that the digital transceiver will be glued directly to the antenna. This part is called front-end in a transceiver and while some 10 years ago it used to be completely analog, it is becoming more and more digital. This case study (Case Study 6, page 599) attempts to reveal some of the digital signal processing existing in the front end of a digital transceiver.

DSP systems require accurate specification of requirements for processing for many applications. In this book we have a case study involving the specification steps for a cell phone application utilizing DSP technology. Detailed and accurate specification techniques lead to systems that meet customer requirements and perform well in the field. This case study will walk the reader through these interesting and practical steps to specify software systems.

Finally, there are case studies that overview software performance engineering (SPE). SPE is a systematic, quantitative approach that leads to cost-effective development of software systems to meet system performance requirements. SPE is a software-oriented approach that focuses on architecture, design, and implementation choices. There is an excellent case study focusing on the use of SPE for a large DSP application with hard real-time deadlines.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.109.8