Book Description

The only book to offer special coverage of the fundamentals of multicore DSP for implementation on the TMS320C66x SoC

This unique book gives readers an understanding of the TMS320C66x SoC and its constraints. It offers a critical analysis of each element, which broadens readers' knowledge of the subject and helps them understand how these elements work together.

Written by Naim Dahnoun, winner of Texas Instruments’ first DSP Educator Award, the book teaches readers how to use the development tools and take full advantage of the performance and functionality of this processor. Its content spans architecture, development tools, programming models such as OpenCL and OpenMP, and debugging tools. It also covers various multicore audio and image applications in detail. Additionally, this one-of-a-kind book is supplemented with:

  • A rich set of tested laboratory exercises and solutions
  • Audio and image processing application source code for Code Composer Studio (the integrated development environment from Texas Instruments)
  • Multiple tables and illustrations

With twenty chapters of rich content and no other book on the market covering the subject, Multicore DSP: From Algorithms to Real-time Implementation on the TMS320C66x SoC is a rare and much-needed source of information for undergraduates and postgraduates in the field, allowing them to get real-time applications working in a relatively short period of time. It is also highly beneficial to hardware and software engineers involved in programming real-time embedded systems.

Table of Contents

  1. Cover
  2. Title Page
  3. Preface
  4. Acknowledgements
  5. Foreword
  6. About the Companion Website
  7. 1 Introduction to DSP
    1. 1.1 Introduction
    2. 1.2 Multicore processors
    3. 1.3 Key applications of high‐performance multicore devices
    4. 1.4 FPGAs, Multicore DSPs, GPUs and Multicore CPUs
    5. 1.5 Challenges faced for programming a multicore processor
    6. 1.6 Texas Instruments DSP roadmap
    7. 1.7 Conclusion
    8. References
  8. 2 The TMS320C66x architecture overview
    1. 2.1 Overview
    2. 2.2 The CPU
    3. 2.3 Single instruction, multiple data (SIMD) instructions
    4. 2.4 The KeyStone memory
    5. 2.5 Peripherals
    6. 2.6 Conclusion
    7. References
  9. 3 Software development tools and the TMS320C6678 EVM
    1. 3.1 Introduction
    2. 3.2 Software development tools
    3. 3.3 Hardware development tools
    4. 3.4 Laboratory experiments based on the C6678 EVM: introduction to Code Composer Studio (CCS)
    5. 3.5 Loading different applications to different cores
    6. 3.6 Conclusion
    7. References
  10. 4 Numerical issues
    1. 4.1 Introduction
    2. 4.2 Fixed‐ and floating‐point representations
    3. 4.3 Dynamic range and accuracy
    4. 4.4 Laboratory exercise
    5. 4.5 Conclusion
    6. References
  11. 5 Software optimisation
    1. 5.1 Introduction
    2. 5.2 Hindrance to software scalability for a multicore processor
    3. 5.3 Single‐core code optimisation procedure
    4. 5.4 Interfacing C with intrinsics, linear assembly and assembly
    5. 5.5 Assembly optimisation
    6. 5.6 Software pipelining
    7. 5.7 Linear assembly
    8. 5.8 Avoiding memory banks
    9. 5.9 Optimisation using the tools
    10. 5.10 Laboratory experiments
    11. 5.11 Conclusion
    12. References
  12. 6 The TMS320C66x interrupts
    1. 6.1 Introduction
    2. 6.2 The interrupt controller
    3. 6.3 Laboratory experiment
    4. 6.4 Conclusion
    5. References
  13. 7 Real‐time operating system: TI‐RTOS
    1. 7.1 Introduction
    2. 7.2 TI‐RTOS
    3. 7.3 Real‐time scheduling
    4. 7.4 Dynamic memory management
    5. 7.5 Laboratory experiments
    6. 7.6 Conclusion
    7. References
  14. 8 Enhanced Direct Memory Access (EDMA3) controller
    1. 8.1 Introduction
    2. 8.2 Type of DMAs available
    3. 8.3 EDMA controllers architecture
    4. 8.4 Parameter RAM (PaRAM)
    5. 8.5 Transfer synchronisation dimensions
    6. 8.6 Simple EDMA transfer
    7. 8.7 Chaining EDMA transfers
    8. 8.8 Linked EDMAs
    9. 8.9 Laboratory experiments
    10. 8.10 Conclusion
    11. References
  15. 9 Inter‐Processor Communication (IPC)
    1. 9.1 Introduction
    2. 9.2 Texas Instruments IPC
    3. 9.3 Notify module
    4. 9.4 MessageQ
    5. 9.5 ListMP module
    6. 9.6 GateMP module
    7. 9.7 Multi‐processor Memory Allocation: HeapBufMP, HeapMemMP and HeapMultiBufMP
    8. 9.8 Transport mechanisms for the IPC
    9. 9.9 Laboratory experiments with KeyStone I
    10. 9.10 Laboratory experiments with KeyStone II
    11. 9.11 Conclusion
    12. References
  16. 10 Single and multicore debugging
    1. 10.1 Introduction
    2. 10.2 Software and hardware debugging
    3. 10.3 Debug architecture
    4. 10.4 Advanced Event Triggering
    5. 10.5 Unified Instrumentation Architecture
    6. 10.6 Debugging with the System Analyzer tools
    7. 10.7 Instrumentation with TI‐RTOS and CCS
    8. 10.8 Laboratory sessions
    9. 10.9 Conclusion
    10. References
  17. 11 Bootloader for KeyStone I and KeyStone II
    1. 11.1 Introduction
    2. 11.2 How to start the boot process
    3. 11.3 The boot process
    4. 11.4 ROM Bootloader (RBL)
    5. 11.5 Boot process
    6. 11.6 Laboratory experiment 1
    7. 11.7 Laboratory experiment 2
    8. 11.8 TFTP boot with a host‐mounted Network File System (NFS) server – NFS booting
    9. 11.9 Conclusion
    10. References
  18. 12 Introduction to OpenMP
    1. 12.1 Introduction to OpenMP
    2. 12.2 Directive formats
    3. 12.3 Forking region
    4. 12.4 Work‐sharing constructs
    5. 12.5 Environment variables and library functions
    6. 12.6 Synchronisation constructs
    7. 12.7 OpenMP accelerator model
    8. 12.8 Laboratory experiments
    9. 12.9 Conclusion
    10. References
  19. 13 Introduction to OpenCL for the KeyStone II
    1. 13.1 Introduction
    2. 13.2 Operation of OpenCL
    3. 13.3 Command queue
    4. 13.4 Kernel declaration
    5. 13.5 How do the kernels access data?
    6. 13.6 OpenCL memory model for the KeyStone
    7. 13.7 Synchronisation
    8. 13.8 Basic debugging profiling
    9. 13.9 OpenMP dispatch from OpenCL
    10. 13.10 Building the OpenCL project
    11. 13.11 Laboratory experiments
    12. 13.12 Conclusion
    13. References
  20. 14 Multicore Navigator
    1. 14.1 Introduction
    2. 14.2 Navigator architecture
    3. 14.3 Complete functionality of the Navigator
    4. 14.4 Laboratory experiment
    5. 14.5 Conclusion
    6. References
  21. 15 FIR filter implementation
    1. 15.1 Introduction
    2. 15.2 Properties of an FIR filter
    3. 15.3 Design procedure
    4. 15.4 Laboratory experiments
    5. 15.5 Conclusion
    6. References
  22. 16 IIR filter implementation
    1. 16.1 Introduction
    2. 16.2 Design procedure
    3. 16.3 Coefficients calculation
    4. 16.4 IIR filter implementation
    5. 16.5 Laboratory experiment
    6. 16.6 Conclusion
    7. Reference
  23. 17 Adaptive filter implementation
    1. 17.1 Introduction
    2. 17.2 Mean square error
    3. 17.3 Least mean square
    4. 17.4 Implementation of an adaptive filter using the LMS algorithm
    5. 17.5 Implementation using linear assembly
    6. 17.6 Implementation in C language with compiler switches
    7. 17.7 Laboratory experiment
    8. 17.8 Conclusion
    9. References
  24. 18 FFT implementation
    1. 18.1 Introduction
    2. 18.2 FFT algorithm
    3. 18.3 FFT implementation
    4. 18.4 Laboratory experiment
    5. 18.5 Conclusion
    6. References
  25. 19 Hough transform
    1. 19.1 Introduction
    2. 19.2 Theory
    3. 19.3 Limits of r and θ
    4. 19.4 Hough transform implementation
    5. 19.5 Laboratory experiment
    6. 19.6 Conclusion
    7. References
  26. 20 Stereo vision implementation
    1. 20.1 Introduction
    2. 20.2 Algorithm for performing depth calculation
    3. 20.3 Cost functions
    4. 20.4 Implementation
    5. 20.5 Conclusion
    6. References
  27. Index
  28. End User License Agreement