Table of Contents

Cover image

Title page

Copyright

Dedication

Introduction

Performance Apologetic

A Word on Premature Optimization

The Roadmap

Part 1: Background Knowledge

Chapter 1: Early Intel® Architecture

Abstract

1.1 Intel® 8086

1.2 Intel® 8087

1.3 Intel® 80286 and 80287

1.4 Intel® 80386 and 80387

Chapter 2: Intel® Pentium® Processors

Abstract

2.1 Intel® Pentium®

2.2 Intel® Pentium® Pro

2.3 Intel® Pentium® 4

Chapter 3: Intel® Core™ Processors

Abstract

3.1 Intel® Pentium® M

3.2 Second Generation Intel® Core™ Processor Family

Chapter 4: Performance Workflow

Abstract

4.1 Step 0: Defining the Problem

4.2 Step 1: Determine the Source of the Problem

4.3 Step 2: Determine Whether the Bottleneck Can Be Avoided

4.4 Step 3: Design a Reproducible Experiment

4.5 Step 4: Check Upstream

4.6 Step 5: Algorithmic Improvement

4.7 Step 6: Architectural Tuning

4.8 Step 7: Testing

4.9 Step 8: Performance Regression Testing

Chapter 5: Designing Experiments

Abstract

5.1 Choosing a Metric

5.2 Dealing with External Variables

5.3 Timing

5.4 Phoronix Test Suite

Part 2: Monitors

Chapter 6: Introduction to Profiling

Abstract

6.1 PMU

6.2 Top-Down Hierarchical Analysis

Chapter 7: Intel® VTune™ Amplifier XE

Abstract

7.1 Installation and Configuration

7.2 Data Collection and Reporting

Chapter 8: Perf

Abstract

8.1 Event Infrastructure

8.2 Perf Tool

Chapter 9: Ftrace

Abstract

9.1 DebugFS

9.2 Kernel Shark

Chapter 10: GPU Profiling Tools

Abstract

10.1 Traditional Graphics Stack

10.2 buGLe

10.3 Apitrace

Chapter 11: Other Helpful Tools

Abstract

11.1 GNU Profiler

11.2 Gcov

11.3 PowerTOP

11.4 LatencyTOP

11.5 Sysprof

Part 3: Optimization Techniques

Chapter 12: Toolchain Primer

Abstract

12.1 Compiler Flags

12.2 ELF and the x86/x86_64 ABIs

12.3 CPU Dispatch

12.4 Coding Style

12.5 x86 Unleashed

Chapter 13: Branching

Abstract

13.1 Avoiding Branches

13.2 Improving Prediction

Chapter 14: Optimizing Cache Usage

Abstract

14.1 Processor Cache Organization

14.2 Querying Cache Topology

14.3 Prefetch

14.4 Improving Locality

Chapter 15: Exploiting Parallelism

Abstract

15.1 SIMD

Chapter 16: Special Instructions

Abstract

16.1 Intel® Advanced Encryption Standard New Instructions (AES-NI)

16.2 PCLMUL - Packed Carry-Less Multiplication

16.3 CRC32

16.4 SSE4.2 String Functions

Index
