Architecture-Specific Optimizations


134 9. CODE OPTIMIZATION

code size. e O0 option denotes no optimization ﬂags; O1 enables a subset of options;

O2 enables more options adding to the ones enabled by O1; and O3 includes all the opti-

mizations added by O1 and O2. e option Ofast enables optimizations that may result in

variables getting truncated or rounded incorrectly for ﬂoating-point math operations. For most

cases, the O3 option produces the best computational eﬃciency outcome.

When using Android Studio, the compiler options for C code libraries are set in the app's build.gradle file. The optimization flags go inside the ndk block, using the cFlags directive. An example using the -O3 optimization follows:

    ndk {
        moduleName "yourLibrary"
        abiFilter "armeabi"
        ldLibs "log"
        cFlags "-O3"
    }
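To confirm that a flag such as cFlags "-O3" actually reached the compiler, a native library can test predefined macros. The sketch below assumes a GCC- or Clang-based toolchain, which predefine __OPTIMIZE__ whenever optimization is enabled (-O1 and above) and __OPTIMIZE_SIZE__ for size-oriented levels; the function name optimizationMode is hypothetical. Note that the macros cannot distinguish -O2 from -O3.

```c
/* Reports the optimization setting seen when this file was compiled.
   Relies on the GCC/Clang predefined macros __OPTIMIZE_SIZE__ and
   __OPTIMIZE__; a compiler that defines neither always reports "O0". */
const char *optimizationMode(void) {
#if defined(__OPTIMIZE_SIZE__)
    return "size";   /* e.g. -Os */
#elif defined(__OPTIMIZE__)
    return "speed";  /* -O1, -O2, -O3 or -Ofast */
#else
    return "O0";     /* optimization disabled */
#endif
}
```

The returned string can then be logged once at startup to check which build configuration is running.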

When using Xcode, all options for C code libraries can be set within the app's Build Settings; the optimization level is chosen by changing the Optimization Level entry under the Apple LLVM 6.1 - Code Generation section.

9.4 EFFICIENT C CODE WRITING

The compiler automatically performs common code optimizations, such as loop reversal or replacing division by a constant with multiplication by the reciprocal of that constant (for floating-point data, only when the reciprocal is exactly representable, as with powers of two, unless fast-math options are enabled). Thus, further gains in code efficiency usually come from refactoring the code or from manually using architecture-specific features such as SIMD instructions. Let us examine the changes that can be made to the linear convolution code presented earlier to improve its computational efficiency.
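As a small illustration of the division-to-multiplication rewrite, the two hypothetical functions below always produce identical results, because 1/4 is exact in binary floating point; this is why a compiler may emit the multiply for the first one on its own. For a divisor such as 3.0f, 1.0f/3.0f is not exact, so x * (1.0f/3.0f) can differ from x / 3.0f in the last bit, and a strict build keeps the division unless an option such as GCC's -freciprocal-math (part of -ffast-math) allows the substitution.

```c
#include <stddef.h>

/* Divides every sample by the constant 4. */
void scaleByDivision(float *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] / 4.0f;
    }
}

/* Hand-written equivalent: multiply by the precomputed reciprocal 0.25. */
void scaleByReciprocal(float *x, size_t n) {
    for (size_t i = 0; i < n; i++) {
        x[i] = x[i] * 0.25f;
    }
}
```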

For the FIR filter to work properly, a sufficient number of previous input samples must be stored in memory. Because the generic ARM processor does not support circular buffering in hardware, this can be accomplished with two loops: the first shifts the retained previous samples to the start of an array in memory, and the second appends the new frame of input samples after them, as follows:

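    // Move the most recent numCoefficients samples to the start of the window
    for (i = 0; i < fir->numCoefficients; i++) {
        fir->window[i] = fir->window[fir->frameSize + i];
    }
    // Append the new frame of input samples after the retained history
    for (i = 0; i < fir->frameSize; i++) {
        fir->window[fir->numCoefficients + i] = input[i];
    }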
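The same window update can also be written with the C library's block-copy routines, which are typically implemented with optimized copies; whether this beats the plain loops depends on the library and compiler, since compilers may already recognize such loops. The sketch below is not from the text: the function name updateWindow is hypothetical, and it assumes the window holds numCoefficients + frameSize floats and that input is a separate buffer.

```c
#include <string.h>

/* Equivalent of the two shifting loops above. */
void updateWindow(float *window, const float *input,
                  int numCoefficients, int frameSize) {
    /* Source and destination overlap when numCoefficients > frameSize,
       so memmove (not memcpy) is required for the history shift. */
    memmove(window, window + frameSize,
            (size_t)numCoefficients * sizeof(float));
    /* input is assumed not to alias window, so memcpy is safe here. */
    memcpy(window + numCoefficients, input,
           (size_t)frameSize * sizeof(float));
}
```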


The window array is stored in heap memory as part of the previously defined FIRFilter structure, because its values must be retained between calls to the compute method. Memory allocation is time-consuming, so the array should be allocated once rather than repeatedly whenever possible.
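One way to follow this advice is to allocate every buffer once, in a setup function, and reuse it for every frame. The sketch below assumes a FIRFilter layout: the field names come from the text, but their types, the window length of numCoefficients + frameSize, and the helper names createFIR and destroyFIR are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout of the FIRFilter structure. */
typedef struct {
    int numCoefficients;
    int frameSize;
    float *coefficients;  /* numCoefficients filter taps */
    float *window;        /* numCoefficients history + frameSize new samples */
    float *result;        /* frameSize output samples */
} FIRFilter;

/* Allocates all buffers up front so the per-frame compute routine never
   calls malloc. Returns NULL if any allocation fails. */
FIRFilter *createFIR(const float *coefficients, int numCoefficients, int frameSize) {
    FIRFilter *fir = malloc(sizeof *fir);
    if (fir == NULL) {
        return NULL;
    }
    fir->numCoefficients = numCoefficients;
    fir->frameSize = frameSize;
    fir->coefficients = malloc((size_t)numCoefficients * sizeof(float));
    /* calloc zeroes the window, so the first frame sees silent history. */
    fir->window = calloc((size_t)(numCoefficients + frameSize), sizeof(float));
    fir->result = malloc((size_t)frameSize * sizeof(float));
    if (fir->coefficients == NULL || fir->window == NULL || fir->result == NULL) {
        free(fir->coefficients);  /* free(NULL) is a no-op */
        free(fir->window);
        free(fir->result);
        free(fir);
        return NULL;
    }
    memcpy(fir->coefficients, coefficients, (size_t)numCoefficients * sizeof(float));
    return fir;
}

/* Releases everything allocated by createFIR. */
void destroyFIR(FIRFilter *fir) {
    if (fir == NULL) {
        return;
    }
    free(fir->coefficients);
    free(fir->window);
    free(fir->result);
    free(fir);
}
```

In this design the filter is created when audio processing starts and destroyed when it stops, so the per-frame path only reads and writes memory that already exists.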

Another way to improve code performance is to reduce the logic needed to run each loop. Although the two loops above may appear fine, every access still requires extra operations to compute the array index and, from it, the memory address of the desired value. Pointer manipulation avoids part of this work, as shown in the following code block:

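    void computeFIR(FIRFilter* fir, float* input) {
        int i, j;
        float temp;
        float* windowPtr = fir->window;
        // Shift the retained history to the front of the window
        for (i = 0; i < fir->numCoefficients; i++) {
            *windowPtr = windowPtr[fir->frameSize];
            windowPtr++;
        }
        // windowPtr now points to window[numCoefficients], the slot for the first new sample
        for (i = 0; i < fir->frameSize; i++) {
            temp = 0.0f;
            *windowPtr = input[i];
            // Walk backward through the window: windowPtr[-j] is the sample j steps in the past
            for (j = 0; j < fir->numCoefficients; j++) {
                temp += windowPtr[-j] * fir->coefficients[j];
            }
            windowPtr++;
            fir->result[i] = temp;
        }
    }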

Using this technique, the base address of the array is loaded only once, before any samples are overwritten or computations take place. Because of the post-increment, on exiting the shifting loop the pointer windowPtr refers to the memory location of the first array element that receives a sample from the new frame of audio data. Using the pointer also removes some of the logic needed for array indexing. In terms of the actual instructions generated by the compiler, this version of the code has six operations in the second loop, compared with ten operations in the original version. Also note that, unlike the previous case, where the window array was accessed from low index values to high index values, the window array is now accessed in reverse order.
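If reading the window in reverse order is undesirable, one common alternative (not taken from the text) is to store the coefficients time-reversed once at setup, so that both the window pointer and the coefficient pointer increase. The sketch below assumes the window has already been filled as in computeFIR; the names reverseCoefficients and convolveForward are hypothetical. Output i still equals the sum of coefficients[j] * window[numCoefficients + i - j].

```c
/* Copies the n taps of h into hRev in reverse order (done once, at setup). */
void reverseCoefficients(const float *h, float *hRev, int n) {
    for (int j = 0; j < n; j++) {
        hRev[j] = h[n - 1 - j];
    }
}

/* Convolution with time-reversed taps: both pointers move forward, so the
   window is read from low to high addresses. */
void convolveForward(const float *window, const float *hRev, float *result,
                     int numCoefficients, int frameSize) {
    for (int i = 0; i < frameSize; i++) {
        const float *x = window + i + 1;  /* oldest sample used by output i */
        const float *h = hRev;            /* hRev[0] pairs with the oldest sample */
        float acc = 0.0f;
        for (int k = 0; k < numCoefficients; k++) {
            acc += *x++ * *h++;
        }
        result[i] = acc;
    }
}
```

Because the products are summed oldest-first rather than newest-first, floating-point results may differ from computeFIR in the last bits.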

The instructions that compute the result can be generalized into core instructions, e.g., the multiply-accumulate instruction in linear convolution. Supporting instructions, which add com-

