Neon Intrinsics

9.5 ARCHITECTURE-SPECIFIC OPTIMIZATIONS

armv7-a -DMETHOD=1"

}

Code sections can then be enabled or disabled depending on the flags set when the library is
compiled. This allows architecture-specific optimizations to be included in one main set
of source files. As shown below, the value of the METHOD flag selects between the normal and NEON code blocks:

#if METHOD == 1
/* Normal code */
#elif METHOD == 2
/* NEON code */
#endif
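As a minimal sketch of how such a flag might select an implementation, consider a dot product with a scalar path and a NEON path. The function name dot_product, the default value of METHOD, and the extra check on the compiler's NEON macro are assumptions made for illustration and are not taken from the original project.

```c
#include <stddef.h>

#ifndef METHOD
#define METHOD 1 /* assumed default: plain C path */
#endif

/* Assumption: use the NEON path only if METHOD is 2 AND the compiler
   is actually targeting NEON (older GCC defines __ARM_NEON__). */
#if METHOD == 2 && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#define USE_NEON 1
#include <arm_neon.h>
#else
#define USE_NEON 0
#endif

/* Hypothetical helper: dot product of two float arrays of length n. */
float dot_product(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;

#if USE_NEON
    /* NEON code: four multiply-accumulates per loop iteration */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    }
    /* Add the four lanes of the accumulator together */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif

    /* Normal code: also handles any elements left over by the NEON loop */
    for (; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Built with -DMETHOD=2 for an ARMv7 target with NEON enabled, the vector loop is compiled in. With any other setting only the scalar loop remains, so the same source file serves both builds. The two paths add terms in a different order, so their results may differ slightly in the last bits for general floating-point input.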

Note that this is only one approach; the Gradle build system also allows compilation of completely
separate source sets for different architectures. This eliminates the need for compiler flags
to select source code when building for different architectures. In addition, separate
compilation flags may be set for each product flavor, allowing one to fine-tune for a specific architecture.
It should be emphasized that the details of the Gradle build system discussed here may change, given the
relatively recent release of Android Studio and the continued development effort by Google
on the Android Studio IDE.

9.5.2 ARM HARDWARE CAPABILITIES

Often, significant gains in performance can be obtained by enabling compilation for the
hardware architecture version that is being used. For example, the compilation setting armeabi refers
to processors up to ARMv6. When using ARMv7, the compilation setting armeabi-v7a
provides additional instruction sets such as Thumb-2 and VFPv3 (vector floating-point). One major
disadvantage of ARMv6 is the absence of a hardware floating-point unit. This results in
floating-point operations being performed via software routines instead of dedicated hardware. ARMv7
allows hardware floating-point operations with the addition of the VFPv3 instruction set.
Another feature introduced with ARMv7 is the Advanced SIMD instruction set provided by the
NEON Media Processing Engine (NEON MPE) or coprocessor. These instructions are
similar to the MMX and SSE SIMD (single-instruction, multiple-data) instruction sets on Intel
processors.
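One way to confirm which of these capabilities a given compilation setting has enabled is to inspect the compiler's predefined macros. The sketch below assumes a GCC or Clang toolchain that defines the ARM C Language Extensions macros __ARM_ARCH, __ARM_FP, and __ARM_NEON; the three function names are hypothetical and chosen only for illustration.

```c
/* Returns the ARM architecture version being targeted (e.g., 7 for
   ARMv7), or 0 if the compiler does not define __ARM_ARCH. */
int target_arm_arch(void)
{
#if defined(__ARM_ARCH)
    return __ARM_ARCH;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler reports hardware floating-point support
   for this build, 0 otherwise. */
int built_with_hw_float(void)
{
#if defined(__ARM_FP)
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the compiler is targeting Advanced SIMD (NEON),
   0 otherwise. Older GCC versions define __ARM_NEON__ instead. */
int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}
```

Printing these three values from a test build shows whether a setting such as armeabi-v7a actually enabled hardware floating point and NEON for the library. The values describe what the compiler was told to target, not what the device running the code supports.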

The Advanced SIMD instruction set includes many functions specifically targeted at
signal processing applications. For example, in the linear convolution code, the core
multiply-accumulate statement consists of a multiply operation followed by an addition operation. This
causes a value to be rounded after both the multiplication and the addition operations are
