J. Van Hoey, Beginning x64 Assembly Programming, https://doi.org/10.1007/978-1-4842-5076-1_28

28. SSE Alignment

Jo Van Hoey, Hamme, Belgium

It’s time to start the real SSE work! Although we have had a number of chapters on SSE, we have only scratched the surface of the subject. There are hundreds of SIMD instructions (MMX, SSE, AVX), and investigating them in depth would require another book or even a series of books. Instead, we will give a number of examples so that you know where to start and can find your way in the multitude of SIMD instructions in the Intel manuals. In this chapter, we will discuss alignment, which we already covered briefly in Chapter 26.

Unaligned Example

Listing 28-1 shows how to add vectors using data that is unaligned in memory.

; sse_unaligned.asm
extern printf
section .data
;single precision
    spvector1   dd  1.1
                dd  2.2
                dd  3.3
                dd  4.4
    spvector2   dd  1.1
                dd  2.2
                dd  3.3
                dd  4.4
;double precision
    dpvector1   dq  1.1
                dq  2.2
    dpvector2   dq  3.3
                dq  4.4
    fmt1 db "Single Precision Vector 1: %f, %f, %f, %f",10,0
    fmt2 db "Single Precision Vector 2: %f, %f, %f, %f",10,0
    fmt3 db "Sum of Single Precision Vector 1 and Vector 2:"
         db " %f, %f, %f, %f",10,0
    fmt4 db "Double Precision Vector 1: %f, %f",10,0
    fmt5 db "Double Precision Vector 2: %f, %f",10,0
    fmt6 db "Sum of Double Precision Vector 1 and Vector 2:"
         db " %f, %f",10,0
section .bss
    spvector_res    resd 4
    dpvector_res    resq 4
section .text
    global main
main:
    push rbp
    mov rbp,rsp
; add 2 single precision floating point vectors
    mov rsi,spvector1
    mov rdi,fmt1
    call printspfp
    mov rsi,spvector2
    mov rdi,fmt2
    call printspfp
    movups xmm0, [spvector1]
    movups xmm1, [spvector2]
    addps xmm0,xmm1
    movups [spvector_res], xmm0
    mov rsi,spvector_res
    mov rdi,fmt3
    call printspfp
; add 2 double precision floating point vectors
    mov rsi,dpvector1
    mov rdi,fmt4
    call printdpfp
    mov rsi,dpvector2
    mov rdi,fmt5
    call printdpfp
    movupd xmm0, [dpvector1]
    movupd xmm1, [dpvector2]
    addpd xmm0,xmm1
    movupd [dpvector_res], xmm0
    mov rsi,dpvector_res
    mov rdi,fmt6
    call printdpfp
    leave
    ret
printspfp:
    push rbp
    mov rbp,rsp
    movss xmm0, [rsi]
    cvtss2sd xmm0,xmm0  ; printf expects double precision arguments
    movss xmm1, [rsi+4]
    cvtss2sd xmm1,xmm1
    movss xmm2, [rsi+8]
    cvtss2sd xmm2,xmm2
    movss xmm3, [rsi+12]
    cvtss2sd xmm3,xmm3
    mov rax,4           ; four floats
    call printf
    leave
    ret
printdpfp:
    push rbp
    mov rbp,rsp
    movsd xmm0, [rsi]
    movsd xmm1, [rsi+8]
    mov rax,2           ; two floats
    call printf
    leave
    ret

Listing 28-1. sse_unaligned.asm

The first SSE instruction is movups (which means “move unaligned packed single precision”), which copies data from memory into xmm0 and xmm1. As a result, xmm0 contains one vector with four single-precision values, and xmm1 contains one vector with four single-precision values. Then we use addps (which means “add packed single precision”) to add the two vectors; the resultant vector goes into xmm0 and is then transferred to memory. Then we print the result with the function printspfp. In the printspfp function, we copy every value from memory into xmm registers using movss (which means “move scalar single precision”). Because printf expects double-precision floating-point arguments, we convert the single-precision floating-point numbers to double precision with the instruction cvtss2sd (which means “convert scalar single to scalar double”).

Next, we add two double-precision vectors. The process is similar to adding single-precision numbers, but we use movupd and addpd for double precision. The printdpfp function for printing double-precision values is a bit simpler. We have only a two-element vector, and because we are already using double precision, we do not have to convert the values.

Figure 28-1 shows the output.

Figure 28-1. sse_unaligned.asm output

Aligned Example

Listing 28-2 shows how to add two vectors using data that is aligned in memory.

; sse_aligned.asm
extern printf
section .data
    dummy       db  13
align 16
    spvector1   dd  1.1
                dd  2.2
                dd  3.3
                dd  4.4
    spvector2   dd  1.1
                dd  2.2
                dd  3.3
                dd  4.4
    dpvector1   dq  1.1
                dq  2.2
    dpvector2   dq  3.3
                dq  4.4
    fmt1 db "Single Precision Vector 1: %f, %f, %f, %f",10,0
    fmt2 db "Single Precision Vector 2: %f, %f, %f, %f",10,0
    fmt3 db "Sum of Single Precision Vector 1 and Vector 2:"
         db " %f, %f, %f, %f",10,0
    fmt4 db "Double Precision Vector 1: %f, %f",10,0
    fmt5 db "Double Precision Vector 2: %f, %f",10,0
    fmt6 db "Sum of Double Precision Vector 1 and Vector 2:"
         db " %f, %f",10,0
section .bss
alignb 16
    spvector_res    resd 4
    dpvector_res    resq 4
section .text
    global main
main:
    push rbp
    mov rbp,rsp
; add 2 single precision floating point vectors
    mov rsi,spvector1
    mov rdi,fmt1
    call printspfp
    mov rsi,spvector2
    mov rdi,fmt2
    call printspfp
    movaps xmm0, [spvector1]
    addps xmm0, [spvector2]
    movaps [spvector_res], xmm0
    mov rsi,spvector_res
    mov rdi,fmt3
    call printspfp
; add 2 double precision floating point vectors
    mov rsi,dpvector1
    mov rdi,fmt4
    call printdpfp
    mov rsi,dpvector2
    mov rdi,fmt5
    call printdpfp
    movapd xmm0, [dpvector1]
    addpd xmm0, [dpvector2]
    movapd [dpvector_res], xmm0
    mov rsi,dpvector_res
    mov rdi,fmt6
    call printdpfp
; exit
    mov rsp,rbp
    pop rbp             ; undo the push at the beginning
    ret
printspfp:
    push rbp
    mov rbp,rsp
    movss xmm0, [rsi]
    cvtss2sd xmm0,xmm0  ;printf expects double precision argument
    movss xmm1, [rsi+4]
    cvtss2sd xmm1,xmm1
    movss xmm2, [rsi+8]
    cvtss2sd xmm2,xmm2
    movss xmm3, [rsi+12]
    cvtss2sd xmm3,xmm3
    mov rax,4           ; four floats
    call printf
    leave
    ret
printdpfp:
    push rbp
    mov rbp,rsp
    movsd xmm0, [rsi]
    movsd xmm1, [rsi+8]
    mov rax,2           ; two floats
    call printf
    leave
    ret

Listing 28-2. sse_aligned.asm

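Here we create a dummy variable, a single byte, so that the data following it would not start on a 16-byte boundary by itself. Then we use the NASM assembler directive align 16 in section .data and the directive alignb 16 in section .bss to pad up to the next 16-byte boundary. You need to add these assembler directives before each data block that needs to be aligned. In this listing one directive per section is enough, because every vector is exactly 16 bytes long, so each vector that follows an aligned vector also starts on a 16-byte boundary.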
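To see the effect of the directive, you can print the low four bits of an address, which are zero exactly when the address is a multiple of 16. The following is a minimal sketch, not one of the book's listings: the file name align_check.asm, the labels vec_unaligned and vec_aligned, and the build commands in the comments are assumptions made for illustration.

```nasm
; align_check.asm (hypothetical file name, not one of the book's listings)
; Assumed build: nasm -f elf64 align_check.asm
;                gcc -no-pie -o align_check align_check.o
extern printf
section .data
    dummy           db  13                  ; one byte, so the next label is off the boundary
    vec_unaligned   dd  1.1, 2.2, 3.3, 4.4  ; hypothetical label, no align directive
align 16                                    ; pad up to the next multiple of 16
    vec_aligned     dd  1.1, 2.2, 3.3, 4.4  ; hypothetical label
    fmt             db  "address mod 16 = %ld",10,0
section .text
    global main
main:
    push rbp
    mov rbp,rsp
    mov rsi,vec_unaligned
    and rsi,15          ; keep the low four bits: address mod 16
    mov rdi,fmt
    xor eax,eax         ; no floating-point arguments for printf
    call printf
    mov rsi,vec_aligned
    and rsi,15
    mov rdi,fmt
    xor eax,eax
    call printf
    xor eax,eax         ; return 0 from main
    leave
    ret
```

With a NASM version that raises the section alignment for align 16, the second line should report 0. The first line is expected to report a nonzero value, because the dummy byte sits in front of vec_unaligned.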

The SSE instructions are slightly different from the unaligned version. We use movaps (which means “move aligned packed single precision”) to copy data from memory into xmm0. Then we can immediately add the packed numbers from memory to the values in xmm0, because addps accepts a memory operand as long as that operand is 16-byte aligned. This is different from the unaligned version, where we had to load both vectors into xmm registers first. If we add the dummy variable to the unaligned example and try to use movaps instead of movups with a memory variable as a second operand, we risk a runtime segmentation fault. Try it!
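When you cannot be sure how a block of data is aligned, one option is to test the address at run time and pick the matching move instruction. The following is a minimal sketch under stated assumptions, not a technique taken from the book: the file name sse_checked.asm, the helper load4, and the labels vec1, vec2, and res are hypothetical, and the build commands in the comments assume the same NASM and gcc setup as the listings.

```nasm
; sse_checked.asm (hypothetical file name, not one of the book's listings)
; Assumed build: nasm -f elf64 sse_checked.asm
;                gcc -no-pie -o sse_checked sse_checked.o
extern printf
section .data
    dummy   db  13                      ; pushes vec1 off the 16-byte boundary
    vec1    dd  1.0, 2.0, 3.0, 4.0      ; deliberately left unaligned
align 16
    vec2    dd  10.0, 20.0, 30.0, 40.0  ; aligned
    fmt     db  "%f, %f, %f, %f",10,0
section .bss
alignb 16
    res     resd 4
section .text
    global main
main:
    push rbp
    mov rbp,rsp
    mov rsi,vec1
    call load4              ; xmm0 = vec1, whatever its alignment
    addps xmm0,[vec2]       ; memory operand is safe here: vec2 is aligned
    movaps [res],xmm0       ; res is aligned through alignb 16
    movss xmm0,[res]
    cvtss2sd xmm0,xmm0      ; printf expects double precision arguments
    movss xmm1,[res+4]
    cvtss2sd xmm1,xmm1
    movss xmm2,[res+8]
    cvtss2sd xmm2,xmm2
    movss xmm3,[res+12]
    cvtss2sd xmm3,xmm3
    mov rdi,fmt
    mov rax,4               ; four floating-point arguments
    call printf
    xor eax,eax             ; return 0 from main
    leave
    ret
; load4 (hypothetical helper): rsi = address of four floats, result in xmm0
load4:
    test rsi,15             ; are the low four address bits zero?
    jnz .unaligned
    movaps xmm0,[rsi]       ; aligned address: movaps is safe
    ret
.unaligned:
    movups xmm0,[rsi]       ; unaligned address: fall back to movups
    ret
```

Because vec1 follows the dummy byte, the sketch should take the movups path and print 11.000000, 22.000000, 33.000000, 44.000000. Note that the check protects only the load in load4; the memory operand of addps still has to be aligned, which is why vec2 is placed after align 16.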

The register xmm0 contains the resulting sum vector with four single-precision values, which we store in memory with movaps. Then we print the result with the function printspfp. In the printspfp function, we copy every value from memory into an xmm register. Because printf expects double-precision floating-point arguments, we convert the single-precision floating-point numbers to double precision with the instruction cvtss2sd (“convert scalar single to scalar double”).

Next, we use double-precision values. The process is similar to using single precision, but we use movapd and addpd for double-precision values.

Figure 28-2 shows the output for the aligned example.

Figure 28-2. sse_aligned.asm output

Figure 28-3 shows what happens in the unaligned example when the dummy variable is added and movaps is used with a memory variable as the second operand.

Figure 28-3. sse_unaligned.asm segmentation fault

Summary

In this chapter, you learned about the following:

Scalar data and packed data
Aligned and unaligned data
How to align data
Data movement and arithmetic instructions on packed data
How to convert between single-precision and double-precision data
