Appendix D. SPU Instruction Set Reference

Chapter 15, “SPU Assembly Language,” presented SPU assembly coding in depth, but there wasn’t enough room to cover the timing and pipeline usage of individual instructions. Here, pipeline usage refers to whether an instruction is processed by the even pipeline (0) or the odd pipeline (1). This matters because the SPU can issue two instructions in the same cycle if they are processed by different pipelines.
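To make the dual-issue rule concrete, the following Python sketch counts issue cycles for an in-order stream of pipeline numbers. It is a deliberately simplified model, not a description of the hardware: it assumes that two adjacent instructions pair whenever their pipelines differ, and it ignores instruction alignment and operand dependencies, both of which also affect dual issue on a real SPU. The function name count_issue_cycles is hypothetical.

```python
def count_issue_cycles(pipelines):
    """Count issue cycles for an in-order stream of pipeline IDs (0 = even, 1 = odd).

    Simplified model (an assumption): two adjacent instructions share a cycle
    only when they use different pipelines. Alignment and operand
    dependencies are ignored.
    """
    cycles = 0
    i = 0
    while i < len(pipelines):
        # Pair this instruction with the next one if their pipelines differ.
        if i + 1 < len(pipelines) and pipelines[i] != pipelines[i + 1]:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


# a (0), lqd (1), fm (0), shufb (1): two pairs, so two issue cycles.
mixed = count_issue_cycles([0, 1, 0, 1])
# a, ai, and, xor all use pipeline 0: no pairing, so four issue cycles.
same = count_issue_cycles([0, 0, 0, 0])
print(mixed, same)  # 2 4
```

Under this model, interleaving even-pipeline and odd-pipeline instructions halves the number of issue cycles compared with a run of instructions that all use the same pipeline.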

This appendix lists the SPU’s instructions in alphabetic order. Each entry shows the number of clock cycles required by the instruction (latency), which pipeline it uses (0 or 1), and a description of the instruction’s purpose.
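One way to put these entries to work is to store them as records and add up the latencies of instructions that depend on one another. The Python sketch below copies four rows from Table D.1; the Entry type, the TABLE dictionary, and the chain_latency function are hypothetical names chosen for illustration, and summing latencies is only a rough estimate for a strictly dependent chain.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    latency: int   # clock cycles required by the instruction
    pipeline: int  # 0 = even pipeline, 1 = odd pipeline
    purpose: str


# A few rows copied from Table D.1; the dictionary layout itself is hypothetical.
TABLE = {
    "a": Entry(2, 0, "Add words in ra and rb"),
    "fm": Entry(6, 0, "Multiply floating-point values in ra and rb"),
    "lqd": Entry(6, 1, "Load quadword from ra+index to register rt"),
    "shufb": Entry(4, 1, "Form rt from the bytes of ra and rb according to rc"),
}


def chain_latency(mnemonics):
    """Sum the latencies of instructions that each depend on the previous result."""
    return sum(TABLE[m].latency for m in mnemonics)


# Load a quadword, shuffle it, then multiply: 6 + 4 + 6 cycles.
print(chain_latency(["lqd", "shufb", "fm"]))  # 16
```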

Table D.1. SPU Instructions

| Opcode | Latency | Pipeline | Purpose |
| --- | --- | --- | --- |
| a rt,ra,rb | 2 | 0 | Add words in ra and rb |
| absdb rt,ra,rb | 4 | 0 | Subtract bytes in ra from rb, return absolute value |
| addx rt,ra,rb | 2 | 0 | Add words in ra and rb to LSB of rt |
| ah rt,ra,rb | 2 | 0 | Add halfwords in ra and rb |
| ahi rt,ra,imm | 2 | 0 | Add halfwords in ra and imm value |
| ai rt,ra,imm | 2 | 0 | Add words in ra to imm value |
| and rt,ra,rb | 2 | 0 | AND the values of ra and rb |
| andbi rt,ra,imm | 2 | 0 | AND the bytes of ra with the imm value |
| andc rt,ra,rb | 2 | 0 | AND the values of ra and the complement of rb |
| andhi rt,ra,imm | 2 | 0 | AND the halfwords of ra with the imm value |
| andi rt,ra,imm | 2 | 0 | AND the words of ra with the imm value |
| avgb rt,ra,rb | 4 | 0 | Average of bytes in ra and rb |
| bg rt,ra,rb | 2 | 0 | Generate borrow from ra and rb |
| bgx rt,ra,rb | 2 | 0 | Generate borrow from ra, rb, and the LSB of rt |
| bi ra | 4 | 1 | Branch to ra |
| bid ra | 4 | 1 | Branch to ra, disable interrupts |
| bie ra | 4 | 1 | Branch to ra, enable interrupts |
| bihnz rt,ra | 4 | 1 | Branch to ra if rt halfword doesn’t equal 0 |
| bihnzd rt,ra | 4 | 1 | Branch to ra if rt halfword doesn’t equal 0, disable interrupts |
| bihnze rt,ra | 4 | 1 | Branch to ra if rt halfword doesn’t equal 0, enable interrupts |
| bihz rt,ra | 4 | 1 | Branch to ra if rt halfword equals 0 |
| bihzd rt,ra | 4 | 1 | Branch to ra if rt halfword equals 0, disable interrupts |
| bihze rt,ra | 4 | 1 | Branch to ra if rt halfword equals 0, enable interrupts |
| binz rt,ra | 4 | 1 | Branch to ra if rt word doesn’t equal 0 |
| binzd rt,ra | 4 | 1 | Branch to ra if rt word doesn’t equal 0, disable interrupts |
| binze rt,ra | 4 | 1 | Branch to ra if rt word doesn’t equal 0, enable interrupts |
| bisl rt,ra | 4 | 1 | Branch to ra and set link |
| bisld rt,ra | 4 | 1 | Branch to ra and set link, disable interrupts |
| bisle rt,ra | 4 | 1 | Branch to ra and set link, enable interrupts |
| bisled rt,ra | 4 | 1 | Branch to ra and set link if an event occurs |
| bisledd rt,ra | 4 | 1 | Branch to ra and set link if an event occurs, disable interrupts |
| bislede rt,ra | 4 | 1 | Branch to ra and set link if an event occurs, enable interrupts |
| biz rt,ra | 4 | 1 | Branch to ra if rt word equals 0 |
| bizd rt,ra | 4 | 1 | Branch to ra if rt word equals 0, disable interrupts |
| bize rt,ra | 4 | 1 | Branch to ra if rt word equals 0, enable interrupts |
| br imm | 4 | 1 | Branch to sum of imm and PC |
| bra imm | 4 | 1 | Branch to the imm address |
| brasl rt,imm | 4 | 1 | Branch to imm and set link |
| brhnz rt,imm | 4 | 1 | Branch to sum of imm and PC if rt halfword doesn’t equal 0 |
| brhz rt,imm | 4 | 1 | Branch to sum of imm and PC if rt halfword equals 0 |
| brnz rt,imm | 4 | 1 | Branch to sum of imm and PC if rt word doesn’t equal 0 |
| brsl rt,imm | 4 | 1 | Branch to sum of imm and PC, set link |
| brz rt,imm | 4 | 1 | Branch to sum of imm and PC if rt word equals 0 |
| cbd rt,index(ra) | 4 | 1 | Create mask for byte insertion |
| cbx rt,ra,rb | 4 | 1 | Create mask for byte insertion |
| cdd rt,index(ra) | 4 | 1 | Create mask for doubleword insertion |
| cdx rt,ra,rb | 4 | 1 | Create mask for doubleword insertion |
| ceq rt,ra,rb | 2 | 0 | Compare equality of words in ra and rb |
| ceqb rt,ra,rb | 2 | 0 | Compare equality of bytes in ra and rb |
| ceqbi rt,ra,imm | 2 | 0 | Compare equality of bytes in ra to imm value |
| ceqh rt,ra,rb | 2 | 0 | Compare equality of halfwords in ra and rb |
| ceqhi rt,ra,imm | 2 | 0 | Compare equality of halfwords in ra to imm value |
| ceqi rt,ra,imm | 2 | 0 | Compare equality of words in ra to imm value |
| cflts rt,ra,imm | 7 | 0 | Convert float in ra to signed integer in rt, scaled by imm |
| cfltu rt,ra,imm | 7 | 0 | Convert float in ra to unsigned integer in rt, scaled by imm |
| cg rt,ra,rb | 2 | 0 | Generate carry vector from ra and rb |
| cgt rt,ra,rb | 2 | 0 | Return if words in ra are greater than words in rb |
| cgtb rt,ra,rb | 2 | 0 | Return if bytes in ra are greater than bytes in rb |
| cgtbi rt,ra,imm | 2 | 0 | Return if bytes in ra are greater than imm |
| cgth rt,ra,rb | 2 | 0 | Return if halfwords in ra are greater than rb |
| cgthi rt,ra,imm | 2 | 0 | Return if halfwords in ra are greater than imm |
| cgti rt,ra,imm | 2 | 0 | Return if words in ra are greater than imm |
| cgx rt,ra,rb | 2 | 0 | Generate carry vector from ra, rb, and the LSB of rt |
| chd rt,index(ra) | 4 | 1 | Create mask for halfword insertion |
| chx rt,ra,rb | 4 | 1 | Create mask for halfword insertion |
| clgt rt,ra,rb | 2 | 0 | Return if words in ra are logically greater than rb |
| clgtb rt,ra,rb | 2 | 0 | Return if bytes in ra are logically greater than rb |
| clgtbi rt,ra,imm | 2 | 0 | Return if bytes in ra are logically greater than imm |
| clgth rt,ra,rb | 2 | 0 | Return if halfwords in ra are logically greater than rb |
| clgthi rt,ra,imm | 2 | 0 | Return if halfwords in ra are logically greater than imm |
| clgti rt,ra,imm | 2 | 0 | Return if words in ra are logically greater than imm |
| clz rt,ra | 2 | 0 | Count 0s preceding the first 1 in each word of ra |
| cntb rt,ra | 4 | 0 | Count number of 1s in each byte of ra |
| csflt rt,ra,imm | 7 | 0 | Convert signed integer in ra to float in rt, scaled by imm |
| cuflt rt,ra,imm | 7 | 0 | Convert unsigned integer in ra to float in rt, scaled by imm |
| cwd rt,index(ra) | 4 | 1 | Create mask for word insertion |
| cwx rt,ra,rb | 4 | 1 | Create mask for word insertion |
| dfa[1] rt,ra,rb | 13 | 0 | Add double-precision values in ra and rb |
| dfm[1] rt,ra,rb | 13 | 0 | Multiply double-precision values in ra and rb |
| dfma[1] rt,ra,rb | 13 | 0 | Multiply double-precision values in ra and rb, add to rt |
| dfms[1] rt,ra,rb | 13 | 0 | Multiply double-precision values in ra and rb, subtract values in rt |
| dfnma[1] rt,ra,rb | 13 | 0 | Multiply double-precision values in ra and rb, add values in rt, negate result |
| dfnms[1] rt,ra,rb | 13 | 0 | Multiply double-precision values in ra and rb, subtract values in rt, negate result |
| dfs[1] rt,ra,rb | 13 | 0 | Subtract double-precision values in rb from ra |
| dsync | 4 | 1 | Ensure LS data is current before external access |
| eqv rt,ra,rb | 2 | 0 | Bitwise equivalence (XNOR): set each bit to 1 where the bits of ra and rb match, 0 otherwise |
| fa rt,ra,rb | 6 | 0 | Add single-precision values in ra and rb |
| fceq rt,ra,rb | 2 | 0 | Compare floating-point equality of ra and rb |
| fcgt rt,ra,rb | 2 | 0 | Return if floating-point ra is greater than floating-point rb |
| fcmeq rt,ra,rb | 2 | 0 | Compare floating-point equality of ra and rb magnitudes |
| fcmgt rt,ra,rb | 2 | 0 | Return if floating-point magnitude of ra is greater than that of rb |
| fesd[1] rt,ra | 13 | 0 | Convert float in ra to double in rt |
| fi rt,ra,rb | 7 | 0 | Floating-point interpolate between ra and rb |
| fm rt,ra,rb | 6 | 0 | Multiply floating-point values in ra and rb |
| fma rt,ra,rb,rc | 6 | 0 | Multiply floating-point values in ra and rb, add to values in rc |
| fms rt,ra,rb,rc | 6 | 0 | Multiply floating-point values in ra and rb, subtract values in rc |
| fnms rt,ra,rb,rc | 6 | 0 | Multiply floating-point values in ra and rb, subtract values in rc, negate result |
| frds[1] rt,ra | 13 | 0 | Round double in ra to float in rt |
| frest rt,ra | 4 | 1 | Floating-point reciprocal estimate |
| frsqest rt,ra | 4 | 1 | Floating-point reciprocal absolute square-root estimate |
| fs rt,ra,rb | 6 | 0 | Subtract floating-point values in rb from ra |
| fscrrd[1] rt | 13 | 0 | Move floating-point status and control register to rt |
| fscrwr ra | 7 | 0 | Move ra to floating-point status and control register |
| fsm rt,ra | 4 | 1 | Form select mask for words |
| fsmb rt,ra | 4 | 1 | Form select mask for bytes |
| fsmbi rt,imm | 4 | 1 | Form select mask for bytes with imm |
| fsmh rt,ra | 4 | 1 | Form select mask for halfwords |
| gb rt,ra | 4 | 1 | Concatenate LSBs of each word in ra |
| gbb rt,ra | 4 | 1 | Concatenate LSBs of each byte in ra |
| gbh rt,ra | 4 | 1 | Concatenate LSBs of each halfword in ra |
| hbr imm,ra | 15 | 1 | Hint that the branch at PC + imm will target the address in ra |
| hbra imm1,imm2 | 15 | 1 | Hint that the branch at PC + imm1 will target the address imm2 |
| hbrp | 15 | 1 | Hint for upcoming branch, prefetch |
| hbrr imm1,imm2 | 15 | 1 | Hint that the branch at PC + imm1 will target PC + imm2 |
| heq ra,rb | 2 | 0 | Halt if ra equals rb |
| heqi ra,imm | 2 | 0 | Halt if ra equals imm |
| hgt ra,rb | 2 | 0 | Halt if ra is greater than rb |
| hgti ra,imm | 2 | 0 | Halt if ra is greater than imm |
| hlgt ra,rb | 2 | 0 | Halt if ra is logically greater than rb |
| hlgti ra,imm | 2 | 0 | Halt if ra is logically greater than imm |
| il rt,imm | 2 | 0 | Load each word in rt with the imm value |
| ila rt,imm | 2 | 0 | Load imm (18-bit) into the LSBs of rt |
| ilh rt,imm | 2 | 0 | Load each halfword in rt with the imm value |
| ilhu rt,imm | 2 | 0 | Load the high halfword of each word in rt with imm |
| iohl rt,imm | 2 | 0 | OR the low halfword of each word in rt with imm |
| iret | 4 | 1 | Interrupt return |
| iretd | 4 | 1 | Interrupt return, disable interrupts |
| irete | 4 | 1 | Interrupt return, enable interrupts |
| lnop | 0 | 1 | No operation (pipeline 1) |
| lqa rt,lsa | 6 | 1 | Load quadword from lsa to register rt |
| lqd rt,index(ra) | 6 | 1 | Load quadword from ra+index to register rt |
| lqr rt,lsa | 6 | 1 | Load quadword from lsa+PC to register rt |
| lqx rt,ra,rb | 6 | 1 | Load quadword from ra+rb to register rt |
| mfspr rt,imm | 6 | 1 | Move special-purpose register imm to rt |
| mpy rt,ra,rb | 7 | 0 | Multiply low halfwords in ra and rb |
| mpya rt,ra,rb,rc | 7 | 0 | Multiply low halfwords in ra and rb, add to rc |
| mpyh rt,ra,rb | 7 | 0 | Multiply high halfwords of ra and low halfwords of rb |
| mpyhh rt,ra,rb | 7 | 0 | Multiply high halfwords of ra and rb |
| mpyhha rt,ra,rb | 7 | 0 | Multiply high halfwords of ra and rb, and add rt |
| mpyhhau rt,ra,rb | 7 | 0 | Multiply unsigned high halfwords of ra and rb, and add rt |
| mpyhhu rt,ra,rb | 7 | 0 | Multiply unsigned high halfwords of ra and rb |
| mpyi rt,ra,imm | 7 | 0 | Multiply low halfwords in ra by imm value |
| mpys rt,ra,rb | 7 | 0 | Multiply low halfwords in ra and rb and shift right |
| mpyu rt,ra,rb | 7 | 0 | Multiply unsigned low halfwords in ra and rb |
| mpyui rt,ra,imm | 7 | 0 | Multiply unsigned low halfwords in ra by imm value |
| mtspr imm,rt | 6 | 1 | Move rt to special-purpose register imm |
| nand rt,ra,rb | 2 | 0 | NAND the values of ra and rb |
| nop | 0 | 0 | No operation (pipeline 0) |
| nor rt,ra,rb | 2 | 0 | NOR the values of ra and rb |
| or rt,ra,rb | 2 | 0 | OR the values of ra and rb |
| orbi rt,ra,imm | 2 | 0 | OR the bytes of ra with the imm value |
| orc rt,ra,rb | 2 | 0 | OR the values of ra and the complement of rb |
| orhi rt,ra,imm | 2 | 0 | OR the halfwords of ra with the imm value |
| ori rt,ra,imm | 2 | 0 | OR the words of ra with the imm value |
| orx rt,ra | 4 | 1 | OR the words of ra |
| rchcnt rt,imm | 6 | 1 | Read capacity of channel imm into rt |
| rdch rt,imm | 6 | 1 | Read data from channel imm into rt |
| rot rt,ra,rb | 4 | 0 | Rotate bits in words of ra left according to rb |
| roth rt,ra,rb | 4 | 0 | Rotate bits in halfwords of ra left according to rb |
| rothi rt,ra,imm | 4 | 0 | Rotate bits in halfwords of ra left according to imm |
| rothm rt,ra,rb | 4 | 0 | Shift bits in halfwords of ra right according to -rb |
| rothmi rt,ra,imm | 4 | 0 | Shift bits in halfwords of ra right according to -imm |
| roti rt,ra,imm | 4 | 0 | Rotate bits in words of ra left according to imm |
| rotm rt,ra,rb | 4 | 0 | Shift bits in words of ra right according to -rb |
| rotma rt,ra,rb | 4 | 0 | Shift bits in words of ra right according to -rb (algebraic) |
| rotmah rt,ra,rb | 4 | 0 | Shift bits in halfwords of ra right according to -rb (algebraic) |
| rotmahi rt,ra,imm | 4 | 0 | Shift bits in halfwords of ra right according to -imm (algebraic) |
| rotmai rt,ra,imm | 4 | 0 | Shift bits in words of ra right according to -imm (algebraic) |
| rotmi rt,ra,imm | 4 | 0 | Shift bits in words of ra right according to -imm |
| rotqbi rt,ra,rb | 4 | 1 | Rotate entire ra left by bits according to rb |
| rotqbii rt,ra,imm | 4 | 1 | Rotate entire ra left by bits according to imm |
| rotqby rt,ra,rb | 4 | 1 | Rotate entire ra left by bytes according to rb |
| rotqbybi rt,ra,rb | 4 | 1 | Rotate entire ra left by bytes according to rb count |
| rotqbyi rt,ra,imm | 4 | 1 | Rotate entire ra left by bytes according to imm |
| rotqmbi rt,ra,rb | 4 | 1 | Shift entire ra right by bits according to -rb |
| rotqmbii rt,ra,imm | 4 | 1 | Shift entire ra right by bits according to -imm |
| rotqmby rt,ra,rb | 4 | 1 | Shift entire ra right by bytes according to -rb |
| rotqmbybi rt,ra,rb | 4 | 1 | Shift entire ra right by bytes according to -rb count |
| rotqmbyi rt,ra,imm | 4 | 1 | Shift entire ra right by bytes according to -imm |
| selb rt,ra,rb,rc | 2 | 0 | Select bits from ra and rb according to rc |
| sf rt,ra,rb | 2 | 0 | Subtract words in ra from rb |
| sfh rt,ra,rb | 2 | 0 | Subtract halfwords in ra from rb |
| sfhi rt,ra,imm | 2 | 0 | Subtract halfwords in ra from imm value |
| sfi rt,ra,imm | 2 | 0 | Subtract words in ra from imm value |
| sfx rt,ra,rb | 2 | 0 | Subtract words in ra from rb and LSB of rt |
| shl rt,ra,rb | 4 | 0 | Shift bits in words of ra left according to rb |
| shlh rt,ra,rb | 4 | 0 | Shift bits in halfwords of ra left according to rb |
| shlhi rt,ra,imm | 4 | 0 | Shift bits in halfwords of ra left according to imm |
| shli rt,ra,imm | 4 | 0 | Shift bits in words of ra left according to imm |
| shlqbi rt,ra,rb | 4 | 1 | Shift entire ra left by bits according to rb |
| shlqbii rt,ra,imm | 4 | 1 | Shift entire ra left by bits according to imm |
| shlqby rt,ra,rb | 4 | 1 | Shift entire ra left by bytes according to rb |
| shlqbybi rt,ra,rb | 4 | 1 | Shift entire ra left by bytes according to rb count |
| shlqbyi rt,ra,imm | 4 | 1 | Shift entire ra left by bytes according to imm |
| shufb rt,ra,rb,rc | 4 | 1 | Form rt from the bytes of ra and rb according to rc |
| stop | 4 | 1 | Halt the SPU and send stop signal to PPU |
| stopd | 4 | 1 | Halt the SPU and send signal (can be used as breakpoint) |
| stqa rt,lsa | 6 | 1 | Store quadword from register rt to lsa |
| stqd rt,index(ra) | 6 | 1 | Store quadword from register rt to ra+index |
| stqr rt,lsa | 6 | 1 | Store quadword from register rt to lsa+PC |
| stqx rt,ra,rb | 6 | 1 | Store quadword from register rt to ra+rb |
| sumb rt,ra,rb | 4 | 0 | Add bytes in ra and rb, return halfword results |
| sync | 4 | 1 | Force SPU to complete all store operations before continuing |
| syncc | 4 | 1 | Force SPU to complete store and channel operations |
| wrch imm,rt | 6 | 1 | Write data from rt into channel imm |
| xor rt,ra,rb | 2 | 0 | XOR the values of ra and rb |
| xorbi rt,ra,imm | 2 | 0 | XOR the bytes of ra with the imm value |
| xorhi rt,ra,imm | 2 | 0 | XOR the halfwords of ra with the imm value |
| xori rt,ra,imm | 2 | 0 | XOR the words of ra with the imm value |
| xsbh rt,ra | 2 | 0 | Sign extend bytes in ra to halfwords |
| xshw rt,ra | 2 | 0 | Sign extend halfwords in ra to words |
| xswd rt,ra | 2 | 0 | Sign extend words in ra to doublewords |

[1] The double-precision math instructions (dfa, dfm, dfma, dfms, dfnma, dfnms, and dfs) and certain single-precision instructions (fesd, frds, and fscrrd) cause the pipeline to stall for six cycles. This means that after the instruction completes, the pipeline has to wait at least six cycles before issuing a new instruction.
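As a rough illustration of the cost described in this footnote, the Python sketch below totals the stall cycles that a sequence of instructions would add. It assumes each footnoted instruction contributes exactly six stall cycles and that the stalls never overlap; the names DP_STALL, STALLING, and stall_cycles are hypothetical.

```python
DP_STALL = 6  # stall cycles per footnoted instruction, as stated in the footnote

# Mnemonics the footnote lists as causing the stall.
STALLING = {"dfa", "dfm", "dfma", "dfms", "dfnma", "dfnms", "dfs",
            "fesd", "frds", "fscrrd"}


def stall_cycles(mnemonics):
    """Total stall cycles added by the footnoted instructions in a sequence.

    Assumption: every such instruction adds a full, non-overlapping stall.
    """
    return DP_STALL * sum(1 for m in mnemonics if m in STALLING)


# Two double-precision operations among four instructions: 2 * 6 = 12 stall cycles.
print(stall_cycles(["dfa", "a", "dfm", "lqd"]))  # 12
```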

 
