Chapter 13. SPU Communication, Part 2: Events, Signals, Mailboxes

Direct memory access (DMA) gets all the attention, but the Cell provides other ways to communicate between processors. These methods, which include signals and mailboxes, are generally used to send control data or small, important pieces of information such as memory addresses.

Events and interrupts play a central role in this discussion. In many cases, it’s not enough to transfer data into an SPU’s local store (LS); the SPU needs to be alerted that new data has arrived. This can be accomplished with interrupts, which immediately tell the processing unit that an event has occurred.

The Cell provides three main mechanisms for data transfer: DMA, signals, and mailboxes. These topics can seem bewildering at first, but at a low level, they all represent different uses of channels. This chapter begins by describing what channels are, how they work, and the functions that access them.

SPE Channels and the Memory Flow Controller

No matter which method of SPU communication you use, the underlying process boils down to a series of reads and writes to channels. Channels are like Linux file descriptors—processes read and write data to channels by accessing their associated numbers.

The MFC (Memory Flow Controller) contains 32 channels, which the SPU accesses through three basic functions. But before I describe the functions, I want to present an analogy that (I hope) will provide some insight into how channels work.

The Scholar-Butler Analogy and Channels

Section 12.1, “The Element Interconnect Bus (EIB) and the Memory Flow Controller (MFC),” presented a brief analogy that compares the relationship between an SPU and its MFC to that between an isolated scholar and his butler. The scholar communicates with the external world by ringing a bell to summon a butler and requesting that the butler send and receive messages. This is fine for DMA, but to explain the broader topic of channels, the analogy needs to be elaborated further.

The scholar (SPU) actually has 32 bells and the servants’ quarters (MFC) provides lodging for 32 butlers (channels), each with a specific task to perform. There are two types of butlers: The first type of butler receives information (write-only) and the second type provides information (read-only). Scholar-butler communication is always one way. If the scholar wants specific data, the usual process is to request the information from a butler of the first type and hear the information from a butler of the second type.

Many butlers respond immediately, no matter how many times the scholar rings the bell (nonblocking channels). But some butlers can only be summoned a set number of times (channel capacity). If the scholar rings the bell again, it must wait until the butler finishes one of its tasks (blocking channel).

Butlers can communicate with other butlers, but not through bells. Instead, for a butler to send a message to another, it needs to have the address of the recipient butler’s lodging (MFC register). This makes it possible for scholars to speak to one another through their butlers.

SPU Channels and Channel Functions

The SPU accesses the MFC’s channels by calling one of three functions:

  • unsigned int spu_readch(chan): Returns the value of the channel

  • void spu_writech(chan, int): Writes the int value into the channel

  • unsigned int spu_readchcnt(chan): Returns the unused capacity of the channel

This last function is important. Most of the channels can be read or written to repeatedly, but some of the channels, such as those used for mailbox communication, can be accessed only when capacity is available. Before accessing one of these channels, it’s a good idea to call spu_readchcnt to make sure that the available capacity is nonzero.

The first argument in each of the functions must be one of the channel constants declared in spu_intrinsics.h. Table 13.1 lists these constants and the channels they represent. The entries marked — are reserved.

Table 13.1. SPE Channels

Ch #  Constant             R/W  B/N  Cap.  Purpose
0     SPU_RdEventStat      R    B    1     Read SPU Event Status
1     SPU_WrEventMask      W    N    1     Write Event Mask
2     SPU_WrEventAck       W    N    1     Write Event Acknowledge
3     SPU_RdSigNotify1     R    B    1     Read Signal Notification 1
4     SPU_RdSigNotify2     R    B    1     Read Signal Notification 2
5     —
6     —
7     SPU_WrDec            W    N    1     Write to SPU Decrementer
8     SPU_RdDec            R    N    1     Read SPU Decrementer
9     SPU_WrMSSyncReq      W    B    1     Write to MS Synchronization Register
10    —
11    SPU_RdEventMask      R    N    1     Read SPU Event Mask
12    SPU_RdTagMask        R    N    1     Read SPU Tag Mask
13    SPU_RdMachStat       R    N    1     Read SPU Machine Status
14    SPU_WrSRR0           W    N    1     Write to Save/Restore Register
15    SPU_RdSRR0           R    N    1     Read from Save/Restore Register
16    MFC_LSA              W    N    1     MFC Local Storage Address
17    MFC_EAH              W    N    1     MFC Effective Address High
18    MFC_EAL              W    N    1     MFC Effective Address Low
19    MFC_Size             W    N    1     MFC Transfer/List Size
20    MFC_TagID            W    N    1     MFC Command Tag ID
21    MFC_Cmd              W    B    16    MFC Command Opcode and Class ID
22    MFC_WrTagMask        W    N    1     Write to MFC Tag Group Mask
23    MFC_WrTagUpdate      W    B    1     Write to MFC Tag Update Request
24    MFC_RdTagStat        R    B    1     Read MFC Tag Group Status
25    MFC_RdListStallStat  R    B    1     Read Stall-and-Notify Tag
26    MFC_WrListStallAck   W    N    1     Write to Stall-and-Notify Tag Ack
27    MFC_RdAtomicStat     R    B    1     Read Atomic Command Status
28    SPU_WrOutMbox        W    B    1     Write to SPU Outbound Mailbox
29    SPU_RdInMbox         R    B    4     Read from SPU Inbound Mailbox
30    SPU_WrOutIntrMbox    W    B    1     Write to SPU Outbound Interrupt Mailbox
31    —

R/W: R = read-only, W = write-only. B/N: B = blocking, N = nonblocking.

For example, to obtain the value of the SPU’s Machine Status register, you’d use the following:

value = spu_readch(SPU_RdMachStat);

And to determine the capacity of the inbound mailbox channel, you’d use this:

cap = spu_readchcnt(SPU_RdInMbox);

Don’t worry about remembering all the channel names. The spu_mfcio.h header declares shorthand functions that read and write to specific channels. For example, Chapter 11, “SIMD Programming on the SPU,” discussed spu_read_decrementer and spu_write_decrementer, which keep track of time by accessing the SPU’s decrementer. These functions perform the same operations as spu_readch(SPU_RdDec) and spu_writech(SPU_WrDec, value), but are easier to remember.

Channels starting with MFC_ are generally used for DMA. Channels 16–21 are used to create a DMA transfer, and Channels 22–27 are used to monitor the transfer’s completion. As explained in Chapter 12, “SPU Communication, Part 1: Direct Memory Access (DMA),” mfc_get and mfc_put are the basic functions of DMA, but these are actually composite functions based on the MFC’s channels.

For example, when you transfer data from &buff to ea_addr with

mfc_put(&buff, ea_addr, sizeof(buff), tag, 0, 0)

you’re actually calling the following channel functions:

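spu_writech(MFC_LSA, (unsigned int)&buff);   /* LS address as a 32-bit value */
spu_writech(MFC_EAH, mfc_ea2h(ea_addr));     /* upper half of the 64-bit EA */
spu_writech(MFC_EAL, mfc_ea2l(ea_addr));     /* lower half of the 64-bit EA */
spu_writech(MFC_Size, sizeof(buff));
spu_writech(MFC_TagID, tag);
spu_writech(MFC_Cmd, MFC_CMD_WORD(0, 0, MFC_PUT_CMD));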

The first five calls can be made in any order, but writing to the MFC_Cmd channel has to be performed last. In this case, the value written to this channel identifies the Transfer Class ID (0), the Replacement Class ID (0), and the opcode that identifies the command (MFC_PUT_CMD). When this channel is updated, the MFC starts the DMA transfer by placing a command inside its queue. This is why all the transfer parameters need to be initialized before the MFC_Cmd channel is written to.
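To see why MFC_Cmd has to be written last, a small host-side model may help. This is only a sketch and not SDK code: the mfc_model type and the model_ functions are hypothetical names invented for illustration, and the real MFC does this work in hardware. The 16-entry queue depth comes from the capacity column of Table 13.1.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUE_DEPTH 16   /* capacity of MFC_Cmd in Table 13.1 */

/* One queued DMA command: a snapshot of the staged parameters. */
typedef struct {
   uint32_t lsa, eah, eal, size, tag, cmd;
} dma_command;

/* Hypothetical model of the parameter channels and the command queue. */
typedef struct {
   uint32_t lsa, eah, eal, size, tag;   /* staged by channels 16-20 */
   dma_command queue[QUEUE_DEPTH];
   int queued;
} mfc_model;

/* Models spu_readchcnt(MFC_Cmd): free entries in the queue. */
static int model_cmd_free(const mfc_model *m) {
   return QUEUE_DEPTH - m->queued;
}

/* Models a write to MFC_Cmd: whatever is staged right now is enqueued.
   Returns 0 on success, -1 where the real SPU would block. */
static int model_write_cmd(mfc_model *m, uint32_t cmd) {
   assert(m != 0);
   if (model_cmd_free(m) == 0)
      return -1;
   dma_command *c = &m->queue[m->queued++];
   c->lsa = m->lsa;    c->eah = m->eah;    c->eal = m->eal;
   c->size = m->size;  c->tag = m->tag;    c->cmd = cmd;
   return 0;
}
```

Because the queue entry is a copy taken at the moment of the command write, a parameter that is staged afterward has no effect on the transfer already queued.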

The channels whose names start with SPU_ are used for operations other than DMA. Most of this chapter is concerned with how these channels work and how they make events, mailboxes, and signals possible.

PPU Access to MFC Registers

The previous chapter explained how the PPU performs DMA by proxy: It accesses an SPE’s MFC and sends transfer commands similar to those of SPU-initiated DMA. But the PPU can’t access the MFC through channels. Instead, it reads and writes values to the MFC’s registers, which are mapped to the effective address space. This is shown in Figure 13.1.

Figure 13.1. PPU and SPU access to the Memory Flow Controller

The names and addresses of the MFC’s memory-mapped I/O (MMIO) registers are beyond the scope of this book. The Cell BE Handbook lists each register and its corresponding Synergistic Processor Element (SPE) channel, if applicable.

Events and Interrupts

In many instances, a processing element may need to respond to an external occurrence. The external occurrence is called an event, and the element’s response is called an event handler. Handling events in code is a three-step process:

  1. Select events of interest

  2. Recognize events when they occur

  3. Acknowledge events

The SPU can recognize and respond to 12 different types of events. Table 13.2 lists the mnemonic associated with each event and the external condition that causes the event to become pending.

Table 13.2. SPE Events

Event Mnemonic                     Event Condition
MFC_TAG_STATUS_UPDATE_EVENT        The status of a DMA tag group transfer is available.
MFC_LIST_STALL_NOTIFY_EVENT        The DMA list element’s Stall and Notify bit is high.
MFC_LLR_LOST_EVENT                 The reservation acquired through mfc_getllar is lost.
MFC_COMMAND_QUEUE_AVAILABLE_EVENT  The MFC command queue is no longer full.
MFC_DECREMENTER_EVENT              The decrementer value has changed from positive to negative.
MFC_OUT_MBOX_AVAILABLE_EVENT       There is space in the outgoing mailbox for further messages.
MFC_OUT_INTR_MBOX_AVAILABLE_EVENT  There is space in the outgoing interrupt mailbox for further messages.
MFC_IN_MBOX_AVAILABLE_EVENT        There is a message available in the incoming mailbox.
MFC_SIGNAL_NOTIFY_1_EVENT          An unread signal is available in the SPU_RdSigNotify1 channel.
MFC_SIGNAL_NOTIFY_2_EVENT          An unread signal is available in the SPU_RdSigNotify2 channel.
MFC_MULTI_SRC_SYNC_EVENT           All the data transfers being monitored have completed.
MFC_PRIV_ATTN_EVENT                An element (PPE) with privileged access has requested attention.

The first four events in the table are caused by DMA operations, and the fifth is produced by the SPU’s decrementer. The next five involve mailboxes and signals, which are discussed shortly. The second-to-last event deals with the MFC multisource synchronization capability, which is explained in the final section of this chapter. The last event concerns privileged PPE access, and is beyond the scope of this book.

Step 1: Select Events of Interest

The SPU tells the MFC which events it’s interested in by writing values to the Event Mask channel, SPU_WrEventMask. For example, the following code selects the event corresponding to the loss of an atomic DMA transfer’s reservation (see Section 12.7, “Atomic DMA and the Synchronization Library”):

spu_writech(SPU_WrEventMask, MFC_LLR_LOST_EVENT);

The more descriptive shorthand function in spu_mfcio.h is this:

spu_write_event_mask(MFC_LLR_LOST_EVENT);

Similarly, the event mask can be read with the spu_read_event_mask intrinsic. This book makes use of intrinsics whenever possible.

Step 2: Recognize Events as They Occur

After identifying the events it’s interested in, the SPU needs a way to monitor events and respond when they occur. The three methods of event recognition are waiting, polling, and interrupt handling. Table 13.3 lists the intrinsic functions that make these methods possible.

Table 13.3. SPU Event-Recognition Functions

Function                Purpose
spu_read_event_status   Halts the SPU until a selected event has occurred
spu_stat_event_status   Checks event status and returns immediately
spu_bisled              Branches to a function when an event has occurred
spu_ienable             Enables interrupt processing
spu_idisable            Disables interrupt processing
asm("iret")             Returns from an interrupt service routine

Waiting

If an SPU reads from Channel 0, SPU_RdEventStat, one of two things will happen. If none of the selected events have occurred, the SPU will wait. If any of the selected events have become pending, the read function will return an int whose bits correspond to the events in Table 13.2. If a bit has a value of one, the corresponding event has occurred. Bits corresponding to unselected events will remain at zero even if they’ve occurred.

The SPU intrinsic that reads event status is spu_read_event_status, and if a thread calls

spu_read_event_status()

it will block until one of the selected events becomes pending. This waiting is power efficient because the SPU switching logic stops during the wait. The SPU will respond immediately when an event is raised.

Polling

In many cases, the SPU needs to continue processing data while waiting for an event. A common method is to create a loop that checks the event’s status with each iteration. Of course, you want the status check to return immediately, so the blocking read performed by spu_read_event_status won’t be suitable.

Instead of reading the event status channel, the SPU can read the channel’s available capacity. This returns 1 if a selected event has been raised and 0 if it hasn’t. To read this capacity, the low-level channel function is as follows:

spu_readchcnt(SPU_RdEventStat)

The specific intrinsic is this:

spu_stat_event_status()

This function returns immediately, and because the return value is 1 or 0, it works well as a loop condition. For example, to perform routine() while waiting for a tag group to finish its DMA transfers, you could use code similar to the following:

spu_write_event_mask(MFC_TAG_STATUS_UPDATE_EVENT);
do {
   routine();
} while(!spu_stat_event_status());

The spu_bisled function is similar to spu_stat_event_status, and they both check for selected events without waiting. But spu_bisled does more. Instead of returning a 1 or 0, it accepts a pointer to a function and branches to that function when the selected event occurs.

For example, to branch to the event_handler function when a selected event occurs, you’d use the following code:

spu_bisled(&event_handler);

If none of the selected events have become pending, spu_bisled will do nothing and the SPU can pass to the next instruction.

Interrupt Handling

Polling frees the SPU to perform other tasks, but if the loop is large, the SPU won’t be able to respond quickly to an event’s occurrence. Using interrupts provides the best of both worlds: Like polling, it allows the SPU to continue processing while checking for events. Like blocking, the SPU reacts immediately when an event condition is raised.

The SPU’s interrupt mechanism allows you to create an interrupt service routine (ISR) that will be called when a selected event occurs. The ISR must be placed at a specific memory location; for the SPU, that location is address 0x0000 of the LS. This corresponds to the .interrupt section of an SPU ELF file, and Appendix A, “Understanding ELF Files,” explains how ELF files are structured.

Interrupt detection requires additional processing time and is disabled by default. The spu_ienable() intrinsic tells the SPU to start checking for interrupts and spu_idisable() tells it to stop.

In Listing 13.1, the SPU tells the MFC that it’s interested in the MFC_DECREMENTER_EVENT. This occurs whenever the decrementer value falls from a positive to negative value. Then it enables interrupt processing and executes a loop while the decrementer operates.

When the decrementer reaches a negative value, the SPU’s interrupt controller calls interrupt_service(), a void function with no parameters. Because this function’s attribute places it in the .interrupt section, the interrupt controller will find it in the LS at 0x0000.

Listing 13.1. SPU Interrupts: spu_interrupt.c

#include <stdio.h>
#include <spu_mfcio.h>

void interrupt_service(void)
   __attribute__ ((section (".interrupt")));

volatile unsigned int check_value = 0;

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   unsigned int mask;

   /* Enable interrupt processing */
   spu_ienable();

   /* Read the event mask */
   mask = spu_read_event_mask();

   /* Write to the event mask */
   spu_write_event_mask(MFC_DECREMENTER_EVENT);

   /* Write to the decrementer and begin countdown */
   spu_write_decrementer(10000);

   /* Loop while waiting for interrupt */
   while(check_value == 0);

   /* Restore the event mask */
   spu_write_event_mask(mask);

   return 0;
}

void interrupt_service(void) {

   int dec = spu_read_decrementer();
   printf("ISR: Decrementer = %d.\n", dec);

   /* End loop in main function */
   check_value = 1;

   /* Acknowledge event detection */
   spu_write_event_ack(MFC_DECREMENTER_EVENT);

   /* Return to main function */
   asm("iret");
}

The name of the interrupt service routine isn’t important, but its section attribute must be set equal to .interrupt. Otherwise, the processor won’t be able to find the ISR and the main loop will continue spinning.

The iret (Interrupt Return) instruction makes it possible for interrupt_service to return to the main function. iret resets the program counter to its preinterrupt position and allows normal processing to continue. You can find further examples of how iret is used in the spu_interrupt project in the SDK samples directory.

Many applications use two levels of interrupt servicing. A first-level interrupt handler (FLIH) performs preliminary routines and schedules the second-level interrupt handler (SLIH). This second handler performs long-term tasks and completes the interrupt servicing. The sample spu_interrupt project in the SDK samples directory provides an example of how these different handlers operate.

Step 3: Acknowledge Events

An event will remain pending until it is acknowledged. Acknowledgment is accomplished by writing the event number to the SPU_WrEventAck channel. This clears the event from the list of pending events and allows similar events to be received again.

The shorthand function for acknowledging events is spu_write_event_ack. In Listing 13.1, the interrupt service routine acknowledges the decrementer event with the following line:

spu_write_event_ack(MFC_DECREMENTER_EVENT);

This clears the current event and allows further decrementer events to be detected.
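The three steps (select, recognize, acknowledge) can be pictured with a small host-side model. This is a sketch under stated assumptions rather than SDK code: the event_model type, the ev_ functions, and the EV_ bit values are all hypothetical, and the real bit assignments come from spu_mfcio.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit assignments only; the real values come from spu_mfcio.h. */
#define EV_TAG_STATUS   0x1u
#define EV_DECREMENTER  0x2u
#define EV_IN_MBOX      0x4u

typedef struct {
   uint32_t mask;      /* events selected in step 1 */
   uint32_t pending;   /* events raised and not yet acknowledged */
} event_model;

/* Step 1: select events of interest. */
static void ev_write_mask(event_model *e, uint32_t mask) {
   assert(e != 0);
   e->mask = mask;
}

/* Hardware side: an event condition occurs. */
static void ev_raise(event_model *e, uint32_t event) { e->pending |= event; }

/* Step 2: only selected events are visible in the status word. */
static uint32_t ev_status(const event_model *e) { return e->pending & e->mask; }

/* Polling view: 1 if any selected event is pending, 0 otherwise. */
static int ev_stat(const event_model *e) { return ev_status(e) != 0; }

/* Step 3: acknowledging clears the event so it can be raised again. */
static void ev_ack(event_model *e, uint32_t event) { e->pending &= ~event; }
```

In this model, an unselected event can be raised without ever appearing in the status word, and a selected event keeps the polling view at 1 until it is acknowledged.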

PPE Event Handling

Chapter 7, “The SPE Runtime Management Library (libspe),” discusses the libspe library and the many steps taken by the PPU to manage operation of the SPUs. The fourth step involves creating an event handler to respond to SPU events. Section 7.2, “The SPE Management Process,” describes how to create a handler and register events in code, but it helps to see once again which event types are available:

  • SPE_EVENT_SPE_STOPPED: Responds when the SPU finishes processing

  • SPE_EVENT_IN_MBOX: Responds when the SPU inbound mailbox is able to receive data

  • SPE_EVENT_OUT_INTR_MBOX: Responds when the SPU writes data to its outbound interrupt mailbox

  • SPE_EVENT_TAG_GROUP: Responds when the DMA transfer associated with a given tag group has completed

  • SPE_EVENT_ALL_EVENTS: Responds to all the above events

Section 7.2 also provided a sample application in which the PPU receives event data when an SPU completes its processing. The following code shows how to create an event handler to keep track of the completion of a tag group’s DMA transfer:

/* Create the event handler */
spe_event_handler_ptr_t ehandler
   = spe_event_handler_create();

/* Initialize an event structure */
spe_event_unit_t event;
event.spe = ctx;
event.events = SPE_EVENT_TAG_GROUP;

/* Register the event with the handler */
spe_event_handler_register(ehandler, &event);

After the context begins running, the PPE listens for SPE events with the following line:

e_count = spe_event_wait(ehandler, events, MAX_EVENTS, 10);

Chapter 7 fully explains the functions, data structures, and constants that make PPE event processing possible.

Mailboxes

Mailboxes are the simplest way to transfer data from one processing unit (SPU or PPU) to another. Mailbox messages can hold only 4 bytes of information, so they’re commonly used to transmit control data or one half of a 64-bit effective address. For example, the PPU may use mailbox messaging to tell SPUs the address of a data buffer so that they know where to direct their DMA transfers.
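Because a mailbox message carries only 32 bits, a 64-bit effective address has to travel as two messages. The sketch below shows the arithmetic in plain C. The helper names ea_high, ea_low, and ea_join are hypothetical and chosen for illustration; they are not part of the SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit effective address. */
static uint32_t ea_high(uint64_t ea) { return (uint32_t)(ea >> 32); }

/* Lower 32 bits of a 64-bit effective address. */
static uint32_t ea_low(uint64_t ea) { return (uint32_t)(ea & 0xFFFFFFFFu); }

/* Rebuilds the address from two mailbox-sized words. */
static uint64_t ea_join(uint32_t high, uint32_t low) {
   uint64_t ea = ((uint64_t)high << 32) | low;
   assert(ea_high(ea) == high && ea_low(ea) == low);
   return ea;
}
```

A sender would place the two halves in consecutive mailbox messages, and the receiver would read both before joining them. Both sides have to agree on which half is sent first.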

SPU Mailbox Communication

The first thing you need to understand about SPU mailbox messaging is the difference between mailboxes and DMA. With DMA, an SPU tells the MFC to deliver data between specific memory locations. But when sending a mailbox message, the SPU can’t specify where the message should be sent; all it can do is write data to an outgoing channel and hope that the intended recipient finds it.

Table 13.4 lists the functions used by the SPU for mailbox messaging and the channels involved.

Table 13.4. SPU Mailbox Communication Functions

Function                 Channel            Operation
spu_write_out_mbox       SPU_WrOutMbox      Writes an int to the outgoing mailbox
spu_stat_out_mbox        SPU_WrOutMbox      Returns the available capacity of the outgoing mailbox
spu_write_out_intr_mbox  SPU_WrOutIntrMbox  Writes an int to the outgoing interrupt mailbox
spu_stat_out_intr_mbox   SPU_WrOutIntrMbox  Returns the available capacity of the outgoing interrupt mailbox
spu_read_in_mbox         SPU_RdInMbox       Reads the next int from the incoming mailbox
spu_stat_in_mbox         SPU_RdInMbox       Returns the number of messages waiting in the incoming mailbox

These functions are as simple as they look. The two write functions, spu_write_out_mbox and spu_write_out_intr_mbox, accept an int parameter that serves as the outgoing mailbox message. The rest of the functions return values corresponding to incoming messages or the capacity of mailbox channels.

Each function accesses one of three channels, and because they’re used for mailbox messaging, the channels are commonly referred to using one of three names:

  • Outgoing mailbox—Holds values to be read by external processing units

  • Outgoing interrupt mailbox—Holds values to be read by external processing units and causes an interrupt if applicable

  • Incoming mailbox—Holds values updated by external processing units

Mailbox Write

The outgoing mailbox capacity starts with a value of 1. When an SPU calls spu_write_out_mbox, the outgoing mailbox becomes full and its capacity drops to zero. If the SPU attempts a second write, it will block until the first message is read. That is, if the SPU calls

spu_write_out_mbox(0xAAAABBBB);
spu_write_out_mbox(0xCCCCDDDD);

the second write will stall the SPU until an external element reads the 0xAAAABBBB value. For this reason, it’s a good idea to check the available capacity of the outgoing mailbox before each write. The spu_stat_out_mbox function returns a 1 if there are no messages in the outgoing mailbox and 0 if the mailbox is full.

The next two functions in the table are nearly identical to the first two, but when the outgoing interrupt mailbox is written to, it can cause a hardware interrupt in the PPU. PPU interrupt handling requires privileged access, and is therefore beyond the scope of this book.

Mailbox Read

If a message is available, spu_read_in_mbox returns an int from the incoming mailbox. If all four of the incoming mailbox slots are empty, the function stalls the SPU until a message is available.
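The blocking rules for mailbox reads and writes can be modeled on the host as a small FIFO. This is a sketch, not SDK code: mbox_model and the mbox_ functions are hypothetical names, and the model returns -1 at the points where a real SPU would stall. The depths of 1 and 4 are the outgoing and incoming capacities described above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEPTH 4

/* Hypothetical FIFO model of one mailbox channel. */
typedef struct {
   uint32_t slot[MAX_DEPTH];
   int depth;   /* 1 for the outgoing mailboxes, 4 for the incoming one */
   int used;    /* messages currently held */
} mbox_model;

static void mbox_init(mbox_model *m, int depth) {
   assert(depth >= 1 && depth <= MAX_DEPTH);
   m->depth = depth;
   m->used = 0;
}

/* Writer's view, in the spirit of spu_stat_out_mbox: free slots. */
static int mbox_free(const mbox_model *m) { return m->depth - m->used; }

/* Reader's view, in the spirit of spu_stat_in_mbox: messages waiting. */
static int mbox_waiting(const mbox_model *m) { return m->used; }

/* Returns 0 on success, -1 where a real write would block. */
static int mbox_try_write(mbox_model *m, uint32_t msg) {
   if (mbox_free(m) == 0) return -1;
   m->slot[m->used++] = msg;
   return 0;
}

/* Returns 0 on success, -1 where a real read would block. */
static int mbox_try_read(mbox_model *m, uint32_t *msg) {
   if (m->used == 0) return -1;
   *msg = m->slot[0];
   for (int i = 1; i < m->used; i++)   /* shift remaining messages forward */
      m->slot[i - 1] = m->slot[i];
   m->used--;
   return 0;
}
```

With a depth of 1, the second of two back-to-back writes fails until the first message has been read, which mirrors the stall described for spu_write_out_mbox.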

Mailbox Event Processing

The SPU will block if it attempts to write to a filled outgoing mailbox or attempts to read from an empty incoming mailbox. The three spu_stat_ functions in Table 13.4 provide advance warning, but mailbox channel availability can also be monitored with events.

Table 13.2 lists three events related to mailboxes: MFC_OUT_MBOX_AVAILABLE_EVENT, MFC_OUT_INTR_MBOX_AVAILABLE_EVENT, and MFC_IN_MBOX_AVAILABLE_EVENT. These events are raised whenever the corresponding mailbox is available for accessing, and the SPU can detect and respond to these events using the methods described in the previous section.

There’s no reason to use events for waiting or polling, because the regular mailbox functions are available. However, mailbox events become important when an SPU needs to be immediately interrupted when a mailbox message is received. The code in Listing 13.2 shows how to interrupt an SPU when data appears in the incoming mailbox. The code is similar to that in Listing 13.1, but now the SPU reads the content of the mailbox and displays the content of the message.

Listing 13.2. SPU Interrupts: spu_mbox_interrupt.c

#include <stdio.h>
#include <spu_mfcio.h>

void interrupt_service(void)
   __attribute__ ((section (".interrupt")));

volatile unsigned int check_value = 0;

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   unsigned int mbox_content;

   /* Write to the event mask */
   spu_write_event_mask(MFC_IN_MBOX_AVAILABLE_EVENT);

   /* Enable interrupt processing and wait */
   spu_ienable();
   while(!check_value);

   /* Read mailbox and display result */
   mbox_content = spu_read_in_mbox();
   printf("Received data = %x\n", mbox_content);
   return 0;
}

void interrupt_service(void) {
   spu_write_event_ack(MFC_IN_MBOX_AVAILABLE_EVENT);
   check_value++;
   asm("iret");
}

The SPU’s mailbox interrupt won’t trigger until it receives a message from an external processing unit. In the spu_mbox_interrupt project, the PPU invokes a function from the SPE Runtime Management library (libspe) to send an int to the SPU’s incoming mailbox. The PPU’s mailbox communication functions are described next.

PPU Mailbox Communication

SPUs write messages to their outgoing mailboxes, but the data remains in the mailbox until it is read. The PPU is the only external processing unit that knows the addresses of the SPU’s mailboxes, so it makes sense that most mailbox communication takes place between the PPU and SPUs.

Chapter 12 explained how the PPU accesses an SPU’s DMA resources through MMIO registers. The process is similar for mailbox messaging: The PPU can access the SPU’s outgoing or incoming mailboxes by performing simple read/write operations on memory addresses. Thankfully, the libspe functions in Table 13.5 enable you to access mailboxes without knowing their mapped locations.

Table 13.5. PPU Mailbox Communication Functions

Function                  Operation
spe_out_mbox_read         Reads from the SPU’s outgoing mailbox
spe_out_mbox_status       Returns the number of messages in the SPU’s outgoing mailbox
spe_out_intr_mbox_read    Reads from the SPU’s outgoing interrupt mailbox
spe_out_intr_mbox_status  Returns the number of messages in the SPU’s outgoing interrupt mailbox
spe_in_mbox_write         Writes an int into the SPU’s incoming mailbox
spe_in_mbox_status        Returns the number of slots available for accepting messages

These functions work like the ones listed in Table 13.4, but in reverse. They read from an SPU’s outgoing mailboxes and write to an SPU’s incoming mailbox. The two spe_out_ functions that end in _status don’t return the channel’s available capacity, but return how many messages are waiting to be read. spe_in_mbox_status, by contrast, returns the number of free slots in the incoming mailbox.

All the functions require the SPU context as the first parameter. The spe_in_mbox_write and spe_out_intr_mbox_read functions also accept an unsigned int argument called behavior. This is shown in the signature of spe_in_mbox_write:

int spe_in_mbox_write(spe_context_ptr_t spe, unsigned int *mbox_data, int count, unsigned int behavior)

behavior controls the PPU’s processing while the mailbox operation is carried out. Its possible values include the following:

  • SPE_MBOX_ALL_BLOCKING: The PPU waits until all mailbox operations are completed.

  • SPE_MBOX_ANY_BLOCKING: The PPU waits until at least one mailbox operation has completed.

  • SPE_MBOX_ANY_NONBLOCKING: The PPU doesn’t wait for any mailbox operation to complete.

As an example, the PPU code in the mbox_interrupt project uses the following code to send a mailbox message to the SPU with context spe:

mbox_data[0] = 0x12345678;
if(spe_in_mbox_status(spe))
   spe_in_mbox_write(spe, mbox_data, 1, SPE_MBOX_ALL_BLOCKING);

First, the PPU checks to make sure that the SPE’s incoming mailbox isn’t full. Then it sends a one-element array to the SPU’s incoming mailbox and waits for the mailbox write to finish.

SPU-SPU Mailbox Communication

Sending mailbox messages between SPUs presents the same challenges as creating DMA transfers between SPUs. The PPU must tell each sending SPU the effective address of its recipient’s incoming mailbox. But none of the SPU’s mailbox functions send data to an external memory location, so a DMA transfer is required.

The data in this DMA transfer must be aligned so that the 32-bit message matches the alignment of the recipient’s incoming mailbox. This is a complex operation, and for this reason, SPUs usually don’t use mailbox messaging among themselves.

Signal Communication

Signals are a lot like mailbox messages: Both methods transfer 32 bits at a time, both are controlled by the MFC, and both are commonly used to send control data. But there are three important differences:

  1. Signals are commonly used for DMA notification and are sent with tag group identifiers.

  2. Signals can be sent directly from an SPU to other processing elements.

  3. Whereas mailboxes provide only one-to-one communication, signals allow both one-to-one and many-to-one communication.

This section explains how signals work and the functions used to send and receive them. It also describes how the many-to-one communication works and how to configure its operation in PPU code. This capability can be used to provide ordering and synchronization between multiple processors.

Signal Notification Channels and Read Operations

The SPU’s signaling mechanism relies on two channels: SPU_RdSigNotify1 and SPU_RdSigNotify2. These are called the signal notification channels. Both are read-only and have a capacity of one. By reading these channels, the SPU can access signals in the same way that it reads messages from an incoming mailbox.

Table 13.6 lists the functions in spu_mfcio.h that provide access to the two signal channels. Each function accepts no parameters and returns an int.

Table 13.6. SPU Signal Receive Functions

Function          Operation
spu_read_signal1  Reads an int from Signal Notification 1
spu_stat_signal1  Detects pending signals on the Signal Notification 1 channel
spu_read_signal2  Reads an int from Signal Notification 2
spu_stat_signal2  Detects pending signals on the Signal Notification 2 channel

These functions are similar to the mailbox read functions (spu_read_in_mbox and spu_stat_in_mbox), and serve essentially the same roles. spu_read_signal1 and spu_read_signal2 return an int if the corresponding channel contains a signal. If no signal is available, they force the SPU to block until a signal enters the channel.

spu_stat_signal1 and spu_stat_signal2 provide access to the notification channels without blocking. Both functions return 1 if a signal is present and a 0 otherwise. This is shown in the following code, which executes processing_loop until a signal is available, and then reads the signal:

int sig_result;
do {
   processing_loop();
} while(!spu_stat_signal2());

sig_result = spu_read_signal2();

In addition to using spu_stat_signal1 or spu_stat_signal2, the SPU can use events to receive notification when a signal arrives. Table 13.2 lists the two events related to signals: MFC_SIGNAL_NOTIFY_1_EVENT and MFC_SIGNAL_NOTIFY_2_EVENT. These events function like the MFC_IN_MBOX_AVAILABLE_EVENT used for mailbox messaging, and the process of waiting for signal events is the same. For example, the following code polls for the event corresponding to a signal arriving in notification channel 2:

spu_write_event_mask(MFC_SIGNAL_NOTIFY_2_EVENT);
do {
   routine();
} while(!spu_stat_event_status());

There is one functional difference between reading signals and mailbox messages. When a message is read from a mailbox channel, the mailbox entry is consumed. When a signal is read from one of the signal notification channels, however, the signal bits are set to 0. This makes no difference to the local SPU: it blocks if no message or signal is available and receives an int if data is available.

This difference becomes important when external elements send signals into one of the notification channels. With mailboxes, only one message can be received at a time. But, when configured for many-to-one communication, a signal channel can combine incoming data from multiple sources into a single signal value.

Sending Signals from an SPE

Unlike mailbox messaging, there are no MFC channels for outgoing signals. Instead, spu_mfcio.h provides three functions that enable an SPU to transfer 4-byte signals to other elements using a DMA-like process. Table 13.7 lists each of them and the operations they perform.

Table 13.7. SPU Signal Send Functions

Function       Operation
mfc_sndsig     Send a signal from the LS to an effective address
mfc_sndsigb    Send a signal from the LS to an effective address with a barrier
mfc_sndsigf    Send a signal from the LS to an effective address with a fence

These functions all have the same signature, and the signature for mfc_sndsig is given by the following:

mfc_sndsig(volatile void *ls, uint64_t ea, uint32_t tag, uint32_t tid, uint32_t rid)

This is similar to the DMA function mfc_put, described in the previous chapter. The only difference is that mfc_sndsig doesn’t have a size parameter: Signal data must be 4 bytes wide. For example, the following call sends a signal containing the 32 bits at sig_src to the sig_dst address:

mfc_sndsig(sig_src, sig_dst, tag, 0, 0);

The tag parameter is present because signals are sent through the MFC’s queue like regular DMA transfers. The status of a signal’s transmission can be monitored with mfc_write_tag_mask and mfc_read_tag_status_all.

The mfc_sndsig command seems simple, but dealing with the source and destination addresses can be complicated. First, the PPU must get the address of the target SPU’s signal notification register. The following code shows how this is done:

/* Get the address of the SPU's Signal Notification Memory */
sig_area = (spe_sig_notify_1_area_t *)
   spe_ps_area_get(spe, SPE_SIG_NOTIFY_1_AREA);

/* Get the address of the first signal notification register */
sig_addr = (unsigned long long)&(sig_area->SPU_Sig_Notify_1);

The problem is that sig_addr isn’t aligned on a convenient boundary. The last hexadecimal digit of sig_addr will be C because the signal notification register is located 12 bytes away from sig_area, which is aligned on a 16-byte boundary.

This means that the signal data to be transferred needs to be placed 12 bytes away from a 16-byte boundary in the LS. A simple way to do this is to create a vector unsigned int and use spu_promote to set the fourth element equal to the signal’s value. This alignment problem is shown in Figure 13.2.

Figure 13.2. Sending signal data to a signal notification register

As an example, the following code declares the vector, sets the signal data, and uses mfc_sndsig to transfer the signal to the location pointed to by argp:

volatile vector unsigned int sig_vec;
sig_vec = spu_promote(sig_data, 3);
mfc_sndsig((volatile void *)(&sig_vec)+12, argp, TAG, 0, 0);

Note that the LS address and the effective address are both 12 bytes higher than the nearest 16-byte boundary.
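
The alignment rule can be checked with ordinary integer arithmetic. The following host-side sketch is not part of the SDK: the helper names (quad_offset, element_offset, and sndsig_offsets_match) are invented for illustration. It only models the address arithmetic, showing that element 3 of a four-element vector lies 12 bytes into its quadword and that the low four bits of the LS address and the effective address must agree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not declared in spu_mfcio.h) that model
   the address arithmetic on any host compiler. */

/* Byte offset of an address within its 16-byte quadword */
static unsigned int quad_offset(uint64_t addr) {
   return (unsigned int)(addr & 0xF);
}

/* Byte offset of 32-bit element n (0-3) inside a 16-byte vector */
static unsigned int element_offset(unsigned int n) {
   assert(n < 4);
   return n * (unsigned int)sizeof(uint32_t);
}

/* Nonzero when the LS address and the effective address sit at
   the same offset within their quadwords */
static int sndsig_offsets_match(uint32_t ls_addr, uint64_t ea) {
   return quad_offset(ls_addr) == quad_offset(ea);
}
```

With these helpers, element_offset(3) evaluates to 12, which is why the listing above adds 12 to the address of sig_vec. Any LS address ending in hexadecimal C passes sndsig_offsets_match against a register address that also ends in C.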

Signal Notification Modes and Many-to-One Communication

Chapter 7 described the SPE Runtime Management library (libspe) in depth, but many aspects of the library went unexplained. This is because the SPU and its communication capabilities hadn’t been fully introduced. But now that you understand the basics of SPU signals, you’re ready to see what the signal notification modes are and how to configure them with libspe functions.

Section 7.2 lists the five possible flags of spe_context_create. At this point, you should understand what the SPE_EVENTS_ENABLE, SPE_MAP_PS, and SPE_ISOLATE flags do, but the last two have remained a mystery:

  • SPE_CFG_SIGNOTIFY1_OR: Configure Signal Notification Register 1 to operate in Logical OR mode

  • SPE_CFG_SIGNOTIFY2_OR: Configure Signal Notification Register 2 to operate in Logical OR mode

These flags control how a signal notification channel responds when receiving multiple signals. By default, the channels operate in overwrite mode: each incoming signal replaces whatever value is waiting in the channel, so if several signals arrive before the SPE reads the channel, only the most recent one is seen.

When one of the above flags is set, however, the corresponding signal channel will accept signals from multiple senders and OR them together into a single int. This is called the Logical OR mode, and provides the many-to-one communication mentioned earlier. For example, when SPE_CFG_SIGNOTIFY1_OR is used, the SPE’s Signal Notification 1 channel will accept multiple signals at once and return the ORed combination in a single signal. Listing 13.3 shows the SPU code that reads the combined signal.

Example 13.3. SPU Logical OR Mode: spu_signal_or.c

#include <stdio.h>
#include <spu_mfcio.h>

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   unsigned int sig_content;

   /* Block until the confirmation signal arrives */
   spu_read_signal2();

   /* Read the data signals and display the ORed content */
   sig_content = spu_read_signal1();
   printf("Received signal = %x\n", sig_content);
   return 0;
}

The PPU configures the SPU context in Logical OR mode with the following line:

spe = spe_context_create(SPE_CFG_SIGNOTIFY1_OR, 0);

Then it sends three signals (0x1, 0x2, and 0x4) to Signal Notification Register 1, followed by a fourth signal to Signal Notification Register 2 that alerts the SPU that the data signals have been sent. The SPU reads the signal and displays the result. When executed, the SPU displays the logically ORed combination of the three incoming signals: 0x7.
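
To make the accumulation concrete, here is a small host-side model of a signal notification register in Logical OR mode. It is only a sketch: the type or_signal_model and the functions model_signal_write and model_signal_read are hypothetical names, not part of libspe or spu_mfcio.h, and the model does not block the way a real channel read does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of one signal notification register in
   Logical OR mode. Illustration only; these names are
   hypothetical and do not exist in the SDK. */
typedef struct {
   uint32_t bits;     /* accumulated signal bits */
   int      pending;  /* nonzero if unread data is present */
} or_signal_model;

/* A sender writes a signal: the new bits are ORed into the
   register, and the register is marked as holding data even
   if the value written is 0 */
static void model_signal_write(or_signal_model *reg, uint32_t data) {
   assert(reg != NULL);
   reg->bits |= data;
   reg->pending = 1;
}

/* The SPU reads the signal: the accumulated value is returned
   and the register bits are reset to 0. (A real channel read
   would block while nothing is pending; this model does not.) */
static uint32_t model_signal_read(or_signal_model *reg) {
   uint32_t value;
   assert(reg != NULL);
   value = reg->bits;
   reg->bits = 0;
   reg->pending = 0;
   return value;
}
```

Writing 0x1, 0x2, and 0x4 to the model and then reading it once returns 0x7 and leaves the register cleared, which mirrors the result the SPU displays.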

PPU Signaling

The PPU performs signaling by accessing memory-mapped registers that correspond to an MFC’s signal notification channels. When it writes to these registers, each SPU can read the data as a regular signal.

libspe provides one function for writing to an SPE’s signal notification channel: spe_signal_write. Its signature is given by the following:

int spe_signal_write(spe_context_ptr_t spe, unsigned int signal_reg, unsigned int data)

The first and third parameters are straightforward. spe is the context of the SPU whose registers are being written to, and data is the value being written. The second parameter, signal_reg, identifies which of the two signal notification registers is being accessed. It can take either SPE_SIG_NOTIFY_REG_1 or SPE_SIG_NOTIFY_REG_2 as a value.

As an example, ppu_signal_or.c uses the following code to communicate with the SPU:

/* Send three signals to the SPU */
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, 0x1);
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, 0x2);
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, 0x4);

/* Tell the SPU to check for the signals */
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_2, 0x0);

The fourth call of spe_signal_write sends 0x0 to Signal Notification Register 2. This tells the SPU to check the value (0x7) in its MFC’s Signal Notification Channel 1. But there’s no assurance that the first three signal transfers will complete before the fourth is finished.

Signals and SPE Synchronization[1]

The PPU has coordinated the SPUs up to now, but in many cases, it’s better to make one SPU the master while the others function as slaves. This way, the PPU only needs to communicate with one SPU rather than all of them.

This master-slave functionality can be implemented by configuring one SPU to receive signals in Logical OR mode and assigning one signal bit to each of the slave SPUs. For example, to create a barrier, the master waits for all the slaves to write their bit to the ORed signal. Once all the slave SPUs are ready, the master sends them a signal telling them to continue processing.

The PPU initializes the master-slave relationship by sending each SPU two pieces of information: an identifier and an address. When each SPU starts, it checks the identifier to determine whether it’s the master or a slave. This is shown in Listing 13.4.

Example 13.4. SPU Synchronization Master: spu_sigsync.c

#include <stdio.h>
#include <spu_mfcio.h>
#include <spu_intrinsics.h>

#define TAG 3
#define SPUS 5

/* SPU initialization data */
typedef struct _control_block {
   unsigned long long ea_addr[SPUS];
   unsigned long long pad[8-SPUS];
} control_block;

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   control_block cb __attribute__ ((aligned (128)));
   volatile vector unsigned int sig_vec;
   unsigned int i, sig_data;

   if (envp != 0) {

      /* This SPU is a slave */
      sig_data = 1 << ((unsigned int)envp - 1);
      sig_vec = spu_promote(sig_data, 3);

      /* Send signal to master */
      mfc_sndsig((volatile void *)(&sig_vec)+12,
         argp, TAG, 0, 0);
      mfc_write_tag_mask(1<<TAG);
      mfc_read_tag_status_all();

      /* Receive signal from master */
      spu_read_signal1();
      printf("SPU %llu starting operation\n", envp);
   }


   else {
      /* This SPU is the master */

      /* Transfer the array from argp */
      mfc_get(&cb, argp, sizeof(cb), TAG, 0, 0);
      mfc_write_tag_mask(1<<TAG);
      mfc_read_tag_status_all();

      /* Check to make sure the slave SPUs are ready */
      unsigned int count = 0;
      unsigned int total_count = 15;   /* bits 0-3, one per slave */
      while (count < total_count)
         count |= spu_read_signal1();

      /* Tell the slave SPUs to start processing; the slaves
         only wait for a write, so the value is arbitrary */
      sig_data = 1;
      sig_vec = spu_promote(sig_data, 3);
      for(i=1; i<SPUS; i++)
         mfc_sndsig((volatile void *)(&sig_vec)+12,
            cb.ea_addr[i], TAG, 0, 0);
      mfc_write_tag_mask(1<<TAG);
      mfc_read_tag_status_all();
   }
   return 0;
}

Each SPU receives its assigned id through envp and an address through argp. The master’s address points to a structure containing all the SPUs’ signal notification register addresses. The slaves’ address points to the master’s signal notification register. The barrier starts with each slave SPU sending a signal to the master containing an identification bit. When all the signals have been received, the master sends signals to the slaves telling them to continue operation.
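
The master's bookkeeping can be tried out on any host compiler. The sketch below is a model of the bit arithmetic only, and the function names (slave_bit, all_slaves_mask, and barrier_update) are hypothetical rather than SDK functions. It assumes, as in the listing, that slave ids start at 1 and that slave n owns bit n-1 of the ORed signal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit assigned to a slave whose 1-based id arrives in envp */
static uint32_t slave_bit(unsigned int id) {
   assert(id >= 1 && id <= 32);
   return (uint32_t)1 << (id - 1);
}

/* Mask with one bit set for each of num_slaves slaves */
static uint32_t all_slaves_mask(unsigned int num_slaves) {
   assert(num_slaves >= 1 && num_slaves <= 32);
   return (num_slaves == 32) ? 0xFFFFFFFFu
                             : (((uint32_t)1 << num_slaves) - 1);
}

/* Fold one ORed signal value into the running set of arrivals
   and report whether every slave has checked in */
static int barrier_update(uint32_t *arrived, uint32_t sig,
                          unsigned int num_slaves) {
   uint32_t mask = all_slaves_mask(num_slaves);
   assert(arrived != NULL);
   *arrived |= sig;
   return (*arrived & mask) == mask;
}
```

For four slaves, all_slaves_mask(4) is 0xF, which matches the total_count value of 15 used by the master. Because the register accumulates bits, a single read may deliver several slaves' bits at once, and barrier_update handles that case without any change.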

If the PPU wants to check the status of the SPUs’ operation, it only needs to communicate with the master. Used properly, this synchronization scheme reduces bus traffic, because the PPU exchanges messages with one SPU instead of all of them.

Multiprocessor Synchronization

The previous chapter discussed barriers, fences, and the Synchronization library. This chapter has described how signals can be used to provide synchronization between SPUs. But the SDK provides additional capabilities for synchronizing multi-unit processing. The set of SPU intrinsics contains functions that order DMA commands across multiple MFCs. Other functions provide the MFC’s multisource synchronization. This section discusses both topics in detail.

Multiprocessor DMA Ordering

The fence/barrier DMA functions described in the previous chapter only order data transfers with regard to one MFC. However, spu_mfcio.h declares two functions that order DMA transfers across multiple MFCs. Table 13.8 lists both and the roles they serve.

Table 13.8. SPU DMA Synchronization Intrinsics

Function     Operation
mfc_eieio    Enforce in-order input/output between MFCs
mfc_sync     Synchronize storage access between MFCs

Note

Each of these functions accepts a tag parameter, but as of the first-generation Cell, the tag identifier has no effect.

These functions effectively form a barrier, forcing all preceding storage accesses to appear to complete before succeeding storage accesses appear to complete. The difference between the two involves the type of storage access they affect. mfc_sync provides a barrier for all storage accesses, regardless of what type of memory is being read or written to.

mfc_eieio (eieio stands for “enforce in-order I/O”) only orders two types of memory operations:

  • Reads from guarded, caching-inhibited storage with respect to other reads and writes

  • Writes to cacheable, coherent storage with respect to other writes

These specific situations are usually encountered when accessing the Cell’s I/O capability.

Both of these functions require a significant amount of resources to operate, and it’s better to use the fence/barrier mfc_get/mfc_put commands if they’ll suffice. If they won’t suffice, it’s better to use mfc_eieio than mfc_sync.

MFC Multisource Synchronization

Barriers and fences affect transfer requests entering an MFC from the SPU, but there are no commands that order the transfers entering an MFC from the rest of the Cell. That is, when a set of data transfers leaves a sending MFC, the DMA commands don’t influence when the data reaches the receiving MFC or what order they’ll be received in.

However, each MFC has a register/channel that can be used to check whether a series of transfers has completed. After a value has been written, the MFC monitors the transfers that were directed to it before the write. An external element that reads the memory-mapped register gets 1 until those transfers are completed, and 0 afterward.

For example, an external element can track transfers entering an MFC with the following two steps:

  1. Write a value to the MFC’s multisource synchronization register.

  2. Wait until the value of the register changes to zero.

When the register value becomes zero, the external unit can be sure that the transferred data has reached its destination. To determine the address of the multisource synchronization register, the PPU needs to use code similar to the following:

/* Get the address of the SPU's multisource
   synchronization region */
mss_area = (spe_mssync_area_t *)
   spe_ps_area_get(spe, SPE_MSSYNC_AREA);

/* Get the address of the multisource synchronization
   register in the region */
mss_addr = (unsigned long long)&(mss_area->MFC_MSSync);
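
The two-step protocol described above can be modeled on a host machine to show which transfers a request covers. This is a simplified sketch, not hardware-accurate code: the mss_model type and the model_ functions are hypothetical names, and the model assumes that inbound transfers complete oldest-first.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of the multisource synchronization register.
   Names are hypothetical; real hardware does this tracking. */
typedef struct {
   unsigned int in_flight;  /* inbound transfers not yet complete */
   unsigned int tracked;    /* transfers covered by the last request */
} mss_model;

/* A new inbound transfer is directed at this MFC */
static void model_transfer_start(mss_model *m) {
   m->in_flight++;
}

/* Step 1: write to the register. Only transfers that are
   already in flight at this moment are tracked. */
static void model_mss_write(mss_model *m) {
   m->tracked = m->in_flight;
}

/* The oldest inbound transfer completes */
static void model_transfer_complete(mss_model *m) {
   assert(m->in_flight > 0);
   m->in_flight--;
   if (m->tracked > 0)
      m->tracked--;
}

/* Step 2: read the register. The result is 1 while tracked
   transfers remain and 0 once they have all completed. */
static uint32_t model_mss_read(const mss_model *m) {
   return m->tracked > 0 ? 1u : 0u;
}
```

In this model, a transfer that starts after model_mss_write is called does not keep the register at 1, which corresponds to the MFC monitoring only the transfers that preceded the write.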

The local SPU can also take advantage of multisource synchronization by accessing the MFC_WrMSSyncReq channel, also called the MFC multisource synchronization channel. spu_mfcio.h declares two functions for this purpose:

  • mfc_write_multi_src_sync_request: Sends a value to the MSS channel and tells the MFC to start tracking previously created data transfers.

  • mfc_stat_multi_src_sync_request: Checks the value of the MSS channel. A zero means the transfers haven’t finished, and a one means they have.

Unlike the signal/mailbox functions, neither function forces the calling SPU to block.

Conclusion

The Cell provides many capabilities for interprocessor communication that go beyond DMA. Mailboxes and signals transport data in small sizes, which makes them suitable for delivering control information and memory address data.

An SPU interfaces with the rest of the Cell by reading from or writing to one of its MFC’s 32 channels. There are only three basic channel functions: readch reads the channel value, writech sets the channel value, and readchcnt returns the channel’s available capacity. But spu_mfcio.h declares many specific intrinsic functions that are easier to use and remember.

With events, the SPU can respond to external conditions affecting the MFC’s channels. There are three ways to monitor events: blocking, polling, and interrupt servicing. Blocking forces the SPU to wait for an event, and polling allows the SPU to execute a processing loop until the event occurs. Interrupt servicing is more involved than either, but allows for immediate response without halting the SPU’s operation.

Mailboxes provide a simple means of sending 4-byte messages between processors. The three main channels used for mailbox communication are commonly called the incoming mailbox, the outgoing mailbox, and the outgoing interrupt mailbox. Because neither outgoing mailbox transports data outside the Cell, mailboxes are usually used to send messages between the PPU and SPUs.

Signals are like mailboxes, and deliver 32 bits of data at a time. However, SPUs have two read-only channels for signals and no write-only channels. Instead, SPUs use a DMA-like mechanism to send signals to one another. If configured properly by the PPU, an SPU can receive signals in Logical OR mode. This means that incoming signals are ORed together into a single value.

Signals can be used to provide synchronization between elements, but this chapter has discussed two further methods. First, mfc_eieio and mfc_sync can order external accesses across multiple MFCs. Second, the multisource synchronization register/channel allows monitoring for transfer completion at the receiving MFC.



[1] Thanks go to Daniel Brokenshire for suggesting this method.
