Direct memory access (DMA) gets all the attention, but the Cell provides other ways to communicate between processors. These methods, which include signals and mailboxes, are generally used to send control data or small, important pieces of information such as memory addresses.
Events and interrupts play a central role in this discussion. In many cases, it’s not enough to transfer data into an SPU’s local store (LS); the SPU needs to be alerted that new data has arrived. This can be accomplished with interrupts, which immediately tell the processing unit that an event has occurred.
The Cell provides three main mechanisms for data transfer: DMA, signals, and mailboxes. These topics can seem bewildering at first, but at a low level, they all represent different uses of channels. This chapter begins by describing what channels are, how they work, and the functions that access them.
No matter which method of SPU communication you use, the underlying process boils down to a series of reads and writes to channels. Channels are like Linux file descriptors—processes read and write data to channels by accessing their associated numbers.
The MFC (Memory Flow Controller) contains 32 channels, which can be interfaced with three basic functions. But before I describe the functions, I want to present an analogy that (I hope) will provide some insight into how channels work.
Section 12.1, “The Element Interconnect Bus (EIB) and the Memory Flow Controller (MFC),” presented a brief analogy that compares the relationship between an SPU and its MFC to that between an isolated scholar and his butler. The scholar communicates with the external world by ringing a bell to summon a butler and requesting that the butler send and receive messages. This is fine for DMA, but to explain the broader topic of channels, the analogy needs to be elaborated further.
The scholar (SPU) actually has 32 bells and the servants’ quarters (MFC) provides lodging for 32 butlers (channels), each with a specific task to perform. There are two types of butlers: The first type of butler receives information (write-only) and the second type provides information (read-only). Scholar-butler communication is always one way. If the scholar wants specific data, the usual process is to request the information from a butler of the first type and hear the information from a butler of the second type.
Many butlers respond immediately, no matter how many times the scholar rings the bell (nonblocking channels). But some butlers can only be summoned a set number of times (channel capacity). If the scholar rings the bell again, it must wait until the butler finishes one of its tasks (blocking channel).
Butlers can communicate with other butlers, but not through bells. Instead, for a butler to send a message to another, it needs to have the address of the recipient butler’s lodging (MFC register). This makes it possible for scholars to speak to one another through their butlers.
The SPU accesses the MFC’s channels by calling one of three functions:
unsigned int spu_readch(chan): Returns the value of the channel
void spu_writech(chan, int): Writes the int value into the channel
unsigned int spu_readchcnt(chan): Returns the unused capacity of the channel
This last function is important. Most of the channels can be read or written to repeatedly, but some of the channels, such as those used for mailbox communication, can be accessed only when capacity is available. Before accessing one of these channels, it’s a good idea to call spu_readchcnt to make sure that the available capacity is nonzero.
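The check-before-access habit can be tried out on an ordinary host machine. The following is only a sketch of the idea, not SPU code: sim_channel, sim_readchcnt, and sim_try_writech are hypothetical names invented for this illustration, and the model reports a full channel instead of stalling the way a real blocking channel would.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of one write channel; not the SPU hardware. */
typedef struct {
    uint32_t slots[4];   /* pending values, oldest first       */
    unsigned capacity;   /* total entries the channel can hold */
    unsigned used;       /* entries currently occupied         */
} sim_channel;

/* Counterpart of spu_readchcnt for a write channel: free entries left. */
unsigned sim_readchcnt(const sim_channel *ch)
{
    return ch->capacity - ch->used;
}

/* Write only when the count is nonzero; a real blocking channel would stall. */
int sim_try_writech(sim_channel *ch, uint32_t value)
{
    assert(ch->capacity <= 4);
    if (sim_readchcnt(ch) == 0)
        return 0;                 /* would block: report it instead */
    ch->slots[ch->used++] = value;
    return 1;
}
```

With a capacity of 1, the first call to sim_try_writech succeeds and the second returns 0, which is the situation in which a real SPU would wait.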
The first argument in each of the functions must be one of the channel constants declared in spu_intrinsics.h. Table 13.1 lists these constants and the channels they represent. The entries marked — are reserved.
Table 13.1. SPE Channels
Ch # | Constant | R/W | Blocking/Nonblocking | Cap. | Purpose |
---|---|---|---|---|---|
0 | SPU_RdEventStat | R | B | 1 | Read SPU Event Status |
1 | SPU_WrEventMask | W | N | 1 | Write Event Mask |
2 | SPU_WrEventAck | W | N | 1 | Write Event Acknowledge |
3 | SPU_RdSigNotify1 | R | B | 1 | Read Signal Notification 1 |
4 | SPU_RdSigNotify2 | R | B | 1 | Read Signal Notification 2 |
5 | — | | | | |
6 | — | | | | |
7 | SPU_WrDec | W | N | 1 | Write to SPU Decrementer |
8 | SPU_RdDec | R | N | 1 | Read SPU Decrementer |
9 | MFC_WrMSSyncReq | W | B | 1 | Write to MS Synchronization Register |
10 | — | | | | |
11 | SPU_RdEventMask | R | N | 1 | Read SPU Event Mask |
12 | MFC_RdTagMask | R | N | 1 | Read SPU Tag Mask |
13 | SPU_RdMachStat | R | N | 1 | Read SPU Machine Status |
14 | SPU_WrSRR0 | W | N | 1 | Write to Save/Restore Register |
15 | SPU_RdSRR0 | R | N | 1 | Read from Save/Restore Register |
16 | MFC_LSA | W | N | 1 | MFC Local Storage Address |
17 | MFC_EAH | W | N | 1 | MFC Effective Address High |
18 | MFC_EAL | W | N | 1 | MFC Effective Address Low |
19 | MFC_Size | W | N | 1 | MFC Transfer/List Size |
20 | MFC_TagID | W | N | 1 | MFC Command Tag ID |
21 | MFC_Cmd | W | B | 16 | MFC Class ID and Command Opcode |
22 | MFC_WrTagMask | W | N | 1 | Write to MFC Tag Group Mask |
23 | MFC_WrTagUpdate | W | B | 1 | Write to MFC Tag Update Request |
24 | MFC_RdTagStat | R | B | 1 | Read MFC Tag Group Status |
25 | MFC_RdListStallStat | R | B | 1 | Read Stall-and-Notify Tag |
26 | MFC_WrListStallAck | W | N | 1 | Write to Stall-and-Notify Tag Ack |
27 | MFC_RdAtomicStat | R | B | 1 | Read Atomic Command Status |
28 | SPU_WrOutMbox | W | B | 1 | Write to SPU Outbound Mailbox |
29 | SPU_RdInMbox | R | B | 4 | Read from SPU Inbound Mailbox |
30 | SPU_WrOutIntrMbox | W | B | 1 | Write to SPU Outbound Interrupt Mailbox |
31 | — | | | | |
For example, to obtain the value of the SPU’s Machine Status register, you’d use the following:
value = spu_readch(SPU_RdMachStat);
And to determine the capacity of the inbound mailbox channel, you’d use this:
cap = spu_readchcnt(SPU_RdInMbox);
Don’t worry about remembering all the channel names. The spu_mfcio.h header declares shorthand functions that read and write to specific channels. For example, Chapter 11, “SIMD Programming on the SPU,” discussed spu_read_decrementer and spu_write_decrementer, which keep track of time by accessing the SPU’s decrementer. These functions perform the same operations as spu_readch(SPU_RdDec) and spu_writech(SPU_WrDec, value), but are easier to remember.
Channels starting with MFC_ are generally used for DMA. Channels 16–21 are used to create a DMA transfer, and Channels 22–27 are used to monitor the transfer’s completion. As explained in Chapter 12, “SPU Communication, Part 1: Direct Memory Access (DMA),” mfc_get and mfc_put are the basic functions of DMA, but these are actually composite functions based on the MFC’s channels.
For example, when you transfer data from &buff to ea_addr with
mfc_put(&buff, ea_addr, sizeof(buff), tag, 0, 0);
you’re actually calling the following channel functions:
spu_writech(MFC_LSA, (unsigned int)&buff);
spu_writech(MFC_EAH, mfc_ea2h(ea_addr));
spu_writech(MFC_EAL, mfc_ea2l(ea_addr));
spu_writech(MFC_Size, sizeof(buff));
spu_writech(MFC_TagID, tag);
spu_writech(MFC_Cmd, MFC_CMD_WORD(0, 0, MFC_PUT_CMD));
The first five calls can be made in any order, but the write to the MFC_Cmd channel has to be performed last. In this case, the value written to this channel identifies the Transfer Class ID (0), the Replacement Class ID (0), and the opcode that identifies the command (MFC_PUT_CMD). When this channel is updated, the MFC starts the DMA transfer by placing a command inside its queue. This is why all the transfer parameters need to be initialized before the MFC_Cmd channel is written to.
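To see what values end up in the address and command channels, it can help to compute them on a host machine. The sketch below is an illustration only: ea_high, ea_low, and cmd_word are hypothetical helpers, and the bit layout used for the command word (class IDs in the two upper bytes, opcode in the low 16 bits) is an assumption made for this example rather than something taken from the header.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of an effective address (the value written to MFC_EAH). */
uint32_t ea_high(uint64_t ea) { return (uint32_t)(ea >> 32); }

/* Lower 32 bits of an effective address (the value written to MFC_EAL). */
uint32_t ea_low(uint64_t ea) { return (uint32_t)(ea & 0xFFFFFFFFu); }

/* ASSUMED layout: class IDs in the two upper bytes, opcode in the low 16 bits. */
uint32_t cmd_word(uint32_t tid, uint32_t rid, uint32_t opcode)
{
    assert(tid <= 0xFF && rid <= 0xFF && opcode <= 0xFFFF);
    return (tid << 24) | (rid << 16) | opcode;
}
```

For an effective address of 0x0000000112345678, ea_high gives 1 and ea_low gives 0x12345678. Because all three fields travel in a single word, one write to the command channel is enough to launch the transfer.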
The channels whose names start with SPU_ are used for operations other than DMA. Most of this chapter is concerned with how these channels work and how they make events, mailboxes, and signals possible.
The previous chapter explained how the PPU performs DMA by proxy: It accesses an SPE’s MFC and sends transfer commands similar to those of SPU-initiated DMA. But the PPU can’t access the MFC through channels. Instead, it reads and writes values to the MFC’s registers, which are mapped to the effective address space. This is shown in Figure 13.1.
The names and addresses of the MFC’s memory-mapped I/O (MMIO) registers are beyond the scope of this book. The Cell BE Handbook lists each register and its corresponding Synergistic Processor Element (SPE) channel, if applicable.
In many instances, a processing element may need to respond to an external occurrence. The external occurrence is called an event, and the element’s response is called an event handler. Handling events in code is a three-step process: selecting the events of interest, recognizing them when they occur, and acknowledging them.
The SPU can recognize and respond to 12 different types of events. Table 13.2 lists the mnemonic associated with each event and the external condition that causes the event to become pending.
Table 13.2. SPE Events
Event Mnemonic | Event Condition |
---|---|
MFC_TAG_STATUS_UPDATE_EVENT | The status of a DMA tag group transfer is available. |
MFC_LIST_STALL_NOTIFY_EVENT | The DMA list element’s Stall and Notify bit is high. |
MFC_LLR_LOST_EVENT | The reservation acquired through an atomic DMA command has been lost. |
MFC_COMMAND_QUEUE_AVAILABLE_EVENT | The MFC command queue is no longer full. |
MFC_DECREMENTER_EVENT | The decrementer value has changed from positive to negative. |
MFC_OUT_MBOX_AVAILABLE_EVENT | There is space in the outgoing mailbox for further messages. |
MFC_OUT_INTR_MBOX_AVAILABLE_EVENT | There is space in the outgoing interrupt mailbox for further messages. |
MFC_IN_MBOX_AVAILABLE_EVENT | There is a message available in the incoming mailbox. |
MFC_SIGNAL_NOTIFY_1_EVENT | An unread signal is available in the SPU_RdSigNotify1 channel. |
MFC_SIGNAL_NOTIFY_2_EVENT | An unread signal is available in the SPU_RdSigNotify2 channel. |
MFC_MULTI_SRC_SYNC_EVENT | All the data transfers being monitored have completed. |
MFC_PRIV_ATTN_EVENT | An element (PPE) with privileged access has requested attention. |
The first four events in the table are caused by DMA operations, and the fifth is produced by the SPU’s decrementer. The next five involve mailboxes and signals, which are discussed shortly. The second-to-last event deals with the MFC multisource synchronization capability, which is explained in the final section of this chapter. The last event concerns privileged PPE access, and is beyond the scope of this book.
The SPU tells the MFC which events it’s interested in by writing values to the Event Mask channel, SPU_WrEventMask. For example, the following code selects the event corresponding to the loss of an atomic DMA transfer’s reservation (see Section 12.7, “Atomic DMA and the Synchronization Library”):
spu_writech(SPU_WrEventMask, MFC_LLR_LOST_EVENT);
The more descriptive intrinsic function in spu_mfcio.h is this:
spu_write_event_mask(MFC_LLR_LOST_EVENT);
Similarly, the event mask can be read with the spu_read_event_mask intrinsic. This book makes use of intrinsics whenever possible.
After identifying the events it’s interested in, the SPU needs a way to monitor events and respond when they occur. The three methods of event recognition are waiting, polling, and interrupt handling. Table 13.3 lists the intrinsic functions that make these methods possible.
Table 13.3. SPU Event-Recognition Functions
Function | Purpose |
---|---|
spu_read_event_status | Halts the SPU until a selected event has occurred |
spu_stat_event_status | Checks event status and returns immediately |
spu_bisled | Branches to a function when an event has occurred |
spu_ienable | Enables interrupt processing |
spu_idisable | Disables interrupt processing |
iret | Returns from an interrupt service routine |
If an SPU reads from Channel 0, SPU_RdEventStat, one of two things will happen. If none of the selected events have occurred, the SPU will wait. If any of the selected events have become pending, the read function will return an int whose bits correspond to the events in Table 13.2. If a bit has a value of one, the corresponding event has occurred. Bits corresponding to unselected events will remain at zero even if they’ve occurred.
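The filtering described here is a bitwise AND between the pending events and the event mask. The following host-side sketch shows the idea under stated assumptions: the SIM_ constants are made-up bit values for illustration (real code takes the event constants from spu_mfcio.h), and sim_select_event and sim_event_status are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; real code takes them from spu_mfcio.h. */
#define SIM_TAG_STATUS_EVENT  0x1u
#define SIM_DECREMENTER_EVENT 0x2u
#define SIM_IN_MBOX_EVENT     0x4u

/* Add one event to the mask; each event must be a single bit. */
uint32_t sim_select_event(uint32_t mask, uint32_t event)
{
    assert(event != 0 && (event & (event - 1)) == 0);
    return mask | event;
}

/* Value a status read would return: pending events filtered by the mask. */
uint32_t sim_event_status(uint32_t pending, uint32_t mask)
{
    return pending & mask;
}
```

If only the decrementer event is selected, a pending tag-status event never shows up in the result, no matter how long it has been pending.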
The SPU intrinsic that reads event status is spu_read_event_status, and if a thread calls
spu_read_event_status();
it will block until one of the selected events becomes pending. This waiting is power efficient because the SPU switching logic stops during the wait. The SPU will respond immediately when an event is raised.
In many cases, the SPU needs to continue processing data while waiting for an event. A common method is to create a loop that checks for the event’s status with each iteration. Of course, you want the status check to return immediately, so spu_read_event_status won’t be suitable.
Instead of reading the event status channel, the SPU can read the channel’s available capacity. This returns 1 if a selected event has been raised and 0 if it hasn’t. To read this capacity, the low-level channel function is as follows:
spu_readchcnt(SPU_RdEventStat)
The specific intrinsic is this:
spu_stat_event_status()
This function returns immediately, and because the return value is 1 or 0, it works well as a loop condition. For example, to perform routine() while waiting for a tag group to finish its DMA transfers, you could use code similar to the following:
spu_write_event_mask(MFC_TAG_STATUS_UPDATE_EVENT);
do {
   routine();
} while(!spu_stat_event_status());
The spu_bisled function is similar to spu_stat_event_status, and they both check for selected events without waiting. But spu_bisled does more. Instead of returning a 1 or 0, it accepts a pointer to a function and branches to that function when the selected event occurs.
For example, to branch to the event_handler function when a selected event occurs, you’d use the following code:
spu_bisled(&event_handler);
If none of the selected events have become pending, spu_bisled will do nothing and the SPU can pass to the next instruction.
Polling frees the SPU to perform other tasks, but if the loop is large, the SPU won’t be able to respond quickly to an event’s occurrence. Using interrupts provides the best of both worlds: Like polling, it allows the SPU to continue processing while checking for events. Like blocking, the SPU reacts immediately when an event condition is raised.
The SPU’s interrupt mechanism allows you to create an interrupt service routine (ISR) that will be called when a selected event occurs. The ISR must be placed at a specific memory location, and for the SPU, the ISR must be at 0x0000. This corresponds to the .interrupt section of an SPU ELF file, and Appendix A, “Understanding ELF Files,” explains how ELF files are structured.
Interrupt detection requires additional processing time and is disabled by default. The spu_ienable() intrinsic tells the SPU to start checking for interrupts and spu_idisable() tells it to stop.
In Listing 13.1, the SPU tells the MFC that it’s interested in the MFC_DECREMENTER_EVENT. This occurs whenever the decrementer value falls from a positive to a negative value. Then it enables interrupt processing and executes a loop while the decrementer operates.
When the decrementer reaches a negative value, the SPU’s interrupt controller calls interrupt_service(), a void function with no parameters. Because this function’s attribute places it in the .interrupt section, the interrupt controller will find it in the LS at 0x0000.
Example 13.1. SPU Interrupts: spu_interrupt.c
#include <stdio.h>
#include <spu_mfcio.h>

/* Place the ISR in the .interrupt section */
void interrupt_service(void)
   __attribute__ ((section (".interrupt")));

volatile unsigned int check_value = 0;

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   unsigned int mask;

   /* Enable interrupt processing */
   spu_ienable();

   /* Read the event mask */
   mask = spu_read_event_mask();

   /* Write to the event mask */
   spu_write_event_mask(MFC_DECREMENTER_EVENT);

   /* Write to the decrementer and begin countdown */
   spu_write_decrementer(10000);

   /* Loop while waiting for interrupt */
   while(check_value == 0);

   /* Restore the event mask */
   spu_write_event_mask(mask);
   return 0;
}

void interrupt_service(void) {

   int dec = spu_read_decrementer();
   printf("ISR: Decrementer = %d.\n", dec);

   /* End loop in main function */
   check_value = 1;

   /* Acknowledge event detection */
   spu_write_event_ack(MFC_DECREMENTER_EVENT);

   /* Return to main function */
   asm("iret");
}
The name of the interrupt service routine isn’t important, but its section attribute must be set equal to .interrupt. Otherwise, the processor won’t be able to find the ISR and the main loop will continue spinning.
The iret (Interrupt Return) instruction makes it possible for interrupt_service to return to the main function. iret resets the program counter to its preinterrupt position and allows normal processing to continue. You can find further examples of how iret is used in the spu_interrupt project in the SDK samples directory.
Many applications use two levels of interrupt servicing. A first-level interrupt handler (FLIH) performs preliminary routines and schedules the second-level interrupt handler (SLIH). This handler performs long-term tasks and completes the interrupt servicing. The sample spu_interrupt project in the SDK samples directory provides an example of how these different handlers operate.
An event will remain pending until it is acknowledged. Acknowledgment is accomplished by writing the event number to the SPU_WrEventAck channel. This clears the event from the list of pending events and allows similar events to be received again.
The shorthand function for acknowledging events is spu_write_event_ack. In Listing 13.1, the interrupt service routine acknowledges the decrementer event with the following line:
spu_write_event_ack(MFC_DECREMENTER_EVENT);
This clears the current event and allows further decrementer events to be detected.
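The life cycle of a pending event can be modeled with a single word of bits. This is a rough host-side sketch, not the hardware’s implementation: sim_events, sim_raise, and sim_ack are hypothetical names, and the event values passed in are arbitrary single bits chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the pending-event word; not the hardware register. */
typedef struct { uint32_t pending; } sim_events;

/* An external condition raises one or more event bits. */
void sim_raise(sim_events *e, uint32_t event_bits)
{
    e->pending |= event_bits;
}

/* Counterpart of writing SPU_WrEventAck: clear only the acknowledged bits. */
void sim_ack(sim_events *e, uint32_t event_bits)
{
    assert(event_bits != 0);      /* acknowledging nothing is a caller bug */
    e->pending &= ~event_bits;
}
```

Acknowledging one event leaves the others pending, so a handler that services several events has to acknowledge each one it handles.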
Chapter 7, “The SPE Runtime Management Library (libspe),” discusses the libspe library and the many steps taken by the PPU to manage operation of the SPUs. The fourth step involves creating an event handler to respond to SPU events. Section 7.2, “The SPE Management Process,” describes how to create a handler and register events in code, but it helps to see once again which event types are available:
SPE_EVENT_SPE_STOPPED: Responds when the SPU finishes processing
SPE_EVENT_IN_MBOX: Responds when the SPU inbound mailbox is able to receive data
SPE_EVENT_OUT_INTR_MBOX: Responds when data is available in the SPU’s outbound interrupting mailbox
SPE_EVENT_TAG_GROUP: Responds when the DMA transfer associated with a given tag group has completed
SPE_EVENT_ALL_EVENTS: Responds to all the above events
Section 7.2 also provided a sample application in which the PPU receives event data when an SPU completes its processing. The following code shows how to create an event handler to keep track of the completion of a tag group’s DMA transfer:
/* Create the event handler */
spe_event_handler_ptr_t ehandler = spe_event_handler_create();

/* Initialize an event structure */
spe_event_unit_t event;
event.spe = ctx;
event.events = SPE_EVENT_TAG_GROUP;

/* Register the event with the handler */
spe_event_handler_register(ehandler, &event);
After the context begins running, the PPE listens for SPE events with the following line:
e_count = spe_event_wait(ehandler, events, MAX_EVENTS, 10);
Chapter 7 fully explains the functions, data structures, and constants that make PPE event processing possible.
Mailboxes are the simplest way to transfer data from one processing unit (SPU or PPU) to another. Mailbox messages can hold only 4 bytes of information, so they’re commonly used to transmit control data or one half of a 64-bit effective address. For example, the PPU may use mailbox messaging to tell SPUs the address of a data buffer so that they know where to direct their DMA transfers.
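Because each message carries only 32 bits, a 64-bit effective address has to travel as two messages and be reassembled by the receiver. The sketch below shows the arithmetic on a host machine. The helper names are hypothetical, and sending the high word first is simply a convention assumed for this example; sender and receiver only need to agree on the order.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit effective address into two 32-bit messages, high word first. */
void ea_to_messages(uint64_t ea, uint32_t msg[2])
{
    msg[0] = (uint32_t)(ea >> 32);
    msg[1] = (uint32_t)(ea & 0xFFFFFFFFu);
}

/* Rebuild the address on the receiving side from the same two messages. */
uint64_t messages_to_ea(const uint32_t msg[2])
{
    assert(msg != 0);
    return ((uint64_t)msg[0] << 32) | msg[1];
}
```

On the Cell, each element of msg would be delivered with one mailbox write, and the receiving SPU would perform two mailbox reads before starting its DMA transfer.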
The first thing you need to understand about SPU mailbox messaging is the difference between mailboxes and DMA. With DMA, an SPU tells the MFC to deliver data between specific memory locations. But when sending a mailbox message, the SPU can’t specify where the message should be sent; all it can do is write data to an outgoing channel and hope that the intended recipient finds it.
Table 13.4 lists the functions used by the SPU for mailbox messaging and the channels involved.
Table 13.4. SPU Mailbox Communication Functions
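Function | Channel | Operation |
---|---|---|
spu_write_out_mbox | SPU_WrOutMbox | Writes an int to the outgoing mailbox |
spu_stat_out_mbox | SPU_WrOutMbox | Returns the available capacity of the outgoing mailbox |
spu_write_out_intr_mbox | SPU_WrOutIntrMbox | Writes an int to the outgoing interrupt mailbox |
spu_stat_out_intr_mbox | SPU_WrOutIntrMbox | Returns the available capacity of the outgoing interrupt mailbox |
spu_read_in_mbox | SPU_RdInMbox | Reads the int at the front of the incoming mailbox |
spu_stat_in_mbox | SPU_RdInMbox | Returns the available capacity of the incoming mailbox |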
These functions are as simple as they look. The two write functions, spu_write_out_mbox and spu_write_out_intr_mbox, accept an int parameter that serves as the outgoing mailbox message. The rest of the functions return values corresponding to incoming messages or the capacity of mailbox channels.
Each function accesses one of three channels, and because they’re used for mailbox messaging, the channels are commonly referred to using one of three names:
Outgoing mailbox—Holds values to be read by external processing units
Outgoing interrupt mailbox—Holds values to be read by external processing units and causes an interrupt if applicable
Incoming mailbox—Holds values updated by external processing units
The outgoing mailbox capacity starts with a value of 1. When an SPU calls spu_write_out_mbox, the outgoing mailbox is full and its capacity drops to zero. If the SPU attempts a second write, it will block until the first message is read. That is, if the SPU calls
spu_write_out_mbox(0xAAAABBBB);
spu_write_out_mbox(0xCCCCDDDD);
the second write will stall the SPU until an external element reads the 0xAAAABBBB value. For this reason, it’s a good idea to check the available capacity of the outgoing mailbox before each write. The spu_stat_out_mbox function returns 1 if there are no messages in the outgoing mailbox and 0 if the mailbox is full.
The next two functions in the table are nearly identical to the first two, but when the outgoing interrupt mailbox is written to, it can cause a hardware interrupt in the PPU. PPU interrupt handling requires privileged access, and is therefore beyond the scope of this book.
If a message is available, spu_read_in_mbox returns an int from the incoming mailbox. If all four of the incoming mailbox slots are empty, the function stalls the SPU until a message is available.
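The behavior of the four-slot incoming mailbox can be pictured as a small first-in, first-out queue. The following host-side model is only a sketch: sim_in_mbox and its three functions are hypothetical stand-ins, and where the real read would stall the SPU, the model simply asserts that a message is present.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_IN_MBOX_SLOTS 4   /* incoming mailbox depth stated in the text */

/* Hypothetical host-side FIFO standing in for the incoming mailbox. */
typedef struct {
    uint32_t slot[SIM_IN_MBOX_SLOTS];
    unsigned head;    /* index of the oldest unread message */
    unsigned count;   /* number of unread messages          */
} sim_in_mbox;

/* Sender side: returns 1 if the message was queued, 0 if the mailbox is full. */
int sim_mbox_post(sim_in_mbox *m, uint32_t value)
{
    if (m->count == SIM_IN_MBOX_SLOTS)
        return 0;
    m->slot[(m->head + m->count) % SIM_IN_MBOX_SLOTS] = value;
    m->count++;
    return 1;
}

/* Counterpart of spu_stat_in_mbox: number of messages waiting to be read. */
unsigned sim_stat_in_mbox(const sim_in_mbox *m) { return m->count; }

/* Counterpart of spu_read_in_mbox; the caller checks the count first,
   because the real read would stall on an empty mailbox. */
uint32_t sim_read_in_mbox(sim_in_mbox *m)
{
    assert(m->count > 0);
    uint32_t value = m->slot[m->head];
    m->head = (m->head + 1) % SIM_IN_MBOX_SLOTS;
    m->count--;
    return value;
}
```

Messages come out in the order they were posted, and a fifth post is refused until the reader has consumed at least one entry.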
The SPU will block if it attempts to write to a filled outgoing mailbox or attempts to read from an empty incoming mailbox. The three spu_stat_ functions in Table 13.4 provide advance warning, but mailbox channel availability can also be monitored with events.
Table 13.2 lists three events related to mailboxes: MFC_OUT_MBOX_AVAILABLE_EVENT, MFC_OUT_INTR_MBOX_AVAILABLE_EVENT, and MFC_IN_MBOX_AVAILABLE_EVENT. These events are raised whenever the corresponding mailbox is available for accessing, and the SPU can detect and respond to these events using the methods described in the previous section.
There’s no reason to use events for waiting or polling, because the regular mailbox functions are available. However, mailbox events become important when an SPU needs to be immediately interrupted when a mailbox message is received. The code in Listing 13.2 shows how to interrupt an SPU when data appears in the incoming mailbox. The code is similar to that in Listing 13.1, but now the SPU reads the content of the mailbox and displays the content of the message.
Example 13.2. SPU Interrupts: spu_mbox_interrupt.c
#include <stdio.h>
#include <spu_mfcio.h>

/* Place the ISR in the .interrupt section */
void interrupt_service(void)
   __attribute__ ((section (".interrupt")));

volatile unsigned int check_value = 0;

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   unsigned int mbox_content;

   /* Write to the event mask */
   spu_write_event_mask(MFC_IN_MBOX_AVAILABLE_EVENT);

   /* Enable interrupt processing and wait */
   spu_ienable();
   while(!check_value);

   /* Read mailbox and display result */
   mbox_content = spu_read_in_mbox();
   printf("Received data = %x\n", mbox_content);
   return 0;
}

void interrupt_service(void) {
   /* Acknowledge the mailbox event and end the loop in main */
   spu_write_event_ack(MFC_IN_MBOX_AVAILABLE_EVENT);
   check_value++;
   asm("iret");
}
The SPU’s mailbox interrupt won’t trigger until it receives a message from an external processing unit. In the spu_mbox_interrupt project, the PPU invokes a function from the SPE Runtime Management library (libspe) to send an int to the SPU’s incoming mailbox. The PPU’s mailbox communication functions are described next.
SPUs write messages to their outgoing mailboxes, but the data remains in the mailbox until it is read. The PPU is the only external processing unit that knows the addresses of the SPU’s mailboxes, so it makes sense that most mailbox communication takes place between the PPU and SPUs.
Chapter 12 explained how the PPU accesses an SPU’s DMA resources through MMIO registers. The process is similar for mailbox messaging: The PPU can access the SPU’s outgoing or incoming mailboxes by performing simple read/write operations on memory addresses. Thankfully, the libspe functions in Table 13.5 enable you to access mailboxes without knowing their mapped locations.
Table 13.5. PPU Mailbox Communication Functions
Function | Operation |
---|---|
spe_out_mbox_read | Reads from the SPU’s outgoing mailbox |
spe_out_mbox_status | Returns the number of messages in the SPU’s outgoing mailbox |
spe_out_intr_mbox_read | Reads from the SPU’s outgoing interrupt mailbox |
spe_out_intr_mbox_status | Returns the number of messages in the SPU’s outgoing interrupt mailbox |
spe_in_mbox_write | Writes an int into the SPU’s incoming mailbox |
spe_in_mbox_status | Returns the number of slots available for accepting messages |
These functions work like the ones listed in Table 13.4, but in reverse. They read from an SPU’s outgoing mailboxes and write to an SPU’s incoming mailbox. The two _status functions for the outgoing mailboxes don’t return the channel’s available capacity, but return how much of the channel is occupied with messages.
All the functions require the SPU context as the first parameter. The spe_in_mbox_write and spe_out_intr_mbox_read functions also accept an unsigned int argument called behavior. This is shown in the signature of spe_in_mbox_write:
int spe_in_mbox_write(spe_context_ptr_t spe, unsigned int *mbox_data, int count, unsigned int behavior)
behavior controls the PPU’s processing while the mailbox operation is carried out. Its possible values include the following:
SPE_MBOX_ALL_BLOCKING: The PPU waits until all mailbox operations are completed.
SPE_MBOX_ANY_BLOCKING: The PPU waits until at least one mailbox operation has completed.
SPE_MBOX_ANY_NONBLOCKING: The PPU doesn’t wait for any mailbox operation to complete.
As an example, the PPU code in the mbox_interrupt project uses the following code to send a mailbox message to the SPU with context spe:
mbox_data[0] = 0x12345678;
if(spe_in_mbox_status(spe))
   spe_in_mbox_write(spe, mbox_data, 1, SPE_MBOX_ALL_BLOCKING);
First, the PPU checks to make sure that the SPE’s incoming mailbox isn’t full. Then it sends a one-element array to the SPU’s incoming mailbox and waits for the mailbox write to finish.
Sending mailbox messages between SPUs presents the same challenges as creating DMA transfers between SPUs. The PPU must tell each sending SPU the effective address of its recipient’s incoming mailbox. But none of the SPU’s mailbox functions send data to an external memory location, so a DMA transfer is required.
The data in this DMA transfer must be aligned so that the 32-bit message matches the alignment of the recipient’s incoming mailbox. This is a complex operation, and for this reason, SPUs usually don’t use mailbox messaging among themselves.
Signals are a lot like mailbox messages: Both methods transfer 32 bits at a time, both are controlled by the MFC, and both are commonly used to send control data. But there are three important differences:
Signals are commonly used for DMA notification and are sent with tag group identifiers.
Signals can be sent directly from an SPU to other processing elements.
Whereas mailboxes provide only one-to-one communication, signals allow both one-to-one and many-to-one communication.
This section explains how signals work and the functions used to send and receive them. It also describes how the many-to-one communication works and how to configure its operation in PPU code. This capability can be used to provide ordering and synchronization between multiple processors.
The SPU’s signaling mechanism relies on two channels: SPU_RdSigNotify1 and SPU_RdSigNotify2. These are called the signal notification channels. Both are read-only and have a capacity of one. By reading these channels, the SPU can access signals in the same way that it reads messages from an incoming mailbox.
Table 13.6 lists the functions in spu_mfcio.h that provide access to the two signal channels. Each function accepts no parameters and returns an int.
These functions are similar to the mailbox read functions (spu_read_in_mbox and spu_stat_in_mbox), and serve essentially the same roles. spu_read_signal1 and spu_read_signal2 return an int if the corresponding channel contains a signal. If no signal is available, they force the SPU to block until a signal enters the channel.
spu_stat_signal1 and spu_stat_signal2 provide access to the notification channels without blocking. Both functions return 1 if a signal is present and 0 otherwise. This is shown in the following code, which executes processing_loop until a signal is available, and then reads the signal:
int sig_result;
do {
   processing_loop();
} while(!spu_stat_signal2());
sig_result = spu_read_signal2();
In addition to using spu_stat_signal1 or spu_stat_signal2, the SPU can use events to receive notification when a signal arrives. Table 13.2 lists the two events related to signals: MFC_SIGNAL_NOTIFY_1_EVENT and MFC_SIGNAL_NOTIFY_2_EVENT. These events function like the MFC_IN_MBOX_AVAILABLE_EVENT used for mailbox messaging, and the process of waiting for signal events is the same. For example, the following code polls for the event corresponding to a signal arriving in notification channel 2:
spu_write_event_mask(MFC_SIGNAL_NOTIFY_2_EVENT);
do {
   routine();
} while(!spu_stat_event_status());
There is one functional difference between reading signals and mailbox messages. When a message is read from a mailbox channel, the mailbox entry is consumed. When a signal is read from one of the signal notification channels, however, the signal bits are set to 0. This means nothing to the local SPU: it blocks if no message or signal is available and receives an int if data is available.
This difference becomes important when external elements send signals into one of the notification channels. With mailboxes, only one message can be received at a time. But, when configured for many-to-one communication, a signal channel can combine incoming data from multiple sources into a single signal value.
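The combining behavior is easy to picture with a small host-side model. This is a sketch of the idea only, under the assumption that the channel has been configured for many-to-one operation: sim_signal, sim_signal_send, and sim_signal_read are hypothetical names, and the model asserts where a real read of an empty channel would block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a signal channel that combines incoming values. */
typedef struct {
    uint32_t bits;      /* accumulated signal bits              */
    int      has_data;  /* nonzero once any sender has written  */
} sim_signal;

/* Each sender's value is ORed into whatever is already there. */
void sim_signal_send(sim_signal *s, uint32_t value)
{
    s->bits |= value;
    s->has_data = 1;
}

/* Reading returns the combined value and resets the bits to 0. */
uint32_t sim_signal_read(sim_signal *s)
{
    assert(s->has_data);          /* the real read would block instead */
    uint32_t value = s->bits;
    s->bits = 0;
    s->has_data = 0;
    return value;
}
```

If one sender writes 0x1 and another writes 0x4 before the SPU reads, the single read returns 0x5. Giving each sender its own bit lets the receiver tell which senders have reported in.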
Unlike mailbox messaging, there are no MFC channels for outgoing signals. Instead, spu_mfcio.h provides three functions that enable an SPU to transfer 4-byte signals to other elements using a DMA-like process. Table 13.7 lists each of them and the operations they perform.
These functions all have the same signature, and the signature for mfc_sndsig is given by the following:
mfc_sndsig(volatile void *ls, uint64_t ea, uint32_t tag, uint32_t tid, uint32_t rid)
This is similar to the DMA function mfc_put, described in the previous chapter. The only difference is that mfc_sndsig doesn’t have a size parameter: Signal data must be 4 bytes wide. For example, the following function sends a signal containing the 32 bits at sig_src to the sig_dst address:
mfc_sndsig(sig_src, sig_dst, tag, 0, 0);
The tag parameter is present because signals are sent through the MFC’s queue like regular DMA transfers. The status of a signal’s transmission can be monitored with mfc_write_tag_mask and mfc_read_tag_status_all.
The mfc_sndsig command seems simple, but dealing with the source and destination addresses can be complicated. First, the PPU must get the address of the target SPU’s signal notification register. The following code shows how this is done:
/* Get the address of the SPU's Signal Notification Memory */
sig_area = (spe_sig_notify_1_area_t *)
   spe_ps_area_get(spe, SPE_SIG_NOTIFY_1_AREA);

/* Get the address of the first signal notification register */
sig_addr = (unsigned long long)&(sig_area->SPU_Sig_Notify_1);
The problem is that sig_addr
isn’t aligned on a convenient boundary. The last hexadecimal digit of sig_addr
will be C because the signal notification register is located 12 bytes away from sig_area
, which is aligned on a 16-byte boundary.
This means that the signal data to be transferred needs to be placed 12 bytes away from a 16-byte boundary in the LS. A simple way to do this is to create a vector unsigned int and use spu_promote to set the fourth element equal to the signal’s value. This alignment problem is shown in Figure 13.2.
As an example, the following code declares the vector, sets the signal data, and uses mfc_sndsig to transfer the signal to the location pointed to by argp:
volatile vector unsigned int sig_vec;
sig_vec = spu_promote(sig_data, 3);
mfc_sndsig((volatile void *)(&sig_vec)+12, argp, TAG, 0, 0);
Note that the LS address and the effective address are both 12 bytes higher than the nearest 16-byte boundary.
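The offset arithmetic can be checked on any machine. The following is a minimal sketch in plain C, not SPU code: the helper names quadword_offset, signal_offsets_match, and fourth_word_address are hypothetical, and an ordinary array of four 32-bit words stands in for the vector unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of an address from the previous 16-byte boundary (its low 4 bits). */
static unsigned int quadword_offset(uint64_t addr)
{
    return (unsigned int)(addr & 0xF);
}

/* Hypothetical check: nonzero when the local store address and the
   effective address sit at the same offset within their quadwords. */
static int signal_offsets_match(uint64_t ls_addr, uint64_t ea)
{
    return quadword_offset(ls_addr) == quadword_offset(ea);
}

/* Address of the fourth 32-bit word of a 16-byte-aligned quadword,
   the element that spu_promote(value, 3) fills in the text above. */
static uint64_t fourth_word_address(const uint32_t *quadword)
{
    assert(((uintptr_t)quadword & 0xF) == 0);   /* must start on a boundary */
    return (uint64_t)(uintptr_t)&quadword[3];
}
```

Passing a 16-byte-aligned array to fourth_word_address yields an address whose last hexadecimal digit is C, which matches any signal notification register address that also ends in C.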
Chapter 7 described the SPE Runtime Management library (libspe) in depth, but many aspects of the library went unexplained. This is because the SPU and its communication capabilities hadn’t been fully introduced. But now that you understand the basics of SPU signals, you’re ready to see what the signal notification modes are and how to configure them with libspe functions.
Section 7.2 lists the five possible flags of spe_context_create. At this point, you should understand what the SPE_EVENTS_ENABLE, SPE_MAP_PS, and SPE_ISOLATE flags do, but the last two have remained a mystery:
SPE_CFG_SIGNOTIFY1_OR: Configure Signal Notification Register 1 to operate in Logical OR mode
SPE_CFG_SIGNOTIFY2_OR: Configure Signal Notification Register 2 to operate in Logical OR mode
These flags control how a signal notification channel responds when receiving multiple signals. By default, these channels operate in overwrite mode: each write replaces the value already held in the register, so if several signals arrive before the SPE reads the channel, only the most recent one is delivered.
When one of the above flags is set, however, the corresponding signal channel will accept signals from multiple senders and OR them together into a single int. This is called the Logical OR mode, and provides the many-to-one communication mentioned earlier. For example, when SPE_CFG_SIGNOTIFY1_OR is used, the SPE’s Signal Notification 1 channel will accept multiple signals at once and return the ORed combination in a single signal. Listing 13.3 shows how this is configured in code.
Example 13.3. SPU Logical OR Mode: spu_signal_or.c
#include <stdio.h>
#include <spu_mfcio.h>

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   unsigned int sig_content;

   /* Block until the confirmation signal arrives */
   spu_read_signal2();

   /* Read the data signals and display the ORed content */
   sig_content = spu_read_signal1();
   printf("Received signal = %x\n", sig_content);

   return 0;
}
The PPU configures the SPU context in Logical OR mode with the following line:
spe = spe_context_create(SPE_CFG_SIGNOTIFY1_OR, 0);
Then it sends three signals (0x1, 0x2, and 0x4) to Signal Notification Register 1, followed by a signal to Signal Notification Register 2 that alerts the SPU that the data signals have been sent. The SPU reads the signal and displays the result. When executed, the SPU displays the logically ORed combination of the three incoming signals: 0x7.
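The accumulation can be modeled without Cell hardware. The sketch below is a host-side stand-in written in plain C, not the hardware interface: or_signal_register, or_signal_write, and or_signal_read are hypothetical names, and unlike a real channel read, the model's read never blocks.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for a signal notification register in Logical OR mode. */
typedef struct {
    uint32_t value;   /* bits accumulated since the last read */
} or_signal_register;

/* A sender's write is ORed into whatever is already waiting. */
static void or_signal_write(or_signal_register *reg, uint32_t data)
{
    assert(reg != 0);
    reg->value |= data;
}

/* Reading returns the accumulated bits and clears the register. */
static uint32_t or_signal_read(or_signal_register *reg)
{
    uint32_t received;

    assert(reg != 0);
    received = reg->value;
    reg->value = 0;
    return received;
}
```

Writing 0x1, 0x2, and 0x4 to the model and then reading it once returns 0x7, the same combination the SPU displays.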
The PPU performs signaling by accessing memory-mapped registers that correspond to an MFC’s signal notification channels. When it writes to these registers, each SPU can read the data as a regular signal.
libspe provides one function for writing to an SPE’s signal notification channel: spe_signal_write. Its signature is given by the following:
int spe_signal_write(spe_context_ptr_t spe, unsigned int signal_reg, unsigned int data)
The first and third parameters are straightforward. spe is the context of the SPU whose registers are being written to, and data is the value being written. The second parameter, signal_reg, identifies which of the two signal notification registers is being accessed. It can take either SPE_SIG_NOTIFY_REG_1 or SPE_SIG_NOTIFY_REG_2 as a value.
As an example, ppu_signal_or.c uses the following code to communicate with the SPU:
/* Send three signals to the SPU */
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, 0x1);
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, 0x2);
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_1, 0x4);

/* Tell the SPU to check for the signals */
spe_signal_write(spe, SPE_SIG_NOTIFY_REG_2, 0x0);
The fourth call of spe_signal_write sends 0x0 to Signal Notification Register 2. This tells the SPU to check the value (0x7) in its MFC’s Signal Notification Channel 1. But there’s no assurance that the first three signal transfers will complete before the fourth one does, so the SPU may read Channel 1 before every bit has arrived.
The PPU has coordinated the SPUs up to now, but in many cases, it’s better to make one SPU the master while the others function as slaves. This way, the PPU only needs to communicate with one SPU rather than all of them.
This master-slave functionality can be implemented by configuring one SPU to receive signals in Logical OR mode and assigning one signal bit to each of the slave SPUs. For example, to create a barrier, the master waits for all the slaves to write their bit to the ORed signal. Once all the slave SPUs are ready, the master sends them a signal telling them to continue processing.
The PPU initializes the master-slave relationship by sending each SPU two pieces of information: an identifier and an address. When each SPU starts, it checks the identifier to determine whether it’s the master or a slave. This is shown in Listing 13.4.
Example 13.4. SPU Synchronization Master: spu_sigsync.c
#include <stdio.h>
#include <spu_mfcio.h>
#include <spu_intrinsics.h>

#define TAG 3
#define SPUS 5

/* SPU initialization data */
typedef struct _control_block {
   unsigned long long ea_addr[SPUS];
   unsigned long long pad[8-SPUS];
} control_block;

int main(unsigned long long speid,
         unsigned long long argp,
         unsigned long long envp) {

   control_block cb __attribute__ ((aligned (128)));
   volatile vector unsigned int sig_vec;
   unsigned int i, sig_data;

   if (envp != 0) {
      /* This SPU is a slave: its signal bit is set by its id */
      sig_data = 1 << ((unsigned int)envp - 1);
      sig_vec = spu_promote(sig_data, 3);

      /* Send signal to master */
      mfc_sndsig((volatile void *)(&sig_vec)+12, argp, TAG, 0, 0);
      mfc_write_tag_mask(1<<TAG);
      mfc_read_tag_status_all();

      /* Receive signal from master */
      spu_read_signal1();
      printf("SPU %llu starting operation\n", envp);
   }
   else {
      /* This SPU is the master */

      /* Transfer the array from argp */
      mfc_get(&cb, argp, sizeof(cb), TAG, 0, 0);
      mfc_write_tag_mask(1<<TAG);
      mfc_read_tag_status_all();

      /* Check to make sure the slave SPUs are ready.
         Start from 0 so that every slave's bit must arrive;
         15 (0xF) is one bit for each of the four slaves. */
      unsigned int count = 0;
      unsigned int total_count = 15;
      while (count < total_count)
         count |= spu_read_signal1();

      /* Tell the slave SPUs to start processing.
         The slaves ignore the value, so any value will do. */
      sig_data = 1;
      sig_vec = spu_promote(sig_data, 3);
      for (i=1; i<SPUS; i++)
         mfc_sndsig((volatile void *)(&sig_vec)+12,
            cb.ea_addr[i], TAG, 0, 0);
      mfc_write_tag_mask(1<<TAG);
      mfc_read_tag_status_all();
   }
   return 0;
}
Each SPU receives its assigned id through envp and an address through argp. The master’s address points to a structure containing all the SPUs’ signal notification register addresses. The slaves’ address points to the master’s signal notification register. The barrier starts with each slave SPU sending a signal to the master containing an identification bit. When all the signals have been received, the master sends signals to the slaves telling them to continue operation.
If the PPU wants to check the status of the SPUs’ operation, it only needs to communicate with the master. Used properly, this synchronization scheme can provide very efficient use of bus bandwidth.
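The bit bookkeeping behind this barrier can be tried out on an ordinary machine. The following is a minimal sketch in plain C, assuming slave identifiers start at 1 as in the listing. The helpers slave_bit, all_slaves_mask, and reads_until_ready are hypothetical names, and an array of values stands in for successive reads of the master's signal channel.

```c
#include <assert.h>
#include <stdint.h>

/* Bit contributed by the slave whose identifier is id (1, 2, 3, ...). */
static uint32_t slave_bit(unsigned int id)
{
    assert(id >= 1 && id <= 32);
    return (uint32_t)1 << (id - 1);
}

/* Value the master expects once every slave has reported. */
static uint32_t all_slaves_mask(unsigned int num_slaves)
{
    assert(num_slaves >= 1 && num_slaves <= 32);
    if (num_slaves == 32)
        return 0xFFFFFFFFu;
    return ((uint32_t)1 << num_slaves) - 1;
}

/* Folds successive signal reads together, as the master's loop does.
   Returns how many reads it took to see every slave, or 0 if the
   supplied reads never complete the mask. */
static unsigned int reads_until_ready(const uint32_t *reads,
                                      unsigned int num_reads,
                                      unsigned int num_slaves)
{
    uint32_t seen = 0;
    uint32_t expected = all_slaves_mask(num_slaves);
    unsigned int i;

    for (i = 0; i < num_reads; i++) {
        seen |= reads[i];
        if ((seen & expected) == expected)
            return i + 1;
    }
    return 0;
}
```

With four slaves, all_slaves_mask(4) gives 0xF, the same value as total_count in the listing. Deriving the mask from the slave count keeps the two in step if the number of SPUs changes.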
The previous chapter discussed barriers, fences, and the Synchronization library. This chapter has described how signals can be used to provide synchronization between SPUs. But the SDK provides additional capabilities for synchronizing multi-unit processing. The set of SPU intrinsics contains functions that order DMA commands across multiple MFCs. Other functions provide the MFC’s multisource synchronization. This section discusses both topics in detail.
The fence/barrier DMA functions described in the previous chapter only order data transfers with regard to one MFC. However, spu_mfcio.h declares two functions that order DMA transfers across multiple MFCs. Table 13.8 lists both and the roles they serve.
Each of these functions accepts a tag parameter, but as of the first-generation Cell, the tag identifier has no effect.
These functions effectively form a barrier, forcing all preceding storage accesses to appear to complete before succeeding storage accesses appear to complete. The difference between the two involves the type of storage access they affect. The first function, mfc_sync, provides a barrier for all storage accesses, regardless of what type of memory is being read or written to.
mfc_eieio (eieio stands for “enforce in-order execution of I/O”) only orders two types of memory operations:
Reads from guarded, caching-inhibited storage with respect to other reads and writes
Writes to cacheable, coherent storage with respect to other writes
These specific situations are usually encountered when accessing the Cell’s I/O capability.
Both of these functions require a significant amount of resources to operate, and it’s better to use the fence/barrier mfc_get/mfc_put commands if they’ll suffice. If they won’t suffice, it’s better to use mfc_eieio than mfc_sync.
Barriers and fences affect transfer requests entering an MFC from the SPU, but there are no commands that order the transfers entering an MFC from the rest of the Cell. That is, when a set of data transfers leaves a sending MFC, the DMA commands don’t influence when the data reaches the receiving MFC or what order they’ll be received in.
However, each MFC has a register/channel that can be used to check whether a series of transfers has completed. After a value has been written, the MFC will monitor the transfers that were directed to it before the value was written. The memory-mapped register will return 1 until those transfers are completed, and then it will return 0.
For example, an external element can track transfers entering an MFC with the following two steps:
Write a value to the MFC’s multisource synchronization register.
Wait until the value of the register changes to zero.
When the register value becomes zero, the external unit can be sure that the transferred data has reached its destination. To determine the address of the multisource synchronization register, the PPU needs to use code similar to the following:
/* Get the address of the SPU's multisource synchronization region */
mss_area = (spe_mssync_area_t *)
   spe_ps_area_get(spe, SPE_MSSYNC_AREA);

/* Get the address of the multisource synchronization register in the region */
mss_addr = (unsigned long long)&(mss_area->MFC_MSSync);
The local SPU can also take advantage of multisource synchronization by accessing the MFC_WrMSSyncReq channel, also called the MFC multisource synchronization channel. spu_mfcio.h declares two functions for this purpose:
mfc_write_multi_src_sync_request: Sends a value to the MSS channel and tells the MFC to start tracking previously created data transfers.
mfc_stat_multi_src_sync_request: Checks the value of the MSS channel. A zero means the transfers haven’t finished, and a one means they have.
Unlike the signal/mailbox functions, neither function forces the calling SPU to block.
The Cell provides many capabilities for interprocessor communication that go beyond DMA. Mailboxes and signals transport data in small sizes, which makes them suitable for delivering control information and memory address data.
An SPU interfaces with the rest of the Cell by reading or writing to one of its MFC’s 32 channels. There are only three basic channel functions: readch reads the channel value, writech sets the channel value, and readchcnt returns the channel’s available capacity. But spu_mfcio.h declares many specific intrinsic functions that are easier to use and remember.
With events, the SPU can respond to external conditions affecting the MFC’s channels. There are three ways to monitor events: blocking, polling, and interrupt servicing. Blocking forces the SPU to wait for an event, and polling allows the SPU to execute a processing loop until the event occurs. Interrupt servicing is more involved than either, but allows for immediate response without halting the SPU’s operation.
Mailboxes provide a simple means of sending 4-byte messages between processors. The three main channels used for mailbox communication are commonly called the incoming mailbox, the outgoing mailbox, and the outgoing interrupt mailbox. Because neither outgoing mailbox transports data outside the Cell, mailboxes are usually used to send messages between the PPU and SPUs.
Signals are like mailboxes, and deliver 32 bits of data at a time. However, SPUs have two read-only channels for signals and no write-only channels. Instead, SPUs use a DMA-like mechanism to send signals to one another. If configured properly by the PPU, an SPU can receive signals in Logical OR mode. This means that incoming signals are ORed together into a single value.
Signals can be used to provide synchronization between elements, but this chapter has discussed two further methods. First, mfc_eieio and mfc_sync can order external accesses across multiple MFCs. Second, the multisource synchronization register/channel allows monitoring for transfer completion at the receiving MFC.