RDMA Read Relaxed Ordering Rules

Writes Can Complete Faster Than Reads

For the following reasons, a QP can often complete a write to its local memory faster than it can read requested data from its local memory:

  • Inbound memory writes (e.g., a Send or an RDMA Write operation) to local memory are frequently handled by posting the memory write data into a fast posted memory write buffer. The fast-access buffer absorbs the data and the addresses it is to be written to, thereby accepting responsibility to write the data to memory at a later time. From the perspective of the device requesting the memory write (e.g., a local QP), the memory write thereby appears to complete very rapidly and the QP can move on to another operation.

  • Memory reads, on the other hand, can take quite a while. As an example, assume that a QP's RQ Logic has just latched an RDMA Read request to read data from its local memory and source it back to the requester QP's SQ Logic in a series of one or more RDMA Read response packets. In order to perform the read, the CA's logic must arbitrate for ownership of the bus that the memory resides on (e.g., a PCI or PCI-X bus) and then issue a series of one or more memory read transactions to fetch the data from memory. Both the bus arbitration and the memory access itself can take a fairly long time.
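The posted-write behavior described in the first bullet can be illustrated with a toy model. This is only a sketch: the class name PostedWriteBuffer and its post and drain methods are hypothetical, and a real CA implements this buffering in hardware. It shows that the write is accepted, and so appears complete to the requester, before local memory is actually updated.

```python
from collections import deque


class PostedWriteBuffer:
    """Toy model (hypothetical): accept writes now, perform them later."""

    def __init__(self, memory):
        self.memory = memory      # bytearray standing in for local memory
        self.pending = deque()    # writes accepted but not yet performed

    def post(self, addr, data):
        # The requester sees the write as complete once it is queued here.
        self.pending.append((addr, bytes(data)))

    def drain(self):
        # Later, the buffer performs the deferred writes in arrival order.
        while self.pending:
            addr, data = self.pending.popleft()
            self.memory[addr:addr + len(data)] = data


memory = bytearray(16)
buf = PostedWriteBuffer(memory)

buf.post(4, b"\xaa\xbb")
before_drain = bytes(memory[4:6])   # memory has not been updated yet

buf.drain()
after_drain = bytes(memory[4:6])    # the deferred write has now landed
```

A read has no equivalent shortcut in this model, because the data must actually be fetched from memory before a response can be built.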

An Example of Relaxed RDMA Ordering

The Scenario

Refer to Figure 13-2 on page 282. In the example, three requests are issued to the responder QP's RQ Logic:

  1. An RDMA Read request for three packets of read data.

  2. Another RDMA Read request for a single packet of read data.

  3. An RDMA Write request with a single packet of write data.

Figure 13-2. Relaxed RDMA Read Ordering Example


The Rule

The RQ Logic receives the first two read requests. The relaxed RDMA Read ordering rules state that the responder is permitted to begin executing one or more Send or RDMA Write requests that arrive after an RDMA Read request, even though that read has not yet been executed. Before executing any of the requests that follow the RDMA Read, however, the responder must validate the header fields of the RDMA Read request. In other words, the RQ Logic must verify that the VA, the R_Key, and the transfer length specified in the RETH (RDMA Extended Transport Header) are valid. Although they are executed before the read, the requests received after the RDMA Read must not be Ack'd until the outstanding RDMA Read responses have been sent.
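The validation step can be sketched as a simple bounds check. The sketch assumes that the responder keeps a table of registered memory regions keyed by R_Key. The names Reth, MemoryRegion, and reth_is_valid are hypothetical and chosen for illustration; access-permission checks are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reth:
    """The three RETH fields the text says must be validated."""
    va: int          # starting virtual address of the transfer
    r_key: int       # remote key naming the target memory region
    dma_length: int  # transfer length in bytes


@dataclass(frozen=True)
class MemoryRegion:
    """Hypothetical record of one registered region on the responder."""
    base_va: int
    length: int
    r_key: int


def reth_is_valid(reth, regions):
    """Return True if the R_Key is known and the transfer fits its region."""
    region = regions.get(reth.r_key)
    if region is None:
        return False                      # unknown R_Key
    end = reth.va + reth.dma_length
    region_end = region.base_va + region.length
    return region.base_va <= reth.va and end <= region_end


# Hypothetical region table with a single 256-byte region.
regions = {0x1234: MemoryRegion(base_va=0x1000, length=0x100, r_key=0x1234)}

ok = reth_is_valid(Reth(va=0x1000, r_key=0x1234, dma_length=0x100), regions)
bad_key = reth_is_valid(Reth(va=0x1000, r_key=0x9999, dma_length=0x10), regions)
too_long = reth_is_valid(Reth(va=0x10F0, r_key=0x1234, dma_length=0x20), regions)
```

Only after a check of this kind passes for the RDMA Read may the responder go on to execute the Sends or RDMA Writes that arrived behind it.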

Sequence of Events

The RQ Logic in the example executes the following actions:

  1. It receives the first read request, verifies that the VA, R_Key and length are valid, but does not yet start the read from local memory (perhaps it begins arbitrating for ownership of the bus that the local memory resides on).

  2. It receives the second read request, verifies that the VA, R_Key and length are valid, but does not yet start the read from local memory.

  3. It receives the RDMA Write and immediately performs the write to its local memory. In reality, the write data is quickly absorbed into a fast posted memory write buffer.

  4. Although it has completed the write, it is not permitted to return the Ack for it until it has completed the two earlier reads.

  5. It performs the first read and returns the three packets of read data. The requester QP's SQ Logic retires the WQE for the first read.

  6. It performs the second read and returns the single packet of read data. The requester QP's SQ Logic retires the WQE for the second read.

  7. Only now does it return the Ack for the write. Upon receiving the Ack, the requester QP's SQ Logic retires the WQE for the write.
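The seven steps above can be replayed with a small simulation. It is a simplified sketch: it assumes that every request arrives before any read is executed, as in the example, and the function name run_responder and the request labels are hypothetical.

```python
from collections import deque


def run_responder(requests):
    """Return the ordered event log for a list of (kind, name) requests."""
    log = []
    pending_reads = deque()   # reads validated but not yet executed
    held_acks = []            # writes/sends executed, Acks withheld

    for kind, name in requests:
        if kind == "read":
            # Validate the RETH fields now; defer the slow memory read.
            log.append(f"validate {name}")
            pending_reads.append(name)
        else:
            # A Send or RDMA Write may execute ahead of the earlier reads.
            log.append(f"execute {name}")
            if pending_reads:
                held_acks.append(name)    # Ack must wait for the reads
            else:
                log.append(f"ack {name}")

    # Perform the deferred reads in arrival order, then release the Acks.
    while pending_reads:
        log.append(f"respond {pending_reads.popleft()}")
    for name in held_acks:
        log.append(f"ack {name}")
    return log


events = run_responder([
    ("read", "read1"),    # three packets of read data in the example
    ("read", "read2"),    # a single packet of read data
    ("write", "write1"),  # a single packet of write data
])
for event in events:
    print(event)
```

The log shows the write being executed third but Ack'd last, after both read responses have been returned.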
