SQ Error State

This state is implemented on all QP types except RC QPs. It is also not implemented on an RD EEC (for more information, refer to the chapter entitled “RD Transport Service” on page 461).

SQ Error Entry Conditions

The SQE state is entered automatically when an SQ WQE encounters an error while its message transfer is being performed. SQE can be entered from either the RTS state or the SQD state.
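A minimal Python sketch of the entry rule, offered only as a model: the enums and the function `state_after_sq_error` are hypothetical names, not part of any verbs API. The sketch also assumes that an RC QP hitting the same kind of error moves to the Error state instead, since RC does not use SQE.

```python
from enum import Enum


class QPState(Enum):
    RESET = "Reset"
    INIT = "Init"
    RTR = "RTR"
    RTS = "RTS"
    SQD = "SQD"
    SQE = "SQE"
    ERROR = "Error"


class QPType(Enum):
    RC = "RC"
    UC = "UC"
    RD = "RD"
    UD = "UD"


def state_after_sq_error(qp_type: QPType, state: QPState) -> QPState:
    """Hypothetical model: the state a QP moves to when an SQ WQE fails."""
    # SQ WQEs are only in progress in RTS or SQD, so only those are modeled.
    if state not in (QPState.RTS, QPState.SQD):
        raise ValueError(f"no SQ WQE is in progress in the {state.value} state")
    # Assumption: RC does not use SQE, so an RC QP goes to Error instead.
    if qp_type is QPType.RC:
        return QPState.ERROR
    return QPState.SQE
```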

How Do the SQE and Error States Differ?

In Error State, the QP Ceases All Processing

When a QP transitions to the Error state, it ceases all processing. It stops processing both SQ and RQ WQEs and no longer transmits message transfer request packets to the remote QP's RQ Logic. Likewise, its RQ Logic stops processing incoming request packets transmitted by the remote QP's SQ Logic; they are silently dropped.

In SQE, Send Side Shuts Down But Receive Side Operates

When a QP enters the SQE state (from the RTS or SQD state), its SQ Logic ceases to process SQ WQEs. No new request packets are transmitted to the remote QP's RQ Logic.

The QP's RQ Logic, on the other hand, remains fully functional. RQ WQEs that were posted to the RQ before the QP entered the SQE state are processed, and the QP's RQ Logic responds to incoming request packets transmitted by the remote QP's SQ Logic. New WQEs can be posted to the QP's RQ, but the specification appears to imply that they will not be processed until software transitions the QP back to the RTS state. The relevant requirement in the specification reads:

“C10-38: Receive Work Requests which were submitted to a Receive Queue prior to that queue's transition into the SQEr state shall continue to be processed normally. New Receives must be able to be posted to such a Receive Queue.”
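To make the contrast concrete, here is a small Python table of which parts of the QP keep working in each state. `QPActivity`, `ACTIVITY`, and `inbound_request_outcome` are hypothetical names. The `rq_processes_new_wqes` entry for SQE encodes the reading of C10-38 given above, which the specification does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QPActivity:
    """Which parts of a QP still do work in a given state (hypothetical model)."""
    sq_processes_wqes: bool      # SQ Logic executes SQ WQEs
    sq_sends_requests: bool      # request packets go out to the remote QP's RQ Logic
    rq_processes_old_wqes: bool  # RQ WQEs posted before the state change
    rq_processes_new_wqes: bool  # RQ WQEs posted after the state change
    rq_accepts_requests: bool    # inbound request packets are acted on, not dropped


ACTIVITY = {
    # Error: the QP ceases all processing.
    "Error": QPActivity(False, False, False, False, False),
    # SQE: the send side shuts down while the receive side keeps working.
    # New RQ WQEs may be posted but are assumed not to run until RTS.
    "SQE": QPActivity(False, False, True, False, True),
}


def inbound_request_outcome(state: str) -> str:
    """What the RQ Logic does with a request packet from the remote SQ Logic."""
    return "processed" if ACTIVITY[state].rq_accepts_requests else "silently dropped"
```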

SQE State Operational Characteristics

When a QP enters the SQE state, it has the following operational characteristics:

  • Error CQE created. The SQE state is entered when the message transfer described by the currently executing SQ WQE incurs an error while the QP is in the RTS or SQD state. The SQ Logic creates an error CQE on the SCQ for that message transfer.

  • Subsequent WQEs are flushed. WQEs posted to the SQ after the one that caused the error are all retired and CQEs are created in the SCQ indicating that they were all flushed due to the error on the earlier SQ WQE.

  • Completion error types. These are defined as follows:

    - Interface Check. There was an error in the WR information that was supplied to the HCA. In this case, the error is detected before any message packets are placed onto the link.

    - Processing Error. An error was encountered during the processing of the WQE by the SQ Logic.

  • The remote QP's RQ state isn't known. For all QP types other than RD, the SQ Logic can start processing the next SQ WQE before the current WQE's message transfer completes. On RD, however, the QP's SQ Logic is only permitted to work on a single SQ WQE and cannot move on to the next message transfer until the current one is completed and has received all of its corresponding response packets from the remote QP.

    For QP types other than RD, at the moment the SQ Logic detects an error (e.g., it receives a Nak error code for a request packet transmitted earlier), it may already have launched one or more request packets belonging to subsequent message transfers (i.e., subsequent SQ WQEs).

    In that case, those subsequently transmitted request packets may have arrived at the remote QP's RQ Logic and been executed. Depending on the request type, this may have affected the state of the remote QP's RQ Logic:

    - Send operations may have been partially or fully completed by the RQ Logic, and an RQ CQE may or may not have been generated by the remote RQ Logic.

    - RDMA Read operations may have partially completed (i.e., one or more RDMA Read response packets may have been returned to the requester and may or may not have been written to the local CA's memory). The contents of the Scatter buffers pointed to by their WRs are therefore indeterminate.

    - RDMA Write operations may have partially completed, so the contents of the destination CA's local memory are indeterminate. If the operation specified delivery of an Immediate Data item in the message's final packet, an RQ CQE may or may not have been created.

    - Atomic operations may or may not have been performed on the remote CA's memory, so the semaphore at the remote address specified in the WR may contain either the original or the updated data. At the requester QP's end, the contents of the memory buffer allocated to hold the returned read data are indeterminate.

  • RQ remains functional. WQEs that were posted to the local QP's RQ before the QP's transition into the SQE state continue normal processing. New WQEs can still be posted to the RQ, but they do not begin processing.

  • Verb used to leave this state. The Modify QP verb can be used to transition the QP from SQE to the RTS, Error, or Reset state.

  • Detection of another error leads to Error state. The detection of an RQ error or an Asynchronous Error while the QP is in SQE causes the QP to transition to the Error state.

  • SQE doesn't apply to RC. The SQE state applies to all QP types except RC; an RC QP never uses it.
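The flush behavior and the ways out of SQE can be modeled in a few lines of Python. This is a sketch, not HCA or driver code: `CQE`, `Status`, `retire_send_queue`, and `next_state_from_sqe` are hypothetical, and the single `ERROR` status stands in for whichever Interface Check or Processing Error code is actually reported. (In libibverbs, flushed completions carry the status `IBV_WC_WR_FLUSH_ERR`.)

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    ERROR = "error"      # placeholder for the actual completion error type
    FLUSHED = "flushed"  # retired because an earlier SQ WQE failed


@dataclass(frozen=True)
class CQE:
    wr_id: int
    status: Status


def retire_send_queue(sq: List[int], failed_wr_id: int) -> List[CQE]:
    """Return the CQEs placed on the SCQ when the WQE at the head of `sq` fails.

    `sq` lists outstanding SQ WQEs (by wr_id) in posting order; WQEs that
    completed earlier are assumed to have left the queue already.
    """
    if not sq or sq[0] != failed_wr_id:
        raise ValueError("the failing WQE is assumed to be at the head of the SQ")
    cqes = [CQE(failed_wr_id, Status.ERROR)]
    # Every WQE posted after the failing one is retired with a flush status.
    cqes += [CQE(wr_id, Status.FLUSHED) for wr_id in sq[1:]]
    sq.clear()
    return cqes


SQE_EXITS = {"RTS", "Error", "Reset"}  # targets reachable with Modify QP


def next_state_from_sqe(event: str, target: Optional[str] = None) -> str:
    """State a QP in SQE moves to after `event` (hypothetical event names)."""
    if event == "modify_qp":
        if target not in SQE_EXITS:
            raise ValueError(f"Modify QP cannot move SQE to {target}")
        return target
    if event in ("rq_error", "async_error"):
        # A further error while in SQE takes the QP to Error.
        return "Error"
    raise ValueError(f"unknown event {event}")
```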

Software Actions When SQE State Entered

On a transition to the SQE state, software may take the following actions:

  1. Software examines the error CQE related to the failed SQ WQE to determine the cause of the error.

  2. Software determines whether revising the WR would result in a successful message transfer. If revising and reposting the WR cannot fix the problem, software transitions the QP to the Error state and returns a hard-failure notice to the application that requested the message transfer.

  3. Assuming that revising the WR can fix the problem, software reposts the revised WR to the SQ. Any additional WRs that were on the SQ at the time of the error were flushed, so software also reposts those WRs.

  4. Software then returns the QP to the fully operational RTS state using the Modify QP verb.
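The four steps can be sketched in Python against a stand-in QP object. `FakeQP`, `WR`, `recover_from_sqe`, and the caller-supplied `fix` callback (which returns a revised WR, or `None` when the failure cannot be fixed) are all hypothetical. The fake QP simply queues whatever is posted and does not model how a real HCA treats WRs posted while the QP is in SQE. With libibverbs, step 4 would be an `ibv_modify_qp` call with `qp_state` set to `IBV_QPS_RTS` and `IBV_QP_STATE` in the attribute mask.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class WR:
    wr_id: int
    payload: str


@dataclass
class FakeQP:
    """Hypothetical stand-in for a QP in SQE; not a real verbs object."""
    state: str = "SQE"
    sq: List[WR] = field(default_factory=list)

    def post_send(self, wr: WR) -> None:
        self.sq.append(wr)

    def modify(self, new_state: str) -> None:
        # From SQE, Modify QP may only target RTS, Error, or Reset.
        if self.state == "SQE" and new_state not in {"RTS", "Error", "Reset"}:
            raise ValueError(f"cannot move from SQE to {new_state}")
        self.state = new_state


def recover_from_sqe(qp: FakeQP, failed: WR, flushed: List[WR],
                     fix: Callable[[WR], Optional[WR]]) -> bool:
    """Apply steps 1-4 above; return False on a hard failure."""
    # Steps 1-2: `fix` inspects the failure and returns a revised WR, or None.
    revised = fix(failed)
    if revised is None:
        qp.modify("Error")  # the caller then reports a hard failure
        return False
    # Step 3: repost the revised WR, then the flushed WRs in their original order.
    qp.post_send(revised)
    for wr in flushed:
        qp.post_send(wr)
    # Step 4: return the QP to RTS.
    qp.modify("RTS")
    return True
```

Reposting the flushed WRs behind the revised one, in their original order, keeps the message transfers in the sequence the application requested.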
