Keep RDC Operational If a QP Goes Down

General

The EEC is a common resource used by its client RD QPs to perform message transfers. It is therefore very important that errors related to a QP, rather than to the EEC itself, do not crash the EEC. Examples of such errors are a Q_Key miscompare, an R_Key violation, or the receipt of an RNR Nak. The sections that follow discuss in detail the various types of errors that an EEC may encounter during a message transfer and how they are handled.

Error Handling for Requester-Detected Conditions

Receipt of an RNR Nak Causes Suspend Followed by Restart

When the EEC Send Logic receives an RNR Nak in response to a request packet, the EEC decrements its RNR Nak Retry Counter:

  • If the RNR Retry Count is not exhausted:

    1. The EEC's Send Logic transmits a Resync request packet (a packet with a BTH:Opcode of Resync) to the remote EEC and waits for it to be Ack'd before proceeding to the next step. For a detailed description of the Resync operation, refer to “Resync Operation” on page 499.

    2. Checks the MSN in the Resync's Ack packet:

      - If the MSN has been incremented by one, this indicates that the destination QP has abandoned the message transfer. In this case, go to step 3 (unless the message was an Atomic operation; in that case, tell the client QP to abandon the SQ WQE, go to the SQE state, and create an error CQE).

      - If the MSN has been incremented by two, this indicates the message in progress has been completed at the destination QP. In this case, the EEC's Send Logic instructs its client QP to retire the active SQ WQE and create a good completion CQE. The operation will not be retried. The remaining steps are not performed.

    3. The packet's Timer value (from its Syndrome field) is passed back to the client QP's SQ Logic.

    4. The EEC removes the QP from its processing list.

    5. The EEC moves on to service the next client QP in its processing list. In this way, the RDC remains operational and isn't tied up while the RNR timeout elapses.

    6. The QP that received the RNR Nak waits at least until the timeout has elapsed (if not longer) before it issues a request to rejoin the EEC's processing list. In other words, the message transfer for the client QP currently being processed is “suspended” until the timer times out.

    7. Once the timer times out, the QP rejoins the EEC's processing list. When it arrives at the head of the list, the suspended message must be restarted (referred to as a “restart”):

      - It is restarted from the beginning if it is a Send or an RDMA Write With Immediate.

      - If it is an RDMA Read or an RDMA Write Without Immediate, it can (implementation-specific) be restarted where it left off, but it must appear to be a new message to the responder. By “new” the specification means that the request packet must have a BTH opcode of “first” or “only.”

  • When the EEC exhausts its RNR Nak Retry Count:

    1. The EEC “abandons” the current message transfer.

    2. The EEC passes the error to the client QP's SQ Logic and removes the QP from its processing list.

    3. The EEC's Send Logic transmits a Resync request packet to the remote EEC and waits for it to be Ack'd before proceeding to the next step.

    4. The EEC moves on to service the next client QP in its processing list. In this way, the RDC remains operational so it can service its other client QPs.

    5. The QP that experienced the error transitions to the SQE state.

    6. The QP retires the active SQ WQE and creates an error CQE on the SCQ. The error code reported is “RNR Retry Count Exceeded.” All of the subsequently posted SQ WQEs are retired, and CQEs are created on the SCQ indicating that they were not executed because they were flushed due to the WQE that completed in error.
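The RNR Nak decision flow above can be summarized in a short sketch. This is an illustrative Python model, not code from the specification: the names `Outcome`, `on_rnr_nak`, and `restart_offset` are hypothetical, and the sketch assumes the counter holds the number of retries still available, so that a value of zero when the Nak arrives means the count is exhausted.

```python
from enum import Enum, auto


class Outcome(Enum):
    SUSPEND_THEN_RESTART = auto()  # QP leaves the list, waits out the timer, rejoins
    GOOD_COMPLETION = auto()       # message already completed at the destination
    ERROR_COMPLETION = auto()      # QP goes to SQE and an error CQE is created


def on_rnr_nak(retries_left, msn_delta, is_atomic=False):
    """Return (retries_left, outcome) after an RNR Nak.

    msn_delta is how far the MSN in the Resync's Ack advanced (1 or 2).
    Assumption: retries_left == 0 on entry means the count is exhausted.
    """
    if retries_left <= 0:
        # Exhausted: abandon the message, report "RNR Retry Count Exceeded".
        return 0, Outcome.ERROR_COMPLETION
    retries_left -= 1
    if msn_delta == 2:
        # The destination QP completed the message; it is not retried.
        return retries_left, Outcome.GOOD_COMPLETION
    if msn_delta == 1:
        # The destination QP abandoned the message.
        if is_atomic:
            return retries_left, Outcome.ERROR_COMPLETION
        return retries_left, Outcome.SUSPEND_THEN_RESTART
    raise ValueError("unexpected MSN delta in Resync Ack")


def restart_offset(op, bytes_done):
    """Byte offset at which a suspended message is restarted."""
    if op in ("SEND", "RDMA_WRITE_WITH_IMM"):
        return 0       # must restart from the beginning
    return bytes_done  # RDMA Read / Write Without Immediate may resume here
```

Whatever offset is chosen, the restarted request must still carry a "first" or "only" opcode so that it looks like a new message to the responder.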

The specification states that an Atomic operation may not be suspended (due to an RNR Nak). By rule, an EEC's Receive Logic supports only one outstanding RDMA Read or Atomic request at a time (i.e., the depth of its special queue for handling these operation types is one). Because the EEC's Send Logic knows that it has no RDMA Read or Atomic request outstanding, it knows that the special queue is currently empty. The remote EEC's Receive Logic is therefore not permitted to respond to an inbound RDMA Read or Atomic request with an RNR Nak (which would cause it to be suspended on the requester's end).

Receipt of a PSN Sequence Error Nak
Before Retry Count Exhaustion

The actions taken by the EEC Send Logic when it receives a PSN Sequence Error Nak from the remote EEC's Receive Logic are somewhat similar to those taken upon receipt of an RNR Nak (but note that the EEC Send Logic does not issue a Resync request packet to the remote EEC's Receive Logic). At the time that it was set up, the EEC was programmed with two Retry Counts:

- The RNR Retry Count.

- The Retry Count that is related to PSN Sequence Error Naks, missing RDMA Read Response or Atomic packets, and Transport Timer timeouts.

When a PSN Sequence Error Nak is received, the EEC decrements its Retry Counter and, if it's not exhausted, tells its client QP's SQ Logic to rewind to at least the request packet indicated by the Nak packet's PSN. The QP's SQ Logic then resumes forwarding packets to the EEC starting with that request packet.

On Retry Count Exhaustion

If the EEC's Send Logic should exhaust its Retry Counter, the actions it takes depend on whether or not the two EECs have been set up for Automatic Path Migration (APM; for more information, refer to “Automatic Path Migration” on page 575). If they haven't, then the actions described in “If the Error Proves to Be Unrecoverable” on page 489 are taken. If APM has been enabled, the following actions are taken:

- The EEC's Send Logic overwrites the primary path address information in its EEC with its alternate path address information.

- Reloads its Retry Counter with its initial value.

- Tells its client QP's SQ Logic to rewind to at least the request packet indicated by the Nak packet's PSN. The QP's SQ Logic then resumes forwarding packets to the EEC starting with that request packet.

- Sets the BTH:MigReq bit to one in the first request packet it retransmits.

- Upon receipt of the request packet over the new path, the remote EEC verifies that the path taken matches its alternate path information.

- The remote EEC's Receive Logic overwrites the primary path address information in its EEC with the alternate path address information stored in its EEC.

- The next response packet sent by the remote EEC's Receive Logic will therefore use the new path to get back to the other EEC's Send Logic.

- If the EEC whose Send Logic exhausted its Retry Count and triggered the migration is still unsuccessful and once again exhausts its Retry Count, the actions described in “If the Error Proves to Be Unrecoverable” on this page are taken.
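The retry-then-migrate behavior described above can be sketched as a small state machine. This is a simplified model under stated assumptions: the class `EecSendLogic` and its attribute names are hypothetical, and the exact point at which the Retry Counter is decremented relative to each retransmission is an assumption rather than something the text specifies.

```python
class EecSendLogic:
    """Toy model of the Retry Counter and the APM fallback (hypothetical names)."""

    def __init__(self, retry_init, apm_enabled):
        self.retry_init = retry_init
        self.retries_left = retry_init
        self.apm_enabled = apm_enabled
        self.migrated = False
        self.path = "primary"
        self.state = "operational"

    def on_psn_sequence_nak(self, nak_psn):
        """Return (action, rewind_psn, mig_req) for one PSN Sequence Error Nak."""
        if self.retries_left > 0:
            # Retry: the client QP's SQ Logic rewinds to the Nak'd PSN.
            self.retries_left -= 1
            return "rewind", nak_psn, False
        if self.apm_enabled and not self.migrated:
            # Alternate path information replaces the primary path information.
            self.path = "alternate"
            self.migrated = True
            self.retries_left = self.retry_init
            # BTH:MigReq is set in the first retransmitted request packet.
            return "rewind", nak_psn, True
        # No APM, or the alternate path failed as well: unrecoverable.
        self.state = "error"
        return "unrecoverable", None, False
```

With `retry_init=1` and APM enabled, the sequence of results for four consecutive Naks is a retry, a migration with MigReq set, one more retry on the alternate path, and finally the unrecoverable case.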

If the Error Proves to Be Unrecoverable

If the PSN Sequence Error proves to be unrecoverable, the following actions are taken:

- The EEC terminates the current message transfer and transitions to the Error state.

- The EEC tells its client QP to retire the active SQ WQE and create an error CQE on the SCQ. The specification doesn't clearly define the error code supplied in the CQE, but, because this error uses the same Retry Counter as that used for Transport Timer timeouts, it is the author's opinion that it would be “Transport Retry Counter Exceeded.”

- The QP that experienced the error transitions to the SQE state.

- All of the client QP's subsequently posted SQ WQEs are retired and CQEs are created on the SCQ indicating that they were not executed. Rather, they were flushed due to the WQE that completed in error.

- The EEC removes the offending QP from its processing list.

- As long as the EEC remains in the Error state, it instructs all client QPs that wish it to send messages to flush their respective SQs and to transition to the Error state.

Transport Timer Timeout

The EEC's Send Logic may experience a Transport Timer timeout while awaiting the expected response to a request packet sent earlier.

The actions taken are the same as those taken when a PSN Sequence Error Nak is received. Refer to “Receipt of a PSN Sequence Error Nak” on page 488.

Detected Missing RDMA Read or Atomic Response

The EEC's Send Logic may receive a response packet with a PSN higher than that of the next expected RDMA Read response or Atomic response packet. This indicates that one or more RDMA Read response packets or an Atomic response packet may have been lost in the fabric.

The actions taken are the same as those taken when a PSN Sequence Error Nak is received. Refer to “Receipt of a PSN Sequence Error Nak” on page 488.

Receipt of a Remote Access Error Nak
Reasons for Remote Access Error Nak

The remote EEC's Receive Logic returns a Remote Access Error Nak for one of the following reasons:

- The request packet's R_Key field is invalid.

- The virtual memory start address (VA), the transfer length, or the type of access (read or write) is not permitted using the specified R_Key.

Actions Taken on Receipt of Remote Access Error Nak

Upon receipt of this Nak, the EEC's Send Logic takes the following actions:

  1. Terminates the current message transfer.

  2. The EEC's Send Logic transmits a Resync request packet to the remote EEC and waits for it to be Ack'd before proceeding to the next step. For a detailed description of the Resync operation, refer to “Resync Operation” on page 499.

  3. Checks the MSN in the Resync's Ack packet.

    - If the MSN has been incremented by one, this indicates that the destination QP has abandoned the message transfer. In this case, go to step 4 (unless the message was an Atomic operation; in that case, tell the client QP to abandon the SQ WQE, go to the SQE state, and create an error CQE).

    - If the MSN has been incremented by two, this indicates that the message in progress has been completed at the destination QP. The actions taken by the EEC's Send Logic depend on the type of message transfer that was completed:

      - If it was an RDMA Read, tell the client QP to abandon the SQ WQE, go to the SQE state, and create an error CQE.

      - If it was an Atomic operation, tell the client QP to go to the SQE state and retire the SQ WQE with an error CQE indicating a “Remote Aborted Error” and “Known Complete at Responder.”

      - If it was a Send or an RDMA Write, tell the client QP to retire the SQ WQE with a good CQE.

  4. Instructs its client QP to transition to the SQE state.

  5. Removes the current client QP from its processing list.

  6. The client QP's SQ retires the currently executing WQE and creates an error CQE on its SQ's CQ. The error code reported in the CQE is “Remote Access Error.”

  7. The client QP retires all subsequently posted WQEs from its SQ and creates CQEs indicating that they were all flushed due to the error in the earlier WQE's message transfer.

  8. The EEC's Send Logic moves on to the next client QP in its processing list and continues operation.
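The MSN check in step 3 amounts to a small decision table keyed on the MSN increment and the message type. The following sketch expresses it in Python; the function name `completion_after_resync` and the string labels are hypothetical, chosen only for illustration.

```python
def completion_after_resync(msn_delta, op):
    """Return (cqe_status, note) for the message that drew the error Nak.

    msn_delta: how far the MSN in the Resync's Ack advanced (1 or 2).
    op: "SEND", "RDMA_WRITE", "RDMA_READ", or "ATOMIC".
    """
    if msn_delta == 1:
        # The destination QP abandoned the message transfer.
        return "error", "abandoned at destination"
    if msn_delta == 2:
        # The message completed at the destination QP.
        if op in ("SEND", "RDMA_WRITE"):
            return "good", "completed at destination"
        if op == "RDMA_READ":
            return "error", "SQ WQE abandoned"
        if op == "ATOMIC":
            return "error", "Remote Aborted Error, Known Complete at Responder"
        raise ValueError("unknown operation type: " + op)
    raise ValueError("unexpected MSN delta in Resync Ack")
```

Only a Send or an RDMA Write that is known to have completed at the responder yields a good CQE; every other combination ends in an error completion.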

Receipt of a Remote Invalid Request Nak

A Remote Invalid Request Nak may be returned for any of the reasons stated in “Reason for Invalid Request Nak” on page 404. Upon receipt of this Nak, the EEC's Send Logic takes the same actions defined in “Receipt of a Remote Access Error Nak” on page 490. The error code returned in the CQE is “Remote Invalid Request Error.”

Receipt of a Remote Operational Error Nak

A remote operational error occurs when the responder QP's RQ Logic encounters a situation that prevents its RQ from completing the current request. The list of error conditions detectable by the responder, and reportable as a Remote Operational error, is implementation-specific. Remote operational errors cannot be caused by anything the requester may have done. Rather, they reflect a fault in the responder.

Possible causes include:

  • The responder QP's RQ Logic detected a malformed RQ WQE while processing the current request packet.

  • The responder QP's RQ Logic detected a QP-related error while executing the current request packet. The error prevented the responder from completing the request.

Upon receipt of this Nak, the EEC's Send Logic takes the same actions defined in “Receipt of a Remote Access Error Nak” on page 490. The error code returned in the CQE is “Remote Operation Error.”

Receipt of a Remote Invalid RD Request Nak

A Remote Invalid RD Request Nak is returned for one of the following reasons:

  • The RDD of the destination EEC and the destination QP did not match.

  • The Q_Key delivered in the request packet's DETH:Q_Key field did not match the Q_Key of the destination QP.

Upon receipt of this Nak, the EEC's Send Logic takes the same actions defined in “Receipt of a Remote Access Error Nak” on page 490. The error code returned in the CQE is “Remote Invalid RD Request Error.”

Requester Detects a Local Problem within Its CA

The possible error conditions defined by the specification and how they are handled are described in the subsections that follow.

Locally Detected Memory Protection Error

As an example, the Gather Buffer List specified in a WQE for a Send or an RDMA Write may have one or more entries with invalid L_Key values. In response to this type of error, the EEC's Send Logic takes the same actions specified in “Receipt of a Remote Access Error Nak” on page 490 (the error code returned in the CQE is “Local Protection Error”).

Implementation-Specific Error Associated With a WQE

An implementation-specific error occurred in the requester's local CI that can be associated with a certain WQE. In response to this type of error, the EEC's Send Logic takes the same actions specified in “Receipt of a Remote Access Error Nak” on page 490 (the error code returned in the CQE is “Local QP Operation Error”).

Implementation-Specific Error With No WQE Association

The error can be associated with a specific QP and EEC, but not with a specific WQE on that QP's SQ. In this case, the following actions are taken:

- That QP is transitioned to the SQE state.

- All of the QP's SQ WQEs are retired and CQEs are created for each of them indicating that they were flushed and not executed.

- If the CA is an HCA, the CA causes the Asynchronous Event Handler to be called (typically via an interrupt) and the error is reported as an Asynchronous Affiliated Error.

Implementation-Specific Error Associated With an EEC

The error cannot be associated with a specific QP, but can be associated with a specific EEC. In this case, the following actions are taken:

- The EEC is transitioned to the Error state.

- All of the EEC's client QPs are transitioned to the Error state.

- All WQEs on its client QPs' SQs are retired and CQEs are created for each of them indicating that they were flushed and not executed.

- If the CA is an HCA, the CA causes the Asynchronous Event Handler to be called (typically via an interrupt) and the error is reported as an Asynchronous Affiliated Error.

Error Not Associated With EEC or QP

The error cannot be associated with a specific QP or EEC. If the CA is an HCA, the CA causes the Asynchronous Event Handler to be called (typically via an interrupt) and the error is reported as an Asynchronous Unaffiliated Error.
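The four implementation-specific cases differ mainly in how widely the error spreads. The sketch below tabulates them; the function `local_error_actions` and its scope labels are hypothetical, and a value of `None` means only that the text does not describe a change for that item.

```python
def local_error_actions(scope, is_hca=True):
    """Map the scope of a locally detected error to the resulting actions.

    scope: "wqe"  - error tied to a specific WQE
           "qp"   - tied to a QP and EEC, but not to a WQE
           "eec"  - tied to an EEC, but not to a QP
           "none" - tied to neither
    """
    table = {
        "wqe": {"qp_state": "SQE", "eec_state": None, "async_event": None},
        "qp": {"qp_state": "SQE", "eec_state": None, "async_event": "affiliated"},
        "eec": {"qp_state": "Error", "eec_state": "Error", "async_event": "affiliated"},
        "none": {"qp_state": None, "eec_state": None, "async_event": "unaffiliated"},
    }
    actions = dict(table[scope])
    if not is_hca:
        # The text describes the Asynchronous Event Handler only for HCAs.
        actions["async_event"] = None
    return actions
```

For example, `local_error_actions("eec")` reports that the EEC and all of its client QPs go to the Error state, while `local_error_actions("qp")` confines the damage to one QP.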

Locally Detected RDD Violation Error

The RDD of the local client QP and the local EEC in the requester CA did not match. In response to this type of error, the EEC's Send Logic takes the same actions specified in “Receipt of a Remote Access Error Nak” on page 490 (the error code returned in the CQE is “Remote Invalid RD Request Error”).

RDMA Read Response Packet's Payload Size Is Wrong

An RDMA Read response returned too much or too little payload data. In response to this type of error, the EEC's Send Logic takes the same actions specified in “Receipt of a Remote Access Error Nak” on page 490. The specification is not clear on what error code is returned in the CQE and it's not evident to the author. Hopefully, this will be clarified in the next version of the specification.

Response With Unexpected Opcode

The response packet received had the correct PSN, but contained an unexpected opcode. In response to this type of error, the EEC's Send Logic takes the same actions specified in “Receipt of a Remote Access Error Nak” on page 490. The specification is not clear on what error code is returned in the CQE and it's not evident to the author. Hopefully, this will be clarified in the next version of the specification.

Good Transfer But Can't Post CQE

The current message transfer completed without error and was fully Ack'd, but the CQE could not be written to the CQ due to a failure internal to the CA. This occurs when the CQ is inaccessible or full at the moment an attempt is made to retire a WQE and create a CQE. The following actions are taken:

- The affected QP is transitioned to the SQE state.

- If the CA is an HCA, the CA causes the Asynchronous Event Handler to be called (typically via an interrupt) and the error is reported as an Asynchronous Affiliated Error.

- The current WQE and any subsequent WQEs on the currently active client QP's SQ are left in an unknown state.

Error Handling for Responder-Detected Conditions

Table 19-2 on page 494 defines each type of error that may be detected by the EEC's Receive Logic and how that error is handled.

Table 19-2. Error Handling for RD Responder-Detected Conditions
Columns: Error, Description, Handling
Error: Malformed RQ WQE
Description: Responder detected a malformed RQ WQE while processing an inbound Send or RDMA Write With Immediate request packet.
Handling (Class A):
  • Remote Operational Error Nak returned.

  • Destination QP transitions to the Error state.

  • EEC state does not change.

  • Destination QP's RQ WQE is retired and an error CQE is created indicating a “Local QP Operation Error.”

  • All remaining RQ WQEs are retired and a CQE is created for each indicating it was flushed and not executed.

  • All SQ WQEs are retired and a CQE is created for each indicating it was flushed and not executed.

Error: Unsupported or Reserved Opcode
Description: Inbound request packet's BTH:Opcode was either reserved or was for a function not supported by this QP (e.g., an RDMA or Atomic request on a QP that doesn't support it).
Handling (Class B):
  • Return an Invalid Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Misaligned semaphore start address
Description: In an Atomic request packet's AtomicETH, the VA is not quadword-aligned.
Handling (Class B):
  • Return an Invalid Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Too many RDMA Read or Atomic Requests
Description: The remote EEC's Send Logic transmitted more than one RDMA Read or Atomic request packet. Any one received before the previous one has been completed isn't responded to.
Handling (Class B):
  • Return an Invalid Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Out of sequence request packet
Description: PSN of the inbound request packet does not match the EEC Receive Logic's ePSN and is in the Invalid range.
Handling (Class B):
  • Return a PSN Sequence Error Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Current request packet is “First” or “Only” and should have been “Middle” or “Last”
Description: The responder was expecting a request packet with a “Middle” or a “Last” opcode and received a “First” or an “Only”. Indicates one of the following:
  • One or more “Middle” packets and the “Last” packet of the current message were lost in the fabric.

  • The “Last” packet of the current message was lost in the fabric.

Handling (Class B):
  • Return an Invalid Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Current request packet should have been a “First” or “Only”
Description: The responder was expecting a request packet with a “First” or “Only” opcode, but received a “Middle” or a “Last”. Indicates that one or more request packets were lost in the fabric.
Handling (Class B):
  • Return an Invalid Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Resync Opcode incomplete WQE
Description: A Resync request packet arrives and the responder QP has a partially complete RQ WQE. Indicates that the remote EEC's Send Logic has aborted the transfer. For a detailed description of the Resync operation, refer to “Resync Operation” on page 499.
Handling (Class E):
  • Using one of the following methods, abort the current message if it is not complete:

    - Reset the WQE so that it can be reused for a future message

    - If a RQ WQE is in use, retire current RQ WQE and create error CQE indicating “Remote Aborted Error.”

  • Prepare to receive a new inbound message. QP continues operation without a transition to the Error state.

Error: R_Key Violation
Description: The EEC's Receive Logic detects an R_Key violation while executing an RDMA or an Atomic request.
Handling (Class B):
  • Return a Remote Access Error Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Local QP Error
Description: The EEC's Receive Logic detected a local QP-related error while executing the request packet. The local error prevented the EEC's Receive Logic from completing the request.
Handling: See the Class A error handling description under “Malformed RQ WQE” in this table.

Error: Q_Key Violation
Description: The destination QP's Q_Key did not match the Q_Key delivered in the request packet's DETH:Q_Key field.
Handling (Class B):
  • Return an Invalid RD Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Packet Header Violation
Description: The EEC's Receive Logic detected a header violation requiring that the request packet be silently dropped. Figure 17-25 on page 438 illustrates the header validation process.
Handling (Class D):
  • Silently drop request packet.

  • Don't generate an Ack or Nak.

  • Don't retire a RQ WQE for the current message.

  • Wait for first packet of a new message.

  • The new message must begin at the ePSN.

  • If a RQ WQE was in use, reset it to accept the next incoming Send or RDMA Write with Immediate.

Please note that the information in italics will be removed from the next version of the specification.
Error: RDD Violation
Description: The RDD of the receiving EEC did not match the destination QP's RDD.
Handling (Class B):
  • Return an Invalid RD Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Invalid Destination QP
Description: The destination QP indicated by the BTH:DestQP field does not exist or is not configured for RD service.
Handling (Class B):
  • Return an Invalid Request Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Resources Not Ready Error
Description: WQE or other resource is not currently available, so the EEC's Receive Logic has returned an RNR Nak.
Handling (Class B):
  • Return an RNR Nak to the remote EEC's Send Logic.

  • No other action is taken.

Error: Length errors
Description:
  • Inbound message “Send” operation exceeded the Scatter Buffer List in the RQ WQE.

  • RDMA Write operation contained too much or too little payload data compared to the transfer length advertised in the first or only packet.

  • Payload length was not consistent with the opcode:

    - “Only” must contain 0 to PMTU bytes.

    - “First” or “Middle” must contain PMTU bytes.

    - “Last” must contain 1 to PMTU bytes.

Handling (Class F):
  • Only occurs due to invalid request length.

  • Return Invalid Request Nak.

  • If the current message was using a RQ WQE, that WQE is retired and an error CQE is created indicating a “Remote Invalid Request Error.”

  • The local QP remains operational and does not change state.

  • The EEC remains operational.

Error: Invalid duplicate Atomic Request
Description: Duplicate Atomic request received, but its PSN does not match the PSN of a previously executed Atomic Request whose results were saved.
Handling: See the Class D error handling description under “Packet Header Violation” in this table.

Error: CQ overflow
Description: Message was fully executed and Ack'd, but the CQE could not be written to the CQ. Occurs when the CQ is inaccessible or full and an attempt is made to complete a WQE.
Handling (Class G):
  • The affected QP transitions to the Error state.

  • No WQEs are retired and no CQEs are created.

  • The current WQE and any subsequent WQEs are left in an unknown state.

  • If the CA is an HCA, the Asynchronous Event Handler is called (typically via an interrupt) and the error is reported as an Asynchronous Affiliated Error.

Error: Local EEC Error
Description: There was an error related to the local EEC's Receive Logic while executing the request packet. The error prevented the responder from completing the request. These errors are not caused by the sender.
Handling (Class H):
  • The EEC is transitioned to the Error state.

  • If the current message was using a RQ WQE, that WQE is retired and an error CQE is created indicating a “Local EEC Operation Error”.

  • If the CA is an HCA and the current message was not using a RQ WQE, the CA causes the Asynchronous Event Handler to be called (typically via an interrupt) and the error is reported as an Asynchronous Affiliated Error. Specifically, it indicates a “Local Work Queue Catastrophic Error.”
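Parts of Table 19-2 lend themselves to a lookup structure. The sketch below encodes a subset of the rows as (error class, Nak returned) pairs and implements the payload-length rule from the “Length errors” row. The names `RESPONDER_ERRORS` and `payload_length_ok` are hypothetical, and the dictionary is deliberately incomplete; it is an illustration of the table, not a replacement for it.

```python
# Subset of Table 19-2: error -> (error class, Nak returned or None).
RESPONDER_ERRORS = {
    "Malformed RQ WQE": ("A", "Remote Operational Error"),
    "Out of sequence request packet": ("B", "PSN Sequence Error"),
    "R_Key Violation": ("B", "Remote Access Error"),
    "Q_Key Violation": ("B", "Invalid RD Request"),
    "RDD Violation": ("B", "Invalid RD Request"),
    "Resources Not Ready Error": ("B", "RNR"),
    "Packet Header Violation": ("D", None),  # silently dropped, no Ack or Nak
    "Length errors": ("F", "Invalid Request"),
}


def payload_length_ok(position, length, pmtu):
    """Check a request packet's payload length against its opcode position.

    position: "only", "first", "middle", or "last".
    """
    if position == "only":
        return 0 <= length <= pmtu
    if position in ("first", "middle"):
        return length == pmtu
    if position == "last":
        return 1 <= length <= pmtu
    raise ValueError("unknown packet position: " + position)
```

A packet that fails `payload_length_ok` falls under the Class F handling in the table: an Invalid Request Nak is returned while both the QP and the EEC remain operational.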


Resync Operation

On Error or Delay, EEC Suspends or Abandons Current Message

Under certain circumstances, the EEC's Send Logic suspends or abandons the message transfer currently in progress:

  • When an EEC's Send Logic receives an RNR Nak, it suspends the message transfer currently in progress. For more information, see “Receipt of an RNR Nak Causes Suspend Followed by Restart” on page 486.

  • When the EEC's Send Logic receives an error Nak, it abandons the message transfer currently in progress.

  • When the EEC's Send Logic detects certain errors local to the EEC or its client RD QP, it abandons the message transfer currently in progress.

The EEC suspends or abandons the in-progress message transfer so it can remain productive by proceeding to service the next client QP in its processing list.

Problems Associated with Suspend or Abandon
PSN Synchronization Problem

When the EEC's Send Logic suspends or abandons a message transfer and then starts transmitting a new message for another of its client QPs, the ePSN in the remote EEC's Receive Logic can become incorrect.

Inform Remote QP of Suspension or Abandonment

When a transfer in progress is aborted midstream, the remote destination QP must be informed.

Retries and/or Delayed Packets Can Cause Problems

In some cases, problems can be caused by request packet retries and/or the slow delivery of packets.

Data Corruption Problems

In some cases, data can become corrupted in the remote CA's local memory.

Resync Appears to Be a Required EEC Send Logic Capability

In order to keep the RDC operational, an EEC's Send Logic is required to transmit a Resync request packet (the BTH:Opcode field contains the Resync request opcode) when it is forced to suspend or abandon the message transfer it is performing for the client QP currently at the head of its processing list. The specification states:

“Under some conditions, a requester's EE Context is required to generate a special form of a request packet called a RESYNC request. This occurs when the requester EE Context elects to discontinue (note that the specification uses the term abort in reference to both suspension and abandonment) the current request message.”

Also refer to “Statement Declares Resync Transmit Optional” on page 516 and “Statements Declaring Resync Transmit Requirement” on page 516.

Send Logic Conditions That Require a Resync

The EEC's Send Logic issues a Resync request packet to the remote EEC's Receive Logic under the following circumstances:

  • Upon receipt of an RNR Nak.

  • Upon receipt of a Remote Invalid Request Nak.

  • Upon local detection of a local memory protection error.

  • Upon receipt of a Remote Access Error Nak.

  • Upon receipt of a Remote Operation Error Nak.

  • An implementation-specific error occurred in the requester's local CI that can be associated with a certain WQE.

  • The RDD of the local client QP and the local EEC in the requester CA did not match.

  • Upon receipt of a Remote Invalid RD Request Nak.

  • Upon local detection of an inbound RDMA Read response packet containing too much or too little payload data.

  • Upon local detection of a response packet with the correct PSN, but containing an unexpected opcode.

Example Problem Cases and Resync's Resolution of Problems
Scenario One

Refer to Figure 19-4 on this page. The sequence of events is numbered and self-explanatory. The receipt of the RNR Nak and the resulting suspension of the message transfer, coupled with the retry caused by the Transport Timer timeout, results in confusion at both ends of the RDC.

Figure 19-4. Scenario One


Resync's Resolution of Scenario One Problems

Figure 19-5 on page 502 describes how the Send Logic's issuance of the Resync request packet resolves all of the problems associated with scenario one.

Figure 19-5. Scenario One Resolution


Scenario Two

Refer to Figure 19-6 on page 505. The sequence of events is numbered and self-explanatory. In this example, a multi-packet memory write (either a Send or an RDMA Write) is initiated. The EEC Receive Logic finds a problem with the first request packet (e.g., a bad R_Key), so it returns a Remote Access Error Nak. In the meantime, the Send Logic has launched multiple request packets for the multi-packet write into the fabric. The second packet arrives at the Receive Logic and is discarded because its PSN (2) is greater than the ePSN. The third request packet experiences a long delay in the fabric and arrives much later. As will be seen, this can result in data corruption in the remote CA's memory.

Figure 19-6. Scenario Two


Resync's Resolution of Scenario Two Problems

The Send Logic's issuance of the Resync request packet resolves all of the problems associated with this scenario.

Upon receipt of the Nak, the Send Logic stops (abandons) the current message transfer and transmits a Resync request packet to the remote EEC's Receive Logic. The PSN inserted in the Resync request packet is the Send Logic's nPSN (4). Upon receipt of the Resync request packet, the Receive Logic resets its ePSN to one more than the Resync's PSN (4 + 1 = 5). It returns an Ack with the same PSN as the Resync (4) and an MSN of one, indicating that, of the two messages received, only the Resync has completed. The previous message has been abandoned by the destination QP and EEC.

Upon receipt of the Resync's Ack packet, the Send Logic determines (because the MSN returned = the eMSN—expected MSN—of 1, rather than eMSN + 1) that the earlier message had been abandoned on the remote end. As a result, the Send Logic instructs its client QP to transition to the SQE state and to create an error CQE for the transfer. When software receives the CQE, it posts one or more new message transfer requests to the SQ and transitions the QP back to the RTS state. The first one it posts is a two-packet Send or RDMA Write. It should be noted that, in this example, no other client QPs on the Send Logic's end need to use this RDC. The EEC Send Logic now performs the two-packet Send or RDMA Write to the same destination QP targeted by the earlier Nak'd message. The remote EEC's Receive Logic receives the first request packet and the packet's PSN matches the ePSN (5). The data payload is written to local memory. The Receive Logic is coalescing Acks and doesn't return an Ack yet. It bumps its ePSN to 6.

The Receive Logic now receives the long-delayed request packet from the earlier message transfer. Its PSN (3) is less than the ePSN (6), so it is treated as a duplicate request packet. Its data payload is not written to local memory. An Ack isn't sent back yet due to continued coalescing on the Receive Logic's part.

The remote EEC's Receive Logic receives the second request packet of message two and the packet's PSN matches the ePSN (6). The data payload is written to local memory. This completes the two-packet write to local memory. At this point, the Receive Logic stops coalescing and sends back an Ack packet with a PSN of 6 and an MSN of 2. It bumps its ePSN to 7.

Upon receipt of the Ack packet with an MSN = the eMSN (2), the Send Logic knows the message in progress (the two-packet Send or RDMA Write) has been completed. It therefore signals its client QP to retire the currently active WQE and create a good completion CQE.

If the Resync had not been performed, the second message transfer would have started with a PSN of two. The Receive Logic would have received the second message's first request packet with a PSN of two and correctly written its data payload into local memory. It would have then received the long-delayed request packet from the first message with a PSN of three (the ePSN) and would have written its data payload into local memory. Write data from two separate messages would have been intermingled in the local memory buffer, resulting in memory corruption.
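The difference between the two outcomes can be reproduced with a toy model of the Receive Logic's PSN check. The sketch below is a simplification under stated assumptions: the class `ReceiveLogic` and the payload labels are hypothetical, PSN wraparound and the notion of an Invalid PSN range are ignored, and the starting ePSN values (5 after the Resync, 2 without it) are taken from the narrative above.

```python
class ReceiveLogic:
    """Toy model of an EEC Receive Logic's ePSN check (hypothetical names)."""

    def __init__(self, epsn):
        self.epsn = epsn
        self.memory = []  # payloads written to local memory, in order

    def resync(self, psn):
        # The ePSN is reset to one more than the Resync's PSN.
        self.epsn = psn + 1

    def receive(self, psn, payload):
        if psn == self.epsn:
            self.memory.append(payload)
            self.epsn += 1
            return "written"
        if psn < self.epsn:
            return "duplicate"  # payload is not written
        return "out-of-sequence"


def run(rx, packets):
    return [rx.receive(psn, payload) for psn, payload in packets]


# With Resync: the Resync carries PSN 4, so the new message uses PSNs 5 and 6.
with_resync = ReceiveLogic(epsn=1)
with_resync.resync(4)
results_ok = run(with_resync, [(5, "msg2-pkt1"), (3, "msg1-pkt3"), (6, "msg2-pkt2")])

# Without Resync: the new message starts at PSN 2 and the ePSN is 2.
without_resync = ReceiveLogic(epsn=2)
results_bad = run(without_resync, [(2, "msg2-pkt1"), (3, "msg1-pkt3"), (3, "msg2-pkt2")])
```

In the first run the delayed packet is rejected as a duplicate and only the new message's payloads reach memory. In the second run the delayed packet's PSN happens to equal the ePSN, so its stale payload is written and the new message's own second packet is then rejected as a duplicate.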

Additional Resync Information

For additional information regarding Resync, refer to “Resync Reference Material” on page 516.
