Chapter 17. Network Drivers, Part 2: Packet Reception and Transmission


This chapter examines the packet reception and transmission components of em(4). Predictably, em(4) uses both mbufs and MSI for packet reception and transmission.

Packet Reception

When an interface receives a packet, it sends an interrupt. Naturally, this causes its interrupt handler to execute. For example, here is what executes in em(4):

static void
em_msix_rx(void *arg)
{
        struct rx_ring *rxr = arg;
        struct adapter *adapter = rxr->adapter;
        bool more;

        ++rxr->rx_irq;

        more = em_rxeof(rxr, adapter->rx_process_limit, NULL);
        if (more)
                taskqueue_enqueue(rxr->tq, &rxr->rx_task);
        else
                E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims);
}

This function takes a pointer to a ring buffer that contains one or more received packets and calls em_rxeof to process those packets. If more than rx_process_limit packets are waiting, a task structure is queued to handle the remainder; otherwise, the receive interrupt is reenabled. I’ll discuss the task structure and its associated function in the “em_handle_rx Function” section.
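The "process a bounded batch, then either defer or reenable" pattern can be tried out in userland. The following is only a minimal sketch under stated assumptions: struct sim_ring, sim_rxeof, and sim_msix_rx are hypothetical stand-ins invented for illustration, not em(4) code, and the ring is reduced to a simple counter of pending packets.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a receive ring; not em(4)'s struct rx_ring. */
struct sim_ring {
        int pending;            /* packets waiting in the ring */
        int tasks_queued;       /* times the deferred task was enqueued */
        bool irq_enabled;       /* whether the receive interrupt is unmasked */
};

/* Process at most `limit` packets and report whether any remain. */
static bool
sim_rxeof(struct sim_ring *r, int limit)
{
        assert(limit > 0);
        while (limit-- > 0 && r->pending > 0)
                --r->pending;
        return (r->pending > 0);
}

/* Same shape as em_msix_rx: defer leftover work or unmask the interrupt. */
static void
sim_msix_rx(struct sim_ring *r, int limit)
{
        /* Assumption for this sketch: the interrupt stays masked in here. */
        r->irq_enabled = false;
        if (sim_rxeof(r, limit))
                ++r->tasks_queued;      /* stands in for taskqueue_enqueue */
        else
                r->irq_enabled = true;  /* stands in for the E1000_IMS write */
}
```

Bounding the work done per invocation keeps a busy interface from monopolizing the interrupt thread; the leftover packets are handled later by the queued task.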

em_rxeof Function

As mentioned previously, em_rxeof processes received packets. Its function definition is listed below, but because this function is fairly long and involved, I’ll introduce it in parts. Here is the first part:

static bool
em_rxeof(struct rx_ring *rxr, int count, int *done)
{
        struct adapter *adapter = rxr->adapter;
        struct ifnet *ifp = adapter->ifp;
        struct e1000_rx_desc *cur;
        struct mbuf *mp, *sendmp;
        u8 status = 0;
        u16 len;
        int i, processed, rxdone = 0;
        bool eop;

        EM_RX_LOCK(rxr);

        for (i = rxr->next_to_check, processed = 0; count != 0; ) {
                if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
                        break;

                bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
                    BUS_DMASYNC_POSTREAD);

                mp = sendmp = NULL;
                cur = &rxr->rx_base[i];
                status = cur->status;
                if ((status & E1000_RXD_STAT_DD) == 0)
                        break;
                len = le16toh(cur->length);
                eop = (status & E1000_RXD_STAT_EOP) != 0;

                if ((cur->errors & E1000_RXD_ERR_FRAME_ERR_MASK) ||
                    (rxr->discard == TRUE)) {
                        ++ifp->if_ierrors;
                        ++rxr->rx_discarded;
                        if (!eop)
                                rxr->discard = TRUE;
                        else
                                rxr->discard = FALSE;
                        em_rx_discard(rxr, i);
                        goto next_desc;
                }
...

This function’s execution is contained primarily within a for loop. This loop begins by verifying that the interface is up and running. Then it synchronizes the DMA buffer currently loaded in rxr->rxdma.dma_map, which is rxr->rx_base.

The buffer rxr->rx_base[i] contains a descriptor that describes a received packet. When a packet spans multiple mbufs, rxr->rx_base[i] describes one mbuf in the chain.

If rxr->rx_base[i] lacks the E1000_RXD_STAT_DD flag, the for loop exits. (The E1000_RXD_STAT_DD flag stands for receive descriptor status: descriptor done. We’ll see its effects shortly.)

If rxr->rx_base[i] describes the last mbuf in the chain, the Boolean variable eop, which stands for end of packet, is set to TRUE. (Needless to say, when a packet requires only one mbuf, that mbuf is still the last mbuf in the chain.)

If the packet described by rxr->rx_base[i] contains any errors, it is discarded. Note that I use the word packet, not mbuf, here, because every mbuf in the packet is discarded.
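The per-descriptor decisions described above (stop, discard, or keep) can be sketched as a small state machine. This is an illustrative userland model, not the driver's code: the SIM_ names, the flag values, and struct sim_desc are assumptions made up for the example, and any nonzero errors field is treated as a frame error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; hypothetical stand-ins for the E1000_RXD_STAT_* bits. */
#define SIM_STAT_DD     0x01    /* descriptor done */
#define SIM_STAT_EOP    0x02    /* end of packet */

enum sim_action { SIM_STOP, SIM_DROP, SIM_KEEP };

/* Hypothetical, simplified receive descriptor. */
struct sim_desc {
        uint8_t status;
        uint8_t errors;
};

/*
 * Decide what to do with one descriptor. *discard carries the
 * "keep dropping until end of packet" state between calls.
 */
static enum sim_action
sim_classify(const struct sim_desc *d, bool *discard)
{
        bool eop;

        if ((d->status & SIM_STAT_DD) == 0)
                return (SIM_STOP);      /* hardware has not filled this one yet */

        eop = (d->status & SIM_STAT_EOP) != 0;
        if (d->errors != 0 || *discard) {
                *discard = !eop;        /* stop discarding once the packet ends */
                return (SIM_DROP);
        }
        return (SIM_KEEP);
}
```

Because the discard state persists until a descriptor with the end-of-packet flag is seen, an error in one fragment causes every remaining fragment of that packet to be dropped as well.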

Now let’s look at the next part of em_rxeof:

...
                mp = rxr->rx_buffers[i].m_head;
                mp->m_len = len;
                rxr->rx_buffers[i].m_head = NULL;

                if (rxr->fmp == NULL) {
                        mp->m_pkthdr.len = len;
                        rxr->fmp = rxr->lmp = mp;
                } else {
                        mp->m_flags &= ~M_PKTHDR;
                        rxr->lmp->m_next = mp;
                        rxr->lmp = mp;
                        rxr->fmp->m_pkthdr.len += len;
                }
...

Here, rxr->fmp and rxr->lmp point to the first and last mbuf in the chain, mp is the mbuf described by rxr->rx_base[i], and len is mp’s length.

So, this part simply identifies whether mp is the first mbuf in the chain. If it is not, then mp is linked into the chain.
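To see the first/last pointer bookkeeping in isolation, here is a minimal userland sketch. The types struct sim_buf and struct sim_chain and the function sim_chain_append are hypothetical simplifications written for this example; real mbufs carry far more state.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for an mbuf. */
struct sim_buf {
        struct sim_buf *next;   /* next buffer in the chain */
        int len;                /* bytes in this buffer */
        int pkt_len;            /* total packet length; valid in the first buffer only */
};

/* Plays the role of the rxr->fmp and rxr->lmp pair. */
struct sim_chain {
        struct sim_buf *first;
        struct sim_buf *last;
};

/* Append one received buffer to the packet currently being assembled. */
static void
sim_chain_append(struct sim_chain *c, struct sim_buf *b, int len)
{
        b->len = len;
        b->next = NULL;
        if (c->first == NULL) {
                /* First buffer: it starts the chain and holds the total. */
                b->pkt_len = len;
                c->first = c->last = b;
        } else {
                /* Later buffer: link it at the tail and grow the total. */
                c->last->next = b;
                c->last = b;
                c->first->pkt_len += len;
        }
}
```

Keeping a pointer to the last buffer makes each append a constant-time operation, and only the first buffer tracks the length of the whole packet.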

Here is the next part of em_rxeof:

...
                if (eop) {
                        --count;
                        sendmp = rxr->fmp;
                        sendmp->m_pkthdr.rcvif = ifp;
                        ++ifp->if_ipackets;
                        em_receive_checksum(cur, sendmp);
#ifndef __NO_STRICT_ALIGNMENT
                        if (adapter->max_frame_size >
                            (MCLBYTES - ETHER_ALIGN) &&
                            em_fixup_rx(rxr) != 0)
                                goto skip;
#endif
                        if (status & E1000_RXD_STAT_VP) {
                                sendmp->m_pkthdr.ether_vtag =
                                    le16toh(cur->special) &
                                    E1000_RXD_SPC_VLAN_MASK;
                                sendmp->m_flags |= M_VLANTAG;
                        }
#ifndef __NO_STRICT_ALIGNMENT
skip:
#endif
                        rxr->fmp = rxr->lmp = NULL;
                }
...

If mp is the last mbuf in the chain, sendmp is set to the first mbuf in the chain, and the header checksum is verified.

If our architecture requires strict alignment and jumbo frames are enabled, em_rxeof aligns the mbuf chain. (Jumbo frames are Ethernet packets with more than 1500 bytes of data.)
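As background on why alignment matters here, the arithmetic can be sketched as follows. The sketch assumes a 14-byte Ethernet header and a 4-byte alignment requirement for the header that follows it; the SIM_ names and sim_payload_aligned are hypothetical and exist only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define SIM_ETHER_HDR_LEN       14      /* destination + source + type */
#define SIM_ETHER_ALIGN         2       /* padding placed before the frame */
#define SIM_WORD                4       /* assumed alignment requirement */

/*
 * Given the offset at which the frame starts in an aligned buffer,
 * report whether the data after the Ethernet header is word aligned.
 */
static bool
sim_payload_aligned(size_t frame_offset)
{
        return ((frame_offset + SIM_ETHER_HDR_LEN) % SIM_WORD == 0);
}
```

Starting the frame 2 bytes into the buffer puts the following header at offset 16, which is word aligned. When a frame is too large to leave room for that padding, the data has to be adjusted after reception instead.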

This part concludes by setting rxr->fmp and rxr->lmp to NULL. Here is the next part of em_rxeof:

...
next_desc:
                cur->status = 0;
                ++rxdone;
                ++processed;

                if (++i == adapter->num_rx_desc)
                        i = 0;

                if (sendmp != NULL) {
                        rxr->next_to_check = i;
                        EM_RX_UNLOCK(rxr);
                        (*ifp->if_input)(ifp, sendmp);
                        EM_RX_LOCK(rxr);
                        i = rxr->next_to_check;
                }

                if (processed == 8) {
                        em_refresh_mbufs(rxr, i);
                        processed = 0;
                }
        }                                      /* The end of the for loop. */
...

Here, i is incremented, wrapping back to 0 at the end of the ring, so that em_rxeof can get to the next mbuf in the ring. Then, if sendmp points to an mbuf chain, em(4)’s input routine is executed to send that chain to the upper layers. Finally, after every eighth descriptor is processed, em_refresh_mbufs is called to allocate new mbufs for em(4).
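The index wraparound and the batched refresh in this part of the loop can be modeled in a few lines. This is a sketch under stated assumptions: struct sim_cursor and sim_advance are hypothetical names, and the refresh is reduced to a counter rather than a real buffer allocation.

```c
#include <assert.h>

#define SIM_REFRESH_BATCH       8       /* matches the constant in the listing */

/* Hypothetical cursor over a ring of ndesc descriptors. */
struct sim_cursor {
        int index;              /* next descriptor to check */
        int processed;          /* descriptors consumed since the last refresh */
        int refreshes;          /* number of batch refreshes performed */
};

/* Consume one descriptor: advance with wraparound and refresh in batches. */
static void
sim_advance(struct sim_cursor *c, int ndesc)
{
        assert(ndesc > 0 && c->index >= 0 && c->index < ndesc);

        ++c->processed;
        if (++c->index == ndesc)
                c->index = 0;           /* wrap to the start of the ring */

        if (c->processed == SIM_REFRESH_BATCH) {
                ++c->refreshes;         /* stands in for em_refresh_mbufs */
                c->processed = 0;
        }
}
```

Refreshing in batches rather than after every descriptor amortizes the cost of replenishing the ring across several received buffers.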

Note

When an mbuf chain is sent to the upper layers, drivers must not access those mbufs anymore. For all intents and purposes, those mbufs have been freed.

To sum up, this for loop simply links together every mbuf in a received packet and then sends that chain to the upper layers. This continues until every packet in the ring has been processed or rx_process_limit is hit (rx_process_limit was described in the “Packet Reception” section).

Here is the final part of em_rxeof:

...
        if (e1000_rx_unrefreshed(rxr))
                em_refresh_mbufs(rxr, i);

        rxr->next_to_check = i;
        if (done != NULL)
                *done = rxdone;
        EM_RX_UNLOCK(rxr);

        return ((status & E1000_RXD_STAT_DD) ? TRUE : FALSE);
}

If there are more packets to process, em_rxeof returns TRUE.

em_handle_rx Function

Recall that when em_rxeof returns TRUE, em_msix_rx queues a task structure (em_msix_rx was discussed in the “Packet Reception” section).

Here is that task structure’s function:

static void
em_handle_rx(void *context, int pending)
{
        struct rx_ring *rxr = context;
        struct adapter *adapter = rxr->adapter;
        bool more;

        more = em_rxeof(rxr, adapter->rx_process_limit, NULL);
        if (more)
                taskqueue_enqueue(rxr->tq, &rxr->rx_task);
        else
                E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims);
}

This function is nearly identical to em_msix_rx. When there are more packets to process, em_rxeof just gets called again.
