This chapter examines the packet reception and transmission components of em(4). Predictably, em(4) uses both mbufs and MSI for packet reception and transmission.
When an interface receives a packet, it sends an interrupt. Naturally, this causes its interrupt handler to execute. For example, here is what executes in em(4):

    static void
    em_msix_rx(void *arg)
    {
        struct rx_ring *rxr = arg;
        struct adapter *adapter = rxr->adapter;
        bool more;

        ++rxr->rx_irq;
        more = em_rxeof(rxr, adapter->rx_process_limit, NULL);
        if (more)
            taskqueue_enqueue(rxr->tq, &rxr->rx_task);
        else
            E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims);
    }

This function takes a pointer to a ring buffer that contains one or more received packets, and calls em_rxeof to process those packets. If there are more than rx_process_limit packets, a task structure is queued; otherwise, this interrupt is reenabled. I’ll discuss the task structure and its associated function in “em_handle_rx Function.”
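
The defer-or-reenable decision is easier to see in isolation. What follows is a minimal user-space sketch, not driver code: sim_ring, sim_rxeof, and sim_msix_rx are hypothetical names of my own, the ring is reduced to a packet counter, and I assume the interrupt is masked while the handler runs. The sketch treats “the budget was used up” as the signal that more work may remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a receive ring; not the driver's rx_ring. */
struct sim_ring {
    int pending;        /* packets waiting in the ring */
    int tasks_queued;   /* how many times work was deferred */
    bool irq_enabled;   /* whether the receive interrupt is unmasked */
};

/* Process at most `limit` packets; report whether the budget was used up. */
static bool
sim_rxeof(struct sim_ring *r, int limit)
{
    int count = limit;

    assert(limit > 0);
    while (count != 0 && r->pending > 0) {
        --r->pending;
        --count;
    }
    return (count == 0);
}

/* Mirrors the decision in em_msix_rx: defer the rest or unmask. */
static void
sim_msix_rx(struct sim_ring *r, int limit)
{
    r->irq_enabled = false;     /* assumption: masked while the handler runs */
    if (sim_rxeof(r, limit))
        ++r->tasks_queued;      /* stands in for taskqueue_enqueue() */
    else
        r->irq_enabled = true;  /* stands in for the E1000_IMS write */
}
```

With 250 pending packets and a limit of 100, one call leaves 150 packets behind, queues one task, and keeps the interrupt masked. With only 10 pending packets, the ring is emptied and the interrupt is unmasked again.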
As mentioned previously, em_rxeof processes received packets. Its function definition is listed below, but because this function is fairly long and involved, I’ll introduce it in parts. Here is the first part:

    static bool
    em_rxeof(struct rx_ring *rxr, int count, int *done)
    {
        struct adapter *adapter = rxr->adapter;
        struct ifnet *ifp = adapter->ifp;
        struct e1000_rx_desc *cur;
        struct mbuf *mp, *sendmp;
        u8 status = 0;
        u16 len;
        int i, processed, rxdone = 0;
        bool eop;

        EM_RX_LOCK(rxr);

        for (i = rxr->next_to_check, processed = 0; count != 0; ) {
            if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0)
                break;

            bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
                BUS_DMASYNC_POSTREAD);

            mp = sendmp = NULL;
            cur = &rxr->rx_base[i];
            status = cur->status;

            if ((status & E1000_RXD_STAT_DD) == 0)
                break;

            len = le16toh(cur->length);
            eop = (status & E1000_RXD_STAT_EOP) != 0;

            if ((cur->errors & E1000_RXD_ERR_FRAME_ERR_MASK) ||
                (rxr->discard == TRUE)) {
                ++ifp->if_ierrors;
                ++rxr->rx_discarded;
                if (!eop)
                    rxr->discard = TRUE;
                else
                    rxr->discard = FALSE;
                em_rx_discard(rxr, i);
                goto next_desc;
            }
    ...

This function’s execution is contained primarily within a for loop. This loop begins by verifying that the interface is up and running. Then it synchronizes the DMA buffer currently loaded in rxr->rxdma.dma_map, which is rxr->rx_base.
The buffer rxr->rx_base[i] contains a descriptor that describes a received packet. When a packet spans multiple mbufs, rxr->rx_base[i] describes one mbuf in the chain.
If rxr->rx_base[i] lacks the E1000_RXD_STAT_DD flag, the for loop exits. (The E1000_RXD_STAT_DD flag stands for receive descriptor status: descriptor done. We’ll see its effects shortly.)
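
To make the role of the descriptor done flag concrete, here is a small user-space sketch of a ring scan that stops at the first descriptor the hardware has not finished. Everything in it is illustrative: sim_desc, sim_count_done, and the SIM_STAT_* bit values are my own assumptions and are not taken from the driver’s headers.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_STAT_DD  0x01   /* illustrative "descriptor done" bit */
#define SIM_STAT_EOP 0x02   /* illustrative "end of packet" bit */

/* Hypothetical, cut-down receive descriptor. */
struct sim_desc {
    uint16_t length;
    uint8_t status;
};

/*
 * Starting at `start`, count the descriptors that are marked done.
 * Stops at the first descriptor without the DD bit, or after one
 * full lap of the ring.
 */
static int
sim_count_done(const struct sim_desc *ring, int nslots, int start)
{
    int i = start, done = 0;

    assert(nslots > 0 && start >= 0 && start < nslots);
    while (done < nslots && (ring[i].status & SIM_STAT_DD) != 0) {
        ++done;
        if (++i == nslots)
            i = 0;      /* wrap around to the first slot */
    }
    return (done);
}
```

The scan never looks past a descriptor that lacks the flag, even if later descriptors happen to be marked done, because descriptors are consumed strictly in ring order.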
If rxr->rx_base[i] describes the last mbuf in the chain, the Boolean variable eop, which stands for end of packet, is set to TRUE. (Needless to say, when a packet requires only one mbuf, that mbuf is still the last mbuf in the chain.)
If the packet described by rxr->rx_base[i] contains any errors, it is discarded. Note that I use the word packet, not mbuf, here, because every mbuf in the packet is discarded.
Now let’s look at the next part of em_rxeof:

    ...
            mp = rxr->rx_buffers[i].m_head;
            mp->m_len = len;
            rxr->rx_buffers[i].m_head = NULL;

            if (rxr->fmp == NULL) {
                mp->m_pkthdr.len = len;
                rxr->fmp = rxr->lmp = mp;
            } else {
                mp->m_flags &= ~M_PKTHDR;
                rxr->lmp->m_next = mp;
                rxr->lmp = mp;
                rxr->fmp->m_pkthdr.len += len;
            }
    ...

Here, rxr->fmp and rxr->lmp point to the first and last mbuf in the chain, mp is the mbuf described by rxr->rx_base[i], and len is mp’s length.
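So, this part simply identifies whether mp is the first mbuf in the chain. If it is, mp becomes both the head and the tail of a new chain. If it is not, mp is linked onto the end of the chain, and the packet length stored in the first mbuf is increased by len.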
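
The head-and-tail bookkeeping can be tried out in user space. The sketch below is a simplification under stated assumptions: sim_buf is a hypothetical structure with only the fields needed here (a real mbuf carries far more state), and sim_chain and sim_chain_append are names I made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer; not struct mbuf. */
struct sim_buf {
    struct sim_buf *next;
    int len;        /* bytes in this buffer */
    int pkt_len;    /* total packet length; meaningful only in the first buffer */
};

struct sim_chain {
    struct sim_buf *first;  /* plays the role of rxr->fmp */
    struct sim_buf *last;   /* plays the role of rxr->lmp */
};

/* Start a new chain with `b`, or link `b` onto the end of the current one. */
static void
sim_chain_append(struct sim_chain *c, struct sim_buf *b, int len)
{
    assert(c != NULL && b != NULL && len >= 0);
    b->len = len;
    b->next = NULL;
    if (c->first == NULL) {
        b->pkt_len = len;           /* the first buffer carries the total */
        c->first = c->last = b;
    } else {
        b->pkt_len = 0;             /* not a packet header */
        c->last->next = b;
        c->last = b;
        c->first->pkt_len += len;   /* grow the total kept in the head */
    }
}
```

Appending buffers of 2,048, 2,048, and 904 bytes produces a three-buffer chain whose first buffer records a total length of 5,000 bytes. Keeping a tail pointer means each append is constant time; the chain is never walked.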
Here is the next part of em_rxeof:

    ...
            if (eop) {
                --count;
                sendmp = rxr->fmp;
                sendmp->m_pkthdr.rcvif = ifp;
                ++ifp->if_ipackets;
                em_receive_checksum(cur, sendmp);
    #ifndef __NO_STRICT_ALIGNMENT
                if (adapter->max_frame_size > (MCLBYTES - ETHER_ALIGN) &&
                    em_fixup_rx(rxr) != 0)
                    goto skip;
    #endif
                if (status & E1000_RXD_STAT_VP) {
                    sendmp->m_pkthdr.ether_vtag =
                        le16toh(cur->special) & E1000_RXD_SPC_VLAN_MASK;
                    sendmp->m_flags |= M_VLANTAG;
                }
    #ifndef __NO_STRICT_ALIGNMENT
    skip:
    #endif
                rxr->fmp = rxr->lmp = NULL;
            }
    ...

If mp is the last mbuf in the chain, sendmp is set to the first mbuf in the chain, and the header checksum is verified.
If our architecture requires strict alignment and jumbo frames are enabled, em_rxeof aligns the mbuf chain. (Jumbo frames are Ethernet frames that carry more than 1,500 bytes of payload.)
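
Why alignment comes up at all is a matter of simple arithmetic: an Ethernet header is 14 bytes long, so whatever follows it sits 2 bytes off a 4-byte boundary unless the frame itself starts 2 bytes into its buffer. The sketch below works through that arithmetic. The function names and the SIM_* constants are mine, and treating the buffer as 2,048 bytes in the examples is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_ETHER_HDR_LEN 14    /* dst MAC (6) + src MAC (6) + EtherType (2) */
#define SIM_ETHER_ALIGN    2    /* padding that restores 4-byte alignment */

/* Offset of the payload that follows the Ethernet header. */
static size_t
sim_payload_offset(size_t frame_start)
{
    return (frame_start + SIM_ETHER_HDR_LEN);
}

/* Is this offset on a 4-byte boundary? */
static bool
sim_is_aligned4(size_t offset)
{
    return ((offset % 4) == 0);
}

/* Does a frame still fit in a buffer once it is shifted by the padding? */
static bool
sim_fits_with_padding(size_t frame_size, size_t buf_size)
{
    assert(buf_size >= SIM_ETHER_ALIGN);
    return (frame_size <= buf_size - SIM_ETHER_ALIGN);
}
```

A frame that starts at offset 0 puts its payload at offset 14, which is misaligned; starting at offset 2 puts the payload at offset 16, which is aligned. A 1,518-byte frame fits in a 2,048-byte buffer with the padding, but a 9,018-byte jumbo frame does not, which is the situation the size comparison in the listing above tests for before calling em_fixup_rx.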
This part concludes by setting rxr->fmp and rxr->lmp to NULL, so that the next descriptor starts a new chain.
Here is the next part of em_rxeof:

    ...
    next_desc:
            cur->status = 0;
            ++rxdone;
            ++processed;

            if (++i == adapter->num_rx_desc)
                i = 0;

            if (sendmp != NULL) {
                rxr->next_to_check = i;
                EM_RX_UNLOCK(rxr);
                (*ifp->if_input)(ifp, sendmp);
                EM_RX_LOCK(rxr);
                i = rxr->next_to_check;
            }

            if (processed == 8) {
                em_refresh_mbufs(rxr, i);
                processed = 0;
            }
        } /* The end of the for loop. */
    ...

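Here, i is incremented so that em_rxeof can get to the next descriptor in the ring; when i reaches the number of receive descriptors, it wraps around to 0. Then, if sendmp points to an mbuf chain, em(4)’s input routine is executed to send that chain to the upper layers. Afterward, once eight descriptors have been processed, new mbufs are allocated for em(4).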
When an mbuf chain is sent to the upper layers, drivers must not access those mbufs anymore. For all intents and purposes, those mbufs have been freed.
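
The index wraparound and the batched replenishment in this part of the loop can be modeled with a few counters. This is a user-space sketch under my own assumptions: sim_rx_state and sim_consume are hypothetical names, a refill is reduced to incrementing a counter, and the catch-up for a partial batch at the end is a simplification of what a driver would do once its loop finishes.

```c
#include <assert.h>

#define SIM_REFRESH_BATCH 8     /* same batch size as the literal 8 in the listing */

/* Hypothetical bookkeeping for a receive ring. */
struct sim_rx_state {
    int next;       /* next slot to examine */
    int nslots;     /* ring size */
    int refills;    /* how many times buffers were replenished */
};

/* Consume `n` descriptors, wrapping the index and refilling in batches. */
static void
sim_consume(struct sim_rx_state *s, int n)
{
    int processed = 0;

    assert(s->nslots > 0 && n >= 0);
    while (n-- > 0) {
        ++processed;
        if (++s->next == s->nslots)
            s->next = 0;            /* wrap to the start of the ring */
        if (processed == SIM_REFRESH_BATCH) {
            ++s->refills;           /* stands in for em_refresh_mbufs() */
            processed = 0;
        }
    }
    if (processed != 0)
        ++s->refills;               /* catch up on a partial batch */
}
```

Starting at slot 250 of a 256-slot ring and consuming 20 descriptors leaves the index at slot 14 after three refills: two full batches of eight and one partial batch of four. Refilling in batches rather than per descriptor keeps the cost of replenishment off the per-packet path.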
To sum up, this for loop simply links together every mbuf in a received packet and then sends the resulting chain to the upper layers. This continues until every packet in the ring has been processed or rx_process_limit is hit (rx_process_limit was described in “Packet Reception”).
Here is the final part of em_rxeof:

    ...
        if (e1000_rx_unrefreshed(rxr))
            em_refresh_mbufs(rxr, i);

        rxr->next_to_check = i;
        if (done != NULL)
            *done = rxdone;
        EM_RX_UNLOCK(rxr);

        return ((status & E1000_RXD_STAT_DD) ? TRUE : FALSE);
    }

If there are more packets to process, em_rxeof returns TRUE.
Recall that when em_rxeof returns TRUE, em_msix_rx queues a task structure (em_msix_rx was discussed in “Packet Reception”).
Here is that task structure’s function:

    static void
    em_handle_rx(void *context, int pending)
    {
        struct rx_ring *rxr = context;
        struct adapter *adapter = rxr->adapter;
        bool more;

        more = em_rxeof(rxr, adapter->rx_process_limit, NULL);
        if (more)
            taskqueue_enqueue(rxr->tq, &rxr->rx_task);
        else
            E1000_WRITE_REG(&adapter->hw, E1000_IMS, rxr->ims);
    }

This function is nearly identical to em_msix_rx; the only difference is that it does not increment the rx_irq counter. When there are more packets to process, the task is queued again, so em_rxeof just gets called again.