Troubleshooting Guide

This guide explains how to troubleshoot Aeron DPDK installations.

Aeron DPDK provides additional executables to help with troubleshooting:

  • dpdk_arp_tool: for debugging ARP issues.

  • dpdk_ping_raw: for testing connectivity between machines.

    # ping host: send ping and print stats
    $ sudo ./dpdk_ping_raw -- -s -l <ping_addr>:<ping_port> -r <pong_addr>:<pong_port> -m <pong_mac_addr>
    
    # pong host: respond to ping requests
    $ sudo ./dpdk_ping_raw -- -l <pong_addr>:<pong_port> -r <ping_addr>:<ping_port> -m <ping_mac_addr>
Both tools require root privileges to run.

Run either tool with -h to get the usage instructions, e.g.:

$ sudo ./dpdk_ping_raw -- -h
Tool parameters must be passed after the double dash (--).

Observability

Aeron DPDK provides a set of DPDK counters that expose the runtime state of the system. These include Aeron-specific counters as well as environment-specific DPDK counters.

Here is an example AeronStat output from one of our AWS tests:

 43:          453,803,278 - DPDK poll count SENDER
 44:          436,106,067 - DPDK poll count RECEIVER
 45:                    0 - DPDK - local: 10.0.10.191/6:ac:db:fb:56:2d rx: 1/4096 tx: 2/1024 65536
 46:            9,081,956 - DPDK TX packets
 47:        3,286,117,876 - DPDK TX bytes
 48:            9,072,742 - DPDK RX packets
 49:        3,285,730,128 - DPDK RX bytes
 50:                    0 - DPDK TX no buffers
 51:                    0 - DPDK RX no buffers
 52:                    0 - DPDK TX EAGAIN
 53:                    0 - DPDK TX ERROR
 54:                    0 - DPDK RX ERROR
 55:                    0 - DPDK RX H/W missed packets
 56:                    6 - DPDK RX Sender DISCARD
 57:                    0 - DPDK RX Sender Queue Drop
 58:                    1 - DPDK ARP Misses
 59:                    0 - DPDK Checksum Failures
 60:                    0 - DPDK Fragmented Packets
 61:                4,145 - DPDK RX Mempool Available
 62:                2,013 - DPDK TX Sender Mempool Available
 63:                2,001 - DPDK TX Receiver Mempool Available
 64:            9,072,751 - DPDK rx_good_packets
 65:            9,081,965 - DPDK tx_good_packets
 66:        3,285,733,386 - DPDK rx_good_bytes
 67:        3,286,121,134 - DPDK tx_good_bytes
 68:                    0 - DPDK rx_missed_errors
 69:                    0 - DPDK rx_errors
 70:                    0 - DPDK tx_errors
 71:                    0 - DPDK rx_mbuf_allocation_errors
 72:            9,072,752 - DPDK rx_q0_packets
 73:        3,285,733,748 - DPDK rx_q0_bytes
 74:                    0 - DPDK rx_q0_errors
 75:            9,076,425 - DPDK tx_q0_packets
 76:        3,285,689,378 - DPDK tx_q0_bytes
 77:                5,541 - DPDK tx_q1_packets
 78:              432,118 - DPDK tx_q1_bytes
 79:                    0 - DPDK wd_expired
 80:                    1 - DPDK dev_start
 81:                    0 - DPDK dev_stop
 82:                    0 - DPDK tx_drops
 83:                    0 - DPDK bw_in_allowance_exceeded
 84:                    0 - DPDK bw_out_allowance_exceeded
 85:                    0 - DPDK pps_allowance_exceeded
 86:                    0 - DPDK conntrack_allowance_exceeded
 87:                    0 - DPDK linklocal_allowance_exceeded
 88:            2,462,629 - DPDK conntrack_allowance_available
 89:                    0 - DPDK ena_srd_mode
 90:                    0 - DPDK ena_srd_tx_pkts
 91:                    0 - DPDK ena_srd_eligible_tx_pkts
 92:                    0 - DPDK ena_srd_rx_pkts
 93:                    0 - DPDK ena_srd_resource_utilization
 94:            9,072,749 - DPDK rx_q0_cnt
 95:        3,285,732,662 - DPDK rx_q0_bytes
 96:                    0 - DPDK rx_q0_refill_partial
 97:                    0 - DPDK rx_q0_l3_csum_bad
 98:                    0 - DPDK rx_q0_l4_csum_bad
 99:            9,072,746 - DPDK rx_q0_l4_csum_good
100:                    0 - DPDK rx_q0_mbuf_alloc_fail
101:                    0 - DPDK rx_q0_bad_desc_num
102:                    0 - DPDK rx_q0_bad_req_id
103:                    0 - DPDK rx_q0_bad_desc
104:                    0 - DPDK rx_q0_unknown_error
105:            9,076,423 - DPDK tx_q0_cnt
106:                5,541 - DPDK tx_q1_cnt
107:        3,285,688,654 - DPDK tx_q0_bytes
108:              432,118 - DPDK tx_q1_bytes
109:                    0 - DPDK tx_q0_prepare_ctx_err
110:                    0 - DPDK tx_q1_prepare_ctx_err
111:            9,076,423 - DPDK tx_q0_tx_poll
112:                5,541 - DPDK tx_q1_tx_poll
113:            9,076,423 - DPDK tx_q0_doorbells
114:                5,541 - DPDK tx_q1_doorbells
115:                    0 - DPDK tx_q0_bad_req_id
116:                    0 - DPDK tx_q1_bad_req_id
117:                  942 - DPDK tx_q0_available_desc
118:                  986 - DPDK tx_q1_available_desc
119:                    0 - DPDK tx_q0_missed_tx
120:                    0 - DPDK tx_q1_missed_tx
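
When comparing snapshots like the one above, it can help to parse the counter lines programmatically. The following is a minimal sketch, assuming the output has been saved as text in the `id: value - label` layout shown above. The function names and the `WATCHED` list are hypothetical and not part of Aeron. The list only picks labels that read 0 in the sample output; it is not an official list of error counters.

```python
import re

# Matches lines such as " 58:                    1 - DPDK ARP Misses".
COUNTER_LINE = re.compile(r"^\s*(\d+):\s+([\d,]+)\s+-\s+(.*\S)\s*$")

# Illustrative watch list (an assumption, not an official list): labels
# copied from the sample output, where they all read 0.
WATCHED = (
    "DPDK TX no buffers",
    "DPDK RX no buffers",
    "DPDK TX ERROR",
    "DPDK RX ERROR",
    "DPDK RX H/W missed packets",
    "DPDK Checksum Failures",
)


def parse_counters(text):
    """Return (counter_id, value, label) tuples for every counter line in text."""
    counters = []
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counter_id, value, label = match.groups()
            counters.append((int(counter_id), int(value.replace(",", "")), label))
    return counters


def non_zero_watched(counters, watched=WATCHED):
    """Return (counter_id, label, value) for watched counters that are not zero."""
    return [
        (counter_id, label, value)
        for counter_id, value, label in counters
        if label in watched and value != 0
    ]


# Three lines copied from the sample output above.
SAMPLE = """\
 46:            9,081,956 - DPDK TX packets
 53:                    0 - DPDK TX ERROR
 58:                    1 - DPDK ARP Misses
"""

if __name__ == "__main__":
    parsed = parse_counters(SAMPLE)
    for counter_id, value, label in parsed:
        print(counter_id, value, label)
    print("non-zero watched counters:", non_zero_watched(parsed))
```

Results are kept as a list of tuples rather than a dictionary keyed by label because some labels (for example rx_q0_bytes) appear more than once in the output under different counter ids.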

Common issues

Invalid gateway

An incorrectly set AERON_DPDK_GATEWAY_IPV4_ADDRESS prevents any Aeron DPDK traffic from reaching its destination.

This manifests as a continuously increasing value of the DPDK ARP Misses counter. When the gateway is configured correctly, this counter should remain at 1.

On AWS, the gateway (AERON_DPDK_GATEWAY_IPV4_ADDRESS) and the local IP address (AERON_DPDK_LOCAL_IPV4_ADDRESS) should be omitted entirely, as Aeron can resolve both automatically.
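
The "keeps increasing" symptom can be checked mechanically once a few successive readings of the DPDK ARP Misses counter have been noted down. This is a minimal sketch under that assumption. The function name is hypothetical, and the readings in the example are made up for illustration, not measured data.

```python
def counter_increased(readings):
    """Return True if any reading is higher than the one taken before it."""
    return any(later > earlier for earlier, later in zip(readings, readings[1:]))


if __name__ == "__main__":
    # Hypothetical readings taken a few seconds apart.
    print("steady counter flagged: ", counter_increased([1, 1, 1]))
    print("growing counter flagged:", counter_increased([1, 4, 9]))
```

A counter that stays at its initial value is not flagged; any rise between two consecutive readings is.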

Connectivity issues

Apart from an invalid gateway (see Invalid gateway), connectivity issues have other possible causes:

  • Network configuration: routing, firewall etc.

    Verify that machines can see each other and that UDP traffic is allowed between the boxes on the DPDK interfaces.

    There are two possible approaches:

    • Use dpdk_ping_raw to send DPDK traffic between the machines.

    • Unbind the DPDK interface and use any tool (e.g. iperf) that can send UDP traffic.

      This requires reversing the steps outlined in the Device and Driver Configuration section:

      1. Find the interface

        $ ./dpdk-devbind.py --status-dev net
        
        Network devices using DPDK-compatible driver
        ============================================
        0000:48:00.0 '82598EB 10-Gigabit AF Dual Port Network Connection 10f1' drv=vfio-pci unused=ixgbe

        In this example, the only DPDK interface is 0000:48:00.0.

      2. Unbind the interface found in step 1.

        $ sudo ./dpdk-devbind.py -u 0000:48:00.0
        $ ./dpdk-devbind.py --status-dev net
        
        Network devices using kernel driver
        ===================================
        0000:01:00.0 'MT27800 Family [ConnectX-5] 1017' if=enp1s0f0 drv=mlx5_core unused=vfio-pci *Active*
        0000:01:00.1 'MT27800 Family [ConnectX-5] 1017' if=enp1s0f1 drv=mlx5_core unused=vfio-pci
        0000:43:00.0 'I211 Gigabit Network Connection 1539' if=enp67s0 drv=igb unused=vfio-pci *Active*
        0000:44:00.0 'Wi-Fi 6 AX200 2723' if=wlo2 drv=iwlwifi unused=vfio-pci
        0000:48:00.0 '82598EB 10-Gigabit AF Dual Port Network Connection 10f1' if=enp72s0f0 drv=ixgbe unused=vfio-pci
        0000:48:00.1 '82598EB 10-Gigabit AF Dual Port Network Connection 10f1' if=enp72s0f1 drv=ixgbe unused=vfio-pci

        Now 0000:48:00.0 is visible to the Linux kernel again under the name enp72s0f0.

      3. Enable the link for the interface from step 2.

        $ sudo ip link set enp72s0f0 up

      4. Verify connectivity using standard Linux tools.

        After the test, follow the Device and Driver Configuration section to undo the changes.

  • DPDK-specific issues

    If the network test was successful, the issue might be DPDK-specific, in which case the DPDK counters (see Observability) might provide additional context.
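
For the unbound-interface approach above, any UDP tool will do. If iperf is not installed, a short script using only the Python standard library can serve the same purpose. The following is a minimal sketch, not part of Aeron DPDK: the function names are hypothetical, and the self-test at the bottom runs over loopback only. For a real check, run the echo side on one machine bound to the kernel interface address, and the probe on the other machine against that address and port.

```python
import socket
import threading


def echo_once(sock):
    """Receive one datagram on sock and send it back to its sender."""
    data, peer = sock.recvfrom(2048)
    sock.sendto(data, peer)


def udp_probe(host, port, payload=b"ping", timeout=1.0):
    """Send payload to host:port; return True if the same bytes are echoed back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            data, _ = sock.recvfrom(2048)
        except OSError:
            # Covers timeouts as well as ICMP-triggered errors.
            return False
        return data == payload


if __name__ == "__main__":
    # Loopback self-test: bind an echo socket to an ephemeral port and probe it.
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    port = server.getsockname()[1]
    worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
    worker.start()
    print("echo received:", udp_probe("127.0.0.1", port))
    worker.join(timeout=2.0)
    server.close()
```

A probe that returns False only shows that no echo arrived within the timeout; it does not distinguish between routing, firewall, or addressing problems.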

Packet capture

Packet capture must be enabled at the source level and requires specialized tools to run. See the DPDK packet capture libraries and tools guide for more information.

It is not possible to capture packets using aeronmd_dpdk. However, both dpdk_arp_tool and dpdk_ping_raw are compiled with packet capture support.