EthernetFace cannot receive on Ubuntu 14.04
Ubuntu 14.04 Server 64-bit
Steps to reproduce:
- start NFD and NRD on A and B
- on A, start a ndnping server serving prefix /A
- on B, run nfd-status to observe the FaceId of the Ethernet multicast face connected to the A-B link
- on B, invoke nfdc add-nexthop / <etherFaceId>
- on B, start ndnping /A
Expected: ndnping displays "Content from /A"
Actual: ndnping displays "Timeout from /A", host A's NFD complains "1398054254.409812 WARNING: [EthernetFace] [id:7,endpoint:eth2] pcap_next_ex() timed out"
Note: on both A and B, sudo tcpdump -p -i ifname shows Ethernet frames from B to the multicast group
#1 Updated by Davide Pesavento over 5 years ago
Junxiao Shi wrote:
"1398054254.409812 WARNING: [EthernetFace] [id:7,endpoint:eth2] pcap_next_ex() timed out"
Note: on both A and B,
sudo tcpdump -p -i eth1 shows Ethernet frames between the two hosts
So you're saying that the EthernetFace endpoint is eth2 but tcpdump sees packets on eth1?
#3 Updated by Davide Pesavento over 5 years ago
I think I figured out what's going on here. It's a case of bad interaction between boost::asio, libpcap, and the kernel (though IMHO most of the fault lies in the kernel).
The problem is triggered by the use of the new TPACKET_V3 kernel code for AF_PACKET sockets and its associated semantics. TPACKET_V3 was introduced in Linux 3.2 but not used by libpcap until version 1.5.0 (this explains why prior versions of Ubuntu are unaffected).
A bit of background... In TPACKET_V3, the RX ring buffer consists of multiple "slots", each of which is marked as belonging either to the kernel or to userland. As packets arrive, the kernel starts filling the first non-full slot marked for kernel usage, until there's no room left in the slot. At that point, the slot is handed over to userland, and the kernel starts filling the next available slot in the buffer. If there are no buffer slots available (i.e. all slots are marked as belonging to userland), incoming packets are dropped.

In addition to handing over slots to userland when they're full, the kernel also marks a slot for userland when an internal timeout expires, even if the current slot is empty. The default timeout is 8 ms, but it is usually adjusted dynamically depending on line speed. However, it appears that the user process waiting on the packet socket fd is woken up only when a packet is put in the buffer or when a packet is dropped due to the buffer being full.
Therefore what happens here is that the boost::asio event loop enters a wait (epoll_wait on Linux, according to strace) on the EthernetFace's fd (dup'ed from libpcap's fd), triggered by an earlier async_read_some call. There are no incoming packets yet, so the kernel's TPACKET_V3 code repeatedly times out without filling any buffer slots, and eventually ends up marking all slots as belonging to userland, thus running out of buffer space. Under this condition, arriving packets must be dropped and userland notified. So only at this point, when the ping interest arrives and is dropped by the kernel, does NFD wake up and invoke EthernetFace::handleRead. The handler calls pcap_next_ex, which however finds that the buffer slots are empty and returns 0, causing the warning to be printed. Finally NFD goes back to sleep waiting on the socket, but is bound to fail again when the next packet arrives if the incoming rate is sufficiently low.
I'm open to suggestions for an efficient solution to this problem, in a way that avoids polling but which also doesn't add noticeable latency to the packet receive path.
#5 Updated by Davide Pesavento over 5 years ago
libpcap itself worked around the issue essentially with polling: by default, poll() is called with a timeout of 1 ms, and you cannot "wait forever" with TPACKET_V3 (see https://github.com/the-tcpdump-group/libpcap/commit/ee4085152260466ea845d9e9109a251a39ded93b). This workaround doesn't work for us because we're not using libpcap's routines for the event loop.
Other network monitoring or packet capturing apps such as tcpdump, wireshark, and dumpcap have radically different purposes and requirements, and I wouldn't be surprised if they raised the timeout to a few tens or hundreds of milliseconds in order to be more efficient. Maybe they already did that before the behavior changed. However, introducing such a long delay at the beginning of the packet processing path is IMO unacceptable for an application like NFD.
#9 Updated by Davide Pesavento over 5 years ago
Davide Pesavento wrote:
Other network monitoring or packet capturing apps such as tcpdump, wireshark and dumpcap have a radically different purpose and requirements, and I wouldn't be surprised if they raised the timeout to a few tens or hundreds of milliseconds in order to be more efficient.
For the record, I checked tcpdump: it uses a timeout of 1 second.
#10 Updated by Davide Pesavento over 5 years ago
- Status changed from New to Code review
- % Done changed from 0 to 100
I submitted http://gerrit.named-data.net/#/c/800/ which in practice forces libpcap to use the old TPACKET_V2 memory-mapped interface on Linux, thus solving this issue. Unfortunately, pcap_set_immediate_mode() is new in libpcap 1.5.0 and is almost undocumented (there's a man page for the function saying that it enables "immediate mode", but it doesn't explain what "immediate mode" actually means), so I had to look at the sources to figure out what really happens under the hood.
#12 Updated by Davide Pesavento over 4 years ago
For the record, the kernel behavior with TPACKET_V3 has recently been fixed: http://www.spinics.net/lists/netdev/msg309291.html