Bug #2441
closedEthernetTransport drops frames under heavy load
Added by susmit shannigrahi almost 10 years ago. Updated about 5 years ago.
100%
Description
Confirming fragment drop on Ethernet face with NDNLP.
I have added print statements to the partial message store (attached).
When I try to send/receive large packets (2x5MB for this specific test), I see fragment losses.
Also, the sender side crashes after sending the fragments.
Producer:
ndnputbigchunk /test 5000000 2
Consumer:
nfd-start
nfdc add-nexthop /test 257
ndncatbigchunks /test -c 2 -p 2
Error:
***** Gap detected ***** Current fragment: 2843Last fragment:2839 Total loss169
1422479959.704064 TRACE: [EthernetFace] [id:257,endpoint:em4] Received: 1500 bytes from c8:1f:66:e6:7e:3a
***** Gap detected ***** Current fragment: 2848Last fragment:2843 Total loss173
1422479959.704110 TRACE: [EthernetFace] [id:257,endpoint:em4] Received: 1500 bytes from c8:1f:66:e6:7e:3a
Files
ndnputbigchunk.cpp (4.34 KB) ndnputbigchunk.cpp | Producer | susmit shannigrahi, 01/28/2015 03:45 PM | |
ndncatbigchunks.cpp (5.63 KB) ndncatbigchunks.cpp | Consumer | susmit shannigrahi, 01/28/2015 03:45 PM | |
ndnlp-partial-message-store.cpp (3.56 KB) ndnlp-partial-message-store.cpp | Changed to log gaps in fragment sequence | susmit shannigrahi, 01/28/2015 03:48 PM | |
tlv.hpp (14.2 KB) tlv.hpp | Increase MAX_NDN_PACKET_SIZE to 10MB | susmit shannigrahi, 01/28/2015 03:50 PM | |
sender_trace.txt (927 Bytes) sender_trace.txt | GDB trace from producer | susmit shannigrahi, 01/28/2015 04:36 PM | |
nfd_trace.txt (5.04 KB) nfd_trace.txt | GDB trace from NFD | susmit shannigrahi, 01/28/2015 04:36 PM | |
nfd_producer_crashlog.txt (426 KB) nfd_producer_crashlog.txt | Producer side log of crashed NFD | susmit shannigrahi, 01/29/2015 10:25 AM | |
loss.txt (37.1 KB) loss.txt | susmit shannigrahi, 01/29/2015 01:31 PM |
Updated by Junxiao Shi almost 10 years ago
Please attach pcap captures from send machine and receive machine.
Please run sender in GDB, and obtain a stacktrace after it crashes.
Updated by Davide Pesavento almost 10 years ago
- Category set to Faces
- Start date deleted (
01/28/2015)
What's the MTU of the interface? Also please attach a pcap trace of the packet exchange.
Updated by susmit shannigrahi almost 10 years ago
- File sender_trace.txt sender_trace.txt added
- File nfd_trace.txt nfd_trace.txt added
The MTU was 1500 bytes.The receiver.cap and sender.cap are below. Can not attach them due to size constrain.
http://www.cs.colostate.edu/~susmit/ndn/sender.cap
http://www.cs.colostate.edu/~susmit/ndn/receiver.cap
Error in sender: sender_trace.txt
Error in NFD: nfd_trace.txt
Updated by Junxiao Shi almost 10 years ago
pcap traces show that all 4016 fragments are successfully sent, and all of them are received.
One possibility is that NFD EthernetFace cannot process fragments fast enough, so that some packets are dropped by the kernel.
Updated by Davide Pesavento almost 10 years ago
Junxiao Shi wrote:
One possibility is that NFD EthernetFace cannot process fragments fast enough, so that some packets are dropped by the kernel.
Yep, I had the same thought. There's limited buffer space in the TPACKET implementation. If the application is unable to read the fragments fast enough, eventually the buffer will fill up and the kernel will start dropping incoming packets.
Updated by Junxiao Shi almost 10 years ago
There's limited buffer space in the TPACKET implementation. If the application is unable to read the fragments fast enough, eventually the buffer will fill up and the kernel will start dropping incoming packets.
Is there a way to confirm whether this situation is happening, and to what extent?
Also, is the sender crash caused by writing packets too fast and filling up some buffer?
Updated by Davide Pesavento almost 10 years ago
Junxiao Shi wrote:
Is there a way to confirm whether this situation is happening, and to what extent?
Yes, pcap_stats(3)
.
Also, is the sender crash caused by writing packets too fast and filling up some buffer?
I doubt it. In fact the stack trace shows that the crash is happening during a handleRead
call... there are some too low values for this
in the topmost 3 frames, suggesting a null pointer somewhere.
Updated by Davide Pesavento almost 10 years ago
susmit, what does NFD print before crashing? Please attach a full log file of a crashing scenario.
Updated by Davide Pesavento almost 10 years ago
You can also try increasing the ring buffer size (the libpcap default is 2M on Linux):
pcap_set_buffer_size(m_pcap.get(), 10*1024*1024); // e.g. 10 MB
Note that this call must happen before pcap_activate
or it will fail.
Updated by Davide Pesavento almost 10 years ago
Apply this patch http://gerrit.named-data.net/1692 and see if it reports any packet drop.
Updated by Davide Pesavento almost 10 years ago
Davide Pesavento wrote:
Apply this patch http://gerrit.named-data.net/1692 and see if it reports any packet drop.
Make sure you build NFD in debug mode (./waf configure --debug
)
Updated by susmit shannigrahi almost 10 years ago
what does NFD print before crashing? Please attach a full log file of a crashing scenario.
Attaching the log for crashing NFD at the producer side.
Updated by Davide Pesavento almost 10 years ago
This doesn't make any sense. The stack trace shows that the handleRead
callback is invoked, and execution arrives at least as far as the async_read_some
call before crashing. Thus there are two possibilities:
- We received something (a frame) and processed it in
processIncomingPacket
pcap_next_ex
returned 0 (read timeout)
In both cases one or more log messages should have been printed, but I see nothing in the log file, so... what the hell?
Another thing:
1422555794.215244 INFO: [FaceTable] Removed face id=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eno4
what happened here?
Updated by Davide Pesavento almost 10 years ago
Davide Pesavento wrote:
1422555794.215244 INFO: [FaceTable] Removed face id=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eno4
what happened here?
ah, it turns out that the failure reason passed to Face::fail
is never logged anywhere. I'll submit a patch shortly.
Updated by susmit shannigrahi almost 10 years ago
Apply this patch http://gerrit.named-data.net/1692 and see if it reports any packet drop.
Make sure you build NFD in debug mode (
./waf configure --debug
)
Davide, something is printing the packets when I compile with --debug. I see a bunch of binary output on the terminal (Mostly As).
Updated by susmit shannigrahi almost 10 years ago
Attaching a partial loss log. The pcap/kernel does drop packets.
Updated by Davide Pesavento almost 10 years ago
Well there's not much we can do then... NFD is single-threaded. Try increasing the buffer size as I said in note-9, and see at what point the packet loss stops (or is significantly reduced).
Updated by susmit shannigrahi almost 10 years ago
The loss is not reduced even after increasing the said buffer.
Updated by Davide Pesavento almost 10 years ago
What size did you set it to? If you're sending two 5 MB packets, I expect you won't see any drops with a 20 MB buffer.
How many fragments (ethernet frames) are being sent per second?
Updated by Davide Pesavento almost 10 years ago
- Subject changed from NDNLP on Ethernet face drops fragments to EthernetFace drops frames under heavy load
- Priority changed from Normal to Low
- Target version set to Unsupported
Updated by Davide Pesavento almost 10 years ago
Is the crash still reproducible with the latest NFD?
Updated by Junxiao Shi over 7 years ago
- Status changed from New to Closed
I'll assume this issue has been solved because the reporter did not post a new log against the updated code.
Updated by Davide Pesavento over 7 years ago
- Subject changed from EthernetFace drops frames under heavy load to EthernetTransport drops frames under heavy load
- Assignee set to Davide Pesavento
- Priority changed from Low to Normal
- Target version changed from Unsupported to v0.6
Actually, while working on the unicast ethernet feature, I noticed that the number of slots in the TPACKET ring buffer is way too small, which can cause frame drops if NFD is unable to handle incoming frames quickly enough.
This needs to be investigated. Please reopen.
Updated by Davide Pesavento over 6 years ago
Davide Pesavento wrote:
the number of slots in the TPACKET ring buffer is way too small, which can cause frame drops if NFD is unable to handle incoming frames quickly enough.
This situation can be improved and packet loss alleviated by reducing the snaplen and/or increasing the buffer size.
Updated by Davide Pesavento about 5 years ago
- Status changed from New to In Progress
- Assignee changed from Davide Pesavento to Junxiao Shi
- Target version set to v0.7
- % Done changed from 0 to 50
Updated by Davide Pesavento about 5 years ago
- Status changed from In Progress to Closed
- % Done changed from 50 to 100