Bug #2441: EthernetTransport drops frames under heavy load - NFD - NDN project issue tracking system

Actions

Copy link

Bug #2441

closed

EthernetTransport drops frames under heavy load

Added by susmit shannigrahi over 11 years ago. Updated almost 7 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Junxiao Shi

Category:

Faces

Target version:

v0.7

Start date:

Due date:

% Done:

100%

Estimated time:

Description

Confirming fragment drop on Ethernet face with NDNLP.
I have added print statements to the partial message store (attached).
When I try to send/receive large packets (2x5MB for this specific test), I see fragment losses.
Also, the sender side crashes after sending the fragments.

Producer:

 ndnputbigchunk /test 5000000 2

Consumer:

nfd-start
nfdc add-nexthop /test 257
ndncatbigchunks /test -c 2 -p 2

Error:

 ***** Gap detected ***** Current fragment: 2843Last fragment:2839 Total loss169
1422479959.704064 TRACE: [EthernetFace] [id:257,endpoint:em4] Received: 1500 bytes from c8:1f:66:e6:7e:3a
***** Gap detected ***** Current fragment: 2848Last fragment:2843 Total loss173
1422479959.704110 TRACE: [EthernetFace] [id:257,endpoint:em4] Received: 1500 bytes from c8:1f:66:e6:7e:3a

Files

Download all files

ndnputbigchunk.cpp (4.34 KB) ndnputbigchunk.cpp	Producer	susmit shannigrahi, 01/28/2015 03:45 PM
ndncatbigchunks.cpp (5.63 KB) ndncatbigchunks.cpp	Consumer	susmit shannigrahi, 01/28/2015 03:45 PM
ndnlp-partial-message-store.cpp (3.56 KB) ndnlp-partial-message-store.cpp	Changed to log gaps in fragment sequence	susmit shannigrahi, 01/28/2015 03:48 PM
tlv.hpp (14.2 KB) tlv.hpp	Increase MAX_NDN_PACKET_SIZE to 10MB	susmit shannigrahi, 01/28/2015 03:50 PM
sender_trace.txt (927 Bytes) sender_trace.txt	GDB trace from producer	susmit shannigrahi, 01/28/2015 04:36 PM
nfd_trace.txt (5.04 KB) nfd_trace.txt	GDB trace from NFD	susmit shannigrahi, 01/28/2015 04:36 PM
nfd_producer_crashlog.txt (426 KB) nfd_producer_crashlog.txt	Producer side log of crashed NFD	susmit shannigrahi, 01/29/2015 10:25 AM
loss.txt (37.1 KB) loss.txt		susmit shannigrahi, 01/29/2015 01:31 PM

Actions

Copy link

Updated by Junxiao Shi over 11 years ago

Please attach pcap captures from send machine and receive machine.

Please run sender in GDB, and obtain a stacktrace after it crashes.

Actions

Copy link

Updated by Davide Pesavento over 11 years ago

Category set to Faces
Start date deleted (~~01/28/2015~~)

What's the MTU of the interface? Also please attach a pcap trace of the packet exchange.

Actions

Copy link Download all files

Updated by susmit shannigrahi over 11 years ago

File sender_trace.txt sender_trace.txt added
File nfd_trace.txt nfd_trace.txt added

The MTU was 1500 bytes.The receiver.cap and sender.cap are below. Can not attach them due to size constrain.

http://www.cs.colostate.edu/~susmit/ndn/sender.cap

http://www.cs.colostate.edu/~susmit/ndn/receiver.cap

Error in sender: sender_trace.txt

Error in NFD: nfd_trace.txt

Actions

Copy link

Updated by Junxiao Shi over 11 years ago

pcap traces show that all 4016 fragments are successfully sent, and all of them are received.

One possibility is that NFD EthernetFace cannot process fragments fast enough, so that some packets are dropped by the kernel.

Actions

Copy link

Updated by Davide Pesavento over 11 years ago

Junxiao Shi wrote:

One possibility is that NFD EthernetFace cannot process fragments fast enough, so that some packets are dropped by the kernel.

Yep, I had the same thought. There's limited buffer space in the TPACKET implementation. If the application is unable to read the fragments fast enough, eventually the buffer will fill up and the kernel will start dropping incoming packets.

Actions

Copy link

Updated by Junxiao Shi over 11 years ago

There's limited buffer space in the TPACKET implementation. If the application is unable to read the fragments fast enough, eventually the buffer will fill up and the kernel will start dropping incoming packets.

Is there a way to confirm whether this situation is happening, and to what extent?

Also, is the sender crash caused by writing packets too fast and filling up some buffer?

Actions

Copy link

Updated by Davide Pesavento over 11 years ago

Junxiao Shi wrote:

Is there a way to confirm whether this situation is happening, and to what extent?

Yes, pcap_stats(3).

Also, is the sender crash caused by writing packets too fast and filling up some buffer?

I doubt it. In fact the stack trace shows that the crash is happening during a handleRead call... there are some too low values for this in the topmost 3 frames, suggesting a null pointer somewhere.

Actions

Copy link

Updated by Davide Pesavento over 11 years ago

susmit, what does NFD print before crashing? Please attach a full log file of a crashing scenario.

Actions

Copy link

Updated by Davide Pesavento over 11 years ago

You can also try increasing the ring buffer size (the libpcap default is 2M on Linux):

pcap_set_buffer_size(m_pcap.get(), 10*1024*1024); // e.g. 10 MB

Note that this call must happen before pcap_activate or it will fail.

Actions

Copy link

#10

Updated by Davide Pesavento over 11 years ago

Apply this patch http://gerrit.named-data.net/1692 and see if it reports any packet drop.

Actions

Copy link

#11

Updated by Davide Pesavento over 11 years ago

Davide Pesavento wrote:

Apply this patch http://gerrit.named-data.net/1692 and see if it reports any packet drop.

Make sure you build NFD in debug mode (./waf configure --debug)

Actions

Copy link

#12

Updated by susmit shannigrahi over 11 years ago

File nfd_producer_crashlog.txt nfd_producer_crashlog.txt added

what does NFD print before crashing? Please attach a full log file of a crashing scenario.

Attaching the log for crashing NFD at the producer side.

Actions

Copy link

#13

Updated by Davide Pesavento over 11 years ago

This doesn't make any sense. The stack trace shows that the handleRead callback is invoked, and execution arrives at least as far as the async_read_some call before crashing. Thus there are two possibilities:

We received something (a frame) and processed it in processIncomingPacket
pcap_next_ex returned 0 (read timeout)

In both cases one or more log messages should have been printed, but I see nothing in the log file, so... what the hell?

Another thing:

1422555794.215244 INFO: [FaceTable] Removed face id=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eno4

what happened here?

Actions

Copy link

#14

Updated by Davide Pesavento over 11 years ago

Davide Pesavento wrote:

1422555794.215244 INFO: [FaceTable] Removed face id=258 remote=ether://[01:00:5e:00:17:aa] local=dev://eno4

what happened here?

ah, it turns out that the failure reason passed to Face::fail is never logged anywhere. I'll submit a patch shortly.

Actions

Copy link

#15

Updated by susmit shannigrahi over 11 years ago

Apply this patch http://gerrit.named-data.net/1692 and see if it reports any packet drop.

Make sure you build NFD in debug mode (./waf configure --debug)

Davide, something is printing the packets when I compile with --debug. I see a bunch of binary output on the terminal (Mostly As).

Actions

Copy link

#16

Updated by susmit shannigrahi over 11 years ago

File loss.txt loss.txt added

Attaching a partial loss log. The pcap/kernel does drop packets.

Actions

Copy link

#17

Updated by Davide Pesavento over 11 years ago

Well there's not much we can do then... NFD is single-threaded. Try increasing the buffer size as I said in note-9, and see at what point the packet loss stops (or is significantly reduced).

Actions

Copy link

#18

Updated by susmit shannigrahi over 11 years ago

The loss is not reduced even after increasing the said buffer.

Actions

Copy link

#19

Updated by Davide Pesavento over 11 years ago

What size did you set it to? If you're sending two 5 MB packets, I expect you won't see any drops with a 20 MB buffer.

How many fragments (ethernet frames) are being sent per second?

Actions

Copy link

#20

Updated by Davide Pesavento over 11 years ago

Subject changed from NDNLP on Ethernet face drops fragments to EthernetFace drops frames under heavy load
Priority changed from Normal to Low
Target version set to Unsupported

Actions

Copy link

#21

Updated by Davide Pesavento over 11 years ago

Is the crash still reproducible with the latest NFD?

Actions

Copy link

#22

Updated by Junxiao Shi about 9 years ago

Status changed from New to Closed

I'll assume this issue has been solved because the reporter did not post a new log against the updated code.

Actions

Copy link

#23

Updated by Davide Pesavento about 9 years ago

Subject changed from EthernetFace drops frames under heavy load to EthernetTransport drops frames under heavy load
Assignee set to Davide Pesavento
Priority changed from Low to Normal
Target version changed from Unsupported to v0.6

Actually, while working on the unicast ethernet feature, I noticed that the number of slots in the TPACKET ring buffer is way too small, which can cause frame drops if NFD is unable to handle incoming frames quickly enough.

This needs to be investigated. Please reopen.

Actions

Copy link

#24

Updated by Junxiao Shi about 9 years ago

Status changed from Closed to New

Actions

Copy link

#25

Updated by Davide Pesavento over 8 years ago

Target version deleted (~~v0.6~~)

Actions

Copy link

#26

Updated by Davide Pesavento about 8 years ago

Description updated (diff)

Actions

Copy link

#27

Updated by Davide Pesavento almost 8 years ago

Davide Pesavento wrote:

the number of slots in the TPACKET ring buffer is way too small, which can cause frame drops if NFD is unable to handle incoming frames quickly enough.

This situation can be improved and packet loss alleviated by reducing the snaplen and/or increasing the buffer size.

Actions

Copy link

#28

Updated by Davide Pesavento almost 7 years ago

Status changed from New to In Progress
Assignee changed from Davide Pesavento to Junxiao Shi
Target version set to v0.7
% Done changed from 0 to 50

https://gerrit.named-data.net/c/NFD/+/5706

Actions

Copy link

#29

Updated by Davide Pesavento almost 7 years ago

Status changed from In Progress to Closed
% Done changed from 50 to 100

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

NFD

Tags

Bug #2441

EthernetTransport drops frames under heavy load

Updated by Junxiao Shi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by susmit shannigrahi over 11 years ago

Updated by Junxiao Shi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Junxiao Shi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by susmit shannigrahi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by susmit shannigrahi over 11 years ago

Updated by susmit shannigrahi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by susmit shannigrahi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Junxiao Shi about 9 years ago

Updated by Davide Pesavento about 9 years ago

Updated by Junxiao Shi about 9 years ago

Updated by Davide Pesavento over 8 years ago

Updated by Davide Pesavento about 8 years ago

Updated by Davide Pesavento almost 8 years ago

Updated by Davide Pesavento almost 7 years ago

Updated by Davide Pesavento almost 7 years ago