Bug #5118
Timeout/packet loss on docker
Status: Closed
% Done: 0%
Description
I am not sure if this is a tools problem or an NFD problem.
Here is what is happening:
We are trying to containerize NDN and deploy it on the Pacific Research Platform (PRP). The same setup works on other platforms, but here packet retrieval starts and then stops after about 50 packets.
Sometimes we see the route being dropped; other times the route is still there but packets are dropped.
We looked at the pcap and the NFD logs, but everything looks normal. We tried both TCP and UDP tunnels, and the same problem occurs.
This is reproducible every time we run the commands.
Image: cbmckni/ndn-tools:latest
/usr/local/bin/nfd -c /workspace/ndn/nfd.conf > /workspace/ndn/debug.log 2>&1
nfdc face create tcp4://atmos-csu.research-lan.colostate.edu | tee -a /logs/ndn-debug.log
nfdc route add /BIOLOGY tcp4://atmos-csu.research-lan.colostate.edu | tee -a /logs/ndn-debug.log
ndncatchunks -v /BIOLOGY/SRA/9605/9606/NaN/RNA-Seq/ILLUMINA/TRANSCRIPTOMIC/PAIRED/Kidney/PRJNA359795/SRP095950/SRX2458154/SRR5139395/SRR5139395_1 |& tee -a /logs/ndn-debug.log
# failed after ~50 segments, route/face removed for some reason
ndncatchunks -v /BIOLOGY/SRA/9605/9606/NaN/RNA-Seq/ILLUMINA/TRANSCRIPTOMIC/PAIRED/Kidney/PRJNA359795/SRP095950/SRX2458154/SRR5139395/SRR5139395_1 |& tee -a /logs/ndn-debug.log
# failed immediately, no route
nfdc face create tcp4://atmos-csu.research-lan.colostate.edu | tee -a /logs/ndn-debug.log
nfdc route add /BIOLOGY tcp4://atmos-csu.research-lan.colostate.edu | tee -a /logs/ndn-debug.log
ndncatchunks -v /BIOLOGY/SRA/9605/9606/NaN/RNA-Seq/ILLUMINA/TRANSCRIPTOMIC/PAIRED/Kidney/PRJNA359795/SRP095950/SRX2458154/SRR5139395/SRR5139395_1 |& tee -a /logs/ndn-debug.log
# failed after ~50 segments, but route/face not removed this time
ndncatchunks -v /BIOLOGY/SRA/9605/9606/NaN/RNA-Seq/ILLUMINA/TRANSCRIPTOMIC/PAIRED/Kidney/PRJNA359795/SRP095950/SRX2458154/SRR5139395/SRR5139395_1 |& tee -a /logs/ndn-debug.log
# failed immediately, timeout
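For anyone reproducing this: the DEBUG-level [nfd.GenericLinkService] messages quoted later in the thread require raising NFD's log verbosity first. A minimal sketch, assuming the standard ndn-cxx NDN_LOG environment variable is honored by this NFD build (the log section of nfd.conf can achieve the same); the paths are the ones from the commands above:
export NDN_LOG='nfd.GenericLinkService=DEBUG'
/usr/local/bin/nfd -c /workspace/ndn/nfd.conf > /workspace/ndn/debug.log 2>&1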
Files
Updated by Junxiao Shi over 4 years ago
- File 20200613.pcap added
I tried the published container and couldn't reproduce the bug.
It seems that the Data packet is missing on the server, so the server responds with a Nack.
tcpdump is attached.
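For context, a capture equivalent to the attached one can be taken on the NFD host with plain tcpdump; the interface name below is illustrative, and 6363 is the default port for both TCP and UDP NDN tunnels:
tcpdump -i eth0 -w 20200613.pcap port 6363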
Updated by susmit shannigrahi over 4 years ago
It seems that the Data packet is missing on the server, so the server responds with a Nack.
I verified that no packets are missing. If we pull with a simple pipeline (-s 5), the retrieval completes. This has something to do with rate control.
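For concreteness, the working pull was along these lines; this is a sketch that assumes ndncatchunks' fixed-window pipeline (selected with --pipeline-type fixed), whose window size is set with -s/--pipeline-size:
ndncatchunks -v --pipeline-type fixed -s 5 /BIOLOGY/SRA/9605/9606/NaN/RNA-Seq/ILLUMINA/TRANSCRIPTOMIC/PAIRED/Kidney/PRJNA359795/SRP095950/SRX2458154/SRR5139395/SRR5139395_1 |& tee -a /logs/ndn-debug.log
With the default adaptive pipeline, the window presumably grows well beyond 5 outstanding Interests, which would be consistent with a rate-control issue.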
ndnping fails if we push it too hard (1 ms interval). Once it fails, NFD needs to be restarted.
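The failing ping test was presumably along these lines; the prefix is hypothetical and assumes an ndnpingserver is reachable behind it, -i is the probe interval in milliseconds, and -c the number of probes:
ndnping -i 1 -c 1000 /BIOLOGY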
Please find the logs attached; I do see link-layer congestion marks.
Updated by susmit shannigrahi over 4 years ago
- File nfd.log added
- File debug.pcap added
- File ndn.log added
Successful logs.
Updated by susmit shannigrahi over 4 years ago
- File nfd.log added
- File debug.pcap added
- File ndn.log added
Failed logs - client side.
Updated by susmit shannigrahi over 4 years ago
- File server-side-nfd.log added
NFD log, server side. The pcap is too large to attach; I can host it somewhere if needed.
1592253354.344069 DEBUG: [nfd.GenericLinkService] [id=301,local=tcp4://129.82.175.10:6363,remote=tcp4://209.129.248.194:41640] Send queue length dropped below congestion threshold
1592253502.193208 DEBUG: [nfd.GenericLinkService] [id=301,local=tcp4://129.82.175.10:6363,remote=tcp4://209.129.248.194:41640] Send queue length dropped below congestion threshold
1592253510.434830 DEBUG: [nfd.GenericLinkService] [id=301,local=tcp4://129.82.175.10:6363,remote=tcp4://209.129.248.194:41640] Send queue length dropped below congestion threshold
1592253510.476897 DEBUG: [nfd.GenericLinkService] [id=301,local=tcp4://129.82.175.10:6363,remote=tcp4://209.129.248.194:41640] Send queue length dropped below congestion threshold
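For reference, the congestion threshold in these messages is a per-face parameter of GenericLinkService. A sketch of how it can be tuned on a face created via nfdc, assuming the congestion-marking options available in recent nfdc versions (values are in milliseconds and bytes and are chosen only for illustration, not as a recommendation):
nfdc face create tcp4://atmos-csu.research-lan.colostate.edu congestion-marking on congestion-marking-interval 100 default-congestion-threshold 65536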
Updated by Davide Pesavento over 4 years ago
- Status changed from New to Rejected
Based on the June 11th discussion on Slack, this seems to be a problem with the underlying network infrastructure, either hardware or software. Closing.