Bug #5079: Packet retransmission issue because of LpReliability
Status: Closed
% Done: 100%
Description
There is evidence that LpReliability causes multiple retransmissions before either the timeout (i.e., 200 ms) expires at the sender or an Ack for a greater sequence number is received by the sender.
This problem has been seen between two testbed hubs with LpReliability enabled on both sides. It is worth mentioning that the problem is not deterministic and happens only for some Interests.
To demonstrate the issue, a client ndnpinged the Washington hub (wu) through the Arizona hub (hobo). Attached is a tcpdump pcap file collected on hobo.
Upon receiving the Interest /ndn/edu/wustl/ping/1761450254637779364 from the client, hobo sends the Interest to wu. However, before the timeout expires or any Ack is received, it retransmits the same Interest 3 times. In return, wu sends back 4 Data packets and 12 Nacks.
Here is the pcap output for /ndn/edu/wustl/ping/1761450254637779364:
No. Time Source Destination Protocol Length Info delta
915 61.626371 10.132.245.39 128.196.203.36 UDP (NDN) 103 Interest /ndn/edu/wustl/ping/1761450254637779364 0.042070
916 61.626475 128.196.203.36 128.252.153.194 UDP (NDN) 119 Interest /ndn/edu/wustl/ping/1761450254637779364 0.000104
918 61.717688 128.196.203.36 128.252.153.194 UDP (NDN) 119 Interest /ndn/edu/wustl/ping/1761450254637779364 0.039450
928 61.808919 128.196.203.36 128.252.153.194 UDP (NDN) 119 Interest /ndn/edu/wustl/ping/1761450254637779364 0.049886
929 61.900236 128.196.203.36 128.252.153.194 UDP (NDN) 119 Interest /ndn/edu/wustl/ping/1761450254637779364 0.091317
930 62.067296 128.252.153.194 128.196.203.36 UDP (NDN) 165 Data /ndn/edu/wustl/ping/1761450254637779364 0.167060
931 62.067405 128.196.203.36 10.132.245.39 UDP (NDN) 137 Data /ndn/edu/wustl/ping/1761450254637779364 0.000109
933 62.100172 128.252.153.194 128.196.203.36 UDP (NDN) 140 Nack /ndn/edu/wustl/ping/1761450254637779364 0.027834
935 62.255842 128.252.153.194 128.196.203.36 UDP (NDN) 140 Nack /ndn/edu/wustl/ping/1761450254637779364 0.150629
937 62.282512 128.252.153.194 128.196.203.36 UDP (NDN) 153 Data /ndn/edu/wustl/ping/1761450254637779364 0.021628
938 62.284511 128.252.153.194 128.196.203.36 UDP (NDN) 140 Nack /ndn/edu/wustl/ping/1761450254637779364 0.001999
940 62.313064 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.025509
942 62.458480 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.140376
943 62.458967 128.252.153.194 128.196.203.36 UDP (NDN) 153 Data /ndn/edu/wustl/ping/1761450254637779364 0.000487
947 62.486449 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.000866
949 62.508351 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.016863
952 62.581047 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.004247
954 62.630670 128.252.153.194 128.196.203.36 UDP (NDN) 153 Data /ndn/edu/wustl/ping/1761450254637779364 0.044582
956 62.651119 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.015408
957 62.652113 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.000994
960 62.678285 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.010253
964 62.711556 128.252.153.194 128.196.203.36 UDP (NDN) 128 Nack /ndn/edu/wustl/ping/1761450254637779364 0.028228
Updated by Eric Newberry almost 5 years ago
- Assignee set to Eric Newberry
I think one of the first things we should do is add logging to LpReliability to help us track down this issue.
Updated by Eric Newberry almost 5 years ago
- Related to Feature #5080: Add logging to LpReliability added
Updated by Davide Pesavento almost 5 years ago
- Category set to Faces
- Target version set to 22.02
- Start date deleted (02/03/2020)
Updated by Eric Newberry over 4 years ago
With the version of NFD currently deployed on the testbed (0.6.6), LpReliability uses nfd::RttEstimator, which has a minimum RTO of 1 ms. As of 0.7.0, LpReliability relies on ndn::util::RttEstimator from ndn-cxx to compute its estimated RTOs, which features the minimum RTO of 200 ms expected in the issue description. Therefore, the scenario described in the issue seems to be the expected behavior in 0.6.6.
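For context, here is a minimal sketch of the RTO computation at issue, using standard Jacobson/Karels smoothing with a configurable floor. The class and member names below are illustrative, not the actual nfd::RttEstimator or ndn::util::RttEstimator API:

```cpp
#include <algorithm>
#include <chrono>

// Illustrative RTT estimator with a configurable minimum RTO.
class RttEstimatorSketch
{
public:
  explicit RttEstimatorSketch(std::chrono::milliseconds minRto)
    : m_minRto(minRto)
    , m_rto(minRto)
  {
  }

  void
  addMeasurement(std::chrono::milliseconds rtt)
  {
    if (m_srtt.count() == 0) {
      // first sample initializes the estimator (RFC 6298 style)
      m_srtt = rtt;
      m_rttVar = rtt / 2;
    }
    else {
      auto diff = m_srtt > rtt ? m_srtt - rtt : rtt - m_srtt;
      m_rttVar = (3 * m_rttVar + diff) / 4; // beta = 1/4
      m_srtt = (7 * m_srtt + rtt) / 8;      // alpha = 1/8
    }
    // The clamp below is the crux: the floor is 1 ms in NFD 0.6.6
    // (nfd::RttEstimator) vs. 200 ms since 0.7.0 (ndn::util::RttEstimator).
    m_rto = std::max(m_srtt + 4 * m_rttVar, m_minRto);
  }

  std::chrono::milliseconds
  getEstimatedRto() const
  {
    return m_rto;
  }

private:
  std::chrono::milliseconds m_minRto;
  std::chrono::milliseconds m_srtt{0};
  std::chrono::milliseconds m_rttVar{0};
  std::chrono::milliseconds m_rto;
};
```

With a 1 ms floor, the computed RTO can fall well below the ~90 ms path RTT visible in the pcap, triggering retransmissions long before an Ack could possibly arrive.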
Updated by Davide Pesavento over 4 years ago
Ok, that explains the aggressive retransmissions, but why are we seeing multiple Data/Nacks for the same Interest in some cases?
Updated by Eric Newberry over 4 years ago
Davide Pesavento wrote:
Ok, that explains the aggressive retransmissions, but why are we seeing multiple Data/Nacks for the same Interest in some cases?
I'm assuming you're referring to a Data packet and a Nack for the same Interest, like those seen at lines 480-481 and 483 in the attached pcap file, all in response to the Interests at lines 478-479? If so, it looks like the two Data packets are the first transmission and then a retransmission of the same Data after the first one timed out; both are sent in response to the original Interest. Then, the Nack occurs because the Interest was retransmitted despite the first copy having been received successfully, causing a duplicate Interest with the same nonce to be processed and a Duplicate Nack to be sent in response. The remaining string of Nacks/Datas for the same Interest is probably caused by retransmission semantics under the older nfd::RttEstimator with the lower minimum RTO: my educated guess is that the low RTOs cause the relevant packets to be dropped from the retransmission queue with that TxSequence before they can be acknowledged (with those Acks likely being dropped since they refer to now-unknown TxSequences), causing further retransmissions.
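To illustrate that guess, here is a hypothetical sketch of the retransmission bookkeeping being described; the names are simplified stand-ins for the actual LpReliability state, not its real interface:

```cpp
#include <cstdint>
#include <map>

struct FragmentState
{
  int retxCount = 0;
  // wire encoding, timers, etc. elided
};

// Each (re)transmission of a fragment is tracked under a fresh TxSequence,
// so an Ack that arrives after the RTO has already fired refers to a
// TxSequence that no longer exists and is silently ignored.
class RetxQueueSketch
{
public:
  // Called when the RTO expires for txSeq before an Ack arrives. With a
  // 1 ms minimum RTO this can happen long before the ~90 ms path RTT
  // elapses, orphaning the in-flight Ack.
  void
  onRtoExpired(uint64_t txSeq)
  {
    auto it = m_unacked.find(txSeq);
    if (it == m_unacked.end())
      return;
    FragmentState frag = it->second;
    m_unacked.erase(it); // old TxSequence is forgotten
    ++frag.retxCount;
    uint64_t newTxSeq = ++m_lastTxSeq;
    m_unacked.emplace(newTxSeq, frag);
    // resend the fragment under newTxSeq ...
  }

  void
  onAck(uint64_t txSeq)
  {
    auto it = m_unacked.find(txSeq);
    if (it == m_unacked.end()) {
      return; // Ack for a now-unknown TxSequence: dropped, retx continues
    }
    m_unacked.erase(it); // acknowledged before the RTO fired
  }

private:
  std::map<uint64_t, FragmentState> m_unacked;
  uint64_t m_lastTxSeq = 0;
};
```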
Updated by Davide Pesavento over 4 years ago
Eric Newberry wrote:
Then, the Nack occurs because the Interest was retransmitted despite the first copy having been received successfully, causing a duplicate Interest with the same nonce to be processed and a Duplicate Nack to be sent in response.
But the Nack is generated by higher layers, which means LpReliability is passing all the retransmitted Interests up to the forwarder/strategy. Is that the expected behavior? I believe LpReliability should be transparent and not cause additional network-layer packets to be seen by the upper layers. In other words, retransmissions should be filtered and dropped.
Updated by Eric Newberry over 4 years ago
Davide Pesavento wrote:
Eric Newberry wrote:
Then, the Nack occurs because the Interest was retransmitted despite the first copy having been received successfully, causing a duplicate Interest with the same nonce to be processed and a Duplicate Nack to be sent in response.
But the Nack is generated by higher layers, which means LpReliability is passing all the retransmitted Interests up to the forwarder/strategy. Is that the expected behavior? I believe LpReliability should be transparent and not cause additional network-layer packets to be seen by the upper layers. In other words, retransmissions should be filtered and dropped.
Yes, this is seemingly the expected behavior, since LpReliability has no understanding of upper layer semantics (apart from reporting lost Interests to the strategy). I'll look further into the design to see if there was any discussion of this issue around the time of initial implementation.
Updated by Eric Newberry over 4 years ago
Eric Newberry wrote:
Davide Pesavento wrote:
Eric Newberry wrote:
Then, the Nack occurs because the Interest was retransmitted despite the first copy having been received successfully, causing a duplicate Interest with the same nonce to be processed and a Duplicate Nack to be sent in response.
But the Nack is generated by higher layers, which means LpReliability is passing all the retransmitted Interests up to the forwarder/strategy. Is that the expected behavior? I believe LpReliability should be transparent and not cause additional network-layer packets to be seen by the upper layers. In other words, retransmissions should be filtered and dropped.
Yes, this is seemingly the expected behavior, since LpReliability has no understanding of upper layer semantics (apart from reporting lost Interests to the strategy). I'll look further into the design to see if there was any discussion of this issue around the time of initial implementation.
I looked at the original design document (https://docs.google.com/presentation/d/1cP1ya0oEUw1wjpF3SWqaUgLNLx5iH50Y7v2slVTvLOU) and it appears that deduplicating retransmissions was not considered in the design of this protocol.
Updated by Davide Pesavento over 4 years ago
Eric Newberry wrote:
Yes, this is seemingly the expected behavior, since LpReliability has no understanding of upper layer semantics (apart from reporting lost Interests to the strategy).
You don't need to understand upper layer semantics. A simple "packet identifier" that is kept the same for all LP-layer retransmissions should be sufficient.
Updated by Eric Newberry over 4 years ago
Davide Pesavento wrote:
Eric Newberry wrote:
Yes, this is seemingly the expected behavior, since LpReliability has no understanding of upper layer semantics (apart from reporting lost Interests to the strategy).
You don't need to understand upper layer semantics. A simple "packet identifier" that is kept the same for all LP-layer retransmissions should be sufficient.
You're right. This already exists as the Sequence field, although it is currently only included in packets that are fragmented, presumably to reduce the size of transmitted frames. I think we could use it to detect duplicates if we also included it whenever LpReliability is enabled on the link.
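As a rough sketch of that change on the send path (assuming the ndn-cxx lp::Packet field API; the flag and counter are hypothetical stand-ins for GenericLinkService state):

```cpp
#include <ndn-cxx/lp/packet.hpp>
#include <cstdint>

// Assign a Sequence to every outgoing LpPacket when reliability is
// enabled, not only when the network-layer packet was fragmented.
void
assignSequence(ndn::lp::Packet& lpPacket, bool isFragmented,
               bool reliabilityEnabled, uint64_t& nextSequence)
{
  if (isFragmented || reliabilityEnabled) {
    lpPacket.add<ndn::lp::SequenceField>(nextSequence++);
  }
}
```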
Updated by Eric Newberry over 4 years ago
- Status changed from New to In Progress
I added a deduplication mechanism to the design of LpReliability. Received frames will be dropped if their sequence numbers have been seen within the last estimated RTO. The new design is here: https://docs.google.com/presentation/d/1FfT25zv936TVv5BzejYuQWJ6VRBgcnswfR9On7n3lFY/edit?usp=sharing
This design will require that Sequence numbers be generated for and included in LpPackets if LpReliability is enabled (currently Sequence numbers are only included if the LpPacket is fragmented).
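A minimal sketch of what the receive-side check could look like under this design, with illustrative names rather than actual LpReliability members: each received Sequence is remembered for 1x the RTO estimated at receipt, and any frame whose Sequence is still remembered is dropped as an LP-layer retransmission before it reaches reassembly and the forwarder.

```cpp
#include <chrono>
#include <cstdint>
#include <iterator>
#include <map>

class RxDedupSketch
{
  using Clock = std::chrono::steady_clock;

public:
  // Returns true if this frame is a duplicate that must be dropped.
  bool
  isDuplicate(uint64_t sequence, Clock::duration estimatedRto)
  {
    auto now = Clock::now();
    // purge Sequences whose retention window (RTO at receipt) has passed
    for (auto it = m_recentRx.begin(); it != m_recentRx.end();) {
      it = (it->second <= now) ? m_recentRx.erase(it) : std::next(it);
    }
    bool inserted =
        m_recentRx.try_emplace(sequence, now + estimatedRto).second;
    return !inserted; // still remembered => duplicate within 1x RTO
  }

private:
  std::map<uint64_t, Clock::time_point> m_recentRx; // Sequence -> expiry
};
```

Keying the retention window on the RTO estimated at receipt time keeps the deduplication window adaptive: it is just long enough to cover the sender's own retransmission timer without growing state unboundedly.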
Updated by Eric Newberry over 4 years ago
Per the NFD call on February 6, 2020, the following changes to the reliability protocol will be adopted to resolve the problems described in this issue:
- The Sequence field is now required to be included in transmitted LpPackets when the LpReliability feature is enabled.
- Received LpPackets will have their fragments dropped if their Sequence field matches the Sequence of an already received LpPacket on a link.
- To enable the above, the receiver will keep track of received Sequence numbers for 1x the estimated link RTO at the time of receipt.
These changes are reflected in NDNLPv2 revision 57.
Updated by Eric Newberry over 4 years ago
- Status changed from In Progress to Code review
- % Done changed from 0 to 100
Updated by Davide Pesavento over 4 years ago
- Status changed from Code review to Closed