Bug #4770
Closed
NLSR hangs due to ndn-cxx SegmentFetcher segfault
100%
Description
1) Segment fetcher sends an interest with Nonce=1
2) After 1 second, segment fetcher sends another interest with same name with Nonce=2
3) Face gets data, satisfies interest with Nonce=1
4) Segment fetcher informs NLSR of the new data
5) Segment fetcher removes the corresponding entry from the pending interest map
6) Face satisfies interest with Nonce=2
7) Segment fetcher hangs while trying to re-erase the entry already erased in 5) by using an invalid iterator (here)
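Steps 5) to 7) amount to erasing the same map entry twice, the second time through an iterator that no longer refers to a live element. The following is a minimal sketch of one defensive pattern, not the ndn-cxx code and not the fix adopted in this issue: the callback looks the entry up by key each time instead of reusing a stored iterator. `ToyFetcher` and all of its members are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical toy model, not the ndn-cxx SegmentFetcher.
class ToyFetcher {
public:
  // Record that an Interest for segment `segNo` is outstanding.
  // A retransmission of the same segment reuses the existing entry.
  void expressInterest(uint64_t segNo) {
    ++m_pending[segNo];
  }

  // Invoked once for every satisfied Interest, so possibly twice per segment.
  // Returns true if the segment was handed to the application.
  bool onData(uint64_t segNo) {
    auto it = m_pending.find(segNo);  // fresh lookup, never a stored iterator
    if (it == m_pending.end()) {
      return false;  // entry already erased by an earlier callback: ignore
    }
    m_pending.erase(it);
    ++m_delivered;
    return true;
  }

  int delivered() const { return m_delivered; }
  bool hasPending() const { return !m_pending.empty(); }

private:
  std::map<uint64_t, int> m_pending;  // segment number -> Interests sent
  int m_delivered = 0;
};
```

If `expressInterest(0)` is called twice and `onData(0)` is then called twice, the first call delivers the segment and the second call finds no entry and returns without touching the map.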
See attached folder for reproduction on a small scale. Logs from consumer:
1541784024.347425 DEBUG: [ndn.Face] <I /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=2593228152
1541784025.347404 DEBUG: [ndn.Face] <I /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=1097587936
1541784025.350393 DEBUG: [ndn.Face] >D /example/testApp/%00%00
1541784025.350405 DEBUG: [ndn.Face] satisfying /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=2593228152 from app
1541784025.350436 DEBUG: [ndn.security.v2.Validator] > Start validating data /example/testApp/%00%00
1541784025.350441 TRACE: [ndn.security.v2.ValidationState] > Signature verification bypassed for data `/example/testApp/%00%00`
Got data
1541784025.350508 TRACE: [ndn.security.v2.ValidationState] ~ValidationState
1541784025.350520 DEBUG: [ndn.Face] satisfying /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=1097587936 from app
1541784025.350530 DEBUG: [ndn.security.v2.Validator] > Start validating data /example/testApp/%00%00
1541784025.350533 TRACE: [ndn.security.v2.ValidationState] > Signature verification bypassed for data `/example/testApp/%00%00`
Hangs with 100% CPU usage.
This behaviour is not observed with the latest ndn-cxx fetcher, because the fetcher destroys itself after informing the application about the data in step 4 (see note 1).
The workaround is to set useConstantInterestTimeout to true so that the segment fetcher will not send interests before the actual timeout.
Updated by Ashlesh Gawande almost 6 years ago
This could also happen with the latest ndn-cxx: if the data is segmented, the zeroth segment will be received twice, because the fetcher is still alive to request and receive the next segment.
Updated by Ashlesh Gawande almost 6 years ago
- Project changed from NLSR to ndn-cxx
- Subject changed from NLSR hangs on testbed due to ndn-cxx (0.6.3) segment fetcher segfault to NLSR hangs on testbed due to ndn-cxx segment fetcher segfault
- Description updated (diff)
- Status changed from In Progress to Code review
- Target version deleted (v0.5.0)
The solution is to remove the first pending interest from the face when we get a Nack or a timeout (before re-expressing the interest). The face then informs the fetcher of the data only for the latest interest, and not also for the previous interest, which is what led to the problem. This follows the suggestion given here: https://www.lists.cs.ucla.edu/pipermail/nfd-dev/2018-November/003446.html
The unit test in the change fails with heap-buffer-overflow if the solution is not applied:
https://gerrit.named-data.net/c/ndn-cxx/+/5023
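The effect of removing the earlier pending interest before re-expressing can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `ToyFace`, `countDeliveries`, and their members are hypothetical and merely mimic the idea of a pending interest table; they are not the ndn-cxx Face API and not the code in the Gerrit change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of a face's pending Interest table.
class ToyFace {
public:
  using DataCallback = std::function<void(const std::string& name)>;

  // Record a pending Interest and return an id that can later cancel it.
  uint64_t expressInterest(const std::string& name, DataCallback onData) {
    uint64_t id = m_nextId++;
    m_pending.emplace(id, Entry{name, std::move(onData)});
    return id;
  }

  // Cancel a pending Interest; unknown ids are ignored.
  void removePendingInterest(uint64_t id) { m_pending.erase(id); }

  // Satisfy every pending Interest whose name matches the incoming Data.
  void receiveData(const std::string& name) {
    std::vector<DataCallback> matched;
    for (auto it = m_pending.begin(); it != m_pending.end();) {
      if (it->second.name == name) {
        matched.push_back(std::move(it->second.onData));
        it = m_pending.erase(it);
      }
      else {
        ++it;
      }
    }
    for (auto& cb : matched) {
      cb(name);  // one callback per satisfied Interest
    }
  }

  std::size_t nPending() const { return m_pending.size(); }

private:
  struct Entry {
    std::string name;
    DataCallback onData;
  };
  uint64_t m_nextId = 1;
  std::map<uint64_t, Entry> m_pending;
};

// Simulate: express, time out, re-express, then receive one Data packet.
// Returns how many times the fetcher-side callback fired.
int countDeliveries(bool cancelBeforeReexpress) {
  ToyFace face;
  int deliveries = 0;
  auto onData = [&deliveries](const std::string&) { ++deliveries; };

  uint64_t first = face.expressInterest("/example/testApp", onData);
  // ... the retransmission timer fires here ...
  if (cancelBeforeReexpress) {
    face.removePendingInterest(first);
  }
  face.expressInterest("/example/testApp", onData);

  face.receiveData("/example/testApp");
  return deliveries;
}
```

In this model, `countDeliveries(false)` fires the callback twice for a single Data packet, which is the situation described in this issue, while `countDeliveries(true)` fires it once.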
Updated by Davide Pesavento almost 6 years ago
- Subject changed from NLSR hangs on testbed due to ndn-cxx segment fetcher segfault to NLSR hangs due to ndn-cxx SegmentFetcher segfault
- Description updated (diff)
- Target version set to v0.7
- % Done changed from 0 to 100
Updated by Ashlesh Gawande almost 6 years ago
- Status changed from Code review to Closed