Bug #4770

NLSR hangs due to ndn-cxx SegmentFetcher segfault

Added by Ashlesh Gawande 2 months ago. Updated 21 days ago.

Status: Closed
Priority: Urgent
Category: Utils
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:

Description

1) Segment fetcher sends an interest with Nonce=1
2) After 1 second, segment fetcher sends another interest with the same name and Nonce=2
3) Face gets data, satisfies interest with Nonce=1
4) Segment fetcher informs NLSR of the new data
5) Segment fetcher removes the corresponding entry from the pending interest map
6) Face satisfies interest with Nonce=2
7) Segment fetcher hangs while trying to erase again the entry already erased in 5), using an iterator that is no longer valid (here)
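
The failure mode in steps 5) to 7) can be illustrated with a minimal, self-contained sketch. It does not use ndn-cxx: `PendingMap`, `eraseIfPending` and `deliverSegmentTwice` are hypothetical names, and a plain `std::map` stands in for the fetcher's pending interest map. Erasing through a stored iterator a second time is undefined behavior; looking the entry up by key on each delivery makes the duplicate delivery harmless. This is a defensive pattern shown for illustration only, not the fix applied in ndn-cxx.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical stand-in for the fetcher's bookkeeping:
// segment number -> nonce of the most recently expressed interest.
using PendingMap = std::map<uint64_t, uint32_t>;

// Looks the entry up by key instead of reusing a stored iterator.
// Returns true if this call removed the entry, false if it was already gone.
bool eraseIfPending(PendingMap& pending, uint64_t segment)
{
  auto it = pending.find(segment);
  if (it == pending.end()) {
    return false; // duplicate delivery: nothing left to erase
  }
  pending.erase(it);
  return true;
}

// Simulates the same data satisfying two interests for segment 0.
// Returns how many of the two deliveries actually erased an entry.
int deliverSegmentTwice()
{
  PendingMap pending;
  pending[0] = 2; // illustrative nonce value

  int erased = 0;
  erased += eraseIfPending(pending, 0) ? 1 : 0; // steps 3) to 5)
  erased += eraseIfPending(pending, 0) ? 1 : 0; // steps 6) and 7)
  assert(pending.empty());
  return erased;
}
```

Only the first delivery erases the entry; the second one finds nothing and returns without touching the map.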

See attached folder for reproduction on a small scale. Logs from consumer:

1541784024.347425 DEBUG: [ndn.Face] <I /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=2593228152
1541784025.347404 DEBUG: [ndn.Face] <I /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=1097587936
1541784025.350393 DEBUG: [ndn.Face] >D /example/testApp/%00%00
1541784025.350405 DEBUG: [ndn.Face]    satisfying /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=2593228152 from app
1541784025.350436 DEBUG: [ndn.security.v2.Validator] > Start validating data /example/testApp/%00%00
1541784025.350441 TRACE: [ndn.security.v2.ValidationState] > Signature verification bypassed for data `/example/testApp/%00%00`
Got data
1541784025.350508 TRACE: [ndn.security.v2.ValidationState] ~ValidationState
1541784025.350520 DEBUG: [ndn.Face]    satisfying /example/testApp?ndn.MustBeFresh=true&ndn.Nonce=1097587936 from app
1541784025.350530 DEBUG: [ndn.security.v2.Validator] > Start validating data /example/testApp/%00%00
1541784025.350533 TRACE: [ndn.security.v2.ValidationState] > Signature verification bypassed for data `/example/testApp/%00%00
Hangs with 100% CPU usage.

This behaviour is not observed with the fetcher in the latest ndn-cxx, because that fetcher destroys itself after informing the application about the data in step 4 (see note #1).

The workaround is to set useConstantInterestTimeout to true, so that the segment fetcher does not send another interest before the actual timeout.

Attachment: fetcher-test.tar.gz (3.22 KB), added by Ashlesh Gawande, 11/09/2018 10:16 AM

History

#1 Updated by Ashlesh Gawande 2 months ago

It is possible that this could happen with the latest ndn-cxx as well: if the data is segmented, the zeroth segment would be received twice, because the fetcher would still be alive to request and receive the next segment.

#2 Updated by Ashlesh Gawande 2 months ago

  • Project changed from NLSR to ndn-cxx
  • Subject changed from NLSR hangs on testbed due to ndn-cxx (0.6.3) segment fetcher segfault to NLSR hangs on testbed due to ndn-cxx segment fetcher segfault
  • Description updated (diff)
  • Status changed from In Progress to Code review
  • Target version deleted (v0.5.0)

The solution is to remove the earlier pending interest from the face when we get a Nack or a timeout, before re-expressing the interest. The face then informs the fetcher of the data only for the latest interest, and not for the previous interest as well, which is what led to the problem. This follows the suggestion given here: https://www.lists.cs.ucla.edu/pipermail/nfd-dev/2018-November/003446.html
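
The idea can be sketched without ndn-cxx. Everything below is a hypothetical, heavily simplified model: `MockFace` only mimics the notion of pending interests, its method names are borrowed for readability and are not the real ndn::Face API, and names are matched by exact string equality. It only shows that cancelling the old pending interest before retrying leaves a single data callback instead of two.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified face: tracks pending interests and invokes
// each matching callback once when data with the same name arrives.
class MockFace
{
public:
  using DataCallback = std::function<void()>;

  // Registers a pending interest and returns an id usable for cancellation.
  uint64_t expressInterest(const std::string& name, DataCallback onData)
  {
    uint64_t id = m_nextId++;
    m_pending.emplace(id, Entry{name, std::move(onData)});
    return id;
  }

  // Cancels a pending interest; unknown ids are ignored.
  void removePendingInterest(uint64_t id)
  {
    m_pending.erase(id);
  }

  // Satisfies every pending interest whose name equals the data name.
  void receiveData(const std::string& name)
  {
    // Collect callbacks first so they may safely modify the pending table.
    std::vector<DataCallback> toCall;
    for (auto it = m_pending.begin(); it != m_pending.end();) {
      if (it->second.name == name) {
        toCall.push_back(std::move(it->second.onData));
        it = m_pending.erase(it);
      }
      else {
        ++it;
      }
    }
    for (auto& callback : toCall) {
      callback();
    }
  }

private:
  struct Entry
  {
    std::string name;
    DataCallback onData;
  };

  std::map<uint64_t, Entry> m_pending;
  uint64_t m_nextId = 1;
};

// Expresses an interest, retries after a simulated timeout, then delivers
// one data packet. Returns how many times the fetcher-side callback ran.
int countDataCallbacks(bool cancelBeforeRetry)
{
  MockFace face;
  int calls = 0;
  auto onData = [&calls] { ++calls; };

  uint64_t first = face.expressInterest("/example/testApp", onData);
  if (cancelBeforeRetry) {
    face.removePendingInterest(first); // the approach described above
  }
  face.expressInterest("/example/testApp", onData); // retry with a new nonce
  face.receiveData("/example/testApp");
  return calls;
}
```

In this model, `countDataCallbacks(false)` reports two callbacks for one data packet, which is the situation that triggered the double erase, while `countDataCallbacks(true)` reports one.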

The unit test in the change fails with heap-buffer-overflow if the solution is not applied:
https://gerrit.named-data.net/c/ndn-cxx/+/5023

#3 Updated by Ashlesh Gawande 2 months ago

  • Category set to Utils

#4 Updated by Davide Pesavento 2 months ago

  • Subject changed from NLSR hangs on testbed due to ndn-cxx segment fetcher segfault to NLSR hangs due to ndn-cxx SegmentFetcher segfault
  • Description updated (diff)
  • Target version set to v0.7
  • % Done changed from 0 to 100

#5 Updated by Ashlesh Gawande 21 days ago

  • Status changed from Code review to Closed
