Bug #5009

Sync interest should be sent only after NLSR has registered routes

Added by Ashlesh Gawande about 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: Normal
Target version:
Start date:
Due date:
% Done: 100%
Estimated time:

Description

NLSR was changed in the past to initialize everything in its constructor, so Sync is initialized there as well.
The Sync constructor sends the initial sync interest, which is NACK'd immediately because NLSR has not yet registered the routes for the sync prefix to its neighbors. That registration only happens after NLSR has received the face dataset from NFD. Neither ChronoSync nor PSync reacts to NACKs: they treat a NACK as a timeout, so sync is delayed by syncInterestLifetime / 2 and the sync process is pushed to the next sync interest (which is sent after 60 seconds by default in NLSR). ChronoSync and PSync should expose their sendSyncInterest function so that NLSR can start sync after it has registered the sync prefix to its neighbors. ChronoSync and PSync should also react to a NACK by sending a sync interest after some jitter instead of waiting for the next scheduled sync interest.
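
The two proposals above (an explicit call to send the first sync interest, and a jittered retry on NACK) can be sketched roughly as follows. This is a toy model, not the ChronoSync or PSync API: the class name, the `send` and `schedule` callables, and the 500 ms jitter bound are all assumptions made for illustration.

```python
import random


class SyncLogicSketch:
    """Toy stand-in for a sync logic object; not the ChronoSync or PSync API."""

    def __init__(self, send, schedule, max_jitter_ms=500, rng=None):
        # Construction only stores collaborators; no Interest is sent here.
        self._send = send              # hypothetical: callable() that sends a sync Interest
        self._schedule = schedule      # hypothetical: callable(delay_ms, callback)
        self._max_jitter_ms = max_jitter_ms  # assumed bound, not a library default
        self._rng = rng or random.Random()

    def send_sync_interest(self):
        """Called by the application once its sync routes are registered."""
        self._send()

    def on_nack(self):
        """Retry after a short random delay instead of waiting for the next cycle."""
        delay_ms = self._rng.randint(1, self._max_jitter_ms)
        self._schedule(delay_ms, self.send_sync_interest)
        return delay_ms
```

With this shape, the application would construct the object early (so it can publish to it) and call `send_sync_interest()` only after prefix registration succeeds.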


Related issues

Related to NLSR - Bug #2649: SyncLogicHandler does not check validity of SyncSocket before use (Closed, Vince Lehman, 03/16/2015)

#1

Updated by Ashlesh Gawande about 2 years ago

Alternatively, we could go back to the old way and not initialize sync in the constructor. The problem with this is that the LSDB needs the sync object to be alive when the NLSR object tells it to publish its own name LSA.

#2

Updated by Davide Pesavento about 2 years ago

In Sync constructor, the initial sync interest is sent and NACK'd immediately. This is because NLSR has not yet registered the routes for sync prefix to its neighbors

This would be "solved" by #4931. However, I think it's still a good idea to avoid the problem altogether.

ChronoSync and PSync should expose their sendSyncInterest function which can then be used by NLSR to start sync after it has registered the sync prefix to the neighbors.

This is still a good idea regardless of this NLSR bug. I never liked the coupling between sending the first sync interest and the construction of the Logic instance in ChronoSync. I didn't realize PSync made the same mistake in its API.

ChronoSync and PSync should also react to the NACK by sending a sync interest after some jitter instead of waiting for scheduled sync interest to take place

You should open separate bugs for ChronoSync and PSync for this.

#3

Updated by Ashlesh Gawande about 2 years ago

  • Status changed from New to Code review
#4

Updated by Ashlesh Gawande about 2 years ago

  • Target version changed from v0.5.0 to Minor release v0.5.2
#5

Updated by Ashlesh Gawande about 2 years ago

It is not straightforward to modify ChronoSync to not send a sync interest in the beginning.
Even if no sync interest is sent in the beginning, ChronoSync starts the recovery process and sends a sync interest as soon as it receives a sync interest from another node. This is the reason ChronoSync does not react to NACKs: once triggered by an incoming sync interest whose digest is not recognized, the recovery process sends another sync interest, effectively reacting to the NACK.

So NLSR should just instantiate the sync object in the beginning and let the sync recover from any NACKs internally. This will let NLSR publish to the sync and answer any sync interests while sync tries to send out a sync interest.

For PSync, not reacting to NACK means waiting till the next sync cycle. PSync should either react to NACKs by scheduling a sync interest or determine if there were any negative elements in the difference with the received IBF. If so then reschedule/send a sync interest (similar in spirit to ChronoSync's recovery process).
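
The second PSync option can be illustrated with plain sets standing in for the IBF difference. This is only a sketch: a real IBF is decoded rather than compared directly, and the function name is hypothetical. It assumes "negative elements" means entries present in the received summary but missing locally.

```python
def split_difference(own_names, received_names):
    """Return (positive, negative, should_resync) for two sets of names.

    positive: names we hold that the sender lacks.
    negative: names the sender holds that we lack (assumed meaning).
    should_resync: True when we are behind and should send a sync
    Interest now instead of waiting for the next cycle.
    """
    positive = own_names - received_names
    negative = received_names - own_names
    return positive, negative, bool(negative)
```

For example, a node holding `{"/a/1"}` that receives a summary of `{"/a/1", "/b/3"}` gets a non-empty negative set and would reschedule a sync interest.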

#6

Updated by Ashlesh Gawande almost 2 years ago

  • Related to Bug #2649: SyncLogicHandler does not check validity of SyncSocket before use added
#7

Updated by Ashlesh Gawande almost 2 years ago

If nlsrc advertise is called before the sync logic handler is created, NLSR crashes (#2649). With the current patchset of this change, moving the sync object construction out of the constructor causes this problem again. The error is reproduced by the LSA segmentation experiment of the Mini-NDN bot, where a publish is done through nlsrc immediately after NLSR is started.

Solutions:

  • Delay setting Interest filters for nlsrc till after sync starts.
  • Schedule the publish into sync to give time for the sync logic handler to be created. Increases complexity.
  • Create the sync logic object after sync prefix registration and not after first Hello Data is validated. Problem is that we could potentially be syncing with malicious nodes as sync messages are not validated.
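
As a rough illustration of the second option, publishes that arrive before the sync logic handler exists could be held back and replayed once it is created. Note that this sketch uses a queue rather than a timer, which is a variation on what is described above; the class and method names are hypothetical and do not correspond to NLSR code.

```python
class DeferredPublisher:
    """Holds publish requests until a sync handler is attached."""

    def __init__(self):
        self._sync_publish = None   # set once the sync handler exists
        self._queued = []           # (name, seq) pairs waiting for sync

    def publish(self, name, seq):
        """Publish now if sync is ready; otherwise queue. Returns True if sent."""
        if self._sync_publish is None:
            self._queued.append((name, seq))
            return False
        self._sync_publish(name, seq)
        return True

    def attach_sync(self, sync_publish):
        """Install the sync handler and flush queued publishes in order."""
        self._sync_publish = sync_publish
        queued, self._queued = self._queued, []
        for name, seq in queued:
            sync_publish(name, seq)
```

This avoids dereferencing a missing handler, at the cost of the extra state that the "Increases complexity" remark alludes to.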

The current change does the following:

  • Send Hello Interest after route to neighbor is successfully registered
    • The first hello interval setting in nlsr.conf, which waited for the prefix registrations to complete, is eliminated
  • After a Hello Nack, wait for an exponentially increasing time before processing it as a timeout
  • Start sync after first hello data is validated
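
The exponential wait after a Hello Nack might look like the following. The base delay, the doubling factor, and the cap are assumptions chosen for illustration; the issue does not state the values NLSR uses.

```python
def nack_wait_ms(nack_count, base_ms=1000, cap_ms=60000):
    """Delay before the nack_count-th consecutive Hello Nack is treated as a timeout.

    The delay doubles with each consecutive Nack and is capped at cap_ms.
    All numeric defaults here are illustrative, not NLSR's.
    """
    if nack_count < 1:
        raise ValueError("nack_count must be at least 1")
    return min(base_ms * 2 ** (nack_count - 1), cap_ms)
```

Under these assumed defaults, the first three Nacks would wait 1000, 2000, and 4000 ms, and later ones would level off at the cap.
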
#8

Updated by Ashlesh Gawande almost 2 years ago

  • Status changed from Code review to Closed
  • % Done changed from 0 to 100

ChronoSync and PSync are patched to respond to NACK.

This change alters the initialization order, among other things:

  • Send Hello Interest after route to neighbor is successfully registered
    • First Hello interval is eliminated
    • After a Hello Nack, wait for an exponentially increasing time before processing it as a timeout
  • Register sync route for each neighbor after its Hello Data is validated
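
The resulting per-neighbor ordering can be summarized in a small model. It only records which action would be taken on each event; the class, method, and action names are hypothetical and the real NLSR code path is not shown in this issue.

```python
class NeighborStartupSketch:
    """Tracks the per-neighbor startup order; names are illustrative only."""

    def __init__(self):
        self.actions = []          # ordered record of what would be done
        self._sync_route = set()   # neighbors whose sync route is registered

    def on_route_registered(self, neighbor):
        # The Hello Interest goes out only once the route to the neighbor exists.
        self.actions.append(("send-hello", neighbor))

    def on_hello_data_validated(self, neighbor):
        # Register the sync route once per neighbor, on its first valid Hello Data.
        if neighbor not in self._sync_route:
            self._sync_route.add(neighbor)
            self.actions.append(("register-sync-route", neighbor))
```

Repeated Hello Data from the same neighbor leaves the record unchanged, so the sync route is registered at most once per neighbor in this model.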
