Bug #5386
Sequence number lost on router reconfiguration
Description
We recently reset a testbed router, which caused it to lose its sequence number. After restarting, its updates were not fetched by other routers: the new updates carried sequence numbers in the 1-20 range, and the other routers treated those lower sequence numbers as already fetched.
The logs showed the original sequence number the router had lost, partly because other routers were still trying to fetch it. Stopping NLSR, raising the numbers in the nlsrSeqNo.txt file above the observed value, and restarting NLSR fixed the problem: the other routers then fetched its updates. Ideally, NLSR would contain logic to handle this automatically, since modifying the state file felt like a hack.
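To illustrate why the restarted router's updates were skipped, here is a minimal model of the receiving side, assuming each router remembers only the highest sequence number it has already fetched per peer prefix. The class `PeerSeqTable`, its method, and the prefix name are hypothetical and are not the ChronoSync or NLSR API.

```python
class PeerSeqTable:
    """Hypothetical model: highest sequence number already fetched per peer
    prefix. Not the ChronoSync/NLSR API, only an illustration of the bug."""

    def __init__(self):
        self.highest = {}  # prefix -> highest sequence number fetched so far

    def on_update(self, prefix, seq):
        """Return True if `seq` is new and should be fetched, False if it is
        at or below the highest number already fetched for `prefix`."""
        if seq <= self.highest.get(prefix, 0):
            return False  # treated as already fetched, so it is ignored
        self.highest[prefix] = seq
        return True


peer = PeerSeqTable()
router = "/ndn/site/router-a"      # hypothetical prefix of the reset router
peer.on_update(router, 5000)       # last number seen before the reset
print(peer.on_update(router, 7))   # restarted router's update: ignored
print(peer.on_update(router, 5005))  # manually bumped number: fetched
```

The second call prints `False` and the third prints `True`, which mirrors the observation that only a sequence number above the old one made the other routers fetch again.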
Updated by Adam Thieme about 2 months ago
By "greater", I mean that any value from 5 to 800000 above the old number worked: I tested one node individually with a number about 5 greater than its last one, and I also reset all nodes to 1 million to start "fresh".
Updated by Ashlesh Gawande about 1 month ago
- Status changed from New to In Progress
- % Done changed from 0 to 60
Updated by Ashlesh Gawande about 1 month ago
Updated by Ashlesh Gawande about 1 month ago
- Status changed from In Progress to Code review
- % Done changed from 60 to 90
Updated by Ashlesh Gawande about 1 month ago
- % Done changed from 90 to 100
Needed a fix for a ChronoSync bug triggered by the new unit test in NLSR (https://gerrit.named-data.net/c/ChronoSync/+/7819). That change needs to be merged first; I have temporarily modified the Jenkins script to pull it so that the build can pass.
Updated by Ashlesh Gawande 9 days ago
The fix relies on the sync protocol delivering an update notification for the router's own prefix that carries a higher sequence number than the one the router currently holds.
The router then updates its own sequence number to the one reported by the network, builds a new LSA with sequence number (received seq + 1), and publishes it back to sync.
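The two steps above can be sketched as follows. This is a minimal model, not NLSR's actual code: it assumes sync hands each update notification to a callback as a (prefix, sequence number) pair, and the class `OwnSeqManager` and its methods are hypothetical names used only for illustration.

```python
class OwnSeqManager:
    """Hypothetical sketch of the described fix: when sync reports our own
    prefix with a higher sequence number than we hold (e.g. after the state
    file was lost), adopt it and publish a new LSA at received + 1."""

    def __init__(self, own_prefix, seq=0):
        self.own_prefix = own_prefix
        self.seq = seq       # local sequence number (reset after state loss)
        self.published = []  # sequence numbers handed to sync, in order

    def publish_lsa(self):
        # Build and publish a new LSA using the next sequence number.
        self.seq += 1
        self.published.append(self.seq)
        return self.seq

    def on_sync_update(self, prefix, seq):
        # React only to notifications about our own prefix that are newer
        # than what we currently hold; everything else is left alone.
        if prefix != self.own_prefix or seq <= self.seq:
            return None
        self.seq = seq             # adopt the number the network remembers
        return self.publish_lsa()  # new LSA at (received seq + 1)


r = OwnSeqManager("/ndn/site/router-a")  # hypothetical prefix
r.publish_lsa()                          # first LSA after restart: seq 1
print(r.on_sync_update("/ndn/site/router-a", 5000))  # prints 5001
```

Because the republished LSA is strictly above the highest number other routers remember, they treat it as new and fetch it, without anyone editing the state file by hand.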