In an email conversation with Dr. Wang (edited for content):
For Sync, the definition of the set being synced is important: (a) if the set is defined as any data that has ever been produced under a given name prefix, then whether a node is alive or not makes no difference; (b) if the data set is defined as the data that the current sync group members have, then when a group member is gone, that should be reflected in the set... (ChronoSync) should have a mechanism to detect the liveness of the group members... Without this mechanism, we run into the situation described in the Redmine issue. When a router recovers, it syncs with other nodes and gets some data names that belong to nodes that are already dead or gone. It then tries to fetch the data, wasting its CPU and bandwidth. The data may or may not still be in the network, depending on when the node disappeared.
For NLSR, we certainly care about whether a node is live or not. When a node is dead or not reachable, its data should not be retrieved or considered in the routing computation. Note that theoretically their data should not affect the results as they are not reachable anyway... We require nodes to periodically refresh their LSAs so any LSAs that haven’t been refreshed for a while will expire and be deleted...
Since ChronoSync does not have a complete solution yet, the solution suggested in the Redmine issue is a quick fix: if we put a timestamp in every ChronoSync reply to indicate when that data was first put into the sync data set, then NLSR can use that information to decide whether to retrieve the data. This does require ChronoSync to add the timestamp information to its data structure. It’s not a complete solution in the sense that ChronoSync doesn’t do liveness detection and still syncs obsolete data names. It just gives NLSR more information to solve the problem mentioned in the Redmine issue. It assumes that NLSR still does its periodic refreshes to remove obsolete LSAs.
Now that I’m thinking about this again, we can actually add the timestamp information to the LSA data name (at the end, after any version number). This way NLSR just needs to look at the LSA data name to decide whether to retrieve it. This does require clock synchronization, but it can be fairly loose as long as we allow a relatively long grace period before considering LSAs obsolete.
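The LSA-name variant can be sketched as follows. This is a minimal illustration, not NLSR code, and it rests on several assumptions. Names are plain slash-separated strings rather than real NDN names. The timestamp is milliseconds since the Unix epoch. The example name and the `lifetime_ms` and `grace_ms` values are hypothetical. A consumer fetches an LSA only if its age is within the LSA lifetime plus a grace period that absorbs loose clock synchronization.

```python
import time
from typing import Optional


def now_ms() -> int:
    """Current wall-clock time in milliseconds since the Unix epoch (assumed unit)."""
    return int(time.time() * 1000)


def add_timestamp(lsa_name: str, timestamp_ms: Optional[int] = None) -> str:
    """Append a creation timestamp as the last name component, after the version."""
    ts = now_ms() if timestamp_ms is None else timestamp_ms
    return f"{lsa_name.rstrip('/')}/{ts}"


def timestamp_of(lsa_name: str) -> int:
    """Read the timestamp back out of the last name component."""
    return int(lsa_name.rstrip("/").rsplit("/", 1)[1])


def should_fetch(lsa_name: str, lifetime_ms: int, grace_ms: int,
                 now: Optional[int] = None) -> bool:
    """Decide from the name alone whether the LSA is recent enough to retrieve."""
    current = now_ms() if now is None else now
    age = current - timestamp_of(lsa_name)
    # A negative age (producer clock ahead of ours) is treated as fresh;
    # grace_ms absorbs clock skew between loosely synchronized routers.
    return age <= lifetime_ms + grace_ms


# Hypothetical LSA name ending in version 5, stamped at t = 1,000,000 ms.
name = add_timestamp("/ndn/site/router1/LSA/NAME/5", 1_000_000)
print(name)                                                      # /ndn/site/router1/LSA/NAME/5/1000000
print(should_fetch(name, 3_600_000, 600_000, now=5_000_000))     # True  (age 4,000,000 ms)
print(should_fetch(name, 3_600_000, 600_000, now=6_000_000))     # False (age 5,000,000 ms)
```

Because a live router refreshes its LSAs periodically, each refresh carries a new timestamp. Only names left behind by dead or unreachable routers age past the threshold.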
From this conversation I see two possible approaches to this issue:
1. Add a timestamp to ChronoSync replies
2. Add a timestamp to the LSA data name
Approach (2) is clearly less work, but I wonder whether it is only a band-aid rather than a way of mending the broken bone, so to speak. I suspect that many applications will want to selectively fetch items in a sync set based on age, in which case (1) is the more systematic solution. What's more, (1) does not force the sync set to prune items by age: a consumer interested in the whole set can simply ignore the timestamp. The overhead is fixed and known in advance: a 64-bit UTC timestamp plus the cost of encoding it in the reply.
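To put a number on "the overhead to encode it", the sketch below sizes a timestamp encoded as an NDN TLV element. It uses the VAR-NUMBER rules from the NDN packet format for TLV-TYPE and TLV-LENGTH. The TLV type numbers used here are hypothetical, since no such type is assigned in the source. The value is always a fixed 8-byte unsigned integer, matching the "64-bit" in the text. (NDN's NonNegativeInteger would normally allow a shorter 1/2/4-byte form.)

```python
import struct


def encode_var_number(n: int) -> bytes:
    """NDN TLV VAR-NUMBER: 1 byte below 253, else a marker byte plus 2, 4, or 8 bytes."""
    if n < 0:
        raise ValueError("VAR-NUMBER must be non-negative")
    if n < 253:
        return bytes([n])
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack(">H", n)
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack(">I", n)
    return b"\xff" + struct.pack(">Q", n)


def encode_timestamp_tlv(tlv_type: int, timestamp_ms: int) -> bytes:
    """Encode a 64-bit timestamp as TYPE | LENGTH | 8-byte big-endian VALUE."""
    value = struct.pack(">Q", timestamp_ms)  # always 8 bytes, per the 64-bit assumption
    return encode_var_number(tlv_type) + encode_var_number(len(value)) + value


ts = 1_700_000_000_000  # example UTC time in ms since the epoch
print(len(encode_timestamp_tlv(128, ts)))  # 10 bytes: 1 (type) + 1 (length) + 8 (value)
print(len(encode_timestamp_tlv(300, ts)))  # 12 bytes: 3 (type) + 1 (length) + 8 (value)
```

Under these assumptions the per-timestamp cost is a constant 10 or 12 bytes, depending only on the type number chosen. This is the "known and static" overhead referred to above.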
To me, adding a timestamp to ChronoSync replies lets ChronoSync behave as an insertion-only, comprehensive set while still giving consumers the ability to select a temporally relevant subset, which should satisfy the widest range of use cases.
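Approach (1) can be sketched as an insertion-only set that remembers when each name first entered it and returns that time in every reply. This is a hypothetical model, not ChronoSync's actual data structures or API. `TimestampedSyncSet`, `SyncEntry`, `merge`, and `select_recent` are illustrative names. One design point matters for the recovery scenario: a recovering router keeps the timestamp carried in the reply (the earliest known) instead of stamping entries with its own recovery time. Otherwise every stale name would look fresh.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class SyncEntry:
    name: str
    first_inserted_ms: int  # when the name first entered the sync set


class TimestampedSyncSet:
    """Hypothetical insertion-only sync set that records first-insertion times."""

    def __init__(self) -> None:
        self._entries: Dict[str, int] = {}

    def insert(self, name: str, now_ms: int) -> bool:
        # Insertion-only: a name is never removed, and re-inserting is a no-op.
        if name in self._entries:
            return False
        self._entries[name] = now_ms
        return True

    def merge(self, entries: Iterable[SyncEntry]) -> None:
        # Adopt the peer's timestamp, keeping the earliest one ever seen.
        for e in entries:
            known = self._entries.get(e.name)
            if known is None or e.first_inserted_ms < known:
                self._entries[e.name] = e.first_inserted_ms

    def reply(self) -> List[SyncEntry]:
        # Every reply carries each name together with its first-insertion time.
        return [SyncEntry(n, t) for n, t in sorted(self._entries.items())]


def select_recent(entries: Iterable[SyncEntry], now_ms: int,
                  max_age_ms: Optional[int]) -> List[str]:
    """Consumer-side filter; max_age_ms=None means 'the whole set, ignore timestamps'."""
    if max_age_ms is None:
        return [e.name for e in entries]
    return [e.name for e in entries if now_ms - e.first_inserted_ms <= max_age_ms]


origin = TimestampedSyncSet()
origin.insert("/a/1", 0)        # produced long ago by a node that later died
origin.insert("/b/1", 9_000)    # recent data from a live node
recovered = TimestampedSyncSet()
recovered.merge(origin.reply())
print(select_recent(recovered.reply(), now_ms=10_000, max_age_ms=5_000))  # ['/b/1']
print(select_recent(recovered.reply(), now_ms=10_000, max_age_ms=None))   # ['/a/1', '/b/1']
```

An NLSR-like consumer would pass a maximum age tied to its LSA refresh interval. A consumer that wants the comprehensive set passes `None` and ignores the timestamps entirely.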