Bug #3219
AccessStrategy measurements ineffective for NDN-RTC traffic
Description
AccessStrategy v1 stores the last working nexthop in the measurements table.
A measurements entry is created for the one-shorter-prefix of the Data Name: for example, when Data /arizona/cs/www/index.html/v3/s0 arrives, its incoming face is recorded as the last working nexthop of the /arizona/cs/www/index.html/v3 prefix.
To forward a subsequent Interest, a longest prefix match is performed on the measurements table, and if an entry is found, the recorded last working nexthop is considered; otherwise, the Interest would be multicast to all nexthops in FIB entry.
There is no attempt to aggregate measurements onto a shorter prefix.
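The behavior described above can be sketched in a few lines of Python. This is only an illustration, not NFD code: the class name, the method names, the face ids, and the /s1 Interest are made up for the example, and the 8-second entry lifetime is left out.

```python
from typing import Dict, List, Tuple

Name = Tuple[str, ...]


def parse_name(uri: str) -> Name:
    """Split a name URI such as /a/b/c into its components."""
    return tuple(c for c in uri.split("/") if c)


class MeasurementsSketch:
    """Hypothetical stand-in for the measurements table (not the NFD class)."""

    def __init__(self) -> None:
        self.last_nexthop: Dict[Name, int] = {}

    def on_data(self, data_name: Name, incoming_face: int) -> None:
        # Record the incoming face on the one-shorter-prefix of the Data Name.
        self.last_nexthop[data_name[:-1]] = incoming_face

    def choose_nexthops(self, interest_name: Name, fib_nexthops: List[int]) -> List[int]:
        # Longest prefix match: try the full Interest Name first, then shorter prefixes.
        for length in range(len(interest_name), -1, -1):
            face = self.last_nexthop.get(interest_name[:length])
            if face is not None:
                return [face]
        # No measurements entry found: multicast to all nexthops in the FIB entry.
        return list(fib_nexthops)


table = MeasurementsSketch()
table.on_data(parse_name("/arizona/cs/www/index.html/v3/s0"), incoming_face=7)

# A later segment of the same version matches the recorded entry.
hit = table.choose_nexthops(parse_name("/arizona/cs/www/index.html/v3/s1"), [7, 8, 9])
# A name outside the measured prefix falls back to multicast.
miss = table.choose_nexthops(parse_name("/arizona/cs/www/other.html"), [7, 8, 9])
print(hit, miss)
```

The real strategy only "considers" the recorded nexthop and has further fallback behavior; the sketch returns it directly to keep the lookup logic visible.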
NDN-RTC is a realtime conference library.
The bulk of Data packets generated by NDN-RTC have Names similar to /ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00/23/89730/86739/5/27576.
In this name structure, the component before data is the frame number, which changes more than 50 times per second, and the component after data is the segment number within a frame, which is usually no more than 25.
Interest Names end with the segment number (/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00).
AccessStrategy measurements are ineffective for NDN-RTC traffic.
With the NDN-RTC namespace design, AccessStrategy's one-shorter-prefix measurements entry is created beyond the segment number (/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00/23/89730/86739/5), so it's worthless for subsequent Interests.
AccessStrategy creates one measurements entry per Data and retains it for 8 seconds; under high traffic, these measurements entries can consume considerable memory and degrade forwarder performance.
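To make the mismatch concrete, here is a small Python check using the two names quoted above. The helper names are made up for this illustration; only the names themselves come from this report.

```python
def components(uri):
    """Split a name URI into its components."""
    return [c for c in uri.split("/") if c]


def is_prefix(prefix, name):
    """True if `prefix` is a (not necessarily proper) prefix of `name`."""
    return name[:len(prefix)] == prefix


DATA_NAME = components(
    "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c"
    "/mid/key/2991/data/%00%00/23/89730/86739/5/27576"
)
INTEREST_NAME = components(
    "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c"
    "/mid/key/2991/data/%00%00"
)

# The measurements entry is named by the one-shorter-prefix of the Data Name.
entry_name = DATA_NAME[:-1]

# A longest prefix match for the Interest only finds entries whose name
# is a prefix of the Interest Name.
entry_is_usable = is_prefix(entry_name, INTEREST_NAME)

print(len(entry_name), len(INTEREST_NAME), entry_is_usable)  # 18 14 False
```

The entry name has 18 components while the Interest Name has only 14, so the entry can never be found by a longest prefix match on that Interest.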
Updated by Junxiao Shi about 9 years ago
- File ndndump.ndnrtc.log added
This problem was originally reported as a potential memory leak.
On 20150920, the nfd process on the NDN testbed REMAP node was observed to be consuming over 5GB memory.
It was also observed that the nMeasurementsEntries counter on REMAP is much higher than on other testbed nodes:
- REMAP: nNameTreeEntries=20775, nMeasurementsEntries=1288
- UCLA: nNameTreeEntries=5391, nMeasurementsEntries=1
- ARIZONA: nNameTreeEntries=865, nMeasurementsEntries=2
An ndndump log was captured on REMAP node, 2 hours after taking the above readings.
AccessStrategy and NccStrategy are the only two strategies that use measurements entries, but only AccessStrategy is enabled in the StrategyChoice table for the /ndn/edu/ucla/remap namespace, which is used by an NDN-RTC producer.
Analysis of the ndndump log confirms that the number of measurements entries created by AccessStrategy is consistent with the nMeasurementsEntries counter.
The command line
grep DATA ndndump.ndnrtc.log | cut -d' ' -f1,10 | gawk '{ $1=int($1); ncomps=split($2,comps,"/"); name=""; for(i=2;i<=ncomps-1;++i){name=name "/" comps[i]} $2=name; print }' | sort | uniq | cut -d' ' -f1 | uniq -c | less
gives the number of unique one-shorter-prefixes of Data Names per second; it returns [113,148,125,149].
This is the number of measurements entries created per second.
Since each entry is retained for 8 seconds, the measurements table size would be 149*8=1192, which is close to nMeasurementsEntries=1288.
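For readers who prefer to avoid the shell pipeline, the same count and the table-size estimate can be sketched in Python. This assumes the log layout implied by the command above (space-separated fields, a numeric timestamp in field 1, the Name in field 10); the function names are made up for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def unique_prefixes_per_second(lines: Iterable[str]) -> Dict[int, int]:
    """Count unique one-shorter-prefixes of Data Names seen in each second."""
    seen: Dict[int, Set[str]] = defaultdict(set)
    for line in lines:
        if "DATA" not in line:              # same filter as `grep DATA`
            continue
        fields = line.rstrip("\n").split(" ")
        if len(fields) < 10:                # no Name field on this line
            continue
        second = int(float(fields[0]))      # truncate the timestamp to whole seconds
        comps = fields[9].split("/")        # field 10 is the Data Name
        seen[second].add("/" + "/".join(comps[1:-1]))  # drop the last component
    return {sec: len(prefixes) for sec, prefixes in sorted(seen.items())}


def estimated_table_size(per_second_counts: Iterable[int], lifetime_s: int = 8) -> int:
    """Estimate the table size as peak creation rate times entry lifetime."""
    return max(per_second_counts) * lifetime_s


print(estimated_table_size([113, 148, 125, 149]))  # 1192
```

Usage would look like `unique_prefixes_per_second(open("ndndump.ndnrtc.log"))`; the per-second counts it returns feed directly into `estimated_table_size`.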
The nNameTreeEntries counter reading is consistent with the measurements table size and the NDN-RTC namespace design:
there are 6 NameComponents between the frame number and the one-shorter-prefix of Data Name, so that every measurements entry would create at least 6 NameTree nodes that aren't shared among other measurements entries.
Therefore, I conclude that the higher nNameTreeEntries and nMeasurementsEntries counters on the REMAP node are caused by AccessStrategy's ineffective measurements for NDN-RTC traffic.
It's unknown whether the measurements table and NameTree are the sole cause of the 5GB memory usage, but there's no evidence of a memory leak elsewhere.
This Bug is caused by the design of AccessStrategy: it assumes the Data Names follow the convention where the last two components are version and segment, but this assumption does not hold for NDN-RTC traffic.
It is not an implementation mistake.
Updated by Junxiao Shi about 9 years ago
20150922 conference call discussed this bug.
AccessStrategy is designed to handle inaccurate FIB entries, primarily caused by the use of nfd-autoreg.
It may not be necessary when automatic prefix propagation (formerly remote prefix registration) can create accurate routes.
Instead of attempting to fix AccessStrategy (which would require a redesign of measurements collection), we may as well stop using it.
It's worth noting that the 5GB memory usage on testbed REMAP node probably is not caused by this Bug alone.
- 20775 NameTree nodes and 1288 Measurements entries do not add up to 5GB memory. A report from ndnSIM 2.0 says a PIT entry can consume 6KB of memory on average. Even if we assume each NameTree node consumes 100KB memory (the actual figure should be less), 20775 nodes is only about 2GB.
- AccessStrategy extends the Measurements entry lifetime to 8 seconds from now, so all entries created should be erased after 8 seconds. However, NDNOPS has reported that memory use doesn't come down until NFD is restarted. This indicates a potential memory leak somewhere, most likely in the Measurements table or AccessStrategy's StrategyInfo object, because it was not reported before AccessStrategy was widely deployed.
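The upper-bound estimate in the first point can be checked with a few lines of Python. The 100KB per NameTree node is the deliberately generous assumption stated above, not a measured value, and the variable names are made up for this sketch.

```python
NAME_TREE_NODES = 20775       # nNameTreeEntries observed on REMAP
ASSUMED_KB_PER_NODE = 100     # generous assumption; the actual figure should be less
OBSERVED_GB = 5               # reported nfd memory usage

upper_bound_kb = NAME_TREE_NODES * ASSUMED_KB_PER_NODE
upper_bound_gb = upper_bound_kb / (1024 * 1024)

print(upper_bound_kb)              # 2077500
print(round(upper_bound_gb, 2))    # 1.98
print(upper_bound_gb < OBSERVED_GB)  # True
```

Even with the inflated per-node figure, the NameTree accounts for less than half of the observed usage.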
Some adjustments are needed on NDN testbed in order to diagnose this problem.
I'll send specific changes to NDNOPS.
Updated by Junxiao Shi about 9 years ago
- Related to Bug #3222: NFD performance degradation over time using NDN-RTC added
Updated by Junxiao Shi about 9 years ago
NDN-RTC developers have completed one diagnostic test.
In this test, NDN-RTC traffic goes through a testbed node with AccessStrategy, and then the traffic stops.
ContentStore is full throughout the test.
Memory usage and table sizes are recorded.
Selected points are shown in the table:
timestamp (UTC) | nNameTreeEntries | nMeasurementsEntries | memory (RSS) |
---|---|---|---|
20150923 233719 | 16156 | 2645 | |
20150923 235831 | 15275 | 2469 | |
20150924 001447 | | | 644184 |
20150924 052244 | | | 1069872 |
20150924 101250 | | | 1090164 |
20150924 150256 | | | 1089776 |
20150924 174259 | | | 1167580 |
20150924 175244 | 11784 | 1720 | |
20150924 175259 | | | 1187492 |
20150924 175501 | 1428 | 0 | |
20150924 175612 | | | 1187492 |
20150924 175635 | 1442 | 1 | |
20150924 180514 | | | 1187492 |
This observation confirms that the increased Measurements table size is caused by AccessStrategy, and that the entries are erased as expected.
However, nfd memory usage grew almost steadily during the test, and the memory was not released after the Measurements entries were erased.
This means there is a memory leak somewhere.
A separate Bug will be created when the source of that leak is found.
Updated by Junxiao Shi about 9 years ago
- File CAIDA_1443828000.png added
NDN-RTC developers have completed another diagnostic test. In this test, NDN-RTC traffic goes through a testbed node with MulticastStrategy, and then the traffic stops. ContentStore is full throughout the test.
Memory usage and table sizes are recorded, and plotted as follows:
(X-axis is NFD uptime, primary Y-axis is memory (RSS) in KB, secondary Y-axis is table size)
Observations:
- FIB and Measurements tables don't show significant activities. MulticastStrategy does not use Measurements table.
- After uptime 202013, memory usage starts to grow steadily. This happens later than the start of traffic.
- After traffic stops at 207086, PIT entries are erased, but memory usage remains high.
- PIT and NameTree counters return to the starting position after traffic stops.
- Memory usage does shrink by small amounts near the end of the plot, but it never returns to the starting point.
Conclusion: As noted in note-4, there is a memory leak somewhere.
This diagnostic test confirms the possible leak is not in AccessStrategy or Measurements, because neither is being used.
A separate Bug will be created when the source of that leak is found.
Updated by Junxiao Shi about 9 years ago
Updated by Jeff Burke over 8 years ago
Given the impact on NDN-RTC performance on the testbed, can the NFD team please assign this to someone and update the status / plans to resolve.
Updated by Junxiao Shi over 8 years ago
> Given the impact on NDN-RTC performance on the testbed, can the NFD team please assign this to someone and update the status / plans to resolve.
As identified in note-2, applications including but not limited to NdnCon should stop relying on nfd-autoreg for the bulk of communication, and use automatic prefix propagation instead.
This means each NdnCon participant should publish streams under a prefix covered by his/her testbed certificate (such as ndn:/ndn/guest/someone%40example.com/ndnrtc, instead of ndn:/ndn/guest/ndnrtc/someone%40example.com), so that automatic prefix propagation would create a back route for ndn:/ndn/guest/someone%40example.com from the testbed to the end host.
nfd-autoreg is currently a dependency for certificate fetching in automatic prefix propagation, but the traffic volume of certificate fetching is much smaller than that of application traffic.
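The naming requirement above amounts to a prefix check between the stream prefix and the identity prefix of the certificate. A minimal sketch in Python, with made-up helper names and using only the example names from this note (it does not model how NFD actually selects a prefix to propagate):

```python
def components(uri):
    """Split a name URI into components, ignoring an optional ndn: scheme."""
    if uri.startswith("ndn:"):
        uri = uri[len("ndn:"):]
    return [c for c in uri.split("/") if c]


def is_covered(stream_prefix, identity_prefix):
    """True if the stream prefix falls under the certificate's identity prefix."""
    stream = components(stream_prefix)
    identity = components(identity_prefix)
    return stream[:len(identity)] == identity


IDENTITY = "ndn:/ndn/guest/someone%40example.com"

recommended = is_covered("ndn:/ndn/guest/someone%40example.com/ndnrtc", IDENTITY)
current = is_covered("ndn:/ndn/guest/ndnrtc/someone%40example.com", IDENTITY)
print(recommended, current)  # True False
```

Only the first layout lets a route for the identity prefix carry the stream traffic back to the end host.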
Updated by Jeff Burke over 8 years ago
Ok, let me confirm and then we'll update the namespace. Please provide a pointer to documentation for automatic prefix propagation.
Updated by Junxiao Shi over 8 years ago
> Please provide a pointer to documentation for automatic prefix propagation.
See #3513 note-1.
Updated by Junxiao Shi over 8 years ago
- Related to Task #3513: [NdnCon] Change user prefix to be compatible with auto prefix propagation added
Updated by Anonymous over 8 years ago
- Related to Feature #3592: Design mechanism to reliably identify content object name added
Updated by Junxiao Shi over 8 years ago
The weakness found in this issue has been added to the AccessStrategy design slides, on the "too long Data names" page.
Updated by Anonymous over 8 years ago
Just a question: would my suggestion in #3592 solve the problems of both "too long data names" and "too short data names", or do we need additional design?
This assumes that the consumer knows what the producer can serve, which can be hard-coded or easily determined during connection setup.