Bug #3219

AccessStrategy measurements ineffective for NDN-RTC traffic

Added by Junxiao Shi over 4 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee: -
Category: Forwarding
Target version: -
Start date: 09/21/2015
Due date:
% Done: 0%
Estimated time:

Description

AccessStrategy v1 stores last working nexthop in measurements table.

A measurements entry is created for the one-shorter-prefix of the Data Name: for example, when Data /arizona/cs/www/index.html/v3/s0 arrives, its incoming face is recorded as the last working nexthop of the /arizona/cs/www/index.html/v3 prefix.

To forward a subsequent Interest, a longest prefix match of the Interest Name is performed on the measurements table. If an entry is found, the recorded last working nexthop is considered; otherwise, the Interest is multicast to all nexthops in the FIB entry.
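The record-and-lookup behavior described above can be modeled in a few lines. This is a minimal Python sketch, not NFD's C++ implementation: names are tuples of component strings, faces are plain integers, and the class MeasurementsTable and its methods are hypothetical.

```python
from typing import Dict, Optional, Tuple

Name = Tuple[str, ...]


def parse_name(uri: str) -> Name:
    """Split an NDN name URI like /a/b/c into a tuple of components."""
    return tuple(c for c in uri.split("/") if c)


class MeasurementsTable:
    """Hypothetical model of AccessStrategy v1 measurements: prefix -> last working face."""

    def __init__(self) -> None:
        self.entries: Dict[Name, int] = {}

    def on_data(self, data_name: Name, in_face: int) -> None:
        # Record the incoming face on the one-shorter-prefix of the Data name.
        self.entries[data_name[:-1]] = in_face

    def lookup(self, interest_name: Name) -> Optional[int]:
        # Longest prefix match: try the full Interest name, then ever shorter prefixes.
        for length in range(len(interest_name), -1, -1):
            face = self.entries.get(interest_name[:length])
            if face is not None:
                return face
        return None  # no entry: the Interest would be multicast to all FIB nexthops


table = MeasurementsTable()
table.on_data(parse_name("/arizona/cs/www/index.html/v3/s0"), in_face=7)
print(table.lookup(parse_name("/arizona/cs/www/index.html/v3/s1")))  # 7
```

With conventional version/segment naming, the Interest for the next segment s1 extends the recorded /arizona/cs/www/index.html/v3 prefix, so the lookup succeeds.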

There is no attempt to aggregate measurements onto a shorter prefix.

NDN-RTC is a realtime conference library.

The bulk of Data packets generated by NDN-RTC have Names similar to
/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00/23/89730/86739/5/27576.

In this name structure, the component before data is the frame number, which changes more than 50 times per second,
and the component after data is the segment number within a frame, which is usually no more than 25.

Interest Names end with the segment number (/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00).
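The name layout above can be checked mechanically. A minimal sketch: split_ndnrtc_data_name is a hypothetical helper, and it assumes, as described above, that the first data component sits between the frame number and the segment number.

```python
from typing import Tuple


def split_ndnrtc_data_name(uri: str) -> Tuple[str, str, str]:
    """Return (frame number, segment number, Interest name) for an NDN-RTC Data name."""
    comps = [c for c in uri.split("/") if c]
    # Assumption: the first "data" component separates frame number and segment number.
    i = comps.index("data")
    frame, segment = comps[i - 1], comps[i + 1]
    # Interest names stop right after the segment number component.
    interest = "/" + "/".join(comps[: i + 2])
    return frame, segment, interest


data_name = ("/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c"
             "/mid/key/2991/data/%00%00/23/89730/86739/5/27576")
frame, segment, interest = split_ndnrtc_data_name(data_name)
print(frame, segment)  # 2991 %00%00
print(interest)  # /ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00
```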

AccessStrategy measurements are ineffective for NDN-RTC traffic.

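With the NDN-RTC namespace design, AccessStrategy's one-shorter-prefix measurements entry is created beyond the segment number (/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00/23/89730/86739/5). Since Interest Names end at the segment number, this prefix is longer than any Interest Name and cannot be found by a longest prefix match, so it's worthless for subsequent Interests.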
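Why the lookup misses can be checked directly: for a longest prefix match on an Interest Name to find an entry, the entry's prefix must be a prefix of that Name. A minimal Python sketch using the names above (parse_name and is_prefix are hypothetical helpers); the Interest here asks for the very segment the Data belongs to, which is the most favorable case.

```python
from typing import Tuple

Name = Tuple[str, ...]


def parse_name(uri: str) -> Name:
    """Split an NDN name URI into a tuple of components."""
    return tuple(c for c in uri.split("/") if c)


def is_prefix(prefix: Name, name: Name) -> bool:
    """True if `prefix` is a component-wise prefix of `name`."""
    return name[: len(prefix)] == prefix


stream = "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key"
data = parse_name(stream + "/2991/data/%00%00/23/89730/86739/5/27576")
recorded = data[:-1]  # what AccessStrategy stores: the one-shorter-prefix
interest = parse_name(stream + "/2991/data/%00%00")  # Interest for the same segment

print(len(recorded), len(interest))  # 18 14
# A longest prefix match only visits prefixes of the Interest Name,
# so an entry longer than the Name itself can never be found.
print(is_prefix(recorded, interest))  # False
```

The relationship is inverted: the Interest Name is a prefix of the recorded entry, not the other way around.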

AccessStrategy creates one measurements entry per Data and retains it for 8 seconds; under high traffic, these measurements entries can consume considerable memory and degrade forwarder performance.


Files

ndndump.ndnrtc.log (506 KB): ndndump log of NDN-RTC traffic on testbed REMAP node, 20150920. Junxiao Shi, 09/21/2015 06:11 PM
CAIDA_1443828000.png (12.3 KB). Junxiao Shi, 10/03/2015 09:00 AM
access-strategy_20160629.pptx (79 KB). Junxiao Shi, 06/29/2016 11:23 AM

Related issues

Related to NFD - Bug #3222: NFD performance degradation over time using NDN-RTC (Rejected, 09/23/2015)
Related to ndnrtc - Task #3513: [NdnCon] Change user prefix to be compatible with auto prefix propagation (Closed, 03/16/2016)
Related to NDN Specifications - Feature #3592: Design mechanism to reliably identify content object name (New)

History

#1

Updated by Junxiao Shi over 4 years ago

This problem was originally reported as a potential memory leak.

On 20150920, the nfd process on NDN testbed REMAP node was observed to be consuming over 5GB memory.

It's also observed that the nMeasurementsEntries counter on REMAP is much higher than other testbed nodes:

  • REMAP: nNameTreeEntries=20775, nMeasurementsEntries=1288
  • UCLA: nNameTreeEntries=5391, nMeasurementsEntries=1
  • ARIZONA: nNameTreeEntries=865, nMeasurementsEntries=2

An ndndump log was captured on REMAP node, 2 hours after taking the above readings.

AccessStrategy and NccStrategy are the only two strategies that use measurements entries, but only AccessStrategy is enabled in the StrategyChoice table for /ndn/edu/ucla/remap namespace, which is used by an NDN-RTC producer.

Analysis on the ndndump log confirms that the number of measurements entries created by AccessStrategy is consistent with the nMeasurementsEntries counter.

The following command line gives the number of unique one-shorter-prefixes of Data Names per second:

    grep DATA ndndump.ndnrtc.log | cut -d' ' -f1,10 | gawk '{ $1=int($1); ncomps=split($2,comps,"/"); name=""; for(i=2;i<=ncomps-1;++i){name=name "/" comps[i]} $2=name; print }' | sort | uniq | cut -d' ' -f1 | uniq -c | less

It returns [113,148,125,149].
This is the number of measurements entries created per second.
Since each entry is retained for 8 seconds, the measurements table size would be 149*8=1192, which is close to nMeasurementsEntries=1288.
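For readers without gawk, the same per-second count and the table-size estimate can be reproduced in Python. A minimal sketch: the function names are hypothetical, and the only assumptions about the ndndump line layout are the ones the cut command already makes (single-space separated, timestamp in field 1, Data Name in field 10).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def prefixes_per_second(lines: Iterable[str]) -> Dict[int, int]:
    """Count unique one-shorter-prefixes of Data names per second (mirrors the pipeline)."""
    seen: Dict[int, Set[str]] = defaultdict(set)
    for line in lines:
        if "DATA" not in line:  # grep DATA
            continue
        fields = line.split(" ")  # cut -d' ' splits on single spaces
        if len(fields) < 10:  # assumption: skip lines without a 10th field
            continue
        second = int(float(fields[0]))  # gawk: $1=int($1)
        name = fields[9]  # field 10 is the Data name
        prefix = "/".join(name.split("/")[:-1])  # drop the last component
        seen[second].add(prefix)  # sort | uniq
    return {s: len(p) for s, p in sorted(seen.items())}  # cut -f1 | uniq -c


def steady_state_size(rate_per_second: int, lifetime_s: int = 8) -> int:
    """Entries alive at once if `rate_per_second` new entries each live `lifetime_s` seconds."""
    return rate_per_second * lifetime_s


print(steady_state_size(149))  # 1192
```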

The nNameTreeEntries counter reading is consistent with the measurements table size and the NDN-RTC namespace design:
there are 6 NameComponents between the frame number and the one-shorter-prefix of Data Name, so that every measurements entry would create at least 6 NameTree nodes that aren't shared among other measurements entries.
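The per-entry NameTree cost can be counted with a simplified model in which every distinct prefix of a stored name occupies one node. A minimal sketch: name_tree_nodes is a hypothetical helper, and the second frame number 2992 is made up for illustration.

```python
from typing import Iterable, Set, Tuple


def name_tree_nodes(names: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Nodes a NameTree needs to hold `names`: one per distinct prefix, root included."""
    nodes: Set[Tuple[str, ...]] = set()
    for uri in names:
        comps = tuple(c for c in uri.split("/") if c)
        for i in range(len(comps) + 1):
            nodes.add(comps[:i])  # every ancestor prefix gets its own node
    return nodes


stream = "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key"
tail = "/data/%00%00/23/89730/86739/5"
first = name_tree_nodes([stream + "/2991" + tail])
both = name_tree_nodes([stream + "/2991" + tail, stream + "/2992" + tail])
print(len(both) - len(first))  # 7
```

The two entries share every node down to key; below that, each frame contributes its own 7 nodes (the frame number plus the 6 components after it), consistent with the "at least 6" figure above.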

Therefore, I conclude that the higher nNameTreeEntries and nMeasurementsEntries counters on the REMAP node are caused by AccessStrategy's ineffective measurements for NDN-RTC traffic.

It's unknown whether the measurements table and NameTree are the sole reason for the 5GB memory usage, but there's no evidence of a memory leak elsewhere.

This Bug is caused by the design of AccessStrategy: it assumes the Data Names follow the convention where the last two components are version and segment, but this assumption does not hold for NDN-RTC traffic.

It is not an implementation mistake.

#2

Updated by Junxiao Shi over 4 years ago

20150922 conference call discussed this bug.

AccessStrategy is designed to handle inaccurate FIB entries, primarily caused by the use of nfd-autoreg.
It may not be necessary when automatic prefix propagation (formerly remote prefix registration) can create accurate routes.

Instead of attempting to fix AccessStrategy (which would require a redesign of measurements collection), we may as well stop using it.


It's worth noting that the 5GB memory usage on the testbed REMAP node is probably not caused by this Bug alone.

  • 20775 NameTree nodes and 1288 Measurements entries do not add up to 5GB memory.
    A report from ndnSIM 2.0 says a PIT entry on average can consume 6KB of memory. Even if we assume each NameTree node consumes 100KB memory (the actual figure should be less), 20775 nodes amount to only about 2GB.
  • AccessStrategy extends the Measurements entry lifetime to 8 seconds from now, so all entries created should be erased after 8 seconds. However, NDNOPS has reported that memory usage doesn't come down until NFD is restarted. This indicates a potential memory leak somewhere, most likely in the Measurements table or AccessStrategy's StrategyInfo object, because it was not reported before AccessStrategy was widely deployed.
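The upper bound in the first bullet works out as follows, using the deliberately generous 100KB-per-node assumption; even this overestimate falls well short of the observed 5GB:

```latex
20775 \times 100\ \mathrm{KB} = 2\,077\,500\ \mathrm{KB} \approx 2\ \mathrm{GB} < 5\ \mathrm{GB}
```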

Some adjustments are needed on NDN testbed in order to diagnose this problem.
I'll send specific changes to NDNOPS.

#3

Updated by Junxiao Shi over 4 years ago

  • Related to Bug #3222: NFD performance degradation over time using NDN-RTC added

#4

Updated by Junxiao Shi over 4 years ago

NDN-RTC developers have completed one diagnostic test.
In this test, NDN-RTC traffic goes through a testbed node with AccessStrategy, and then the traffic stops.
ContentStore is full throughout the test.

Memory usage and table sizes are recorded.
Selected points are shown in the table:

timestamp (UTC)    nNameTreeEntries  nMeasurementsEntries  memory (RSS)
20150923 233719    16156             2645                  -
20150923 235831    15275             2469                  -
20150924 001447    -                 -                     644184
20150924 052244    -                 -                     1069872
20150924 101250    -                 -                     1090164
20150924 150256    -                 -                     1089776
20150924 174259    -                 -                     1167580
20150924 175244    11784             1720                  -
20150924 175259    -                 -                     1187492
20150924 175501    1428              0                     -
20150924 175612    -                 -                     1187492
20150924 175635    1442              1                     -
20150924 180514    -                 -                     1187492

This observation confirms that the increased Measurements table size is caused by AccessStrategy, and that the entries are erased as expected.

However, nfd memory usage grows almost steadily during the execution, and the memory is not released after the Measurements entries are erased.
This means there is a memory leak somewhere.
A separate Bug will be created when the source of that leak is found.

#5

Updated by Junxiao Shi over 4 years ago

NDN-RTC developers have completed another diagnostic test. In this test, NDN-RTC traffic goes through a testbed node with MulticastStrategy, and then the traffic stops. ContentStore is full throughout the test.

Memory usage and table sizes are recorded, and plotted as follows:

(X-axis is NFD uptime, primary Y-axis is memory (RSS) in KB, secondary Y-axis is table size)

[Plot: memory (RSS) and table sizes vs. NFD uptime]

Observations:

  • FIB and Measurements tables don't show significant activities. MulticastStrategy does not use Measurements table.
  • After uptime 202013, memory usage starts to grow steadily. This happens some time after the traffic starts.
  • After traffic stops at 207086, PIT entries are erased, but memory usage remains high.
  • PIT and NameTree counters return to their starting values after traffic stops.
  • Memory usage does shrink by small amounts near the end of the plot, but it never returns to the starting point.

Conclusion: As noted in note-4, there is a memory leak somewhere.
This diagnostic test confirms the possible leak is not in AccessStrategy or Measurements, because neither is being used.
A separate Bug will be created when the source of that leak is found.

#6

Updated by Junxiao Shi over 4 years ago

The memory leak is confirmed to be #3236.

After fixing #3236, we'll be able to see whether a fix for #3219 is necessary.

#7

Updated by Jeff Burke about 4 years ago

Given the impact on NDN-RTC performance on the testbed, can the NFD team please assign this to someone and update the status / plans to resolve.

#8

Updated by Junxiao Shi about 4 years ago

> Given the impact on NDN-RTC performance on the testbed, can the NFD team please assign this to someone and update the status / plans to resolve.

As identified in note-2, applications including but not limited to NdnCon should stop relying on nfd-autoreg for the bulk of communication, but use automatic prefix propagation instead.

This means, each NdnCon participant should publish streams under a prefix covered by his/her testbed certificate (such as ndn:/ndn/guest/someone%40example.com/ndnrtc, instead of ndn:/ndn/guest/ndnrtc/someone%40example.com), so that automatic prefix propagation would create a back route for ndn:/ndn/guest/someone%40example.com from the testbed to the end host.
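Whether a stream prefix is reachable through the propagated back route reduces to a component-wise prefix test against the certificate's prefix. A minimal sketch; components and covers are hypothetical helpers, not part of any NFD tool.

```python
from typing import List


def components(uri: str) -> List[str]:
    """Split an NDN name URI (with or without the ndn: scheme) into components."""
    if uri.startswith("ndn:"):
        uri = uri[len("ndn:"):]
    return [c for c in uri.split("/") if c]


def covers(route: str, name: str) -> bool:
    """True if `route` is a component-wise prefix of `name`."""
    r, n = components(route), components(name)
    return n[: len(r)] == r


route = "ndn:/ndn/guest/someone%40example.com"  # prefix from the testbed certificate
print(covers(route, "ndn:/ndn/guest/someone%40example.com/ndnrtc"))  # True
print(covers(route, "ndn:/ndn/guest/ndnrtc/someone%40example.com"))  # False
```

Comparing whole components rather than raw strings avoids treating /a as a prefix of /ab.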

nfd-autoreg currently is a dependency for certificate fetching in automatic prefix registration, but traffic volume of certificate fetching is much smaller than application traffic.

#9

Updated by Jeff Burke about 4 years ago

Ok, let me confirm and then we'll update the namespace. Please provide a pointer to documentation for automatic prefix propagation.

#10

Updated by Junxiao Shi about 4 years ago

> Please provide a pointer to documentation for automatic prefix propagation.

See #3513 note-1.

#11

Updated by Junxiao Shi about 4 years ago

  • Related to Task #3513: [NdnCon] Change user prefix to be compatible with auto prefix propagation added

#12

Updated by Klaus Schneider almost 4 years ago

  • Related to Feature #3592: Design mechanism to reliably identify content object name added

#13

Updated by Junxiao Shi almost 4 years ago

The weakness found in this issue is added into AccessStrategy design slides on "too long Data names" page.

#14

Updated by Klaus Schneider over 3 years ago

Just a question: Would my suggestion in #3592 solve the problem of both "too long data names" and "too short data names" or do we need additional design?

This assumes that the consumer knows what the producer can serve, which can be hard-coded or easily determined during the connection set-up.
