Bug #3219: AccessStrategy measurements ineffective for NDN-RTC traffic - NFD - NDN project issue tracking system

Bug #3219

open

AccessStrategy measurements ineffective for NDN-RTC traffic

Added by Junxiao Shi almost 10 years ago. Updated almost 9 years ago.

Status:

New

Priority:

Normal

Assignee:

Category:

Forwarding

Target version:

Start date:

09/21/2015

Due date:

% Done:

Estimated time:

Description

AccessStrategy v1 stores the last working nexthop in the measurements table.

A measurements entry is created for the one-shorter-prefix of the Data Name: for example, when Data /arizona/cs/www/index.html/v3/s0 arrives, its incoming face is recorded as the last working nexthop of the /arizona/cs/www/index.html/v3 prefix.

To forward a subsequent Interest, a longest prefix match is performed on the measurements table, and if an entry is found, the recorded last working nexthop is considered; otherwise, the Interest is multicast to all nexthops in the FIB entry.

There is no attempt to aggregate measurements onto a shorter prefix.
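The behaviour described above can be sketched roughly as follows. This is only an illustration, not NFD's implementation: the MeasurementsTable class, its method names, the choose_nexthops helper, and the face IDs are all hypothetical, and names are modelled as tuples of components. Where the description says the recorded nexthop is "considered", the sketch simply returns it.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Name = Tuple[str, ...]


def parse_name(uri: str) -> Name:
    """Split a URI such as '/a/b/c' into its name components."""
    return tuple(c for c in uri.split("/") if c)


class MeasurementsTable:
    """Toy model of the last-working-nexthop records (hypothetical API)."""

    def __init__(self) -> None:
        self._last_nexthop: Dict[Name, int] = {}

    def record_data(self, data_name: Name, incoming_face: int) -> None:
        # The entry is keyed by the Data Name minus its last component.
        self._last_nexthop[data_name[:-1]] = incoming_face

    def longest_prefix_match(self, interest_name: Name) -> Optional[int]:
        # Try the full Interest Name first, then ever shorter prefixes.
        for length in range(len(interest_name), -1, -1):
            face = self._last_nexthop.get(interest_name[:length])
            if face is not None:
                return face
        return None


def choose_nexthops(table: MeasurementsTable, interest_name: Name,
                    fib_nexthops: Sequence[int]) -> List[int]:
    """Return the recorded nexthop if any, else all FIB nexthops (multicast)."""
    face = table.longest_prefix_match(interest_name)
    if face is not None:
        return [face]
    return list(fib_nexthops)


table = MeasurementsTable()
table.record_data(parse_name("/arizona/cs/www/index.html/v3/s0"), incoming_face=7)

# The next segment of the same version finds the entry: [7]
print(choose_nexthops(table, parse_name("/arizona/cs/www/index.html/v3/s1"), [7, 8, 9]))
# A name with no matching entry falls back to multicast: [7, 8, 9]
print(choose_nexthops(table, parse_name("/arizona/cs/www/other.html"), [7, 8, 9]))
```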

NDN-RTC is a real-time conferencing library.

The bulk of Data packets generated by NDN-RTC have Names similar to
/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00/23/89730/86739/5/27576.

In this name structure, the component before data is the frame number, which changes more than 50 times per second,
and the component after data is the segment number within a frame, which is usually no more than 25.

Interest Names end with the segment number (/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00).

AccessStrategy measurements are ineffective for NDN-RTC traffic.

With the NDN-RTC namespace design, AccessStrategy's one-shorter-prefix measurements entry is created beyond the segment number (/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key/2991/data/%00%00/23/89730/86739/5). Such an entry is longer than the Name of any subsequent Interest, so the longest prefix match never finds it and it's worthless for subsequent Interests.

AccessStrategy creates one measurements entry per Data and retains it for 8 seconds; under high traffic, these measurements entries can consume considerable memory and degrade forwarder performance.
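The mismatch can be checked with the two example Names quoted in this description. The sketch below is only an illustration; the helper names components and is_prefix are hypothetical, and it models a longest prefix match hit simply as "the entry name is a prefix of the Interest Name".

```python
def components(uri):
    """Split a URI such as '/a/b/c' into a list of name components."""
    return [c for c in uri.split("/") if c]


def is_prefix(prefix, name):
    """True if `prefix` equals the first len(prefix) components of `name`."""
    return name[:len(prefix)] == prefix


data_name = components(
    "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key"
    "/2991/data/%00%00/23/89730/86739/5/27576")
interest_name = components(
    "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key"
    "/2991/data/%00%00")

# The measurements entry is the Data Name minus its last component.
entry_name = data_name[:-1]

# A longest prefix match can only find entries that are prefixes of the
# Interest Name; this entry is longer than the Interest Name itself.
entry_matches = is_prefix(entry_name, interest_name)
print(len(entry_name), len(interest_name), entry_matches)
```

The entry has 18 components while the Interest Name has only 14, so the entry cannot be a prefix of it.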

Files

ndndump.ndnrtc.log (506 KB): ndndump log of NDN-RTC traffic on testbed REMAP node, 20150920. Junxiao Shi, 09/21/2015 06:11 PM
CAIDA_1443828000.png (12.3 KB). Junxiao Shi, 10/03/2015 09:00 AM
access-strategy_20160629.pptx (79 KB). Junxiao Shi, 06/29/2016 11:23 AM

Related issues 3 (1 open — 2 closed)

Updated by Junxiao Shi almost 10 years ago

File ndndump.ndnrtc.log added

This problem was originally reported as a potential memory leak.

On 20150920, the nfd process on NDN testbed REMAP node was observed to be consuming over 5GB memory.

It was also observed that the nMeasurementsEntries counter on REMAP is much higher than on other testbed nodes:

REMAP: nNameTreeEntries=20775, nMeasurementsEntries=1288
UCLA: nNameTreeEntries=5391, nMeasurementsEntries=1
ARIZONA: nNameTreeEntries=865, nMeasurementsEntries=2

An ndndump log was captured on REMAP node, 2 hours after taking the above readings.

AccessStrategy and NccStrategy are the only two strategies that use measurements entries, but only AccessStrategy is enabled in the StrategyChoice table for /ndn/edu/ucla/remap namespace, which is used by an NDN-RTC producer.

Analysis of the ndndump log confirms that the number of measurements entries created by AccessStrategy is consistent with the nMeasurementsEntries counter.

The following command line gives the number of unique one-shorter-prefixes of Data Names per second:

grep DATA ndndump.ndnrtc.log | cut -d' ' -f1,10 | gawk '{ $1=int($1); ncomps=split($2,comps,"/"); name=""; for(i=2;i<=ncomps-1;++i){name=name "/" comps[i]} $2=name; print }' | sort | uniq | cut -d' ' -f1 | uniq -c | less

It returns [113,148,125,149].
This is the number of measurements entries created per second.
Since each entry is retained for 8 seconds, the measurements table size would be 149*8=1192, which is close to nMeasurementsEntries=1288.
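The counting step and the size estimate above can also be sketched in Python. This is only an approximation of the shell pipeline, under the same assumptions the pipeline itself makes about the ndndump log: fields are separated by single spaces, field 1 is a numeric timestamp, and field 10 is the Data Name. The function names are hypothetical.

```python
def unique_prefixes_per_second(lines):
    """Count unique one-shorter-prefixes of Data Names for each second.

    Mirrors: grep DATA | cut -d' ' -f1,10 | gawk ... | sort | uniq
             | cut -d' ' -f1 | uniq -c
    """
    prefixes_by_second = {}
    for line in lines:
        if "DATA" not in line:                    # grep DATA
            continue
        fields = line.rstrip("\n").split(" ")     # cut -d' '
        if len(fields) < 10:
            continue                              # no tenth field to inspect
        second = int(float(fields[0]))            # gawk: $1=int($1)
        comps = fields[9].split("/")              # gawk: split($2,comps,"/")
        # Drop the empty piece before the first '/' and the last component.
        prefix = "".join("/" + c for c in comps[1:-1])
        prefixes_by_second.setdefault(second, set()).add(prefix)
    return {sec: len(p) for sec, p in sorted(prefixes_by_second.items())}


def estimate_table_size(per_second_counts, lifetime_seconds=8):
    """Peak entries created per second times the entry lifetime in seconds."""
    return max(per_second_counts) * lifetime_seconds


# Per-second counts reported in this note.
print(estimate_table_size([113, 148, 125, 149]))  # 1192
```

With the log file at hand, the first function could be applied as unique_prefixes_per_second(open("ndndump.ndnrtc.log")).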

The nNameTreeEntries counter reading is consistent with the measurements table size and the NDN-RTC namespace design:
there are 6 NameComponents between the frame number and the one-shorter-prefix of Data Name, so that every measurements entry would create at least 6 NameTree nodes that aren't shared among other measurements entries.
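As a rough illustration of why the NameTree grows with the measurements table, the sketch below counts NameTree nodes under the simplifying assumption that there is one node per distinct non-empty prefix of each entry name (the root is not counted). The function name_tree_nodes is hypothetical, and entry_b is a made-up entry for the next frame; only entry_a comes from this issue.

```python
def name_tree_nodes(uris):
    """Return the set of distinct non-empty prefixes of the given names."""
    nodes = set()
    for uri in uris:
        comps = [c for c in uri.split("/") if c]
        for length in range(1, len(comps) + 1):
            nodes.add(tuple(comps[:length]))
    return nodes


base = "/ndn/edu/ucla/remap/ndnrtc/user/remap/streams/camera_1469c/mid/key"
entry_a = base + "/2991/data/%00%00/23/89730/86739/5"
entry_b = base + "/2992/data/%00%00/23/89731/86740/5"  # hypothetical next frame

nodes_for_a = len(name_tree_nodes([entry_a]))
nodes_for_both = len(name_tree_nodes([entry_a, entry_b]))

# Nodes added by the second entry that are not shared with the first.
print(nodes_for_a, nodes_for_both - nodes_for_a)
```

In this toy model the second entry adds 7 unshared nodes (the frame number plus the 6 components after it), which is in line with the "at least 6" figure above.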

Therefore, I conclude that the higher nNameTreeEntries and nMeasurementsEntries counters on the REMAP node are caused by AccessStrategy's ineffective measurements for NDN-RTC traffic.

It's unknown whether the measurements table and NameTree are the sole reason for the 5GB memory usage; but there's no evidence of a memory leak elsewhere.

This Bug is caused by the design of AccessStrategy: it assumes the Data Names follow the convention where the last two components are version and segment, but this assumption does not hold for NDN-RTC traffic.

It is not an implementation mistake.

Updated by Junxiao Shi almost 10 years ago

20150922 conference call discussed this bug.

AccessStrategy is designed to handle inaccurate FIB entries, primarily caused by the use of nfd-autoreg.
It may not be necessary when automatic prefix propagation (formerly remote prefix registration) can create accurate routes.

Instead of attempting to fix AccessStrategy (which would require a redesign of measurements collection), we may as well stop using it.

It's worth noting that the 5GB memory usage on testbed REMAP node probably is not caused by this Bug alone.

20775 NameTree nodes and 1288 Measurements entries do not add up to 5GB memory.
A report from ndnSIM 2.0 says a PIT entry on average can consume 6KB of memory. If we assume each NameTree node consumes 100KB memory (the actual should be less), 20775 nodes is 2GB.
AccessStrategy extends Measurements entry lifetime to 8 seconds from now, so all entries created should be erased after 8 seconds. However, NDNOPS has reported that the memory use doesn't come down until NFD is restarted. This indicates a potential memory leak somewhere, most likely in the Measurements table or AccessStrategy's StrategyInfo object, because it was not reported before AccessStrategy was widely deployed.
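The upper-bound estimate in this note can be reproduced with a few lines of arithmetic. The 100KB-per-node figure is the deliberately generous assumption stated above, not a measured value, and the variable names are only illustrative.

```python
NAME_TREE_NODES = 20775          # nNameTreeEntries reading on REMAP
ASSUMED_KB_PER_NODE = 100        # generous assumption from this note
OBSERVED_KB = 5 * 1024 * 1024    # the reported 5GB, expressed in KB

total_kb = NAME_TREE_NODES * ASSUMED_KB_PER_NODE
total_gib = total_kb / (1024 * 1024)

print(f"{total_kb} KB is about {total_gib:.2f} GiB")
print("explains the observed usage:", total_kb >= OBSERVED_KB)
```

Even with this generous per-node figure, the NameTree accounts for roughly 2GB, well short of the observed 5GB.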

Some adjustments are needed on NDN testbed in order to diagnose this problem.
I'll send specific changes to NDNOPS.

Updated by Junxiao Shi almost 10 years ago

Related to Bug #3222: NFD performance degradation over time using NDN-RTC added

Updated by Junxiao Shi almost 10 years ago

NDN-RTC developers have completed one diagnostic test.
In this test, NDN-RTC traffic goes through a testbed node with AccessStrategy, and then the traffic stops.
ContentStore is full throughout the test.

Memory usage and table sizes are recorded.
Selected points are shown in the table:

timestamp (UTC)	nNameTreeEntries	nMeasurementsEntries	memory (RSS)
20150923 233719	16156	2645
20150923 235831	15275	2469
20150924 001447			644184
20150924 052244			1069872
20150924 101250			1090164
20150924 150256			1089776
20150924 174259			1167580
20150924 175244	11784	1720
20150924 175259			1187492
20150924 175501	1428	0
20150924 175612			1187492
20150924 175635	1442	1
20150924 180514			1187492

This observation confirms that the increased Measurements table size is caused by AccessStrategy, and that the entries are erased as expected once the traffic stops.

However, nfd memory usage is growing almost steadily during the execution, and it does not release after Measurements entries are erased.
This means there is a memory leak somewhere.
A separate Bug will be created when the source of that leak is found.

Updated by Junxiao Shi over 9 years ago

File CAIDA_1443828000.png added

NDN-RTC developers have completed another diagnostic test. In this test, NDN-RTC traffic goes through a testbed node with MulticastStrategy, and then the traffic stops. ContentStore is full throughout the test.

Memory usage and table sizes are recorded, and plotted as follows:

(X-axis is NFD uptime, primary Y-axis is memory (RSS) in KB, secondary Y-axis is table size)

plot

Observations:

FIB and Measurements tables don't show significant activities. MulticastStrategy does not use Measurements table.
After uptime 202013, memory usage starts to grow steadily. This happens later than traffic starting.
After traffic stops at 207086, PIT entries are erased, but memory usage is still at a high position.
PIT and NameTree counters return to the starting position after traffic stops.
Memory usage does shrink by small amounts near the end of the plot, but it never returns to the starting point.

Conclusion: As noted in note-4, there is a memory leak somewhere.
This diagnostic test confirms the possible leak is not in AccessStrategy or Measurements, because neither is being used.
A separate Bug will be created when the source of that leak is found.

Updated by Junxiao Shi over 9 years ago

The memory leak is confirmed to be #3236.

After fixing #3236, we'll be able to see whether a fix for #3219 is necessary.

Updated by Jeff Burke over 9 years ago

Given the impact on NDN-RTC performance on the testbed, can the NFD team please assign this to someone and update the status / plans to resolve.

Updated by Junxiao Shi over 9 years ago

> Given the impact on NDN-RTC performance on the testbed, can the NFD team please assign this to someone and update the status / plans to resolve.

As identified in note-2, applications including but not limited to NdnCon should stop relying on nfd-autoreg for the bulk of communication, but use automatic prefix propagation instead.

This means, each NdnCon participant should publish streams under a prefix covered by his/her testbed certificate (such as ndn:/ndn/guest/someone%40example.com/ndnrtc, instead of ndn:/ndn/guest/ndnrtc/someone%40example.com), so that automatic prefix propagation would create a back route for ndn:/ndn/guest/someone%40example.com from the testbed to the end host.
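The difference between the two naming schemes can be illustrated with a simple prefix check. This sketch only tests whether a stream prefix falls under the identity prefix from the example above; it does not model how automatic prefix propagation actually selects keys or certificates, and the helper names are hypothetical.

```python
def components(uri):
    """Split an NDN URI into name components, ignoring an 'ndn:' scheme."""
    if uri.startswith("ndn:"):
        uri = uri[len("ndn:"):]
    return [c for c in uri.split("/") if c]


def is_covered(stream_prefix, identity_prefix):
    """True if stream_prefix is equal to or under identity_prefix."""
    identity = components(identity_prefix)
    return components(stream_prefix)[:len(identity)] == identity


identity = "ndn:/ndn/guest/someone%40example.com"

# Recommended layout: the stream prefix sits under the identity prefix.
print(is_covered("ndn:/ndn/guest/someone%40example.com/ndnrtc", identity))  # True
# Old layout: the application component comes before the user component.
print(is_covered("ndn:/ndn/guest/ndnrtc/someone%40example.com", identity))  # False
```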

nfd-autoreg is currently a dependency for certificate fetching in automatic prefix propagation, but the traffic volume of certificate fetching is much smaller than that of application traffic.

Updated by Jeff Burke over 9 years ago

Ok, let me confirm and then we'll update the namespace. Please provide a pointer to documentation for automatic prefix propagation.

#10

Updated by Junxiao Shi over 9 years ago

> Please provide a pointer to documentation for automatic prefix propagation.

See #3513 note-1.

#11

Updated by Junxiao Shi over 9 years ago

Related to Task #3513: [NdnCon] Change user prefix to be compatible with auto prefix propagation added

#12

Updated by Anonymous about 9 years ago

Related to Feature #3592: Design mechanism to reliably identify content object name added

#13

Updated by Junxiao Shi almost 9 years ago

File access-strategy_20160629.pptx added

The weakness found in this issue has been added to the AccessStrategy design slides, on the "too long Data names" page.

#14

Updated by Anonymous almost 9 years ago

Just a question: Would my suggestion in #3592 solve the problem of both "too long data names" and "too short data names" or do we need additional design?

Assuming that the consumer knows what the producer can serve, which can be hard-coded or easily determined during the connection set-up.
