Bug #3618


Memory bloating in RIB

Added by Anil Jangam almost 8 years ago. Updated over 7 years ago.

Status: Rejected
Priority: High
Assignee: Weiwei Liu
Category: RIB
Target version: v0.5
Start date: 05/09/2016
% Done: 0%

Description

There is a consistent increase in memory consumption in my NLSR simulation scenario, where I am running routing tests with 25 and 50 nodes. The Valgrind logs indicate that it is not a leak; the process is allocating memory but not releasing it. These could be legitimate allocations; however, given the scale at which memory is allocated and never released, it does not seem to be normal behaviour. The Valgrind logs point to memory being allocated by NFD and its subcomponents.


Files

25_50_node_valgrind_report.tar.gz (496 KB) Valgrind leak analysis report. Anil Jangam, 05/09/2016 11:27 AM
25_50_nodes_massiff.tar.gz (299 KB) Valgrind Massif logs. Anil Jangam, 05/09/2016 11:28 AM
50_nodes_massiff_data.png (155 KB) Valgrind Massif graphical visual. Anil Jangam, 05/09/2016 11:28 AM
block_cpp.diff (3.26 KB) Anil Jangam, 05/13/2016 02:08 AM
massif.out.10720 (1.93 MB) valgrind massif test log - 50 node. Anil Jangam, 05/19/2016 08:10 AM
50_node_massif.png (519 KB) massif analyser output - 50 node. Anil Jangam, 05/19/2016 08:12 AM
massif.out.10720 (1.93 MB) valgrind massif test log - 25 node. Anil Jangam, 05/19/2016 08:13 AM
25_node_massif.png (453 KB) massif analyser output - 25 node. Anil Jangam, 05/19/2016 08:13 AM
massif.out.4868 (1.57 MB) valgrind massif test log - 50 node. Anil Jangam, 05/19/2016 08:17 AM

Related issues: 1 (0 open, 1 closed)

Related to NFD - Bug #3619: NameTree entry not released (Closed). Junxiao Shi, 05/10/2016

Actions #1

Updated by Anil Jangam almost 8 years ago

50 node simulation valgrind summary:

==9587== LEAK SUMMARY:
==9587==    definitely lost: 0 bytes in 0 blocks
==9587==    indirectly lost: 0 bytes in 0 blocks
==9587==      possibly lost: 2,263,514 bytes in 67,928 blocks
==9587==    still reachable: 1,474,943,776 bytes in 3,910,237 blocks
==9587==         suppressed: 0 bytes in 0 blocks
==9587==
==9587== For counts of detected and suppressed errors, rerun with: -v
==9587== ERROR SUMMARY: 37 errors from 37 contexts (suppressed: 0 from 0)

25 node simulation valgrind summary:

==9287== LEAK SUMMARY:
==9287==    definitely lost: 0 bytes in 0 blocks
==9287==    indirectly lost: 0 bytes in 0 blocks
==9287==      possibly lost: 400,259 bytes in 11,100 blocks
==9287==    still reachable: 437,147,930 bytes in 1,132,024 blocks
==9287==         suppressed: 0 bytes in 0 blocks
==9287==
==9287== For counts of detected and suppressed errors, rerun with: -v
==9287== ERROR SUMMARY: 31 errors from 31 contexts (suppressed: 0 from 0)
Actions #2

Updated by Anil Jangam almost 8 years ago

This issue is also present in the most up-to-date NFD code. It has perhaps not been reported yet because in a standalone instance of NFD it is not a concern. Since I am running simulations with 25, 50, and 100+ nodes, each having its own NFD instance, the growth is very quick.

Vince and I debugged this issue a little further, and below are our findings:

  • The issue is also reproducible on standalone NFD. Vince tried about 100 registration requests and there is a consistent increase in memory. This increase is present even if RibManager::sendSuccessResponse is not called.

  • The memory grows even if we bypass the RibManager completely by using "nfdc add-nexthop", and the problem is present in the latest NFD code, since Vince tested with the most up-to-date version.

  • Another possibility we considered was response messages being cached in the CS, leading to increased memory consumption by NFD. To rule this out, we set the CS size to 1 by calling 'ndnHelper.setCsSize(1);' before installing the NDN L3 stack on the nodes (see the sketch after this list), but we still see memory growth.

  • I also checked that the default CS size is 100 packets, so even at the default the CS should not grow beyond 100 packets. We therefore do not think the CS is causing this growth.

     42 StackHelper::StackHelper()
     43   : m_needSetDefaultRoutes(false)
     44   , m_maxCsSize(100)
     45   , m_isRibManagerDisabled(false)
     46   , m_isFaceManagerDisabled(false)
     47   , m_isStatusServerDisabled(false)
     48   , m_isStrategyChoiceManagerDisabled(false)
    
  • It seems to be some internal pipeline issue: whether we performed 1000 add-nexthop commands or 1000 registration commands for the same prefix, the memory increase was observed.
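
For reference, here is a minimal sketch of the scenario fragment that pins the CS size before installing the stack, as mentioned in the list above. It assumes the standard ndnSIM 2.x StackHelper API (setCsSize, InstallAll); the two-node topology is a placeholder for the real 25/50-node scenarios:

#include "ns3/core-module.h"
#include "ns3/network-module.h"
#include "ns3/ndnSIM-module.h"

int
main(int argc, char* argv[])
{
  ns3::NodeContainer nodes;
  nodes.Create(2); // placeholder; the real scenarios use 25 or 50 nodes

  ns3::ndn::StackHelper ndnHelper;
  ndnHelper.setCsSize(1); // shrink the CS so cached Data cannot explain the growth
  ndnHelper.InstallAll(); // install the NDN L3 stack (NFD) on every node

  ns3::Simulator::Run();
  ns3::Simulator::Destroy();
  return 0;
}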

Actions #3

Updated by Anil Jangam almost 8 years ago

I investigated the stack traces from the Valgrind report.

=================================
./NFD/rib/rib-manager.cpp

188   m_keyChain.sign(*responseData);
189   m_face.put(*responseData);

./NFD/daemon/mgmt/manager-base.cpp

98   m_keyChain.sign(*responseData);
99   m_face->put(*responseData);

Here, each call allocates a ~9 KB block of memory. I am not sure when this is released, but this is the topmost contributor to the memory build-up.
./ndn-cxx/src/encoding/encoder.cpp

27 Encoder::Encoder(size_t totalReserve/* = 8800*/, size_t reserveFromBack/* = 400*/)
28   : m_buffer(new Buffer(totalReserve))
29 {
30   m_begin = m_end = m_buffer->end() - (reserveFromBack < totalReserve ? reserveFromBack : 0);
31 }

This is another place in the encoder code where a new memory allocation takes place. It operates on the 9 KB m_buffer allocated earlier.

83     Buffer* buf = new Buffer(size);
84     std::copy_backward(m_buffer->begin(), m_buffer->end(), buf->end());
85 

=================================
The other possibility is that the dead-nonce-list is not getting cleared after the loop detection duration, or perhaps the ndn::Block is not released after the ndn::Name wireEncode() is done for the first time. This is the second biggest contributor to the memory build-up.
Ref: https://github.com/named-data/NFD/blob/149e02cf7da50ac526f7d850c2ecd9570e943f9b/daemon/table/dead-nonce-list.cpp#L108

./NFD/daemon/table/dead-nonce-list.cpp

105 DeadNonceList::Entry
106 DeadNonceList::makeEntry(const Name& name, uint32_t nonce)
107 {
108   Block nameWire = name.wireEncode();

./ndn-cxx/src/encoding/block.cpp

344       m_subBlocks.push_back(Block(m_buffer,
345                                   type,
346                                   element_begin, element_end,
347                                   begin, element_end));
348 
Actions #4

Updated by Junxiao Shi almost 8 years ago

  • Subject changed from Memory bloating issue with NFD affecting scale test. to Memory bloating in RIB
  • Category changed from Core to RIB
  • Target version set to v0.5

I briefly looked at both places pointed out in note-3.

I think DeadNonceList::makeEntry has no problem.
nameWire is a local variable within this function, which is released as soon as this function returns.
The DeadNonceList::Entry type is a typedef of integer, which by itself cannot leak.


The NFD-RIB implementation is problematic:

  1. NFD-RIB is an application connected to NFD with a ndn::Face.
  2. ndn::Face allocates an 8800-octet buffer for each incoming packet.
  3. A command Interest for prefix registration is parsed and authenticated. The ndn::Name prefix being registered is passed to and stored in the RIB.
  4. As indicated in #3495-1, when a sub-block is obtained from its parent, both blocks share the same buffer through a shared_ptr, unless explicitly unshared with wireEncode. In this instance, the ndn::Name shares a buffer with the incoming Interest, which means the 8800-octet buffer from step 2 is retained even though only a small portion (the name prefix) is useful. This is why you see a lot of memory in use after registering many prefixes.
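
To make the retention in step 4 concrete, here is a minimal self-contained sketch (plain C++ with stand-in types, not the actual ndn-cxx Block/Buffer classes) of how a small view into a shared buffer pins the whole allocation:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>

using Buffer = std::vector<uint8_t>;

// A "sub-block": a small window [begin, end) into a shared underlying buffer.
struct View
{
  std::shared_ptr<const Buffer> buffer; // shares ownership of the whole buffer
  std::size_t begin;
  std::size_t end;
};

int
main()
{
  View name{};
  {
    auto packet = std::make_shared<Buffer>(8800); // one 8800-octet receive buffer
    name = View{packet, 0, 20};                   // keep only a 20-octet "name prefix"
  } // `packet` goes out of scope here...

  // ...but the full 8800-octet buffer is still alive, pinned by the 20-octet view.
  std::cout << "view length: " << name.end - name.begin
            << ", retained buffer size: " << name.buffer->size() << '\n';
  return 0;
}
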
Actions #5

Updated by Vince Lehman almost 8 years ago

If this is the problem, why is there still a memory increase when the same route is registered repeatedly? If the Name and route are not new, the Name from the request is not inserted into the RIB.

Actions #6

Updated by Alex Afanasyev almost 8 years ago

A few comments. Name::wireEncode would work only in #3495, as there is a call to Name::getFullName that resets the cached version of the wire for the name.

Allocation of an 8800-byte block doesn't mean that the underlying buffer of every packet is 8800 bytes. We don't have a zero-copy implementation (yet), and the buffer is copied into a new buffer that has the size of the actual Interest or Data packet.

However, this doesn't mean there is no problem. I agree that there is unintentional excessive memory use, because getName() in RibManager, FibManager, and other places does not "compact" the memory.

I have two alternative proposals:

  • a ::compact() method for Name to compress the in-memory representation of this structure (as well as all other data structures)
  • a ::copy() method for Name that returns a compacted/copied data structure (can also work for other data structures)
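
Neither method exists in ndn-cxx at this point. As a rough, hypothetical illustration of the second proposal, reusing the Buffer/View stand-ins from the sketch in note-4, a copy() would deep-copy only the referenced octets so the large shared buffer can be released:

// Hypothetical copyCompacted(): copy just the octets [begin, end)
// into a right-sized buffer, dropping the reference to the big one.
View
copyCompacted(const View& v)
{
  auto compacted = std::make_shared<Buffer>(v.buffer->begin() + v.begin,
                                            v.buffer->begin() + v.end);
  return View{compacted, 0, compacted->size()};
}
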
Actions #7

Updated by Anil Jangam almost 8 years ago

Junxiao, I agree that 'nameWire' is not leaking memory. The subsequent part of the stack trace does the allocation when wireEncode() is called for the first time. Alex's reference to #3495 confirms the same (i.e., right now you can simply call Name::wireEncode(), which will make a separate memory allocation, releasing the previously referenced buffer). So the question is: when does NFD release this allocation? How long does it need to hold it?

Also, right now, I have configured only two name prefixes per node. So at any point in the simulation there will be no more than 2 x N prefixes in the network (where N is the number of nodes in the topology). So I did not quite follow what you are trying to say in #4.4. Once all the name prefixes are registered, it should stabilize, right?

Actions #8

Updated by Junxiao Shi almost 8 years ago

  • Related to Bug #3619: NameTree entry not released added
Actions #9

Updated by Anil Jangam almost 8 years ago

Analysis of the topmost memory consumer. I am going through the other stack traces using the same technique I used here.

src/ndnSIM/ndn-cxx/src/security/key-chain.cpp

697 void
698 KeyChain::signPacketWrapper(Data& data, const Signature& signature,
699                             const Name& keyName, DigestAlgorithm digestAlgorithm)
700 {
701   data.setSignature(signature);
702 
703   EncodingBuffer encoder;
704   std::cout << "1 Reference count of buffer: " << encoder.getBuffer().use_count() << std::endl;
705 
706   data.wireEncode(encoder, true);
707   std::cout << "2 Reference count of buffer: " << encoder.getBuffer().use_count() << std::endl;
708 
709   Block sigValue = pureSign(encoder.buf(), encoder.size(), keyName, digestAlgorithm);
710   std::cout << "3 Reference count of buffer: " << encoder.getBuffer().use_count() << std::endl;
711 
712   data.wireEncode(encoder, sigValue);
713   std::cout << "4 Reference count of buffer: " << encoder.getBuffer().use_count() << std::endl;
714 }

I am not sure if this is what Alex meant when he talked about compacting the buffer (in comment #6), but I have made some observations. EncodingBuffer, which derives from Encoder, allocates the shared_ptr<Buffer> m_buffer. In the above code, the instance 'encoder' is a local, so it should get destroyed automatically; however, I see that there is no destructor defined for EncodingBuffer, nor for the Buffer class. IMO, we need to clear the vector and shrink_to_fit it? Correct?

Also, I added some debug output (see the 'cout' statements in the above code) to print the reference count of the shared_ptr m_buffer, and I got this:

1 Reference count of buffer: 2
2 Reference count of buffer: 2
3 Reference count of buffer: 2
4 Reference count of buffer: 29

This indicates that by the time signPacketWrapper returns, the memory held by m_buffer has not been released. IMO this count should ideally be 0. Since the reference count is non-zero, the memory is not being deallocated. And even if this count became zero, I guess the resource would still not be cleaned up, since there is no destructor defined. Correct?

~/sandbox/ndnSIM/ns-3$ grep "4 Reference count of buffer: " 25node_traces/25-node-nlsr-simulation-log.txt  | wc -l
36941

For a 25-node topology test, close to 37K m_buffer objects are created and still held. As per the above debug output, each of these 37K objects has a reference count of 29 or above.
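
As a side note on the destructor question: in standard C++, a shared_ptr frees its object as soon as the last owner goes away, and implicitly-generated destructors are enough to release a vector's storage; a non-zero use_count at function exit only means other owners still hold the buffer elsewhere. A minimal standalone sketch of these semantics (the Holder type is a stand-in, not the real Buffer):

#include <cstdint>
#include <iostream>
#include <memory>
#include <vector>

struct Holder // no user-defined destructor, like Buffer/EncodingBuffer
{
  std::vector<uint8_t> storage = std::vector<uint8_t>(8800);
};

int
main()
{
  std::weak_ptr<Holder> watch;
  {
    auto a = std::make_shared<Holder>();
    auto b = a; // second owner
    watch = a;
    std::cout << "use_count: " << a.use_count() << '\n'; // prints 2
  } // both owners gone: Holder and its 8800-byte vector are freed here
  std::cout << "expired: " << std::boolalpha << watch.expired() << '\n'; // prints true
  return 0;
}

So the question reduces to finding who the remaining 29 owners of each buffer are and when they release their copies.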

Actions #10

Updated by Anil Jangam almost 8 years ago

Some observations from block.cpp: I added some debug couts (block_cpp.diff) to check how the size of m_subBlocks (typedef std::vector<Block> element_container;) changes. I observe that not all blocks are freed: some still have entries, indicating they are still holding memory. I am not sure how this vector needs to be cleaned, nor how it behaves at run time. Since this happens for every incoming Interest, I thought I should highlight it.

As per the debug logs added (block_cpp.diff), the observation is that no ERASE, RESET, or REMOVE operation is ever called on these objects. Can you please check when these objects should get cleaned up?

0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 6
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 7
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 8
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 1
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 2
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 3
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 4
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 5
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 6
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 7
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 8
0x7ffd1e8e7a60 PUSH_BACK2 Size of Sub block container: 9
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 1
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 2
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 3
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 4
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 5
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 6
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 7
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 8
0x7ffd1e8e7a60 PUSH_BACK1 Size of Sub block container: 9
Actions #11

Updated by Anil Jangam almost 8 years ago

==== Adding on behalf of Alex. ====

Anil, I've pushed to the code review system a few patches that should reduce some memory use

library http://gerrit.named-data.net/#/c/2863/
NFD http://gerrit.named-data.net/#/c/2864/

Can you try to rerun some of your analysis with those patches? Just to see whether we get a smaller memory footprint, or it stays the same for some other reason.

--
Alex

Actions #12

Updated by Anil Jangam almost 8 years ago

Updating the results after the fix: memory growth is still observed, and the scale of growth is the same as before. I performed two test runs, 25 nodes and 50 nodes, using the valgrind massif tool only.

Actions #14

Updated by Anil Jangam almost 8 years ago

Can someone please share the steps to measure memory consumption by the RIB?
I will first revert the patch to get a reference reading, and then test again with the patch applied.

--

Actions #15

Updated by Junxiao Shi almost 8 years ago

The patches quoted in note-11 are only about the RIB. They do not solve #3619, and the observation in note-12 mixes the two issues.

Reply to note-14:

I do not know the steps. In general:

The test should observe memory allocations, and record an allocation only if a function in the nfd::rib namespace is on the stack.
This would show whether the note-11 patch fixes #3618, without being affected by #3619.
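
One crude, self-contained way to watch live heap usage between phases of such a test is to replace the global allocation functions and sample a live-byte counter between batches of registrations. This is only a sketch: it does not do the nfd::rib stack filtering described above (that still needs Massif or a stack-walking hook), and the reportLiveBytes helper is hypothetical:

#include <atomic>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>

static std::atomic<std::size_t> g_liveBytes{0};
static constexpr std::size_t kHeader = alignof(std::max_align_t); // room for a size header

void*
operator new(std::size_t size)
{
  // over-allocate so we can remember the block size in a header
  void* base = std::malloc(size + kHeader);
  if (base == nullptr)
    throw std::bad_alloc();
  *static_cast<std::size_t*>(base) = size;
  g_liveBytes += size;
  return static_cast<char*>(base) + kHeader;
}

void
operator delete(void* p) noexcept
{
  if (p == nullptr)
    return;
  char* base = static_cast<char*>(p) - kHeader;
  g_liveBytes -= *reinterpret_cast<std::size_t*>(base);
  std::free(base);
}

// hypothetical helper: call between phases, e.g. after every 1000 registrations
void
reportLiveBytes()
{
  std::printf("live heap bytes: %zu\n", g_liveBytes.load());
}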

Actions #16

Updated by Anil Jangam almost 8 years ago

Below are the valgrind readings AFTER Alex's fix (see update #17 below for the readings BEFORE the fix).

Observations:

  1. The size of "possibly lost" memory has INCREASED in this run compared to the run WITHOUT the fix.
  2. The size of "still reachable" memory has DECREASED compared to the run WITHOUT the fix.

This indicates that while the amount of memory held by the process has gone down, the potential leaks have increased.

  3. The amount of memory used by "nfd::rib" is fairly consistent with or without Alex's fix. I am pasting one of the stack traces involving the RibManager.
==2770== 57,965,600 bytes in 6,587 blocks are possibly lost in loss record 1,218 of 1,218
==2770==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2770==    by 0x69A1353: allocate (new_allocator.h:104)
==2770==    by 0x69A1353: _M_allocate (stl_vector.h:168)
==2770==    by 0x69A1353: _M_create_storage (stl_vector.h:181)
==2770==    by 0x69A1353: _Vector_base (stl_vector.h:136)
==2770==    by 0x69A1353: vector (stl_vector.h:283)
==2770==    by 0x69A1353: ndn::Buffer::Buffer(unsigned long) (buffer.cpp:43)
==2770==    by 0x69A1F0A: ndn::encoding::Encoder::Encoder(unsigned long, unsigned long) (encoder.cpp:28)
==2770==    by 0x6A41E57: EncodingImpl (encoding-buffer.hpp:42)
==2770==    by 0x6A41E57: ndn::security::KeyChain::signPacketWrapper(ndn::Data&, ndn::Signature const&, ndn::Name const&, ndn::DigestAlgorithm) (key-chain.cpp:703)
==2770==    by 0x6A49755: void ndn::security::KeyChain::signImpl<ndn::Data>(ndn::Data&, ndn::security::SigningInfo const&) (key-chain.hpp:892)
==2770==    by 0x6B6384B: nfd::rib::RibManager::sendResponse(ndn::Name const&, ndn::mgmt::ControlResponse const&) (rib-manager.cpp:188)
==2770==    by 0x6B6589B: nfd::rib::RibManager::sendSuccessResponse(std::shared_ptr<ndn::Interest const> const&, ndn::nfd::ControlParameters const&) (rib-manager.cpp:505)
==2770==    by 0x6B664B0: nfd::rib::RibManager::registerEntry(std::shared_ptr<ndn::Interest const> const&, ndn::nfd::ControlParameters&) (rib-manager.cpp:287)
==2770==    by 0x6B63B26: operator() (functional:2471)
==2770==    by 0x6B63B26: nfd::rib::RibManager::onCommandValidated(std::shared_ptr<ndn::Interest const> const&) (rib-manager.cpp:254)
==2770==    by 0x6A98DDE: ndn::ValidatorConfig::checkPolicy(ndn::Interest const&, int, std::function<void (std::shared_ptr<ndn::Interest const> const&)> const&, std::function<void (std::shared_ptr<ndn::Interest const> const&, std::string const&)> const&, std::vector<std::shared_ptr<ndn::ValidationRequest>, std::allocator<std::shared_ptr<ndn::ValidationRequest> > >&) (validator-config.cpp:530)
==2770==    by 0x6AAE02A: ndn::Validator::validate(ndn::Interest const&, std::function<void (std::shared_ptr<ndn::Interest const> const&)> const&, std::function<void (std::shared_ptr<ndn::Interest const> const&, std::string const&)> const&, int) (validator.cpp:54)
==2770==    by 0x6B63272: validate (validator.hpp:101)
==2770==    by 0x6B63272: nfd::rib::RibManager::onLocalhostRequest(ndn::Interest const&) (rib-manager.cpp:216)
==2770==
==2770== LEAK SUMMARY:
==2770==    definitely lost: 0 bytes in 0 blocks
==2770==    indirectly lost: 0 bytes in 0 blocks
==2770==      possibly lost: 452,854,398 bytes in 1,688,776 blocks
==2770==    still reachable: 59,138 bytes in 152 blocks
==2770==         suppressed: 0 bytes in 0 blocks
==2770==
==2770== For counts of detected and suppressed errors, rerun with: -v
Actions #17

Updated by Anil Jangam almost 8 years ago

Below are the valgrind readings BEFORE Alex's fix.

==3531== 64,512,800 bytes in 7,331 blocks are still reachable in loss record 1,187 of 1,187
==3531==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3531==    by 0x69A1323: allocate (new_allocator.h:104)
==3531==    by 0x69A1323: _M_allocate (stl_vector.h:168)
==3531==    by 0x69A1323: _M_create_storage (stl_vector.h:181)
==3531==    by 0x69A1323: _Vector_base (stl_vector.h:136)
==3531==    by 0x69A1323: vector (stl_vector.h:283)
==3531==    by 0x69A1323: ndn::Buffer::Buffer(unsigned long) (buffer.cpp:43)
==3531==    by 0x69A1EDA: ndn::encoding::Encoder::Encoder(unsigned long, unsigned long) (encoder.cpp:28)
==3531==    by 0x6A41E27: EncodingImpl (encoding-buffer.hpp:42)
==3531==    by 0x6A41E27: ndn::security::KeyChain::signPacketWrapper(ndn::Data&, ndn::Signature const&, ndn::Name const&, ndn::DigestAlgorithm) (key-chain.cpp:703)
==3531==    by 0x6A49725: void ndn::security::KeyChain::signImpl<ndn::Data>(ndn::Data&, ndn::security::SigningInfo const&) (key-chain.hpp:892)
==3531==    by 0x6B6365B: nfd::rib::RibManager::sendResponse(ndn::Name const&, ndn::mgmt::ControlResponse const&) (rib-manager.cpp:188)
==3531==    by 0x6B656AB: nfd::rib::RibManager::sendSuccessResponse(std::shared_ptr<ndn::Interest const> const&, ndn::nfd::ControlParameters const&) (rib-manager.cpp:505)
==3531==    by 0x6B662C0: nfd::rib::RibManager::registerEntry(std::shared_ptr<ndn::Interest const> const&, ndn::nfd::ControlParameters&) (rib-manager.cpp:287)
==3531==    by 0x6B63936: operator() (functional:2471)
==3531==    by 0x6B63936: nfd::rib::RibManager::onCommandValidated(std::shared_ptr<ndn::Interest const> const&) (rib-manager.cpp:254)
==3531==    by 0x6A98DAE: ndn::ValidatorConfig::checkPolicy(ndn::Interest const&, int, std::function<void (std::shared_ptr<ndn::Interest const> const&)> const&, std::function<void (std::shared_ptr<ndn::Interest const> const&, std::string const&)> const&, std::vector<std::shared_ptr<ndn::ValidationRequest>, std::allocator<std::shared_ptr<ndn::ValidationRequest> > >&) (validator-config.cpp:530)
==3531==    by 0x6AADFFA: ndn::Validator::validate(ndn::Interest const&, std::function<void (std::shared_ptr<ndn::Interest const> const&)> const&, std::function<void (std::shared_ptr<ndn::Interest const> const&, std::string const&)> const&, int) (validator.cpp:54)
==3531==    by 0x6B63082: validate (validator.hpp:101)
==3531==    by 0x6B63082: nfd::rib::RibManager::onLocalhostRequest(ndn::Interest const&) (rib-manager.cpp:216)
==3531==
==3531== LEAK SUMMARY:
==3531==    definitely lost: 0 bytes in 0 blocks
==3531==    indirectly lost: 0 bytes in 0 blocks
==3531==      possibly lost: 2,396,214 bytes in 72,078 blocks
==3531==    still reachable: 479,916,787 bytes in 1,680,083 blocks
==3531==         suppressed: 0 bytes in 0 blocks
==3531==
==3531== For counts of detected and suppressed errors, rerun with: -v
==3531== ERROR SUMMARY: 37 errors from 37 contexts (suppressed: 0 from 0)
Actions #18

Updated by Junxiao Shi over 7 years ago

  • Assignee set to Weiwei Liu

Weiwei agrees to investigate this.

There might also be memory leaks elsewhere.
They should be investigated, and reported as separate bugs to fix.

Actions #19

Updated by Junxiao Shi over 7 years ago

  • Status changed from New to In Progress

On 20160902, Weiwei and I ran a test with ndn-cxx:commit:cfaa1addf4592d44e119604abcec06c73ef113f2 and NFD:commit:9627e88d4b9a3e30090ad4d5240d58525ba0126f on OSX 10.10.5, LLVM 7.0.2 clang-700.1.81. ndn-cxx is configured with --without-osx-keychain to avoid the #1663-4 memory leak.
We conclude that there's no measurable memory growth after both ContentStore and InMemoryStorage are full.

  1. We set ContentStore capacity to 1 packet.
  2. NFD Management and NFD-RIB each has an InMemoryStorage. The capacity is hard-coded as 256 packets.
  3. We ran 1000 prefix registrations with a command similar to nfdc register /example udp4://192.0.2.1, and took note of the NFD process memory usage (as seen with the top command).
  4. Then we ran another 10000 prefix registrations, and looked at the NFD process memory usage again.

The memory usage observed in steps 3 and 4 is the same, which indicates there is no memory leak in NFD-RIB.

@Anil, if you have further evidence of a memory leak on the latest ndn-cxx and NFD versions (not the ndnSIM or Android fork) after both the ContentStore and InMemoryStorage are full, we can investigate.
Otherwise, this issue will be rejected after 7 days.

Actions #20

Updated by Junxiao Shi over 7 years ago

  • Status changed from In Progress to Rejected

More than 7 days have passed since note-19 was posted, and Anil did not enter new evidence.

Actions #21

Updated by Anil Jangam over 7 years ago

@Junxiao, how do I bring the fixes from ndn-cxx:cfaa1addf4592d44e119604abcec06c73ef113f2 and NFD:commit:9627e88d4b9a3e30090ad4d5240d58525ba0126f into ndnSIM? I think without these two fixes, the leak/bloating in ndnSIM will not go away. If I just patch these two diffs onto the ndnSIM branch, will it help? I saw your Redmine updates late, so I could not verify these so far. I will do it now.

/anil.

Actions #23

Updated by Anil Jangam over 7 years ago

@Alex @Junxiao, which NFD branch was the commit '9627e88d4b9a3e3009' made into? I searched mainline NFD but did not find it. Am I missing something? (I could locate the ndn-cxx patch/diff.)
@Alex, I need to try these first on my NFD clone under ndnSIM.

ubuntu:~/workspace/NFD$ git log > ~/nfd_log.txt
ubuntu:~/workspace/NFD$ cat ~/nfd_log.txt 

snip...
/snip

commit 336e4f71cf4e6e6d02329a69088eef4f21088e35
Author: Junxiao Shi <git@mail1.yoursunny.com>
Date:   Wed Jan 22 19:38:31 2014 -0700

doc: add pkg-config as a prerequisite

Change-Id: I820491e6441e0d459834a3f47a3ca5b5516cbc04

commit 2aa396272b9a6892b8c29f76efb66e81b443949a
Author: Alexander Afanasyev <alexander.afanasyev@ucla.edu>
Date:   Wed Jan 22 11:51:11 2014 -0800

Adding build system, README, COPYING, basic INSTALL.md, and skeleton for unit tests

Change-Id: I00a58106e43f6eaaec6eedf9fa7d217a22c19d2b

commit a14170028c5f6611add84362f18b476031590fc0
Author: Alexander Afanasyev <alexander.afanasyev@ucla.edu>
Date:   Tue Jan 21 20:29:55 2014 -0800

Initial commit
ubuntu:~/workspace/NFD$ 
ubuntu:~/workspace/NFD$ cat ~/nfd_log.txt | grep 9627e88d4b9a
ubuntu:~/workspace/NFD$ 
Actions #24

Updated by Junxiao Shi over 7 years ago

which NFD branch was the commit '9627e88d4b9a3e3009' made into?

This commit can be found at https://gerrit.named-data.net/#/c/3152/2 and it's not merged into any branch.

It's worth noting that this commit does not contain a fix for the RIB memory growth issue, so applying it onto an older version of NFD will not fix the issue.
Any one or more of the commits between the older version of NFD used by ndnSIM and the current NFD may contain the fix, and all of them would need to be applied. This would take significant effort, and it's still ongoing in #3560.

Actions #25

Updated by Junxiao Shi over 7 years ago

Benchmark of https://gerrit.named-data.net/2864 patchset3:

vagrant@m0212:~/NFD$ /usr/bin/time -v build/pit-fib-benchmark 
Running 1 test case...

*** No errors detected
    Command being timed: "build/pit-fib-benchmark"
    User time (seconds): 26.22
    System time (seconds): 2.31
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.67
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3418672
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 84
    Minor (reclaiming a frame) page faults: 845481
    Voluntary context switches: 130
    Involuntary context switches: 216
    Swaps: 0
    File system inputs: 30096
    File system outputs: 8
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

Comparing to master branch nfd:commit:2a0019431687ce44b060cc441f01bf7e24945c34 :

vagrant@m0212:~/NFD$ /usr/bin/time -v build/pit-fib-benchmark 
Running 1 test case...

*** No errors detected
    Command being timed: "build/pit-fib-benchmark"
    User time (seconds): 21.95
    System time (seconds): 2.42
    Percent of CPU this job got: 99%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.60
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 3366316
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 28
    Minor (reclaiming a frame) page faults: 836082
    Voluntary context switches: 52
    Involuntary context switches: 428
    Swaps: 0
    File system inputs: 4968
    File system outputs: 8
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0
Actions #26

Updated by Alex Afanasyev over 7 years ago

This test is almost irrelevant to the patch. The primary concern in the patch is the FIB/RIB managers (minus the change in the name tree, which may be useless/wrong). The test doesn't exercise any of the registration commands.
