Task #2538


Figure out whether there are memory leaks in ndnSIM.

Added by Spyros Mastorakis over 9 years ago. Updated over 6 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
tests
Target version:
-
Start date:
02/18/2015
Due date:
% Done:

0%

Estimated time:

Description

I am filing this as a task because I do not really know whether there are indeed such issues and, if so,
whether they are related to ndnSIM or to NFD.

Potential memory leaks were reported by Christian Kreuzberger on the ndnSIM mailing list. Part of the original email:

using the latest version of ndnSIM and ndn-cxx, I’ve noticed a rather high memory usage (several gigabytes, with
content store set to 1 packet) in simulations involving plenty of interests and data packets. So I’ve compiled ndnSIM
in debug mode and started the ndn-simple example (from scratch directory) with valgrind active, here is the
summary of the output.

./waf --command-template="valgrind --leak-check=full --show-reachable=yes %s" --run ndn-simple


==9678== 1,738,712 (408 direct, 1,738,304 indirect) bytes in 3 blocks are definitely lost in loss record 448 of 448
==9678==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9678==    by 0x5AD027F: ns3::Ptr<ns3::Node> ns3::CreateObject<ns3::Node>() (object.h:423)
==9678==    by 0xD9C8CFA: ns3::NodeContainer::Create(unsigned int) (node-container.cc:137)
==9678==    by 0x41C856: ns3::main(int, char**) (ndn-simple.cc:63)
==9678==    by 0x41D2D7: main (ndn-simple.cc:107)
==9678==
==9678== LEAK SUMMARY:
==9678==    definitely lost: 408 bytes in 3 blocks
==9678==    indirectly lost: 1,738,304 bytes in 8,257 blocks
==9678==      possibly lost: 98 bytes in 3 blocks
==9678==    still reachable: 656 bytes in 33 blocks
==9678==         suppressed: 0 bytes in 0 blocks


At first, it looks like memory is lost only when creating a node, which is not good, but would be “okay” for
simulations. But when you dig deeper into the generated valgrind output, you will find that the problem is caused
by several allocations related to packets, regular expressions, skiplists in the content store, etc. I’m not an expert
in this field, but I thought it was worth mentioning on the mailing list.
Examples of the most important ones:

==9806== 238,400 bytes in 200 blocks are indirectly lost in loss record 447 of 448
==9806==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9806==    by 0x8DE20A2: __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<ndn::Data, std::allocator<ndn::Data>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) (new_allocator.h:104)
==9806==    by 0x8DE1D3D: std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<ndn::Data, std::allocator<ndn::Data>, (__gnu_cxx::_Lock_policy)2> > >::allocate(std::allocator<std::_Sp_counted_ptr_inplace<ndn::Data, std::allocator<ndn::Data>, (__gnu_cxx::_Lock_policy)2> >&, unsigned long) (alloc_traits.h:351)
==9806==    by 0x8DE18E7: std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<ndn::Data, std::allocator<ndn::Data>>(std::_Sp_make_shared_tag, ndn::Data*, std::allocator<ndn::Data> const&) (shared_ptr_base.h:499)
==9806==    by 0x8DE138D: std::__shared_ptr<ndn::Data, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<ndn::Data>>(std::_Sp_make_shared_tag, std::allocator<ndn::Data> const&) (shared_ptr_base.h:957)
==9806==    by 0x8DE0B6B: std::shared_ptr<ndn::Data>::shared_ptr<std::allocator<ndn::Data>>(std::_Sp_make_shared_tag, std::allocator<ndn::Data> const&) (shared_ptr.h:316)
==9806==    by 0x8DDFC5B: std::shared_ptr<ndn::Data> std::allocate_shared<ndn::Data, std::allocator<ndn::Data>>(std::allocator<ndn::Data> const&) (shared_ptr.h:598)
==9806==    by 0x8DDF103: std::shared_ptr<ndn::Data> std::make_shared<ndn::Data>() (shared_ptr.h:614)
==9806==    by 0x8FD6B9C: ns3::ndn::PacketHeader<ndn::Data>::Deserialize(ns3::Buffer::Iterator) (ndn-header.cpp:120)
==9806==    by 0xD912BAA: ns3::Packet::RemoveHeader(ns3::Header&) (packet.cc:290)
==9806==    by 0x8FF2F53: std::shared_ptr<ndn::Data const> ns3::ndn::Convert::FromPacket<ndn::Data>(ns3::Ptr<ns3::Packet>) (ndn-ns3.cpp:37)
==9806==    by 0x8FF18DD: ns3::ndn::NetDeviceFace::receiveFromNetDevice(ns3::Ptr<ns3::NetDevice>, ns3::Ptr<ns3::Packet const>, unsigned short, ns3::Address const&, ns3::Address const&, ns3::NetDevice::PacketType) (ndn-net-device-face.cpp:130)
==9806==

==9806== 216,879 bytes in 201 blocks are indirectly lost in loss record 446 of 448
==9806==    at 0x4C2B800: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9806==    by 0xD8E7074: ns3::Buffer::Allocate(unsigned int) (buffer.cc:172)
==9806==    by 0xD8E6D93: ns3::Buffer::Create(unsigned int) (buffer.cc:141)
==9806==    by 0xD8E8666: ns3::Buffer::AddAtStart(unsigned int) (buffer.cc:331)
==9806==    by 0xD912A53: ns3::Packet::AddHeader(ns3::Header const&) (packet.cc:278)
==9806==    by 0xAD8A64A: ns3::PointToPointNetDevice::AddHeader(ns3::Ptr<ns3::Packet>, unsigned short) (point-to-point-net-device.cc:167)
==9806==    by 0xAD8E466: ns3::PointToPointNetDevice::Send(ns3::Ptr<ns3::Packet>, ns3::Address const&, unsigned short) (point-to-point-net-device.cc:502)
==9806==    by 0x8FF10E5: ns3::ndn::NetDeviceFace::send(ns3::Ptr<ns3::Packet>) (ndn-net-device-face.cpp:89)
==9806==    by 0x8FF150C: ns3::ndn::NetDeviceFace::sendData(ndn::Data const&) (ndn-net-device-face.cpp:111)
==9806==    by 0x8DF6F06: nfd::Forwarder::onOutgoingData(ndn::Data const&, nfd::Face&) (forwarder.cpp:377)
==9806==    by 0x8DF6455: nfd::Forwarder::onIncomingData(nfd::Face&, ndn::Data const&) (forwarder.cpp:333)
==9806==    by 0x8DEDA96: nfd::Forwarder::onData(nfd::Face&, ndn::Data const&) (forwarder.hpp:263)
==9806==

The valgrind output is rather long; I suggest you try generating it yourself. I’m using Ubuntu 14.04, if that matters.
I’m not after chasing every single memory leak, but if it involves packets or packet-based structures, there might be
an issue with ndnSIM or even NFD that could affect the testbed as well.

In addition, I’ve tested ndn-simple with Frequency set to other values instead of 10, here is the resulting output:
http://pastebin.com/0PZw40bk


Files

ndn-memory-test.cc (2.95 KB) ndn-memory-test.cc memory test via content store Christian Kreuzberger, 02/19/2015 01:42 AM
patch_ndn_consumer.txt (940 Bytes) patch_ndn_consumer.txt patch for ndn-consumer Christian Kreuzberger, 08/06/2015 02:29 AM
Actions #1

Updated by Spyros Mastorakis over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Michael Sweatt
Actions #2

Updated by Alex Afanasyev over 9 years ago

  • Description updated (diff)
Actions #3

Updated by Christian Kreuzberger over 9 years ago

After some debugging, I think ONE of the issues that appears to be a memory leak, but "should not be" a true memory leak, is caused by the CS and PIT.
Whenever an interest is transmitted, it is added to the PIT (via smart pointers, dereferencing, etc.).
This causes multiple copies of the interest pointer to be stored, and I'm not sure whether the reference count ever reaches 0 so that the interests are then destroyed.
The same is true for data packets: they are added to the content store, and I'm not sure whether they are properly removed from it.

In addition, when Simulator::Destroy() is called at the end of each simulation, all content store entries "should" be destroyed when the nodes are being destroyed.

To verify the content store hypothesis, I've modified the simple scenario (see the attached .cc file) and evaluated it with content store sizes 1, 2, 3, 5, and 10. Here are the results:
content store size 1:

==10795== LEAK SUMMARY:
==10795== definitely lost: 408 bytes in 3 blocks
==10795== indirectly lost: 262,082 bytes in 2,190 blocks
==10795== possibly lost: 98 bytes in 3 blocks
==10795== still reachable: 656 bytes in 33 blocks
==10795== suppressed: 0 bytes in 0 blocks

content store size 2:

==10825== LEAK SUMMARY:
==10825== definitely lost: 408 bytes in 3 blocks
==10825== indirectly lost: 278,282 bytes in 2,263 blocks
==10825== possibly lost: 98 bytes in 3 blocks
==10825== still reachable: 656 bytes in 33 blocks
==10825== suppressed: 0 bytes in 0 blocks

content store size 3:

==10835== LEAK SUMMARY:
==10835== definitely lost: 408 bytes in 3 blocks
==10835== indirectly lost: 293,002 bytes in 2,321 blocks
==10835== possibly lost: 98 bytes in 3 blocks
==10835== still reachable: 656 bytes in 33 blocks
==10835== suppressed: 0 bytes in 0 blocks
==10835==

content store size 5:

==10815== LEAK SUMMARY:
==10815== definitely lost: 408 bytes in 3 blocks
==10815== indirectly lost: 322,442 bytes in 2,437 blocks
==10815== possibly lost: 98 bytes in 3 blocks
==10815== still reachable: 656 bytes in 33 blocks
==10815== suppressed: 0 bytes in 0 blocks

content store size 10:

==10805== LEAK SUMMARY:
==10805== definitely lost: 408 bytes in 3 blocks
==10805== indirectly lost: 396,525 bytes in 2,743 blocks
==10805== possibly lost: 98 bytes in 3 blocks
==10805== still reachable: 656 bytes in 33 blocks
==10805== suppressed: 0 bytes in 0 blocks

Actions #4

Updated by Michael Sweatt over 9 years ago

Poking around with memory tools, I do not see a memory leak either, and can confirm there are no memory corruptions.
I intend to run clang's AddressSanitizer ('-fsanitize=address') on a Linux machine, but for now cannot do so on the Mac I originally tested on.

I will update with the results.

Actions #5

Updated by Spyros Mastorakis over 9 years ago

Mickey, any final news on that?

Actions #6

Updated by Spyros Mastorakis over 9 years ago

I think that the root cause of this issue was a bug in ndn-cxx that was fixed and merged a long time ago. So I feel that we can close this issue.

Actions #7

Updated by Christian Kreuzberger over 9 years ago

Hi!

I've checked with the current version of ndn-cxx, and valgrind still reports plenty of indirectly lost bytes; however, this number depends heavily on the content store size rather than on the number of transmitted packets. This is better, but there certainly are still memory leaks and problems, though I'm not sure whether they are in ndn-cxx or in ndnSIM.

For instance, I found (and fixed, patch attached) the following "rarely occurring" memory problem in ndn-consumer:
In ndn-consumer there are several lists, among them the retransmission counts. This list is accessed in WillSendOutInterest: https://github.com/named-data/ndnSIM/blob/master/apps/ndn-consumer.cpp#L280
I believe that while the entry for sequenceNumber is deleted in OnData, it is not deleted when one or more timeouts occur and the data packet NEVER arrives (this should not happen, agreed, but it can). So this list should be cleared in StopApplication (not before, though, because the packet can obviously still be retransmitted).

I strongly believe that every object that is created should be properly destroyed/removed in the code. Therefore some attention needs to be paid to all of those lists / vectors / maps / ... in all of the consumers, and they should be destroyed/deleted/cleared in StopApplication.
Specifically, the lists are:

RetxSeqsContainer m_retxSeqs;
SeqTimeoutsContainer m_seqTimeouts;
SeqTimeoutsContainer m_seqLastDelay;
SeqTimeoutsContainer m_seqFullDelay;
std::map<uint32_t, uint32_t> m_seqRetxCounts;

In addition, I believe that m_rtt needs to be destroyed / set to NULL in ~Consumer(), or alternatively (and this is better, I think),
it should be initialized in StartApplication and removed in StopApplication.

Actions #8

Updated by Christian Kreuzberger over 9 years ago

Edit: after some research/googling/analyzing, the remaining reported problems are (for now) not real problems.
The remaining data that is indirectly lost belongs to the NodeContainer structure of ns-3, and according to what I found, these are not real errors/problems.
Hence the content store issue is also related to this.

Actions #9

Updated by Alex Afanasyev about 9 years ago

Christian, if you want, you can try to clean up global state at the end of simulation:

Simulator::Destroy();
Names::Clear();
GlobalRouter::clear();

Though I agree that these are not leaks, but rather just uncleaned global state.

Actions #10

Updated by Alex Afanasyev over 8 years ago

  • Status changed from In Progress to New
  • Assignee deleted (Michael Sweatt)
Actions #11

Updated by Alex Afanasyev over 6 years ago

  • Target version deleted (2.1)