Bug #3326
ndn::tlv::Error (TLV length exceeds buffer length) when calling /localhost/nfd/faces/list without caching
Description
In my scenario (see attached file) I get ndn::tlv::Error (TLV length exceeds buffer length). The gdb stack trace looks as follows:
#5 0x00007fffe8e06922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff23a7814 in boost::throw_exception<boost::exception_detail::error_info_injector<ndn::tlv::Error>
> (e=...) at /usr/include/boost/throw_exception.hpp:67
#7 0x00007ffff23a6e21 in boost::exception_detail::throw_exception_<ndn::tlv::Error> (x=..., current_function=0x7ffff26e33f0 <ndn::Block::parse() const::__PRETTY_FUNCTION__>
"void ndn::Block::parse() const", file=0x7ffff26e2990 "../src/ndnSIM/ndn-cxx/src/encoding/block.cpp", line=340) at /usr/include/boost/throw_exception.hpp:84
#8 0x00007ffff23a9db1 in ndn::Block::parse (this=0x7fffffffc1e0) at ../src/ndnSIM/ndn-cxx/src/encoding/block.cpp:340
#9 0x00007ffff23f995c in ndn::nfd::FaceStatus::wireDecode (this=0x7fffffffc170, block=...) at ../src/ndnSIM/ndn-cxx/src/management/nfd-face-status.cpp:121
#10 0x00007ffff23f9738 in ndn::nfd::FaceStatus::FaceStatus (this=0x7fffffffc170, block=...) at ../src/ndnSIM/ndn-cxx/src/management/nfd-face-status.cpp:49
#11 0x00007ffff26059a8 in nfd::rib::RibManager::removeInvalidFaces (this=0x5ad20018, buffer=...) at ../src/ndnSIM/NFD/rib/rib-manager.cpp:704
#12 0x00007ffff260557b in nfd::rib::RibManager::fetchSegments (this=0x5ad20018, data=..., buffer=...) at ../src/ndnSIM/NFD/rib/rib-manager.cpp:679
#13 0x00007ffff2617f41 in boost::_mfi::mf2<void, nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >::operator()
(this=0xa021be20, p=0x5ad20018, a1=..., a2=...) at /usr/include/boost/bind/mem_fn_template.hpp:280
#14 0x00007ffff2616040 in boost::_bi::list3<boost::_bi::value<nfd::rib::RibManager*>, boost::arg<2>, boost::_bi::value<std::shared_ptr<ndn::OBufferStream> >
>::operator()<boost::_mfi::mf2<void, nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >, boost::_bi::list2<ndn::Interest const&, ndn::Data&> >
(this=0xa021be30, f=..., a=...) at /usr/include/boost/bind/bind.hpp:392
#15 0x00007ffff2612c44 in boost::_bi::bind_t<void, boost::_mfi::mf2<void, nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >,
boost::_bi::list3<boost::_bi::value<nfd::rib::RibManager*>, boost::arg<2>, boost::_bi::value<std::shared_ptr<ndn::OBufferStream> > >
>::operator()<ndn::Interest, ndn::Data> (this=0xa021be20, a1=..., a2=...) at /usr/include/boost/bind/bind_template.hpp:76
#16 0x00007ffff260eeb2 in std::_Function_handler<void (ndn::Interest const&, ndn::Data&), boost::_bi::bind_t<void, boost::_mfi::mf2<void,
nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >,
boost::_bi::list3<boost::_bi::value<nfd::rib::RibManager*>, boost::arg<2>, boost::_bi::value<std::shared_ptr<ndn::OBufferStream> > >
> >::_M_invoke(std::_Any_data const&, ndn::Interest const&, ndn::Data&) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/4.8/functional:2071
#17 0x00007ffff23c0d63 in std::function<void (ndn::Interest const&, ndn::Data&)>::operator()(ndn::Interest const&, ndn::Data&)
const (this=0x98f2a648, __args#0=..., __args#1=...) at /usr/include/c++/4.8/functional:2471
#18 0x00007ffff23be299 in ndn::PendingInterest::invokeDataCallback (this=0x98f2a638, data=...) at ../src/ndnSIM/ndn-cxx/src/detail/pending-interest.hpp:81
#19 0x00007ffff23bf41d in ndn::Face::Impl::satisfyPendingInterests (this=0x5ad1ce20, data=...) at ../src/ndnSIM/ndn-cxx/src/detail/face-impl.hpp:140
#20 0x00007ffff23be88e in ndn::Face::Impl::NfdFace::sendData(ndn::Data const&)::{lambda()#1}::operator()() const (__closure=0x81b20ed0) at ../src/ndnSIM/ndn-cxx/src/detail/face-impl.hpp:91
#21 0x00007ffff23c3ddc in std::_Function_handler<void (), ndn::Face::Impl::NfdFace::sendData(ndn::Data const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...)
at /usr/include/c++/4.8/functional:2071
#22 0x00007ffff23c0e4a in std::function<void ()>::operator()() const (this=0x9ada39c0) at /usr/include/c++/4.8/functional:2471
#23 0x00007ffff2514135 in ns3::EventImpl* ns3::MakeEvent<void (std::function<void ()>::*)() const, std::function<void ()> >(void
(std::function<void ()>::*)() const, std::function<void ()>)::EventMemberImpl0::Notify() (this=0x9ada39b0) at ./ns3/make-event.h:323
With the help of coffee, gdb and fprintf I found the cause:
Every 300 seconds, the RibManager queries /localhost/nfd/faces/list. FaceManager::listFaces() reacts on this call and calls m_faceStatusPublisher.publish(); m_faceStatusPublisher is a "class FaceStatusPublisher : public SegmentPublisher<AppFace>" and overrides the generate() method.
Now this generate method does the following:
For each face, collect statistics and wire-encode them into outBuffer (see face-status-publisher.cpp#L47-L55).
So far, so good. The SegmentPublisher uses outBuffer (see segment-publisher.hpp#L76), segmentizes it (at a static size of 4400 bytes - not optimal, but that is a different story), and then calls publishSegment(data) for EACH segment (see segment-publisher.hpp#L104). publishSegment then signs the packet and forwards it to the face.
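To make this flow concrete, here is a minimal, self-contained sketch of the idea (this is NOT the actual FaceStatusPublisher/SegmentPublisher code; encodeOneFaceStatus, signAndSend and the 200 faces are made up for illustration):

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical stand-in: wire-encode the status of one face (roughly 70 bytes each).
static std::vector<uint8_t>
encodeOneFaceStatus(int faceId)
{
  return std::vector<uint8_t>(70, static_cast<uint8_t>(faceId));
}

// Hypothetical stand-in for publishSegment(): sign the segment and hand it to the face.
static void
signAndSend(size_t segmentNo, const uint8_t* data, size_t size)
{
  (void)data;
  std::printf("publishing segment %zu (%zu bytes)\n", segmentNo, size);
}

int
main()
{
  const size_t maxSegmentSize = 4400; // the static segment size used by SegmentPublisher

  // generate(): append each face's encoded status to one shared output buffer
  std::vector<uint8_t> outBuffer;
  for (int faceId = 1; faceId <= 200; ++faceId) {
    std::vector<uint8_t> status = encodeOneFaceStatus(faceId);
    outBuffer.insert(outBuffer.end(), status.begin(), status.end());
  }

  // segmentize the buffer and publish every segment
  size_t segmentNo = 0;
  for (size_t offset = 0; offset < outBuffer.size(); offset += maxSegmentSize) {
    size_t size = std::min(maxSegmentSize, outBuffer.size() - offset);
    signAndSend(segmentNo++, outBuffer.data() + offset, size);
  }
  return 0;
}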
There is nothing wrong with this behaviour, at least for now. If there are many faces, the information is segmentized and everything works perfectly (fyi, each face consumes roughly 70 bytes of information, so a single 4400-byte segment holds about 60 face entries).
However, when there is HEAVY traffic flowing over this node and no ContentStore is active (this scenario seems plausible to me for nodes with plenty of forwarding capacity but little memory/disk space for managing a content store), something weird happens.
The publish() method is invoked multiple times, mainly because whenever an Interest for "/localhost/nfd/faces/list/segmentXYZ" (where XYZ is the segment number) arrives and is not found in the ContentStore, the whole dataset has to be re-generated (which is also quite time-consuming).
Now, what if the face statistics have changed between two of these re-generations? The TLV encoding changes as well, so segments fetched at different times come from different encodings of the dataset. Therefore, when re-assembling the segments in RibManager (see rib-manager.cpp#L648-L717), the declared TLV lengths no longer match the reassembled buffer and we get a TLV error.
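A rough illustration of the failure mode (simplified one-byte type/length TLVs, not the real NDN-TLV wire format; all values are invented):

#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <vector>

// Walk a buffer of simplified TLV elements (1-byte type, 1-byte length) and throw
// if an element claims more bytes than the buffer still holds -- the same kind of
// check that makes ndn::Block::parse() throw "TLV length exceeds buffer length".
static void
parseTlvSequence(const std::vector<uint8_t>& buf)
{
  size_t pos = 0;
  while (pos < buf.size()) {
    if (pos + 2 > buf.size())
      throw std::runtime_error("truncated TLV header");
    size_t length = buf[pos + 1];
    if (pos + 2 + length > buf.size())
      throw std::runtime_error("TLV length exceeds buffer length");
    pos += 2 + length;
  }
}

int
main()
{
  // Segment 0 was cut from an encoding in which the last element was 10 bytes long,
  // but the following segment was produced from a NEWER, shorter encoding, so the
  // missing bytes never arrive and the declared length exceeds the reassembled buffer.
  std::vector<uint8_t> reassembled = {0x80, 3, 1, 2, 3,   // complete element
                                      0x80, 10, 1, 2, 3}; // truncated at the segment boundary
  try {
    parseTlvSequence(reassembled);
  }
  catch (const std::exception& e) {
    std::printf("error: %s\n", e.what());
  }
  return 0;
}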
Trust me, this is not an easy bug to find, and I'm also not sure whether it affects NFD in general or only ndnSIM. The easiest way to reproduce it is to spawn many, many virtual faces (e.g., by installing many applications, like I do in the example scenario) and to disable the content store. Then there needs to be some traffic flowing (preferably a lot).
Anyway, I believe the bug is easy to fix by having SegmentPublisher hold on to one instance of the encoded dataset and reuse it for the duration of the freshnessPeriod (I will provide a patch file ASAP).
In addition, I think this tlv::Error should be caught by a try/catch block in RibManager, and an adequate error message should be printed (instead of ndnSIM, or potentially NFD, crashing).
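A hedged sketch of the kind of guard I mean (onFaceStatusDataset is a made-up handler name and the header paths are approximate; the real decoding happens in RibManager::removeInvalidFaces):

#include <ndn-cxx/encoding/block.hpp>
#include <ndn-cxx/management/nfd-face-status.hpp>
#include <iostream>

// Sketch only: wrap the wire-decoding of the reassembled face dataset so that a
// malformed buffer is logged and skipped instead of terminating the process.
void
onFaceStatusDataset(const ndn::Block& content)
{
  try {
    content.parse();
    for (auto it = content.elements_begin(); it != content.elements_end(); ++it) {
      ndn::nfd::FaceStatus status(*it);
      // ... use status.getFaceId() etc. to update/remove routes ...
    }
  }
  catch (const ndn::tlv::Error& e) {
    std::cerr << "Failed to decode face dataset: " << e.what() << std::endl;
    // skip this round; the dataset will be fetched again on the next 300-second cycle
  }
}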
Updated by Christian Kreuzberger about 9 years ago
- File NFD_tlv_proposed.patch added
Proposed patch attached:
- added a try-catch block in NFD/rib/rib-manager.cpp to prevent the application from crashing
- added a "timeout" based on freshness period in NFD/core/segment-publisher. Renamed
m_buffer
tom_encodingBuffer
and made it a class member variable, rather than a local variable.
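For reference, the caching idea looks roughly like this (a simplified sketch, not the patch itself; std::chrono stands in for the ndn time abstractions and every name except m_encodingBuffer is illustrative):

#include <chrono>
#include <cstdint>
#include <vector>

class SegmentPublisherSketch
{
public:
  void
  publish()
  {
    auto now = std::chrono::steady_clock::now();
    if (m_encodingBuffer.empty() || now - m_lastGeneration > m_freshnessPeriod) {
      // (re)generate only when the cached encoding has expired
      m_encodingBuffer.clear();
      generate(m_encodingBuffer);
      m_lastGeneration = now;
    }
    // segmentize m_encodingBuffer and publish the segments; every segment Interest
    // arriving within the freshness window is now answered from the SAME buffer,
    // so segments from two different encodings can no longer be mixed
  }

private:
  void
  generate(std::vector<uint8_t>& buffer)
  {
    // placeholder: the real generate() appends one FaceStatus encoding per face
    buffer.assign(70 * 50, 0);
  }

  std::vector<uint8_t> m_encodingBuffer; // cached encoding, kept as a member instead of a local buffer
  std::chrono::steady_clock::time_point m_lastGeneration;
  std::chrono::milliseconds m_freshnessPeriod{1000};
};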
Tests are looking good so far.
- The try-catch block prevents the application from crashing (though it also prevents the face list from being updated).
- The updated segment-publisher seems to be working as expected, as the TLV exception is no longer appearing.
As already mentioned, I believe that this problem COULD also appear outside of ndnSIM (as it is related to NFD), but I haven't got the resources (time, computers, ...) to test this. Feel free to forward any information to the NFD redmine.
HTH
Christian
Updated by Christian Kreuzberger about 9 years ago
Initial testing has shown that the patch potentially leads to a memory leak. Use the previous patch with care for now.
Updated by Junxiao Shi about 9 years ago
- Description updated (diff)
Updated by Alex Afanasyev over 6 years ago
- Target version changed from 2.1 to 2.7
Need to check if the error still exists