Bug #3326
openndn::tlv::Error (TLV length exceeds buffer length) when calling /localhost/nfd/faces/list without caching
0%
Description
In my scenario (see attached file) I get ndn::tlv::Error (TLV length exceeds buffer length). The gdb stack trace looks as follows:
#5 0x00007fffe8e06922 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6 0x00007ffff23a7814 in boost::throw_exception<boost::exception_detail::error_info_injector<ndn::tlv::Error>
> (e=...) at /usr/include/boost/throw_exception.hpp:67
#7 0x00007ffff23a6e21 in boost::exception_detail::throw_exception_<ndn::tlv::Error> (x=..., current_function=0x7ffff26e33f0 <ndn::Block::parse() const::__PRETTY_FUNCTION__>
"void ndn::Block::parse() const", file=0x7ffff26e2990 "../src/ndnSIM/ndn-cxx/src/encoding/block.cpp", line=340) at /usr/include/boost/throw_exception.hpp:84
#8 0x00007ffff23a9db1 in ndn::Block::parse (this=0x7fffffffc1e0) at ../src/ndnSIM/ndn-cxx/src/encoding/block.cpp:340
#9 0x00007ffff23f995c in ndn::nfd::FaceStatus::wireDecode (this=0x7fffffffc170, block=...) at ../src/ndnSIM/ndn-cxx/src/management/nfd-face-status.cpp:121
#10 0x00007ffff23f9738 in ndn::nfd::FaceStatus::FaceStatus (this=0x7fffffffc170, block=...) at ../src/ndnSIM/ndn-cxx/src/management/nfd-face-status.cpp:49
#11 0x00007ffff26059a8 in nfd::rib::RibManager::removeInvalidFaces (this=0x5ad20018, buffer=...) at ../src/ndnSIM/NFD/rib/rib-manager.cpp:704
#12 0x00007ffff260557b in nfd::rib::RibManager::fetchSegments (this=0x5ad20018, data=..., buffer=...) at ../src/ndnSIM/NFD/rib/rib-manager.cpp:679
#13 0x00007ffff2617f41 in boost::_mfi::mf2<void, nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >::operator()
(this=0xa021be20, p=0x5ad20018, a1=..., a2=...) at /usr/include/boost/bind/mem_fn_template.hpp:280
#14 0x00007ffff2616040 in boost::_bi::list3<boost::_bi::value<nfd::rib::RibManager*>, boost::arg<2>, boost::_bi::value<std::shared_ptr<ndn::OBufferStream> >
>::operator()<boost::_mfi::mf2<void, nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >, boost::_bi::list2<ndn::Interest const&, ndn::Data&> >
(this=0xa021be30, f=..., a=...) at /usr/include/boost/bind/bind.hpp:392
#15 0x00007ffff2612c44 in boost::_bi::bind_t<void, boost::_mfi::mf2<void, nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >,
boost::_bi::list3<boost::_bi::value<nfd::rib::RibManager*>, boost::arg<2>, boost::_bi::value<std::shared_ptr<ndn::OBufferStream> > >
>::operator()<ndn::Interest, ndn::Data> (this=0xa021be20, a1=..., a2=...) at /usr/include/boost/bind/bind_template.hpp:76
#16 0x00007ffff260eeb2 in std::_Function_handler<void (ndn::Interest const&, ndn::Data&), boost::_bi::bind_t<void, boost::_mfi::mf2<void,
nfd::rib::RibManager, ndn::Data const&, std::shared_ptr<ndn::OBufferStream> >,
boost::_bi::list3<boost::_bi::value<nfd::rib::RibManager*>, boost::arg<2>, boost::_bi::value<std::shared_ptr<ndn::OBufferStream> > >
> >::_M_invoke(std::_Any_data const&, ndn::Interest const&, ndn::Data&) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/4.8/functional:2071
#17 0x00007ffff23c0d63 in std::function<void (ndn::Interest const&, ndn::Data&)>::operator()(ndn::Interest const&, ndn::Data&)
const (this=0x98f2a648, __args#0=..., __args#1=...) at /usr/include/c++/4.8/functional:2471
#18 0x00007ffff23be299 in ndn::PendingInterest::invokeDataCallback (this=0x98f2a638, data=...) at ../src/ndnSIM/ndn-cxx/src/detail/pending-interest.hpp:81
#19 0x00007ffff23bf41d in ndn::Face::Impl::satisfyPendingInterests (this=0x5ad1ce20, data=...) at ../src/ndnSIM/ndn-cxx/src/detail/face-impl.hpp:140
#20 0x00007ffff23be88e in ndn::Face::Impl::NfdFace::sendData(ndn::Data const&)::{lambda()#1}::operator()() const (__closure=0x81b20ed0) at ../src/ndnSIM/ndn-cxx/src/detail/face-impl.hpp:91
#21 0x00007ffff23c3ddc in std::_Function_handler<void (), ndn::Face::Impl::NfdFace::sendData(ndn::Data const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...)
at /usr/include/c++/4.8/functional:2071
#22 0x00007ffff23c0e4a in std::function<void ()>::operator()() const (this=0x9ada39c0) at /usr/include/c++/4.8/functional:2471
#23 0x00007ffff2514135 in ns3::EventImpl* ns3::MakeEvent<void (std::function<void ()>::*)() const, std::function<void ()> >(void
(std::function<void ()>::*)() const, std::function<void ()>)::EventMemberImpl0::Notify() (this=0x9ada39b0) at ./ns3/make-event.h:323
With the help of coffee, gdb and fprintf I found the cause:
Every 300 seconds the RIBManager queries /localhost/nfd/faces/list.
Then, FaceManager::listFaces()
reacts on this call, and calls m_faceStatusPublisher.publish()
;
m_faceStatusPublisher
is a "class FaceStatusPublisher : public SegmentPublisher<AppFace>
", and overwrites the generate() method.
Now this generate method does the following:
For each face, collect statistics and wire-decode them into outBuffer (see face-status-publisher.cpp#L47-L55 )
So far, so good. The SegmentPublisher uses outBuffer (see segment-publisher.hpp#L76 ), segmentizes it (at a static size of 4400 byte - not optimal, but this is a different story), and then calls publishSegment(data) for EACH segment (see segment-publisher.hpp#L104 ).
publishSegment, on the other hand, signs the packet and forwards it to the face.
There is nothing wrong with this behaviour, at least for now. If there are many faces, then the information is segmentized and everything works perfect (fyi, each face consumes roughly 70 bytes of information).
However, when there is HEAVY traffic flowing over this node and there is no ContentStore active (this scenario seems plausible to me for nodes with only forwarding-capacity, but not much memory/diskspace for managing a content-store), then something weird happens.
The publish() method is invoked multiple times, mainly because when the Interest "/localhost/nfd/faces/list/segmentXYZ" (where XYZ is the segment number) is issued, and not found in the ContentStore, it has to re-generate the whole packet (which is also quite time-consuming).
Now, what if the Face statistics have changed? The TLV encoding will change as well. Therefore, when re-assembling the packets in RibManager (see rib-manager.cpp#L648-L717 ), we get a TLV error.
Trust me, this is not an easy bug to find, and I'm also not sure whether this effects NFD in general or only ndnSIM. The easiest way to try out is to spawn many many virtual faces (e.g., by installing many applications, like I do in the example scenario), and disabling the content store. Then there needs to be some traffic flowing (preferably a lot).
Anyway, I believe the bug is easy to fix by holding a separate instance of whatever value is trying to be published in SegmentPublisher based on freshnessPeriod (I will provide a patch file ASAP).
In addition, I think that this TlvError should be caught by a try / catch block in RibManager, and an adequate error message should be printed (instead of ndnSIM or potentially NFD crashing).
Files