Bug #4520

NLSR crashing every ~1800 seconds

Added by Ashlesh Gawande over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Immediate
Target version:
Start date: 02/27/2018
Due date:
% Done: 100%
Estimated time:

Description

I am not sure this is related to the LSA refresh time: I set the refresh time to 500 seconds and NLSR still crashes at around 1800 seconds.
It can be reproduced in a two-node topology with HR and security turned on. Bring up the topology, leave it running for 30 minutes, and then check whether nlsr is still running.
(We should also set up a long-running experiment on the bot.)

#1

Updated by Ashlesh Gawande over 3 years ago

I think we are trying to erase an LSA from the map that is not there:
https://github.com/named-data/NLSR/blob/0b27c1ff5ccf970a5f15448eed12e3adacd4ad2d/src/lsa-segment-storage.cpp?utf8=%E2%9C%93#L141

Muktadir Chowdhury, if we delete old LSAs after getting a newer LSA, do we still need to schedule LSA segment deletion?

1519782731.997892 DEBUG: [nlsr.route.FaceMap] ------- Face Map-----------
1519782731.997900 DEBUG: [nlsr.route.FaceMap] Face Map Entry (FaceUri: udp4://1.0.0.10:6363 Face Id: 277)
1519782731.997905 DEBUG: [nlsr.route.FaceMap] Face Map Entry (FaceUri: udp4://1.0.0.14:6363 Face Id: 279)
1519782731.997911 DEBUG: [nlsr.route.FaceMap] Face Map Entry (FaceUri: udp4://1.0.0.18:6363 Face Id: 281)
1519782731.997915 DEBUG: [nlsr.route.FaceMap] Face Map Entry (FaceUri: udp4://1.0.0.2:6363 Face Id: 273)
1519782731.997919 DEBUG: [nlsr.route.FaceMap] Face Map Entry (FaceUri: udp4://1.0.0.6:6363 Face Id: 275)
ASAN:SIGSEGV
=================================================================
==2477==ERROR: AddressSanitizer: SEGV on unknown address 0x601ffffffff0 (pc 0x7feb77252580 sp 0x7fff37105b28 bp 0x000000000000 T0)
    #0 0x7feb7725257f (/usr/local/lib/libndn-cxx.so.0.6.1+0x17c57f)
    #1 0x7feb772f8c20 in unsigned long ndn::name::Component::wireEncode<(ndn::encoding::Tag)0>(ndn::encoding::EncodingImpl<(ndn::encoding::Tag)0>&) const ../src/name-component.cpp:424
    #2 0x7feb772ff7ae in unsigned long ndn::Name::wireEncode<(ndn::encoding::Tag)0>(ndn::encoding::EncodingImpl<(ndn::encoding::Tag)0>&) const ../src/name.cpp:135
    #3 0x7feb772fbd92 in ndn::Name::wireEncode() const ../src/name.cpp:152
    #4 0x7feb772fc840 in std::hash<ndn::Name>::operator()(ndn::Name const&) const ../src/name.cpp:341
    #5 0x4ac8b5 in std::__detail::_Hash_code_base<ndn::Name, std::pair<ndn::Name const, ndn::Data>, std::__detail::_Select1st, std::hash<ndn::Name>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, true>::_M_hash_code(ndn::Name const&) const /usr/include/c++/4.9/bits/hashtable_policy.h:1261
    #6 0x4ac8b5 in std::_Hashtable<ndn::Name, std::pair<ndn::Name const, ndn::Data>, std::allocator<std::pair<ndn::Name const, ndn::Data> >, std::__detail::_Select1st, std::equal_to<ndn::Name>, std::hash<ndn::Name>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase(std::integral_constant<bool, true>, ndn::Name const&) /usr/include/c++/4.9/bits/hashtable.h:1813
    #7 0x49f14b in std::_Hashtable<ndn::Name, std::pair<ndn::Name const, ndn::Data>, std::allocator<std::pair<ndn::Name const, ndn::Data> >, std::__detail::_Select1st, std::equal_to<ndn::Name>, std::hash<ndn::Name>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::erase(ndn::Name const&) /usr/include/c++/4.9/bits/hashtable.h:741
    #8 0x49f14b in std::unordered_map<ndn::Name, ndn::Data, std::hash<ndn::Name>, std::equal_to<ndn::Name>, std::allocator<std::pair<ndn::Name const, ndn::Data> > >::erase(ndn::Name const&) /usr/include/c++/4.9/bits/unordered_map.h:500
    *#9 0x49f14b in operator() ../src/lsa-segment-storage.cpp:141*
    #10 0x49f14b in _M_invoke /usr/include/c++/4.9/functional:2039
    #11 0x7feb7745db48 in std::function<void ()>::operator()() const /usr/include/c++/4.9/functional:2440
    #12 0x7feb7745db48 in ndn::util::scheduler::Scheduler::executeEvent(boost::system::error_code const&) ../src/util/scheduler.cpp:161
    #13 0x7feb7745e0c3 in operator()<const boost::system::error_code&, void> /usr/include/c++/4.9/functional:569
    #14 0x7feb7745e0c3 in __call<void, const boost::system::error_code&, 0ul, 1ul> /usr/include/c++/4.9/functional:1264
    #15 0x7feb7745e0c3 in operator()<const boost::system::error_code&, void> /usr/include/c++/4.9/functional:1323
    #16 0x7feb7745e0c3 in boost::asio::detail::binder1<std::_Bind<std::_Mem_fn<void (ndn::util::scheduler::Scheduler::*)(boost::system::error_code const&)> (ndn::util::scheduler::Scheduler*, std::_Placeholder<1>)>, boost::system::error_code>::operator()() /usr/include/boost/asio/detail/bind_handler.hpp:47
    #17 0x7feb7745e0c3 in asio_handler_invoke<boost::asio::detail::binder1<std::_Bind<std::_Mem_fn<void (ndn::util::scheduler::Scheduler::*)(const boost::system::error_code&)>(ndn::util::scheduler::Scheduler*, std::_Placeholder<1>)>, boost::system::error_code> > /usr/include/boost/asio/handler_invoke_hook.hpp:69
    #18 0x7feb7745e0c3 in invoke<boost::asio::detail::binder1<std::_Bind<std::_Mem_fn<void (ndn::util::scheduler::Scheduler::*)(const boost::system::error_code&)>(ndn::util::scheduler::Scheduler*, std::_Placeholder<1>)>, boost::system::error_code>, std::_Bind<std::_Mem_fn<void (ndn::util::scheduler::Scheduler::*)(const boost::system::error_code&)>(ndn::util::scheduler::Scheduler*, std::_Placeholder<1>)> > /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
    #19 0x7feb7745e0c3 in boost::asio::detail::wait_handler<std::_Bind<std::_Mem_fn<void (ndn::util::scheduler::Scheduler::*)(boost::system::error_code const&)> (ndn::util::scheduler::Scheduler*, std::_Placeholder<1>)> >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) /usr/include/boost/asio/detail/wait_handler.hpp:70
    #20 0x7feb772768d0 in boost::asio::detail::task_io_service_operation::complete(boost::asio::detail::task_io_service&, boost::system::error_code const&, unsigned long) /usr/include/boost/asio/detail/task_io_service_operation.hpp:38
    #21 0x7feb772768d0 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) /usr/include/boost/asio/detail/impl/task_io_service.ipp:384
    #22 0x7feb772768d0 in boost::asio::detail::task_io_service::run(boost::system::error_code&) /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
    #23 0x7feb7726ec52 in boost::asio::io_service::run() /usr/include/boost/asio/impl/io_service.ipp:59
    #24 0x7feb7726ec52 in ndn::Face::doProcessEvents(boost::chrono::duration<long, boost::ratio<1l, 1000l> >, bool) ../src/face.cpp:339
    #25 0x50a20a in ndn::Face::processEvents(boost::chrono::duration<long, boost::ratio<1l, 1000l> >, bool) /usr/local/include/ndn-cxx/face.hpp:454
    #26 0x50a20a in nlsr::Nlsr::startEventLoop() ../src/nlsr.cpp:681
    #27 0x4fad57 in nlsr::NlsrRunner::run() ../src/nlsr-runner.cpp:61
    #28 0x41cfce in main ../src/main.cpp:78
    #29 0x7feb75c02f44 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21f44)
    #30 0x41c6e8 (/usr/local/bin/nlsr+0x41c6e8)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV ??:0 ??
==2477==ABORTING
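
A note on the hypothesis above, as a minimal sketch: erasing a key that is absent from a std::unordered_map is well defined and simply returns 0, so a missing entry alone would not be expected to segfault. The trace faults while hashing the key (frames #0 to #5), which is at least consistent with the key object itself being invalid. The function name and the string keys below are hypothetical stand-ins, not NLSR code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: erase a key that was never inserted and report
// how many elements were removed.
std::size_t eraseMissingKey() {
  std::unordered_map<std::string, int> segments{{"/present", 1}};
  std::size_t removed = segments.erase("/not-there");
  // The existing entry is untouched by the failed erase.
  assert(segments.size() == 1);
  return removed;
}
```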

#2

Updated by Ashlesh Gawande over 3 years ago

Also, I can't find the default value of m_lsaDeletionTimepoint (ndn::time::seconds) anywhere! If it defaults to zero, the deletion should take place immediately.

#3

Updated by Ashlesh Gawande over 3 years ago

Okay, I found out that the deletion is scheduled after 1800 seconds. The lsdb takes this value from confParameter at constructor time, so it keeps the default of 1800 seconds regardless of the configured refresh time (owing to the bad design of NLSR configuration processing).
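
The behaviour described in this comment can be illustrated with a small sketch. It assumes a configuration object whose value is changed after the consumer has been constructed. ToyConfig, SnapshotLsdb, and LiveLsdb are hypothetical names for illustration only, not the actual NLSR classes.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical configuration object with a default of 1800 seconds.
struct ToyConfig {
  std::chrono::seconds lsaRefreshTime{1800};
};

// Copies the value once, at construction time.
struct SnapshotLsdb {
  std::chrono::seconds deletionDelay;
  explicit SnapshotLsdb(const ToyConfig& conf) : deletionDelay(conf.lsaRefreshTime) {}
};

// Keeps a reference and reads the value only when it is needed.
struct LiveLsdb {
  const ToyConfig& conf;
  explicit LiveLsdb(const ToyConfig& c) : conf(c) {}
  std::chrono::seconds deletionDelay() const { return conf.lsaRefreshTime; }
};

// Delay seen by the snapshot consumer after the configuration changes.
long long snapshotDelayAfterConfigLoad() {
  ToyConfig conf;
  SnapshotLsdb lsdb(conf);                          // built while the default is in place
  assert(lsdb.deletionDelay.count() == 1800);
  conf.lsaRefreshTime = std::chrono::seconds(500);  // configured value arrives later
  return lsdb.deletionDelay.count();                // still the stale default
}

// Delay seen by the consumer that reads the configuration lazily.
long long liveDelayAfterConfigLoad() {
  ToyConfig conf;
  LiveLsdb lsdb(conf);
  conf.lsaRefreshTime = std::chrono::seconds(500);
  return lsdb.deletionDelay().count();              // picks up the new value
}
```

In this sketch the snapshot variant keeps reporting 1800 seconds after the refresh time is set to 500, which matches the crash timing observed in the description.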

#4

Updated by Ashlesh Gawande over 3 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

The solution was to capture lsaSegment by value in the lambda to avoid the segfault. As part of this issue, I also use the LSA expiration point to schedule the deletion of lsaSegment instead of our own LSA refresh time.
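
A minimal sketch of why capturing by value matters for a deferred callback. It assumes the callback runs after the original key object has gone out of scope. ToyScheduler, ToyStorage, and the string keys are hypothetical stand-ins for the ndn-cxx scheduler, the segment storage, and ndn::Name; this is not the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduler: callbacks are stored and run later.
struct ToyScheduler {
  std::vector<std::function<void()>> pending;
  void schedule(std::function<void()> cb) { pending.push_back(std::move(cb)); }
  void runAll() {
    for (auto& cb : pending) cb();
    pending.clear();
  }
};

// Hypothetical stand-in for the segment storage (string keys, not ndn::Name).
struct ToyStorage {
  std::unordered_map<std::string, std::string> segments;
  ToyScheduler& scheduler;

  explicit ToyStorage(ToyScheduler& s) : scheduler(s) {}

  void insertAndScheduleDeletion(const std::string& key, const std::string& data) {
    segments[key] = data;
    // Capture `key` by value: the callback owns its own copy, so it stays
    // valid after the caller's string is destroyed. Capturing it by
    // reference ([&key] or [&]) would leave a dangling reference, and using
    // it in erase() later would be undefined behavior.
    scheduler.schedule([this, key] { segments.erase(key); });
  }
};

// Returns the number of stored segments after the deferred deletion has run.
std::size_t demoDeferredErase() {
  ToyScheduler scheduler;
  ToyStorage storage(scheduler);
  {
    std::string temporaryKey = "/ndn/site/router/lsa/seg=0";
    storage.insertAndScheduleDeletion(temporaryKey, "payload");
  }  // temporaryKey is destroyed here, before the callback runs
  assert(storage.segments.size() == 1);
  scheduler.runAll();
  return storage.segments.size();
}
```

The copy costs one extra key per scheduled event, which is usually a reasonable price for a callback that may fire long after its caller has returned.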

#5

Updated by Ashlesh Gawande over 3 years ago

  • Target version set to Minor release 0.4.2
