Bug #4215

closed

NLSR advertise functionality is broken

Added by Ashlesh Gawande over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Target version: v0.4.0
Start date: 08/01/2017
% Done: 100%

Description

nlsrc advertise /hello
Request timed out (code: 10060, error: Timeout)

NLSR terminates while trying to publish the sync update:

1501617012.454 TRACE: [PrefixUpdateProcessor] Received Interest: /localhost/nlsr/prefix-update/advertise/h%09%07%07%08%05hello/%00%00%01%5D%9FY%3E%DA/%A9u%24%03%3F%1C%88%F2/%16%3C%1B%01%01%1C7%075%08%0Ctmp-identity%08%04V%F9%01%87%08%03KEY%08%11ksk-1501616584383%08%07ID-CERT/%17%FD%01%00%19%DB%5D%01X%BC%27%DA%D1os%E5%1D-%40%23%F8%CB%5B%06%A2%7Di%D1W%24%8Eu%1C%8C%27%C7%F2e%00%D1%E0%7D%ABH%AD%C5J%D60%16%C0%BA%C0%F41%AC%A4%40%AB%29%20%E0%E3G%5DHK%89%EF%DBQ%5E%91Ei%29%07%B5%0F%1F%84k%40%08V%E1%85%25d%3BN%D6%85I%E6N%D7%F3Nh%0Fb%29%23%F5%18%05%15%AB%20%04%A7%3C%40%E1%0A%DE%10%F3iG%20%26C%9F%F5Xv%81F%BDJ%27%EB%22p%15%19%F6L%F5%C9%28Z%88%82%2AMF%3F%F1%D3%DE1%FFVB%0C%D5M%D9Y%95%3D%D0%D3%9C%FD%E2%DB%90k%B8%F8%CA%B5r%F6Q%CA%5B%7Dt%EB%02%8B-%CE%C0%C6%08J%E1%08%5C%CE%3D%86%0C%7D%C6%1F%A7%EE%F7%99%15%92%DD%CB%97OOo%90%3F%F3+%1D%D1%07%1C%25W%C2%3F%D6Xn%81%88%BF%1FaA%D5%11FN53%C2%04%16%D6%B3%C4%7F+%D5%E4%B0%97%E6%18%F5f%81%A5T?ndn.MustBeFresh=1&ndn.Nonce=279280279
1501617012.454 INFO: [PrefixUpdateProcessor] Advertising name: /hello
1501617012.454 DEBUG: [SequencingManager] ----SequencingManager----
1501617012.454 DEBUG: [SequencingManager] Adj LSA seq no: 1
1501617012.454 DEBUG: [SequencingManager] Cor LSA Seq no: 0
1501617012.454 DEBUG: [SequencingManager] Name LSA Seq no: 2
1501617012.476 DEBUG: [SyncLogicHandler] Publishing Sync Update. Prefix: /localhop/ndn/NLSR/LSA/a-site/%C1.Router/cs/a/name Seq No: 2

A single NLSR instance works, but in Mini-NDN advertising fails.

#1

Updated by Laqin Fan over 6 years ago

  • Assignee changed from Laqin Fan to Ashlesh Gawande

Thread 1 "nlsr" received signal SIGSEGV, Segmentation fault.
__gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x10a) at /usr/include/c++/5/ext/atomicity.h:96
96        __atomic_add(__mem, __val);
(gdb) bt full
#0  __gnu_cxx::__atomic_add_dispatch (__val=1, __mem=0x10a) at /usr/include/c++/5/ext/atomicity.h:96
No locals.
#1  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_add_ref_copy (this=0x102) at /usr/include/c++/5/bits/shared_ptr_base.h:134
No locals.
#2  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count (__r=..., this=0x7fffffffb458) at /usr/include/c++/5/bits/shared_ptr_base.h:666
No locals.
#3  std::__shared_ptr<ndn::Buffer const, (__gnu_cxx::_Lock_policy)2>::__shared_ptr (this=0x7fffffffb450)
    at /usr/include/c++/5/bits/shared_ptr_base.h:923
No locals.
#4  std::shared_ptr<ndn::Buffer const>::shared_ptr (this=0x7fffffffb450) at /usr/include/c++/5/bits/shared_ptr.h:107
No locals.
#5  ndn::Block::Block (this=0x7fffffffb450) at /usr/local/include/ndn-cxx/encoding/block.hpp:43
No locals.
#6  ndn::Name::Name (this=0x7fffffffb440) at /usr/local/include/ndn-cxx/name.hpp:46
No locals.
#7  nlsr::SyncLogicHandler::publishSyncUpdate (this=this@entry=0x7fffffffd0d8, updatePrefix=..., seqNo=14)
    at ../src/communication/sync-logic-handler.cpp:227
        __PRETTY_FUNCTION__ = "void nlsr::SyncLogicHandler::publishSyncUpdate(const ndn::Name&, uint64_t)"
        updateName = {<std::enable_shared_from_this<ndn::Name>> = {_M_weak_this = std::weak_ptr (empty) 0x0}, static npos = 18446744073709551615, 
          m_nameBlock = {m_buffer = <error reading variable: Cannot access memory at address 0x10a>, m_type = 7910488, m_begin = 72 'H', 
            m_end = <error reading variable>}
        data = "\220\235x\000\000\000\000\000ng Sync Update. Prefix: /localhop/ndn/NLSR/LSA/a-site/%C1.Router/cs/a/name Seq No: 14"
#8  0x0000000000430776 in nlsr::SyncLogicHandler::publishRoutingUpdate (this=this@entry=0x7fffffffd0d8, type=..., seqNo=@0x7fffffffb678: 14)
    at ../src/communication/sync-logic-handler.cpp:194
        __PRETTY_FUNCTION__ = "void nlsr::SyncLogicHandler::publishRoutingUpdate(const ndn::Name&, const uint64_t&)"
#9  0x000000000045cade in nlsr::Lsdb::buildAndInstallOwnNameLsa (this=0x7fffffffd0c8) at ../src/lsdb.cpp:160

#2

Updated by Ashlesh Gawande over 6 years ago

  • Description updated (diff)

Constructing the Name from its toUri() string, instead of copying the Name reference, avoids the crash:
https://github.com/named-data/NLSR/blob/3909aa160a0e7edb5d857e2849e833765191124a/src/communication/sync-logic-handler.cpp#L226

git bisect shows that the error was introduced by the NLSR face discovery change. It is not clear how, since that change does not touch the LSDB or the sync logic handler.
This is the link to the NLSR face discovery change:
https://gerrit.named-data.net/#/c/3304/

#3

Updated by Ashlesh Gawande over 6 years ago

  • Status changed from New to Code review

#4

Updated by Ashlesh Gawande over 6 years ago

I reverted the changes in the NLSR face discovery patch one by one and found that commenting out the redundant setting of faceId in fib.cpp fixes the NLSR crash on advertise:
https://github.com/named-data/NLSR/blob/820bb66350b1559b789ede184d4fb9432e0f166b/src/route/fib.cpp#L298

After discussion with Nick, it seems that checking that the adjacent whose face URI we are searching for actually exists in the adjacency list, before setting its face ID, fixes the problem.

The fix:
https://gerrit.named-data.net/#/c/4079/3/src/route/fib.cpp

The strange part is that the problem was elsewhere, yet NLSR crashed in a completely unrelated section (the sync logic handler, upon receiving the advertise command Interest).
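A minimal C++ sketch of the guard, under stated assumptions: the adjacency list is modeled as a std::list, and a failed lookup returns the end() iterator. `Adjacent`, `AdjacencyList`, `findAdjacentByFaceUri`, and `setFaceIdForUri` are hypothetical stand-ins, not NLSR's actual code. Writing through an end() iterator is undefined behavior and can corrupt unrelated memory, which could explain why the crash only surfaced later, in the sync logic handler.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>
#include <string>

// Hypothetical, simplified stand-in for an NLSR adjacency entry.
struct Adjacent {
  std::string name;     // router name, e.g. "/ndn/site/%C1.Router/b"
  std::string faceUri;  // e.g. "udp4://10.0.0.2:6363"
  uint64_t faceId = 0;
};

// Hypothetical: the adjacency list modeled as a plain std::list.
using AdjacencyList = std::list<Adjacent>;

// Returns an iterator to the entry with a matching face URI, or list.end().
AdjacencyList::iterator
findAdjacentByFaceUri(AdjacencyList& list, const std::string& faceUri)
{
  return std::find_if(list.begin(), list.end(),
                      [&](const Adjacent& a) { return a.faceUri == faceUri; });
}

// Guarded update: only dereference the iterator if the lookup succeeded.
// Returns false, and modifies nothing, when no adjacent has this face URI.
bool
setFaceIdForUri(AdjacencyList& list, const std::string& faceUri, uint64_t faceId)
{
  auto it = findAdjacentByFaceUri(list, faceUri);
  if (it == list.end()) {
    // Writing through `it` here would be undefined behavior and could
    // silently corrupt memory that is only noticed much later.
    return false;
  }
  it->faceId = faceId;
  return true;
}
```

In this sketch the caller is free to log or ignore a false return; the important part is that nothing is dereferenced when the lookup fails.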

#5

Updated by Ashlesh Gawande over 6 years ago

The root cause is that the call at https://github.com/named-data/NLSR/blob/820bb66350b1559b789ede184d4fb9432e0f166b/src/route/fib.cpp#L298 does not invoke the intended findAdjacent overload.

Here faceUri is a string, which is implicitly converted to a Name, so findAdjacent(adjName) is called instead of findAdjacent(faceUri). A lookup by name can never match a face URI.

Instead, findAdjacent(ndn::util::FaceUri(faceUri)) must be used so that the right overload is called; with that change everything proceeds as normal.
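A self-contained C++ sketch of the overload pitfall. `Name` and `FaceUri` below are simplified hypothetical stand-ins for ndn::Name and ndn::util::FaceUri: `Name` is implicitly constructible from a string, while `FaceUri` is assumed to have an explicit constructor (the fact that the buggy call compiled without an ambiguity error suggests this). With a std::string argument, overload resolution therefore silently selects the Name overload.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in: implicitly constructible from a string, like ndn::Name.
struct Name {
  Name(const std::string& s) : uri(s) {}
  std::string uri;
};

// Hypothetical stand-in: explicit constructor, so no silent conversion.
struct FaceUri {
  explicit FaceUri(const std::string& s) : uri(s) {}
  std::string uri;
};

// Two overloads sharing one name. Each returns a tag so the caller
// can observe which overload was selected.
std::string findAdjacent(const Name&)    { return "by-name"; }
std::string findAdjacent(const FaceUri&) { return "by-face-uri"; }

// Buggy call site: the string is implicitly converted to Name, so the
// name-based lookup runs and can never match a face URI.
std::string lookupBuggy(const std::string& faceUri)
{
  return findAdjacent(faceUri);
}

// Fixed call site: wrap the string explicitly so the FaceUri overload is chosen.
std::string lookupFixed(const std::string& faceUri)
{
  return findAdjacent(FaceUri(faceUri));
}
```

Both call sites compile cleanly, which is what makes this class of bug easy to miss in review.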

We should give these functions distinct names instead of overloading them, so that this problem does not recur.
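A sketch of the renaming proposal, using the same kind of hypothetical stand-ins (`Name`, `FaceUri`, `findAdjacentByName`, `findAdjacentByFaceUri`, and the one-entry lookup data are illustrative assumptions, not NLSR's API). With distinct function names there is no overload set to resolve, and because `FaceUri` is only explicitly constructible, passing a bare std::string to `findAdjacentByFaceUri` fails to compile instead of silently running the wrong lookup.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in: implicit conversion from string, like ndn::Name.
struct Name {
  Name(const std::string& s) : uri(s) {}
  std::string uri;
};

// Hypothetical stand-in: explicit conversion only.
struct FaceUri {
  explicit FaceUri(const std::string& s) : uri(s) {}
  std::string uri;
};

// Hypothetical single-entry "adjacency list" for illustration.
const std::string kAdjName = "/ndn/site/%C1.Router/b";
const std::string kAdjFaceUri = "udp4://10.0.0.2:6363";

// Distinct names: each lookup states what it matches on.
bool findAdjacentByName(const Name& name)         { return name.uri == kAdjName; }
bool findAdjacentByFaceUri(const FaceUri& faceUri) { return faceUri.uri == kAdjFaceUri; }

// A std::string cannot reach findAdjacentByFaceUri implicitly, so
// findAdjacentByFaceUri(someString) is rejected at compile time.
static_assert(!std::is_convertible<std::string, FaceUri>::value,
              "FaceUri must be constructed explicitly");
```

The trade-off is slightly longer call sites, in exchange for turning a silent runtime mismatch into a build error.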

#6

Updated by Ashlesh Gawande over 6 years ago

The error is reproduced by a unit test:
https://gerrit.named-data.net/#/c/4095/

The backtrace is similar to the one I got during integration testing (Mini-NDN) and real-world testing:

GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Reading symbols from ./build/unit-tests-nlsr...done.
(gdb) r
Starting program: /home/ashlesh/ndn-src/2NLSR/build/unit-tests-nlsr -t TestAdvertiseCrash/Basic
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Running 1 test case...
log4cxx: No appender could be found for logger (LsdbDatasetInterestHandler).
log4cxx: Please initialize the log4cxx system properly.
[New Thread 0x7ffff0eab700 (LWP 27292)]

Thread 1 "unit-tests-nlsr" received signal SIGSEGV, Segmentation fault.
0x000000000043a1c5 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x9) at /usr/include/c++/5/ext/atomicity.h:49
49    { return __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
(gdb) bt
#0  0x000000000043a1c5 in __gnu_cxx::__exchange_and_add (__val=-1, __mem=0x9) at /usr/include/c++/5/ext/atomicity.h:49
#1  __gnu_cxx::__exchange_and_add_dispatch (__val=-1, __mem=0x9) at /usr/include/c++/5/ext/atomicity.h:82
#2  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x1) at /usr/include/c++/5/bits/shared_ptr_base.h:147
#3  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7fffffffc390, __in_chrg=<optimized out>)
    at /usr/include/c++/5/bits/shared_ptr_base.h:659
#4  std::__shared_ptr<ndn::Buffer const, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7fffffffc388, __in_chrg=<optimized out>)
    at /usr/include/c++/5/bits/shared_ptr_base.h:925
#5  std::shared_ptr<ndn::Buffer const>::~shared_ptr (this=0x7fffffffc388, __in_chrg=<optimized out>) at /usr/include/c++/5/bits/shared_ptr.h:93
#6  ndn::Block::~Block (this=0x7fffffffc388, __in_chrg=<optimized out>) at /usr/local/include/ndn-cxx/encoding/block.hpp:43
#7  ndn::Name::~Name (this=0x7fffffffc378, __in_chrg=<optimized out>) at /usr/local/include/ndn-cxx/name.hpp:46
#8  0x000000000044680e in nlsr::SyncLogicHandler::~SyncLogicHandler (this=0x7fffffffc2e8, __in_chrg=<optimized out>)
    at /home/ashlesh/ndn-src/2NLSR/src/communication/sync-logic-handler.hpp:42
#9  nlsr::Lsdb::~Lsdb (this=0x7fffffffc2d8, __in_chrg=<optimized out>) at /home/ashlesh/ndn-src/2NLSR/src/lsdb.hpp:43
#10 nlsr::Nlsr::~Nlsr (this=0x7fffffffbea0, __in_chrg=<optimized out>) at /home/ashlesh/ndn-src/2NLSR/src/nlsr.hpp:62
#11 0x000000000050d99d in nlsr::update::test::AdvertiseCrashFixture::~AdvertiseCrashFixture (this=0x7fffffffbc80, __in_chrg=<optimized out>)
    at ../tests/update/test-advertise-crash.cpp:29
#12 nlsr::update::test::TestAdvertiseCrash::Basic::~Basic (this=0x7fffffffbc80, __in_chrg=<optimized out>)
    at ../tests/update/test-advertise-crash.cpp:70
#13 nlsr::update::test::TestAdvertiseCrash::Basic_invoker () at ../tests/update/test-advertise-crash.cpp:70

#7

Updated by Ashlesh Gawande over 6 years ago

  • Status changed from Code review to Closed
  • Target version set to v0.4.0
  • % Done changed from 0 to 100