Bug #4266


NFD won't start correctly on Vagrant integration testing environment

Added by Eric Newberry about 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Integration Tests
Target version: -
Start date: 09/05/2017
Due date:
% Done: 0%
Estimated time:

Description

When I attempt to start NFD in the Vagrant integration testing environment, NFD exits almost immediately without printing an error. However, the following appears in syslog:

kernel: [136608.984802] traps: nfd[12548] general protection ip:7f28ede87d77 sp:7ffc3fe08fd0 error:0 in libndn-cxx.so.0.5.1[7f28edc7b000+3e3000]

Strangely, this issue only appears when running nfd through sudo. When running it directly as root, it does not exit unexpectedly.

I was unable to reproduce this issue on another instance of Ubuntu 14.04 x64 (the OS the Vagrant integration testing environment runs), so something specific to the integration testing environment seems to be causing the failure.


Related issues 1 (0 open, 1 closed)

Is duplicate of NFD - Bug #4274: NFD quits if started with nfd-start without KeyChain added

#1

Updated by Davide Pesavento about 7 years ago

What is this "Vagrant integration testing environment"? What's special about it?

Can you run nfd under gdb and print a backtrace (bt) when it crashes? If that doesn't show anything useful, try strace. Make sure you have debug symbols installed, or that nfd and ndn-cxx are built in debug mode.

#2

Updated by Eric Newberry about 7 years ago

I already tried gdb and didn't get anything useful. strace showed me that NFD continually attempts to open the nonexistent ~/.ndn/pib.db-wal file. Deleting the ~/.ndn directory lets NFD start successfully, but every subsequent run crashes again.

#3

Updated by Junxiao Shi about 7 years ago

  • Is duplicate of Bug #4274: NFD quits if started with nfd-start without KeyChain added

#4

Updated by Eric Newberry about 7 years ago

It seems that I've prevented this crash by modifying ndn-cxx. I looked at the gdb output again and found that in frame #5 of the backtrace (below), the shared_ptr to the received Interest is empty. I added a check within ndn::Face::Impl::processIncomingInterest to return early if the shared_ptr is empty, and NFD no longer appears to crash on startup.

Program received signal SIGSEGV, Segmentation fault.
ndn::name::Component::equals (this=0x638580, other=...) at ../src/name-component.cpp:389
389              value_size() == other.value_size() &&
(gdb) bt
#0  ndn::name::Component::equals (this=0x638580, other=...) at ../src/name-component.cpp:389
#1  0x00007ffff7be6ed4 in operator!= (other=..., this=<optimized out>) at ../src/name-component.hpp:510
#2  ndn::Name::isPrefixOf (this=this@entry=0x6384f8, other=...) at ../src/name.cpp:268
#3  0x00007ffff7b7cbf1 in ndn::InterestFilter::doesMatch (this=0x6384f8, name=...) at ../src/interest-filter.cpp:60
#4  0x00007ffff7b72039 in doesMatch (name=..., this=<optimized out>) at ../src/detail/interest-filter-record.hpp:65
#5  ndn::Face::Impl::processIncomingInterest (this=0x62f658, interest=std::shared_ptr (empty) 0x0) at ../src/detail/face-impl.hpp:205
#6  0x00007ffff7b66144 in ndn::Face::onReceiveElement (this=0x6272f8, blockFromDaemon=...) at ../src/face.cpp:405
#7  0x000000000045d8f1 in operator() (__closure=0x7fffffffde40) at ../daemon/face/internal-transport.cpp:80
#8  asio_handler_invoke<nfd::face::asyncReceive(nfd::face::InternalTransportBase*, const ndn::Block&)::__lambda1> (
    function=<error reading variable: access outside bounds of object referenced via synthetic pointer>)
    at /usr/include/boost/asio/handler_invoke_hook.hpp:64
#9  invoke<nfd::face::asyncReceive(nfd::face::InternalTransportBase*, const ndn::Block&)::__lambda1, nfd::face::asyncReceive(nfd::face::InternalTransportBase*, const ndn::Block&)::__lambda1> (context=..., function=...)
    at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
[trace continues]
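The workaround described above (returning early when the Interest pointer is empty) can be sketched roughly as follows. This is a simplified, hypothetical model: `Interest` and `FaceImplSketch` are stand-ins, not the real ndn-cxx classes, and the guard only suppresses the symptom; it does not explain why the pointer was empty in the first place.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ndn::Interest; NOT the real ndn-cxx class.
struct Interest {
  std::string name;
};

// Hypothetical stand-in for ndn::Face::Impl, reduced to the filter-matching loop.
class FaceImplSketch {
public:
  explicit FaceImplSketch(std::vector<std::string> filters)
    : m_filters(std::move(filters)) {}

  // Returns how many registered filters match the Interest name,
  // or -1 if the Interest pointer is empty.
  int processIncomingInterest(std::shared_ptr<const Interest> interest) const {
    // Early return instead of dereferencing an empty shared_ptr,
    // which is what crashed in frames #0 to #5 of the backtrace.
    if (!interest) {
      return -1;
    }
    int matches = 0;
    for (const auto& prefix : m_filters) {
      // Plain string-prefix test standing in for InterestFilter::doesMatch
      // (real NDN matching compares whole name components).
      if (interest->name.compare(0, prefix.size(), prefix) == 0) {
        ++matches;
      }
    }
    return matches;
  }

private:
  std::vector<std::string> m_filters;
};
```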

#5

Updated by Alex Afanasyev about 7 years ago

Ehm... I don't understand how it can be possible? Some weird compiler optimizations?

      auto interest = make_shared<Interest>(netPacket);
      if (false) {
        ...
      }
      else {
        ...
        m_impl->processIncomingInterest(std::move(interest));
      }

#6

Updated by Eric Newberry about 7 years ago

It looks like removing std::move from the call in face.cpp also fixes it.
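For reference, the C++ rule behind the std::move change: moving from a std::shared_ptr always leaves the source pointer empty, whereas passing it by copy leaves the caller's pointer intact and temporarily raises the reference count. The sketch uses hypothetical `Interest` and `consume` stand-ins (not ndn-cxx code) and only illustrates the language rule; the thread never established that this was the actual cause of the empty pointer in frame #5.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for ndn::Interest.
struct Interest {
  std::string name;
};

// Takes ownership by value, like a sink parameter.
// Returns the parameter's use_count(): 1 if the pointer was moved in,
// 2 if it was copied in (the caller still holds its own reference).
long consume(std::shared_ptr<const Interest> interest) {
  return interest.use_count();
}

// Passes the pointer with std::move, as in the face.cpp excerpt.
// Returns true if the caller's pointer is empty afterwards.
bool callerIsEmptyAfterMove() {
  std::shared_ptr<const Interest> interest = std::make_shared<Interest>(Interest{"/a"});
  consume(std::move(interest));
  // A moved-from std::shared_ptr is guaranteed to be empty.
  return interest == nullptr;
}

// Passes a copy instead (the change from note #6).
bool callerIsEmptyAfterCopy() {
  std::shared_ptr<const Interest> interest = std::make_shared<Interest>(Interest{"/a"});
  consume(interest);
  return interest == nullptr;
}
```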

#7

Updated by Ashlesh Gawande about 7 years ago

I am having some (possibly related) trouble in ChronoSync when receiving Data from the face:
https://github.com/named-data/ChronoSync/blob/8dfa63c1dd70394c31ac07262808b185d54213c8/src/logic.cpp#L351
data.shared_from_this() throws a std::bad_weak_ptr exception.
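For context: `shared_from_this()` only succeeds when the object is already owned by a std::shared_ptr; otherwise, since C++17, it throws std::bad_weak_ptr (earlier standards left this undefined). A minimal sketch with a hypothetical `Data` type standing in for ndn::Data. Note that copying an owned object does not copy its ownership, so the copy fails the same way. Whether this is related to the NFD crash was not determined in this thread.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for ndn::Data; NOT the real ndn-cxx class.
struct Data : std::enable_shared_from_this<Data> {
  std::string name;
};

// Returns true if shared_from_this() succeeds for the given object.
// When no std::shared_ptr owns the object, the call throws
// std::bad_weak_ptr (guaranteed since C++17).
bool canShareFromThis(Data& data) {
  try {
    std::shared_ptr<Data> self = data.shared_from_this();
    return self.get() == &data;
  }
  catch (const std::bad_weak_ptr&) {
    return false;
  }
}
```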

#8

Updated by Davide Pesavento about 7 years ago

Alex Afanasyev wrote:

Ehm... I don't understand how it can be possible? Some weird compiler optimizations?

I don't understand either.

@Eric, is the above from a clean (from scratch) build of ndn-cxx and NFD?

#9

Updated by Eric Newberry about 7 years ago

Davide Pesavento wrote:

Ehm... I don't understand how it can be possible? Some weird compiler optimizations?

I don't understand either.

@Eric, is the above from a clean (from scratch) build of ndn-cxx and NFD?

I made the change to the already-built ndn-cxx source, compiled the changes, and reinstalled. NFD should automatically use the new library without a rebuild because ndn-cxx is built as a shared library.

#10

Updated by Davide Pesavento about 7 years ago

What I'm asking is simply if you tried a ./waf distclean before rebuilding everything.

#11

Updated by Eric Newberry about 7 years ago

Davide Pesavento wrote:

What I'm asking is simply if you tried a ./waf distclean before rebuilding everything.

I did a distclean and a clean rebuild for both NFD and ndn-cxx with the change in note-6 and it looks like the issue is gone.

#12

Updated by Davide Pesavento about 7 years ago

We already established that the change in note-6 fixes it. I was asking about current git HEAD ndn-cxx, built from scratch (./waf distclean), without any additional changes.

#13

Updated by Eric Newberry about 7 years ago

Davide Pesavento wrote:

We already established that the change in note-6 fixes it. I was asking about current git HEAD ndn-cxx, built from scratch (./waf distclean), without any additional changes.

To my knowledge, the integration test code doesn't modify the ndn-cxx or NFD source before building, but I'll run it again just to be sure.

#14

Updated by Eric Newberry about 7 years ago

I just built the git HEAD version of ndn-cxx from scratch and still encountered the issue above (with a previously compiled version of NFD). The latest version of NFD fails to compile against the latest version of ndn-cxx, as mentioned on the ndn-interest mailing list. The failure occurs because the latest ndn-cxx commit introduced a function isLessSevere into the ndn::lp namespace; a function with the same name already exists in the nfd::fw namespace, so a call to it in one source file is now ambiguous. Junxiao has submitted a patch to fix this (https://gerrit.named-data.net/#/c/4175/).

With NFD patched to fix the isLessSevere issue and the current git HEAD version of ndn-cxx, I still encounter the segfault issue.
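The isLessSevere build failure has the shape of a typical argument-dependent lookup (ADL) clash. A minimal sketch, assuming both functions take the same enum argument type; the namespaces `lib::lp` and `app::fw`, the `NackReason` values, and the severity ordering are hypothetical stand-ins, not the real ndn-cxx or NFD definitions. Qualifying the call is one common fix; the actual patch in the linked Gerrit change may resolve it differently.

```cpp
// Hypothetical library namespace (think: ndn::lp). Illustrative only.
namespace lib {
namespace lp {

enum class NackReason { NONE, CONGESTION, DUPLICATE, NO_ROUTE };

// Newly added library function; the ordering here is arbitrary.
inline bool isLessSevere(NackReason x, NackReason y) {
  return static_cast<int>(x) < static_cast<int>(y);
}

} // namespace lp
} // namespace lib

// Hypothetical application namespace (think: nfd::fw). Illustrative only.
namespace app {
namespace fw {

// Pre-existing application function with the same name and parameters.
inline bool isLessSevere(lib::lp::NackReason x, lib::lp::NackReason y) {
  return static_cast<int>(x) < static_cast<int>(y);
}

// Returns true if a is less severe than b (under the arbitrary ordering above).
inline bool checkLessSevere(lib::lp::NackReason a, lib::lp::NackReason b) {
  // An unqualified call, isLessSevere(a, b), would be ambiguous here:
  // ordinary lookup finds app::fw::isLessSevere, while ADL also finds
  // lib::lp::isLessSevere because NackReason is declared in lib::lp.
  // A qualified name is not subject to ADL, so this call is unambiguous.
  return fw::isLessSevere(a, b);
}

} // namespace fw
} // namespace app
```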

#15

Updated by Eric Newberry about 7 years ago

  • Status changed from New to Feedback

I recreated the Vagrant integration testing environment from scratch with the latest git HEAD versions of ndn-cxx and NFD and this issue does not appear to be present.

#16

Updated by Eric Newberry about 7 years ago

  • Status changed from Feedback to Closed

I set up another Vagrant integration testing environment from scratch and still didn't encounter this issue, so I'm going to go ahead and close it.
