Project

General

Profile

Task #2489

Merge nrd into nfd

Added by Davide Pesavento over 4 years ago. Updated about 4 years ago.

Status:
Closed
Priority:
Normal
Category:
RIB
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:

Description

There's no reason to keep nrd as a separate process, but there are a few reasons to not have it separate (e.g. it's not really standalone). Therefore it should become a thread in the nfd process.

NFD Developer Guide should be updated as part of this issue.


Related issues

Related to NFD - Task #2496: Redesign initialization of nfd/nrdClosed

Related to NFD - Task #2513: Optimize multi-threaded logging using lock-free queue and separate threadClosed

Blocks NFD - Bug #2517: NFD state is not properly cleaned up on destructionNew

History

#1 Updated by Junxiao Shi over 4 years ago

NFD is single threaded as decided on 20140114 meeting.

NFD Developer Guide gives the following reason for implementing RIB Management as a separate process:

Logically, the RIB Manager is a part of NFD; however, it is implemented as a separate process to handle complex routing table manipulation while keeping packet forwarding logic lightweight and simple.

If you propose adding a thread to NFD for RIB Management, please give sufficient reasons to negate the above two decisions.

#2 Updated by Davide Pesavento over 4 years ago

Junxiao Shi wrote:

NFD is single threaded as decided on 20140114 meeting.

What was decided there is that the processing of one packet in NFD (forwarding, etc...) is single threaded. In other words, one single thread is responsible for the whole chain: reading the packet from the socket, decoding, name matching, table manipulation, etc... And I agree with this choice. If the forwarding process was handled by multiple threads, the overhead for locking and context switching would be too high.

What I'm proposing here is to move the RIB processing in a separate thread. This has nothing to do with packet forwarding, as you said yourself. Another piece of NFD that can be moved to a separate thread is data signing (see #2488).

Logically, the RIB Manager is a part of NFD; however, it is implemented as a separate process to handle complex routing table manipulation while keeping packet forwarding logic lightweight and simple.

Again, I agree. This remains true even if the RIB manager is a thread instead of a process.

#3 Updated by Alex Afanasyev over 4 years ago

Junxiao Shi wrote:

NFD is single threaded as decided on 20140114 meeting.

This type of single threadedness has no bearing on NRD and NFD be as part of one process, but different threads. It is merely an different organization, NFD is still completely independent and RIB manager will communicate with NFD with normal channels.

NFD Developer Guide gives the following reason for implementing RIB Management as a separate process:

Logically, the RIB Manager is a part of NFD; however, it is implemented as a separate process to handle complex routing table manipulation while keeping packet forwarding logic lightweight and simple.

If you propose adding a thread to NFD for RIB Management, please give sufficient reasons to negate the above two decisions.

The statement still holds true and is not altered. NFD and RIB management are still independent, just inside the same process. Having two separate processes complicate a lot of deployment issues.

#4 Updated by Junxiao Shi over 4 years ago

The statement still holds true and is not altered. NFD and RIB management are still independent, just inside the same process. Having two separate processes complicate a lot of deployment issues.

So they'll still communicate over UNIX sockets?
I could agree with that.
Please update issue Description to clarity.

What about a more generalized feature: allow any ndn-cxx application to be compiled as a thread of NFD process, by following a certain programming pattern.

Docker recommends running only one user process per container, and it would be nice if NFD, RIB Daemon, ping server, user application, and even routing daemon can run together.

#5 Updated by Lan Wang over 4 years ago

I'm confused. Can someone clarify what are the specific problems can be solved by merging nrd into nfd and still using the unix socket (and I assume Interest/Data exchange between them)?

#6 Updated by Alex Afanasyev over 4 years ago

Separated and yet highly coupled daemons is a big deployment (and some usability) complication. I have spend at least two days trying to figure out how to enable support for two daemons in homebrew framework. We still have unsolved issue on how to stop NFD when NRD dies. There are many other issues I can cite.

Unix socket based transport is a first step. In the long run I would prefer some other transport that communicates directly between threads.

#7 Updated by Davide Pesavento over 4 years ago

What Alex said. The two daemons are tightly coupled, and in 99% of the cases they depend on each other for proper operation. Moreover, having to deal with two processes is much more complicated in namespaced environments (LXC, docker, systemd-nspawn, ...)

#8 Updated by Alex Afanasyev over 4 years ago

  • Related to Task #2496: Redesign initialization of nfd/nrd added

#9 Updated by Alex Afanasyev over 4 years ago

  • Status changed from New to Code review
  • Assignee set to Alex Afanasyev
  • Target version set to v0.4
  • % Done changed from 0 to 100

#10 Updated by Junxiao Shi over 4 years ago

NFD Developer Guide needs updated as part of this Task.

#11 Updated by Alex Afanasyev over 4 years ago

Current implementation in http://gerrit.named-data.net/#/c/1740/8 raises concern about thread safety: logging, io, scheduler.

Here are my current answers:

  • io_service is designed to be thread safe, so no problem
  • scheduler: I prefer not having global state where it is possible. The linked patchset explicitly introduces a single scheduler that is passed around in RIB management. RIB management and NFD itself are run using separate threads (separate io_services), so there is no problem for thread safety
  • logging: this is potential concern for thread safety as we are currently using std::clog for log operations. I currently have no solution for thread safety and would like to postpone the solution until later commits. I don't even know what should be the solution.

#12 Updated by Junxiao Shi over 4 years ago

Thread Local Storage is the solution for global scheduler. It's available natively in gcc48, and the Boost version can be used before that.

It's easier to write nfd::scheduler::schedule than passing Scheduler instance around, and nothing stops a developer from making the mistake of using nfd::scheduler::schedule in RIB daemon code.

Logger is a big problem because many (of my) experiments rely on parsing NFD logs, and an interleaved log makes all these parsing difficult or impossible.

I disagree with leaving this problem for a future commit, because it breaks those experiments with no easy fix.

A naive solution is to add a barrier before every log write operation.
Performance is bad, but correctness is more important than performance.

A better solution is:

  • NFD and RIB threads write logs, line by line, into a lock-free queue
  • introduce a third thread that pulls log lines from both queues, and writes them into std::clog

#13 Updated by Alex Afanasyev over 4 years ago

In first updated commit I will just add std::lock_guard to ensure synchronization for log messages. I will consider lock-free queue and separate logging thread as part of a separate issue.

#14 Updated by Alex Afanasyev over 4 years ago

  • Related to Task #2513: Optimize multi-threaded logging using lock-free queue and separate thread added

#15 Updated by Junxiao Shi over 4 years ago

What's the plan for global io_service and scheduler? I disagree with forcing RIB daemon to pass around its own instances. Please use thread local storage.

#16 Updated by Alex Afanasyev over 4 years ago

Oh. I will push the commit in a few minutes. With explicit io_service instance things were much simpler, but I think I got it right now.

The only "issue" I'm having is that I'm not sure how to properly test thread separation.

#17 Updated by Junxiao Shi over 4 years ago

To test thread separation, make a test case:

  1. declare two io_service pointers and initialize to nullptr
  2. fork a thread: this thread assigns its global io_service to first pointer, and then exits
  3. in main thread, assign global io_service to second pointer, then join the child thread
  4. compare the two pointers are not equal

Changes of global io_service and scheduler should appear in a commit before actually merging the two processes.

Changes of logger also need it's own commit.

#18 Updated by Alex Afanasyev over 4 years ago

Should I add printing out thread id in the log messages? It would look something like:

1423717421.858982 0x94949291 TRACE: [NetworkInterfaceInfo] bridge0: added Ethernet address 3e:15:c2:8b:65:00

(0x94949291 a hex number unique for each thread).

Right now I see that this can be used to manually verify that nfd and nrd threads are really separated.

#19 Updated by Junxiao Shi over 4 years ago

Module name (such as "[NetworkInterfaceInfo]") is sufficient for operator to know which component is producing a log line.

Thread ID is not useful. It also changes log format (which is bad news for me because I would need to adjust all parsing scripts).

#20 Updated by Alex Afanasyev over 4 years ago

I'm ok not changing. It is just I don't know any other reliable means to verify that the commit I made does exactly what we intend to do.

#21 Updated by Junxiao Shi over 4 years ago

You could add one logger call after thread start to print Thread ID.

#22 Updated by Alex Afanasyev over 4 years ago

I think you're missing my point. With thread ID in log messages you can analyze in which thread each module is running, which can be used to judge whether thread separation is effective of not (e.g., all Rib* modules should be in different thread then all NFD-related things). I have manually verified that each individual thread is using own scheduler and io_service (and I had some problems initially).

Though I cannot disagree that this is only useful for code debugging and probably useless in general.

#23 Updated by Davide Pesavento over 4 years ago

We could introduce a command line argument to turn on/off logging of the thread ID, defaulting to off for backwards compatibility. A similar switch can be added for timestamp logging.

#24 Updated by Alex Afanasyev over 4 years ago

For now, this can be only a compile-time option (std::clog operations is in the header file). If we move it to .cpp, then we could be more flexible.

#25 Updated by Junxiao Shi over 4 years ago

  • Blocks Bug #2517: NFD state is not properly cleaned up on destruction added

#26 Updated by Davide Pesavento over 4 years ago

What about getGlobalRng()?

#27 Updated by Junxiao Shi over 4 years ago

What about getGlobalRng()?

Yes, this needs taken care of as part of this issue.
It doesn't have to be done before the "actual merge" commit, because for now RNG is used in forwarding only.

#28 Updated by Junxiao Shi over 4 years ago

  • Status changed from Code review to Closed

#29 Updated by Davide Pesavento over 4 years ago

Junxiao Shi wrote:

Davide Pesavento wrote:

What about getGlobalRng()?

Yes, this needs taken care of as part of this issue.

It shouldn't be closed then.

#30 Updated by Davide Pesavento over 4 years ago

Ah sorry, I didn't see change Ib17ea05592445cf99af9726ef7799ead813ca0fe.

#31 Updated by Davide Pesavento over 4 years ago

The NFD developer guide needs to be updated. In few places it is stated that NRD is a separate process, which is no longer correct. Section 7.6 can probably be deleted entirely.

#32 Updated by Junxiao Shi over 4 years ago

  • Description updated (diff)
  • Status changed from Closed to Feedback

#33 Updated by Davide Pesavento over 4 years ago

I noticed Alex made some changes but there are still several references to "NRD" which should be removed or changed to "RIB Manager/Management". Moreover, section 7.8 - "Termination" - should be deleted entirely.

#34 Updated by Alex Afanasyev over 4 years ago

I think I have addressed these comments in my last commit (also tagged as revision-4)

#35 Updated by Davide Pesavento over 4 years ago

  • Status changed from Feedback to Closed

Yes, closing.

#36 Updated by Junxiao Shi about 4 years ago

  • Status changed from Closed to Feedback

I notice nrd(1) manpage still exists with outdated content.

It should either be deleted, or updated to reflect the deprecation.

#37 Updated by Junxiao Shi about 4 years ago

  • Status changed from Feedback to Closed

#38 Updated by Junxiao Shi about 4 years ago

  • Status changed from Closed to Feedback

commit:a714792b032295988f35ea30fd738683fbdb4580 did not delete NRD manpage completely, as I still see this warning:

[  7/205] Processing sphinx_build [man]: docs/manpages/ndn-autoconfig-server.rst docs/manpages/ndn-autoconfig.rst docs/manpages/ndn-tlv-peek.rst docs/manpages/ndn-tlv-poke.rst docs/manpages/nfd-autoreg.rst docs/manpages/nfd-status-http-server.rst docs/manpages/nfd-status.rst docs/manpages/nfd.rst docs/manpages/nfdc.rst docs/conf.py -> build/docs/manpages/nfd.1 build/docs/manpages/ndn-autoconfig-server.1 build/docs/manpages/ndn-autoconfig.1 build/docs/manpages/nfdc.1 build/docs/manpages/ndn-tlv-peek.1 build/docs/manpages/ndn-tlv-poke.1 build/docs/manpages/nfd-autoreg.1 build/docs/manpages/nfd-status-http-server.1 build/docs/manpages/nfd-status.1
/home/shijunxiao/NFD/docs/manpages.rst:6: WARNING: toctree contains reference to nonexisting document u'manpages/nrd'

#40 Updated by Junxiao Shi about 4 years ago

  • Status changed from Feedback to Closed

Also available in: Atom PDF