Task #2489: Merge nrd into nfd - NFD - NDN project issue tracking system

Actions

#1

Updated by Junxiao Shi over 10 years ago

NFD is single threaded as decided on 20140114 meeting.

NFD Developer Guide gives the following reason for implementing RIB Management as a separate process:

Logically, the RIB Manager is a part of NFD; however, it is implemented as a separate process to handle complex routing table manipulation while keeping packet forwarding logic lightweight and simple.

If you propose adding a thread to NFD for RIB Management, please give sufficient reasons to negate the above two decisions.

Actions

Copy link

#2

Updated by Davide Pesavento over 10 years ago

Junxiao Shi wrote:

NFD is single threaded as decided on 20140114 meeting.

What was decided there is that the processing of one packet in NFD (forwarding, etc...) is single threaded. In other words, one single thread is responsible for the whole chain: reading the packet from the socket, decoding, name matching, table manipulation, etc... And I agree with this choice. If the forwarding process was handled by multiple threads, the overhead for locking and context switching would be too high.

What I'm proposing here is to move the RIB processing in a separate thread. This has nothing to do with packet forwarding, as you said yourself. Another piece of NFD that can be moved to a separate thread is data signing (see #2488).

Logically, the RIB Manager is a part of NFD; however, it is implemented as a separate process to handle complex routing table manipulation while keeping packet forwarding logic lightweight and simple.

Again, I agree. This remains true even if the RIB manager is a thread instead of a process.

Actions

Copy link

#3

Updated by Alex Afanasyev over 10 years ago

Junxiao Shi wrote:

NFD is single threaded as decided on 20140114 meeting.

This type of single threadedness has no bearing on NRD and NFD be as part of one process, but different threads. It is merely an different organization, NFD is still completely independent and RIB manager will communicate with NFD with normal channels.

NFD Developer Guide gives the following reason for implementing RIB Management as a separate process:

Logically, the RIB Manager is a part of NFD; however, it is implemented as a separate process to handle complex routing table manipulation while keeping packet forwarding logic lightweight and simple.

If you propose adding a thread to NFD for RIB Management, please give sufficient reasons to negate the above two decisions.

The statement still holds true and is not altered. NFD and RIB management are still independent, just inside the same process. Having two separate processes complicate a lot of deployment issues.

Actions

Copy link

#4

Updated by Junxiao Shi over 10 years ago

The statement still holds true and is not altered. NFD and RIB management are still independent, just inside the same process. Having two separate processes complicate a lot of deployment issues.

So they'll still communicate over UNIX sockets?
I could agree with that.
Please update issue Description to clarity.

What about a more generalized feature: allow any ndn-cxx application to be compiled as a thread of NFD process, by following a certain programming pattern.

Docker recommends running only one user process per container, and it would be nice if NFD, RIB Daemon, ping server, user application, and even routing daemon can run together.

Actions

Copy link

#5

Updated by Lan Wang over 10 years ago

I'm confused. Can someone clarify what are the specific problems can be solved by merging nrd into nfd and still using the unix socket (and I assume Interest/Data exchange between them)?

Actions

Copy link

#6

Updated by Alex Afanasyev over 10 years ago

Separated and yet highly coupled daemons is a big deployment (and some usability) complication. I have spend at least two days trying to figure out how to enable support for two daemons in homebrew framework. We still have unsolved issue on how to stop NFD when NRD dies. There are many other issues I can cite.

Unix socket based transport is a first step. In the long run I would prefer some other transport that communicates directly between threads.

Actions

Copy link

#7

Updated by Davide Pesavento over 10 years ago

What Alex said. The two daemons are tightly coupled, and in 99% of the cases they depend on each other for proper operation. Moreover, having to deal with two processes is much more complicated in namespaced environments (LXC, docker, systemd-nspawn, ...)

Actions

Copy link

#8

Updated by Alex Afanasyev over 10 years ago

Related to Task #2496: Redesign initialization of nfd/nrd added

Actions

Copy link

#9

Updated by Alex Afanasyev over 10 years ago

Status changed from New to Code review
Assignee set to Alex Afanasyev
Target version set to v0.4
% Done changed from 0 to 100

Actions

Copy link

#10

Updated by Junxiao Shi over 10 years ago

NFD Developer Guide needs updated as part of this Task.

Actions

Copy link

#11

Updated by Alex Afanasyev over 10 years ago

Current implementation in http://gerrit.named-data.net/#/c/1740/8 raises concern about thread safety: logging, io, scheduler.

Here are my current answers:

io_service is designed to be thread safe, so no problem
scheduler: I prefer not having global state where it is possible. The linked patchset explicitly introduces a single scheduler that is passed around in RIB management. RIB management and NFD itself are run using separate threads (separate io_services), so there is no problem for thread safety
logging: this is potential concern for thread safety as we are currently using std::clog for log operations. I currently have no solution for thread safety and would like to postpone the solution until later commits. I don't even know what should be the solution.

Actions

Copy link

#12

Updated by Junxiao Shi over 10 years ago

Thread Local Storage is the solution for global scheduler. It's available natively in gcc48, and the Boost version can be used before that.

It's easier to write nfd::scheduler::schedule than passing Scheduler instance around, and nothing stops a developer from making the mistake of using nfd::scheduler::schedule in RIB daemon code.

Logger is a big problem because many (of my) experiments rely on parsing NFD logs, and an interleaved log makes all these parsing difficult or impossible.

I disagree with leaving this problem for a future commit, because it breaks those experiments with no easy fix.

A naive solution is to add a barrier before every log write operation.
Performance is bad, but correctness is more important than performance.

A better solution is:

NFD and RIB threads write logs, line by line, into a lock-free queue
introduce a third thread that pulls log lines from both queues, and writes them into std::clog

Actions

Copy link

#13

Updated by Alex Afanasyev over 10 years ago

In first updated commit I will just add std::lock_guard to ensure synchronization for log messages. I will consider lock-free queue and separate logging thread as part of a separate issue.

Actions

Copy link

#14

Updated by Alex Afanasyev over 10 years ago

Related to Task #2513: Optimize multi-threaded logging using lock-free queue and separate thread added

Actions

Copy link

#15

Updated by Junxiao Shi over 10 years ago

What's the plan for global io_service and scheduler? I disagree with forcing RIB daemon to pass around its own instances. Please use thread local storage.

Actions

Copy link

#16

Updated by Alex Afanasyev over 10 years ago

Oh. I will push the commit in a few minutes. With explicit io_service instance things were much simpler, but I think I got it right now.

The only "issue" I'm having is that I'm not sure how to properly test thread separation.

Actions

Copy link

#17

Updated by Junxiao Shi over 10 years ago

To test thread separation, make a test case:

declare two io_service pointers and initialize to nullptr
fork a thread: this thread assigns its global io_service to first pointer, and then exits
in main thread, assign global io_service to second pointer, then join the child thread
compare the two pointers are not equal

Changes of global io_service and scheduler should appear in a commit before actually merging the two processes.

Changes of logger also need it's own commit.

Actions

Copy link

#18

Updated by Alex Afanasyev over 10 years ago

Should I add printing out thread id in the log messages? It would look something like:

1423717421.858982 0x94949291 TRACE: [NetworkInterfaceInfo] bridge0: added Ethernet address 3e:15:c2:8b:65:00

(0x94949291 a hex number unique for each thread).

Right now I see that this can be used to manually verify that nfd and nrd threads are really separated.

Actions

Copy link

#19

Updated by Junxiao Shi over 10 years ago

Module name (such as "[NetworkInterfaceInfo]") is sufficient for operator to know which component is producing a log line.

Thread ID is not useful. It also changes log format (which is bad news for me because I would need to adjust all parsing scripts).

Actions

Copy link

#20

Updated by Alex Afanasyev over 10 years ago

I'm ok not changing. It is just I don't know any other reliable means to verify that the commit I made does exactly what we intend to do.

Actions

Copy link

#21

Updated by Junxiao Shi over 10 years ago

You could add one logger call after thread start to print Thread ID.

Actions

Copy link

#22

Updated by Alex Afanasyev over 10 years ago

I think you're missing my point. With thread ID in log messages you can analyze in which thread each module is running, which can be used to judge whether thread separation is effective of not (e.g., all Rib* modules should be in different thread then all NFD-related things). I have manually verified that each individual thread is using own scheduler and io_service (and I had some problems initially).

Though I cannot disagree that this is only useful for code debugging and probably useless in general.

Actions

Copy link

#23

Updated by Davide Pesavento over 10 years ago

We could introduce a command line argument to turn on/off logging of the thread ID, defaulting to off for backwards compatibility. A similar switch can be added for timestamp logging.

Actions

Copy link

#24

Updated by Alex Afanasyev over 10 years ago

For now, this can be only a compile-time option (std::clog operations is in the header file). If we move it to .cpp, then we could be more flexible.

Actions

Copy link

#25

Updated by Junxiao Shi over 10 years ago

Blocks Bug #2517: NFD state is not properly cleaned up on destruction added

Actions

Copy link

#26

Updated by Davide Pesavento over 10 years ago

What about getGlobalRng()?

Actions

Copy link

#27

Updated by Junxiao Shi over 10 years ago

What about getGlobalRng()?

Yes, this needs taken care of as part of this issue.
It doesn't have to be done before the "actual merge" commit, because for now RNG is used in forwarding only.

Actions

Copy link

#28

Updated by Junxiao Shi over 10 years ago

Status changed from Code review to Closed

Actions

Copy link

#29

Updated by Davide Pesavento over 10 years ago

Junxiao Shi wrote:

Davide Pesavento wrote:

What about getGlobalRng()?

Yes, this needs taken care of as part of this issue.

It shouldn't be closed then.

Actions

Copy link

#30

Updated by Davide Pesavento over 10 years ago

Ah sorry, I didn't see change Ib17ea05592445cf99af9726ef7799ead813ca0fe.

Actions

Copy link

#31

Updated by Davide Pesavento over 10 years ago

The NFD developer guide needs to be updated. In few places it is stated that NRD is a separate process, which is no longer correct. Section 7.6 can probably be deleted entirely.

Actions

Copy link

#32

Updated by Junxiao Shi over 10 years ago

Description updated (diff)
Status changed from Closed to Feedback

Actions

Copy link

#33

Updated by Davide Pesavento over 10 years ago

I noticed Alex made some changes but there are still several references to "NRD" which should be removed or changed to "RIB Manager/Management". Moreover, section 7.8 - "Termination" - should be deleted entirely.

Actions

Copy link

#34

Updated by Alex Afanasyev over 10 years ago

I think I have addressed these comments in my last commit (also tagged as revision-4)

Actions

Copy link

#35

Updated by Davide Pesavento over 10 years ago

Status changed from Feedback to Closed

Yes, closing.

Actions

Copy link

#36

Updated by Junxiao Shi over 10 years ago

Status changed from Closed to Feedback

I notice nrd(1) manpage still exists with outdated content.

It should either be deleted, or updated to reflect the deprecation.

Actions

Copy link

#37

Updated by Junxiao Shi about 10 years ago

Status changed from Feedback to Closed

Actions

Copy link

#38

Updated by Junxiao Shi about 10 years ago

Status changed from Closed to Feedback

commit:a714792b032295988f35ea30fd738683fbdb4580 did not delete NRD manpage completely, as I still see this warning:

[  7/205] Processing sphinx_build [man]: docs/manpages/ndn-autoconfig-server.rst docs/manpages/ndn-autoconfig.rst docs/manpages/ndn-tlv-peek.rst docs/manpages/ndn-tlv-poke.rst docs/manpages/nfd-autoreg.rst docs/manpages/nfd-status-http-server.rst docs/manpages/nfd-status.rst docs/manpages/nfd.rst docs/manpages/nfdc.rst docs/conf.py -> build/docs/manpages/nfd.1 build/docs/manpages/ndn-autoconfig-server.1 build/docs/manpages/ndn-autoconfig.1 build/docs/manpages/nfdc.1 build/docs/manpages/ndn-tlv-peek.1 build/docs/manpages/ndn-tlv-poke.1 build/docs/manpages/nfd-autoreg.1 build/docs/manpages/nfd-status-http-server.1 build/docs/manpages/nfd-status.1
/home/shijunxiao/NFD/docs/manpages.rst:6: WARNING: toctree contains reference to nonexisting document u'manpages/nrd'

Actions

Copy link

#39

Updated by Junxiao Shi about 10 years ago

http://gerrit.named-data.net/2438

Actions

Copy link

#40

Updated by Junxiao Shi about 10 years ago

Status changed from Feedback to Closed

Project

General

Profile

NFD

Tags

Task #2489

Merge nrd into nfd

Updated by Junxiao Shi over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Lan Wang over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Junxiao Shi about 10 years ago

Updated by Junxiao Shi about 10 years ago