Bug #1360
Closed
Producers cannot connect to running nfd if by mistake another instance of nfd is started.
90%
Description
I mistakenly started another instance of nfd in another terminal, which, as expected, quit after printing the following messages:
$ sudo NFD=1 NRD=1 NFD_LOG=all ~/nfd/build/nfd --config /usr/local/etc/ndn/nfd.conf.sample
DEBUG: [NameTree] lookup /
DEBUG: [NameTree] insert /
DEBUG: [NameTree] Name / hash value = 2654435816 location = 488
DEBUG: [NameTree] Did not find /, need to insert it to the table
INFO: [StrategyChoice] setDefaultStrategy(/localhost/nfd/strategy/best-route) new entry
DEBUG: [FaceUri] URI [internal://] parsed into: internal, , ,
INFO: [FaceTable] addFace id=1
INFO: [InternalFace] registering callback for /localhost/nfd/fib
INFO: [InternalFace] registering callback for /localhost/nfd/faces
INFO: [InternalFace] registering callback for /localhost/nfd/control-header
INFO: [InternalFace] registering callback for /localhost/nfd/strategy-choice
DEBUG: [CommandValidator] generated certfile path: /usr/local/etc/ndn/keys/default.ndncert
INFO: [CommandValidator] Giving privilege "control-header" to identity /obaid/ksk-1395076550007
INFO: [CommandValidator] Giving privilege "faces" to identity /obaid/ksk-1395076550007
INFO: [CommandValidator] Giving privilege "fib" to identity /obaid/ksk-1395076550007
INFO: [CommandValidator] Giving privilege "strategy-choice" to identity /obaid/ksk-1395076550007
DEBUG: [TcpFactory] Channel [0.0.0.0:6363] created
DEBUG: [TcpFactory] Channel [[::]:6363] created
ERROR: [Main] Error: bind: Address already in use
However, after this I couldn't connect any producer to the already running nfd. To connect a producer again, I had to restart the running nfd as well.
Steps to reproduce the error:
On Terminal 1 start nfd:
$ sudo NFD=1 NRD=1 NFD_LOG=all ~/nfd/build/nfd --config /usr/local/etc/ndn/nfd.conf.sample
On Terminal 2 start another instance of nfd:
$ sudo NFD=1 NRD=1 NFD_LOG=all ~/nfd/build/nfd --config /usr/local/etc/ndn/nfd.conf.sample
(This will print the errors shown above and quit)
Start producer:
$ NFD=1 ~/ndn-cpp-dev/build/examples/producer
ERROR: error while connecting to the forwarder (Connection refused)
Updated by Junxiao Shi over 10 years ago
ndnd solves this problem by:
- (old process) periodically checks whether UNIX socket still exists, and quits if the socket is gone
- (new process) checks whether UNIX socket already exists during initialization, if yes, delete the socket and wait 8 seconds for old process to stop
- deletes UNIX socket during normal shutdown
NFD doesn't always have a UNIX socket. Some script (or upstart job, not NFD itself) should record the PID into a file, and kill the old process before starting a new one.
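The PID-file approach suggested above can be sketched as follows. This is a minimal illustration in Python, assuming a POSIX system; the helper names (`pid_is_alive`, `read_live_pid`, `write_pid`) and the PID file location are hypothetical and not part of NFD or ndnd. A wrapper script would call `read_live_pid` before launching nfd, stop the old process if one is reported, and then record the new PID.

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def read_live_pid(pid_file: str):
    """Return the PID stored in pid_file if that process is running, else None."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return None
    if pid <= 0:
        # Guard against signalling a whole process group by accident.
        return None
    return pid if pid_is_alive(pid) else None


def write_pid(pid_file: str, pid: int) -> None:
    """Record the PID of the newly started process (hypothetical file location)."""
    with open(pid_file, "w") as f:
        f.write(f"{pid}\n")
```

Note that a PID file alone can go stale or point at a recycled PID, so it is best treated as a hint rather than a guarantee.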
Updated by Alex Afanasyev over 10 years ago
But the particular problem is because the second run of NFD wasn't prevented at an early stage. Should we fix this somehow in NFD or defer this to upstart-like things?
Updated by Junxiao Shi over 10 years ago
NFD itself doesn't need to prevent this. It should be left to upstart.
The code repository should have bash scripts similar to ndndstart and ndndstop, to be used on platforms without upstart.
Updated by Alex Afanasyev over 10 years ago
But it is kind of bad that we are removing the Unix socket file if NFD is accidentally run a second time... In any case, I agree to defer this to upstart.
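One way to avoid deleting a live socket file is to probe it before unlinking: if a connection attempt succeeds, another process is still serving it and the new instance should give up. The sketch below is in Python and only illustrates the idea; it is not NFD's actual code, and the function names `socket_in_use` and `bind_unix_listener` are hypothetical.

```python
import os
import socket


def socket_in_use(path: str) -> bool:
    """Return True if some process accepts connections on the Unix socket at path."""
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        probe.connect(path)
    except (FileNotFoundError, ConnectionRefusedError):
        # No file at all, or a stale file that nobody is listening on.
        return False
    finally:
        probe.close()
    return True


def bind_unix_listener(path: str) -> socket.socket:
    """Bind a listening Unix socket, removing the file only if it is stale."""
    if socket_in_use(path):
        raise RuntimeError(f"{path} is in use by another process")
    try:
        os.unlink(path)  # stale leftover from a previous run, if any
    except FileNotFoundError:
        pass
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen()
    return listener
```

The probe and the unlink are two separate steps, so two processes starting at nearly the same time can still interfere with each other; this sketch does not address that.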
Updated by Junxiao Shi over 10 years ago
- Category set to Faces
- Target version set to v0.1
Updated by Davide Pesavento over 10 years ago
What exactly are the requirements here? Do we really want to support running multiple instances of nfd on the same machine concurrently?
Also, I wouldn't focus too much on upstart for this. It's dying: even Debian and Ubuntu will soon abandon it in favor of systemd, and almost all other major distros have switched or are switching to systemd as well.
Updated by Davide Pesavento over 10 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 90
commit:e22d8c84 fixed this bug in almost all cases. However there's still a small window of opportunity for a race condition between two nfd processes starting up roughly at the same time in the presence of a stale socket file. I plan to fix the race condition in a subsequent patch.
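A common way to close this kind of startup window is to serialize startup with an exclusive advisory lock that is held for the lifetime of the process. The Python sketch below illustrates that general technique under the assumption of a POSIX system with `flock`; it is not necessarily what the planned follow-up patch does, and `acquire_instance_lock` and the lock file path are hypothetical.

```python
import fcntl


def acquire_instance_lock(lock_path: str):
    """Try to take an exclusive, non-blocking advisory lock on lock_path.

    Returns the open file object on success (keep it open for as long as the
    process runs), or None if another holder already has the lock.
    """
    lock_file = open(lock_path, "a")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    return lock_file
```

A process that gets `None` back would exit before touching the socket file. The kernel releases the lock when the holder exits or crashes, so a leftover lock file never blocks a later start.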
Updated by Junxiao Shi over 10 years ago
- Status changed from In Progress to Closed
Updated by Davide Pesavento over 10 years ago
This bug is not completely fixed. Why did you close it?