Bug #2517
NFD state is not properly cleaned up on destruction
Status: Closed
% Done: 0%
Description
In some environments (specifically an Android app), it may be necessary (at least it is needed for now) to start/stop NFD multiple times within the same process. This requires that the Nrd/Nfd destructors properly cancel everything that was posted on the io_service, without requiring the io_service to be stopped.
The current way we do things in main() is to stop the global io_service first, and then call the destructors. This effectively destructs everything, however dangling handlers remain posted and are only removed when the application terminates. This approach does not work when NFD needs to be started again within the same process: whenever the global io_service is started again, it will fire a few handlers that point to deallocated memory.
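To make the failure mode concrete, here is a minimal sketch using plain Boost.Asio rather than the real Nfd/Nrd classes; the global io_service, the simplified Nfd class, and the posted handler are stand-ins for illustration only:

```cpp
#include <boost/asio/io_service.hpp>
#include <iostream>

// Process-wide io_service, standing in for NFD's global io_service.
boost::asio::io_service g_io;

// Greatly simplified stand-in for the real Nfd class.
class Nfd
{
public:
  Nfd()
  {
    // Post a handler that captures `this`, like timers and async operations do.
    g_io.post([this] { std::cout << "handler for " << this << " fired\n"; });
  }
};

int
main()
{
  {
    Nfd nfd;
    g_io.stop(); // first shutdown: stop the io_service...
  }              // ...then destruct Nfd; the posted handler stays in the queue

  g_io.reset();  // "restart" NFD within the same process
  g_io.run();    // executes the leftover handler, whose captured `this` is dangling
  return 0;
}
```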
Updated by Alex Afanasyev almost 10 years ago
- Tracker changed from Task to Bug
- Subject changed from ls to Full restart of NFD inside the same process
- Category set to Core
- Assignee set to Alex Afanasyev
- Target version set to v0.4
While attempting to start/stop NFD multiple times within the same process (without resetting the scheduler and io_service), I got stuck on the fact that this is not currently supported.
The quickest way for me to solve the problem is to reset the IO and scheduler. I will try to do things in a cleaner way, but if it takes too much time, I will just use the shortcut.
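A minimal sketch of that shortcut, assuming the global io_service is held behind a resettable pointer (the accessor below is hypothetical, not NFD's actual global-io helper): destroying the old io_service discards its pending handlers, and the fresh instance starts with an empty queue.

```cpp
#include <boost/asio/io_service.hpp>
#include <memory>

// Hypothetical holder for the process-wide io_service.
static std::unique_ptr<boost::asio::io_service> g_io =
    std::make_unique<boost::asio::io_service>();

// Reset between NFD runs: the old io_service is destroyed together with any
// handlers still posted on it (they are never invoked), then a new one is created.
void
resetGlobalIo()
{
  g_io = std::make_unique<boost::asio::io_service>();
}
```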
Updated by Junxiao Shi almost 10 years ago
Please update Description (not Notes) with steps to reproduce, expected, and actual behavior.
Updated by Alex Afanasyev almost 10 years ago
I'm working with the test case to reproduce/test the issue.
Updated by Alex Afanasyev almost 10 years ago
- Status changed from New to Code review
- % Done changed from 0 to 100
Updated by Junxiao Shi almost 10 years ago
- Blocked by Task #2489: Merge nrd into nfd added
Updated by Junxiao Shi almost 10 years ago
Nfd and Nrd types are not thread-safe. The snippet calls their destructors in another thread, which could lead to undefined behavior.
Updated by Alex Afanasyev almost 10 years ago
This is just a snippet. The actual test case has a bit of thread-safety handling; it reproduces the problem and serves as the reference implementation. The snippet is only there to illustrate the issue.
Updated by Alex Afanasyev almost 10 years ago
Technically, this task is not blocked by the nfd/nrd merge. It is only blocked by #2516.
Updated by Alex Afanasyev almost 10 years ago
- Blocked by Bug #2516: UdpChannel::newPeer does not properly handle error condition added
Updated by Junxiao Shi almost 10 years ago
If the bug report is illustrated with a code snippet, the snippet must be correct. Alternatively, describe the steps in text, and the test case in the solution should match this text.
Updated by Alex Afanasyev almost 10 years ago
Junxiao, let's be productive. I don't want to waste time on formal things. My opinion is that the snippet and the test case serve different goals.
Updated by Junxiao Shi almost 10 years ago
It's not "formal things". A Bug report must allow others to understand what's the problem. I cannot understand the problem from this snippet, thus it needs revising.
I don't understand the significance of this problem. Nfd never needs to restart within a process. If it crashes, the process should exit.
Updated by Alex Afanasyev almost 10 years ago
- Related to Bug #2518: Segfault when destructing face without cancelling pending interests added
Updated by Junxiao Shi almost 10 years ago
This approach does not work when NFD needs to be started again within the same process: whenever the global io_service is started again, it will fire a few handlers that point to deallocated memory.
Why would NFD need to be started again within the same process? Just quit the process and let the monitor restart it.
Updated by Alex Afanasyev almost 10 years ago
I don't have an answer to "why". It is more a question of purity and "correctness": I would like the Nrd/Nfd/Forwarder classes to do proper cleanup in the proper order, which gives the ability to re-start NFD within the same process if somebody ever wants to do that.
Updated by Junxiao Shi almost 10 years ago
I don't have an answer to "why". It is more a question of purity and "correctness": I would like the Nrd/Nfd/Forwarder classes to do proper cleanup in the proper order, which gives the ability to re-start NFD within the same process if somebody ever wants to do that.
I won't agree with this change until there's a use case.
The easy solution is: let the Nfd type own the io_service instance, which is destroyed whenever Nfd is destructed. Same goes for the Nrd type.
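A sketch of what this would look like, with hypothetical member names; the only point is that the io_service lifetime becomes bound to the Nfd (or Nrd) object instead of being a process-wide global:

```cpp
#include <boost/asio/io_service.hpp>

class Nfd
{
public:
  void
  run()
  {
    m_io.run();
  }

private:
  // Owned by this Nfd instance; destroyed (together with any handlers
  // still queued on it) whenever the Nfd is destructed.
  boost::asio::io_service m_io;
};
```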
Updated by Alex Afanasyev almost 10 years ago
What's the point of blocking the change? There is no functional change; it just cleans up the order in which things are destructed. I can come up with any number of work-arounds, so this change is purely for conceptual correctness.
Updated by Junxiao Shi almost 10 years ago
I won't agree with this Change until there's a use case. Without a use case, this is an "unnecessary change".
Updated by Davide Pesavento almost 10 years ago
Even without the "in-process restart", it makes sense to clean up in destructors... I'd tend to agree with Alex here.
Updated by Junxiao Shi almost 10 years ago
Then update the issue title and description to get rid of "restart".
Updated by Alex Afanasyev almost 10 years ago
Davide's comment on gerrit:
Shouldn't it be the channels' responsibility to close the faces, though? The channels create them, so I'd expect the channels to also destroy them. FaceTable is just a passive observer in this.
Junxiao's comment on gerrit:
Yes
While I agree in principle that FaceTable may not be a good place to "own" faces, channels are not a good place either. Channels are not the only place where a face is created: some faces are channel-less (e.g., multicast faces) and some faces are completely channel- and factory-less (InternalFace). The only place that has complete knowledge of all faces is the forwarder itself, and in particular FaceTable. Therefore, I would argue that FaceTable is the owner of faces, and channels are just observers overseeing the faces that they created.
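A sketch of the ownership split argued for here, with deliberately simplified class shapes (the real Face/FaceTable/Channel interfaces are richer): FaceTable holds the owning references to every face, while a channel only keeps weak observers of the faces it created.

```cpp
#include <map>
#include <memory>
#include <vector>

class Face // stand-in for the real nfd::Face
{
};

// FaceTable owns every face, regardless of how it was created
// (via a channel, channel-less multicast, or InternalFace).
class FaceTable
{
public:
  void
  add(int faceId, std::shared_ptr<Face> face)
  {
    m_faces.emplace(faceId, std::move(face));
  }

private:
  std::map<int, std::shared_ptr<Face>> m_faces; // owning references
};

// A channel merely observes the faces it created; it does not keep them alive.
class Channel
{
public:
  void
  rememberCreatedFace(const std::shared_ptr<Face>& face)
  {
    m_createdFaces.push_back(face); // weak_ptr: observation, not ownership
  }

private:
  std::vector<std::weak_ptr<Face>> m_createdFaces;
};
```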
Updated by Davide Pesavento almost 10 years ago
Channels or factories, it's not too important... the point is that "normal" faces are owned by channels (or factories), channels are owned by factories, and factories are owned by the FaceManager. InternalFace and NullFace are constructed by Nfd and should be closed by it.
This reminds me of a thought I had a while ago, see #2300 note 8.
Also, I have a question about this task's description: if everything is destructed, why are there "dangling handlers"? Handlers for what, specifically?
Updated by Alex Afanasyev almost 10 years ago
- Subject changed from Full restart of NFD inside the same process to Nrd/Forwarder state is not properly cleanup on destruction
- Description updated (diff)
Updated by Junxiao Shi almost 10 years ago
In some environments (specifically an Android app), it may be necessary (at least it is needed for now) to start/stop NFD multiple times within the same process. This requires that the Nrd/Nfd destructors properly cancel everything that was posted on the io_service, without requiring the io_service to be stopped.
Why can't the Android app destroy and recreate the io_service?
Updated by Davide Pesavento almost 10 years ago
- Subject changed from Nrd/Forwarder state is not properly cleanup on destruction to Nrd/Forwarder state is not properly cleaned up on destruction
Updated by Alex Afanasyev almost 10 years ago
I could agree with using weak_ptr instead of shared_ptr in many cases. However, this would require a lot of changes and I really don't want to do them any time soon.
As for the removal and dangling stuff (this may not be entirely correct in the description): from what I can guess, Channels/Factories belong to FaceManager, which is intentionally destructed after the Forwarder class (as it has to be created before it). Unless a face has already "failed" while the Forwarder instance is still alive, destruction of a face inside a Channel/Factory will result in signalling the failure to the forwarder, even though the forwarder is already destructed. I guess this specific part can be fixed by saving the connections and disconnecting them inside the FaceTable destructor.
Updated by Alex Afanasyev almost 10 years ago
Junxiao Shi wrote:
Why can't the Android app destroy and recreate the io_service?
I can. The bug is not exactly about the ability to "cheat" (and I'm already doing that). The bug is about correctness during the destruction process.
Updated by Davide Pesavento almost 10 years ago
Alex Afanasyev wrote:
Unless a face has already "failed" while the Forwarder instance is still alive, destruction of a face inside a Channel/Factory will result in signalling the failure to the forwarder, even though the forwarder is already destructed. I guess this specific part can be fixed by saving the connections and disconnecting them inside the FaceTable destructor.
Yes, I guess FaceTable should use a ScopedConnection for its subscriptions to the faces then.
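A sketch of that idea, illustrated here with Boost.Signals2's scoped_connection, which behaves like ndn-cxx's util::signal::ScopedConnection; the Face stand-in and its onFail signal are assumptions, not the actual NFD classes. When the FaceTable is destructed, every stored scoped_connection disconnects, so a failure signalled later can no longer reach the dead table.

```cpp
#include <boost/signals2/signal.hpp>
#include <iostream>
#include <list>

// Stand-in for a face that signals its failure (signal name is hypothetical).
struct Face
{
  boost::signals2::signal<void()> onFail;
};

class FaceTable
{
public:
  void
  add(Face& face)
  {
    // The connection is disconnected automatically when this FaceTable is
    // destructed, because scoped_connection's destructor calls disconnect().
    m_failConnections.emplace_back(
      face.onFail.connect([this] { std::cout << "face failed, remove it from " << this << "\n"; }));
  }

private:
  std::list<boost::signals2::scoped_connection> m_failConnections;
};

int
main()
{
  Face face;
  {
    FaceTable faceTable;
    faceTable.add(face);
  }              // FaceTable destructed: its subscription is dropped here
  face.onFail(); // late failure: nobody is notified, no dangling access
  return 0;
}
```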
Updated by Alex Afanasyev almost 9 years ago
- Target version changed from v0.4 to v0.5
Updated by Junxiao Shi over 8 years ago
- Status changed from Code review to In Progress
http://gerrit.named-data.net/1759 is blocked and marked -2 by Change Owner, so Status shouldn't be CodeReview.
Updated by Junxiao Shi over 8 years ago
I rebased https://gerrit.named-data.net/1759 to patchset11, but I couldn't get the test case to work.
One problem is: the FaceTable destructor expects Face::close (formerly Face::fail) to close the face immediately, but Face::close only requests the face to be closed; the actual closing is asynchronous.
Thus, the FaceTable destructor is not the right place to request closing faces.
The caller needs to request the closure (possibly through a helper method on the FaceTable class), and then wait until all faces are closed before deallocating the FaceTable.
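A rough sketch of that sequencing; closeAll() and isEmpty() are hypothetical helpers (stubbed out below), not existing FaceTable API, and the real waiting logic would of course be more involved:

```cpp
#include <boost/asio/io_service.hpp>

class FaceTable
{
public:
  void
  closeAll()
  {
    // Hypothetical helper: would call Face::close() on every face;
    // closing is asynchronous, so nothing is finished when this returns.
  }

  bool
  isEmpty() const
  {
    return true; // stub; the real check is whether any faces remain in the table
  }
};

// Caller-side shutdown: request closure first, then drive the io_service until
// the asynchronous close operations have completed, and only then let the
// FaceTable go out of scope.
void
shutdown(FaceTable& faceTable, boost::asio::io_service& io)
{
  faceTable.closeAll();
  while (!faceTable.isEmpty()) {
    if (io.run_one() == 0) {
      break; // io_service ran out of work: nothing left that could close the remaining faces
    }
  }
}
```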
Updated by Davide Pesavento over 8 years ago
Junxiao Shi wrote:
One problem is: the FaceTable destructor expects Face::close (formerly Face::fail) to close the face immediately, but Face::close only requests the face to be closed; the actual closing is asynchronous.
Correct.
Thus, the FaceTable destructor is not the right place to request closing faces.
This is only one possible conclusion. There is at least another one: the FaceTable destructor should not be expecting all faces to be closed at the end of its execution. At that point the faces can be either CLOSING, FAILED, or CLOSED. The io_service should naturally return from run() once there are no more handlers to execute. After that, all faces will be CLOSED (in fact, they will be deallocated immediately after their last handler has returned).
This also means that we need a stop()/reset() method in Nfd or NfdRunner that triggers the destruction of everything.
The caller needs to request the closure (possibly through a helper method on the FaceTable class), and then wait until all faces are closed before deallocating the FaceTable.
The problem here is that "the caller" is another destructor, so how do you "wait"? I don't know if it's a good idea to call io_service::poll() from a destructor.
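For comparison, a sketch of the sequencing proposed in this note, kept outside of any destructor; the NfdRunner class, its stop() method, and the signal handling below are illustrative assumptions rather than the code under review. stop() only requests shutdown, and run() returns by itself once the last pending handler (including those that finish closing faces) has executed.

```cpp
#include <boost/asio/io_service.hpp>
#include <boost/asio/signal_set.hpp>
#include <csignal>

class NfdRunner
{
public:
  explicit
  NfdRunner(boost::asio::io_service& io)
    : m_signals(io, SIGINT, SIGTERM)
  {
    m_signals.async_wait([this] (const boost::system::error_code& error, int) {
      if (!error)
        this->stop();
    });
  }

  void
  stop()
  {
    // Hypothetical: ask the FaceTable to close every face (asynchronously),
    // cancel scheduled events, close channels. No io_service::stop() here.
  }

private:
  boost::asio::signal_set m_signals;
};

int
main()
{
  boost::asio::io_service io;
  {
    NfdRunner runner(io);
    io.run(); // returns naturally once stop() has run and no handlers remain;
              // by then every face has reached CLOSED and has been deallocated
  }           // destructing NfdRunner is now safe: nothing is left on the io_service
  return 0;
}
```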
Updated by Alex Afanasyev almost 8 years ago
- Status changed from In Progress to New
- Assignee deleted (Alex Afanasyev)
- Target version deleted (v0.5)
- % Done changed from 100 to 0
Updated by Davide Pesavento about 6 years ago
- Subject changed from Nrd/Forwarder state is not properly cleaned up on destruction to NFD state is not properly cleaned up on destruction
- Start date deleted (02/12/2015)
Updated by Davide Pesavento about 1 year ago
- Status changed from New to Abandoned
The corresponding change was abandoned a long time ago with the following reason:
This probably not needed anymore, will revive if running to the problem