Bug #2517: NFD state is not properly cleaned up on destruction - NFD - NDN project issue tracking system

Actions

Copy link

Bug #2517

closed

NFD state is not properly cleaned up on destruction

Added by Alex Afanasyev over 10 years ago. Updated almost 2 years ago.

Status:

Abandoned

Priority:

Normal

Assignee:

Category:

Core

Target version:

Start date:

Due date:

% Done:

Estimated time:

Description

In some environments (specifically Android app), it may be necessary (at least needed for now) to start/stop NFD multiple times within the same process. This requires that Nrd/Nfd destructors properly cancel everything that was posted on io_service without requiring to stop io_service.

The current way we do thing in main() is we stopping the global io_service first, after which we call the destructors. This effectively destructs everything, however there are dangling handlers posted that will be removed when the application is terminated. This approach does not work when nfd needs to be started again within the same process---whenever global io_service is started again, it will fire up a few handlers that point to deallocated memory.

Related issues 3 (0 open — 3 closed)

Actions

Copy link

Updated by Alex Afanasyev over 10 years ago

Tracker changed from Task to Bug
Subject changed from ls to Full restart of NFD inside the same process
Category set to Core
Assignee set to Alex Afanasyev
Target version set to v0.4

While attempting to start/stop NFD multiple times within the same process (without reset of scheduler and io_service) i got stuck with the problem that it is not currently supported.

The quickest way to solve the problem for me is to reset IO and scheduler. I will try to make things in a cleaner way, but if it will take too much time, I will just use the shortcut.

Actions

Copy link

Updated by Junxiao Shi over 10 years ago

Please update Description (not Notes) with steps to reproduce, expected, and actual behavior.

Actions

Copy link

Updated by Alex Afanasyev over 10 years ago

I'm working with the test case to reproduce/test the issue.

Actions

Copy link

Updated by Alex Afanasyev over 10 years ago

Status changed from New to Code review
% Done changed from 0 to 100

Actions

Copy link

Updated by Alex Afanasyev over 10 years ago

Description updated (diff)

Actions

Copy link

Updated by Junxiao Shi over 10 years ago

Blocked by Task #2489: Merge nrd into nfd added

Actions

Copy link

Updated by Junxiao Shi over 10 years ago

Nfd and Nrd types are not thread safe. The snippet calls their destructors in another thread, which could lead to undefined behavior.

Actions

Copy link

It's not "formal things". A Bug report must allow others to understand what's the problem. I cannot understand the problem from this snippet, thus it needs revising.

I don't understand the significance of this problem. Nfd never needs to restart within a process. If it crashes, the process should exit.

Actions

Copy link

#14

Updated by Alex Afanasyev over 10 years ago

Description updated (diff)

Actions

Copy link

#15

Updated by Alex Afanasyev over 10 years ago

Related to Bug #2518: Segfault when destructing face without cancelling pending interests added

Actions

Copy link

#16

Updated by Junxiao Shi over 10 years ago

This approach does not work when nfd needs to be started again within the same process---whenever global io_service is started again, it will fire up a few handlers that point to deallocated memory.

Why would NFD need to be started again within the same process? Just quit the process and let the monitor to restart it.

Actions

Copy link

#17

Updated by Alex Afanasyev over 10 years ago

I don't have an answer of "why". It is more question of purity and "correctness": I would like to have Nrd/Nfd/Forwarder classes to do proper cleanup in a proper order, which gives ability to re-start NFD within the same process if somebody ever wants to do that.

Actions

Copy link

#18

Updated by Junxiao Shi over 10 years ago

I don't have an answer of "why". It is more question of purity and "correctness": I would like to have Nrd/Nfd/Forwarder classes to do proper cleanup in a proper order, which gives ability to re-start NFD within the same process if somebody ever wants to do that.

I won't agree with this change until there's a use case.

The easy solution is: let Nfd type own the io_service instance, which is destroyed whenever Nfd is destructed. Same goes for Nrd type.

Actions

Copy link

#19

Updated by Alex Afanasyev over 10 years ago

What's a point of you blocking the change? There is no functional change and it just cleans up the order things are destructed. I can come up with any work-around, this change is purely for conceptual correctness.

Actions

Copy link

#20

Updated by Junxiao Shi over 10 years ago

I won't agree with this Change until there's a use case. Without a use case, this is "unnecessary changes".

Actions

Copy link

#21

Updated by Davide Pesavento over 10 years ago

Even without the "in-process restart", it makes sense to cleanup in destructors... I'd tend to agree with Alex here.

Actions

Copy link

#22

Updated by Junxiao Shi over 10 years ago

Then update the issue title and description to get rid of "restart".

Actions

Copy link

#23

Updated by Alex Afanasyev over 10 years ago

Davide's comment on gerrit:

Shouldn't be the channels' responsibility to close the faces though? The channels create them, I'd expect the channels to also destroy them. FaceTable is just a passive observer in this.

Junxiao's comment on gerrit:

Yes

While I'm in principle agree that FaceTable may not be a good place to "own" faces. Channels is not a good place either. First, channels are not the only place where face is created. Some faces are channel-less (e.g., multicast faces) and some faces are completely channel- and factory-less (InternalFace). The only place that has a complete knowledge about all faces is the forwarder itself and in particular FaceTable. Therefore, I would argue that FaceTable is the owner of faces and channels are just observers overseeing faces that they created.

Actions

Copy link

#24

Updated by Davide Pesavento over 10 years ago

Channels or factories it's not too important... the point is that "normal" faces are owned by channels (or factories), channels are owned by factories, factories are owned by the FaceManager. InternalFace and NullFace are constructed by Nfd and should be closed by it.

This reminds me of a thought I had a while ago, see #2300 note 8.

Also I have a question about this task description: if everything is destructed, why are there "dangling handlers"? Handlers for what specifically?

Actions

Copy link

#25

Updated by Alex Afanasyev over 10 years ago

Subject changed from Full restart of NFD inside the same process to Nrd/Forwarder state is not properly cleanup on destruction
Description updated (diff)

Actions

Copy link

#26

Updated by Junxiao Shi over 10 years ago

In some environments (specifically Android app), it may be necessary (at least needed for now) to start/stop NFD multiple times within the same process. This requires that Nrd/Nfd destructors properly cancel everything that was posted on io_service without requiring to stop io_service.

Why can't Android app destroy and recreate io_service?

Actions

Copy link

#27

Updated by Davide Pesavento over 10 years ago

Subject changed from Nrd/Forwarder state is not properly cleanup on destruction to Nrd/Forwarder state is not properly cleaned up on destruction

Actions

Copy link

#28

Updated by Alex Afanasyev over 10 years ago

I could agree with using weak_ptr instead of shared_ptr in many cases. However, this would require a lot of changes and I really don't want to do them any time soon.

As for removal and dangling stuff (this may not be entirely correct in the description). From what I can guess, Channels/Factories belong to FaceManager, which intentionally is destructed after Forwarder class (as it has to be created prior). Unless face is already "failed" while Forwarder instance is still alive, destruction of face inside Channel/Factory will result in signalling of failure to the forwarder, while the forwarder is already destructed. I guess, this specific part can be fixed by saving connections and disconnecting them inside FaceTable destructor.

Actions

Copy link

#29

Updated by Alex Afanasyev over 10 years ago

Junxiao Shi wrote:

Why can't Android app destroy and recreate io_service?

I can. The bug is not exactly about ability to "cheat" (and I'm already doing it). The bug is about correctness during the destruction process.

Actions

Copy link

#30

Updated by Davide Pesavento over 10 years ago

Alex Afanasyev wrote:

Unless face is already "failed" while Forwarder instance is still alive, destruction of face inside Channel/Factory will result in signalling of failure to the forwarder, while the forwarder is already destructed. I guess, this specific part can be fixed by saving connections and disconnecting them inside FaceTable destructor.

Yes I guess FaceTable should use a ScopedConnection for its subscriptions to the faces then.

Actions

Copy link

#31

Updated by Alex Afanasyev over 9 years ago

Target version changed from v0.4 to v0.5

Actions

Copy link

#32

Updated by Junxiao Shi over 9 years ago

Status changed from Code review to In Progress

http://gerrit.named-data.net/1759 is blocked and marked -2 by Change Owner, so Status shouldn't be CodeReview.

Actions

Copy link

#33

Updated by Junxiao Shi almost 9 years ago

I rebased https://gerrit.named-data.net/1759 to patchset11, but I couldn't get the test case to work.

One problem is: FaceTable destructor expects Face::close (formerly Face::fail) to close the face immediately, but Face::close only requests the face to be closed, the actual closing is asynchronous.
Thus, FaceTable destructor is not the right place to request closing faces.

The caller needs to request the closure (possible through a helper method on FaceTable class), and then waits until all faces are closed before deallocating FaceTable.

Actions

Copy link

#34

Updated by Davide Pesavento almost 9 years ago

Junxiao Shi wrote:

One problem is: FaceTable destructor expects Face::close (formerly Face::fail) to close the face immediately, but Face::close only requests the face to be closed, the actual closing is asynchronous.

Correct.

Thus, FaceTable destructor is not the right place to request closing faces.

This is only one possible conclusion. There is at least another one: FaceTable destructor should not be expecting all faces to be closed at the end of its execution. At that point the faces can be either CLOSING, FAILED, or CLOSED. The io_service should naturally return from run() once there are no more handlers to execute. After that, all faces will be CLOSED (in fact, they will be deallocated immediately after their last handler has returned).

This also means that we need a stop()/reset() method in Nfd or NfdRunner that triggers the destruction of everything.

The caller needs to request the closure (possible through a helper method on FaceTable class), and then waits until all faces are closed before deallocating FaceTable.

The problem here is that "the caller" is another destructor, so how do you "wait"? I don't know if it's a good idea to call io_service::poll() from a destructor.

Actions

Copy link

#35

Updated by Alex Afanasyev over 8 years ago

Status changed from In Progress to New
Assignee deleted (~~Alex Afanasyev~~)
Target version deleted (~~v0.5~~)
% Done changed from 100 to 0

Actions

Copy link

#36

Updated by Davide Pesavento over 6 years ago

Subject changed from Nrd/Forwarder state is not properly cleaned up on destruction to NFD state is not properly cleaned up on destruction
Start date deleted (~~02/12/2015~~)

Actions

Copy link

#37

Updated by Davide Pesavento almost 2 years ago

Status changed from New to Abandoned

The corresponding change has been abandoned a long time ago with the following reason:

This probably not needed anymore, will revive if running to the problem

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

NFD

Tags

Bug #2517

NFD state is not properly cleaned up on destruction

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Junxiao Shi over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Alex Afanasyev over 10 years ago

Updated by Davide Pesavento over 10 years ago

Updated by Alex Afanasyev over 9 years ago

Updated by Junxiao Shi over 9 years ago

Updated by Junxiao Shi almost 9 years ago

Updated by Davide Pesavento almost 9 years ago

Updated by Alex Afanasyev over 8 years ago

Updated by Davide Pesavento over 6 years ago

Updated by Davide Pesavento almost 2 years ago