Feature #4692: SegmentFetcher: allow applications to stop/cancel fetch - ndn-cxx - NDN project issue tracking system

Actions

Copy link

Feature #4692

closed

SegmentFetcher: allow applications to stop/cancel fetch

Added by Ashlesh Gawande almost 7 years ago. Updated about 5 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Ashlesh Gawande

Category:

Utils

Target version:

v0.7

Start date:

Due date:

% Done:

100%

Estimated time:

Tags:

API

Description

SegmentFetcher::start returns a shared_ptr, but this shared_ptr is kept alive internally by the SegmentFetcher instance itself, so even if the returned shared_ptr goes out of scope, the SegmentFetcher is not destructed.

In PSync #4662 (and maybe NLSR #4682) when a test completes successfully without error, ASan and valgrind reports leak due to segment fetcher start being called such that there is no expectation of data (start is called because the application is supposed to send the interest upon getting data).
The io is already stopped given that the test is successfully terminated. If advanceClocks is run for 60 seconds (maxTimeout) the segment fetcher destructs and no leaks are reported.

If a stop/cancel function is provided it can be called in the application's destructor.

Another use case is when a previous interest needs to be cancelled and a new interest needs to be sent in its place.

Files

Download all files

heap-free-after-use.txt (13.7 KB) heap-free-after-use.txt		Ashlesh Gawande, 09/01/2018 09:05 PM
asan (15.7 KB) asan		Ashlesh Gawande, 10/02/2018 05:41 PM

Related issues 4 (2 open — 2 closed)

Actions

Copy link

Updated by Davide Pesavento almost 7 years ago

Description updated (diff)
Start date deleted (~~08/03/2018~~)

Actions

Copy link

Updated by Ashlesh Gawande almost 7 years ago

One solution is to:

Stop passing shared_ptr around
Remove deprecated functions

Actions

Copy link

Updated by Davide Pesavento almost 7 years ago

Ashlesh Gawande wrote:

One solution is to:

Stop passing shared_ptr around

Uhm no, there is a reason why we do that. SegmentFetcher needs to manage its own lifetime. You could pass weak_ptr around, instead of shared_ptr, and give ownership to the caller, but that's an incompatible API change, because current consumers expect the fetch to continue even if they don't retain the shared_ptr.
I would not disagree with changing this, but you need to find existing consumers of the new start()-based API and fix them. And you still need to cancel outstanding operations in an orderly fashion.

Actions

Copy link

Updated by Ashlesh Gawande almost 7 years ago

1) Yes, this will affect a lot of users if they are (expectedly) relying on not managing the shared_ptr. NLSR does rely on segment fetcher managing itself. If we were to change segment fetcher as you suggested, then in the application do we need to keep track of shared_ptr for each instance to make sure it does not go out of scope and destructs? Also, can you please elaborate on "cancel outstanding operations in an orderly fashion?"

2) If it is a lot of work to change dependent applications, how about a stop() function that does the following:

Cancel any scheduled events
Manipulate (m_timeLastSegmentReceived + m_options.maxTimeout) such that it is less than time::steady_clock::now()
Call afterTimeoutCb

But again applications who wish to cancel any pending segment fetchers so as to release memory when they are successfully terminated need to keep track of each segment fetcher and call stop on each when destructor is called.

Actions

Copy link

Updated by Junxiao Shi almost 7 years ago

I disagree with any breaking changes regarding requiring the calling app to manage SegmentFetcher instance lifetime.

SegmentFetcher should keep tracking of PendingInterestId* it has expressed. SegmentFetcher::cancel simply cancels all pending Interests, which stops the transfer, and naturally releases shared_ptr<SegmentFetcher> because no callback function still references it.

Actions

Copy link

Updated by Ashlesh Gawande almost 7 years ago

What about the scheduled timeout callback, afterTimeoutCb?
https://github.com/named-data/ndn-cxx/blob/master/src/util/segment-fetcher.cpp#L148
We also need to cancel pending events from scheduler.

Actions

Copy link

Updated by Davide Pesavento almost 7 years ago

Ashlesh Gawande wrote:

1) Yes, this will affect a lot of users if they are (expectedly) relying on not managing the shared_ptr.

Not really. I'm only talking about the users of the new start()-based API. We can ignore the users of the deprecated API since we won't be changing that. I'm not expecting many users of the new API, given that it was added not long ago and it's not in any released version of ndn-cxx.

If we were to change segment fetcher as you suggested, then in the application do we need to keep track of shared_ptr for each instance to make sure it does not go out of scope and destructs?

Correct.

Also, can you please elaborate on "cancel outstanding operations in an orderly fashion?"

Cancel pending interests and scheduled events.

Actions

Copy link

Updated by Davide Pesavento almost 7 years ago

Junxiao Shi wrote:

I disagree with any breaking changes regarding requiring the calling app to manage SegmentFetcher instance lifetime.

Again, as I said, it wouldn't really be a breaking change since the new API hasn't appeared in any released version of ndn-cxx yet.

Actions

Copy link

Updated by Ashlesh Gawande almost 7 years ago

So if I make the following changes to the segment fetcher:

Pass around weak_ptr instead of shared_ptr
Add a destructor where I erase pending interests and scheduled events

Then how would an application manage the memory?

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

  auto fetcher = SegmentFetcher::start(m_face, Interest(prefix), m_validator, fetcherOptions);
  m_fetchers.push_back(fetcher);
  if (processResponse) {
    fetcher->onComplete.connect([this, processResponse] (ConstBufferPtr bufferPtr) {
                                  processResponse(std::move(bufferPtr));
                                  for (auto it = m_fetchers.begin(); it != m_fetchers.end();) {
                                    if (it->use_count() == 1) {
                                      it = m_fetchers.erase(it);
                                    }
                                    else {
                                      ++it;
                                    }
                                  }
                                });
  }
  if (onFailure) {
    fetcher->onError.connect([=] (uint32_t code, const std::string& msg) {
      processDatasetFetchError(onFailure, code, msg);
    });
  }

Actions

Copy link

#10

Updated by Junxiao Shi almost 7 years ago

Then how would an application manage the memory?

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.

Revert that.

But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

My first idea is to capture shared_ptr into both callbacks, but this wouldn’t work: the callback is still hooked on a signal and thus would never release the shared_ptr.

My second idea can work:

struct PtrWrapper
{
  shared_ptr<SegmentFetcher> fetcher;
};

// when you start fetching
wrapper = new PtrWrapper();
wrapper.fetcher = SegmentFetcher::start(

// in a callback
delete wrapper;

Basically it turns a shared_ptr into a bare pointer. Incredibly ugly, isn’t it? This is exactly the reason that a caller shouldn’t be expected to maintain the lifetime of SegmentFetcher.
In fact, the reason to name the API SegmentFetcher::start Is to convey that the caller doesn’t need to keep the return value. If you change semantics, you should let the caller invoke SegmentFetcher constructor, and then provide a new start implementation that internally maintains the object lifetime.

My third idea: convert the API into a Promise. There are Promise libraries out there.

Actions

Copy link

#11

Updated by Ashlesh Gawande almost 7 years ago

Okay thanks.
I think for now, providing stop() function that does the following is quicker:

Cancel pending interests
Cancel scheduled events

Then in PSync I can have a shared_ptr member variable for segment fetcher, on which I call stop() before assigning it a new segment fetcher via start.
Also, I can provide a stop function in PSync consumer which calls stop on segment fetcher member variable, that will be called in unit test destructor so that ASan leak-sanitizer does not complain.

Actions

Copy link

#12

Updated by Ashlesh Gawande almost 7 years ago

Status changed from New to Code review
Assignee set to Ashlesh Gawande
% Done changed from 0 to 100

Actions

Copy link

#13

Updated by Davide Pesavento almost 7 years ago

Junxiao Shi wrote:

Basically it turns a shared_ptr into a bare pointer. Incredibly ugly, isn’t it? This is exactly the reason that a caller shouldn’t be expected to maintain the lifetime of SegmentFetcher.

Junxiao, don't spread FUD please. "maintaining the lifetime" here simply means that when the last shared_ptr goes out of scope, the segment fetcher is automatically stopped... it's not that different from the usual RAII principle.

In fact, the reason to name the API SegmentFetcher::start Is to convey that the caller doesn’t need to keep the return value.

I don't know where you got this idea from... the word "start" does not imply or even suggest anything like that.

If you change semantics, you should let the caller invoke SegmentFetcher constructor, and then provide a new start implementation that internally maintains the object lifetime.

That's not going to help with the original issue that triggered this whole discussion: the memory "leak".

Actions

Copy link

#14

Updated by Davide Pesavento almost 7 years ago

Ashlesh Gawande wrote:

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

You're doing something wrong... are the segment fetcher callbacks checking that their weak_ptr is still valid? I can't really tell what's going on if I don't see the code.

Actions

Copy link

#15

Updated by Ashlesh Gawande almost 7 years ago

Davide Pesavento wrote:

Ashlesh Gawande wrote:

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

You're doing something wrong... are the segment fetcher callbacks checking that their weak_ptr is still valid? I can't really tell what's going on if I don't see the code.

You are right, I was not checking if weak_ptr is still valid. Updated the same gerrit change.

Actions

Copy link

#16

Updated by Davide Pesavento almost 7 years ago

Target version set to v0.7

Actions

Copy link

#17

Updated by Davide Pesavento almost 7 years ago

I was wondering if we should instead have a "scoped handle" type, separate from SegmentFetcher itself, that automatically cancels the associated fetch operation when destructed. This design would be consistent with several other facilities already available in ndn-cxx, such as ScopedConnection, ScopedEventId, and soon ScopedPendingInterest (#4316).

Actions

Copy link

#18

Updated by Junxiao Shi almost 7 years ago

I was wondering if we should instead have a "scoped handle" type, separate from SegmentFetcher itself, that automatically cancels the associated fetch operation when destructed.

Yes. It’ll take a while before I get to that template, however.

Actions

Copy link

#19

Updated by Ashlesh Gawande almost 7 years ago

File heap-free-after-use.txt heap-free-after-use.txt added

Problem with storing the shared_ptr as a class member in SegmentFetcher is when to destroy it? I destroy it after onComplete and after onError. I also stopped passing weak_ptr around since we have the instance in the class and simply check if the member is alive before proceeding in the functions.

All the test where data is received passes w/o any problems.

I added a test (similar to TestStatusDataset/Failures/Timeout test to see why that was producing heap-free-after-use):

BOOST_AUTO_TEST_CASE(StatusR)
{
  DummyValidator acceptValidator;
  SegmentFetcher::Options options;
  options.maxTimeout = 3000_ms;

  {
    auto fetcher = SegmentFetcher::start(face, Interest("/localhost/nfd/faces/list"),
                                         acceptValidator, options);
  }

  advanceClocks(500_ms, 7);
}

which leads to heap-free-after-use. If I don't reset the fetcher after onError, then I get memory leaks.

@Junxiao, can you please elaborate your solution once again?

Actions

Copy link

#20

Updated by Davide Pesavento almost 7 years ago

Ashlesh Gawande wrote:

I also stopped passing weak_ptr around since we have the instance in the class and simply check if the member is alive before proceeding in the functions.

Well this is wrong. The mere act of checking the member shared_ptr is undefined behavior, because the instance of SegmentFetcher that contains the shared_ptr might no longer exist at the time of checking, hence the use-after-free.

Actions

Copy link

#21

Updated by Ashlesh Gawande almost 7 years ago

I removed those checks, but still got the heap-use-after-free. So I am not sure what's wrong - I have uploaded the code.

Actions

Copy link

#22

Updated by Davide Pesavento almost 7 years ago

Removing the checks doesn't solve the problem I pointed out, it makes it worse. This part of your change was correct a few patch sets ago, I don't know why you decided to change it... anyway, I'll upload a new revision to show my design suggestion.

Actions

Copy link

#23

Updated by Ashlesh Gawande almost 7 years ago

File asan asan added

So I think while running OutOfScopeTimeout test, we get heap-use-after-free in scheduler.cpp at while (!m_queue.empty()) {
because when we reset the shared_ptr, m_this, in onError it destructs the scheduler. But the io service is still running so it can trigger some call back in scheduler (I think).

Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?

Actions

Copy link

#24

Updated by Junxiao Shi almost 7 years ago

Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?

No, the API should remain unchanged.

Actions

Copy link

#25

Updated by Ashlesh Gawande almost 7 years ago

Junxiao Shi wrote:

Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?

No, the API should remain unchanged.

Okay, any suggestion on getting around the destruction of scheduler?
(I have to reset m_this in onError, otherwise ASan will report memory leaks).

Actions

Copy link

#26

Updated by Davide Pesavento almost 7 years ago

Ashlesh Gawande wrote:

Okay, any suggestion on getting around the destruction of scheduler?

Capture a shared_ptr in the scheduled callback, to ensure SegmentFetcher, and therefore Scheduler, stays alive? Scheduler is unable to guarantee its own survival, callers must ensure the instance remains valid while there are outstanding callbacks.

We've had this problem before, see e.g. #2653. And I'm starting to think that Scheduler is causing more problems than it solves...