Project

General

Profile

Feature #4692

SegmentFetcher: allow applications to stop/cancel fetch

Added by Ashlesh Gawande over 2 years ago. Updated 8 months ago.

Status:
Closed
Priority:
Normal
Category:
Utils
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Tags:

Description

SegmentFetcher::start returns a shared_ptr, but this shared_ptr is kept alive internally by the SegmentFetcher instance itself, so even if the returned shared_ptr goes out of scope, the SegmentFetcher is not destructed.

In PSync #4662 (and maybe NLSR #4682) when a test completes successfully without error, ASan and valgrind reports leak due to segment fetcher start being called such that there is no expectation of data (start is called because the application is supposed to send the interest upon getting data).
The io is already stopped given that the test is successfully terminated. If advanceClocks is run for 60 seconds (maxTimeout) the segment fetcher destructs and no leaks are reported.

If a stop/cancel function is provided it can be called in the application's destructor.

Another use case is when a previous interest needs to be cancelled and a new interest needs to be sent in its place.


Files

heap-free-after-use.txt (13.7 KB) heap-free-after-use.txt Ashlesh Gawande, 09/01/2018 09:05 PM
asan (15.7 KB) asan Ashlesh Gawande, 10/02/2018 05:41 PM

Related issues

Related to ndn-cxx - Feature #4776: Re-design SegmentFetcherNew

Actions
Blocks ndn-tools - Task #4637: ndncatchunks: use SegmentFetcherNew

Actions
Blocks PSync - Feature #4662: Segment hello and sync data in partial syncClosed07/12/2018

Actions
Blocks PSync - Task #4716: Segment sync data in full syncClosed

Actions
#1

Updated by Davide Pesavento over 2 years ago

  • Description updated (diff)
  • Start date deleted (08/03/2018)
#2

Updated by Ashlesh Gawande over 2 years ago

One solution is to:

  • Stop passing shared_ptr around
  • Remove deprecated functions
#3

Updated by Davide Pesavento over 2 years ago

Ashlesh Gawande wrote:

One solution is to:

  • Stop passing shared_ptr around

Uhm no, there is a reason why we do that. SegmentFetcher needs to manage its own lifetime. You could pass weak_ptr around, instead of shared_ptr, and give ownership to the caller, but that's an incompatible API change, because current consumers expect the fetch to continue even if they don't retain the shared_ptr.
I would not disagree with changing this, but you need to find existing consumers of the new start()-based API and fix them. And you still need to cancel outstanding operations in an orderly fashion.

#4

Updated by Ashlesh Gawande over 2 years ago

1) Yes, this will affect a lot of users if they are (expectedly) relying on not managing the shared_ptr. NLSR does rely on segment fetcher managing itself. If we were to change segment fetcher as you suggested, then in the application do we need to keep track of shared_ptr for each instance to make sure it does not go out of scope and destructs? Also, can you please elaborate on "cancel outstanding operations in an orderly fashion?"

2) If it is a lot of work to change dependent applications, how about a stop() function that does the following:

  • Cancel any scheduled events
  • Manipulate (m_timeLastSegmentReceived + m_options.maxTimeout) such that it is less than time::steady_clock::now()
  • Call afterTimeoutCb

But again applications who wish to cancel any pending segment fetchers so as to release memory when they are successfully terminated need to keep track of each segment fetcher and call stop on each when destructor is called.

#5

Updated by Junxiao Shi over 2 years ago

I disagree with any breaking changes regarding requiring the calling app to manage SegmentFetcher instance lifetime.

SegmentFetcher should keep tracking of PendingInterestId* it has expressed. SegmentFetcher::cancel simply cancels all pending Interests, which stops the transfer, and naturally releases shared_ptr<SegmentFetcher> because no callback function still references it.

#6

Updated by Ashlesh Gawande over 2 years ago

What about the scheduled timeout callback, afterTimeoutCb?
https://github.com/named-data/ndn-cxx/blob/master/src/util/segment-fetcher.cpp#L148
We also need to cancel pending events from scheduler.

#7

Updated by Davide Pesavento over 2 years ago

Ashlesh Gawande wrote:

1) Yes, this will affect a lot of users if they are (expectedly) relying on not managing the shared_ptr.

Not really. I'm only talking about the users of the new start()-based API. We can ignore the users of the deprecated API since we won't be changing that. I'm not expecting many users of the new API, given that it was added not long ago and it's not in any released version of ndn-cxx.

If we were to change segment fetcher as you suggested, then in the application do we need to keep track of shared_ptr for each instance to make sure it does not go out of scope and destructs?

Correct.

Also, can you please elaborate on "cancel outstanding operations in an orderly fashion?"

Cancel pending interests and scheduled events.

#8

Updated by Davide Pesavento over 2 years ago

Junxiao Shi wrote:

I disagree with any breaking changes regarding requiring the calling app to manage SegmentFetcher instance lifetime.

Again, as I said, it wouldn't really be a breaking change since the new API hasn't appeared in any released version of ndn-cxx yet.

#9

Updated by Ashlesh Gawande over 2 years ago

So if I make the following changes to the segment fetcher:

  • Pass around weak_ptr instead of shared_ptr
  • Add a destructor where I erase pending interests and scheduled events

Then how would an application manage the memory?

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

  auto fetcher = SegmentFetcher::start(m_face, Interest(prefix), m_validator, fetcherOptions);
  m_fetchers.push_back(fetcher);
  if (processResponse) {
    fetcher->onComplete.connect([this, processResponse] (ConstBufferPtr bufferPtr) {
                                  processResponse(std::move(bufferPtr));
                                  for (auto it = m_fetchers.begin(); it != m_fetchers.end();) {
                                    if (it->use_count() == 1) {
                                      it = m_fetchers.erase(it);
                                    }
                                    else {
                                      ++it;
                                    }
                                  }
                                });
  }
  if (onFailure) {
    fetcher->onError.connect([=] (uint32_t code, const std::string& msg) {
      processDatasetFetchError(onFailure, code, msg);
    });
  }
#10

Updated by Junxiao Shi over 2 years ago

Then how would an application manage the memory?

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.

Revert that.

But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

My first idea is to capture shared_ptr into both callbacks, but this wouldn’t work: the callback is still hooked on a signal and thus would never release the shared_ptr.

My second idea can work:

struct PtrWrapper
{
  shared_ptr<SegmentFetcher> fetcher;
};

// when you start fetching
wrapper = new PtrWrapper();
wrapper.fetcher = SegmentFetcher::start(

// in a callback
delete wrapper;

Basically it turns a shared_ptr into a bare pointer. Incredibly ugly, isn’t it? This is exactly the reason that a caller shouldn’t be expected to maintain the lifetime of SegmentFetcher.
In fact, the reason to name the API SegmentFetcher::start Is to convey that the caller doesn’t need to keep the return value. If you change semantics, you should let the caller invoke SegmentFetcher constructor, and then provide a new start implementation that internally maintains the object lifetime.

My third idea: convert the API into a Promise. There are Promise libraries out there.

#11

Updated by Ashlesh Gawande over 2 years ago

Okay thanks.
I think for now, providing stop() function that does the following is quicker:

  • Cancel pending interests
  • Cancel scheduled events

Then in PSync I can have a shared_ptr member variable for segment fetcher, on which I call stop() before assigning it a new segment fetcher via start.
Also, I can provide a stop function in PSync consumer which calls stop on segment fetcher member variable, that will be called in unit test destructor so that ASan leak-sanitizer does not complain.

#12

Updated by Ashlesh Gawande over 2 years ago

  • Status changed from New to Code review
  • Assignee set to Ashlesh Gawande
  • % Done changed from 0 to 100
#13

Updated by Davide Pesavento over 2 years ago

Junxiao Shi wrote:

Basically it turns a shared_ptr into a bare pointer. Incredibly ugly, isn’t it? This is exactly the reason that a caller shouldn’t be expected to maintain the lifetime of SegmentFetcher.

Junxiao, don't spread FUD please. "maintaining the lifetime" here simply means that when the last shared_ptr goes out of scope, the segment fetcher is automatically stopped... it's not that different from the usual RAII principle.

In fact, the reason to name the API SegmentFetcher::start Is to convey that the caller doesn’t need to keep the return value.

I don't know where you got this idea from... the word "start" does not imply or even suggest anything like that.

If you change semantics, you should let the caller invoke SegmentFetcher constructor, and then provide a new start implementation that internally maintains the object lifetime.

That's not going to help with the original issue that triggered this whole discussion: the memory "leak".

#14

Updated by Davide Pesavento over 2 years ago

Ashlesh Gawande wrote:

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

You're doing something wrong... are the segment fetcher callbacks checking that their weak_ptr is still valid? I can't really tell what's going on if I don't see the code.

#15

Updated by Ashlesh Gawande over 2 years ago

Davide Pesavento wrote:

Ashlesh Gawande wrote:

In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.

You're doing something wrong... are the segment fetcher callbacks checking that their weak_ptr is still valid? I can't really tell what's going on if I don't see the code.

You are right, I was not checking if weak_ptr is still valid. Updated the same gerrit change.

#16

Updated by Davide Pesavento over 2 years ago

  • Target version set to v0.7
#17

Updated by Davide Pesavento over 2 years ago

I was wondering if we should instead have a "scoped handle" type, separate from SegmentFetcher itself, that automatically cancels the associated fetch operation when destructed. This design would be consistent with several other facilities already available in ndn-cxx, such as ScopedConnection, ScopedEventId, and soon ScopedPendingInterest (#4316).

#18

Updated by Junxiao Shi over 2 years ago

I was wondering if we should instead have a "scoped handle" type, separate from SegmentFetcher itself, that automatically cancels the associated fetch operation when destructed.

Yes. It’ll take a while before I get to that template, however.

#19

Updated by Ashlesh Gawande over 2 years ago

Problem with storing the shared_ptr as a class member in SegmentFetcher is when to destroy it? I destroy it after onComplete and after onError. I also stopped passing weak_ptr around since we have the instance in the class and simply check if the member is alive before proceeding in the functions.

All the test where data is received passes w/o any problems.

I added a test (similar to TestStatusDataset/Failures/Timeout test to see why that was producing heap-free-after-use):

BOOST_AUTO_TEST_CASE(StatusR)
{
  DummyValidator acceptValidator;
  SegmentFetcher::Options options;
  options.maxTimeout = 3000_ms;

  {
    auto fetcher = SegmentFetcher::start(face, Interest("/localhost/nfd/faces/list"),
                                         acceptValidator, options);
  }

  advanceClocks(500_ms, 7);
}

which leads to heap-free-after-use. If I don't reset the fetcher after onError, then I get memory leaks.

@Junxiao, can you please elaborate your solution once again?

#20

Updated by Davide Pesavento over 2 years ago

Ashlesh Gawande wrote:

I also stopped passing weak_ptr around since we have the instance in the class and simply check if the member is alive before proceeding in the functions.

Well this is wrong. The mere act of checking the member shared_ptr is undefined behavior, because the instance of SegmentFetcher that contains the shared_ptr might no longer exist at the time of checking, hence the use-after-free.

#21

Updated by Ashlesh Gawande over 2 years ago

I removed those checks, but still got the heap-use-after-free. So I am not sure what's wrong - I have uploaded the code.

#22

Updated by Davide Pesavento over 2 years ago

Removing the checks doesn't solve the problem I pointed out, it makes it worse. This part of your change was correct a few patch sets ago, I don't know why you decided to change it... anyway, I'll upload a new revision to show my design suggestion.

#23

Updated by Ashlesh Gawande over 2 years ago

So I think while running OutOfScopeTimeout test, we get heap-use-after-free in scheduler.cpp at while (!m_queue.empty()) {
because when we reset the shared_ptr, m_this, in onError it destructs the scheduler. But the io service is still running so it can trigger some call back in scheduler (I think).

Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?

#24

Updated by Junxiao Shi over 2 years ago

Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?

No, the API should remain unchanged.

#25

Updated by Ashlesh Gawande over 2 years ago

Junxiao Shi wrote:

Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?

No, the API should remain unchanged.

Okay, any suggestion on getting around the destruction of scheduler?
(I have to reset m_this in onError, otherwise ASan will report memory leaks).

#26

Updated by Davide Pesavento over 2 years ago

Ashlesh Gawande wrote:

Okay, any suggestion on getting around the destruction of scheduler?

Capture a shared_ptr in the scheduled callback, to ensure SegmentFetcher, and therefore Scheduler, stays alive? Scheduler is unable to guarantee its own survival, callers must ensure the instance remains valid while there are outstanding callbacks.

We've had this problem before, see e.g. #2653. And I'm starting to think that Scheduler is causing more problems than it solves...

(I have to reset m_this in onError, otherwise ASan will report memory leaks).

Do you expect the new stop() function to behave synchronously?

#27

Updated by Davide Pesavento over 2 years ago

  • Blocks Task #4637: ndncatchunks: use SegmentFetcher added
#28

Updated by Ashlesh Gawande over 2 years ago

  • Blocks Feature #4662: Segment hello and sync data in partial sync added
#29

Updated by Ashlesh Gawande over 2 years ago

  • Blocks Task #4716: Segment sync data in full sync added
#30

Updated by Ashlesh Gawande over 2 years ago

  • Status changed from Code review to Closed
#31

Updated by Ashlesh Gawande over 2 years ago

#32

Updated by Davide Pesavento 8 months ago

  • Tags set to API

Also available in: Atom PDF