Feature #4692
closedSegmentFetcher: allow applications to stop/cancel fetch
100%
Description
SegmentFetcher::start
returns a shared_ptr
, but this shared_ptr
is kept alive internally by the SegmentFetcher
instance itself, so even if the returned shared_ptr
goes out of scope, the SegmentFetcher
is not destructed.
In PSync #4662 (and maybe NLSR #4682) when a test completes successfully without error, ASan and valgrind reports leak due to segment fetcher start being called such that there is no expectation of data (start is called because the application is supposed to send the interest upon getting data).
The io is already stopped given that the test is successfully terminated. If advanceClocks
is run for 60 seconds (maxTimeout) the segment fetcher destructs and no leaks are reported.
If a stop/cancel function is provided it can be called in the application's destructor.
Another use case is when a previous interest needs to be cancelled and a new interest needs to be sent in its place.
Files
Updated by Davide Pesavento over 6 years ago
- Description updated (diff)
- Start date deleted (
08/03/2018)
Updated by Ashlesh Gawande over 6 years ago
One solution is to:
- Stop passing shared_ptr around
- Remove deprecated functions
Updated by Davide Pesavento over 6 years ago
Ashlesh Gawande wrote:
One solution is to:
- Stop passing shared_ptr around
Uhm no, there is a reason why we do that. SegmentFetcher needs to manage its own lifetime. You could pass weak_ptr
around, instead of shared_ptr
, and give ownership to the caller, but that's an incompatible API change, because current consumers expect the fetch to continue even if they don't retain the shared_ptr
.
I would not disagree with changing this, but you need to find existing consumers of the new start()
-based API and fix them. And you still need to cancel outstanding operations in an orderly fashion.
Updated by Ashlesh Gawande over 6 years ago
1) Yes, this will affect a lot of users if they are (expectedly) relying on not managing the shared_ptr. NLSR does rely on segment fetcher managing itself. If we were to change segment fetcher as you suggested, then in the application do we need to keep track of shared_ptr for each instance to make sure it does not go out of scope and destructs? Also, can you please elaborate on "cancel outstanding operations in an orderly fashion?"
2) If it is a lot of work to change dependent applications, how about a stop() function that does the following:
- Cancel any scheduled events
- Manipulate (m_timeLastSegmentReceived + m_options.maxTimeout) such that it is less than time::steady_clock::now()
- Call afterTimeoutCb
But again applications who wish to cancel any pending segment fetchers so as to release memory when they are successfully terminated need to keep track of each segment fetcher and call stop on each when destructor is called.
Updated by Junxiao Shi over 6 years ago
I disagree with any breaking changes regarding requiring the calling app to manage SegmentFetcher
instance lifetime.
SegmentFetcher should keep tracking of PendingInterestId*
it has expressed. SegmentFetcher::cancel
simply cancels all pending Interests, which stops the transfer, and naturally releases shared_ptr<SegmentFetcher>
because no callback function still references it.
Updated by Ashlesh Gawande over 6 years ago
What about the scheduled timeout callback, afterTimeoutCb?
https://github.com/named-data/ndn-cxx/blob/master/src/util/segment-fetcher.cpp#L148
We also need to cancel pending events from scheduler.
Updated by Davide Pesavento over 6 years ago
Ashlesh Gawande wrote:
1) Yes, this will affect a lot of users if they are (expectedly) relying on not managing the shared_ptr.
Not really. I'm only talking about the users of the new start()
-based API. We can ignore the users of the deprecated API since we won't be changing that. I'm not expecting many users of the new API, given that it was added not long ago and it's not in any released version of ndn-cxx.
If we were to change segment fetcher as you suggested, then in the application do we need to keep track of shared_ptr for each instance to make sure it does not go out of scope and destructs?
Correct.
Also, can you please elaborate on "cancel outstanding operations in an orderly fashion?"
Cancel pending interests and scheduled events.
Updated by Davide Pesavento over 6 years ago
Junxiao Shi wrote:
I disagree with any breaking changes regarding requiring the calling app to manage
SegmentFetcher
instance lifetime.
Again, as I said, it wouldn't really be a breaking change since the new API hasn't appeared in any released version of ndn-cxx yet.
Updated by Ashlesh Gawande over 6 years ago
So if I make the following changes to the segment fetcher:
- Pass around weak_ptr instead of shared_ptr
- Add a destructor where I erase pending interests and scheduled events
Then how would an application manage the memory?
In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.
auto fetcher = SegmentFetcher::start(m_face, Interest(prefix), m_validator, fetcherOptions);
m_fetchers.push_back(fetcher);
if (processResponse) {
fetcher->onComplete.connect([this, processResponse] (ConstBufferPtr bufferPtr) {
processResponse(std::move(bufferPtr));
for (auto it = m_fetchers.begin(); it != m_fetchers.end();) {
if (it->use_count() == 1) {
it = m_fetchers.erase(it);
}
else {
++it;
}
}
});
}
if (onFailure) {
fetcher->onError.connect([=] (uint32_t code, const std::string& msg) {
processDatasetFetchError(onFailure, code, msg);
});
}
Updated by Junxiao Shi over 6 years ago
Then how would an application manage the memory?
In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
Revert that.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.
My first idea is to capture shared_ptr into both callbacks, but this wouldn’t work: the callback is still hooked on a signal and thus would never release the shared_ptr.
My second idea can work:
struct PtrWrapper
{
shared_ptr<SegmentFetcher> fetcher;
};
// when you start fetching
wrapper = new PtrWrapper();
wrapper.fetcher = SegmentFetcher::start(
// in a callback
delete wrapper;
Basically it turns a shared_ptr into a bare pointer. Incredibly ugly, isn’t it? This is exactly the reason that a caller shouldn’t be expected to maintain the lifetime of SegmentFetcher.
In fact, the reason to name the API SegmentFetcher::start
Is to convey that the caller doesn’t need to keep the return value. If you change semantics, you should let the caller invoke SegmentFetcher constructor, and then provide a new start
implementation that internally maintains the object lifetime.
My third idea: convert the API into a Promise. There are Promise libraries out there.
Updated by Ashlesh Gawande over 6 years ago
Okay thanks.
I think for now, providing stop() function that does the following is quicker:
- Cancel pending interests
- Cancel scheduled events
Then in PSync I can have a shared_ptr member variable for segment fetcher, on which I call stop() before assigning it a new segment fetcher via start.
Also, I can provide a stop function in PSync consumer which calls stop on segment fetcher member variable, that will be called in unit test destructor so that ASan leak-sanitizer does not complain.
Updated by Ashlesh Gawande over 6 years ago
- Status changed from New to Code review
- Assignee set to Ashlesh Gawande
- % Done changed from 0 to 100
Updated by Davide Pesavento over 6 years ago
Junxiao Shi wrote:
Basically it turns a shared_ptr into a bare pointer. Incredibly ugly, isn’t it? This is exactly the reason that a caller shouldn’t be expected to maintain the lifetime of SegmentFetcher.
Junxiao, don't spread FUD please. "maintaining the lifetime" here simply means that when the last shared_ptr
goes out of scope, the segment fetcher is automatically stopped... it's not that different from the usual RAII principle.
In fact, the reason to name the API
SegmentFetcher::start
Is to convey that the caller doesn’t need to keep the return value.
I don't know where you got this idea from... the word "start" does not imply or even suggest anything like that.
If you change semantics, you should let the caller invoke SegmentFetcher constructor, and then provide a new
start
implementation that internally maintains the object lifetime.
That's not going to help with the original issue that triggered this whole discussion: the memory "leak".
Updated by Davide Pesavento over 6 years ago
Ashlesh Gawande wrote:
In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.
You're doing something wrong... are the segment fetcher callbacks checking that their weak_ptr
is still valid? I can't really tell what's going on if I don't see the code.
Updated by Ashlesh Gawande over 6 years ago
Davide Pesavento wrote:
Ashlesh Gawande wrote:
In mgmt/nfd/controller, the user of the new API, I added a vector to store the shared_ptr.
But I cannot erase the stored shared_ptr in the call back connected to the shared_ptr w/o heap-use-after-free.You're doing something wrong... are the segment fetcher callbacks checking that their
weak_ptr
is still valid? I can't really tell what's going on if I don't see the code.
You are right, I was not checking if weak_ptr
is still valid. Updated the same gerrit change.
Updated by Davide Pesavento about 6 years ago
I was wondering if we should instead have a "scoped handle" type, separate from SegmentFetcher
itself, that automatically cancels the associated fetch operation when destructed. This design would be consistent with several other facilities already available in ndn-cxx, such as ScopedConnection
, ScopedEventId
, and soon ScopedPendingInterest
(#4316).
Updated by Junxiao Shi about 6 years ago
I was wondering if we should instead have a "scoped handle" type, separate from
SegmentFetcher
itself, that automatically cancels the associated fetch operation when destructed.
Yes. It’ll take a while before I get to that template, however.
Updated by Ashlesh Gawande about 6 years ago
- File heap-free-after-use.txt heap-free-after-use.txt added
Problem with storing the shared_ptr as a class member in SegmentFetcher is when to destroy it? I destroy it after onComplete and after onError. I also stopped passing weak_ptr around since we have the instance in the class and simply check if the member is alive before proceeding in the functions.
All the test where data is received passes w/o any problems.
I added a test (similar to TestStatusDataset/Failures/Timeout test to see why that was producing heap-free-after-use):
BOOST_AUTO_TEST_CASE(StatusR)
{
DummyValidator acceptValidator;
SegmentFetcher::Options options;
options.maxTimeout = 3000_ms;
{
auto fetcher = SegmentFetcher::start(face, Interest("/localhost/nfd/faces/list"),
acceptValidator, options);
}
advanceClocks(500_ms, 7);
}
which leads to heap-free-after-use. If I don't reset the fetcher after onError, then I get memory leaks.
@Junxiao, can you please elaborate your solution once again?
Updated by Davide Pesavento about 6 years ago
Ashlesh Gawande wrote:
I also stopped passing weak_ptr around since we have the instance in the class and simply check if the member is alive before proceeding in the functions.
Well this is wrong. The mere act of checking the member shared_ptr
is undefined behavior, because the instance of SegmentFetcher that contains the shared_ptr
might no longer exist at the time of checking, hence the use-after-free.
Updated by Ashlesh Gawande about 6 years ago
I removed those checks, but still got the heap-use-after-free. So I am not sure what's wrong - I have uploaded the code.
Updated by Davide Pesavento about 6 years ago
Removing the checks doesn't solve the problem I pointed out, it makes it worse. This part of your change was correct a few patch sets ago, I don't know why you decided to change it... anyway, I'll upload a new revision to show my design suggestion.
Updated by Ashlesh Gawande about 6 years ago
So I think while running OutOfScopeTimeout test, we get heap-use-after-free in scheduler.cpp at while (!m_queue.empty()) {
because when we reset the shared_ptr, m_this, in onError it destructs the scheduler. But the io service is still running so it can trigger some call back in scheduler (I think).
Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?
Updated by Junxiao Shi about 6 years ago
Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?
No, the API should remain unchanged.
Updated by Ashlesh Gawande about 6 years ago
Junxiao Shi wrote:
Could a potential solution be to let users pass a scheduler to the segment fetcher (not tested yet)?
No, the API should remain unchanged.
Okay, any suggestion on getting around the destruction of scheduler?
(I have to reset m_this in onError, otherwise ASan will report memory leaks).
Updated by Davide Pesavento about 6 years ago
Ashlesh Gawande wrote:
Okay, any suggestion on getting around the destruction of scheduler?
Capture a shared_ptr
in the scheduled callback, to ensure SegmentFetcher
, and therefore Scheduler
, stays alive? Scheduler
is unable to guarantee its own survival, callers must ensure the instance remains valid while there are outstanding callbacks.
We've had this problem before, see e.g. #2653. And I'm starting to think that Scheduler
is causing more problems than it solves...
(I have to reset m_this in onError, otherwise ASan will report memory leaks).
Do you expect the new stop()
function to behave synchronously?
Updated by Davide Pesavento about 6 years ago
- Blocks Task #4637: ndncatchunks: use SegmentFetcher added
Updated by Ashlesh Gawande about 6 years ago
- Blocks Feature #4662: Segment hello and sync data in partial sync added
Updated by Ashlesh Gawande about 6 years ago
- Blocks Task #4716: Segment sync data in full sync added
Updated by Ashlesh Gawande about 6 years ago
- Status changed from Code review to Closed
Updated by Ashlesh Gawande about 6 years ago
- Related to Feature #4776: Re-design SegmentFetcher added