Bug #1856
closedNFD crash with assert failure from StreamFace
100%
Description
I just observed nfd crash with the following error:
Assertion failed: (!m_sendQueue.empty()), function handleSend, file ../daemon/face/stream-face.hpp, line 265.
Updated by Junxiao Shi over 10 years ago
This condition could happen if closeSocket is invoked before handleSend.
Updated by Alex Afanasyev over 10 years ago
Hmm... That should be "impossible". At least I have assumed that after m_socket->close(error);
handleSend
should be called with some error code (boost::system::errc::operation_canceled
) and not get asserted...
Should I simply change assert to a condition and that would fix the problem?
Updated by Junxiao Shi over 10 years ago
Make a test case to confirm or reject the hypothesis that handleSend would be called with error code if closeSocket is invoked.
Don't change anything without finding the reason.
Updated by Alex Afanasyev over 10 years ago
- Target version changed from v0.2 to v0.3
Updated by Alex Afanasyev over 10 years ago
- Assignee deleted (
Alex Afanasyev)
Given that it is not clear how this issue happens, more investigations needed before we can make a solution.
Updated by Alex Afanasyev over 10 years ago
- Status changed from New to In Progress
- Assignee set to Alex Afanasyev
- % Done changed from 0 to 80
I was able to reproduce the problem with test cases... A weird one, but solvable.
Updated by Alex Afanasyev over 10 years ago
- Status changed from In Progress to Code review
- % Done changed from 80 to 100
The issue relates to some bug or feature of Boost.Asio.
When closeSocket is called from within socket event (e.g., async_receive
callback), m_socket->shutdown/cancel/close()
does not trigger cancellation of send callback. Instead, the send callback will be called as nothing bad happened... In test case, I was able to trigger such event by sending an Interest that larger than MAX_NDN_PACKET_SIZE
.
To avoid problems, queue is cleaned after all pending callbacks are fired. handleSend will not assert, as the scheduled callback, if any, will immediately return with error code = invalid handler.
Updated by Davide Pesavento over 10 years ago
Alex Afanasyev wrote:
To avoid problems, queue is cleaned after all pending callbacks are fired. handleSend will not assert, as the scheduled callback, if any, will immediately return with error code = invalid handler.
I'm not sure I understand the last part.
handleSend
no longer asserts because the queue is not cleared immediately in closeSocket
, but only after all pending callbacks have been executed. Therefore handleSend
will try to schedule the next async_write
(assuming more blocks are in the queue) on a closed socket and fail. Is this correct? What is this "invalid handler" thing?
Updated by Alex Afanasyev over 10 years ago
Ok. Let me give a sequence what I was observing:
- normal handleSend that schedules async_write with handleSend callback
- async_receive callback calls closeSocket, socket close, shutdown
normal handleSend that schedules async_write with handleSend callback
This is abnormal and was not expected
At this point, socket is no longer valid, so the scheduled async_write will fail
queue is getting cleaned somewhere here
handleSend with error code 9 (= invalid handler or something like that)
Updated by Davide Pesavento over 10 years ago
This happens after applying your patch?
Updated by Alex Afanasyev over 10 years ago
Yes. These after the patch. Before the patch, queue cleanup happens at step 2 and step 3 causes assertion fault.
Updated by Davide Pesavento over 10 years ago
Ok makes sense. Error code 9 is actually EBADF "bad file descriptor", which again makes sense since the fd becomes invalid when the socket is closed.
Updated by Junxiao Shi over 10 years ago
Boost.Asio problem should be reported to Boost, and Boost ticket should be referenced here.
It's unsafe to send into a bad file descriptor.
Kernel can reuse a file descriptor after it's closed.
If something is sent into a closed and then reused file descriptor, it may go into another socket (and cause a decode error in the app).
Updated by Davide Pesavento over 10 years ago
Junxiao Shi wrote:
It's unsafe to send into a bad file descriptor. Kernel can reuse a file descriptor after it's closed.
This is true, although it's unlikely to happen. Anyway, I've submitted an alternative solution as patch set 5 at http://gerrit.named-data.net/1211 which does not suffer from this issue.
Updated by Davide Pesavento over 10 years ago
- Status changed from Code review to Closed