Bug #1856: NFD crash with assert failure from StreamFace - NFD - NDN project issue tracking system

Bug #1856

closed

NFD crash with assert failure from StreamFace

Added by Alex Afanasyev over 11 years ago. Updated over 11 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Alex Afanasyev

Category:

Faces

Target version:

v0.3

Start date:

08/13/2014

Due date:

% Done:

100%

Estimated time:

Description

I just observed nfd crash with the following error:

Assertion failed: (!m_sendQueue.empty()), function handleSend, file ../daemon/face/stream-face.hpp, line 265.

History
Notes
Property changes

Actions

Copy link

Updated by Junxiao Shi over 11 years ago

This condition could happen if closeSocket is invoked before handleSend.

Actions

Copy link

Updated by Alex Afanasyev over 11 years ago

Hmm... That should be "impossible". At least I have assumed that after m_socket->close(error); handleSend should be called with some error code (boost::system::errc::operation_canceled) and not get asserted...

Should I simply change assert to a condition and that would fix the problem?

Actions

Copy link

Updated by Junxiao Shi over 11 years ago

Make a test case to confirm or reject the hypothesis that handleSend would be called with error code if closeSocket is invoked.
Don't change anything without finding the reason.

Actions

Copy link

Updated by Alex Afanasyev over 11 years ago

Target version changed from v0.2 to v0.3

Actions

Copy link

Updated by Alex Afanasyev over 11 years ago

Assignee deleted (~~Alex Afanasyev~~)

Given that it is not clear how this issue happens, more investigations needed before we can make a solution.

Actions

Copy link

Updated by Alex Afanasyev over 11 years ago

Status changed from New to In Progress
Assignee set to Alex Afanasyev
% Done changed from 0 to 80

I was able to reproduce the problem with test cases... A weird one, but solvable.

Actions

Copy link

Updated by Alex Afanasyev over 11 years ago

Status changed from In Progress to Code review
% Done changed from 80 to 100

The issue relates to some bug or feature of Boost.Asio.

When closeSocket is called from within socket event (e.g., async_receive callback), m_socket->shutdown/cancel/close() does not trigger cancellation of send callback. Instead, the send callback will be called as nothing bad happened... In test case, I was able to trigger such event by sending an Interest that larger than MAX_NDN_PACKET_SIZE.

To avoid problems, queue is cleaned after all pending callbacks are fired. handleSend will not assert, as the scheduled callback, if any, will immediately return with error code = invalid handler.

Actions

Copy link

Updated by Davide Pesavento over 11 years ago

Alex Afanasyev wrote:

To avoid problems, queue is cleaned after all pending callbacks are fired. handleSend will not assert, as the scheduled callback, if any, will immediately return with error code = invalid handler.

I'm not sure I understand the last part.

handleSend no longer asserts because the queue is not cleared immediately in closeSocket, but only after all pending callbacks have been executed. Therefore handleSend will try to schedule the next async_write (assuming more blocks are in the queue) on a closed socket and fail. Is this correct? What is this "invalid handler" thing?

Actions

Copy link

Updated by Alex Afanasyev over 11 years ago

Ok. Let me give a sequence what I was observing:

normal handleSend that schedules async_write with handleSend callback
async_receive callback calls closeSocket, socket close, shutdown
normal handleSend that schedules async_write with handleSend callback

This is abnormal and was not expected

At this point, socket is no longer valid, so the scheduled async_write will fail
queue is getting cleaned somewhere here
handleSend with error code 9 (= invalid handler or something like that)

Actions

Copy link

#10

Updated by Davide Pesavento over 11 years ago

This happens after applying your patch?

Actions

Copy link

#11

Updated by Alex Afanasyev over 11 years ago

Yes. These after the patch. Before the patch, queue cleanup happens at step 2 and step 3 causes assertion fault.

Actions

Copy link

#12

Updated by Davide Pesavento over 11 years ago

Ok makes sense. Error code 9 is actually EBADF "bad file descriptor", which again makes sense since the fd becomes invalid when the socket is closed.

Actions

Copy link

#13

Updated by Junxiao Shi over 11 years ago

Boost.Asio problem should be reported to Boost, and Boost ticket should be referenced here.

It's unsafe to send into a bad file descriptor.
Kernel can reuse a file descriptor after it's closed.
If something is sent into a closed and then reused file descriptor, it may go into another socket (and cause a decode error in the app).

Actions

Copy link

#14

Updated by Davide Pesavento over 11 years ago

Junxiao Shi wrote:

It's unsafe to send into a bad file descriptor. Kernel can reuse a file descriptor after it's closed.

This is true, although it's unlikely to happen. Anyway, I've submitted an alternative solution as patch set 5 at http://gerrit.named-data.net/1211 which does not suffer from this issue.

Actions

Copy link

#15

Updated by Davide Pesavento over 11 years ago

Status changed from Code review to Closed

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

NFD

Tags

Bug #1856

NFD crash with assert failure from StreamFace

Updated by Junxiao Shi over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Junxiao Shi over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Alex Afanasyev over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Junxiao Shi over 11 years ago

Updated by Davide Pesavento over 11 years ago

Updated by Davide Pesavento over 11 years ago