Project

General

Profile

Actions

Bug #1856

closed

NFD crash with assert failure from StreamFace

Added by Alex Afanasyev over 10 years ago. Updated over 10 years ago.

Status:
Closed
Priority:
Normal
Category:
Faces
Target version:
Start date:
08/13/2014
Due date:
% Done:

100%

Estimated time:

Description

I just observed nfd crash with the following error:

Assertion failed: (!m_sendQueue.empty()), function handleSend, file ../daemon/face/stream-face.hpp, line 265.
Actions #1

Updated by Junxiao Shi over 10 years ago

This condition could happen if closeSocket is invoked before handleSend.

Actions #2

Updated by Alex Afanasyev over 10 years ago

Hmm... That should be "impossible". At least I have assumed that after m_socket->close(error); handleSend should be called with some error code (boost::system::errc::operation_canceled) and not get asserted...

Should I simply change assert to a condition and that would fix the problem?

Actions #3

Updated by Junxiao Shi over 10 years ago

Make a test case to confirm or reject the hypothesis that handleSend would be called with error code if closeSocket is invoked.
Don't change anything without finding the reason.

Actions #4

Updated by Alex Afanasyev over 10 years ago

  • Target version changed from v0.2 to v0.3
Actions #5

Updated by Alex Afanasyev over 10 years ago

  • Assignee deleted (Alex Afanasyev)

Given that it is not clear how this issue happens, more investigations needed before we can make a solution.

Actions #6

Updated by Alex Afanasyev over 10 years ago

  • Status changed from New to In Progress
  • Assignee set to Alex Afanasyev
  • % Done changed from 0 to 80

I was able to reproduce the problem with test cases... A weird one, but solvable.

Actions #7

Updated by Alex Afanasyev over 10 years ago

  • Status changed from In Progress to Code review
  • % Done changed from 80 to 100

The issue relates to some bug or feature of Boost.Asio.

When closeSocket is called from within socket event (e.g., async_receive callback), m_socket->shutdown/cancel/close() does not trigger cancellation of send callback. Instead, the send callback will be called as nothing bad happened... In test case, I was able to trigger such event by sending an Interest that larger than MAX_NDN_PACKET_SIZE.

To avoid problems, queue is cleaned after all pending callbacks are fired. handleSend will not assert, as the scheduled callback, if any, will immediately return with error code = invalid handler.

Actions #8

Updated by Davide Pesavento over 10 years ago

Alex Afanasyev wrote:

To avoid problems, queue is cleaned after all pending callbacks are fired. handleSend will not assert, as the scheduled callback, if any, will immediately return with error code = invalid handler.

I'm not sure I understand the last part.

handleSend no longer asserts because the queue is not cleared immediately in closeSocket, but only after all pending callbacks have been executed. Therefore handleSend will try to schedule the next async_write (assuming more blocks are in the queue) on a closed socket and fail. Is this correct? What is this "invalid handler" thing?

Actions #9

Updated by Alex Afanasyev over 10 years ago

Ok. Let me give a sequence what I was observing:

  • normal handleSend that schedules async_write with handleSend callback
  • async_receive callback calls closeSocket, socket close, shutdown
  • normal handleSend that schedules async_write with handleSend callback

    This is abnormal and was not expected

    At this point, socket is no longer valid, so the scheduled async_write will fail

  • queue is getting cleaned somewhere here

  • handleSend with error code 9 (= invalid handler or something like that)

Actions #10

Updated by Davide Pesavento over 10 years ago

This happens after applying your patch?

Actions #11

Updated by Alex Afanasyev over 10 years ago

Yes. These after the patch. Before the patch, queue cleanup happens at step 2 and step 3 causes assertion fault.

Actions #12

Updated by Davide Pesavento over 10 years ago

Ok makes sense. Error code 9 is actually EBADF "bad file descriptor", which again makes sense since the fd becomes invalid when the socket is closed.

Actions #13

Updated by Junxiao Shi over 10 years ago

Boost.Asio problem should be reported to Boost, and Boost ticket should be referenced here.

It's unsafe to send into a bad file descriptor.
Kernel can reuse a file descriptor after it's closed.
If something is sent into a closed and then reused file descriptor, it may go into another socket (and cause a decode error in the app).

Actions #14

Updated by Davide Pesavento over 10 years ago

Junxiao Shi wrote:

It's unsafe to send into a bad file descriptor. Kernel can reuse a file descriptor after it's closed.

This is true, although it's unlikely to happen. Anyway, I've submitted an alternative solution as patch set 5 at http://gerrit.named-data.net/1211 which does not suffer from this issue.

Actions #15

Updated by Davide Pesavento over 10 years ago

  • Status changed from Code review to Closed
Actions

Also available in: Atom PDF