Project

General

Profile

Bug #1497

FaceTable::remove called twice when UnixStreamFace is destroyed

Added by Alex Afanasyev over 5 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Forwarding
Target version:
Start date:
04/17/2014
Due date:
% Done:

100%

Estimated time:

Description

I'm seeing the following in the log

1397777209.441760 INFO: [UnixStreamFace] [id:10,endpoint:/private/tmp/nfd.sock] Connection closed
1397777209.445571 INFO: [FaceTable] Removed face id=10 remote=fd://23 local=unix:///private/tmp/nfd.sock
1397777209.448699 INFO: [FaceTable] Removed face id=-1 remote=fd://23 local=unix:///private/tmp/nfd.sock

This is not critical, but we should find this second remove and eliminate it (if we can).

HTTP Request.jmx (6.83 KB) HTTP Request.jmx Anonymous, 04/25/2014 01:39 PM
nfd_log.txt (11.6 KB) nfd_log.txt Anonymous, 04/25/2014 01:39 PM

History

#1 Updated by Junxiao Shi over 5 years ago

This is a serious logic error that is worth an assertion failure.

#2 Updated by Davide Pesavento over 5 years ago

I cannot reproduce on Linux.

1397842660.815849 INFO: [FaceTable] Added face id=4 remote=fd://17 local=unix:///run/nfd.sock
1397842660.826540 INFO: [UnixStreamFace] [id:4,endpoint:/run/nfd.sock] Connection closed
1397842660.829236 INFO: [FaceTable] Removed face id=4 remote=fd://17 local=unix:///run/nfd.sock

What are your steps?

#3 Updated by Alex Afanasyev over 5 years ago

Interestingly, I cannot reproduce the same bug again... The steps were simple: run nfd, nrd, and then run producer example, ctr-c it. Run, ctr-c it.

#4 Updated by Davide Pesavento over 5 years ago

Ok I managed to reproduce by starting and stopping nrd. Doesn't happen with any other client.

#5 Updated by Davide Pesavento over 5 years ago

This is what I believe is happening:

  • the peer (nrd) closes his side of the connection
  • the async_receive callback (StreamFace::handleReceive) is invoked with an error code (EOF), which in turn calls processErrorCode and closeSocket, which shuts down and closes the underlying socket
  • then the onFail notification chain is started
    • FaceTable's handler is run first, that in turns fires its own event onRemove, to which FaceManager is subscribed
    • > FaceManager::onRemoveFace executes and sends a FaceEventNotification via the internal face
    • >> the forwarder dispatches the Data packet coming from the internal face to the same unix stream face to which nrd was connected
    • >>> the async_send in StreamFace schedules the transmission and returns immediately
    • back in FaceTable::remove, the face is finally erased from the table, its id set to -1, and the INFO message printed; note that the onReceive(Interest|Data) callback chains are cleared, but onFail is not
  • when control goes back to boost::asio event loop, the previously scheduled Data sending is eventually attempted but fails because the socket is closed
  • processErrorCode is called again, which triggers the second onFail event for the same face
    • the whole onFail and onRemove chain of handlers is repeated again, but this time the FaceEventNotification is not propagated to the (closing) unix face because it's no longer present in the face table (otherwise we would have had an infinite loop)
    • the second FaceTable INFO message, with id=-1, is printed

#6 Updated by Junxiao Shi over 5 years ago

One solution is to clear onFail event listener in FaceTable::remove, but do so in io_service::post so that other listeners can be fired.

#7 Updated by Anonymous over 5 years ago

I can reproduce this using apache jmeter and the nfd-status-http-server.

  1. Start NFD (sudo ./build/bin/nfd)
  2. Start nfd-status-http-server (./build/bin/nfd-status-http-server -v)
  3. Load attached jmeter configuration and select "run"

NFD will crash with the following tail of output (default log level=WARN, FaceTable TRACE. I also tried doing default at DEBUG, but I didn't notice anything particularly interesting):

1398458023.296022 INFO: [FaceTable] Added face id=52 remote=fd://26 local=unix:///private/var/run/nfd.sock
1398458023.397617 INFO: [FaceTable] Removed face id=47 remote=fd://28 local=unix:///private/var/run/nfd.sock
1398458023.398223 INFO: [FaceTable] Added face id=53 remote=fd://27 local=unix:///private/var/run/nfd.sock
1398458023.497706 INFO: [FaceTable] Removed face id=48 remote=fd://22 local=unix:///private/var/run/nfd.sock
1398458023.498518 INFO: [FaceTable] Added face id=54 remote=fd://28 local=unix:///private/var/run/nfd.sock
1398458023.522723 WARNING: [UnixStreamFace] [id:53,endpoint:/private/var/run/nfd.sock] Send or receive operation failed, closing socket: Broken pipe
1398458023.544633 INFO: [FaceTable] Removed face id=53 remote=fd://27 local=unix:///private/var/run/nfd.sock
1398458023.569262 INFO: [FaceTable] Removed face id=-1 remote=fd://27 local=unix:///private/var/run/nfd.sock
1398458023.645803 INFO: [FaceTable] Removed face id=49 remote=fd://23 local=unix:///private/var/run/nfd.sock
1398458023.647216 FATAL: [NFD] call to empty boost::function

Credit to Chengyu for finding this while testing the http server :)

The number of clients jmeter uses impacts reproducing the bug. The attached configuration uses 100, but adjust as necessary.

#8 Updated by Davide Pesavento over 5 years ago

  • Category changed from Faces to Tables
  • Priority changed from Low to Normal

Raising priority because it can apparently trigger a crash in nfd.

1398458023.647216 FATAL: [NFD] call to empty boost::function

Can you find out what function is it and/or who is trying to call it?

#9 Updated by Junxiao Shi over 5 years ago

  • Category changed from Tables to Forwarding

FaceTable is part of Forwarding, not Tables.

EventEmitter does not support removing one listener, but FaceTable cannot clear onFail, causing this problem.

#10 Updated by Junxiao Shi over 5 years ago

  • Status changed from New to In Progress
  • Assignee set to Junxiao Shi

#11 Updated by Junxiao Shi over 5 years ago

  • Status changed from In Progress to Code review
  • % Done changed from 0 to 70

#12 Updated by Alex Afanasyev over 5 years ago

I guess this is 100% now and should be closed?

#13 Updated by Junxiao Shi over 5 years ago

  • Status changed from Code review to Closed
  • % Done changed from 70 to 100

Also available in: Atom PDF