Bug #1497
FaceTable::remove called twice when UnixStreamFace is destroyed
Status: Closed
Done: 100%
Description
I'm seeing the following in the log:
1397777209.441760 INFO: [UnixStreamFace] [id:10,endpoint:/private/tmp/nfd.sock] Connection closed
1397777209.445571 INFO: [FaceTable] Removed face id=10 remote=fd://23 local=unix:///private/tmp/nfd.sock
1397777209.448699 INFO: [FaceTable] Removed face id=-1 remote=fd://23 local=unix:///private/tmp/nfd.sock
This is not critical, but we should find this second remove and eliminate it (if we can).
Files
Updated by Junxiao Shi over 10 years ago
This is a serious logic error that warrants an assertion failure.
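As an illustration of that suggestion, here is a hypothetical sketch (not actual NFD code; it assumes an INVALID_FACEID constant corresponding to the id=-1 seen in the log):

```cpp
#include <boost/assert.hpp>

// Hypothetical sketch: make the double remove fail loudly in debug builds.
void
FaceTable::remove(shared_ptr<Face> face)
{
  // On the second, erroneous call the face id has already been reset to
  // INVALID_FACEID (the id=-1 in the log above), so this assertion fires.
  BOOST_ASSERT(face->getId() != INVALID_FACEID);

  // ... actual removal logic ...
}
```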
Updated by Davide Pesavento over 10 years ago
I cannot reproduce on Linux.
1397842660.815849 INFO: [FaceTable] Added face id=4 remote=fd://17 local=unix:///run/nfd.sock
1397842660.826540 INFO: [UnixStreamFace] [id:4,endpoint:/run/nfd.sock] Connection closed
1397842660.829236 INFO: [FaceTable] Removed face id=4 remote=fd://17 local=unix:///run/nfd.sock
What are your steps?
Updated by Alex Afanasyev over 10 years ago
Interestingly, I cannot reproduce the same bug again... The steps were simple: run nfd and nrd, then run the producer example and Ctrl-C it. Run it again, Ctrl-C it again.
Updated by Davide Pesavento over 10 years ago
OK, I managed to reproduce it by starting and stopping nrd. It doesn't happen with any other client.
Updated by Davide Pesavento over 10 years ago
This is what I believe is happening:
- the peer (nrd) closes its side of the connection
- the async_receive callback (StreamFace::handleReceive) is invoked with an error code (EOF), which in turn calls processErrorCode and closeSocket, which shuts down and closes the underlying socket
- then the onFail notification chain is started
- FaceTable's handler is run first, which in turn fires its own event onRemove, to which FaceManager is subscribed
- > FaceManager::onRemoveFace executes and sends a FaceEventNotification via the internal face
- >> the forwarder dispatches the Data packet coming from the internal face to the same unix stream face to which nrd was connected
- >>> the async_send in StreamFace schedules the transmission and returns immediately (a minimal demonstration of this deferred failure follows the list)
- back in FaceTable::remove, the face is finally erased from the table, its id is set to -1, and the INFO message is printed; note that the onReceive(Interest|Data) callback chains are cleared, but onFail is not
- when control goes back to the boost::asio event loop, the previously scheduled Data sending is eventually attempted but fails because the socket is closed
- processErrorCode is called again, which triggers the second onFail event for the same face
- the whole onFail and onRemove chain of handlers is repeated, but this time the FaceEventNotification is not propagated to the (closing) unix face because it is no longer present in the face table (otherwise we would have had an infinite loop)
- the second FaceTable INFO message, with id=-1, is printed
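To make the deferred failure concrete, here is a minimal standalone sketch (plain boost::asio on a POSIX system, not NFD code; the socket pair and payload are invented for illustration). It shows that scheduling a send on an already-closed socket returns immediately, with the error surfacing only once the event loop runs:

```cpp
#include <boost/asio.hpp>
#include <iostream>

namespace asio = boost::asio;
using asio::local::stream_protocol; // requires POSIX local sockets

int main()
{
  asio::io_service io;
  stream_protocol::socket a(io), b(io);
  asio::local::connect_pair(a, b);

  // Close our side first, mirroring closeSocket() running during the
  // first round of error handling.
  a.close();

  // Scheduling a send on the closed socket still returns immediately...
  const char payload[] = "notification";
  asio::async_write(a, asio::buffer(payload),
    [] (const boost::system::error_code& ec, std::size_t) {
      // ...and the failure only shows up here, once the event loop runs,
      // analogous to processErrorCode being invoked a second time.
      std::cout << "send handler: " << ec.message() << std::endl;
    });

  std::cout << "async_write returned without an error" << std::endl;
  io.run(); // the deferred handler now executes and reports the error
  return 0;
}
```

Running this prints the "async_write returned" line first, then an error (something like "Bad file descriptor") from the handler.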
Updated by Junxiao Shi over 10 years ago
One solution is to clear the onFail event listeners in FaceTable::remove, but do so inside an io_service::post so that the other listeners can still be fired.
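A sketch of that idea (illustrative, not an actual patch; it assumes the event loop is reachable via something like getGlobalIoService() and that the emitter gains a hypothetical clear() method):

```cpp
// Hypothetical sketch of the proposed fix: defer clearing onFail so the
// in-progress round of listeners still fires, but no later round can start.
void
FaceTable::remove(shared_ptr<Face> face)
{
  // ... existing logic: erase the face from the table, reset its id,
  //     print the "Removed face" INFO message ...

  // Posted work runs only after the current onFail notification chain
  // has completed, so listeners other than FaceTable are not cut off.
  getGlobalIoService().post([face] {
      face->onFail.clear(); // hypothetical: drop all onFail listeners
    });
}
```

Capturing the shared_ptr in the closure also keeps the face alive until the deferred cleanup runs.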
Updated by Anonymous over 10 years ago
- File HTTP Request.jmx added
- File nfd_log.txt added
I can reproduce this using apache jmeter and the nfd-status-http-server.
- Start NFD (sudo ./build/bin/nfd)
- Start nfd-status-http-server (./build/bin/nfd-status-http-server -v)
- Load attached jmeter configuration and select "run"
NFD will crash with the following tail of output (default log level WARN, FaceTable at TRACE; I also tried running the default level at DEBUG, but didn't notice anything particularly interesting):
1398458023.296022 INFO: [FaceTable] Added face id=52 remote=fd://26 local=unix:///private/var/run/nfd.sock
1398458023.397617 INFO: [FaceTable] Removed face id=47 remote=fd://28 local=unix:///private/var/run/nfd.sock
1398458023.398223 INFO: [FaceTable] Added face id=53 remote=fd://27 local=unix:///private/var/run/nfd.sock
1398458023.497706 INFO: [FaceTable] Removed face id=48 remote=fd://22 local=unix:///private/var/run/nfd.sock
1398458023.498518 INFO: [FaceTable] Added face id=54 remote=fd://28 local=unix:///private/var/run/nfd.sock
1398458023.522723 WARNING: [UnixStreamFace] [id:53,endpoint:/private/var/run/nfd.sock] Send or receive operation failed, closing socket: Broken pipe
1398458023.544633 INFO: [FaceTable] Removed face id=53 remote=fd://27 local=unix:///private/var/run/nfd.sock
1398458023.569262 INFO: [FaceTable] Removed face id=-1 remote=fd://27 local=unix:///private/var/run/nfd.sock
1398458023.645803 INFO: [FaceTable] Removed face id=49 remote=fd://23 local=unix:///private/var/run/nfd.sock
1398458023.647216 FATAL: [NFD] call to empty boost::function
Credit to Chengyu for finding this while testing the http server :)
The number of clients jmeter uses affects how reliably the bug reproduces. The attached configuration uses 100; adjust as necessary.
Updated by Davide Pesavento over 10 years ago
- Category changed from Faces to Tables
- Priority changed from Low to Normal
Raising priority because it can apparently trigger a crash in nfd.
1398458023.647216 FATAL: [NFD] call to empty boost::function
Can you find out which function it is and/or who is trying to call it?
Updated by Junxiao Shi over 10 years ago
- Category changed from Tables to Forwarding
FaceTable is part of Forwarding, not Tables.
EventEmitter does not support removing a single listener, so FaceTable cannot unsubscribe just its own handler from onFail, while clearing the whole event immediately would cut off the other listeners; this is what causes the problem.
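For context, a minimal sketch of the missing capability: an emitter whose listeners can be removed individually via a token (illustrative names, not NFD's actual EventEmitter):

```cpp
#include <functional>
#include <iostream>
#include <map>

// Sketch of an event emitter that supports removing a single listener.
template<typename... Args>
class RemovableEventEmitter
{
public:
  using Handler = std::function<void(Args...)>;
  using Token = int;

  Token
  connect(Handler h)
  {
    m_handlers[m_nextToken] = std::move(h);
    return m_nextToken++;
  }

  void
  disconnect(Token t)
  {
    m_handlers.erase(t); // removes just this listener, others stay intact
  }

  void
  operator()(Args... args) const
  {
    for (const auto& entry : m_handlers)
      entry.second(args...);
  }

private:
  std::map<Token, Handler> m_handlers;
  Token m_nextToken = 0;
};

int main()
{
  RemovableEventEmitter<int> onFail;
  auto t = onFail.connect([] (int id) { std::cout << "A " << id << "\n"; });
  onFail.connect([] (int id) { std::cout << "B " << id << "\n"; });
  onFail.disconnect(t); // drop only our own subscription
  onFail(42);           // only B fires
}
```

With something like this, FaceTable could disconnect just its own subscription instead of having to clear the whole event.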
Updated by Junxiao Shi over 10 years ago
- Status changed from New to In Progress
- Assignee set to Junxiao Shi
Updated by Junxiao Shi over 10 years ago
- Status changed from In Progress to Code review
- % Done changed from 0 to 70
Updated by Alex Afanasyev over 10 years ago
I guess this is 100% now and should be closed?
Updated by Junxiao Shi over 10 years ago
- Status changed from Code review to Closed
- % Done changed from 70 to 100