Task #2526 (closed)
Application crashes after trying to put large data into face
Description
I have an application that puts a large Data object (1 GB or more) into an NFD face. I have set MAX_NDN_PACKET_SIZE to 2 GB. The application can put around 800 MB of data into NFD. With anything larger, it crashes with the following error:
terminate called after throwing an instance of 'ndn::Transport::Error'
what(): error while connecting to the forwarder
I have attached the gdb log of the application and the NFD log.
Application Trace
#0 0x00007ffff524e877 in raise () from /lib64/libc.so.6
#1 0x00007ffff524ff68 in abort () from /lib64/libc.so.6
#2 0x00007ffff5b54dd5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007ffff5b52d46 in ?? () from /lib64/libstdc++.so.6
#4 0x00007ffff5b52d73 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007ffff5b52fe9 in __cxa_rethrow () from /lib64/libstdc++.so.6
#6 0x000000000045f64a in ndn::Face::processEvents (this=this@entry=0x7fffffffde20, timeout=..., keepThread=keepThread@entry=false) at ../src/face.cpp:445
#7 0x0000000000453247 in ndn::Producer::run (this=this@entry=0x7fffffffdd90) at ../tools/ndnputbigchunk.cpp:109
#8 0x000000000044a465 in ndn::main (argc=, argv=) at ../tools/ndnputbigchunk.cpp:137
#9 0x00007ffff523ad65 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000451ea1 in _start ()
Updated by Alex Afanasyev almost 10 years ago
- Project changed from NFD to ndn-cxx
- Target version set to Unsupported
The only cause I can think of is memory exhaustion.
Updated by Junxiao Shi almost 10 years ago
Please attach application code.
Remember that your application shouldn't send Data without first receiving an Interest.
As announced, NFD will not accept unsolicited Data.
Updated by susmit shannigrahi almost 10 years ago
I plotted memory usage with 0.01 s sampling. The machine is not running out of memory; it has 128 GB of RAM.
Junxiao Shi wrote:
Please attach application code.
http://redmine.named-data.net/attachments/download/191/ndnputbigchunk.cpp
Junxiao Shi wrote:
Remember that your application shouldn't send Data without first receiving an Interest.
As announced, NFD will not accept unsolicited Data.
OK, but why does it crash at 800 MB? Shouldn't NFD reject unsolicited Data whenever it receives it?
Updated by Junxiao Shi almost 10 years ago
This app is sending unsolicited Data into NFD.
When doing so, the Face is not yet connected.
The correct behavior should be: NFD drops those Data; app shouldn't crash.
I guess some buffer used by Face::put is full, most likely StreamTransportImpl<..>::m_transmissionQueue.
Updated by susmit shannigrahi almost 10 years ago
I have put this in stream-face.hpp:
    while (m_inputBufferSize - offset > 0)
    {
      isOk = Block::fromBuffer(m_inputBuffer + offset, m_inputBufferSize - offset, element);
      if (!isOk) {
        NFD_LOG_WARN("Can not create block from buffer");
        break;
      }
When the app crashes, the NFD log also shows "Can not create block from buffer".
Updated by Alex Afanasyev almost 10 years ago
This is normal on the NFD side. NFD should log this message many times until the whole packet is received: if only part of a packet has arrived, NFD cannot yet decode the TLV.
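To illustrate why a partially received packet cannot be decoded, here is a minimal sketch of a completeness check for one NDN-TLV element. It is not the NFD or ndn-cxx implementation; `readVarNumber` and `hasCompleteElement` are hypothetical helper names, and the only thing taken from the NDN packet format is the variable-length number encoding (a first octet below 253 is the value itself; 253, 254, and 255 announce a 2-, 4-, or 8-octet big-endian value).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper: reads an NDN-TLV variable-length number at buf[pos].
// Returns std::nullopt (and leaves pos unchanged) if the buffer ends too early.
std::optional<std::uint64_t>
readVarNumber(const std::vector<std::uint8_t>& buf, std::size_t& pos)
{
  if (pos >= buf.size())
    return std::nullopt;

  const std::uint8_t first = buf[pos];
  if (first < 253) {
    ++pos;
    return first;
  }

  // 253, 254, 255 announce a 2-, 4-, or 8-octet big-endian number.
  const std::size_t extra = (first == 253) ? 2 : (first == 254) ? 4 : 8;
  if (buf.size() - pos - 1 < extra)
    return std::nullopt;

  std::uint64_t value = 0;
  for (std::size_t i = 1; i <= extra; ++i)
    value = (value << 8) | buf[pos + i];
  pos += 1 + extra;
  return value;
}

// Hypothetical helper: true only if buf starts with one whole TLV element,
// that is, a complete type, a complete length, and all announced value octets.
bool
hasCompleteElement(const std::vector<std::uint8_t>& buf)
{
  std::size_t pos = 0;
  if (!readVarNumber(buf, pos))   // TLV-TYPE
    return false;
  const auto length = readVarNumber(buf, pos);  // TLV-LENGTH
  if (!length)
    return false;
  return buf.size() - pos >= *length;
}
```

With a check of this kind, a receiver that reads a large packet in many small chunks would report "incomplete" after every chunk except the last, which matches the repeated warning in the NFD log.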
Updated by susmit shannigrahi almost 10 years ago
Also, I printed the error code in the error handler (stream.hpp, line 273).
It says: "Error code: asio.misc:2 (End of file)"
Updated by Davide Pesavento almost 10 years ago
Are you saying that if you try to send <800 MB it actually works? i.e. NFD receives the whole thing and forwards it?
Updated by Alex Afanasyev almost 10 years ago
susmit shannigrahi wrote:
It says: "Error code: asio.misc:2 (End of file)"
This only means that the TCP or Unix socket connection was closed.
Updated by susmit shannigrahi almost 10 years ago
Davide Pesavento wrote:
Are you saying that if you try to send <800 MB it actually works? i.e. NFD receives the whole thing and forwards it?
Yes, NFD receives the whole thing and puts it in the CS. If there is an Interest for it, NFD sends the Data back from the CS.
Updated by susmit shannigrahi almost 10 years ago
We found the problem. In the application, m_face.put(*data) was called before the run() loop, so the run loop could not make progress while the data was being published. The 4-second connect timer in src/transport/stream-transport.hpp then timed out and closed the face.
The only remaining problem is that the face still accepts unsolicited data.
Please close this if that's not a bug/duplicate.
Updated by Alex Afanasyev almost 10 years ago
This is not right. There shouldn't be a problem calling m_face.put() before the run() loop. When put is called before connect, the Data packet is supposed to be remembered, and the actual publishing should happen just after connect.
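The expected behavior described here (remember packets sent before connect, publish them right after connect) can be sketched with a toy class. This is not the ndn-cxx transport; `QueueingTransport`, `send`, and `onConnected` are hypothetical names, and packets are modeled as plain strings.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy model, not the ndn-cxx class: sends made before the connection is
// established are queued and flushed in order once it comes up.
class QueueingTransport
{
public:
  void
  send(std::string packet)
  {
    if (m_connected)
      m_wire.push_back(std::move(packet));     // goes out immediately
    else
      m_pending.push_back(std::move(packet));  // remembered until connect
  }

  // Called when the (simulated) connect completes.
  void
  onConnected()
  {
    m_connected = true;
    for (auto& packet : m_pending)
      m_wire.push_back(std::move(packet));
    m_pending.clear();
  }

  std::size_t
  pendingCount() const
  {
    return m_pending.size();
  }

  const std::vector<std::string>&
  wire() const
  {
    return m_wire;
  }

private:
  bool m_connected = false;
  std::vector<std::string> m_pending;  // queued before connect
  std::vector<std::string> m_wire;     // what has been "sent"
};
```

In this model nothing is lost by sending early, so a crash after an early put points at something other than the queue itself, such as the connection never completing.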
Updated by Anonymous almost 10 years ago
Said a different way, the problem is that the application is creating a Face (default unix transport), putting a huge Data, and then calling run.
I think the timer for the transport connect is ticking away while the large Data is being put. We observed that the connection setup time, measured between the transport's async_connect call and the timeout handler being triggered or cancelled, varied with the size of the Data. The problem seems to be solved by doing a short run after creating the Face, then putting the Data and making the normal run() call.
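The ordering issue can be shown with a toy single-threaded loop driven by a simulated clock. This is a sketch under stated assumptions, not ndn-cxx or Boost.Asio code: `ToyLoop`, `putThenRun`, and `shortRunThenPut` are hypothetical names, and the model assumes that an expired connect timer wins over a connect completion that is handled late.

```cpp
#include <chrono>

using Ms = std::chrono::milliseconds;

// Toy event loop with a simulated clock (nothing here sleeps).
struct ToyLoop
{
  Ms now{0};
  Ms connectDeadline{0};
  bool connectPending = false;  // connect started, result not yet handled
  bool connected = false;
  bool timedOut = false;

  // Models creating the Face: start an async connect and arm its timer.
  void
  startConnect(Ms timeout)
  {
    connectPending = true;
    connectDeadline = now + timeout;
  }

  // Models application work done outside the loop, such as a huge put.
  void
  blockFor(Ms duration)
  {
    now += duration;
  }

  // Models one pass of the loop. Assumption: if the deadline has already
  // passed when the loop finally runs, the timeout closes the connection.
  void
  poll()
  {
    if (!connectPending)
      return;
    connectPending = false;
    if (now >= connectDeadline)
      timedOut = true;
    else
      connected = true;
  }
};

// Order used by the crashing application: create, put, then run.
bool
putThenRun(Ms putCost, Ms timeout)
{
  ToyLoop loop;
  loop.startConnect(timeout);
  loop.blockFor(putCost);
  loop.poll();
  return loop.connected;
}

// Workaround order: create, short run, put, then the normal run.
bool
shortRunThenPut(Ms putCost, Ms timeout)
{
  ToyLoop loop;
  loop.startConnect(timeout);
  loop.poll();  // connect is handled before the expensive work starts
  loop.blockFor(putCost);
  loop.poll();
  return loop.connected;
}
```

With a 4000 ms timeout, `putThenRun(Ms(6000), Ms(4000))` ends up disconnected in this model while `shortRunThenPut(Ms(6000), Ms(4000))` stays connected, which mirrors why the failure depended on the size of the Data.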
Updated by Alex Afanasyev almost 10 years ago
Oh... Yeah. This timeout issue seems to be a recurring problem, and I will try to fix it.
Updated by Junxiao Shi about 9 years ago
- Related to Bug #2742: Face creation fails due to heavy application processing before calling processEvents added
Updated by Junxiao Shi about 9 years ago
- Status changed from New to Resolved
This Bug is believed to have been resolved with #2742. Can Susmit confirm?