Task #2526
closed
Application crashes after trying to put large data into face
Added by susmit shannigrahi almost 10 years ago.
Updated about 9 years ago.
Description
I have an application that puts a large data object (1GB or more) into an NFD face. I have set MAX_NDN_PACKET_SIZE to 2GB. The application can put around 800MB of data into NFD. Any larger, and it crashes with the following error:
terminate called after throwing an instance of 'ndn::Transport::Error'
what(): error while connecting to the forwarder
Attached the gdb log of the application and the NFD log.
Application Trace
#0 0x00007ffff524e877 in raise () from /lib64/libc.so.6
#1 0x00007ffff524ff68 in abort () from /lib64/libc.so.6
#2 0x00007ffff5b54dd5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007ffff5b52d46 in ?? () from /lib64/libstdc++.so.6
#4 0x00007ffff5b52d73 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007ffff5b52fe9 in __cxa_rethrow () from /lib64/libstdc++.so.6
#6 0x000000000045f64a in ndn::Face::processEvents (this=this@entry=0x7fffffffde20, timeout=..., keepThread=keepThread@entry=false) at ../src/face.cpp:445
#7 0x0000000000453247 in ndn::Producer::run (this=this@entry=0x7fffffffdd90) at ../tools/ndnputbigchunk.cpp:109
#8 0x000000000044a465 in ndn::main (argc=, argv=) at ../tools/ndnputbigchunk.cpp:137
#9 0x00007ffff523ad65 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000451ea1 in _start ()
- Project changed from NFD to ndn-cxx
- Target version set to Unsupported
The only cause I can think of is memory exhaustion.
Please attach application code.
Remember that your application shouldn't send Data without first receiving an Interest.
As announced, NFD will not accept unsolicited Data.
I plotted the memory usage with 0.01 sec sampling. The machine is not running out of memory; it has 128GB RAM.
Please attach application code.
http://redmine.named-data.net/attachments/download/191/ndnputbigchunk.cpp
Remember that your application shouldn't send Data without first receiving an Interest.
As announced, NFD will not accept unsolicited Data.
Ok. But why does it crash at 800MB? Shouldn't NFD reject unsolicited Data whenever it receives any?
This app is sending unsolicited Data into NFD.
When doing so, the Face is not yet connected.
The correct behavior should be: NFD drops those Data; app shouldn't crash.
I guess some buffer used by Face::put is full, most likely StreamTransportImpl<..>::m_transmissionQueue.
I have added this warning to stream-face.hpp (around line 284):
while (m_inputBufferSize - offset > 0)
{
  isOk = Block::fromBuffer(m_inputBuffer + offset, m_inputBufferSize - offset, element);
  if (!isOk) {
    NFD_LOG_WARN("Can not create block from buffer");
    break;
  }
When the app crashes, the NFD log also shows "Can not create block from buffer".
That's normal on the NFD side. NFD should show this many times until the whole packet is received. If only part of the packet has been received, NFD cannot yet decode the TLV.
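To illustrate why a partially received packet cannot be decoded yet, here is a minimal sketch of a completeness check for one NDN-TLV element. It is not the NFD or ndn-cxx code; the function names `readVarNumber` and `completeElementSize` are made up for this example. It only assumes the NDN-TLV variable-size number encoding (a first octet below 253 is the value itself; 253, 254 and 255 announce a 2-, 4- or 8-octet big-endian value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Read an NDN-TLV variable-size number starting at buf[pos].
// Returns std::nullopt if the buffer ends before the number is complete.
// On success, pos is advanced past the number.
std::optional<uint64_t> readVarNumber(const std::vector<uint8_t>& buf, size_t& pos)
{
  if (pos >= buf.size())
    return std::nullopt;

  const uint8_t first = buf[pos];
  size_t extra = 0; // octets that follow the first one
  if (first == 253) extra = 2;
  else if (first == 254) extra = 4;
  else if (first == 255) extra = 8;

  if (extra == 0) {
    ++pos;
    return static_cast<uint64_t>(first);
  }
  if (buf.size() - pos - 1 < extra)
    return std::nullopt; // number itself is still incomplete

  uint64_t value = 0;
  for (size_t i = 1; i <= extra; ++i)
    value = (value << 8) | buf[pos + i];
  pos += 1 + extra;
  return value;
}

// Returns the total size (type + length + value) of the first TLV element
// if it is fully buffered, or std::nullopt if more bytes are needed.
std::optional<size_t> completeElementSize(const std::vector<uint8_t>& buf)
{
  size_t pos = 0;
  const auto type = readVarNumber(buf, pos);
  if (!type)
    return std::nullopt;
  const auto length = readVarNumber(buf, pos);
  if (!length)
    return std::nullopt;

  assert(pos <= buf.size());
  if (buf.size() - pos < *length)
    return std::nullopt; // value not fully received yet
  return pos + static_cast<size_t>(*length);
}
```

With a very large Data packet, a receiver doing a check like this would report "not complete" on every read until the last byte arrives, which would match the repeated warning in the log.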
Also, I printed the error code in the error handler (stream.hpp, line 273).
It says: "Error code: asio.misc:2 (End of file)"
Are you saying that if you try to send <800 MB it actually works? i.e. NFD receives the whole thing and forwards it?
It says: "Error code: asio.misc:2 (End of file)"
This only means that the TCP/Unix socket connection was closed.
Davide Pesavento wrote:
Are you saying that if you try to send <800 MB it actually works? i.e. NFD receives the whole thing and forwards it?
Yes, NFD receives the whole thing and puts it in the CS. If there is an Interest for it, NFD sends the data back from CS.
We found the problem. In the application, m_face.put(*data) was called before the run() loop. This blocked the run loop while the data was being published, and the 4-second connect timer in src/transport/stream-transport.hpp timed out and closed the face.
The only remaining problem is that the face still accepts unsolicited data.
Please close this if that's not a bug/duplicate.
This is not right. There shouldn't be a problem calling m_face.put() before the run() loop. When put is called before connect, the Data packet is supposed to be remembered, and the actual publishing should happen just after connect.
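The expected behavior (remember packets that are put before connect, publish them right after connect) can be sketched roughly as follows. This is a simplified model, not the ndn-cxx transport; the class `QueueingTransport` and its members are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

using Packet = std::vector<uint8_t>;

// Hypothetical stand-in for a stream transport; not the ndn-cxx class.
class QueueingTransport
{
public:
  // May be called by the application at any time, connected or not.
  void send(Packet packet)
  {
    if (m_isConnected)
      m_wire.push_back(std::move(packet));    // write immediately
    else
      m_pending.push_back(std::move(packet)); // remember until connected
  }

  // Called by the event loop once the socket is connected.
  void onConnected()
  {
    m_isConnected = true;
    // Flush everything that was put before the connection existed,
    // preserving the original order.
    while (!m_pending.empty()) {
      m_wire.push_back(std::move(m_pending.front()));
      m_pending.pop_front();
    }
    assert(m_pending.empty());
  }

  size_t pendingCount() const { return m_pending.size(); }
  size_t sentCount() const { return m_wire.size(); }

private:
  bool m_isConnected = false;
  std::deque<Packet> m_pending; // packets remembered before connect
  std::vector<Packet> m_wire;   // stands in for bytes written to the socket
};
```

In this model, a put before connect only grows the pending queue; nothing in it depends on how long the application takes before the event loop starts.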
Said a different way, the problem is that the application is creating a Face (default unix transport), putting a huge Data, and then calling run.
I think the timer for the transport connect is ticking away while the large Data is being put. We observed that the time between the transport's async_connect and the timeout handler being triggered/cancelled varied with the size of the Data. The problem seems to be solved by doing a short run after creating the Face, then putting the Data and making the normal run() call.
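The timeline we suspect can be modeled with a simulated clock. This is only a toy model under stated assumptions, not ndn-cxx or asio code: `ConnectModel`, `startLoopAfter` and the 5 ms socket connect time are invented for illustration, and the 4000 ms value is the connect timer mentioned above. The model also assumes that a timer which has already expired when the loop first runs wins over the connect completion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Outcome of starting the event loop in this simplified model.
enum class ConnectResult { Connected, TimedOut };

// Hypothetical model; all times are milliseconds on a simulated clock.
struct ConnectModel
{
  int64_t connectTimeoutMs = 4000; // connect timer, started when connect begins
  int64_t socketConnectMs = 5;     // assumed time for the local socket to connect

  // blockedBeforeLoopMs: time the thread spends preparing and putting
  // Data before the event loop runs for the first time.
  ConnectResult startLoopAfter(int64_t blockedBeforeLoopMs) const
  {
    assert(blockedBeforeLoopMs >= 0);
    // No handler can run until the loop starts, so the connect completion
    // is noticed at the later of "socket ready" and "loop started".
    const int64_t firstLoopTurnMs = std::max(blockedBeforeLoopMs, socketConnectMs);
    // Assumption: if the timer has expired by then, the timeout handler
    // runs and closes the transport.
    return firstLoopTurnMs >= connectTimeoutMs ? ConnectResult::TimedOut
                                               : ConnectResult::Connected;
  }
};
```

Under this model, `startLoopAfter(0)` (the short run right after creating the Face) connects, while a put that keeps the thread busy for more than 4 seconds before the loop starts times out. That would be consistent with smaller objects working and larger ones failing.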
Oh... Yeah. This timeout issue seems to be a recurring problem and I will try to fix it.
- Related to Bug #2742: Face creation fails due to heavy application processing before calling processEvents added
At the 20150901 conference call, Alex said he believes this is probably a duplicate of #2742.
We can confirm this after #2742 is resolved.
- Status changed from New to Resolved
This Bug is believed to have been resolved with #2742. Can Susmit confirm?
- Status changed from Resolved to Closed