Task #2526

Application crashes after trying to put large data into face

Added by susmit shannigrahi over 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Base
Target version:
Start date: 02/16/2015
Due date:
% Done: 0%
Estimated time:

Description

I have an application that puts a large data object (1 GB or more) into an NFD face. I have set MAX_NDN_PACKET_SIZE to 2 GB. The application can put around 800 MB of data into NFD. Any larger, and it crashes with the following error:

terminate called after throwing an instance of 'ndn::Transport::Error'
what(): error while connecting to the forwarder

Attached the gdb log of the application and the NFD log.

Application Trace

#0 0x00007ffff524e877 in raise () from /lib64/libc.so.6
#1 0x00007ffff524ff68 in abort () from /lib64/libc.so.6
#2 0x00007ffff5b54dd5 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6
#3 0x00007ffff5b52d46 in ?? () from /lib64/libstdc++.so.6
#4 0x00007ffff5b52d73 in std::terminate() () from /lib64/libstdc++.so.6
#5 0x00007ffff5b52fe9 in __cxa_rethrow () from /lib64/libstdc++.so.6
#6 0x000000000045f64a in ndn::Face::processEvents (this=this@entry=0x7fffffffde20, timeout=..., keepThread=keepThread@entry=false) at ../src/face.cpp:445
#7 0x0000000000453247 in ndn::Producer::run (this=this@entry=0x7fffffffdd90) at ../tools/ndnputbigchunk.cpp:109
#8 0x000000000044a465 in ndn::main (argc=, argv=) at ../tools/ndnputbigchunk.cpp:137
#9 0x00007ffff523ad65 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000451ea1 in _start ()


Files

nfd-log (161 KB), added by susmit shannigrahi, 02/16/2015 11:33 AM

Related issues

Related to ndn-cxx - Bug #2742: Face creation fails due to heavy application processing before calling processEvents (Closed, 04/10/2015)

History

#1

Updated by Alex Afanasyev over 5 years ago

  • Project changed from NFD to ndn-cxx
  • Target version set to Unsupported

The only cause I can think of is memory exhaustion.

#2

Updated by Junxiao Shi over 5 years ago

Please attach application code.

Remember that your application shouldn't send Data without first receiving an Interest.

As announced, NFD will not accept unsolicited Data.

#3

Updated by susmit shannigrahi over 5 years ago

I plotted memory usage with 0.01 s sampling. The machine is not running out of memory; it has 128 GB of RAM.

Junxiao Shi wrote:

Please attach application code.

http://redmine.named-data.net/attachments/download/191/ndnputbigchunk.cpp

Junxiao Shi wrote:

Remember that your application shouldn't send Data without first receiving an Interest.

As announced, NFD will not accept unsolicited Data.

OK, but why does it crash at 800 MB? Shouldn't NFD reject unsolicited Data whenever it receives any?

#4

Updated by Junxiao Shi over 5 years ago

This app is sending unsolicited Data into NFD.

When doing so, the Face is not yet connected.

The correct behavior should be: NFD drops those Data; app shouldn't crash.

I guess some buffer used by Face::put is full, most likely StreamTransportImpl<..>::m_transmissionQueue.

#5

Updated by susmit shannigrahi over 5 years ago

I have put this in the stream-face.hpp:

    while (m_inputBufferSize - offset > 0)
      {
        isOk = Block::fromBuffer(m_inputBuffer + offset, m_inputBufferSize - offset, element);
        if (!isOk) {
          // Debug log added to see when decoding fails
          NFD_LOG_WARN("Can not create block from buffer");
          break;
        }

When the app crashes, NFD log also shows "Can not create block from buffer".

#6

Updated by Alex Afanasyev over 5 years ago

That is normal on the NFD side. NFD should show this message many times until the whole packet is received. If only part of a packet has been received, NFD cannot decode the TLV yet.
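To illustrate why a receiver keeps reporting a decode failure until the last byte arrives, here is a minimal sketch of a completeness check for an NDN-TLV element. It is not NFD's Block::fromBuffer; the names readVarNumber and completeElementSize are hypothetical. It relies only on the NDN-TLV variable-length number encoding (a first octet below 253 is the value itself; 253, 254, and 255 announce 2, 4, and 8 following octets).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads one NDN-TLV variable-length number starting at buf[pos].
// Returns false (leaving pos untouched) if the buffer ends too early.
bool readVarNumber(const std::vector<std::uint8_t>& buf, std::size_t& pos, std::uint64_t& value)
{
  if (pos >= buf.size()) {
    return false;
  }
  const std::uint8_t first = buf[pos];
  std::size_t extra = 0; // octets that follow the first one
  if (first == 253) {
    extra = 2;
  } else if (first == 254) {
    extra = 4;
  } else if (first == 255) {
    extra = 8;
  }
  if (buf.size() - pos < 1 + extra) {
    return false;
  }
  if (extra == 0) {
    value = first;
  } else {
    value = 0;
    for (std::size_t i = 1; i <= extra; ++i) {
      value = (value << 8) | buf[pos + i];
    }
  }
  pos += 1 + extra;
  return true;
}

// Returns the total size of the first element if it is completely buffered,
// or 0 if more bytes are needed (a valid element is never 0 bytes long).
std::size_t completeElementSize(const std::vector<std::uint8_t>& buf)
{
  std::size_t pos = 0;
  std::uint64_t type = 0;
  std::uint64_t length = 0;
  if (!readVarNumber(buf, pos, type) || !readVarNumber(buf, pos, length)) {
    return 0; // header itself is incomplete
  }
  assert(pos <= buf.size());
  if (buf.size() - pos < length) {
    return 0; // header is known, but the value has not fully arrived
  }
  return pos + static_cast<std::size_t>(length);
}
```

With a very large Data packet, a check like this would return 0 on every read until the whole value is buffered, which would match the repeated "Can not create block from buffer" warnings.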

#7

Updated by susmit shannigrahi over 5 years ago

Also, I printed the error code in the error handler (stream.hpp, line 273).

It says: "Error code: asio.misc:2 (End of file)"

#8

Updated by Davide Pesavento over 5 years ago

Are you saying that if you try to send <800 MB it actually works? i.e. NFD receives the whole thing and forwards it?

#9

Updated by Alex Afanasyev over 5 years ago

It says: "Error code: asio.misc:2 (End of file)"

This only means that the TCP or Unix socket connection was closed.

#10

Updated by susmit shannigrahi over 5 years ago

Davide Pesavento wrote:

Are you saying that if you try to send <800 MB it actually works? i.e. NFD receives the whole thing and forwards it?

Yes, NFD receives the whole thing and puts it in the CS. If there is an Interest for it, NFD sends the data back from CS.

#11

Updated by susmit shannigrahi over 5 years ago

We found the problem. In the application, m_face.put(*data) was called before the run() loop. This caused the run loop to block while the data was being published. The 4-second connect timer in src/transport/stream-transport.hpp then timed out and closed the face.

The only remaining problem is that the face still accepts unsolicited data.

Please close this if that's not a bug/duplicate.

#12

Updated by Alex Afanasyev over 5 years ago

This is not right. There shouldn't be a problem calling m_face.put() before the run() loop. When put is called before connect, the data packet is supposed to be remembered, and the actual publishing should happen just after connect.
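The intended "remember, then publish after connect" behaviour can be sketched as follows. This is a toy stand-in under stated assumptions, not the ndn-cxx transport: the class QueueingTransport and its members are hypothetical, and the "wire" is just a vector so the ordering can be inspected.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stream transport; not the ndn-cxx class.
class QueueingTransport
{
public:
  // May be called at any time, even before the connection exists.
  void send(std::string packet)
  {
    if (m_isConnected) {
      m_wire.push_back(std::move(packet));
    } else {
      m_pending.push_back(std::move(packet)); // remember until connected
    }
  }

  // Would be invoked by the event loop once the connection is established.
  void onConnected()
  {
    m_isConnected = true;
    while (!m_pending.empty()) {
      m_wire.push_back(std::move(m_pending.front())); // flush in FIFO order
      m_pending.pop_front();
    }
    assert(m_pending.empty());
  }

  std::size_t pendingCount() const { return m_pending.size(); }
  const std::vector<std::string>& wire() const { return m_wire; }

private:
  bool m_isConnected = false;
  std::deque<std::string> m_pending; // packets queued before connect
  std::vector<std::string> m_wire;   // packets "sent" after connect
};
```

In this model nothing reaches the wire until onConnected() runs, so the queue has to hold the entire payload in memory for as long as the event loop has not been started.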

#13

Updated by Anonymous over 5 years ago

Said a different way, the problem is that the application is creating a Face (default unix transport), putting a huge Data, and then calling run.

I think the transport's connect timer keeps ticking while the large Data is being put. We observed that the time between the transport's async_connect call and the timeout handler being triggered or cancelled varied with the size of the Data. The problem seems to be solved by doing a short run after creating the Face, then putting the Data, and then making the normal run() call.
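The timing argument above can be modelled with a virtual clock. The sketch below is a deliberately simplified toy, not ndn-cxx or Boost.Asio code: ToyFace, doWork, and poll are hypothetical names. It assumes that once the event loop finally runs, an already-expired connect timer is treated as a timeout; that ordering is an assumption of the model, not a statement about Asio's handler ordering. The 4000 ms value in the usage note comes from the connect timer mentioned earlier in this thread.

```cpp
#include <cassert>
#include <chrono>

using std::chrono::milliseconds;

enum class ConnectState { Pending, Connected, TimedOut };

// Toy model: the connect timer is armed at construction (virtual time 0),
// but no handler can run until poll() is called.
class ToyFace
{
public:
  explicit ToyFace(milliseconds connectTimeout)
    : m_deadline(connectTimeout)
  {
  }

  // Application work that blocks the thread; virtual time advances and
  // no handlers are dispatched meanwhile.
  void doWork(milliseconds duration)
  {
    assert(duration.count() >= 0);
    m_now += duration;
  }

  // One pass of the event loop. Model assumption: if the deadline has
  // already passed, the timeout wins; otherwise the connect completes
  // and the timer is cancelled.
  void poll()
  {
    if (m_state != ConnectState::Pending) {
      return; // outcome already decided, timer no longer armed
    }
    m_state = (m_now >= m_deadline) ? ConnectState::TimedOut
                                    : ConnectState::Connected;
  }

  ConnectState state() const { return m_state; }

private:
  milliseconds m_now{0};
  milliseconds m_deadline;
  ConnectState m_state = ConnectState::Pending;
};
```

In this model, a face created with a 4000 ms timeout that does 5000 ms of work before its first poll() ends up TimedOut, while a face that polls once first and then does the same work stays Connected. That mirrors the "short run, then put, then run()" workaround described above.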

#14

Updated by Alex Afanasyev over 5 years ago

Oh... Yeah. This timeout issue seems to be a recurring problem, and I will try to fix it.

#15

Updated by Junxiao Shi over 4 years ago

  • Category set to Base

#16

Updated by Junxiao Shi over 4 years ago

  • Related to Bug #2742: Face creation fails due to heavy application processing before calling processEvents added

#17

Updated by Junxiao Shi over 4 years ago

At the 2015-09-01 conference call, Alex said he believes this is probably a duplicate of #2742.
We can confirm this after #2742 is resolved.

#18

Updated by Junxiao Shi over 4 years ago

  • Status changed from New to Resolved

This Bug is believed to have been resolved with #2742. Can Susmit confirm?

#19

Updated by susmit shannigrahi over 4 years ago

  • Status changed from Resolved to Closed

Yes, thanks.
