Bug #1769
closed
"error while connecting to forwarder" when using Face.put in a loop on large amount of data
Description
Using Face.put in a loop on a large amount of data (10MB) sometimes produces an "error while connecting to forwarder" exception.
We discovered the problem using a slightly modified version of ndnputchunks (see attached code). The failure does not reproduce on every run, but it should occur fairly regularly.
Updated by Alex Afanasyev over 10 years ago
I believe this error is a result of the current async implementation of communicating with the forwarder. Every time we "put" (or expressInterest), the most that can happen is that socket::async_send will be called. In my past experience, async_send can fail when it is called too many times without the pending operations being processed by the io_service thread, which is what happens when Face::put is called in a loop.
This needs more careful evaluation and consideration. If it is the problem I'm thinking of, then I'm not even sure what the interface around it should be. There should be a way (signal, callback) to indicate that the face is now ready to accept a new Interest/Data, but how exactly can this be achieved?
Updated by Junxiao Shi over 10 years ago
- Category set to Base
Confirmed. Simplified snippet:
// g++ bug1769.cpp `pkg-config --libs --cflags libndn-cxx`
#include <ndn-cxx/face.hpp>
#include <ndn-cxx/security/key-chain.hpp>

using namespace ndn;

int
main(int argc, char** argv)
{
  Face face;
  KeyChain keyChain;
  static uint8_t buffer[8000];

  for (int i = 0; i < 2000; ++i) {
    shared_ptr<Data> data = make_shared<Data>(Name("/A").appendSegment(i));
    data->setContent(buffer, sizeof(buffer));
    keyChain.sign(*data);
    BOOST_ASSERT(data->wireEncode().size() < 8800);
    face.put(*data);
  }

  face.processEvents();
}
Observations:
- The bug does not appear if the same Data is passed to face.put 2000 times.
- The bug does not appear if keyChain.sign is replaced with keyChain.signWithSha256.
Updated by Alex Afanasyev over 10 years ago
- Status changed from New to In Progress
- Assignee set to Alex Afanasyev
- Target version set to v0.2
A few observations. This code works if packet sizes are <= 8192 bytes. Also, the same code works perfectly when switched to the TCP transport (Face face("localhost")).
I tracked this to a much bigger problem with the way we use Boost.Asio. And this applies to NFD as well. In particular, socket::async_send we are using is not guaranteed to send all the supplied data as I was incorrectly assuming.
The suggested way is to use the boost::asio::async_write free function. However, we cannot simply replace async_send with it, since it requires that no other write calls happen until async_write finishes. The current code does not guarantee this, so we will need to add some form of queueing.
Updated by Davide Pesavento over 10 years ago
Alex Afanasyev wrote:
I tracked this to a much bigger problem with the way we use Boost.Asio. And this applies to NFD as well. In particular, socket::async_send we are using is not guaranteed to send all the supplied data as I was incorrectly assuming.
Yes, apparently async_send behaves exactly like async_write_some.
The suggested way is to use the boost::asio::async_write free function. However, we cannot simply replace async_send with it, since it requires that no other write calls happen until async_write finishes. The current code does not guarantee this, so we will need to add some form of queueing.
More simply, can we keep calling async_write_some (which does not have this requirement) with the same buffer (+ offset) until the bytes_transferred argument passed to the completion handler equals the number of bytes remaining?
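As a rough illustration of the retry-with-offset idea above, here is a minimal sketch that uses only the C++ standard library. It is a synchronous model, not Boost.Asio or ndn-cxx code: ShortWriteSink and writeAll are hypothetical names, and writeSome merely imitates a call that may accept fewer bytes than it was given.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a socket whose write call may accept only
// part of the supplied buffer (a "short write").
struct ShortWriteSink {
  std::size_t maxPerCall;  // most bytes a single call will accept
  std::string received;    // every byte accepted so far, in order

  explicit ShortWriteSink(std::size_t limit)
    : maxPerCall(limit)
  {
    assert(limit > 0);  // a zero limit would make writeAll spin forever
  }

  // Accepts at most maxPerCall bytes and returns how many were taken.
  std::size_t writeSome(const char* data, std::size_t len)
  {
    std::size_t n = std::min(len, maxPerCall);
    received.append(data, n);
    return n;
  }
};

// Keeps writing from the current offset until nothing remains.
// Returns the number of writeSome calls that were needed.
inline std::size_t writeAll(ShortWriteSink& sink, const std::string& packet)
{
  std::size_t offset = 0;
  std::size_t calls = 0;
  while (offset < packet.size()) {
    offset += sink.writeSome(packet.data() + offset, packet.size() - offset);
    ++calls;
  }
  return calls;
}
```

With an illustrative per-call limit of 8192 bytes (a figure borrowed from the observation earlier in this thread, not a claim about the real socket), an 8800-byte packet needs two calls, and the sink ends up with the complete packet. In a real asynchronous implementation the loop body would live in the completion handler instead of a while loop.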
Updated by Alex Afanasyev over 10 years ago
We can call that, but the problem is that we need to prevent other async_send calls from being scheduled in between. This is the problem I'm trying to think of a solution for.
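One possible shape for such a solution is a send queue that allows only one packet to be in flight at a time. The sketch below is a standard-library model under stated assumptions, not the fix that was eventually merged: QueuedSender, send, and step are hypothetical names, and step stands in for a single completion-handler invocation on the I/O thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of a serialized send path; not ndn-cxx or Boost.Asio code.
class QueuedSender {
public:
  explicit QueuedSender(std::size_t maxPerWrite)
    : m_maxPerWrite(maxPerWrite)
  {
    assert(maxPerWrite > 0);
  }

  // Called by the application (analogous to Face::put): only enqueues,
  // so it can never interrupt a packet that is partially written.
  void send(const std::string& packet)
  {
    if (!packet.empty()) {
      m_queue.push_back(packet);
    }
  }

  // Models one completion-handler run: performs one partial write of the
  // packet at the front of the queue. Returns false if the queue is empty.
  bool step()
  {
    if (m_queue.empty()) {
      return false;
    }
    const std::string& front = m_queue.front();
    std::size_t n = std::min(m_maxPerWrite, front.size() - m_offset);
    m_wire.append(front, m_offset, n);
    m_offset += n;
    if (m_offset == front.size()) {
      // Front packet fully written: only now may the next one start.
      m_queue.pop_front();
      m_offset = 0;
    }
    return true;
  }

  const std::string& wire() const { return m_wire; }
  std::size_t pending() const { return m_queue.size(); }

private:
  std::size_t m_maxPerWrite;        // bytes accepted per simulated write
  std::size_t m_offset = 0;         // progress within the front packet
  std::deque<std::string> m_queue;  // packets waiting to be written
  std::string m_wire;               // bytes in the order they hit the wire
};
```

If a six-byte packet "AAAAAA" is half written when "BBBBBB" is queued, the bytes still reach the wire as "AAAAAABBBBBB", because the second packet cannot start until the first has been popped. The trade-off is that the queue can grow without bound if the application enqueues faster than the socket drains, which is the readiness-signal question raised earlier in this thread.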
Updated by Alex Afanasyev over 10 years ago
- Related to Task #1777: Serialization of write operation in socket stream added
Updated by Alex Afanasyev over 10 years ago
- Target version changed from v0.2 to v0.3
Steve, can you verify that you no longer have the problem (with the master branch)?
Updated by Anonymous over 10 years ago
Still getting the problem when running our demo on Ubuntu 14.04. Basically, we have a script that runs nfs-start, sleeps 2 seconds, and then spins off 6 ndnputchunks4 (previously attached) publishers.
I'm now also noticing the following assertion failure from NRD. I think this is new, but not 100% sure:
nrd: /usr/local/include/ndn-cxx/management/nfd-control-parameters.hpp:358: const milliseconds& ndn::nfd::ControlParameters::getExpirationPeriod() const: Assertion `this->hasExpirationPeriod()' failed.
Updated by Anonymous over 10 years ago
Steve DiBenedetto wrote:
...nfs-start...
nfd-start
Updated by Alex Afanasyev over 10 years ago
Are you using the release branch of both the library and NFD? This assert could occur if NFD is on release and the library is on master...
Updated by Alex Afanasyev over 10 years ago
Actually, you need to use the master branch for both, since the error is fixed in master only.
Updated by Anonymous over 10 years ago
My fault. I'm using master for both, but NFD was a little behind. I've updated NFD to the latest master and the assertion failure is gone. However, the "error while connecting to forwarder" problem remains.
Updated by Alex Afanasyev over 10 years ago
I suspect that creating the data packets for 10 MB takes more than 4 seconds. What you can do for now is change catchunks to create all the data before the initial put call, or space out data creation (with the scheduler).
The reason is that the first put initiates the connection, but nothing actually happens until processEvents() gets a chance to do the work, while the internal scheduler remembers the connection initiation time. If more than the default 4 seconds elapse, you will get an error.
Updated by Junxiao Shi over 10 years ago
I doubt the workaround in note-14 can help.
I changed the snippet in note-2 as follows:
- total 20 Data packets
- add sleep(5) before face.processEvents()
And there is no error.
There is also no error running note-2 snippet unchanged.
Updated by susmit shannigrahi over 10 years ago
Are there any updates on this?
Thanks.
Updated by Alex Afanasyev over 10 years ago
Have you tried the suggestion I made (preparing data packets first, and then putting them to face)?
Updated by susmit shannigrahi over 10 years ago
I am not getting the error with the latest version of ndn-cxx/NFD. I tried with and without the fix Alex suggested.
Could not reproduce either way.
Updated by Junxiao Shi over 10 years ago
- Status changed from In Progress to Abandoned
This bug is gone after recent ndn-cxx and NFD update, as reported in note-18 and note-15.