Project

General

Profile

Actions

Bug #2174

open

Multiple register prefix gives NFD error "request timed out (code: 10060)"

Added by Anonymous over 9 years ago. Updated over 9 years ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
RIB
Target version:
-
Start date:
11/13/2014
Due date:
% Done:

0%

Estimated time:

Description

We came across this issue when trying to register multiple prefixes in a group chat application.

To reproduce: Start NFD with the installed defaults in a separate window so you can see errors. Modify the ndn-cxx examples/producer.cpp run method to register 300 different prefixes. https://github.com/named-data/ndn-cxx/blob/b6783065ae24f4035a95d7ab6effd81b720a6f23/examples/producer.cpp#L80 . Replace:

m_face.setInterestFilter("/example/testApp",
                         bind(&Producer::onInterest, this, _1, _2),
                         RegisterPrefixSuccessCallback(),
                         bind(&Producer::onRegisterFailed, this, _2));

with:

for (int x = 1; x <= 300; ++x) {
  std::ostringstream prefix;
  prefix << "/example/testApp" << x;
  m_face.setInterestFilter(prefix.str(),
                           bind(&Producer::onInterest, this, _1, _2),
                           RegisterPrefixSuccessCallback(),
                           bind(&Producer::onRegisterFailed, this, _2));
}

In ndn-cxx, compile and run the producer:

./waf configure --with-examples
./waf
build/examples/producer

Expected result: All prefixes are registered.

Actual result: After a few seconds, NFD prints the following error several times:

[RibManager] NFD returned an error: request timed out (code: 10060)

It also prints the following error a few times:

[RibManager] NFD returned an error: Face not found (code: 410)

Finally, it prints:

Face Status Dataset request timed out

Many of the prefixes are not registered, presumably related to the above messages.


Related issues 4 (2 open2 closed)

Related to NFD - Bug #4511: Failure of RIB register commands within short period of time Rejected

Actions
Related to NFD - Bug #5149: Error 10060 when fetching RIB dataset: Timeout exceededNew02/26/2021

Actions
Blocked by ndn-cxx - Bug #1589: KeyChain::sign is slow with tpm=osx-keychainRejected05/10/2014

Actions
Blocked by ndn-cxx - Feature #2293: Controller: stop-and-waitNew

Actions
Actions #1

Updated by Junxiao Shi over 9 years ago

  • Category set to RIB
  • Target version set to v0.3

This problem appears similar to #1839, and may be caused by #1990.

If you inspect RIB and FIB, which prefixes are not registered?
Are these always at the high end of the range (near 300), or are they uniformly distributed (scattered between 0 and 300)?

Actions #2

Updated by Junxiao Shi over 9 years ago

What commands should I use to inspect the RIB and FIB?

nfd-status -b
nfd-status -r
Actions #3

Updated by Anonymous over 9 years ago

Inspecting the RIB and FIB doesn't show any of them registered.

Actions #4

Updated by Junxiao Shi over 9 years ago

I'm unable to reproduce this bug on Ubuntu 12.04.

All prefixes are registered successfully.
Both nfd-status -b | grep example | wc -l and nfd-status -r | grep example | wc -l give "300".

Actions #5

Updated by Anonymous over 9 years ago

I should have mentioned: I'm on a MacBook Pro OS X 10.9.5.

I have Ubuntu 14.04 in VirtualBox. I could try to reproduce in there, but I suppose the timing would be different. Do you want me to try it?

Actions #6

Updated by Junxiao Shi over 9 years ago

I'm unable to try this on OSX 10.9 due to Bug #2179.

Actions #7

Updated by Vince Lehman over 9 years ago

I am able to reproduce the bug on OS X 10.9.5, but not Ubuntu 14.04 or 14.10.

On OS X 10.9.5:
When I removed "m_face.shutdown()" from the producer program and let "m_face.processEvents()" run in a loop, I still received

[RibManager] NFD returned an error: request timed out (code: 10060)

but nfd-status showed all of the prefixes in the RIB and the FIB. It appears that the controller is responding with a timeout even though the
updates are being processed.

Actions #8

Updated by Junxiao Shi over 9 years ago

@JeffT and @Vince, in your setups, which TPM is used for the KeyChain of NFD?

If it's osx-keychain, this problem is caused by #1589.

Also, as pointed out in #1990, sending multiple Signed Interests (such as prefix registration commands) at once may cause some Signed Interests to be rejected.

However, note-7 doesn't observe this problem.

Actions #9

Updated by Anonymous over 9 years ago

Yes, the OS X Keychain is slow. But why should NFD choke?

Actions #10

Updated by Junxiao Shi over 9 years ago

Yes, the OS X Keychain is slow. But why should NFD choke?

Please answer:

  • Is NFD using tpm=osx-keychain or tpm=file?
  • Does ALL prefix registrations return "10060", or does some of the succeed without an error code?
Actions #11

Updated by Anonymous over 9 years ago

Do you mean /usr/local/etc/ndn/nfd.conf ? There is no line for "tpm=".

Running producer calls onRegisterFailed 300 times. I think they all fail.

Actions #12

Updated by Junxiao Shi over 9 years ago

Do you mean /usr/local/etc/ndn/nfd.conf ? There is no line for "tpm=".

TPM configuration is not in nfd.conf. For applications including NRD, it's in client.conf somewhere.

I'm not sure which file controls the TPM used by NFD itself. I have asked nfd-dev and is waiting for an answer.

Actions #13

Updated by Junxiao Shi over 9 years ago

@Jeff : is your NFD installed by compiling from source, or from MacPorts?

Please paste output from each of these commands:

grep 'tpm' ~/.ndn/client.conf
grep 'tpm' /usr/local/etc/ndn/client.conf
grep 'tpm' /usr/local/var/lib/ndn/nfd/.ndn/client.conf
grep 'tpm' /opt/local/etc/ndn/client.conf
Actions #14

Updated by Anonymous over 9 years ago

My NFD is compiled from source. Here is the output from the commands:

$ grep 'tpm' ~/.ndn/client.conf
grep: /Users/jefft0/.ndn/client.conf: No such file or directory
$ grep 'tpm' /usr/local/etc/ndn/client.conf
grep: /usr/local/etc/ndn/client.conf: No such file or directory
$ grep 'tpm' /usr/local/var/lib/ndn/nfd/.ndn/client.conf
grep: /usr/local/var/lib/ndn/nfd/.ndn/client.conf: No such file or directory
$ grep 'tpm' /opt/local/etc/ndn/client.conf
grep: /opt/local/etc/ndn/client.conf: No such file or directory
Actions #15

Updated by Junxiao Shi over 9 years ago

@Jeff you are using osx-keychain TPM.

To determine whether the problem is caused by #1589, please:

  1. create ~/.ndn/client.conf with the following content:

    tpm=file
    
  2. repeat the steps to reproduce

  3. report how many prefix registration have failed

Actions #16

Updated by Anonymous over 9 years ago

Can you help me out? When I switch to tpm=file, I lose my OS X signing key which was created when requesting a certificate and NFD won't run. Should I request a new certificate for this test? If not, how to generate a key? (I tried ndnsec-keygen /`whoami` | ndnsec-install-cert - but it says "key too short for this signature scheme".)

Actions #17

Updated by Junxiao Shi over 9 years ago

After swapping TPM, PIB should be swaped as well:

mv ~/.ndn ~/.ndn-osxkeychain

You should be able to create new self-signed certificate after this.

After the experiment, delete ~/.ndn and move back .ndn-osxkeychain.

Actions #18

Updated by Anonymous over 9 years ago

With tpm=file, all prefix registrations succeed.

Actions #19

Updated by Junxiao Shi over 9 years ago

  • Blocked by Bug #1589: KeyChain::sign is slow with tpm=osx-keychain added
Actions #20

Updated by Junxiao Shi over 9 years ago

  • Target version deleted (v0.3)

Suggested workaround: run NFD with file-based keychain.

Actions #21

Updated by Anonymous over 9 years ago

Why is this issue blocked by issue #1589? Yes, the OS X key chain may be slow. But NFD is able to use it to do a few register prefix. Why would it fail to process several in a batch? Does the internal timing model need to be adjusted?

Actions #22

Updated by Davide Pesavento over 9 years ago

I agree with Jeff. We need a proper solution/workaround here. Blaming the keychain isn't going to help.

Actions #23

Updated by Junxiao Shi over 9 years ago

A workaround is already given in note-20.

There is already proposed solution for #1589.

@Yingdi Yu currently refuses to acknowledge #1990.

But if #1990 is acknowledged, we could implement stop-and-wait in ndn::nfd::Controller, and that would also fix the problem here.

Actions #24

Updated by Davide Pesavento over 9 years ago

Junxiao Shi wrote:

A workaround is already given in note-20.

Seriously, that is not a proper workaround.

There is already proposed solution for #1589.

Where?

But if #1990 is acknowledged, we could implement stop-and-wait in ndn::nfd::Controller, and that would also fix the problem here.

I don't understand what interest reordering has to do with this particular issue. In Jeff's case NFD is running locally, and no packet reordering happens over a Unix socket.

Actions #25

Updated by Junxiao Shi over 9 years ago

There is already proposed solution for #1589.

http://redmine.named-data.net/issues/1589#note-7

I don't understand what interest reordering has to do with this particular issue. In Jeff's case NFD is running locally, and no packet reordering happens over a Unix socket.

Interest reordering not happening locally is a consequence of current implementation, but it's not part of the contract so it's not guaranteed.

The current problem isn't caused by Interest reordering, but a stop-and-wait mechanism would solve both problems.

Actions #26

Updated by Junxiao Shi over 9 years ago

I don't think the current problem is worth a workaround (other than note-20) before the final solution.

The scenario is rare. A normal application shouldn't register a large number of prefixes at the same time.

Potential final solution is one of:

  • #1589
  • stop-and-wait if #1990 is acknowledged
  • #1705 ; note: NFD itself shouldn't explicitly call signWithSha256, as decided in #1814 note-2
Actions #27

Updated by Junxiao Shi almost 7 years ago

Actions #28

Updated by Alex Afanasyev about 6 years ago

  • Related to Bug #4511: Failure of RIB register commands within short period of time added
Actions #29

Updated by Junxiao Shi about 3 years ago

  • Related to Bug #5149: Error 10060 when fetching RIB dataset: Timeout exceeded added
Actions

Also available in: Atom PDF