Bug #2174
openMultiple register prefix gives NFD error "request timed out (code: 10060)"
0%
Description
We came across this issue when trying to register multiple prefixes in a group chat application.
To reproduce: Start NFD with the installed defaults in a separate window so you can see errors. Modify the ndn-cxx examples/producer.cpp run method to register 300 different prefixes. https://github.com/named-data/ndn-cxx/blob/b6783065ae24f4035a95d7ab6effd81b720a6f23/examples/producer.cpp#L80 . Replace:
m_face.setInterestFilter("/example/testApp",
bind(&Producer::onInterest, this, _1, _2),
RegisterPrefixSuccessCallback(),
bind(&Producer::onRegisterFailed, this, _2));
with:
for (int x = 1; x <= 300; ++x) {
std::ostringstream prefix;
prefix << "/example/testApp" << x;
m_face.setInterestFilter(prefix.str(),
bind(&Producer::onInterest, this, _1, _2),
RegisterPrefixSuccessCallback(),
bind(&Producer::onRegisterFailed, this, _2));
}
In ndn-cxx, compile and run the producer:
./waf configure --with-examples
./waf
build/examples/producer
Expected result: All prefixes are registered.
Actual result: After a few seconds, NFD prints the following error several times:
[RibManager] NFD returned an error: request timed out (code: 10060)
It also prints the following error a few times:
[RibManager] NFD returned an error: Face not found (code: 410)
Finally, it prints:
Face Status Dataset request timed out
Many of the prefixes are not registered, presumably related to the above messages.
Updated by Junxiao Shi over 9 years ago
- Category set to RIB
- Target version set to v0.3
Updated by Junxiao Shi over 9 years ago
What commands should I use to inspect the RIB and FIB?
nfd-status -b
nfd-status -r
Updated by Anonymous over 9 years ago
Inspecting the RIB and FIB doesn't show any of them registered.
Updated by Junxiao Shi over 9 years ago
I'm unable to reproduce this bug on Ubuntu 12.04.
All prefixes are registered successfully.
Both nfd-status -b | grep example | wc -l
and nfd-status -r | grep example | wc -l
give "300".
Updated by Anonymous over 9 years ago
I should have mentioned: I'm on a MacBook Pro OS X 10.9.5.
I have Ubuntu 14.04 in VirtualBox. I could try to reproduce in there, but I suppose the timing would be different. Do you want me to try it?
Updated by Junxiao Shi over 9 years ago
I'm unable to try this on OSX 10.9 due to Bug #2179.
Updated by Vince Lehman over 9 years ago
I am able to reproduce the bug on OS X 10.9.5, but not Ubuntu 14.04 or 14.10.
On OS X 10.9.5:
When I removed "m_face.shutdown()" from the producer program and let "m_face.processEvents()" run in a loop, I still received
[RibManager] NFD returned an error: request timed out (code: 10060)
but nfd-status showed all of the prefixes in the RIB and the FIB. It appears that the controller is responding with a timeout even though the
updates are being processed.
Updated by Junxiao Shi over 9 years ago
@JeffT and @Vince, in your setups, which TPM is used for the KeyChain of NFD?
If it's osx-keychain, this problem is caused by #1589.
Also, as pointed out in #1990, sending multiple Signed Interests (such as prefix registration commands) at once may cause some Signed Interests to be rejected.
However, note-7 doesn't observe this problem.
Updated by Anonymous over 9 years ago
Yes, the OS X Keychain is slow. But why should NFD choke?
Updated by Junxiao Shi over 9 years ago
Yes, the OS X Keychain is slow. But why should NFD choke?
Please answer:
- Is NFD using
tpm=osx-keychain
ortpm=file
? - Does ALL prefix registrations return "10060", or does some of the succeed without an error code?
Updated by Anonymous over 9 years ago
Do you mean /usr/local/etc/ndn/nfd.conf ? There is no line for "tpm=".
Running producer calls onRegisterFailed 300 times. I think they all fail.
Updated by Junxiao Shi over 9 years ago
Do you mean /usr/local/etc/ndn/nfd.conf ? There is no line for "tpm=".
TPM configuration is not in nfd.conf
. For applications including NRD, it's in client.conf
somewhere.
I'm not sure which file controls the TPM used by NFD itself. I have asked nfd-dev and is waiting for an answer.
Updated by Junxiao Shi over 9 years ago
@Jeff : is your NFD installed by compiling from source, or from MacPorts?
Please paste output from each of these commands:
grep 'tpm' ~/.ndn/client.conf
grep 'tpm' /usr/local/etc/ndn/client.conf
grep 'tpm' /usr/local/var/lib/ndn/nfd/.ndn/client.conf
grep 'tpm' /opt/local/etc/ndn/client.conf
Updated by Anonymous over 9 years ago
My NFD is compiled from source. Here is the output from the commands:
$ grep 'tpm' ~/.ndn/client.conf
grep: /Users/jefft0/.ndn/client.conf: No such file or directory
$ grep 'tpm' /usr/local/etc/ndn/client.conf
grep: /usr/local/etc/ndn/client.conf: No such file or directory
$ grep 'tpm' /usr/local/var/lib/ndn/nfd/.ndn/client.conf
grep: /usr/local/var/lib/ndn/nfd/.ndn/client.conf: No such file or directory
$ grep 'tpm' /opt/local/etc/ndn/client.conf
grep: /opt/local/etc/ndn/client.conf: No such file or directory
Updated by Junxiao Shi over 9 years ago
@Jeff you are using osx-keychain TPM.
To determine whether the problem is caused by #1589, please:
create
~/.ndn/client.conf
with the following content:tpm=file
repeat the steps to reproduce
report how many prefix registration have failed
Updated by Anonymous over 9 years ago
Can you help me out? When I switch to tpm=file, I lose my OS X signing key which was created when requesting a certificate and NFD won't run. Should I request a new certificate for this test? If not, how to generate a key? (I tried ndnsec-keygen /`whoami` | ndnsec-install-cert -
but it says "key too short for this signature scheme".)
Updated by Junxiao Shi over 9 years ago
After swapping TPM, PIB should be swaped as well:
mv ~/.ndn ~/.ndn-osxkeychain
You should be able to create new self-signed certificate after this.
After the experiment, delete ~/.ndn and move back .ndn-osxkeychain.
Updated by Anonymous over 9 years ago
With tpm=file, all prefix registrations succeed.
Updated by Junxiao Shi over 9 years ago
- Blocked by Bug #1589: KeyChain::sign is slow with tpm=osx-keychain added
Updated by Junxiao Shi over 9 years ago
- Target version deleted (
v0.3)
Suggested workaround: run NFD with file-based keychain.
Updated by Anonymous over 9 years ago
Why is this issue blocked by issue #1589? Yes, the OS X key chain may be slow. But NFD is able to use it to do a few register prefix. Why would it fail to process several in a batch? Does the internal timing model need to be adjusted?
Updated by Davide Pesavento over 9 years ago
I agree with Jeff. We need a proper solution/workaround here. Blaming the keychain isn't going to help.
Updated by Junxiao Shi over 9 years ago
A workaround is already given in note-20.
There is already proposed solution for #1589.
@Yingdi Yu currently refuses to acknowledge #1990.
But if #1990 is acknowledged, we could implement stop-and-wait in ndn::nfd::Controller
, and that would also fix the problem here.
Updated by Davide Pesavento over 9 years ago
Junxiao Shi wrote:
A workaround is already given in note-20.
Seriously, that is not a proper workaround.
There is already proposed solution for #1589.
Where?
But if #1990 is acknowledged, we could implement stop-and-wait in
ndn::nfd::Controller
, and that would also fix the problem here.
I don't understand what interest reordering has to do with this particular issue. In Jeff's case NFD is running locally, and no packet reordering happens over a Unix socket.
Updated by Junxiao Shi over 9 years ago
There is already proposed solution for #1589.
http://redmine.named-data.net/issues/1589#note-7
I don't understand what interest reordering has to do with this particular issue. In Jeff's case NFD is running locally, and no packet reordering happens over a Unix socket.
Interest reordering not happening locally is a consequence of current implementation, but it's not part of the contract so it's not guaranteed.
The current problem isn't caused by Interest reordering, but a stop-and-wait mechanism would solve both problems.
Updated by Junxiao Shi over 9 years ago
I don't think the current problem is worth a workaround (other than note-20) before the final solution.
The scenario is rare. A normal application shouldn't register a large number of prefixes at the same time.
Potential final solution is one of:
Updated by Junxiao Shi almost 7 years ago
- Blocked by Feature #2293: Controller: stop-and-wait added
Updated by Alex Afanasyev about 6 years ago
- Related to Bug #4511: Failure of RIB register commands within short period of time added
Updated by Junxiao Shi about 3 years ago
- Related to Bug #5149: Error 10060 when fetching RIB dataset: Timeout exceeded added