Bug #2174
openMultiple register prefix gives NFD error "request timed out (code: 10060)"
0%
Description
We came across this issue when trying to register multiple prefixes in a group chat application.
To reproduce: Start NFD with the installed defaults in a separate window so you can see errors. Modify the ndn-cxx examples/producer.cpp run method to register 300 different prefixes. https://github.com/named-data/ndn-cxx/blob/b6783065ae24f4035a95d7ab6effd81b720a6f23/examples/producer.cpp#L80 . Replace:
m_face.setInterestFilter("/example/testApp",
bind(&Producer::onInterest, this, _1, _2),
RegisterPrefixSuccessCallback(),
bind(&Producer::onRegisterFailed, this, _2));
with:
for (int x = 1; x <= 300; ++x) {
std::ostringstream prefix;
prefix << "/example/testApp" << x;
m_face.setInterestFilter(prefix.str(),
bind(&Producer::onInterest, this, _1, _2),
RegisterPrefixSuccessCallback(),
bind(&Producer::onRegisterFailed, this, _2));
}
In ndn-cxx, compile and run the producer:
./waf configure --with-examples
./waf
build/examples/producer
Expected result: All prefixes are registered.
Actual result: After a few seconds, NFD prints the following error several times:
[RibManager] NFD returned an error: request timed out (code: 10060)
It also prints the following error a few times:
[RibManager] NFD returned an error: Face not found (code: 410)
Finally, it prints:
Face Status Dataset request timed out
Many of the prefixes are not registered, presumably related to the above messages.
Updated by Junxiao Shi about 10 years ago
- Category set to RIB
- Target version set to v0.3
Updated by Junxiao Shi about 10 years ago
What commands should I use to inspect the RIB and FIB?
nfd-status -b
nfd-status -r
Updated by Anonymous about 10 years ago
Inspecting the RIB and FIB doesn't show any of them registered.
Updated by Junxiao Shi about 10 years ago
I'm unable to reproduce this bug on Ubuntu 12.04.
All prefixes are registered successfully.
Both nfd-status -b | grep example | wc -l
and nfd-status -r | grep example | wc -l
give "300".
Updated by Anonymous about 10 years ago
I should have mentioned: I'm on a MacBook Pro OS X 10.9.5.
I have Ubuntu 14.04 in VirtualBox. I could try to reproduce in there, but I suppose the timing would be different. Do you want me to try it?
Updated by Junxiao Shi about 10 years ago
I'm unable to try this on OSX 10.9 due to Bug #2179.
Updated by Vince Lehman about 10 years ago
I am able to reproduce the bug on OS X 10.9.5, but not Ubuntu 14.04 or 14.10.
On OS X 10.9.5:
When I removed "m_face.shutdown()" from the producer program and let "m_face.processEvents()" run in a loop, I still received
[RibManager] NFD returned an error: request timed out (code: 10060)
but nfd-status showed all of the prefixes in the RIB and the FIB. It appears that the controller is responding with a timeout even though the
updates are being processed.
Updated by Junxiao Shi about 10 years ago
@JeffT and @Vince, in your setups, which TPM is used for the KeyChain of NFD?
If it's osx-keychain, this problem is caused by #1589.
Also, as pointed out in #1990, sending multiple Signed Interests (such as prefix registration commands) at once may cause some Signed Interests to be rejected.
However, note-7 doesn't observe this problem.
Updated by Anonymous about 10 years ago
Yes, the OS X Keychain is slow. But why should NFD choke?
Updated by Junxiao Shi about 10 years ago
Yes, the OS X Keychain is slow. But why should NFD choke?
Please answer:
- Is NFD using
tpm=osx-keychain
ortpm=file
? - Does ALL prefix registrations return "10060", or does some of the succeed without an error code?
Updated by Anonymous about 10 years ago
Do you mean /usr/local/etc/ndn/nfd.conf ? There is no line for "tpm=".
Running producer calls onRegisterFailed 300 times. I think they all fail.
Updated by Junxiao Shi about 10 years ago
Do you mean /usr/local/etc/ndn/nfd.conf ? There is no line for "tpm=".
TPM configuration is not in nfd.conf
. For applications including NRD, it's in client.conf
somewhere.
I'm not sure which file controls the TPM used by NFD itself. I have asked nfd-dev and is waiting for an answer.
Updated by Junxiao Shi about 10 years ago
@Jeff : is your NFD installed by compiling from source, or from MacPorts?
Please paste output from each of these commands:
grep 'tpm' ~/.ndn/client.conf
grep 'tpm' /usr/local/etc/ndn/client.conf
grep 'tpm' /usr/local/var/lib/ndn/nfd/.ndn/client.conf
grep 'tpm' /opt/local/etc/ndn/client.conf
Updated by Anonymous about 10 years ago
My NFD is compiled from source. Here is the output from the commands:
$ grep 'tpm' ~/.ndn/client.conf
grep: /Users/jefft0/.ndn/client.conf: No such file or directory
$ grep 'tpm' /usr/local/etc/ndn/client.conf
grep: /usr/local/etc/ndn/client.conf: No such file or directory
$ grep 'tpm' /usr/local/var/lib/ndn/nfd/.ndn/client.conf
grep: /usr/local/var/lib/ndn/nfd/.ndn/client.conf: No such file or directory
$ grep 'tpm' /opt/local/etc/ndn/client.conf
grep: /opt/local/etc/ndn/client.conf: No such file or directory
Updated by Junxiao Shi about 10 years ago
@Jeff you are using osx-keychain TPM.
To determine whether the problem is caused by #1589, please:
create
~/.ndn/client.conf
with the following content:tpm=file
repeat the steps to reproduce
report how many prefix registration have failed
Updated by Anonymous about 10 years ago
Can you help me out? When I switch to tpm=file, I lose my OS X signing key which was created when requesting a certificate and NFD won't run. Should I request a new certificate for this test? If not, how to generate a key? (I tried ndnsec-keygen /`whoami` | ndnsec-install-cert -
but it says "key too short for this signature scheme".)
Updated by Junxiao Shi about 10 years ago
After swapping TPM, PIB should be swaped as well:
mv ~/.ndn ~/.ndn-osxkeychain
You should be able to create new self-signed certificate after this.
After the experiment, delete ~/.ndn and move back .ndn-osxkeychain.
Updated by Anonymous about 10 years ago
With tpm=file, all prefix registrations succeed.
Updated by Junxiao Shi about 10 years ago
- Blocked by Bug #1589: KeyChain::sign is slow with tpm=osx-keychain added
Updated by Junxiao Shi about 10 years ago
- Target version deleted (
v0.3)
Suggested workaround: run NFD with file-based keychain.
Updated by Anonymous about 10 years ago
Why is this issue blocked by issue #1589? Yes, the OS X key chain may be slow. But NFD is able to use it to do a few register prefix. Why would it fail to process several in a batch? Does the internal timing model need to be adjusted?
Updated by Davide Pesavento about 10 years ago
I agree with Jeff. We need a proper solution/workaround here. Blaming the keychain isn't going to help.
Updated by Junxiao Shi about 10 years ago
A workaround is already given in note-20.
There is already proposed solution for #1589.
@Yingdi Yu currently refuses to acknowledge #1990.
But if #1990 is acknowledged, we could implement stop-and-wait in ndn::nfd::Controller
, and that would also fix the problem here.
Updated by Davide Pesavento about 10 years ago
Junxiao Shi wrote:
A workaround is already given in note-20.
Seriously, that is not a proper workaround.
There is already proposed solution for #1589.
Where?
But if #1990 is acknowledged, we could implement stop-and-wait in
ndn::nfd::Controller
, and that would also fix the problem here.
I don't understand what interest reordering has to do with this particular issue. In Jeff's case NFD is running locally, and no packet reordering happens over a Unix socket.
Updated by Junxiao Shi about 10 years ago
There is already proposed solution for #1589.
http://redmine.named-data.net/issues/1589#note-7
I don't understand what interest reordering has to do with this particular issue. In Jeff's case NFD is running locally, and no packet reordering happens over a Unix socket.
Interest reordering not happening locally is a consequence of current implementation, but it's not part of the contract so it's not guaranteed.
The current problem isn't caused by Interest reordering, but a stop-and-wait mechanism would solve both problems.
Updated by Junxiao Shi about 10 years ago
I don't think the current problem is worth a workaround (other than note-20) before the final solution.
The scenario is rare. A normal application shouldn't register a large number of prefixes at the same time.
Potential final solution is one of:
Updated by Junxiao Shi over 7 years ago
- Blocked by Feature #2293: Controller: stop-and-wait added
Updated by Alex Afanasyev almost 7 years ago
- Related to Bug #4511: Failure of RIB register commands within short period of time added
Updated by Junxiao Shi over 3 years ago
- Related to Bug #5149: Error 10060 when fetching RIB dataset: Timeout exceeded added