Bug #1790
closed
NLSR throws exception and quits if executed simultaneously on all machines in emulab environment.
Added by Syed Amin over 10 years ago.
Updated about 8 years ago.
Description
I am running NLSR on 10 emulab nodes. A script starts nfd first and then nlsr. If I run this script manually on each node, with a 2 to 3 second delay when moving from one node to the next, nlsr works fine. However, when I execute the script simultaneously on every node (using a scheduler or by broadcasting the command to all terminals), nlsr quits with the following error:
terminate called after throwing an instance of 'ndn::SecPublicInfoSqlite3::Error'
what(): Key does not exist:/ndn/caida/%C1.Router/router1/NLSR/ksk-1406145072693
At one point I also got this error:
nlsr: /usr/include/boost/smart_ptr/shared_ptr.hpp:418: boost::shared_ptr<T>::reference boost::shared_ptr<T>::operator*() const [with T = ndn::IdentityCertificate, boost::shared_ptr<T>::reference = ndn::IdentityCertificate&]: Assertion `px != 0' failed.
The home folder in emulab is shared among all nodes, and I guess that is where the keys are stored. I am not sure, but could concurrent access to that folder be causing this error?
- Description updated (diff)
A shared home folder is definitely a problem. The certificates/keys are generated and stored under $HOME/.ndn.
I would strongly recommend setting the HOME variable to a machine-specific folder (e.g., /tmp/nlsr) before running NLSR, NFD (if it is run as root, make sure you set HOME to /tmp/root-nlsr or something like that), and nrd.
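As a rough sketch of the recommendation above, a small wrapper can point HOME at a node-local directory for one command only. The function name `run_with_local_home` and the directory pattern `/tmp/nlsr-home-<uid>` are assumptions made for illustration, not something NLSR or NFD provides, and the sketch assumes /tmp is local to each node.

```shell
#!/bin/sh
# Hypothetical helper: run a command with HOME set to a node-local directory.
# The directory name is an assumption; any path that is not on the shared
# home folder should work the same way.
run_with_local_home() {
    node_home="/tmp/nlsr-home-$(id -u)"
    mkdir -p "$node_home" || return 1
    # env changes HOME for this one child process only.
    env HOME="$node_home" "$@"
}

# Stand-in command so the sketch runs anywhere; it just reports the HOME it sees.
seen_home=$(run_with_local_home sh -c 'printf %s "$HOME"')
echo "child process saw HOME=$seen_home"
```

On a real node the stand-in would be replaced by something like `run_with_local_home nfd` followed by `run_with_local_home nlsr`.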
- Description updated (diff)
Hint. For the task description and comments, you can use markdown syntax (e.g., 4 spaces for literal/code blocks). Compared with "pre" it properly escapes underscores and other special symbols.
- Description updated (diff)
Changing the HOME variable may affect other things as well. Isn't it possible to specify where to store the ".ndn" folder?
Which other things? You can change HOME just before running the process (to affect only that process).
HOME=/test ./nlsr
Unfortunately, there is currently no other way to make this change. But even if there were, a shared HOME would pose problems for applications.
Changing HOME to tmp or to some local folder seems to fix the issue. I ran it a couple of times, for two to three hours, and didn't hit the issue that I reported. One word of caution that might be helpful for others: there is a significant difference between:
HOME=/test ./nlsr
and
HOME=/test; ./nlsr
I once issued the second command by mistake, which changed the environment variable for every process started afterwards in that session and messed up the other scripts.
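The difference between the two forms can be checked with a harmless stand-in for ./nlsr. This is a minimal sketch: `sh -c` plays the role of the launched program, and the second form is run inside a subshell so the demonstration does not clobber HOME in the calling session.

```shell
#!/bin/sh
# Make sure HOME is set and exported, as it would be in a normal login session.
: "${HOME:=/tmp}"
export HOME
orig_home="$HOME"

# Form 1: "HOME=/test cmd". The assignment applies to that one child only.
child_home=$(HOME=/test sh -c 'printf %s "$HOME"')
after_prefix="$HOME"    # still the original value in this shell

# Form 2: "HOME=/test; cmd". The assignment changes the shell itself, so
# every later command inherits it. Run in a subshell to contain the effect.
leaked_home=$(
    HOME=/test
    true                          # stands in for ./nlsr
    sh -c 'printf %s "$HOME"'     # a later, unrelated command
)

echo "form 1: child saw $child_home, shell still has $after_prefix"
echo "form 2: a later command saw $leaked_home"
```

With the first form only the launched process sees /test; with the second, anything started later in the same shell sees it too.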
Should we not close this issue?
- Subject changed from NLSR throws exception and quits if executed simultaneously on all machines. to NLSR throws exception and quits if executed simultaneously on all machines in emulab environment.
Are the details about the files that nfd creates in the "HOME" folder documented anywhere?
I didn't know about this folder until Alex suggested deleting a file from there (I was having trouble starting nfd and was getting authorization errors). Since then I've run into similar problems many times, and deleting ndnsec-public-info.db in the ".ndn" folder helped in most cases.
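A cautious variation on the workaround above is to move the database aside instead of deleting it, so it can be restored if needed. This is only a sketch: the helper name `set_aside_pib` and the `.bak.<timestamp>` suffix are assumptions, and the demo runs against a scratch directory rather than a real ".ndn" folder.

```shell
#!/bin/sh
# Hypothetical helper: rename ndnsec-public-info.db in the given .ndn
# directory instead of deleting it outright.
set_aside_pib() {
    ndn_dir="$1"
    db="$ndn_dir/ndnsec-public-info.db"
    [ -f "$db" ] || return 0              # nothing to do
    mv "$db" "$db.bak.$(date +%s)"
}

# Demo on a scratch directory with an empty placeholder file.
scratch=$(mktemp -d)
mkdir -p "$scratch/.ndn"
: > "$scratch/.ndn/ndnsec-public-info.db"
set_aside_pib "$scratch/.ndn"
backups=$(ls "$scratch/.ndn" | grep -c '^ndnsec-public-info\.db\.bak\.')
echo "backup files present: $backups"
```

On a real machine the call would be `set_aside_pib "$HOME/.ndn"`, made while nfd and nlsr are stopped.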
- Is duplicate of Bug #2009: SecPublicInfoSqlite3 is unsafe on NFS share added
- Status changed from New to Closed