Bug #4705
closed
NLSR crashes on multiple nodes in mcn failure experiment with a long failure time
Added by Ashlesh Gawande over 6 years ago.
Updated about 6 years ago.
Description
The crashes occur during the wait period after the failure has occurred.
The crash was reported when running on the current 43-node testbed topology.
It can be reproduced with a 10-node topology, using the Mini-NDN change and master NLSR, with the command:
sudo minindn topologies/minindn.caida.conf --nlsr-security --experiment mcn-failure --wait-time 1200 --ctime 90 --nPings 0
Files
bt (21.9 KB), Ashlesh Gawande, 08/13/2018 04:02 PM
bt-full (196 KB), Ashlesh Gawande, 08/13/2018 04:02 PM
nlsr.log (441 KB), Ashlesh Gawande, 08/13/2018 04:03 PM
- Private changed from No to Yes
The crash happens ~452 s after NLSR has started (lsa-refresh-interval is set to 240 seconds).
- Private changed from Yes to No
- Status changed from New to Code review
Backtrace and log from the arizona node are attached.
The crash happens while initializing a set using the range constructor. I don't know why it happens.
In my testing the crash does not happen if we use insert instead. Waiting for the bot to confirm that it works:
https://gerrit.named-data.net/#/c/NLSR/+/4908/
- Status changed from Code review to Closed
- Target version set to v0.5.0
- % Done changed from 0 to 100
Davide figured out what the bug was.
Our < operator, defined here, did not provide the strict weak ordering that std::set requires.
If that requirement is not met, the result may be undefined behaviour.
return (m_name < adjacent.getName()) ||
(m_linkCost < adjacent.getLinkCost());
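To illustrate why this comparison is not a strict weak ordering, here is a minimal sketch. `BrokenAdjacent` and `violatesAsymmetry` are hypothetical stand-ins, not NLSR code; only the two compared fields are modeled, and the field types are assumptions. With an "either field is smaller" test, two objects can each compare less than the other, which breaks the asymmetry requirement.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the class being compared; not the NLSR type.
struct BrokenAdjacent {
  std::string name;
  double linkCost;

  // Same shape as the comparison above: true if EITHER field is smaller.
  bool operator<(const BrokenAdjacent& other) const {
    return (name < other.name) || (linkCost < other.linkCost);
  }
};

// Returns true when both a < b and b < a hold. A strict weak ordering
// forbids this (asymmetry), so any such pair proves the ordering invalid.
inline bool violatesAsymmetry(const BrokenAdjacent& a, const BrokenAdjacent& b) {
  return (a < b) && (b < a);
}
```

For example, `{"/ndn/a", 20.0}` and `{"/ndn/b", 10.0}` each compare less than the other: the first wins on name, the second on cost.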
The solution was:
auto linkCost = adjacent.getLinkCost();
return std::tie(m_name, m_linkCost) <
std::tie(adjacent.getName(), linkCost);
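A self-contained sketch of the same lexicographic approach follows. `OrderedAdjacent` and `toSet` are hypothetical names used for illustration, and the field types are assumptions. `std::tie` builds tuples of lvalue references, which is presumably why the fix above copies the link cost into a local variable before calling it. Tuple comparison checks the name first and the cost only when the names are equal.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the class being compared; not the NLSR type.
struct OrderedAdjacent {
  std::string name;
  double linkCost;

  // Lexicographic comparison: name first, linkCost only as a tie-breaker.
  bool operator<(const OrderedAdjacent& other) const {
    return std::tie(name, linkCost) < std::tie(other.name, other.linkCost);
  }
};

// Builds a set with the range constructor, the operation that crashed
// when the ordering was invalid.
inline std::set<OrderedAdjacent> toSet(const std::vector<OrderedAdjacent>& items) {
  return std::set<OrderedAdjacent>(items.begin(), items.end());
}
```

With this ordering, `{"/ndn/a", 20.0}` sorts before `{"/ndn/b", 10.0}` and never the reverse, so the set's internal invariants hold.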