Bug #4705
Closed
NLSR crashes on multiple nodes in mcn failure experiment with a long failure time
Description
The crashes happen during the wait period that follows the failure.
The crash was reported when running on the current 43-node testbed topology.
It can be replicated with a 10-node topology, using the Mini-NDN change and NLSR master, with the command:
sudo minindn topologies/minindn.caida.conf --nlsr-security --experiment mcn-failure --wait-time 1200 --ctime 90 --nPings 0
Updated by Ashlesh Gawande over 6 years ago
- Private changed from No to Yes
The crash happens about 452 seconds after NLSR has started (lsa-refresh-interval is set to 240 seconds).
Updated by Ashlesh Gawande over 6 years ago
- Status changed from New to Code review
Updated by Ashlesh Gawande over 6 years ago
The backtrace and log from the arizona node are attached.
The crash happens while initializing a set using the range constructor. I don't know why it happens.
In my testing the crash does not happen if we use insert instead. Waiting for the bot to confirm that it works:
https://gerrit.named-data.net/#/c/NLSR/+/4908/
Updated by Ashlesh Gawande over 6 years ago
- Status changed from Code review to Closed
- Target version set to v0.5.0
- % Done changed from 0 to 100
Davide figured out what the bug was.
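Our operator<, defined here, did not provide the strict weak ordering that std::set requires of its comparison.
If that requirement is not met, the behaviour is undefined.
The original comparison returned true if either field was smaller, so for the same pair of adjacents it could report both a < b and b < a: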
  return (m_name < adjacent.getName()) ||
         (m_linkCost < adjacent.getLinkCost());
The solution was:
  auto linkCost = adjacent.getLinkCost();
  return std::tie(m_name, m_linkCost) <
         std::tie(adjacent.getName(), linkCost);