Bug #4479
Crash in LpReliability::deleteUnackedFrag
Status: closed, 100% done
Description
I am trying to transfer a reasonably sized file (10MB) over Ethernet unicast using putchunks and catchunks.
The segment size for putchunks is 8800, MTU on the interface is 9000.
The configuration is the same as in #4012-10.
NFD crashes with a SIGSEGV, and not because the host is out of memory.
The GDB backtrace is below and attached.
#0 0x00007ffff4f68342 in std::_Rb_tree_increment(std::_Rb_tree_node_base const*) () from /lib64/libstdc++.so.6
#1 0x0000000000474029 in std::_Rb_tree_iterator<std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag> >::operator++ (this=<synthetic pointer>)
at /usr/include/c++/6.3.1/bits/stl_tree.h:209
#2 std::_Rb_tree<unsigned long, std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag>, std::_Select1st<std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag> >) (__position=..., this=0x7030f8) at /usr/include/c++/6.3.1/bits/stl_tree.h:1046
#3 std::map<unsigned long, nfd::face::LpReliability::UnackedFrag, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag> > >::erase[abi:cxx11](std::_Rb_tree_iterator<std::pair<unsigned long const, nfd::face::LpReliability::UnackedFrag> >) (__position=..., this=0x7030f8)
at /usr/include/c++/6.3.1/bits/stl_map.h:951
#4 nfd::face::LpReliability::deleteUnackedFrag (this=this@entry=0x7030a0, fragIt=...) at ../daemon/face/lp-reliability.cpp:327
#5 0x000000000047671a in nfd::face::LpReliability::onLpPacketLost (this=this@entry=0x7030a0, txSeqIt=...) at ../daemon/face/lp-reliability.cpp:248
#6 0x00000000004795fb in nfd::face::LpReliability::processIncomingPacket (this=this@entry=0x7030a0, pkt=...) at ../daemon/face/lp-reliability.cpp:132
#7 0x0000000000459932 in nfd::face::GenericLinkService::doReceivePacket(nfd::face::Transport::Packet&&) (this=0x702f00,
packet=<unknown type in /usr/local/bin/nfd, CU 0x20a911, DIE 0x266d97>) at ../daemon/face/generic-link-service.cpp:279
#8 0x000000000057b6e4 in nfd::face::EthernetTransport::receivePayload (this=this@entry=0x6f17d0,
payload=payload@entry=0x1a19dee "dM\375\003D\001\302\375\003D\001\303\375\003D\001\304\375\003D\001\305\375\003D\001\306\375\003D\001\307\375\003H\002\023+P'\005%\a\030\b\004test\b\001\062\b", <incomplete sequence \375>, length=length@entry=79, sender=...) at ../daemon/face/ethernet-transport.cpp:199
#9 0x000000000057e87a in nfd::face::EthernetTransport::handleRead (this=0x6f17d0, error=...) at ../daemon/face/ethernet-transport.cpp:163
#10 0x000000000058107a in boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>::operator() (a1=..., p=<optimized out>, this=<optimized out>)
at /usr/include/boost/bind/mem_fn_template.hpp:165
#11 boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()>::operator()<boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::rrlist2<boost::system::error_code const&, unsigned long const&> > (a=<synthetic pointer>..., f=..., this=0x7fffffffca60)
at /usr/include/boost/bind/bind.hpp:319
#12 boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()> >::operator()<boost::system::error_code const&, unsigned long const&> (a2=@0x7fffffffca78: 0, a1=..., this=0x7fffffffca50)
at /usr/include/boost/bind/bind.hpp:1246
#13 boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long>::operator() (this=0x7fffffffca50)
at /usr/include/boost/asio/detail/bind_handler.hpp:127
#14 boost::asio::asio_handler_invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long> > (function=...)
at /usr/include/boost/asio/handler_invoke_hook.hpp:69
#15 boost_asio_handler_invoke_helpers::invoke<boost::asio::detail::binder2<boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()> >, boost::system::error_code, unsigned long>, boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()> > > (context=..., function=...) at /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
#16 boost::asio::detail::reactive_null_buffers_op<boost::_bi::bind_t<void, boost::_mfi::mf1<void, nfd::face::EthernetTransport, boost::system::error_code const&>, boost::_bi::list2<boost::_bi::value<nfd::face::EthernetTransport*>, boost::arg<1> (*)()> > >::do_complete (owner=0x6ae630, base=<optimized out>)
at /usr/include/boost/asio/detail/reactive_null_buffers_op.hpp:75
#17 0x00000000004430e6 in boost::asio::detail::task_io_service_operation::complete (bytes_transferred=0, ec=..., owner=..., this=<optimized out>)
at /usr/include/boost/asio/detail/task_io_service_operation.hpp:38
#18 boost::asio::detail::epoll_reactor::descriptor_state::do_complete (owner=0x6ae630, base=0x706a70, ec=..., bytes_transferred=<optimized out>)
at /usr/include/boost/asio/detail/impl/epoll_reactor.ipp:651
#19 0x00000000004442a7 in boost::asio::detail::task_io_service_operation::complete (bytes_transferred=1, ec=..., owner=..., this=<optimized out>)
at /usr/include/boost/asio/detail/task_io_service_operation.hpp:38
#20 boost::asio::detail::task_io_service::do_run_one (ec=..., this_thread=..., lock=..., this=0x6ae630) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:372
#21 boost::asio::detail::task_io_service::run (this=0x6ae630, ec=...) at /usr/include/boost/asio/detail/impl/task_io_service.ipp:149
#22 0x000000000044adfe in boost::asio::io_service::run (this=0x6ae5c0) at /usr/include/boost/asio/impl/io_service.ipp:59
#23 nfd::NfdRunner::run (this=this@entry=0x7fffffffd250) at ../daemon/main.cpp:143
#24 0x000000000042e8cc in main (argc=<optimized out>, argv=<optimized out>) at ../daemon/main.cpp:291
Updated by Junxiao Shi almost 7 years ago
- Description updated (diff)
Please provide:
- tcpdump (not ndndump) trace
- bt full (not bt) output
See http://www.lists.cs.ucla.edu/pipermail/nfd-dev/2016-May/001748.html .
Updated by Davide Pesavento almost 7 years ago
- Subject changed from Data transfer over ethernet unicast face crashes NFD to Crash in LpReliability::deleteUnackedFrag
- Description updated (diff)
- Category set to Faces
Updated by Junxiao Shi almost 7 years ago
- Assignee set to Eric Newberry
- Target version set to v0.7
Assigning to Eric, who authored the LpReliability class. Susmit still needs to upload the information requested in note-1.
Updated by Eric Newberry almost 7 years ago
@susmit shannigrahi what release/git revision are you using?
Updated by susmit shannigrahi almost 7 years ago
Eric Newberry wrote:
@susmit shannigrahi what release/git revision are you using?
Hi Eric, I am using master. NFD revision is 97a010178c0bd340babb3dfabcc6f8f6abbaef42.
I am trying to reproduce the bug.
Updated by susmit shannigrahi almost 7 years ago
I am currently unable to reproduce the bug. I will update if I find it again.
Do you think the existing trace would be enough to debug this?
Updated by Davide Pesavento almost 7 years ago
susmit shannigrahi wrote:
Do you think the existing trace would be enough to debug this?
Possibly, but I'm really unfamiliar with the LpReliability code. @Eric, did you have time to look into this?
Updated by Eric Newberry almost 7 years ago
Davide Pesavento wrote:
susmit shannigrahi wrote:
Do you think the existing trace would be enough to debug this?
Possibly, but I'm really unfamiliar with the LpReliability code. @Eric, did you have time to look into this?
I briefly did and I think it's likely caused by incorrectly tracking one of the iterators (or one of the pointers to an iterator) in the LpReliability class.
Updated by Davide Pesavento almost 7 years ago
susmit shannigrahi wrote:
The segment size for putchunks is 8800, MTU on the interface is 9000.
Are you sure about this? I found a fairly serious bug in LpReliability when used in conjunction with fragmentation, but the above settings imply that packets never get fragmented...
Updated by Eric Newberry almost 7 years ago
Davide Pesavento wrote:
susmit shannigrahi wrote:
The segment size for putchunks is 8800, MTU on the interface is 9000.
Are you sure about this? I found a fairly serious bug in LpReliability when used in conjunction with fragmentation, but the above settings imply that packets never get fragmented...
Where did you find this other bug? Has it been fixed?
Updated by Davide Pesavento almost 7 years ago
- Priority changed from Normal to High
- % Done changed from 0 to 20
The following assertion in LpReliability is wrong:
void
LpReliability::onLpPacketLost(UnackedFrags::iterator txSeqIt)
{
  BOOST_ASSERT(m_unackedFrags.count(txSeqIt->first) > 0);
  ...
}
This test case https://gerrit.named-data.net/c/4590/1/tests/daemon/face/lp-reliability.t.cpp#658 will trigger the aforementioned assert if built in debug mode, and will probably crash in release mode (where asserts are disabled).
Updated by Davide Pesavento almost 7 years ago
Davide Pesavento wrote:
The following assertion in LpReliability is wrong:
This was rather cryptic, let me clarify. What I meant is that the code makes an assumption that is not correct, and that assumption is "checked" by the aforementioned assertion.
More specifically, onLpPacketLost assumes that txSeqIt is a valid iterator into the m_unackedFrags map. But this is not true if another fragment belonging to the same net-layer packet was deemed lost during the same rx event (i.e. within a single processIncomingPacket invocation) and that fragment also exceeded its max retx count. In that case, one or more iterators returned by findLostLpPackets may become dangling, because all fragments of that net-layer packet are removed from the map, thus invalidating the iterators.
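The failure mode described above can be sketched in isolation. The following is a minimal illustration, not the NFD code and not necessarily the fix that was eventually merged: the UnackedFrag fields, MAX_RETX, dropNetPkt, onLost, and processLost are all hypothetical names invented for this example. It shows one way to avoid the dangling iterators, namely passing sequence numbers (keys) between the "find lost packets" step and the "handle lost packet" step, and looking each key up again before use.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, simplified stand-in for LpReliability::UnackedFrag.
struct UnackedFrag {
  int netPktId;  // net-layer packet this fragment belongs to (assumed field)
  int retxCount; // retransmissions attempted so far (assumed field)
};

using UnackedFrags = std::map<std::uint64_t, UnackedFrag>;

constexpr int MAX_RETX = 3; // assumed retransmission limit

// Erase every fragment of one net-layer packet. Iterators that pointed at
// the erased elements are invalidated; iterators to other elements are not.
void dropNetPkt(UnackedFrags& frags, int netPktId)
{
  for (auto it = frags.begin(); it != frags.end();) {
    if (it->second.netPktId == netPktId)
      it = frags.erase(it); // erase returns the next valid iterator
    else
      ++it;
  }
}

// Handle one lost fragment, identified by key instead of by iterator.
// Returns false if the fragment was already removed earlier in the batch.
bool onLost(UnackedFrags& frags, std::uint64_t txSeq)
{
  auto it = frags.find(txSeq);
  if (it == frags.end())
    return false; // stale entry: a sibling fragment already dropped the packet

  if (it->second.retxCount >= MAX_RETX) {
    dropNetPkt(frags, it->second.netPktId); // 'it' is dangling after this call
    assert(frags.count(txSeq) == 0);
  }
  else {
    ++it->second.retxCount; // stands in for an actual retransmission
  }
  return true;
}

// Process all fragments deemed lost during one rx event.
// Returns how many entries were skipped because they were already erased.
std::size_t processLost(UnackedFrags& frags, const std::vector<std::uint64_t>& lost)
{
  std::size_t skipped = 0;
  for (auto txSeq : lost) {
    if (!onLost(frags, txSeq))
      ++skipped;
  }
  return skipped;
}
```

Had the batch held iterators instead of keys, the second fragment of a dropped packet would be reached through an invalidated iterator, and even an assertion that dereferences it (such as reading txSeqIt->first) would already be undefined behavior. Re-looking up the key costs one extra map search per lost fragment, which seems a reasonable trade for not having to reason about iterator lifetimes across calls.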
Updated by susmit shannigrahi almost 7 years ago
Eric Newberry wrote:
Davide Pesavento wrote:
susmit shannigrahi wrote:
The segment size for putchunks is 8800, MTU on the interface is 9000.
Are you sure about this? I found a fairly serious bug in LpReliability when used in conjunction with fragmentation, but the above settings imply that packets never get fragmented...
It has been a few weeks, so no, I am not very sure.
I will try to trigger the behavior you described.
Updated by Eric Newberry almost 7 years ago
- Status changed from New to In Progress
Updated by Eric Newberry almost 7 years ago
- Status changed from In Progress to Code review
- % Done changed from 20 to 100
Updated by Eric Newberry almost 7 years ago
- Status changed from Code review to Closed