Project

General

Profile

Actions

Bug #2148

closed

UtilFaceUri/Canonize{Tcp,Udp} test cases randomly time out on Ubuntu 14.10

Added by Davide Pesavento over 9 years ago. Updated about 9 years ago.

Status:
Closed
Priority:
Normal
Category:
Utils
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:

Description

On a 64-bit Ubuntu 14.10 virtual machine

davide@ndnbox:~/ndn-cxx$ git rev-parse HEAD
4e9b069bb844545d7e352b98821c5a11520f1b58

davide@ndnbox:~/ndn-cxx$ ./build/unit-tests -p
[...]
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeUdp": udp4://all-routers.mcast.net:56363 should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeUdp": udp://[2001:db8::1]:6363 should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeUdp": udp6://[2001:db8::01]:6363 should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeUdp": udp6://google-public-dns-a.google.com should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeUdp": udp6://[ff02::2] should succeed
../tests/unit-tests/util/face-uri.cpp(79): error in "CanonizeUdp": check !tc->m_isCompleted failed
../tests/unit-tests/util/face-uri.cpp(52): error in "CanonizeUdp": check m_nPending == 0 failed [-1 != 0]

Files

CanonizeUdp.pcapng (12.2 KB) CanonizeUdp.pcapng Davide Pesavento, 11/07/2014 05:15 PM
success-10-seconds.pcapng (14.5 KB) success-10-seconds.pcapng 10 second canonization time; test succeeds Vince Lehman, 12/10/2014 02:17 PM
failure-4-seconds.pcapng (10.2 KB) failure-4-seconds.pcapng 4 second canonization time; test fails Vince Lehman, 12/10/2014 02:17 PM
capi-test.pcapng (54.5 KB) capi-test.pcapng C API Test Vince Lehman, 12/16/2014 08:57 AM
dns-test.pcap (3.04 KB) dns-test.pcap Vince Lehman, 01/22/2015 01:07 PM

Related issues 1 (0 open1 closed)

Related to ndn-cxx - Bug #3270: Occasional non-reproducible test failure in UtilDns/AsynchronousV6Closed

Actions
Actions #1

Updated by Junxiao Shi over 9 years ago

  • Category set to Utils
  • Target version set to v0.3

Does the box have access to a working DNS server that can solve both A records and AAAA records? Please attach tcpdump trace during the unit testing.

Actions #2

Updated by Davide Pesavento over 9 years ago

Probably not, it's IPv4 only I believe. Regardless, all tests requiring IPv6 should be skipped if IPv6 isn't available (the same applies to IPv4). I will attach the tcpdump log tomorrow if you really need it.

Actions #3

Updated by Junxiao Shi over 9 years ago

The test doesn't require IPv6 connectivity, but it requires the DNS server (whatever connection) to respond to both A and AAAA queries.

Actions #4

Updated by Davide Pesavento over 9 years ago

Whatever... my point is still valid. Unit tests should not rely on such assumptions. In fact, they should not rely on any kind of connectivity at all.

Actions #6

Updated by Junxiao Shi over 9 years ago

  • Assignee set to Vince Lehman

The pcap trace in note-5 reveals that AAAA records are correctly resolved by the DNS server.
It's unknown why resolver fails to return the record.

I'm assigning this Bug is @Vince, the author of util/dns.hpp.

Actions #7

Updated by Junxiao Shi over 9 years ago

  • Subject changed from FaceUri's CanonizeUdp tests fail to UtilTestFaceUri/CanonizeUdp test case fails on Ubuntu 14.10

@Davide, is CanonizeTcp test case passing or failing on this platform?

Actions #8

Updated by Davide Pesavento over 9 years ago

CanonizeTcp tests sometimes fail too...

../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeTcp": tcp4://google-public-dns-a.google.com should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeTcp": tcp://[2001:db8::1]:6363 should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeTcp": tcp6://[2001:db8::01]:6363 should succeed
../tests/unit-tests/util/face-uri.cpp(95): error in "CanonizeTcp": tcp6://google-public-dns-a.google.com should succeed
../tests/unit-tests/util/face-uri.cpp(91): error in "CanonizeTcp": check !tc->m_isCompleted failed
../tests/unit-tests/util/face-uri.cpp(52): error in "CanonizeTcp": check m_nPending == 0 failed [-1 != 0]

Note that the failures (both for UDP and TCP) are not 100% reproducible but appear to be random. Could be a timing related issue.

Actions #9

Updated by Davide Pesavento over 9 years ago

I also get this from time to time:

==23852==ERROR: AddressSanitizer: heap-use-after-free on address 0x61200000ee50 at pc 0xa928e3 bp 0x7fff9a977b80 sp 0x7fff9a977b70
WRITE of size 1 at 0x61200000ee50 thread T0
    #0 0xa928e2 in ndn::Scheduler::onEvent(boost::system::error_code const&) ../src/util/scheduler.cpp:168
    #1 0xa93dc6 in operator()<const boost::system::error_code&, void> /usr/include/c++/4.9/functional:569
    #2 0xa93dc6 in __call<void, const boost::system::error_code&, 0ul, 1ul> /usr/include/c++/4.9/functional:1264
    #3 0xa93dc6 in operator()<const boost::system::error_code&, void> /usr/include/c++/4.9/functional:1323
    #4 0xa93dc6 in boost::asio::detail::binder1<std::_Bind<std::_Mem_fn<void (ndn::Scheduler::*)(boost::system::error_code const&)> (ndn::Scheduler*, std::_Placeholder<1>)>, boost::system::error_code>::operator()() /usr/include/boost/asio/detail/bind_handler.hpp:47
    #5 0xa93dc6 in asio_handler_invoke<boost::asio::detail::binder1<std::_Bind<std::_Mem_fn<void (ndn::Scheduler::*)(const boost::system::error_code&)>(ndn::Scheduler*, std::_Placeholder<1>)>, boost::system::error_code> > /usr/include/boost/asio/handler_invoke_hook.hpp:69
    #6 0xa93dc6 in invoke<boost::asio::detail::binder1<std::_Bind<std::_Mem_fn<void (ndn::Scheduler::*)(const boost::system::error_code&)>(ndn::Scheduler*, std::_Placeholder<1>)>, boost::system::error_code>, std::_Bind<std::_Mem_fn<void (ndn::Scheduler::*)(const boost::system::error_code&)>(ndn::Scheduler*, std::_Placeholder<1>)> > /usr/include/boost/asio/detail/handler_invoke_helpers.hpp:37
    #7 0xa93dc6 in boost::asio::detail::wait_handler<std::_Bind<std::_Mem_fn<void (ndn::Scheduler::*)(boost::system::error_code const&)> (ndn::Scheduler*, std::_Placeholder<1>)> >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) /usr/include/boost/asio/detail/wait_handler.hpp:70
    #8 0x54b087 in boost::asio::detail::task_io_service_operation::complete(boost::asio::detail::task_io_service&, boost::system::error_code const&, unsigned long) /usr/include/boost/asio/detail/task_io_service_operation.hpp:38
    #9 0x54b087 in boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) /usr/include/boost/asio/detail/impl/task_io_service.ipp:384
    #10 0x54b087 in boost::asio::detail::task_io_service::run(boost::system::error_code&) /usr/include/boost/asio/detail/impl/task_io_service.ipp:153
    #11 0x6ccf34 in boost::asio::io_service::run() /usr/include/boost/asio/impl/io_service.ipp:59
    #12 0x6ccf34 in ndn::util::UtilTestFaceUri::CanonizeEmptyCallback::test_method() ../tests/unit-tests/util/face-uri.cpp:404
    #13 0x6cd2cf in CanonizeEmptyCallback_invoker ../tests/unit-tests/util/face-uri.cpp:380
    #14 0x44d2de in invoke<void (*)()> /usr/include/boost/test/utils/callback.hpp:56
    #15 0x44d2de in boost::unit_test::ut_detail::callback0_impl_t<boost::unit_test::ut_detail::unused, void (*)()>::invoke() /usr/include/boost/test/utils/callback.hpp:89
    #16 0x7f7220b855a0 (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x685a0)
    #17 0x7f7220b60865 in boost::execution_monitor::catch_signals(boost::unit_test::callback0<int> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x43865)
    #18 0x7f7220b610a2 in boost::execution_monitor::execute(boost::unit_test::callback0<int> const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x440a2)
    #19 0x7f7220b856a1 in boost::unit_test::unit_test_monitor_t::execute_and_translate(boost::unit_test::test_case const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x686a1)
    #20 0x7f7220b6f2f3 in boost::unit_test::framework_impl::visit(boost::unit_test::test_case const&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x522f3)
    #21 0x7f7220b9e2d2 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x812d2)
    #22 0x7f7220b9e2d2 in boost::unit_test::traverse_test_tree(boost::unit_test::test_suite const&, boost::unit_test::test_tree_visitor&) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x812d2)
    #23 0x7f7220b6a819 in boost::unit_test::framework::run(unsigned long, bool) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x4d819)
    #24 0x7f7220b83283 in boost::unit_test::unit_test_main(bool (*)(), int, char**) (/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.55.0+0x66283)
    #25 0x81f1cc in main /usr/include/boost/test/unit_test.hpp:59
    #26 0x7f721f674ec4 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21ec4)
    #27 0x43a6b8 (/home/davide/ndn-cxx/build/unit-tests+0x43a6b8)

0x61200000ee50 is located 272 bytes inside of 280-byte region [0x61200000ed40,0x61200000ee58)
freed by thread T0 here:
    #0 0x7f722194663f in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5863f)
    #1 0xa570b2 in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2> >::deallocate(std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /usr/include/c++/4.9/ext/new_allocator.h:110
    #2 0xa570b2 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2> > >::deallocate(std::allocator<std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2> >&, std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /usr/include/c++/4.9/bits/alloc_traits.h:383
    #3 0xa570b2 in std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2>::_M_destroy() /usr/include/c++/4.9/bits/shared_ptr_base.h:535

previously allocated by thread T0 here:
    #0 0x7f722194613f in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5813f)
    #1 0xa54880 in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2> >::allocate(unsigned long, void const*) /usr/include/c++/4.9/ext/new_allocator.h:104
    #2 0xa54880 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2> > >::allocate(std::allocator<std::_Sp_counted_ptr_inplace<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, (__gnu_cxx::_Lock_policy)2> >&, unsigned long) /usr/include/c++/4.9/bits/alloc_traits.h:357
    #3 0xa54880 in __shared_count<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, const std::function<void(const boost::asio::ip::address&)>&, const std::function<void(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>&, const std::function<bool(const boost::asio::ip::address&)>&, std::reference_wrapper<boost::asio::io_service> > /usr/include/c++/4.9/bits/shared_ptr_base.h:616
    #4 0xa54880 in __shared_ptr<std::allocator<ndn::dns::Resolver>, const std::function<void(const boost::asio::ip::address&)>&, const std::function<void(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>&, const std::function<bool(const boost::asio::ip::address&)>&, std::reference_wrapper<boost::asio::io_service> > /usr/include/c++/4.9/bits/shared_ptr_base.h:1090
    #5 0xa54880 in shared_ptr<std::allocator<ndn::dns::Resolver>, const std::function<void(const boost::asio::ip::address&)>&, const std::function<void(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>&, const std::function<bool(const boost::asio::ip::address&)>&, std::reference_wrapper<boost::asio::io_service> > /usr/include/c++/4.9/bits/shared_ptr.h:316
    #6 0xa54880 in allocate_shared<ndn::dns::Resolver, std::allocator<ndn::dns::Resolver>, const std::function<void(const boost::asio::ip::address&)>&, const std::function<void(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>&, const std::function<bool(const boost::asio::ip::address&)>&, std::reference_wrapper<boost::asio::io_service> > /usr/include/c++/4.9/bits/shared_ptr.h:588
    #7 0xa54880 in make_shared<ndn::dns::Resolver, const std::function<void(const boost::asio::ip::address&)>&, const std::function<void(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&)>&, const std::function<bool(const boost::asio::ip::address&)>&, std::reference_wrapper<boost::asio::io_service> > /usr/include/c++/4.9/bits/shared_ptr.h:604
    #8 0xa54880 in ndn::dns::asyncResolve(std::string const&, std::function<void (boost::asio::ip::address const&)> const&, std::function<void (std::string const&)> const&, boost::asio::io_service&, std::function<bool (boost::asio::ip::address const&)> const&, boost::chrono::duration<long, boost::ratio<1l, 1000000000l> > const&) ../src/util/dns.cpp:136

SUMMARY: AddressSanitizer: heap-use-after-free ../src/util/scheduler.cpp:168 ndn::Scheduler::onEvent(boost::system::error_code const&)
Shadow bytes around the buggy address:
  0x0c247fff9d70: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c247fff9d80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c247fff9d90: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c247fff9da0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c247fff9db0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c247fff9dc0: fd fd fd fd fd fd fd fd fd fd[fd]fa fa fa fa fa
  0x0c247fff9dd0: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c247fff9de0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c247fff9df0: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c247fff9e00: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c247fff9e10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==23852==ABORTING
Actions #10

Updated by Davide Pesavento over 9 years ago

The use-after-free happens only (or is detected only) when the machine is completely disconnected from the Internet.

Actions #11

Updated by Vince Lehman over 9 years ago

  • Status changed from New to In Progress
Actions #12

Updated by Vince Lehman over 9 years ago

Junxiao Shi wrote:
The pcap trace in note-5 reveals that AAAA records are correctly resolved by the DNS server.
It's unknown why resolver fails to return the record.

I am able to reproduce the bug on a 64-bit Ubuntu 14.10 VM.

After some testing, the reason the records are not returned during face canonization is due to timeout.

There is a delay which occasionally occurs while the queries are being sent; Wireshark shows that DNS queries completely stop and then resume after 5 seconds. Canonization is allotted 4 seconds to complete before a timeout occurs and so the test fails. When I increase the time allowed for canonization above 5 seconds I cannot reproduce the error.

I am attaching two pcap traces which show the delay:

1.) Canonization time of 4 seconds where the test fails; delay of 4.998087 seconds

2.) Canonization time of 10 seconds where the test succeeds; delay of 4.998563 seconds

Davide's trace shows a delay of 4.968998 seconds.

I will continue to look into this to try and find why the resolver stops sending queries.

Actions #14

Updated by Vince Lehman over 9 years ago

I am able to reproduce the above delay on Ubuntu 14.10 using this cpp program (g++ ... -lresolv):

#include <resolv.h>

int
main(int argc, char* argv[])
{
  u_char answer[128];

  for (int i = 0; i < 1000; ++i) {
    res_query("google-public-dns-a.google.com", C_IN, T_A, answer, sizeof(answer));
  }

  return 0;
}
Actions #15

Updated by Junxiao Shi over 9 years ago

As I can see, 124 DNS queries are sent in the first 5 seconds. This is much more than the number of DNS queries needed by CanonizeUdp test case.

Please try another small program where DNS queries are sent for different host names.

Actions #16

Updated by Davide Pesavento over 9 years ago

any news here?

Actions #17

Updated by Vince Lehman over 9 years ago

Sorry for the lack of updates.

Today I was able to test the same program with these 6 hosts:

google-public-dns-a.google.com
all-routers.mcast.net
invalid.invalid
example.net
192.0.2.333
named-data.net

I sent two requests for each and got the same results; I've attached a pcap file with the captured trace.

Actions #18

Updated by Davide Pesavento about 9 years ago

Looks like one of the requests gets lost (packet no.24 in the trace), and after 5 seconds there's a retransmission. I guess we should simply increase the timeouts in the test case.

This still doesn't explain the use-after-free though.

Actions #19

Updated by Davide Pesavento about 9 years ago

  • Blocks Task #2589: CI: enable AddressSanitizer for unit tests added
Actions #20

Updated by Davide Pesavento about 9 years ago

  • Subject changed from UtilTestFaceUri/CanonizeUdp test case fails on Ubuntu 14.10 to UtilFaceUri/Canonize{Tcp,Udp} test cases randomly fail on Ubuntu 14.10
  • Target version changed from v0.3 to v0.4
  • Start date deleted (11/06/2014)
Actions #21

Updated by Davide Pesavento about 9 years ago

  • Subject changed from UtilFaceUri/Canonize{Tcp,Udp} test cases randomly fail on Ubuntu 14.10 to UtilFaceUri/Canonize{Tcp,Udp,EmptyCallback} test cases randomly fail on Ubuntu 14.10

Found the bug that causes the use-after-free.

CanonizeEmptyCallback uses very short (1ms) timeouts. This means that sometimes, when the Resolver::onResolveSuccess callback is invoked, the timeout has already expired and its callback scheduled for execution. At this point, canceling the event/deadline_timer has no effect. At the end of the execution of onResolveSuccess, the Resolver refcount drops to zero and it's deallocated together with the scheduler instance. Eventually the timer callback in the scheduler is also invoked, which causes a use-after-free as soon as any scheduler instance variable is accessed.

Boost documentation for basic_deadline_timer confirms the above theory:

If the timer has already expired when cancel() is called, then the handlers for asynchronous wait operations will:

  • have already been invoked; or
  • have been queued for invocation in the near future. These handlers can no longer be cancelled, and therefore are passed an error code that indicates the successful completion of the wait operation.

(http://www.boost.org/doc/libs/1_48_0/doc/html/boost_asio/reference/basic_deadline_timer/cancel/overload1.html)

Actions #22

Updated by Junxiao Shi about 9 years ago

The heap-use-after-free is unrelated to this Bug (as defined in Description field), due to the observation in note-14.

Please open a separate issue for heap-use-after-free.

Actions #23

Updated by Davide Pesavento about 9 years ago

  • Subject changed from UtilFaceUri/Canonize{Tcp,Udp,EmptyCallback} test cases randomly fail on Ubuntu 14.10 to UtilFaceUri/Canonize{Tcp,Udp} test cases randomly time out on Ubuntu 14.10

Junxiao Shi wrote:

Please open a separate issue for heap-use-after-free.

#2653

Actions #24

Updated by Davide Pesavento about 9 years ago

  • Blocks deleted (Task #2589: CI: enable AddressSanitizer for unit tests)
Actions #25

Updated by Davide Pesavento about 9 years ago

  • Status changed from In Progress to Code review
  • Assignee changed from Vince Lehman to Davide Pesavento
  • % Done changed from 0 to 100
Actions #26

Updated by Davide Pesavento about 9 years ago

  • Status changed from Code review to Closed
Actions #27

Updated by Junxiao Shi over 8 years ago

  • Related to Bug #3270: Occasional non-reproducible test failure in UtilDns/AsynchronousV6 added
Actions

Also available in: Atom PDF