Task #4095

closed

DNS Lookups on Ubuntu 17.04 Fail Sometimes

Added by Eric Newberry almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee:
Category: Utils
Target version: -
Start date: 05/26/2017
Due date:
% Done: 100%
Estimated time:
Description

IPv6 DNS lookups sometimes fail when running the unit tests, depending on the network environment. In environments where they fail, the affected test cases succeed on the second and subsequent runs. The affected test cases are Util/TestDns/AsynchronousV6 and Util/TestDns/AsynchronousV4AndV6.

Removing systemd-resolved from the hosts line in /etc/nsswitch.conf appears to fix this issue.
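
As a rough illustration of that workaround, the sketch below drops the `resolve` source (the NSS module backed by systemd-resolved) from a `hosts:` line. The sample line used in the usage note is an assumption about what a stock Ubuntu 17.04 /etc/nsswitch.conf contains and is not taken from this report; the function name `strip_resolve_module` is hypothetical.

```python
def strip_resolve_module(line: str) -> str:
    """Return an nsswitch.conf line with the `resolve` source removed.

    Only `hosts:` lines are touched; any other line is returned unchanged.
    An action spec such as `[!UNAVAIL=return]` that directly follows
    `resolve` is removed together with it.
    """
    stripped = line.strip()
    if not stripped.startswith("hosts:"):
        return line

    kept = []
    after_resolve = False  # previous token was `resolve`
    in_action = False      # inside a multi-token `[...]` being dropped
    for tok in stripped[len("hosts:"):].split():
        if in_action:
            if tok.endswith("]"):
                in_action = False
            continue
        if tok == "resolve":
            after_resolve = True
            continue
        if after_resolve and tok.startswith("["):
            after_resolve = False
            in_action = not tok.endswith("]")
            continue
        after_resolve = False
        kept.append(tok)
    return "hosts: " + " ".join(kept)


# Assumed stock hosts line (not quoted from this issue).
sample = "hosts: files mdns4_minimal [NOTFOUND=return] resolve [!UNAVAIL=return] dns"
print(strip_resolve_module(sample))
```

With the assumed sample line, lookups would fall through from `files` and `mdns4_minimal` directly to `dns`, bypassing systemd-resolved.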


Files

5415-enp0s3_nossh.pcap (20.2 KB) Junxiao Shi, 05/28/2017 07:07 AM
5415-lo.pcap (79.5 KB) Junxiao Shi, 05/28/2017 07:07 AM

Related issues 1 (0 open, 1 closed)

Blocks NFD - Task #4002: Jenkins: Ubuntu 17.04 slave (Closed, Eric Newberry)

Actions #1

Updated by Eric Newberry almost 7 years ago

  • Blocks Task #4002: Jenkins: Ubuntu 17.04 slave added
Actions #2

Updated by Junxiao Shi almost 7 years ago

Can you upload a Wireshark capture of the machine when the problematic unit test is running?
The capture should show port 53 traffic on both IPv4 and IPv6.

Actions #3

Updated by Eric Newberry almost 7 years ago

Strangely, I'm not getting this issue anymore. I was only getting it on the University network and not on my home network. However, now when I run the tests from within the University network (on Vagrant), the tests pass.

Actions #4

Updated by Junxiao Shi almost 7 years ago

http://jenkins.named-data.net/job/NFD/5409/ failed due to a DNS issue. Can you keep a tcpdump running continuously on the 17.04 slaves, capturing DNS traffic only, so that we have a log whenever the problem appears again?

Actions #5

Updated by Eric Newberry almost 7 years ago

  • Subject changed from IPv6 DNS Lookups on Ubuntu 17.04 Fail Sometimes to DNS Lookups on Ubuntu 17.04 Fail Sometimes

It appears that the DNS lookup issues happen when doing reverse lookups (IP address to hostname) during FaceUri canonization and affect IPv4 (and potentially IPv6). The queries appear to be timing out.

Actions #6

Updated by Davide Pesavento almost 7 years ago

Eric Newberry wrote:

It appears that the DNS lookup issues happen when doing reverse lookups (IP address to hostname) during FaceUri canonization

AFAIK we don't do any reverse lookups during FaceUri canonization.

Actions #7

Updated by Junxiao Shi almost 7 years ago

Is it a problem with the host node, or a problem with Ubuntu 17.04 OS?

I also notice that all slaves on that host node are down at the moment.
I'd suggest ensuring that each OS version has slaves on at least two host nodes. This can expose any host-node-specific issue and also avoids a single point of failure.

UPDATE: Arizona UITS is upgrading its firewall, causing remote nodes to appear offline. But it's still a good idea to spread slaves across two or more sites.

Actions #8

Updated by Junxiao Shi almost 7 years ago

I did a tcpdump capture on Ubuntu-17.04-64bit-csu-10022.
Unlike previous attempts, all DNS related tests are passing.
Surprisingly, there is no DNS packet seen in tcpdump.

Actions #9

Updated by Eric Newberry almost 7 years ago

Junxiao Shi wrote:

Surprisingly, there is no DNS packet seen in tcpdump.

Yes, I noticed this when I did tcpdumps on these nodes yesterday. I believe the DNS queries are being resolved using alternative means, perhaps systemd-resolved. However, systemd-resolved should also use port 53 for its traffic.

Actions #10

Updated by Eric Newberry almost 7 years ago

Davide Pesavento wrote:

Eric Newberry wrote:

It appears that the DNS lookup issues happen when doing reverse lookups (IP address to hostname) during FaceUri canonization

AFAIK we don't do any reverse lookups during FaceUri canonization.

Here's some output from the tools unit tests run on Ubuntu 17.04:

../tests/tools/nfdc/face-module.t.cpp(395): error: in "Nfdc/TestFaceModule/CreateCommand/ErrorConflict": check exitCode == 1 has failed [4 != 1]
../tests/tools/nfdc/face-module.t.cpp(397): error: in "Nfdc/TestFaceModule/CreateCommand/ErrorConflict": check err.is_equal("Error 409 when creating face: conflict-409\n") has failed. Output content: "Error when canonizing 'udp://20.53.73.45': Hostname resolution timed out
"
Actions #11

Updated by Davide Pesavento almost 7 years ago

I don't have an explanation for that output, but there's no code in ndn-cxx, NFD, or nfdc to perform reverse resolutions, so something else must be going on.

As a side note, IpHostCanonizeProvider could avoid calling dns::asyncResolve() if the host portion of the URI already contains a valid IP address.
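
The idea in that side note can be sketched as follows. This is an illustration in Python rather than the actual ndn-cxx C++ code, which is not shown in this thread; `is_ip_literal`, `resolve_if_needed`, and the `resolver` callback are hypothetical names standing in for the check and for the call to dns::asyncResolve().

```python
import ipaddress


def is_ip_literal(host: str) -> bool:
    """Return True if `host` is already an IPv4 or IPv6 address literal."""
    candidate = host
    # URIs write IPv6 hosts in brackets, e.g. udp6://[2001:db8::1]:6363
    if candidate.startswith("[") and candidate.endswith("]"):
        candidate = candidate[1:-1]
    try:
        ipaddress.ip_address(candidate)
        return True
    except ValueError:
        return False


def resolve_if_needed(host, resolver):
    """Call `resolver` only when `host` is a name, not an address literal.

    `resolver` is a hypothetical stand-in for the DNS lookup; it takes a
    hostname and returns an address string.
    """
    if is_ip_literal(host):
        return host.strip("[]")  # nothing to look up
    return resolver(host)


# Usage: record which hosts actually reach the (fake) resolver.
looked_up = []


def fake_resolver(name):
    looked_up.append(name)
    return "192.0.2.1"  # documentation-range placeholder address


print(resolve_if_needed("20.53.73.45", fake_resolver))
print(resolve_if_needed("example.com", fake_resolver))
print(looked_up)
```

Under this sketch, a URI such as 'udp://20.53.73.45' would never reach the resolver, so a resolver timeout could not affect its canonization.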

Actions #12

Updated by Davide Pesavento almost 7 years ago

Junxiao Shi wrote:

Is it a problem with the host node, or a problem with Ubuntu 17.04 OS?

Seems to be the former.

Job 5172 on a UA slave passed successfully, while Job 5173 failed on a CSU slave. Therefore this issue seems related to the host or network environment at CSU.

Actions #13

Updated by Eric Newberry almost 7 years ago

I've launched a second 17.04 node at UA and disconnected both 17.04 nodes at CSU. Hopefully this will resolve the failures, since the problem seems specific to the CSU environment.

Actions #14

Updated by Davide Pesavento almost 7 years ago

  • Status changed from New to Feedback

Seems to be working for now. Should we close (or reject) this issue?

Actions #15

Updated by Eric Newberry almost 7 years ago

Davide Pesavento wrote:

Seems to be working for now. Should we close (or reject) this issue?

Probably not reject since there were changes merged for this issue.

Actions #16

Updated by Eric Newberry almost 7 years ago

  • Status changed from Feedback to Closed

It appears that this issue was potentially fixed by the merged changes and any remaining issues are possibly specific to the CSU environment. Closing for now.

Actions #17

Updated by Eric Newberry almost 7 years ago

  • % Done changed from 0 to 100