Bug #4410: `run_tests.py test_ndnping` fails to terminate - NFD - NDN project issue tracking system

Added by Junxiao Shi over 7 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
Assignee: Eric Newberry
Category: Integration Tests
Target version: v0.7
Start date:
Due date:
% Done: 100%
Estimated time:

Description

Environment: Ubuntu 16.04 single node

Steps to reproduce:

./install_apps.py install_all
./run_tests.py test_ndnping

Expected: script terminates after testing completes
Actual: script prints "Ran 1 test in 6.022s OK" but fails to terminate

Diagnostics:

$ pstree -p $(pgrep run_tests.py | head -1)
run_tests.py(9466)───run_tests.py(9474)───sudo(9477)───nfd(9479)─┬─{nfd}(9481)
                                                                 └─{nfd}(9482)

The same issue occurs with `./run_tests.py test_cs_freshness`.

Files

4833-3.txz (23.9 KB), Junxiao Shi, 07/02/2018 06:15 AM
4833-4.txz (23.5 KB), Junxiao Shi, 07/05/2018 09:30 AM
4833-5.txz (23.4 KB), Junxiao Shi, 07/09/2018 09:00 AM

Related issues 2 (0 open — 2 closed)

Updated by Davide Pesavento over 7 years ago

Related to Bug #4379: integration tests: fix broken tests added

Updated by Eric Newberry over 7 years ago

In the Vagrant environment, this script terminates when running all tests together (test_all). I have not yet tried running the single test, but I plan to do so shortly.

Updated by Eric Newberry over 7 years ago

I am unable to replicate this issue in the Vagrant environment.

Updated by Eric Newberry over 7 years ago

Junxiao, what particular environment are you encountering this issue on? Emulab?

Updated by Junxiao Shi over 7 years ago

> what particular environment are you encountering this issue on?

It’s a node on a private Emulab system.

Updated by Eric Newberry over 7 years ago

Start date deleted (12/21/2017)

I'm actually seeing this issue when I run the integration tests on Ubuntu 16.04 64-bit in VirtualBox (Vagrant uses 14.04 64-bit). In my environment, it appears with test_ndnping, but not with test_cs_freshness.

Updated by Junxiao Shi over 7 years ago

The cause of this issue is that ProcessManager.killProcess uses the wrong PID to kill the process.

def killProcess(self, processKey):
    if processKey not in self.results and processKey in self.subprocesses:
        subprocess.call(['sudo', 'kill', str(self.subprocesses[processKey].pid)])

When a process is started, Python's subprocess module sets self.subprocesses[processKey].pid to the PID of the started process.
Since NFD daemonizes itself, that PID does not match the nfd process.
I confirmed this by inserting print statements into the Python code and comparing the output with pgrep.
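
The mismatch between the recorded PID and the PID of the program that ends up running can be reproduced without NFD. The following is a minimal sketch, assuming a POSIX system with `sh` and `sleep` available. The wrapper shell and `sleep` are purely illustrative stand-ins: the pstree output in the description shows sudo sitting between run_tests.py and nfd, and the shell plays that intermediate role here. The sketch only shows that `Popen.pid` refers to the directly launched process, not to whatever that process starts.

```python
import subprocess

def start_wrapped_sleep():
    """Start `sleep` through a wrapper shell; return (Popen object, sleep's PID)."""
    # The shell backgrounds sleep, prints sleep's PID, then waits for it,
    # so the shell stays alive as the intermediate parent process.
    proc = subprocess.Popen(
        ["sh", "-c", "sleep 30 & echo $!; wait"],
        stdout=subprocess.PIPE,
        text=True,
    )
    child_pid = int(proc.stdout.readline())
    return proc, child_pid

proc, child_pid = start_wrapped_sleep()
print("Popen.pid (wrapper):", proc.pid)
print("PID of wrapped program:", child_pid)

# Clean up: stop the wrapped program, after which the wrapper's `wait` returns.
subprocess.call(["kill", str(child_pid)])
proc.wait()
proc.stdout.close()
```

The two printed PIDs differ, which is the kind of discrepancy that comparing a print statement against pgrep would reveal.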

To fix this issue, ProcessManager.startNfd and ProcessManager.killNfd should use nfd-start and nfd-stop scripts.
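
The proposed fix could be sketched as follows. This is a hypothetical outline, not the actual ProcessManager code: the function names `start_nfd` and `stop_nfd` and the injectable `run` parameter are assumptions made for illustration, and the sketch assumes `nfd-start` and `nfd-stop` are on the PATH. The point is that stopping NFD no longer depends on a PID recorded at launch.

```python
import subprocess

def start_nfd(run=subprocess.call):
    """Start NFD via the nfd-start script and return the script's exit status."""
    # `run` is injectable (hypothetical design choice) so the function can be
    # exercised without actually launching NFD.
    return run(["nfd-start"])

def stop_nfd(run=subprocess.call):
    """Stop NFD via the nfd-stop script; no stored PID is consulted."""
    return run(["nfd-stop"])
```

Calling `stop_nfd()` with the default argument would invoke the real script; passing a stub for `run` allows a dry run.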

Updated by Eric Newberry over 7 years ago

Status changed from New to In Progress

Updated by Eric Newberry over 7 years ago

Status changed from In Progress to Code review
% Done changed from 0 to 100

#10

Updated by Davide Pesavento over 7 years ago

Junxiao Shi wrote:

> Since NFD daemonizes itself, the PID does not match nfd process.

What are you talking about? NFD does not daemonize itself.

#11

Updated by Eric Newberry over 7 years ago

Junxiao, what I believe you're seeing in the issue description are two threads of nfd (in the curly braces). I believe (and may be wrong) that terminating the parent process would terminate both threads and that the existing code should work to terminate NFD.
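
The distinction between a process and its threads can be illustrated with a small sketch. It assumes Python 3.8 or later (for `threading.get_native_id`) and uses Python threads only as a stand-in for the nfd threads that pstree shows in curly braces. Each thread has its own kernel thread ID, but all of them report the same process ID, so a signal delivered to that one process affects all of them.

```python
import os
import threading

barrier = threading.Barrier(2)  # keep both threads alive at the same time

def report(results):
    barrier.wait()
    # Record the process ID this thread sees and its own kernel thread ID.
    results.append((os.getpid(), threading.get_native_id()))

results = []
threads = [threading.Thread(target=report, args=(results,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _ in results}
tids = {tid for _, tid in results}
print("process IDs seen by the threads:", pids)
print("kernel thread IDs:", tids)
```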

#12

Updated by Eric Newberry over 7 years ago

Another thought: Perhaps kill is terminating sudo, leaving the nfd process as an orphan.

#13

Updated by Davide Pesavento over 7 years ago

Eric Newberry wrote:

> Another thought: Perhaps kill is terminating sudo, leaving the nfd process as an orphan.

Yes, kill is signaling the sudo process, but sudo should propagate the signal to its child process.

#14

Updated by Eric Newberry over 7 years ago

We could also just rewrite this test case as a Bash script, like most other test cases. These other test cases do similar things, but do not seem to encounter this issue.

#15

Updated by Junxiao Shi over 7 years ago

> We could also just rewrite this test case as a Bash script, like most other test cases.

Once all tests have moved to Bash, there's no need for a Python wrapper for each test. Use a Bash script to invoke each test instead.

> These other test cases do similar things, but do not seem to encounter this issue.

They mostly use nfd-stop or killall nfd, as they should.

#16

Updated by Eric Newberry about 7 years ago

Blocks Task #4380: Run integration tests for every Jenkins build added

#17

Updated by Eric Newberry about 7 years ago

I ran each test individually in the Vagrant environment and found that the following ones failed to terminate:

test_interest_aggregation
test_ndnpeekpoke
test_ndnping
test_ndntraffic

#18

Updated by Eric Newberry about 7 years ago

I pushed a change to rewrite test_ndnpeekpoke, test_ndnping, and test_ndntraffic as Bash-based tests, which resolves this issue. I decided to leave test_interest_aggregation to #4379, since it's broken anyway.

#19

Updated by Junxiao Shi about 7 years ago

File 4833-3.txz added

Change 4833,3 fails to terminate in ./run-vagrant-tests.sh.

Node A has the following process when stuck. Test proceeds after executing nfd-stop in node A.

vagrant@vagrant:~/integration-tests$ pstree -p 2694
run_tests.py(2694)---run_tests.py(5858)---sudo(5861)---nfd(5863)-+-{nfd}(5864)
                                                                 `-{nfd}(5865)

#20

Updated by Eric Newberry about 7 years ago

Junxiao Shi wrote:

> Change 4833,3 fails to terminate in ./run-vagrant-tests.sh.
>
> Node A has the following process when stuck. Test proceeds after executing nfd-stop in node A.
>
> vagrant@vagrant:~/integration-tests$ pstree -p 2694
> run_tests.py(2694)---run_tests.py(5858)---sudo(5861)---nfd(5863)-+-{nfd}(5864)
>                                                                  `-{nfd}(5865)

As I said in note 18, I'm not planning to fix test_interest_aggregation, so this is probably the cause.

#21

Updated by Junxiao Shi about 7 years ago

File 4833-4.txz added

Change 4833,3 fails to terminate in ./run-vagrant-tests.sh.

I also see error messages during execution:

../permanent-face-test.sh: line 34: [[: 0
0: syntax error in expression (error token is "0")
./permanent-face-test.sh: line 52: [[: 100
0: syntax error in expression (error token is "0")
./permanent-face-test.sh: line 74: [[: 0
10: syntax error in expression (error token is "10")
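
Messages of this shape are what Bash prints when a variable containing more than one line is used in an arithmetic comparison inside `[[ ]]`. The following sketch reproduces the message format only. It is an assumption that something similar happens in permanent-face-test.sh, whose contents are not shown here, and the variable name `count` is hypothetical. It assumes `bash` is installed and is driven from Python only to keep the example self-contained.

```python
import os
import subprocess

# Hypothetical reproduction: `count` holds two lines ("0" and "0"), as could
# happen if a command expected to print one number printed two.
script = r'''
count=$'0\n0'
[[ $count -eq 0 ]] && echo equal
'''

result = subprocess.run(
    ["bash", "-c", script],
    capture_output=True,
    text=True,
    env={**os.environ, "LC_ALL": "C"},  # force untranslated error messages
)
print(result.stderr)
```

Because the comparison fails with an error rather than evaluating to true, the `echo equal` branch is not taken.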

#22

Updated by Eric Newberry about 7 years ago

Junxiao Shi wrote:

> I also see error messages during execution:
>
> ../permanent-face-test.sh: line 34: [[: 0
> 0: syntax error in expression (error token is "0")
> ./permanent-face-test.sh: line 52: [[: 100
> 0: syntax error in expression (error token is "0")
> ./permanent-face-test.sh: line 74: [[: 0
> 10: syntax error in expression (error token is "10")

Fixing the test these are occurring in is not part of this issue, but rather #4379.

#23

Updated by Junxiao Shi about 7 years ago

> Fixing the test these are occurring in is not part of this issue, but rather #4379.

I'm not judging which issue the error messages belong to. I'm stating the fact that these error messages appear, just as Jenkins fails the build whenever a test case fails, regardless of whether the failure relates to the current commit.

#24