Bug #2121

nfd-status-http-server subprocesses hang with data size over 65536

Added by John DeHart almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Priority: Urgent
Category: Tools
Target version:
Start date: 11/03/2014
Due date:
% Done: 100%
Estimated time: 1.00 h

Description

nfd-status-http-server.py spawns 'nfd-status -x' subprocesses to gather
nfd status information. If the data returned by such an nfd-status
exceeds 65536 bytes, the subprocess hangs. I believe this is
a limitation of the subprocess.PIPE mechanism.

There is some discussion of this issue here:
http://thraxil.org/users/anders/posts/2008/03/13/Subprocess-Hanging-PIPE-is-your-enemy/

This is a rather serious problem, as our Testbed nodes are now limited
in the number of links they can have. For example, when I tried to add the WASEDA
node, I wanted to add a link from WASEDA to ARIZONA but was unable to, because
the ARIZONA node would eventually hang with all the stuck subprocesses caused
by the testbed status page trying to monitor its status.

History

#1 Updated by Davide Pesavento almost 5 years ago

Pipes are not buffers. The reading side of a pipe is expected to consume available data as soon as possible. The 64K limitation actually comes from the kernel, but it's intended. From pipe(7):

A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 65536 bytes. Since Linux 2.6.35, the default pipe capacity is 65536 bytes, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations. See fcntl(2) for more information.
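The capacity described in the pipe(7) excerpt can be observed directly. The following is a minimal sketch, not part of nfd-status-http-server.py: it fills a fresh pipe through a non-blocking write end, never reads from it, and counts how many bytes are accepted before a write would block. The helper name measure_pipe_capacity and its chunk_size parameter are hypothetical, and the result depends on the operating system.

```python
import os


def measure_pipe_capacity(chunk_size=1024):
    """Fill a fresh pipe without reading from it; return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    try:
        # With a non-blocking write end, a full pipe raises BlockingIOError
        # instead of blocking the writer forever.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full: this is the point where a blocking
                # writer (such as a child process) would stall.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("pipe capacity in bytes:", measure_pipe_capacity())
```

A child process that writes more than this amount to a blocking pipe stalls in write(2) until the parent reads, which is the hang described in this report.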

Python's documentation has several warnings about incorrect usage of subprocess.PIPE, for example on subprocess.call():

Note: Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.

and on .wait():

Warning: This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.

So the solution is to use Popen.communicate(), which keeps reading from the pipe until end-of-file while the child is running, instead of calling .wait() before reading.
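A minimal sketch of the communicate() pattern follows. It is not the actual patch applied to nfd-status-http-server.py. The helper name run_and_collect is hypothetical, and the child command is a stand-in that writes 200000 bytes (well over the 65536-byte pipe capacity) in place of 'nfd-status -x'.

```python
import subprocess
import sys


def run_and_collect(command):
    """Run command and return (returncode, stdout_bytes) without deadlocking."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    # communicate() reads stdout until end-of-file and then waits for the
    # child, so the child never stays blocked on a full pipe.
    # Calling process.wait() here before reading could deadlock instead.
    output, _ = process.communicate()
    return process.returncode, output


if __name__ == "__main__":
    # Stand-in child (an assumption for illustration): writes 200000 bytes.
    child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]
    returncode, data = run_and_collect(child)
    print(returncode, len(data))
```

Because communicate() accumulates the whole output in memory, this pattern suits outputs of moderate size such as a status report; a very large stream would be better read incrementally from process.stdout.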

#2 Updated by Junxiao Shi almost 5 years ago

  • Category set to Tools
  • Assignee set to Alex Afanasyev
  • Target version set to v0.3
  • Estimated time set to 3.00 h

#3 Updated by Alex Afanasyev almost 5 years ago

  • Status changed from New to Code review
  • % Done changed from 0 to 100
  • Estimated time changed from 3.00 h to 1.00 h

#4 Updated by Junxiao Shi almost 5 years ago

To trigger this bug:

for I in $(seq 0 200); do echo $I | nc -u localhost 6363 & done
curl http://localhost:8080

#5 Updated by Junxiao Shi almost 5 years ago

  • Status changed from Code review to Closed
