nfd-status-http-server subprocesses hang with data size over 65536
nfd-status-http-server.py spawns subprocesses of 'nfd-status -x' to gather
nfd status information. If such an nfd-status subprocess returns more than
65536 bytes of data, the subprocess hangs. I believe this is
a limitation in the subprocess.PIPE mechanism.
There is some discussion of this issue here:
This is a rather serious problem, as our Testbed nodes are now limited
in the number of links they can have. For example, when I tried to add the WASEDA
node, I wanted to add a link from WASEDA to ARIZONA but was unable to, because
the ARIZONA node would eventually hang under all the stuck subprocesses created
by the testbed status page trying to monitor its status.
Updated by Davide Pesavento almost 7 years ago
Pipes are not buffers. The reading side of a pipe is expected to consume available data as soon as possible. The 64K limitation actually comes from the kernel, but it's intended. From pipe(7):
A pipe has a limited capacity. If the pipe is full, then a write(2) will block or fail, depending on whether the O_NONBLOCK flag is set (see below). Different implementations have different limits for the pipe capacity. Applications should not rely on a particular capacity: an application should be designed so that a reading process consumes data as soon as it is available, so that a writing process does not remain blocked.
In Linux versions before 2.6.11, the capacity of a pipe was the same as the system page size (e.g., 4096 bytes on i386). Since Linux 2.6.11, the pipe capacity is 65536 bytes. Since Linux 2.6.35, the default pipe capacity is 65536 bytes, but the capacity can be queried and set using the fcntl(2) F_GETPIPE_SZ and F_SETPIPE_SZ operations. See fcntl(2) for more information.
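To check the capacity on a given node, something like the following should work. This is only a sketch: it assumes Linux and Python 3, `pipe_capacity` is a name made up for illustration, and the fallback value 1032 is the Linux value of F_GETPIPE_SZ for Python versions whose fcntl module does not export the constant.

```python
import fcntl
import os

# F_GETPIPE_SZ is Linux-specific. Newer Python versions export it from fcntl;
# 1032 is its numeric value in the Linux headers (assumed fallback).
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)


def pipe_capacity():
    """Return the capacity in bytes of a fresh pipe, or None if unsupported."""
    read_fd, write_fd = os.pipe()
    try:
        return fcntl.fcntl(write_fd, F_GETPIPE_SZ)
    except OSError:
        # The kernel does not support this fcntl operation (e.g. non-Linux).
        return None
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

On a default Linux setup this is expected to report the 65536 bytes mentioned in the man page, but as pipe(7) says, applications should not rely on a particular value.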
Python's subprocess documentation has several warnings about incorrect usage of pipes:
Note: Do not use stdout=PIPE or stderr=PIPE with this function as that can deadlock based on the child process output volume. Use Popen with the communicate() method when you need pipes.
Warning: This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
So the solution is to use Popen.communicate(), which keeps reading from the pipe until the child exits, so the child never blocks on a full pipe.
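A minimal sketch of that approach, not the actual nfd-status-http-server.py code. It assumes Python 3 (for the `timeout` argument); `run_and_collect` and `CHILD` are hypothetical names, and the child command is a stand-in for 'nfd-status -x' that simply writes more than 64 KiB to stdout.

```python
import subprocess
import sys

# Hypothetical stand-in for 'nfd-status -x': a child that writes about
# 200 KB to stdout, well over the 65536-byte pipe capacity.
CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]


def run_and_collect(cmd, timeout=10):
    """Run cmd and return (returncode, stdout_bytes, stderr_bytes)."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # communicate() drains stdout and stderr while the child runs,
        # so the child never blocks on a full pipe.
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Do not leave a stuck child behind: kill it, then reap it.
        proc.kill()
        out, err = proc.communicate()
    return proc.returncode, out, err
```

Calling `run_and_collect(CHILD)` should return the full output. By contrast, calling `proc.wait()` before reading `proc.stdout` is the pattern the warnings above describe, and it would hang with this child.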