Why does subprocess keep running after communicate() is finished? - python

I have an older Python 2.7.5 script which suddenly causes problems on Red Hat Enterprise Linux Server release 7.6 (Maipo). As far as I can tell, it runs fine on Red Hat Enterprise Linux Server release 7.4 (Maipo).
The script basically implements something like
cat /proc/cpuinfo | grep -m 1 -i 'cpu MHz'
by creating two subprocesses and piping the output of the first into the second (see code example below). On the newer OS version, the cat processes stay open until the script terminates.
It seems that the pipe to grep somehow holds the cat process open, and I can't find any documentation on how to explicitly close it.
The issue can be reproduced by pasting this code into the Python CLI and then checking the ps process list for a lingering 'cat /proc/cpuinfo' process.
The code breaks down what originally happens inside a loop, so please don't argue about its style. ;-)
import shlex
from subprocess import *
cmd1 = "cat /proc/cpuinfo"
cmd2 = "grep -m 1 -i 'cpu MHz'"
args1 = shlex.split(cmd1) # split into args
args2 = shlex.split(cmd2) # split into args
# first process uses default stdin
ps1 = Popen(args1, stdout=PIPE)
# then use the output of the previous process as stdin
ps2 = Popen(args2, stdin=ps1.stdout, stdout=PIPE)
out, err = ps2.communicate()
print(out)
Afterwards, check the process list in a second session(!) with:
ps -eF |grep -v grep|grep /proc/cpuinfo
On RHEL7.4 I find no open process in the process list, whereas on RHEL 7.6 after some attempts it looks like this:
[reinski@myhost ~]$ ps -eF |grep -v grep|grep /proc/cpuinfo
reinski 2422 89459 0 26993 356 142 18:46 pts/3 00:00:00 cat /proc/cpuinfo
reinski 2597 139605 0 26993 352 31 18:39 pts/3 00:00:00 cat /proc/cpuinfo
reinski 7809 139605 0 26993 352 86 18:03 pts/3 00:00:00 cat /proc/cpuinfo
These processes only disappear when I close the Python CLI, at which point I get errors like this (I left the formatting messed up as it was):
cat: write error: Broken pipe
cat: write errorcat: write error: Broken pipe
: Broken pipe
Why does cat apparently still want to write to the pipe, even though it should have already output the whole of /proc/cpuinfo and terminated?
Or more important: How can I prevent this from happening?
Thanks for any help!
Example 2:
Following the suggestion from VPfB, it turned out that my example was a bit unfortunate, since the expected result can be achieved with a single grep command.
So here is a modified example that shows the problem with piping in another way:
import shlex
from subprocess import *
cmd1 = "grep -m 1 -i 'cpu MHz' /proc/cpuinfo"
cmd2 = "awk '{print $4}'"
args1 = shlex.split(cmd1) # split into args
args2 = shlex.split(cmd2) # split into args
# first process uses default stdin
ps1 = Popen(args1, stdout=PIPE)
# then use the output of the previous process as stdin
ps2 = Popen(args2, stdin=ps1.stdout, stdout=PIPE)
out, err = ps2.communicate()
print(out)
This time, the result is a single zombie grep process (169731 is the pid of the Python session):
[reinski@myhost ~]$ ps -eF|grep 169731
reinski 169731 189499 0 37847 6024 198 17:51 pts/2 00:00:00 python
reinski 193999 169731 0 0 0 142 17:53 pts/2 00:00:00 [grep] <defunct>
So, is this just another symptom of the same problem or am I doing something completely wrong here?

OK, it seems I just found a solution for the zombie processes left over from the examples:
I simply need to do a
ps1.communicate()
It seems this is required to close the pipe properly.
I'd have expected this to happen when the second process's communicate() is called and it reads the pipe from the first process.
Can someone maybe point out to me, what I am missing here?
I am always willing to learn... ;-)
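For what it's worth, the subprocess documentation recommends a slightly different pattern for shell-style pipelines: after the second process has inherited the pipe, close the parent's copy of ps1.stdout so that ps1 can receive SIGPIPE if ps2 exits first, then reap both processes. A sketch based on the second example:
import shlex
from subprocess import Popen, PIPE

args1 = shlex.split("grep -m 1 -i 'cpu MHz' /proc/cpuinfo")
args2 = shlex.split("awk '{print $4}'")

ps1 = Popen(args1, stdout=PIPE)
ps2 = Popen(args2, stdin=ps1.stdout, stdout=PIPE)
ps1.stdout.close()  # let ps1 receive SIGPIPE if ps2 exits first
out, err = ps2.communicate()
ps1.wait()  # reap ps1 so it does not linger as a zombie
print(out)
Your ps1.communicate() call works for the same reason: it reads and closes the parent's end of the pipe and waits for the process, which is exactly the cleanup that was missing.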

Related

Output garbled when launching multiple ssh-sessions with pseudo-tty (need remote process to exit when ssh disconnects/is killed)

I have a python script that opens multiple concurrent pseudo-tty ssh sessions to a server. My problem is that the output is garbled:
import subprocess

for i in range(0, 3):
    subprocess.Popen(
        "ssh -tt -q myserver 'echo 11; echo 22; echo 33; echo 44;'",
        shell=True
    )
Output:
11
22
33
44
11
22
33
44
11
22
33
44
The output varies. Sometimes it works, but most of the time I get those weird indentations. In reality I want to launch remote python processes (a locust load gen slave), but I've simplified it to just use echo.
Things I've tried:
universal_newlines=True, bufsize=1 (doesn't help)
removing -tt (fixes the output, but has the undesired side effect that remote processes don't die right away if python/ssh is terminated)
piping to cat -e to show hidden characters (for debugging):
11^M$
22^M$
33^M$
44^M$
11$
22$
33$
44$
11$
22$
33$
44$
I'm not sure if this is even a Python issue or just an SSH issue. My guess is that I need to use some sort of line buffering, but I don't know how :-/
I'm on macOS Mojave, and I've tried both iTerm2 and Terminal, if that matters.
Edit: I'm not sure it is related, but the problem appears to occur more frequently if I ensure Python keeps running until the ssh session has terminated (by adding time.sleep(10) at the end of the script).
Edit 2: I tried @FLemaitre's solution (not using -tt and killing explicitly), and it works in the simple case, but not when spawning locust:
proc = subprocess.Popen(
    "ssh servername 'locust --slave --master-port 7777 --no-web -f locustfile.py & read; kill $!'",
    shell=True,
    stdin=subprocess.PIPE,
)
time.sleep(10)
proc.kill()
proc.wait()
On the remote host, a bash -c locust --slave ... process is started. It dies when ssh is killed, but locust itself (a child of the above process) does not :-/
I can reproduce the issue systematically with the following script:
import subprocess
import time

if __name__ == "__main__":
    for i in range(0, 10):
        proc = subprocess.Popen(
            "ssh -tt -q localhost 'echo 11; echo 22; echo 33; '",
            shell=True
        )
    time.sleep(4)
And I think the issue is not related to Python. These multiple ssh sessions with pseudo-TTYs seem to conflict with each other. Eventually, the terminal used to run this script ends up broken as well (even though it wasn't involved directly):
>cat test2.py
import subprocess
import time
import atexit
... etc ...
I checked the documentation, and this -t option seems to do much more than what you are actually trying to achieve. When I remove the second t and the -q option, I sometimes (not often) get a cryptic error message stating that something went wrong (but I no longer manage to reproduce it). I checked with Google, but without much success. Still, I'm convinced that this option is overkill, and I would rather focus on the undying processes. That issue is well known:
Starting a process over ssh using bash and then killing it on sigint
The second answer there is your -tt option, but the best answer suits your example very well and is superior (with -tt you solve the propagation of the termination through ssh, but you do not tackle the same issue between Python and its subprocess). For example:
import subprocess
import time

if __name__ == "__main__":
    for i in range(0, 10):
        proc = subprocess.Popen(
            "ssh localhost 'sleep 90 & read ; kill $!'",
            shell=True,
            stdin=subprocess.PIPE
        )
    time.sleep(40)
With this solution, stdin is shared by all actors (Python, the Python subprocess, the ssh process, the sleep process), and its closure at any point in the chain is detected by the final business process, triggering a graceful shutdown.
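To trigger that graceful shutdown deliberately, rather than by letting Python exit, it should be enough to close the pipe from the parent. A small self-contained sketch of the idea, using the same ssh command as above:
import subprocess
import time

proc = subprocess.Popen(
    "ssh localhost 'sleep 90 & read ; kill $!'",
    shell=True,
    stdin=subprocess.PIPE
)
time.sleep(5)
proc.stdin.close()  # the remote 'read' sees EOF and returns
proc.wait()         # the remote shell then kills its background sleep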
Edit with locust:
I gave it a quick try, and the issue was that a simple 'kill' is ignored by the slave (it looks like an issue on locust's side). It seems to work with 'kill -9':
import subprocess
import time

if __name__ == "__main__":
    for i in range(0, 2):
        proc = subprocess.Popen(
            "ssh localhost 'python -m locust --slave --no-web -f ~devsup/users/flemaitre/tmp/locust_config.py & read ; kill -9 $!'",
            shell=True,
            stdin=subprocess.PIPE
        )
    time.sleep(40)

cat blocking when called via subprocess.Popen()

I'm seeing unexpected behaviour from the Linux cat command when it's called via subprocess.Popen().
The Python script is structured like this:
import os, subprocess

def _degrade_child_rights(user_uid, user_gid):
    def result():
        os.setgid(user_gid)
        os.setegid(user_gid)
        os.setuid(user_uid)
        os.seteuid(user_uid)
    return result

child = subprocess.Popen("cat /home/myuser/myfolder/screenlog.0",
                         preexec_fn=_degrade_child_rights(0, 0), shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
When I check the executed shell command with ps aux | grep cat, it shows that Python successfully ran the shell command.
> ps aux | grep cat
root 21236 0.0 0.0 6564 780 pts/1 S 20:49 0:00 /bin/sh -c cat /home/myuser/myfolder/screenlog.0
root 21237 0.0 0.0 11056 732 pts/1 S 20:49 0:00 cat /home/myuser/myfolder/screenlog.0
root 21476 0.0 0.0 15800 936 pts/1 S+ 20:52 0:00 grep --color=auto cat
However, the cat command never finishes.
I also moved the cat $file command into a bash script. Bash then executes my cat call, but it also blocks.
When I execute cat $file manually, it runs as expected, so a missing EOF at the end of the file can be ruled out.
I think the '/bin/sh -c' added by Popen somehow messes with the correct execution of cat $file.
Can I somehow prevent this?
You might want to try the communicate method for the Popen object:
Popen.communicate(input=None)
Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate. The optional input argument should be a string to be sent to the child process, or None, if no data should be sent to the child.
communicate() returns a tuple (stdoutdata, stderrdata).
Note that if you want to send data to the process's stdin, you need to create the Popen object with stdin=PIPE. Similarly, to get anything other than None in the result tuple, you need to give stdout=PIPE and/or stderr=PIPE too.
Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
There is more info in the subprocess Python docs.
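In this case cat most likely blocks because it fills the pipe buffer and nothing ever reads from it; communicate() drains the pipe so cat can run to completion. A minimal sketch using the path from the question:
import subprocess

child = subprocess.Popen("cat /home/myuser/myfolder/screenlog.0",
                         shell=True,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
out, _ = child.communicate()  # reads stdout until EOF, then waits for cat
print(out)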

pid of a process created by python subprocess.Popen(shell=True) is not the pid of the spawned shell

I have a Python test script:
test.py:
#coding=utf-8
import os
import time
print os.getpid()
I call it via subprocess.Popen:
import subprocess as sp
p = sp.Popen("python test.py", shell=True)
print p.pid
Different outputs from these two print statements are expected, since p.pid should be the pid of the spawned shell process, but the actual output is:
In [18]: p = sp.Popen("python test.py", shell=True)
In [19]: 19108
In [19]: p.pid
Out[19]: 19108
I believe you are on UNIX/Linux. If I may restate your question, I think you're asking, given
p = subprocess.Popen("python test.py", shell=True)
why is p.pid the same as that of the test.py process rather than that of the intervening shell, which shell you explicitly requested? That is, you expect the process genealogy to look like this:
python (calling subprocess.Popen) # pid 123
\_ /bin/sh -c 'python test.py' # pid 124
\_ python test.py # pid 125 # note: pids need not be sequential, that's just for demonstration
The answer is that your shell is making an optimization. The shell recognizes that it has been given a simple command and simply execves that command, replacing itself (but not its PID, of course) with the new process. So the genealogy looks like this:
python (calling Popen) # pid 201
\_ /bin/sh -c ... --execve--> python test.py # pid 202
On Linux you can strace -fe trace=process ... to confirm this. You'll see the top-level python process fork (er, clone) and then the child will exec /bin/sh and then again python.
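Whether the optimization fires depends on the shell, but dash and bash both exec a lone simple command while keeping the shell alive for a compound one. A quick sketch to see the difference (test.py as in the question):
import subprocess

# Simple command: the shell execs it, so p1.pid is the pid test.py prints.
p1 = subprocess.Popen("python test.py", shell=True)

# Compound command: the shell must stay alive to run the second part,
# so p2.pid is the shell's pid and test.py runs as its child.
p2 = subprocess.Popen("python test.py; true", shell=True)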

Output from subprocess.Popen

I have been writing some Python code, and in my code I was using the "commands" module.
The code was working as I intended, but then I noticed in the Python docs that commands has been deprecated and will be removed in Python 3, and that I should use "subprocess" instead.
"OK", I think, "I don't want my code to go straight to legacy status, so I should change that right now."
The thing is that subprocess.Popen seems to prepend a nasty string to the start of any output, e.g.
<subprocess.Popen object at 0xb7394c8c>
All the examples I see have it there; it seems to be accepted as a given that it is always there.
This code:
#!/usr/bin/python
import subprocess
output = subprocess.Popen("ls -al", shell=True)
print output
produces this:
<subprocess.Popen object at 0xb734b26c>
brettg@underworld:~/dev$ total 52
drwxr-xr-x 3 brettg brettg 4096 2011-05-27 12:38 .
drwxr-xr-x 21 brettg brettg 4096 2011-05-24 17:40 ..
<trunc>
Is this normal? If I use it as part of a larger program that outputs various formatted details to the console it messes everything up.
I'm using the command to obtain the IP address for an interface by using ifconfig along with various greps and awks to scrape the address.
Consider this code:
#!/usr/bin/python
import commands,subprocess

def new_get_ip (netif):
    address = subprocess.Popen("/sbin/ifconfig " + netif + " | grep inet | grep -v inet6 | awk '{print $2}' | sed 's/addr://'i", shell=True)
    return address

def old_get_ip (netif):
    address = commands.getoutput("/sbin/ifconfig " + netif + " | grep inet | grep -v inet6 | awk '{print $2}' | sed 's/addr://'i")
    return address

print "OLD IP is :",old_get_ip("eth0")
print ""
print "NEW IP is :",new_get_ip("eth0")
This returns:
brettg@underworld:~/dev$ ./IPAddress.py
OLD IP is : 10.48.16.60
NEW IP is : <subprocess.Popen object at 0xb744270c>
brettg@underworld:~/dev$ 10.48.16.60
Which is fugly to say the least.
Obviously I am missing something here. I am new to Python of course so I'm sure it is me doing the wrong thing but various google searches have been fruitless to this point.
What if I want cleaner output? Do I have to manually trim the offending output or am I invoking subprocess.Popen incorrectly?
The "ugly string" is what it should be printing. Python is correctly printing out the repr(subprocess.Popen(...)), just like what it would print if you said print(open('myfile.txt')).
Furthermore, Python has no knowledge of what is being output to stdout. The output you are seeing is not from Python, but from the process's stdout and stderr being written straight to your terminal as spam, without ever going through the Python process. It's as if you ran a program someprogram & without redirecting its stdout and stderr to /dev/null, and then ran another command but occasionally saw spam from the background program. To repeat and clarify:
<subprocess.Popen object at 0xb734b26c> <-- output of python program
brettg@underworld:~/dev$ total 52 <-- spam from your shell, not from python
drwxr-xr-x 3 brettg brettg 4096 2011-05-27 12:38 . <-- spam from your shell, not from python
drwxr-xr-x 21 brettg brettg 4096 2011-05-24 17:40 .. <-- spam from your shell, not from python
...
In order to capture stdout, you must use the .communicate() function, like so:
#!/usr/bin/python
import subprocess
output = subprocess.Popen(["ls", "-a", "-l"], stdout=subprocess.PIPE).communicate()[0]
print output
Furthermore, you never want to use shell=True, as it is a security hole (a major security hole with unsanitized inputs, a minor one with no input because it allows local attacks by modifying the shell environment). For security reasons and also to avoid bugs, you generally want to pass in a list rather than a string. If you're lazy you can do "ls -al".split(), which is frowned upon, but it would be a security hole to do something like ("ls -l %s"%unsanitizedInput).split().
See the subprocess module documentation for more information.
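Applied to the question's IP lookup, here is a sketch that avoids the shell entirely: run ifconfig with an argument list and do the grep/awk/sed work in Python. (Python 2 syntax to match the question; ifconfig's output format varies between systems, so the parsing below is an assumption you may need to adapt.)
import subprocess

def get_ip(netif):
    # Run ifconfig directly with an argument list; no shell is involved.
    out = subprocess.Popen(["/sbin/ifconfig", netif],
                           stdout=subprocess.PIPE).communicate()[0]
    for line in out.splitlines():
        fields = line.split()
        # Match the IPv4 line, e.g. "inet addr:10.48.16.60 ..." or "inet 10.48.16.60 ..."
        if fields and fields[0] == "inet":
            return fields[1].replace("addr:", "")
    return None

print "NEW IP is :", get_ip("eth0")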
Here is how to get stdout and stderr from a program using the subprocess module:
from subprocess import Popen, PIPE, STDOUT
cmd = 'echo Hello World'
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
output = p.stdout.read()
print output
results:
b'Hello World\r\n'
You can also run commands with PowerShell and see the results:
from subprocess import Popen, PIPE, STDOUT
cmd = 'powershell.exe ls'
p = Popen(cmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
output = p.stdout.read()
print output
The variable output does not contain a string; it is a subprocess.Popen object. You don't need to print it. The code,
import subprocess
output = subprocess.Popen("ls -al", shell=True)
works perfectly, but without the ugly <subprocess.Popen object at 0xb734b26c> being printed.
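As a side note, if the goal is just to run a command and let its output go straight to the terminal, subprocess.call expresses that intent directly; it runs the command, waits for it to finish, and returns its exit status:
import subprocess

# Runs ls, lets it print to the terminal, and returns its exit status.
status = subprocess.call(["ls", "-al"])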

Python bash pipe

I want to pipe a Python script's output to a bash script. What I did so far: I tried os.popen() and the subprocess module, and tried to set up a pipe, for example:
os.popen('echo "P 1 1 591336 4927369 1 321 " | v.in.ascii -zn out=abcx format=standard --overwrite')
but this didn't work. The values "591336" and "4927369" are variables that come from the output of the Python script. But when I repeat the echo command and the pipe manually, with the values changed, it works (in bash).
v.in.ascii -zn out=abcx format=standard --overwrite
This part of the bash command above comes from GRASS GIS.
Can anyone help me?
You can just use print to output to stdout and pipe the Python process to the next process, e.g.
python myprogram.py | ...
Where myprogram.py might look like:
for x in something:
    print dosomething(x)
This works for me:
>>> stdin, stdout = os.popen2("echo %s | grep 'test'" % 'some test param')
>>> print stdout.read()
some test param
>>>
As of Python 2.6, the subprocess module is recommended instead of the deprecated os.popen. Here's an example:
from subprocess import Popen, PIPE
p = Popen(["v.in.ascii", "-zn", "out=abcx", "format=standard", "--overwrite"], stdin=PIPE)
p.stdin.write("P 1 1 591336 4927369 1 321\n")
p.stdin.close()
p.wait() # unless background execution preferred
I really like John Paulett's answer.
I think your echo example would work if you used os.system instead of os.popen.
One way to use popen here is like this:
f = os.popen("v.in.ascii -zn out=abcx format=standard --overwrite", 'w')
f.write("P 1 1 591336 4927369 1 321\n")
f.close()
(You have to specify that the pipe is for writing.)
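Combining the two subprocess answers, you could also feed the point on stdin and capture whatever v.in.ascii prints in one go, letting communicate() handle the pipe plumbing. A sketch, reusing the arguments from the question:
from subprocess import Popen, PIPE

# Feed one line of input and collect stdout/stderr until the tool exits.
p = Popen(["v.in.ascii", "-zn", "out=abcx", "format=standard", "--overwrite"],
          stdin=PIPE, stdout=PIPE, stderr=PIPE)
out, err = p.communicate("P 1 1 591336 4927369 1 321\n")
print out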
