Subprocess starting two processes instead of one - python

I'm using subprocess to start a process and let it run in the background, it's a server application. The process itself is a java program with a thin wrapper (which among other things, means that I can just launch it as an executable without having to call java explicitly).
I'm using Popen to run the process and when I set shell=False, it runs but it spawns two processes instead of one. The first process has init as its parent and when I inspect it via ps, it just displays the raw command. However, the second process displays with the expanded java arguments (-D and -X flags) - this is what I expect to see and how the process looks when I run the command manually.
Interestingly, when I set shell=True, the command fails. The command does print a help message, but it doesn't seem to indicate that there's a problem with my argument list (there shouldn't be). Everything is the same except the shell keyword argument to Popen.
I'm using Python 2.7 on Ubuntu. Not really sure what's going on here, any help is appreciated. I suppose it's possible that the java command is doing an exec/fork and for some reason, the parent process isn't dying when I start it through Python.
I saw this SO question which looked promising but doesn't change the behavior that I'm experiencing.

This is actually more of a question about the wrapper than about Python -- you would get the same behavior running it from any other language.
To get the behavior you want, the wrapper would want to have the line where it invokes the JVM look as follows:
exec java -D... -cp ... main.class.here "$@"
...as opposed to lacking the exec in front:
java -D... -cp ... main.class.here "$@"
In the former case, the process image of the wrapper is replaced with that of the JVM it invokes; in the latter, the wrapper waits for the JVM to exit, and then continues to run.
If the wrapper does any cleanup after JVM exit, using exec will prevent this from happening and would thus be the Wrong Thing -- in this case, you would want the wrapper to still exist while the JVM runs, as otherwise it would be unable to perform cleanup afterwards.
Be aware that if the wrapper is responsible for detaching the subprocess, it needs to be able to close open file handles for this to happen correctly. Consider passing close_fds=True to your Popen call if your parent process has more file descriptors than only stdin, stdout and stderr open.
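On the Python side, a minimal sketch of such a launch might look as follows (the wrapper name ./server-wrapper is only a placeholder for your actual executable):
import subprocess
# Launch the wrapper in the background; close_fds=True keeps the child from
# inheriting any file descriptors beyond stdin/stdout/stderr, which matters
# if the wrapper is the one detaching the JVM.
proc = subprocess.Popen(["./server-wrapper"], close_fds=True)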

Related

Cannot call ubuntu 'ulimit' from python subprocess without using shell option

When I try to call ulimit -n from subprocess, i.e.
subprocess.check_output(['ulimit', '-n'])
I get the following error:
OSError: [Errno 2] No such file or directory
This is strange, because the command is valid on the command line. Previous answers to similar questions focus on the need to pass the command as a list, which I have done. Other answers have mentioned that alias commands can cause problems for subprocess, but ulimit is not an alias. If I use the shell=True option the code works, but I would like to understand why.
ulimit is a wrapper around a system call to limit the resources of the current process. Because it acts on the current process, it needs to be called on the current process or it has no effect.
For this reason, the shell implements it as a built-in, so there is no such binary.
If you were to spawn a shell just to call ulimit and then kill that shell, you would have accomplished nothing, because the process whose limits were changed is the one that gets killed. This is why things like cd, which affect the current process, also have to be implemented as shell built-ins.
This means that you cannot call it as a subprocess in Python. Fortunately, Python has a module that wraps it: https://docs.python.org/3/library/resource.html
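For example, the in-process equivalent of ulimit -n, as a minimal sketch with the resource module:
import resource
# Query the soft and hard limits on open file descriptors (what `ulimit -n` reports).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# Raise the soft limit up to the hard limit for this process and its children.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))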

Executing Git server commands through Python shell

I'm trying to write my own shell script in Python for SSH to call (using the command= option in the authorized_keys file). Currently I'm simply calling the original SSH command (it is set as an environment variable prior to the script being called by SSH). However, I always end up with a git error about the remote end hanging up unexpectedly.
My Python code is literally:
#!/usr/bin/python
import os
import subprocess
if os.environ.get('SSH_ORIGINAL_COMMAND') is not None:
    subprocess.Popen(os.environ['SSH_ORIGINAL_COMMAND'], shell=True)
else:
    print 'who the *heck* do you think you are?'
Please let me know what is preventing the git command from successfully allowing the system to work. For reference, the command that is being called on the server when a client calls git push is git-receive-pack /path/to/repo.git.
Regarding the Python code shown above, I have tried using shell=True and shell=False (correctly passing the command as a list when False) and neither works correctly.
Thank you!
Found the solution!
You'll need to call the communicate() method of the subprocess object created by Popen call.
proc = subprocess.Popen(args, shell=False)
proc.communicate()
I'm not entirely sure why; however, I think it has to do with the communicate() method allowing data to also be given via stdin. I thought the process would automatically accept input since I didn't override the input stream anywhere, but perhaps a manual call to communicate is needed to kick things off... hopefully someone can weigh in here!
You also can't pass stdout=subprocess.PIPE, as it will cause the command to hang. Again, I'm not sure whether this is because of how git works or something to do with the process as a whole. Hopefully this at least helps someone in the future!
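Putting the pieces together, a minimal sketch of the whole wrapper (assuming SSH_ORIGINAL_COMMAND is trusted or validated elsewhere, since it is passed to a shell):
#!/usr/bin/python
import os
import subprocess
import sys

cmd = os.environ.get('SSH_ORIGINAL_COMMAND')
if cmd is None:
    sys.stderr.write('interactive access not allowed\n')
    sys.exit(1)

# Let git-receive-pack inherit stdin/stdout so the pack protocol can flow,
# and wait for it to finish before the wrapper exits.
proc = subprocess.Popen(cmd, shell=True)
proc.communicate()
sys.exit(proc.returncode)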

How to achieve Perl's exec function in Python?

Assume using Linux:
In Perl, the exec function executes an external program and immediately exits itself, leaving the external program in the same shell session.
A very close answer using Python is https://stackoverflow.com/a/13256908
However, the Python solution using start_new_session=True starts the external program via setsid, which means that solution is suited to making a daemon, not an interactive program.
Here is a simple example using Perl:
perl -e '$para=qq(-X --cmd ":vsp");exec "vim $para"'
After vim is started, the original Perl program has exited and vim is still in the same shell session (vim is not sent to a new session group).
How can I get the same behavior with Python?
Perl is just wrapping the exec* system call functions here. Python has the same wrappers, in the os module, see the os.exec* documentation:
These functions all execute a new program, replacing the current process; they do not return. On Unix, the new executable is loaded into the current process, and will have the same process id as the caller.
To do the same in Python:
python -c 'import os; para="-X --cmd \":vsp\"".split(); os.execlp("vim", *para)'
os.execlp accepts an argument list and looks up the binary in $PATH from the first argument.
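For a standalone script rather than a one-liner, a minimal sketch with an explicit argument list (os.execvp also searches $PATH):
import os
# Replace the current Python process with vim, just like Perl's exec.
# By convention, argv[0] is the program name itself.
os.execvp("vim", ["vim", "-X", "--cmd", ":vsp"])
# Nothing below this line ever runs; the process image has been replaced.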
The subprocess module is only ever suitable for running processes next to the Python process, not for replacing the Python process. On POSIX systems, the subprocess module uses the low-level exec* functions to implement its functionality: a fork of the Python process is then replaced with the command you wanted to run with subprocess.

Alternative in python to subprocess

I am trying to write a script which has to make a lot of calls to some bash commands, parse and process the outputs and finally give some output.
I was using subprocess.Popen and subprocess.call
If I understand correctly, these methods spawn a bash process, run the command, get the output and then kill the process.
Is there a way to have a bash process running in the background continuously and then the python calls could just go directly to that process? This would be something like bash running as a server and python calls going to it.
I feel this would optimize the calls a bit as there is no bash process setup and teardown. Or will it give no performance advantage?
I feel this would optimize the calls a bit as there is no bash process setup and teardown.
subprocess never runs the shell unless you ask for it explicitly, e.g.:
#!/usr/bin/env python
import subprocess
subprocess.check_call(['ls', '-l'])
This call runs the ls program without invoking /bin/sh.
Or will it give no performance advantage?
If your subprocess calls actually use the shell, e.g., to specify a pipeline concisely, or if you use bash process substitution that would be verbose and error-prone to express with the subprocess module directly, then it is unlikely that invoking bash is a performance bottleneck -- measure it first.
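For comparison, here is a rough sketch of what a simple shell pipeline such as ls -l | grep py looks like when spelled out with subprocess directly (no shell involved):
import subprocess
# Equivalent of: ls -l | grep py
ls = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE)
grep = subprocess.Popen(['grep', 'py'], stdin=ls.stdout, stdout=subprocess.PIPE)
ls.stdout.close()  # let ls receive SIGPIPE if grep exits first
output = grep.communicate()[0]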
There are also Python packages that allow you to specify such commands concisely; e.g., plumbum could be used to emulate a shell pipeline.
If you want to use bash as a server process, then pexpect is useful for dialog-based interaction with an external process -- though it is unlikely to affect time performance. fabric allows you to run both local and remote commands (over ssh).
There are other subprocess wrappers such as sarge, which can parse a pipeline specified in a string without invoking the shell, e.g., it enables cross-platform support for bash-like syntax (&&, ||, & in command lines), or sh -- a complete subprocess replacement on Unix that provides a TTY by default (it seems full-featured, but the shell-like piping is less straightforward). You can even run commands with Python-ish, BASHwards-looking syntax using the xonsh shell.
Again, it is unlikely that it affects performance in a meaningful way in most cases.
The problem of starting and communicating with external processes in a portable manner is complex -- the interaction between processes, pipes, ttys, signals, threading, async I/O, and buffering in various places has rough edges. Introducing a new package may complicate things if you don't know how a specific package solves the numerous issues related to running shell commands.
If I understand correctly, these methods spawn a bash process, run the command, get the output and then kill the process.
subprocess.Popen is a bit more involved: its communicate() method does the I/O plumbing (using threads or a select loop internally) to avoid deadlocks. See https://www.python.org/dev/peps/pep-0324/:
A communicate() method, which makes it easy to send stdin data and read stdout and stderr data, without risking deadlocks. Most people are aware of the flow control issues involved with child process communication, but not all have the patience or skills to write a fully correct and deadlock-free select loop. This means that many Python applications contain race conditions. A communicate() method in the standard library solves this problem.
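For example, a minimal sketch that captures both streams without risking a deadlock:
import subprocess
proc = subprocess.Popen(['ls', '-l'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()  # reads both pipes concurrently, then waits for exit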
Is there a way to have a bash process running in the background continuously and then the python calls could just go directly to that process?
Sure, you can still use subprocess.Popen and send messages to your subprocess and receive messages back without terminating the subprocess. In the simplest case your messages can be lines.
This allows for request-response style protocols as well as publish-subscribe when the subprocess can keep sending you messages back when an event of interest happens.
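A minimal sketch of that idea, assuming a hypothetical line-oriented program ./worker that writes one reply line per request line:
import subprocess
# Keep one worker process alive and exchange newline-delimited messages with it.
# (On Python 3, also pass universal_newlines=True to work with text instead of bytes.)
proc = subprocess.Popen(['./worker'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=1)
proc.stdin.write('ping\n')
proc.stdin.flush()
reply = proc.stdout.readline()
proc.stdin.close()  # signal that no more requests are coming
proc.wait()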

Invoking C compiler using Python subprocess command

I am trying to compile a C program using Python and want to give input using "<" operator but it's not working as expected.
If I compile the C program and run it by giving input though a file it works; for example
./a.out <inp.txt works
But similarly if I try to do this using a Python script, it did not quite work out as expected.
For example:
import subprocess
subprocess.call(["gcc","a.c","-o","x"])
subprocess.call(["./x"])
and
import subprocess
subprocess.call(["gcc","a.c","-o","x"])
subprocess.call(["./x","<inp.txt"])
Both scripts ask for input through the terminal, but I think the second script should read from the file. Why do both programs behave the same?
To complement @Jonathan Leffler's and @alastair's helpful answers:
Assuming you control the string you're passing to the shell for execution, I see nothing wrong with using the shell for convenience. [1]
subprocess.call() has an optional Boolean shell parameter, which causes the command to be passed to the shell, enabling I/O redirection, referencing environment variables, ...:
subprocess.call("./x <inp.txt", shell = True)
Note how the entire command line is passed as a single string rather than an array of arguments.
[1]
Avoid use of the shell in the following cases:
If your Python code must run on platforms other than Unix-like ones, such as Windows.
If performance is paramount.
If you find yourself "outsourcing" tasks better handled on the Python side.
If you're concerned about lack of predictability of the shell environment (as @alastair is):
subprocess.call with shell = True always creates non-interactive non-login instances of /bin/sh - note that it is NOT the user's default shell that is used.
sh does NOT read initialization files for non-interactive non-login shells (neither system-wide nor user-specific ones).
Note that even on platforms where sh is bash in disguise, bash will act this way when invoked as sh.
Every shell instance created with subprocess.call with shell = True is its own world, and its environment is neither influenced by previous shell instances nor does it influence later ones.
However, the shell instances created do inherit the environment of the python process itself:
If you started your Python program from an interactive shell, then that shell's environment is inherited. Note that this only pertains to the current working directory and environment variables, and NOT to aliases, shell functions, and shell variables.
Generally, that's a feature, given that Python (CPython) itself is designed to be controllable via environment variables (for 2.x, see https://docs.python.org/2/using/cmdline.html#environment-variables; for 3.x, see https://docs.python.org/3/using/cmdline.html#environment-variables).
If needed, you can supply your own environment to the shell via the env parameter; note, however, that you'll have to supply the entire environment in that event, potentially including variables such as USER and HOME, if needed; simple example, defining $PATH explicitly:
subprocess.call('echo $PATH', shell=True,
                env={'PATH': '/sbin:/bin:/usr/bin'})
The shell does I/O redirection for a process. Based on what you're saying, the subprocess module does not do I/O redirection like that. To demonstrate, run:
subprocess.call(["sh","-c", "./x <inp.txt"])
That runs the shell and should redirect the I/O. With your code, your program ./x is being given an argument <inp.txt which it is ignoring.
NB: the alternative call to subprocess.call is purely for diagnostic purposes, not a recommended solution. The recommended solution involves reading the (Python 2) subprocess module documentation (or the Python 3 documentation for it) to find out how to do the redirection using the module.
import subprocess
i_file = open("inp.txt")
subprocess.call("./x", stdin=i_file)
i_file.close()
If your script is about to exit so you don't have to worry about wasted file descriptors, you can compress that to:
import subprocess
subprocess.call("./x", stdin=open("inp.txt"))
By default, the subprocess module does not pass the arguments to the shell. Why? Because running commands via the shell is dangerous; unless they're correctly quoted and escaped (which is complicated), it is often possible to convince programs that do this kind of thing to run unwanted and unexpected shell commands.
Using the shell for this would be wrong anyway. If you want to take input from a particular file, you can use subprocess.Popen, setting the stdin argument to a file descriptor for the file inp.txt (you can get the file descriptor by calling fileno() on a Python file object).
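A minimal sketch of that approach, feeding inp.txt to ./x without a shell:
import subprocess
with open("inp.txt") as i_file:
    # stdin accepts a file descriptor (or the file object itself).
    proc = subprocess.Popen(["./x"], stdin=i_file.fileno())
    proc.wait()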
