Python program still running but PID can't be found - python

I am running a detached child program in the background from my parent program. After I exit the parent program, I would expect the child program to continue running and logging into OUTPUT_PATH. And indeed, I can see the log file updating. However, when I try to find the PID with ps aux, I can't find it. Can anyone explain this behavior? What am I doing wrong?
shellCommand = "nohup python PYTHON_PROGRAM ARGS >OUTPUT_PATH 2>&1 &"
subprocess.Popen(shellCommand, shell=True, preexec_fn=os.setpgrp)

OK, this is getting too big for comments. By running ps -fwp $(pgrep -f PYTHON_PROGRAM), we've found the process now. :) But its PID does not match the one reported by Popen.pid. This comes down to the shell instance that was called, since you've used shell=True. The first fork was to call the shell, the second was for your script. Actually, this is documented in the link mentioned above:
Note that if you set the shell argument to True, this is the process ID of the spawned shell.
But see the NOTE below.
Which brings us to the "more orthodox way", where we're entering possibly contested territory: different people, different ideas. Perhaps not so much the first point, as it is in line with the documentation's suggestion not to use shell=True unless you really need to.
args is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names). If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.
There is also another section on (security) implications of not heeding the recommendation.
So, compiling a list of arguments to run nohup with your script, and handling the output redirection through the keyword arguments (stdout, stderr) of Popen, would seem like a good course of action and would also get you a consistent PID.
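A minimal sketch of that idea, reusing the placeholders from the question (PYTHON_PROGRAM, ARGS, OUTPUT_PATH), might look like this:
import os
import subprocess

args = ["ARG1", "ARG2"]  # whatever ARGS stood for in the question

# let Popen handle the redirection instead of a shell
with open("OUTPUT_PATH", "wb") as log:
    proc = subprocess.Popen(["nohup", "python", "PYTHON_PROGRAM"] + args,
                            stdout=log, stderr=subprocess.STDOUT,
                            preexec_fn=os.setpgrp)

print(proc.pid)  # now the PID of the script itself, not of an intermediate shell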
This last step might attract the most controversy: you can actually daemonize a process by means of Python interfaces to the corresponding syscalls. A well-documented example seems to be growing on GitHub (reached over one hop from a link in the PEP mentioned below),
or there is a library referred to from PEP 3143 on the topic.
NOTE: That bit does not appear to be always true (calling of sh yes, but two PIDs no). At least on my system I've observed sh to exec the program called via -c (in itself) without forking. From a few quick runs and traces this was the case at least if I did not mess with stdin/-out/-err (i.e. no pipes or redirections), did not force a subshell (...), and did not chain commands over ;. (The latter two are kind of obvious, and the former is as well once you realize how redirections are implemented.) So at least for my shell I would dare to extrapolate and say: it seems it does not fork unless it must. Or an even more simplified (and hence not entirely correct) statement would be: simple stuff won't fork.

Related

Starting process in Google Colab with Prefix "!" vs. "subprocess.Popen(..)"

I've been using Google Colab for a few weeks now and I've been wondering what the difference is between the two following commands (for example):
!ffmpeg ...
subprocess.Popen(['ffmpeg', ...
I was wondering because I ran into some issues when I started either of the commands above and then tried to stop execution midway. Both of them cancel on KeyboardInterrupt, but I noticed that after that the runtime needs a factory reset because it somehow got stuck. Checking ps aux in the Linux console listed a process [ffmpeg] <defunct> which somehow was still running, or at least blocking some resources, as it seemed.
I then did some research and came across some similar posts asking questions on how to terminate a subprocess correctly (1, 2, 3). Based on those posts I generally came to the conclusion that the subprocess.Popen(..) variant obviously provides more flexibility when it comes to handling the subprocess: defining different stdout procedures, reacting to different return codes, etc. But I'm still unsure what the first command above, using the ! prefix, exactly does under the hood.
Using the first command is much easier and requires way less code to start this process. And assuming I don't need a lot of logic handling the process flow it would be a nice way to execute something like ffmpeg - if I were able to terminate it as expected. Even following the answers from the other posts using the 2nd command never got me to a point where I could terminate the process fully once started (even when using shell=False, process.kill() or process.wait() etc.). This got me frustrated, because restarting and re-initializing the Colab instance itself can take several minutes every time.
So, finally, I'd like to understand in more general terms what the difference is and was hoping that someone could enlighten me. Thanks!
! commands are executed by the notebook (or more specifically by the ipython interpreter), and are not valid Python commands. If the code you are writing needs to work outside of the notebook environment, you cannot use ! commands.
As you correctly note, you are unable to interact with the subprocess you launch via !; so it's also less flexible than an explicit subprocess call, though similar in this regard to subprocess.call
Like the documentation mentions, you should generally avoid the bare subprocess.Popen unless you specifically need the detailed flexibility it offers, at the price of having to duplicate the higher-level functionality which subprocess.run et al. already implement. The code to run a command and wait for it to finish is simply
subprocess.check_call(['ffmpeg', ... ])
with variations for capturing its output (check_output) and the more modern run which can easily replace all three of the legacy high-level calls, albeit with some added verbosity.
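For illustration (the ffmpeg arguments here are invented, and capture_output assumes Python 3.7+):
import subprocess

# wait for the command and raise CalledProcessError on a non-zero exit code
subprocess.check_call(["ffmpeg", "-i", "in.mp4", "out.avi"])

# same idea, but capture stdout instead of letting it print
version = subprocess.check_output(["ffmpeg", "-version"], text=True)

# the modern run() call can do both jobs
result = subprocess.run(["ffmpeg", "-version"], check=True, capture_output=True, text=True)
print(result.stdout)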

Python URL breaks System command

I am trying to open Chromium to an authorization URL with a specific launch parameter. I was not able to find a way to set this launch parameter using the webbrowser library, so I moved on to os.system
browser_cmd = "chromium-browser --password-store=basic " + auth_url
os.system(browser_cmd)
This works up until the "&" in the URL. So chromium opens without bothering me with keyring nonsense, but only opens the URL until the first &. Is there a way of handling the URL and maintaining its integrity?
This is because & is special to the shell. The canonical way to run a subprocess from within Python 3 is:
import subprocess
subprocess.run(['chromium-browser', '--password-store=basic', auth_url],
               check=True)
print('chromium-browser exited successfully')
This is exactly why you shouldn't use os.system for anything non-trivial. As the docs for that function say:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the subprocess documentation for some helpful recipes.
If you follow that link, you'll see that you can write this as:
subprocess.call(['chromium-browser', '--password-store=basic', auth_url])
Because you're passing a list of arguments, rather than trying to put them together into a string that you can smuggle through the shell, you don't have to worry about quoting, or anything else.
By the way, you probably want to use run rather than call here, but for some reason the recipes still haven't been updated in the docs as of 3.7, and I didn't want to add confusion by showing something that doesn't match… Anyway, you should read at least the quick-start "Using" section at the top of the docs.
If you really want to use os.system anyway for some reason, you will need to quote and/or escape the auth_url argument. Assuming you don't care about Windows, the best way to do this is with the shlex module:
browser_cmd = "chromium-browser --password-store=basic " + shlex.quote(auth_url)
os.system(browser_cmd)
If you do care about Windows, you can add posix=False to the quote call. The effects of the posix flag are documented under the shlex constructor. The default True value means it follows POSIX rules as closely as possible, which means it should be able to handle anything that could possibly be handled, as long as your shell is strictly compatible with sh (as, e.g., bash is, but Windows cmd is definitely not, and even tcsh or fish may not be). With False, it uses "compatibility mode". For simple examples like yours, it should work for most shells without fiddling, but if you need to get more complicated, you need to read Improved Compatibility with Shells. (And, for Windows cmd or PowerShell, there's a limit to how far you can push things.)

Subprocess, stderr to DEVNULL but errors are printed

I am working on a French chatbot using Python. For a first text-to-speech attempt, I am using espeak with mbrola. I call it with subprocess:
from subprocess import run, DEVNULL
def speak(text):
    command = ["espeak", "-vmb-fr1", text]
    run(command, stderr=DEVNULL, stdout=DEVNULL)

speak("Bonjour.")
As you see, I'm sending stderr and stdout to /dev/null
When I run the program, it seems to work: espeak is speaking, but I get this:
*** Error in `mbrola': free(): invalid pointer: 0x08e3af18 ***
*** Error in `mbrola': free(): invalid pointer: 0x0988af88 ***
I think it is a C error in mbrola, and I don't think I can fix it. But it works, so I just want to mute the error. How can I do that? Is there a way?
Edit, in response to abarnert :
When I redirect stdout and stderr by the shell (python myscript.py 2>&1 >/dev/null), the message still show up.
distro : Debian 9.3
glibc version : 2.24
Run it with setsid (just add that string in front of the command and arguments). That will stop it from opening /dev/tty to report the malloc errors. It will also prevent terminal signals, including SIGHUP when the terminal is closed, from affecting the process, which may be a good or a bad thing.
Alternatively, set the environment variable LIBC_FATAL_STDERR_ to some nonempty string; searching for that name turns up several similar questions.
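Applied to the snippet from the question, the setsid variant might look like this (a sketch; it assumes the setsid utility from util-linux is on PATH):
from subprocess import run, DEVNULL

def speak(text):
    # setsid detaches espeak from the controlling terminal,
    # so the malloc error report has no /dev/tty to fall back to
    run(["setsid", "espeak", "-vmb-fr1", text],
        stdout=DEVNULL, stderr=DEVNULL)

speak("Bonjour.")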
The root problem is that mbrola/espeak has a serious bug with memory allocation. If you haven't checked for a new version, and reported the bug to them, that's the first thing you should do.
These warnings are emitted by glibc's malloc checker, which is described in the mallopt docs. If heap checking is enabled, every detected error with malloc (and free and related functions) will be printed out to stderr, but if it's disabled, nothing will be done. (Other possibilities are available as well, but that's not relevant here.)
According to the documentation, unless the program explicitly calls mallopt, either setting the environment variable MALLOC_CHECK_ to 0 or not setting it at all should mean no malloc debug output. However, most of the major distros (starting with Debian) have long shipped a glibc that's configured to default to 1 (meaning print the error message) instead of 0. You can still override this by explicitly setting MALLOC_CHECK_=0.
Also, the documentation implies that malloc errors go to stderr unless malloc_printerr is replaced. But again, many distros do replace it with an intentionally harder-to-ignore function that logs to the current process's tty if possible and to stderr if not. This is why it shows up even if you pipe espeak's stderr to /dev/null, and your own program's as well.
So, to hide these errors, you can:
Set the environment variable MALLOC_CHECK_ to 0 in espeak's environment, which will disable the checks.
Prevent espeak from opening a tty, which means the checks will still happen, but the output will have nowhere to go.
Using setsid, a tool that calls setsid at the start of the new process, is one way to do the latter. Whether that's a good idea or not depends on whether you want the process to lead its own process group. You really should read up on what that means and decide what you want, not choose between the options because typing setsid is shorter than typing MALLOC_CHECK_=0.
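A sketch of the first option, passing the variable only to the child process rather than exporting it globally:
import os
from subprocess import run, DEVNULL

def speak(text):
    # inherit our environment, but disable glibc's malloc checking for espeak
    env = dict(os.environ, MALLOC_CHECK_="0")
    run(["espeak", "-vmb-fr1", text],
        stdout=DEVNULL, stderr=DEVNULL, env=env)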
And again, you really should check for a new version first, and report this bug upstream if they haven't fixed it yet.

Advantages of subprocess over os.system

I have recently come across a few posts on Stack Overflow saying that subprocess is much better than os.system; however, I am having difficulty finding the exact advantages.
Some examples of things I have run into:
https://docs.python.org/3/library/os.html#os.system
"The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function."
I have no idea in what ways it is more powerful, though. I know subprocess is easier to use in many ways, but is it actually more powerful in some way?
Another example is:
https://stackoverflow.com/a/89243/3339122
The advantage of subprocess vs system is that it is more flexible (you can get the stdout, stderr, the "real" status code, better error handling, etc...).
That post has 2600+ votes. Again, I could not find any elaboration on what was meant by better error handling or the "real" status code.
Top comment on that post is:
Can't see why you'd use os.system even for quick/dirty/one-time. subprocess seems so much better.
Again, I understand it makes some things slightly easier, but I can hardly understand why, for example:
subprocess.call("netsh interface set interface \"Wi-Fi\" enable", shell=True)
is any better than
os.system("netsh interface set interface \"Wi-Fi\" enabled")
Can anyone explain some reasons it is so much better?
First of all, you are cutting out the middleman; subprocess.call by default avoids spawning a shell that examines your command, and directly spawns the requested process. This is important because, besides the efficiency side of the matter, you don't have much control over the default shell behavior, and it actually typically works against you regarding escaping.
In particular, do not do this:
subprocess.call('netsh interface set interface "Wi-Fi" enable')
since
If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.
Instead, you'll do:
subprocess.call(["netsh", "interface", "set", "interface", "Wi-Fi", "enable"])
Notice that here all the escaping nightmares are gone. subprocess handles escaping (if the OS wants arguments as a single string - such as Windows) or passes the separated arguments straight to the relevant syscall (execvp on UNIX).
Compare this with having to handle the escaping yourself, especially in a cross-platform way (cmd doesn't escape the same way as POSIX sh), and especially with the shell in the middle messing with your stuff (trust me, you don't want to know what an unholy mess it is to provide 100% safe escaping for your command when calling cmd /k).
Also, when using subprocess without the shell in the middle, you are sure you are getting correct return codes. If there's a failure launching the process you get a Python exception; if you get a return code, it's actually the return code of the launched program. With os.system you have no way to know if the return code you get comes from the launched command (which is generally the case if the shell manages to launch it) or is some error from the shell (if it didn't manage to launch it).
Besides argument splitting/escaping and return codes, you have much better control over the launched process. Even with subprocess.call (which is the most basic utility function built on top of the subprocess functionality) you can redirect stdin, stdout and stderr, possibly communicating with the launched process. check_call is similar, and it avoids the risk of ignoring a failure exit code. check_output covers the common use case of check_call plus capturing all the program output into a string variable.
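For instance, sticking with the netsh example from the question (text=True assumes Python 3.7+):
import subprocess

# raises CalledProcessError if netsh exits with a non-zero status
subprocess.check_call(["netsh", "interface", "set", "interface", "Wi-Fi", "enable"])

# capture a command's output into a string instead of letting it go to the console
interfaces = subprocess.check_output(["netsh", "interface", "show", "interface"], text=True)
print(interfaces)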
Once you get past call & friends (which are blocking, just as os.system is), there are way more powerful functionalities - in particular, the Popen object allows you to work with the launched process asynchronously. You can start it, possibly talk with it through the redirected streams, check if it is running from time to time while doing other stuff, wait for it to complete, send signals to it and kill it - all of which goes well beyond the merely synchronous "start process with default stdin/stdout/stderr through the shell and wait for it to finish" that os.system provides.
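A minimal sketch of that asynchronous style (the ping command and the 30-second timeout are arbitrary choices for illustration; Python 3.7+ assumed for text=True):
import subprocess

# start the process without blocking the calling script
# (ping is just a stand-in command; -n is the Windows count flag, use -c on Linux)
proc = subprocess.Popen(["ping", "-n", "4", "example.com"],
                        stdout=subprocess.PIPE, text=True)

# ... do other work here; proc.poll() tells you whether it has finished yet ...

try:
    out, _ = proc.communicate(timeout=30)  # wait for it, but give up after 30 seconds
except subprocess.TimeoutExpired:
    proc.kill()                            # we can also signal or kill it ourselves
    out, _ = proc.communicate()

print("exit code:", proc.returncode)
print(out)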
So, to sum it up, with subprocess:
even at the most basic level (call & friends), you:
avoid escaping problems by passing a Python list of arguments;
avoid the shell messing with your command line;
either you have an exception or the true exit code of the process you launched; no confusion about program/shell exit code;
have the possibility to capture stdout and in general redirect the standard streams;
when you use Popen:
you aren't restricted to a synchronous interface, but you can actually do other stuff while the subprocess runs;
you can control the subprocess (check if it is running, communicate with it, kill it).
Given that subprocess does way more than os.system can do - and in a safer, more flexible (if you need it) way - there's just no reason to use system instead.
There are many reasons, but the main reason is mentioned directly in the docstring:
>>> os.system.__doc__
'Execute the command in a subshell.'
For almost all cases where you need a subprocess, it is undesirable to spawn a subshell. This is unnecessary and wasteful, it adds an extra layer of complexity, and it introduces several new vulnerabilities and failure modes. Using the subprocess module cuts out the middleman.

Running jobs on a cluster submitted via qsub from Python. Does it make sense?

I am in the situation where I am doing some computation in Python, and based on the outcomes I have a list of target files that are candidates to be passed to a 2nd program.
For example, I have 50,000 files which contain ~2000 items each. I want to filter for certain items and call a command line program to do some calculation on some of those.
This Program #2 can be used via the shell command line, but also requires a lengthy set of arguments. For performance reasons I would have to run Program #2 on a cluster.
Right now, I am running Program #2 via
subprocess.call("...", shell=True)
But I'd like to run it via qsub in future.
I don't have much experience with how exactly this could be done in a reasonably efficient manner.
Would it make sense to write temporary 'qsub' files and run them via subprocess() directly from the Python script? Is there a better, maybe more pythonic solution?
Any ideas and suggestions are very welcome!
It makes perfect sense, although I would go for another solution.
As far as I understand, you have programme #1 that determines which of your 50,000 files needs to be computed by programme #2.
Both programme #1 and #2 are written in Python. Excellent choice.
Incidentally, I have a Python module that might come in handy: https://gist.github.com/stefanedwards/8841307
If you are running the same qsub system as I am (no idea what ours is called), you cannot use command line arguments on the submitted scripts. Instead, any options are submitted via the -v option, which puts them into environment variables, e.g.:
[me@local ~]$ python isprime.py 1
1: True
[me@local ~]$ head -n 5 isprime.py
#!/usr/bin/python
### This is a python script ...
import os
os.chdir(os.environ.get('PBS_O_WORKDIR','.'))
[me@local ~]$ qsub -v isprime='1 2 3' isprime.py
123456.cluster.control.com
[me@local ~]$
Here, isprime.py could handle command line arguments using argparse. Then you just need to check whether the script is running as a submitted job, and then retrieve said arguments from the environment variables (os.environ).
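A sketch of how isprime.py might branch between the two modes, keyed on whether the isprime environment variable from the qsub -v example above is set (the argument handling itself is invented):
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("numbers", nargs="*", type=int)

if "isprime" in os.environ:
    # submitted via `qsub -v isprime='1 2 3'`: the arguments arrive in one environment variable
    args = parser.parse_args(os.environ["isprime"].split())
else:
    # run directly from a shell: ordinary command line arguments
    args = parser.parse_args()

for n in args.numbers:
    print(n)  # the actual primality check would go here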
When programme #2 is modified to be run on the cluster, programme #1 can submit jobs by using subprocess.call(['qsub', '-v', 'options=...', 'programme2.py'], shell=False)
Another approach would be to queue all the files in a database (say, an SQLite database). Then you could have programme #1 check all non-processed entries in the database, determine the outcome (run, not run, run with special options).
You now have the opportunity to run programme #2 in parallel on the cluster, which simply checks the database for files to analyse.
Edit: When Programme #2 is an executable
Instead of a python script, we use a bash script that takes environment variables and puts them on a command line for the programme:
#!/bin/bash
# change to the directory the job was submitted from (falls back to '.')
cd "${PBS_O_WORKDIR:-.}"
# put options into context/flags etc.
if [ -n "$option1" ]; then _opt1="--opt1 $option1"; fi
# we can even define our own defaults
_opt2='--no-verbose'
if [ -n "$opt2" ]; then _opt2="-o $opt2"; fi
/path/to/exe $_opt1 $_opt2
If you are going for the database solution, then have a Python script that checks the database for unprocessed files, marks a file as being processed (do these two in a single transaction), gets the options, calls the executable with subprocess, and when done marks the file as done and checks for a new file, etc.
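A rough sketch of such a worker loop, with an invented SQLite schema (a files table with id, path, options and status columns):
import sqlite3
import subprocess

db = sqlite3.connect("jobs.sqlite", isolation_level=None)  # autocommit; transactions are explicit

while True:
    # claim one unprocessed file; BEGIN IMMEDIATE keeps two workers from grabbing the same row
    db.execute("BEGIN IMMEDIATE")
    row = db.execute(
        "SELECT id, path, options FROM files WHERE status = 'todo' LIMIT 1").fetchone()
    if row is None:
        db.execute("COMMIT")
        break
    job_id, path, options = row
    db.execute("UPDATE files SET status = 'running' WHERE id = ?", (job_id,))
    db.execute("COMMIT")

    # run the external program on the claimed file
    subprocess.call(["/path/to/exe"] + options.split() + [path])

    db.execute("UPDATE files SET status = 'done' WHERE id = ?", (job_id,))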
You have obviously built yourself a string cmd containing a command that you could enter in a shell to run the 2nd program. You are currently using subprocess.call(cmd, shell=True) to execute the 2nd program from a Python script (it is then executed in a process on the same machine as the calling script).
I understand that you are asking how to submit a job to a cluster so that this 2nd program is run on the cluster instead of the calling machine. Well, this is pretty easy and the method is independent of Python, so there is no 'pythonic' solution, just an obvious one :-) : replace your current cmd with a command that defers the heavy work to the cluster.
First of all, dig into the documentation of your cluster's qsub command (the underlying batch system might be SGE or LSF, or whatever; you need to get the corresponding docs) and try to find the shell command line that properly submits an example job of yours to the cluster. It might look as simple as qsub ...args... cmd, where cmd here is the content of the original cmd string. I assume that you now have the entire qsub command needed; let's call it qsubcmd (you have to come up with that on your own, we can't help there). Now all you need to do in your original Python script is to call
subprocess.call(qsubcmd, shell=True)
instead of
subprocess.call(cmd, shell=True)
Note that qsub likely only works on very few machines, typically known as your cluster 'head node(s)'. This means that your Python script that wants to submit these jobs should run on this machine (if that is not possible, you need to add an ssh login procedure to the submission process that we don't want to discuss here).
Please also note that, if you have the time, you should look into the shell=True implications of your subprocess usage. If you can circumvent shell=True, this will be the more secure solution. This might however not be an issue in your environment.
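For example, if you already have a qsubcmd string that works in an interactive shell, shlex.split can turn it into the list form so that no shell is involved at all (the qsub options shown are mere placeholders):
import shlex
import subprocess

qsubcmd = "qsub -l walltime=01:00:00 -v options='--opt1 foo' programme2.sh"
subprocess.call(shlex.split(qsubcmd))  # same submission, but without an intermediate shell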

Categories

Resources