Dealing with external processes - python

I've been working on a gui app that needs to manage external processes. Working with external processes leads to a lot of issues that can make a programmer's life difficult, and I feel like maintenance on this app is taking an unacceptably long time. I've been trying to list the things that make working with external processes difficult so that I can come up with ways of mitigating the pain. This kind of turned into a rant, which I thought I'd post here to get some feedback and to provide some guidance to anybody thinking about sailing into these very murky waters. Here's what I've got so far:
1. Output from the child can get mixed up with output from the parent. This can make both outputs misleading and hard to read, since it's hard to tell what came from where, and it gets even harder to figure out what's going on when things are asynchronous. Here's a contrived example:
import textwrap, os, time
from subprocess import Popen

test_path = 'test_file.py'
with open(test_path, 'w') as file:
    file.write(textwrap.dedent('''
        import time
        for i in range(3):
            print 'Hello %i' % i
            time.sleep(1)'''))

# note: passing a single string works on Windows; on POSIX pass a list instead
proc = Popen('python -B "%s"' % test_path)
for i in range(3):
    print 'Hello %i' % i
    time.sleep(1)
proc.wait()  # make sure the child has finished before deleting its script
os.remove(test_path)
Output:
Hello 0
Hello 0
Hello 1
Hello 1
Hello 2
Hello 2
I guess I could have the child process write its output to a file. But it can be annoying to have to open up a file every time I want to see the result of a print statement.
If I have the code for the child process, I could add a label, something like print 'child: Hello %i', but it's annoying to do that for every print, it adds some noise to the output, and of course I can't do it at all if I don't have access to the code.
I could manually manage the process output. But then you open up a huge can of worms with threads and polling and stuff like that.
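For what it's worth, here's a minimal sketch of that third option, reusing the test_file.py script from the example above; the daemon thread and the line-by-line reading are exactly the can of worms I mean (the -u flag keeps the child unbuffered so lines arrive as they're printed):

import threading
from subprocess import Popen, PIPE

def tag_output(proc, label):
    # read the child's stdout line by line and prefix each line with a label
    for line in iter(proc.stdout.readline, ''):
        print '%s: %s' % (label, line.rstrip())

proc = Popen('python -u -B "%s"' % test_path, stdout=PIPE)
reader = threading.Thread(target=tag_output, args=(proc, 'child'))
reader.daemon = True   # don't let the reader thread outlive the parent
reader.start()
proc.wait()
reader.join()   # let the reader drain the last of the output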
A simple solution is to treat processes like synchronous functions, that is, no further code executes until the process completes. In other words, make the process block. But that doesn't work if you're building a gui app. Which brings me to the next problem...
2. Blocking processes cause the gui to become unresponsive.
import textwrap, sys, os
from subprocess import Popen
from PyQt4.QtGui import *
from PyQt4.QtCore import *

test_path = 'test_file.py'
with open(test_path, 'w') as file:
    file.write(textwrap.dedent('''
        import time
        for i in range(3):
            print 'Hello %i' % i
            time.sleep(1)'''))

app = QApplication(sys.argv)
button = QPushButton('Launch process')

def launch_proc():
    # Can't move the window until process completes
    proc = Popen('python -B "%s"' % test_path)
    proc.communicate()

button.connect(button, SIGNAL('clicked()'), launch_proc)
button.show()
app.exec_()
os.remove(test_path)
Qt provides a process wrapper of its own called QProcess, which can help with this. You can connect functions to signals to capture output relatively easily, and this is what I'm currently using. But I'm finding that all these signals behave suspiciously like goto statements and can lead to spaghetti code. I think I can get sort-of-blocking behavior by having the 'finished' signal from QProcess call a function containing all the code that comes after the process call. I think that should work, but I'm still a bit fuzzy on the details...
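For reference, here's roughly what the QProcess version of the example might look like (a sketch, assuming the QApplication and test_file.py from above; the slot bodies are just illustrative):

proc = QProcess()

def on_output():
    # called whenever the child writes to stdout
    print 'child: %s' % str(proc.readAllStandardOutput()).rstrip()

def on_finished(exit_code):
    # put the code that should run "after" the process here
    # to fake the sort-of-blocking behavior described above
    print 'child finished with code %i' % exit_code

proc.connect(proc, SIGNAL('readyReadStandardOutput()'), on_output)
proc.connect(proc, SIGNAL('finished(int)'), on_finished)
proc.start('python', ['-B', test_path])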
3. Stack traces get interrupted when you go from the child process back to the parent process. If a normal function screws up, you get a nice, complete stack trace with filenames and line numbers. If a subprocess screws up, you'll be lucky if you get any output at all, and you end up having to do a lot more detective work every time something goes wrong.
4. Speaking of which, output has a way of disappearing when dealing with external processes. For example, if you run something via the Windows 'cmd' command, the console will pop up, execute the code, and then disappear before you have a chance to see the output; you have to pass the /k flag to make it stick around. Similar issues seem to crop up all the time.
I suppose both problems 3 and 4 have the same root cause: no exception handling. Exception handling is meant to be used with functions; it doesn't work across processes. Maybe there's some way to get something like exception handling for processes? I guess that's what stderr is for? But dealing with two different streams can be annoying in itself. Maybe I should look into this more...
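Here's a sketch of one way to fake it: capture stderr, check the return code in the parent, and raise there. The run_checked helper is made up, not a standard API:

from subprocess import Popen, PIPE

def run_checked(cmd):
    proc = Popen(cmd, stdout=PIPE, stderr=PIPE)
    out, err = proc.communicate()
    if proc.returncode != 0:
        # surface the child's stderr as a parent-side exception
        raise RuntimeError('process failed (%i): %s' % (proc.returncode, err))
    return out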
5. Processes can hang and stick around in the background without you realizing it. So you end up yelling at your computer cuz it's going so slow, until you finally bring up your task manager and see 30 instances of the same process hanging out in the background.
Also, hanging background processes can interfere with other instances of the process in various fun ways, such as causing permission errors by holding a handle to a file or something like that.
It seems like an easy solution to this would be to have the parent process kill the child process on exit if the child process didn't close itself. But if the parent process crashes, cleanup code might not get called and the child can be left hanging.
Also, if the parent waits for the child to complete, and the child is in an infinite loop or something, you can end up with two hanging processes.
This problem can tie in to problem 2 for extra fun, causing your gui to stop responding entirely and force you to kill everything with the task manager.
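As a partial mitigation, here's a sketch using atexit; it only helps on a normal exit, which is exactly the crash caveat above (test_path is from the first example):

import atexit
from subprocess import Popen

children = []

def kill_children():
    for proc in children:
        if proc.poll() is None:   # still running
            proc.terminate()

atexit.register(kill_children)

proc = Popen('python -B "%s"' % test_path)
children.append(proc)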
6. F***ing quotes
Parameters often need to be passed to processes, which is a headache in itself, especially if you're dealing with file paths. Say... 'C:/My Documents/whatever/'. If you don't have quotes, the string will often be split at the space and interpreted as two arguments. If you need nested quotes you can alternate ' and ". But if you need more than two layers of quoting, you have to do some nasty escaping, for example: "cmd /k 'python \'path 1\' \'path 2\''".
A good solution to this problem is to pass parameters as a list rather than as a single string, which subprocess allows. For example:
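A quick sketch of the list form (the paths here are made up); each list element reaches the child as exactly one argument, spaces and all, with no manual quoting:

from subprocess import Popen

proc = Popen(['python', 'C:/My Documents/whatever/script.py', 'path 1', 'path 2'])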
7. Can't easily return data from a subprocess.
You can use stdout, of course. But what if you want to throw a print in there for debugging purposes? That's gonna screw up the parent if it's expecting output formatted a certain way. With functions, you can print one string and return another and everything works just fine.
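One convention that mimics print-versus-return: the child sends its "return value" on stdout and its debug chatter on stderr. A sketch; the child script here is hypothetical:

from subprocess import Popen, PIPE

# child.py (hypothetical) might contain:
#     import sys
#     print >> sys.stderr, 'debug chatter'    # like a print inside a function
#     print 'the actual result'               # like a return value

proc = Popen(['python', 'child.py'], stdout=PIPE, stderr=PIPE)
result, debug = proc.communicate()   # data and debug output arrive separately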
8. Obscure command-line flags and a crappy terminal-based help system.
These are problems I often run into when using OS-level apps. Like the /k flag I mentioned for holding a cmd window open: whose idea was that? Unix apps don't tend to be much friendlier in this regard. Hopefully you can use Google or StackOverflow to find the answer you need. But if not, you've got a lot of boring reading and frustrating trial and error to do.
9. External factors.
This one's kind of fuzzy. But when you leave the relatively sheltered harbor of your own scripts to deal with external processes, you find yourself having to deal with the "outside world" to a much greater extent, and that's a scary place where all sorts of things can go wrong. Just to give a random example: the cwd in which a process is run can modify its behavior.
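At least subprocess lets you pin that particular variable down explicitly (the path is made up):

from subprocess import Popen

# run the child in a known directory instead of inheriting whatever
# cwd the parent happens to have
proc = Popen(['python', 'script.py'], cwd='C:/known/working/dir')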
There are probably other issues, but those are the ones I've written down so far. Any other snags you'd like to add? Any suggestions for dealing with these problems?

Check out the subprocess module. It should help with output separation. I don't see any way around either separate output streams or some kind of output tagging in a single stream.
The hanging process problem is difficult as well. The only solution I have been able to come up with is to put a timer on the external process and kill it if it does not return in the allotted time. Crude and nasty; if anyone else has a good solution, I would love to hear it so I can use it too.
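In case it helps, here's a sketch of that timer approach (the helper name is made up):

import threading
from subprocess import Popen

def run_with_timeout(cmd, timeout):
    proc = Popen(cmd)
    # kill the child if it hasn't returned within the allotted time
    timer = threading.Timer(timeout, proc.kill)
    timer.start()
    try:
        return proc.wait()
    finally:
        timer.cancel()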
One thing you could do to help deal with the problem of completely un-managed shutdown is to keep a directory of pid files. Whenever you kick off an external process, write a file into your pid file directory with a name that is the pid for the process. Erase the pid file when you know the process has exited cleanly. You can use the stuff in the pid directory to help cleanup on crashes or re-starts.
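A sketch of the pid-file idea, with made-up paths; real code would also need to worry about pid reuse and permissions:

import os
from subprocess import Popen

PID_DIR = 'pids'   # made-up location
if not os.path.isdir(PID_DIR):
    os.mkdir(PID_DIR)

def launch(cmd):
    proc = Popen(cmd)
    # record the child as an empty file named after its pid
    open(os.path.join(PID_DIR, str(proc.pid)), 'w').close()
    return proc

def mark_finished(proc):
    # call this once you know the process exited cleanly
    os.remove(os.path.join(PID_DIR, str(proc.pid)))

# on startup, anything still in PID_DIR is a leftover from a crash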
This may not provide any satisfying or useful answers, but maybe it's a start.

Related

python prompt available whilst outputting data

I'm fairly new to python and need help with the following.
Say I have some code which is continually outputting data to the Python console, e.g.:
for x in range(10000): print x
I then want to be able to enter various commands which may affect the output immediately. For example, I enter a number, which causes the loop to start again from that number, or I enter step=2, which causes the step level to change, etc.
Basically, I want the code to run and print in the background, while the prompt is still available.
Is such a thing possible? I'm guessing the output would have to be sent to a new window, but I am unsure how this would work out in practice. I would prefer no GUI at the moment, as I just want to keep things as simple as possible.
As @BlueRhine S says, start a background thread. Here is a link to the standard library threading module docs. I would probably prefer to start a sub-process, though, and use a pipe to communicate between your foreground process and the worker process. Here is a link to those docs.
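To illustrate the thread option, here's a minimal sketch, with a made-up step command; the worker prints in the background while raw_input keeps the prompt available:

import threading, time

params = {'step': 1}

def worker():
    x = 0
    while True:
        print x
        x += params['step']
        time.sleep(0.5)

t = threading.Thread(target=worker)
t.daemon = True   # let the program exit even if the worker is still looping
t.start()

while True:
    cmd = raw_input()              # e.g. type "step=2" and press enter
    if cmd.startswith('step='):
        params['step'] = int(cmd.split('=')[1])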

hijacking terminal stdin from python

Is there a way in python to hijack the terminal stdin? Unix only solutions will do fine.
I'm currently writing a small wrapper around top, as I want to be able to monitor named processes, e.g. all running python instances. Basically I'm calling pgrep to get process ids and then running top using the -p option.
Overall this script has worked satisfactorily for a few years now (well, with the caveat that top -p only accepts 20 pids...). However, I would now like to adjust the script to update the call to top if new processes matching the name pattern are born. This also works relatively nicely, but... any options set interactively in top get lost every time I update the pid list, by natural causes, as I stop and restart top. Therefore I would like to hijack the terminal stdin somehow, to be able to backtrack whatever settings are in effect, so I can set them accordingly after updating the pid list, or even halt updating if necessary (e.g. if top is awaiting more instructions from the user).
Now perhaps what I'm trying to achieve is just silly and there are better ways to do it; if so, I'd highly appreciate enlightenment.
(oh, the tag ps was used as the tag top does not exist and I'm too new here to define new tags; after all, the two utilities are related)
thanks \p
What you are doing sounds like a bit of a hack. I would just write a Python script using psutil that does exactly what you want. Whatever information you are interested in, psutil should give it to you - and more.
Quick and dirty:
import psutil
import time

while True:
    processes = [p for p in psutil.process_iter() if 'python' in p.name()]
    for p in processes:
        # print out whatever information interests you
        print(
            p.pid,
            p.name(),
            p.cpu_percent(),
            p.io_counters().read_bytes,
            p.io_counters().write_bytes
        )
    time.sleep(10)
Link to Documentation: http://pythonhosted.org/psutil/

Python: Spawning another program

I have a Python program from which I spawn a sub-program to process some files without holding up the main program. I'm currently using bash for the sub-program, started with a command and two parameters like this:
result = os.system('sub-program.sh file.txt file.txt &')
That works fine, but I (eventually!) realised that I could use Python for the sub-program, which would be far preferable, so I have converted it. The simplest way of spawning it might be:
result = os.system('python3 sub-program.py file.txt file.txt &')
Some research has shown several more sophisticated alternatives, but I have the impression that the latest and most approved method is this one:
subprocess.Popen(["python3", "-u", "sub-program.py"])
Am I correct in thinking that that is the most appropriate way of doing it? Would anyone recommend a different method and why? Simple would be good as I'm a bit of a Python novice.
If this is the recommended method, I can probably work out what the "-u" does and how to add the parameters for myself.
Optional extras:
Send a message back from the sub-program to the main program.
Make the sub-program quit when the main program does.
Yes, using subprocess is the recommended way to go according to the documentation:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
However, subprocess.Popen may not be what you're looking for. As opposed to os.system, it creates a Popen object corresponding to the subprocess, and you'll have to wait on that object to detect its completion, e.g.:
proc = subprocess.Popen(["python3", "-u", "sub-program.py"])
do_something()
res = proc.wait()
If you want to just run a program and wait for completion you should probably use subprocess.run (or maybe subprocess.call, subprocess.check_call or subprocess.check_output) instead.
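For instance, a quick sketch with subprocess.run (available in Python 3.5+), using the filenames from the question:

import subprocess

result = subprocess.run(["python3", "sub-program.py", "file.txt", "file.txt"])
print(result.returncode)   # 0 if the sub-program succeeded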
Thanks skyking!
With
import subprocess
at the beginning of the main program, this does what I want:
with open('output.txt', 'w') as f:
    subprocess.Popen(['spawned.py', parameter1, parameter2], stdout=f)
The first line opens a file for the output from the sub-program started in the second line. In the second line, the square brackets contain the stuff for the sub-program: the name followed by two parameters. The parameters are available in the sub-program as sys.argv[1] and sys.argv[2]. After that come the subprocess parameters; stdout=f says to send the output to the text file opened above.
Is there any particular reason it has to be another program entirely? Why not just spawn another process which runs one of the functions defined within your script?
I suggest that you read up on multiprocessing. Python has a module just for that: https://docs.python.org/dev/library/multiprocessing.html
Here you can find info on spawning new processes, communicating between them and synchronizing them.
Be warned though that if you want to really speed up your file processing, you'll want to use processes instead of threads (because of Python's GIL, threads won't run CPU-bound work in parallel, which is confusing at first).
Also check out this page: https://pymotw.com/2/multiprocessing/basics.html
It has some code samples that will help you out a lot.
Don't forget this guard in your script:
if __name__ == '__main__':
It is very important ;)
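To make that concrete, here's a minimal sketch of the multiprocessing approach, with a made-up process_file function standing in for whatever your sub-program does:

import multiprocessing

def process_file(path):
    # stand-in for whatever your sub-program does with the file
    print('processing', path)

if __name__ == '__main__':   # the guard mentioned above
    p = multiprocessing.Process(target=process_file, args=('file.txt',))
    p.start()
    # ... the main program carries on here without waiting ...
    p.join()   # wait for the worker before exiting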

Python script to script enter and exit

I am trying to create a python script that, on a click of a button, opens another python script and closes itself, with some return function in the second script to return to the original script. Hope you can help.
Thanks.
Since your question is very vague, here's a somewhat vague answer:
First, think about whether you really need to do this at all. Why can't the first script just import the second script as a module and call some function on it?
But let's assume you've got a good answer for that, and you really do need to "close" and run the other script, where by "close" you mean "make your GUI invisible".
def handle_button_click(button):
    button.parent_window().hide()
    subprocess.call([sys.executable, '/path/to/other/script.py'])
    button.parent_window().show()
This will hide the window, run the other script, then show the window again when the other script is finished. It's generally a very bad idea to do something slow and blocking in the middle of an event handler, but in this case, because we're hiding our whole UI anyway, you can get away with it.
A smarter solution would involve some kind of signal that either the second script sends, or that a watcher thread sends. For example:
def run_other_script_with_gui_hidden(window):
    gui_library.do_on_main_thread(window.hide)
    subprocess.call([sys.executable, '/path/to/other/script.py'])
    gui_library.do_on_main_thread(window.show)

def handle_button_click(button):
    t = threading.Thread(target=run_other_script_with_gui_hidden,
                         args=(button.parent_window(),))
    t.daemon = True
    t.start()
Obviously you have to replace things like button.parent_window(), window.hide(), gui_library.do_on_main_thread, etc. with the appropriate code for your chosen window library.
If you'd prefer to have the first script actually exit, and the second script re-launch it, you can do that, but it's tricky. You don't want to launch the second script as a child process, but as a sibling. Ideally, you want it to just take over your own process. Except that you need to shut down your GUI before doing that, unless your OS will do that automatically (basically, Windows will, Unix will not). Look at the os.exec family, but you'll really need to understand how these things work in Unix to do it right. Unless you want the two scripts to be tightly coupled together, you probably want to pass the second script, on the command line, the exact right arguments to re-launch the first one (basically, pass it your whole sys.argv after any other parameters).
As an alternative, you can use execfile to run the second script within your existing interpreter instance, and then have the second script execfile you back. This has similar, but not identical, issues to the exec solution.

change process state with python

I want to search for a process and show it, emacs for example. I use
p = subprocess.Popen('ps -A | grep emacs', shell=True, stdout=subprocess.PIPE)
to get the process, but then how can I wake it up and show it?
In other words, the question should be: how can Python change the state of a process?
In short, python has a pty module; look for the solution there.
This question is not as simple as it may look.
It is simple to change the running state of a process by delivering the corresponding signals, but it is not simple to manipulate foreground/background properties.
When we talk about manipulating foreground/background processes, we are really talking about 'job control.' In a UNIX environment, job control is achieved by the coordination of several parts, including the kernel, the controlling terminal, the current shell and almost every process invoked in that shell session. To tell a process to come back to the foreground, you have to tell the others to shut up and go to the background simultaneously. See?
Let's come back to your question. There could be two answers to this: one is 'no way' and the other is 'it could be done, but here is how.'
Why two answers?
Generally you cannot have job control unless you program for it. You also cannot use a simple pipe to achieve the coordination model that leads to a job control mechanism; the reason is essential: you cannot deliver signals through a pipe. That's why the first answer is 'no way,' at least not in a simple pipe implementation.
However, if you have enough patience to program terminal I/O, it can still be done, with a lot of labor. The concept is simple: you cheat your slave program (emacs in this example) into believing it is attached to a real terminal with a true keyboard and a solid monitor standing by, and you prepare your master program (the python script) to handle and relay the necessary events from its controlling terminal to the slave's pseudo-terminal.
This scheme is actually adopted by many terminal emulators. You would just need to write another terminal emulator in your case... Wait! Does it always have to take so much effort?
Luckily no.
Your shell manages all of this for you in an interactive scenario. You just tell the shell to 'fg/bg' the task; quite easy in real life. The designated command combination can be found in the shell's manpage; it could look like 'jobs -l | grep emacs' followed by 'fg %1'. Nonetheless, those combined commands cannot be invoked from a program. It's a different story, since a program will start a new shell to interpret its commands, and that new shell cannot control the already-running emacs because it doesn't have the privilege. Type it in with your keyboard and read it out on your monitor; that's an interactive scenario.
In an automation scenario, think twice before you employ a job control design, because most automation scenarios do not require one. You need an editor here and a player there; that's all right, but just don't make them stay in the "background" and pop to the "foreground." They'd better exit when they complete their task.
But if you are unlucky enough to have to program job control into automation procedures, try programming the pseudo-terminal master and slave I/O as well. They amount to a sophisticated IPC mechanism and their details are OS-dependent. This is the standard answer to your question; annoying, I know.
You can get the output generated by this process by reading its stdout descriptor:
out = p.stdout.read()
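One caveat worth adding: if you also pipe stderr, or call p.wait() before reading, a filled pipe buffer can deadlock the parent and child; communicate() avoids this by reading everything before waiting:

out, err = p.communicate()   # err is None here, since stderr wasn't piped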
