confused about python subprocess inside for loop - python

I am trying to automate some big data file processing using Python.
A lot of the processing is chained, i.e. script1 writes a file, which is then processed by script2, then script2's output by script3, and so on.
I am using the subprocess module in a threaded context.
I have one class that creates tuples of chained scripts, e.g.
("scr1.sh", "scr2.sh", "scr3.sh").
Then another class uses a call like
for script in scriptlist:
    subprocess.call(script)
My question is: in this for loop, is each script only called after subprocess.call(script1) returns a successful return code?
Or do all three get called right after one another since I am using subprocess.call? Without using "sleep" or "wait", I want to make sure that the second script only starts after the first one is over.
edit: The pydoc says
"subprocess.call(*popenargs, **kwargs)
Run command with arguments. Wait for command to complete, then return the returncode attribute."
So in the for loop above, does it wait for each return code before iterating to the next script?
I am new to threading. I am attaching the stripped-down code for the class that runs the analysis here; the subprocess.call loop is part of this class.
import subprocess
from threading import Thread
from Queue import Queue

class ThreadedDataProcessor(Thread):
    def __init__(self, in_queue, out_queue):
        # Uses Queue
        Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue

    def run(self):
        while True:
            path = self.in_queue.get()
            if path is None:
                break
            myprocessor = ProcessorScriptCreator(path)
            scrfiles = myprocessor.create_and_return_shell_scripts()
            for index, file in enumerate(scrfiles):
                subprocess.call([file])
                print "CALLED%s%s" % (index, file) * 5
            #report(myfile.describe())
            #report("Done %s" % path)
            self.out_queue.put(path)

in_queue = Queue()

The loop will serially call each script, wait until it completes, and then call the next one regardless of success or failure of the previous call. You probably want to say:
try:
    map(subprocess.check_call, script_list)
except Exception, e:
    pass  # a script failed
A new thread is started with each call to start() (which runs the run() method), and it ends when run() returns. Within one thread, you iterate over the scripts with subprocess.
You should make sure that the sets of calls made in different threads do not impact each other - for example, by trying to read and write the same file from script calls in multiple threads at the same time.
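If you want the chain to stop as soon as one script fails, rather than wrapping the whole map in a single try/except, a minimal per-script sketch (assuming the same script_list of shell scripts as above) could look like this:
import subprocess

for index, script in enumerate(script_list):
    try:
        # check_call waits for the script and raises if it exits non-zero
        subprocess.check_call([script])
    except subprocess.CalledProcessError as e:
        # the chain is broken: this script exited with a non-zero return code
        print("script %s failed with return code %s" % (script, e.returncode))
        break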

Related

Cannot kill a loading animation when using multiprocessing

I'm trying to use multiprocessing to run multiple scripts. At the start, I launch a loading animation, however I am unable to ever kill it. Below is an example...
Animation: foo.py
import sys
import time
import itertools
# Simple loading animation that runs infinitely.
for c in itertools.cycle(['|', '/', '-', '\\']):
    sys.stdout.write('\r' + c)
    sys.stdout.flush()
    time.sleep(0.1)
Useful script: bar.py
from time import sleep
# Stand-in for a script that does something useful.
sleep(5)
Attempt to run them both:
import multiprocessing
from multiprocessing import Process
import subprocess
pjt_dir = "/home/solebay/path/to/project" # Setup paths..
foo_path = pjt_dir + "/foo.py" # ..
bar_path = pjt_dir + "/bar.py" # ..
def run_script(path):                 # Simple function that..
    """Launches python scripts."""    # ..allows me to set a..
    subprocess.run(["python", path])  # ..script as a process.
foo_p = Process(target=run_script, args=(foo_path,)) # Define the processes..
bar_p = Process(target=run_script, args=(bar_path,)) # ..
foo_p.start() # start loading animation
bar_p.start() # start 'useful' script
bar_p.join() # Wait for useful script to finish executing
foo_p.kill() # Kill loading animation
I get no error messages, and (my_venv) solebay#computer:~$ comes up in my terminal, but the loading animation persists (clipping over my name and environment). How can I kill it?
I've run into a similar situation before where I couldn't terminate the program using ctrl + c. The issue is (more or less) solved by using daemonic processes/threads (see multiprocessing doc). To do this, you simply change
foo_p = Process(target=run_script, args=(foo_path,))
to
foo_p = Process(target=run_script, args=(foo_path,), daemon=True)
and similarly for any other child processes that you would like to create.
That being said, I am not exactly sure whether this is the correct way to remedy the issue of not being able to terminate the multiprocessing program, or whether it just happens to help. I would suggest this thread, which goes into the discussion of daemon threads in more detail. Essentially, from my understanding, daemon threads are terminated automatically whenever their parent process is terminated, regardless of whether they are finished or not. Meanwhile, if a thread is not daemonic, you somehow need to wait for the child processes to finish before you can fully terminate the program.
You are creating too many processes. These two lines:
foo_p = Process(target=run_script, args=(foo_path,)) # Define the processes..
bar_p = Process(target=run_script, args=(bar_path,)) # ..
create two new processes. Let's call them "A" and "B". Each process consists of this function:
def run_script(path):                 # Simple function that..
    """Launches python scripts."""    # ..allows me to set a..
    subprocess.run(["python", path])  # ..script as a process.
which then creates another subprocess. Let's call those two subprocesses "C" and "D". In all, you have created 4 extra processes instead of just the 2 that you need. It is actually process "C" that is producing the output on the terminal. This line:
bar_p.join()
waits for "B" to terminate, which implies that "D" has terminated. But this line:
foo_p.kill()
kills process "A" but orphans process "C". So the output to the terminal continues forever.
This is well documented - see the description of multiprocessing.Process.terminate, which says:
"Note that descendant processes of the process will not be terminated – they will simply become orphaned."
The following program works as you intended, exiting gracefully from the second process after the first one has finished. (I renamed "foo.py" to useless.py and "bar.py" to useful.py, and made small changes so I could run it on my computer.)
import subprocess
import os

def run_script(name):
    s = os.path.join(r"c:\pyproj310\so", name)
    return subprocess.Popen(["py", s])

if __name__ == "__main__":
    useless_p = run_script("useless.py")
    useful_p = run_script("useful.py")
    useful_p.wait()   # Wait for useful script to finish executing
    useless_p.kill()  # Kill loading animation
You can't use subprocess.run() to launch the new processes, since that function blocks the calling script until the process completes, so I used Popen instead. I also placed the running code under an if __name__ == "__main__" guard, which is good practice (and may be necessary on Windows).
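For contrast, a minimal sketch of the blocking difference described above, assuming the renamed scripts sit in the current directory:
import subprocess

# subprocess.run() blocks until the command finishes ...
subprocess.run(["python", "useful.py"])         # returns only after useful.py exits

# ... while Popen() returns immediately with a handle to the running process.
p = subprocess.Popen(["python", "useless.py"])  # the animation keeps running
p.kill()                                        # stop it whenever we are done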

Threading not working properly when function is in another module

I'm currently using a daemon thread in Python to constantly detect parts of the screen in the background while other more important functions are running. testImToStr is in the same file as the rest of the code I am using.
def testImToStr():
    while True:
        pospro.imagetoString();

doImageToString = threading.Thread(target=testImToStr, args=(), daemon=True)
doImageToString.start()
while True:
    #other stuff i was too lazy to copy over
This version is working as it does both the image processing and the other stuff in the while loop.
However, when the target function is in another module:
doImageToString = threading.Thread(target=pospro.loopedImToStr, args=(), daemon=True)
doImageToString.start()
while True:
    #other stuff i was too lazy to copy over
Other module:
def loopedImToStr():
    while True:
        imagetoString()

def imagetoString():
    #stuff here
It only loops the target thread and does not run the while loop in the file that originally made the thread. How is it that both loops run when the thread target is in the same file as the loop, but only the thread runs when they are in different files?
I think your problem comes from the most common mistake - target has to be the function's name without () - a so-called callback:
Thread(target=pospro.loopedImToStr, daemon=True)
and later Thread.start() will use () to run it.
In your code you run testImToStr() at once, like
result = testImToStr()
doImageToString = threading.Thread(target=result, ...)
so testImToStr() blocks all the code and it can't run the other loop.
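To make the callback point concrete, here is a minimal illustrative sketch (the function name is made up) of the difference between passing a function object and calling it:
import threading
import time

def worker():
    while True:
        time.sleep(1)

# Correct: pass the function object; the thread calls it for you.
t = threading.Thread(target=worker, daemon=True)
t.start()

# Wrong: worker() runs immediately in the main thread and never returns,
# so the Thread(...) line below would never even be reached.
# t = threading.Thread(target=worker(), daemon=True)

print("main thread is still free to do other work")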

Stop a long-running action in web2py with multiprocessing

I have a web2py application that basically serves as a browser interface for a Python script. This script usually returns pretty quickly, but can occasionally take a long time. I want to provide a way for the user to stop the script's execution if it takes too long.
I am currently calling the function like this:
def myView():  # this function is called from ajax
    session.model = myFunc()  # myFunc is from a module which I have complete control over
    return dict(model=session.model)
myFunc, when called with certain options, uses multiprocessing but still ends up taking a long time. I need some way to terminate the function, or at the very least the thread's children.
The first thing I tried was to run myFunc in a new process and roll my own simple event system to kill it:
# in the controller
def myView():
    p_conn, c_conn = multiprocessing.Pipe()
    events = multiprocessing.Manager().dict()
    proc = multiprocessing.Process(target=_fit, args=(options, events, c_conn))
    proc.start()
    sleep(0.01)
    session.events = events
    proc.join()
    session.model = p_conn.recv()
    return dict(model=session.model)

def _fit(options, events, pipe):
    pipe.send(fitting.logistic_fit(options=options, events=events))
    pipe.close()

def stop():
    try:
        session.events['kill']()
    except SystemExit:
        pass  # because it raises that error intentionally
    return dict()

# in the module
def kill():
    print multiprocessing.active_children()
    for p in multiprocessing.active_children():
        p.terminate()
    raise SystemExit

def myFunc(options, events):
    events['kill'] = kill
I ran into a few major problems with this:
1. The session in stop() wasn't always the same as the session in myView(), so session.events was None.
2. Even when the session was the same, kill() wasn't properly killing the children.
3. The long-running function would hang the web2py thread, so stop() wasn't even processed until the function finished.
I considered not calling join() and using AJAX to pick up the result of the function at a later time, but I wasn't able to save the process object in the session for later use. The pipe seemed to be picklable, but then I had the problem of not being able to access the same session from another view.
How can I implement this feature?
For long running tasks, you are better off queuing them via the built-in scheduler. If you want to allow the user to manually stop a task that is taking too long, you can use the scheduler.stop_task(ref) method (where ref is the task id or uuid). Alternatively, when you queue a task, you can specify a timeout, so it will automatically stop if not completed within the timeout period.
You can do simple Ajax polling to notify the client when the task has completed (or implement something more sophisticated with websockets or SSE).
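A rough sketch of that approach, assuming a standard web2py Scheduler setup; the controller function names, the timeout value, and the pvars below are illustrative, so check the web2py scheduler docs for the exact signatures:
# in a model file: set up the scheduler (myFunc must be visible to the workers,
# e.g. defined in a model or in a module imported by the models)
from gluon.scheduler import Scheduler
scheduler = Scheduler(db)

# in the controller
def start_fit():
    # queue myFunc instead of calling it inline; timeout (in seconds) stops it
    # automatically if it runs too long
    task = scheduler.queue_task(myFunc,
                                pvars=dict(options=request.vars.options),
                                timeout=300)
    session.task_id = task.id
    return dict(task_id=task.id)

def stop_fit():
    # called from ajax when the user clicks "cancel"
    scheduler.stop_task(session.task_id)
    return dict()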

Python time.sleep lock process

I want to create a multi-process app. Here is a sample:
import threading
import time
from logs import LOG

def start_first():
    LOG.log("First thread has started")
    time.sleep(1000)

def start_second():
    LOG.log("second thread has started")

if __name__ == '__main__':
    ### call birthday daemon
    first_thread = threading.Thread(target=start_first())
    ### call billing daemon
    second_thread = threading.Thread(target=start_second())
    ### starting all daemons
    first_thread.start()
    second_thread.start()
In this code the second thread does not work. I guess that after calling the sleep function inside first_thread the main process goes to sleep. I found this post, but there sleep was used with a class. When I ran that answer I got "Process finished with exit code 0" as a result. Could anybody explain to me where I made a mistake?
I am using Python 3.* on Windows.
When creating your threads you are actually invoking the functions while trying to set the target for the Thread, instead of passing the functions to it. This means that when you try to create first_thread you are actually calling start_first, which includes the very long sleep. I imagine you then get frustrated that you don't see the output from the second thread and kill it, right?
Remove the parens from your target= statements and you will get what you want:
first_thread = threading.Thread(target=start_first)
second_thread = threading.Thread(target=start_second)
first_thread.start()
second_thread.start()
will do what you are trying to do.

Python subprocess: callback when cmd exits

I'm currently launching a programme using subprocess.Popen(cmd, shell=True).
I'm fairly new to Python, but it 'feels' like there ought to be some API that lets me do something similar to:
subprocess.Popen(cmd, shell=True, postexec_fn=function_to_call_on_exit)
I am doing this so that function_to_call_on_exit can do something based on knowing that the cmd has exited (for example, keeping count of the number of external processes currently running).
I assume that I could fairly trivially wrap subprocess in a class that combines threading with the Popen.wait() method, but as I've not done threading in Python yet and it seems like this might be common enough for an API to exist, I thought I'd try and find one first.
Thanks in advance :)
You're right - there is no nice API for this. You're also right on your second point - it's trivially easy to design a function that does this for you using threading.
import threading
import subprocess

def popen_and_call(on_exit, popen_args):
    """
    Runs the given args in a subprocess.Popen, and then calls the function
    on_exit when the subprocess completes.
    on_exit is a callable object, and popen_args is a list/tuple of args that
    you would give to subprocess.Popen.
    """
    def run_in_thread(on_exit, popen_args):
        proc = subprocess.Popen(*popen_args)
        proc.wait()
        on_exit()
        return

    thread = threading.Thread(target=run_in_thread, args=(on_exit, popen_args))
    thread.start()
    # returns immediately after the thread starts
    return thread
Threading is pretty easy in Python, but note that if on_exit() is computationally expensive, you'll want to put it in a separate process instead, using multiprocessing (so that the GIL doesn't slow your program down). It's actually very simple - you can basically just replace all calls to threading.Thread with multiprocessing.Process, since they follow (almost) the same API.
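For reference, a minimal sketch of that swap - note that on_exit and popen_args must then be picklable (they are sent to the child), and the callback runs in the child process rather than in the parent:
import multiprocessing
import subprocess

def _wait_then_call(on_exit, popen_args):
    # Runs in the child process: start the command, wait for it, then fire the callback.
    # Kept at module level so it stays picklable under the 'spawn' start method.
    proc = subprocess.Popen(*popen_args)
    proc.wait()
    on_exit()

def popen_and_call_mp(on_exit, popen_args):
    # Same interface as popen_and_call above, but the waiting happens in a
    # separate process instead of a thread.
    proc = multiprocessing.Process(target=_wait_then_call, args=(on_exit, popen_args))
    proc.start()
    return proc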
There is a concurrent.futures module since Python 3.2 (available via pip install futures for older Python < 3.2):
pool = Pool(max_workers=1)
f = pool.submit(subprocess.call, "sleep 2; echo done", shell=True)
f.add_done_callback(callback)
The callback will be called in the same process that called f.add_done_callback().
Full program
import logging
import subprocess
# to install run `pip install futures` on Python <3.2
from concurrent.futures import ThreadPoolExecutor as Pool

info = logging.getLogger(__name__).info

def callback(future):
    if future.exception() is not None:
        info("got exception: %s" % future.exception())
    else:
        info("process returned %d" % future.result())

def main():
    logging.basicConfig(
        level=logging.INFO,
        format=("%(relativeCreated)04d %(process)05d %(threadName)-10s "
                "%(levelname)-5s %(msg)s"))

    # wait for the process completion asynchronously
    info("begin waiting")
    pool = Pool(max_workers=1)
    f = pool.submit(subprocess.call, "sleep 2; echo done", shell=True)
    f.add_done_callback(callback)
    pool.shutdown(wait=False)  # no .submit() calls after that point
    info("continue waiting asynchronously")

if __name__ == "__main__":
    main()
Output
$ python . && python3 .
0013 05382 MainThread INFO begin waiting
0021 05382 MainThread INFO continue waiting asynchronously
done
2025 05382 Thread-1 INFO process returned 0
0007 05402 MainThread INFO begin waiting
0014 05402 MainThread INFO continue waiting asynchronously
done
2018 05402 Thread-1 INFO process returned 0
I modified Daniel G's answer to simply pass the subprocess.Popen args and kwargs as themselves instead of as a separate tuple/list, since I wanted to use keyword arguments with subprocess.Popen.
In my case I had a method postExec() that I wanted to run after subprocess.Popen('exe', cwd=WORKING_DIR)
With the code below, it simply becomes popenAndCall(postExec, 'exe', cwd=WORKING_DIR)
import threading
import subprocess

def popenAndCall(onExit, *popenArgs, **popenKWArgs):
    """
    Runs a subprocess.Popen, and then calls the function onExit when the
    subprocess completes.
    Use it exactly the way you'd normally use subprocess.Popen, except include a
    callable to execute as the first argument. onExit is a callable object, and
    *popenArgs and **popenKWArgs are simply passed up to subprocess.Popen.
    """
    def runInThread(onExit, popenArgs, popenKWArgs):
        proc = subprocess.Popen(*popenArgs, **popenKWArgs)
        proc.wait()
        onExit()
        return

    thread = threading.Thread(target=runInThread,
                              args=(onExit, popenArgs, popenKWArgs))
    thread.start()
    return thread  # returns immediately after the thread starts
I had the same problem, and solved it using multiprocessing.Pool. There are two hacky tricks involved:
1. make the size of the pool 1
2. pass iterable arguments within an iterable of length 1
The result is one function executed with a callback on completion.
def sub(arg):
    print arg      # prints [1,2,3,4,5]
    return "hello"

def cb(arg):
    print arg      # prints "hello"

pool = multiprocessing.Pool(1)
rval = pool.map_async(sub, ([[1,2,3,4,5]]), callback=cb)
(do stuff)
pool.close()
In my case, I wanted invocation to be non-blocking as well. Works beautifully
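For what it's worth, a slightly less hacky sketch of the same idea uses pool.apply_async, which accepts a callback directly and avoids the length-1 iterable trick (same illustrative sub and cb as above):
import multiprocessing

def sub(arg):
    print(arg)       # prints [1, 2, 3, 4, 5]
    return "hello"

def cb(arg):
    print(arg)       # prints "hello"

if __name__ == "__main__":
    pool = multiprocessing.Pool(1)
    pool.apply_async(sub, ([1, 2, 3, 4, 5],), callback=cb)
    # ... do other work here while sub runs ...
    pool.close()
    pool.join()      # make sure the callback has a chance to fire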
I was inspired by Daniel G's answer and implemented a very simple use case - in my work I often need to make repeated calls to the same (external) process with different arguments. I had hacked a way to determine when each specific call was done, but now I have a much cleaner way to issue callbacks.
I like this implementation because it is very simple, yet it allows me to issue asynchronous calls to multiple processors (notice I use multiprocessing instead of threading) and receive notification upon completion.
I tested the sample program and it works great. Please edit at will and provide feedback.
import multiprocessing
import subprocess

class Process(object):
    """This class spawns a subprocess asynchronously and calls a
    `callback` upon completion; it is not meant to be instantiated
    directly (derived classes are called instead)"""
    def __call__(self, *args):
        # store the arguments for later retrieval
        self.args = args
        # define the target function to be called by
        # `multiprocessing.Process`
        def target():
            cmd = [self.command] + [str(arg) for arg in self.args]
            process = subprocess.Popen(cmd)
            # the `multiprocessing.Process` process will wait until
            # the call to the `subprocess.Popen` object is completed
            process.wait()
            # upon completion, call `callback`
            return self.callback()
        mp_process = multiprocessing.Process(target=target)
        # this call issues the call to `target`, but returns immediately
        mp_process.start()
        return mp_process

if __name__ == "__main__":

    def squeal(who):
        """this serves as the callback function; its argument is the
        instance of a subclass of Process making the call"""
        print "finished %s calling %s with arguments %s" % (
            who.__class__.__name__, who.command, who.args)

    class Sleeper(Process):
        """Sample implementation of an asynchronous process - define
        the command name (available in the system path) and a callback
        function (previously defined)"""
        command = "./sleeper"
        callback = squeal

    # create an instance of Sleeper - this is the Process object that
    # can be called repeatedly in an asynchronous manner
    sleeper_run = Sleeper()

    # spawn three sleeper runs with different arguments
    sleeper_run(5)
    sleeper_run(2)
    sleeper_run(1)

    # the user should see the following message immediately (even
    # though the Sleeper calls are not done yet)
    print "program continued"
Sample output:
program continued
finished Sleeper calling ./sleeper with arguments (1,)
finished Sleeper calling ./sleeper with arguments (2,)
finished Sleeper calling ./sleeper with arguments (5,)
Below is the source code of sleeper.c - my sample "time consuming" external process
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    unsigned int t = atoi(argv[1]);
    sleep(t);
    return EXIT_SUCCESS;
}
compile as:
gcc -o sleeper sleeper.c
There is also ProcessPoolExecutor in concurrent.futures since Python 3.2 (https://docs.python.org/3/library/concurrent.futures.html). The usage is the same as with the ThreadPoolExecutor mentioned above, with the on-exit callback being attached via the future's add_done_callback().
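A minimal sketch of that variant, reusing the shell command from the ThreadPoolExecutor example above (assumes a POSIX shell for sleep):
import subprocess
from concurrent.futures import ProcessPoolExecutor

def callback(future):
    print("process returned %d" % future.result())

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=1) as pool:
        f = pool.submit(subprocess.call, "sleep 2; echo done", shell=True)
        f.add_done_callback(callback)
        print("continue doing other work while the command runs")
    # leaving the `with` block waits for the worker, so the callback has fired by here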
Thanks guys, for pointing me in the right direction. I made a class from what I found here and added a stop function to kill the process:
from subprocess import Popen
from threading import Thread

class popenplus():
    def __init__(self, onExit, *popenArgs, **popenKWArgs):
        self.proc = None  # set before the thread starts, so stop() is always safe to call
        thread = Thread(target=self.runInThread, args=(onExit, popenArgs, popenKWArgs))
        thread.start()

    def runInThread(self, onExit, popenArgs, popenKWArgs):
        self.proc = Popen(*popenArgs, **popenKWArgs)
        self.proc.wait()
        self.proc = None
        onExit()

    def stop(self):
        if self.proc:
            self.proc.kill()
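A hypothetical usage sketch (the command and callback here are only illustrative):
def on_exit():
    print("command finished")

p = popenplus(on_exit, ["sleep", "60"])
# ... later, if the command is taking too long:
p.stop()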
On POSIX systems, the parent process receives a SIGCHLD signal when a child process exits. To run a callback when a subprocess command exits, handle the SIGCHLD signal in the parent. Something like this:
import signal
import subprocess

def sigchld_handler(signum, frame):
    # This is run when the child exits.
    # Do something here ...
    pass

signal.signal(signal.SIGCHLD, sigchld_handler)
process = subprocess.Popen('mycmd', shell=True)
Note that this will not work on Windows.
AFAIK there is no such API, at least not in the subprocess module. You need to roll your own, possibly using threads.
