Running a Python Subprocess

All,
I have read several threads on how to run subprocesses in python and none of them seem to help me. It's probably because I don't know how to use them properly. I have several methods that I would like to run at the same time rather than in sequence and I thought that the subprocess module would do this for me.
def services():
    services = [method1(),
                method2(),
                method3(),
                method4(),
                method5()]
    return services

def runAll():
    import subprocess
    for i in services():
        proc = subprocess.call(i, shell=True)
The problem with this approach is that method1() starts and method2() doesn't start until method1() finishes. I have tried several approaches, including using subprocess.Popen() in my services method, with no luck. Can anyone lend me a hand on how to get methods 1-5 running at the same time?
Thanks,
Adam

According to the Python documentation, subprocess.call() waits for the command to complete. You should use subprocess.Popen objects directly, which will give you the flexibility you need.
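For example, a minimal sketch along those lines; the command strings here are just placeholders for whatever your services actually run:

import subprocess

# Placeholder commands; substitute whatever your services() list should run.
commands = ["./service1", "./service2", "./service3"]

# Popen returns immediately, so all commands run concurrently.
procs = [subprocess.Popen(cmd, shell=True) for cmd in commands]

# Optionally wait for all of them to finish.
for proc in procs:
    proc.wait()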

In Python 3.2+ the concurrent.futures module makes this sort of thing very easy.
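For example, a rough sketch using concurrent.futures; method1 and method2 here are stand-in functions, and you can swap ThreadPoolExecutor for ProcessPoolExecutor if the work is CPU-bound:

from concurrent.futures import ThreadPoolExecutor

def method1():   # stand-in for a real task
    return "one done"

def method2():   # stand-in for a real task
    return "two done"

with ThreadPoolExecutor(max_workers=5) as executor:
    # submit() schedules each callable and returns a Future immediately.
    futures = [executor.submit(m) for m in (method1, method2)]
    # result() blocks until the corresponding call has finished.
    results = [f.result() for f in futures]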

Python threads are better suited to what you are looking for: http://docs.python.org/library/threading.html, or even the multiprocessing module: http://docs.python.org/library/multiprocessing.html#module-multiprocessing.
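As a minimal illustration of the threading approach, with placeholder functions standing in for methods 1-5:

import threading

def method1():   # placeholder
    print("method1 running")

def method2():   # placeholder
    print("method2 running")

threads = [threading.Thread(target=m) for m in (method1, method2)]
for t in threads:
    t.start()    # all of them run concurrently
for t in threads:
    t.join()     # wait until they have all finished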

By saying method1(), you're calling the function and waiting for it to return. (It's a function, not a method.)
If you just want to run a bunch of heavy-duty functions in parallel and collect their results, you can use joblib:
from joblib import Parallel, delayed
functions = [fn1, fn2, fn3, fn4]
results = Parallel(n_jobs=4)(delayed(f)() for f in functions)

subprocess.call() blocks until the process completes.
multiprocessing sounds more appropriate for what you are doing.
for example:
from multiprocessing import Process

def f1():
    while True:
        print 'foo'

def f2():
    while True:
        print 'bar'

def f3():
    while True:
        print 'baz'

if __name__ == '__main__':
    for func in (f1, f2, f3):
        Process(target=func).start()

You need to use & to execute them asynchronously. Here is an example:
subprocess.call("./foo1&", shell=True)
subprocess.call("./foo2&", shell=True)
This is just like the ordinary unix shell.
EDIT: Though there are multiple, much better ways to do this. See the other answers for some examples.

subprocess.call() does not run the processes asynchronously; it waits for each one. What you are trying to achieve can be done with the threading or multiprocessing module.

I had a similar problem recently, and solved it like this:
from multiprocessing import Pool

def parallelfuncs(funcs, args, results, bad_value=None):
    p = Pool()
    waiters = []
    for f in funcs:
        waiters.append(p.apply_async(f, args, callback=results.append))
    p.close()
    for w in waiters:
        if w.get()[0] == bad_value:
            p.terminate()
    return p
The nice thing is that the functions in funcs are executed in parallel on args (kind of the reverse of map), and the results are returned. The Pool from multiprocessing uses all processors and handles job execution.
w.get() blocks, if that wasn't clear.
For your use case, you would call
results = []
parallelfuncs(services, args, results).join()
print results

Related

How do I run two looping functions parallel to each other? [duplicate]

Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process

def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate but they offer true concurrency.
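For reference, a minimal sketch of the Queue suggestion above; the loop bodies are placeholders:

from multiprocessing import Process, Queue

def loop_a(q):
    for i in range(5):       # placeholder work
        q.put(('a', i))      # send data back to the parent

def loop_b(q):
    for i in range(5):
        q.put(('b', i))

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=loop_a, args=(q,)),
             Process(target=loop_b, args=(q,))]
    for p in procs:
        p.start()
    for _ in range(10):      # we expect 5 items from each child
        print(q.get())
    for p in procs:
        p.join()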
There are many possible options for what you wanted:
use loop
As many people have pointed out, this is the simplest way.
for i in xrange(10000):
    # use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB can only run after taskA finishes (or the other way around); they can't be running simultaneously.
multiprocess
Another thought would be: run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process

p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
merits: tasks can run simultaneously in the background; you can control them (terminate, join, etc.); tasks can exchange data and can be synchronized if they compete for the same resources.
drawbacks: too heavy! The OS will frequently switch between them, and each process has its own data space even when the data is redundant. If you have a lot of tasks (say 100 or more), it's not what you want.
threading
threading is like multiprocessing, just lighter-weight. Check out this post. Their usage is quite similar:
import threading

p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading. A rough sketch follows after the merits/drawbacks below; please google for more detail if you're interested.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
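If you do try coroutines, a rough gevent sketch might look like the following; taskA and taskB are placeholders, and note that greenlets only switch when a task yields (e.g. on I/O or an explicit sleep):

import gevent

def taskA():
    for i in range(3):
        print('A', i)
        gevent.sleep(0)   # yield control to the other greenlet

def taskB():
    for i in range(3):
        print('B', i)
        gevent.sleep(0)

# spawn() starts each greenlet; joinall() waits for both to finish.
gevent.joinall([gevent.spawn(taskA), gevent.spawn(taskB)])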
Why do you want to run the two processes at the same time? Is it because you think they will go faster (there is a good chance that they won't)? Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the python threading module. However threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU intensive this will make better use of multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP etc., but without knowing more about Task A and Task B and why they need to be run at the same time it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()
threadB.start()

# Do work independent of loopA and loopB
threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about a single loop: for i in range(10000): Do Task A, Do Task B? Without more information I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly for executing multiple processes at once within a Python Script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...)  # Arguments to be passed into desired function.

multiple_results = []
for i in range(processes):  # Executes each worker w/ option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,) + args))

results = np.array([res.get() for res in multiple_results])  # Retrieves results after
                                                             # every worker is finished!
for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example, that's how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!

Using os.popen() or subprocess to execute functions

I'm currently studying the threading, multiprocessing, and os documentation to improve the structure of my program. To be honest, though, some of it is over my head: I can't get it to work in my program; it either crashes due to a stack overflow, gives the wrong output, or gives no output at all. So here's my problem.
Let's say I have a list of names that gets passed into a function, and that function is what I want to run in another console (with a Python interpreter, of course) and have it run there in a full cycle.
Let's say I have this:
def execute_function(name, arg1, arg2):
    while True:
        pass  # do something

for name in names:
    execute_function(name, arg1, arg2)
What should I use to programmatically open another console from Python and run this function there in its while True loop: subprocess, multiprocessing, threading, or perhaps os.popen()?
And how should I execute it in this example? The multiprocessing Pool and Process always crash for me, so I don't think they are the right solution. From what I've searched so far, I haven't seen examples of threading or subprocess being used with functions. Is there a workaround for this, or perhaps a simple solution I might have missed? Thanks.
Edit:
A similar code:
if symbols is not None and symbols1 is not None:
    symbols = [x for x in symbols if x is not None]
    symbols1 = [x for x in symbols1 if x is not None]
    if symbol != None and symbol in symbols and symbol in symbols1:
        with Pool(len(exchanges)) as p:
            p.map(bot_algorithm, (a, b, symbol, expent, amount))
http://prntscr.com/j4viat - what the error looks like
subprocess is usually preferred over os.system().
The docs contain a number of examples - in your case, your execute_function() function might want to use subprocess.check_output() if you want to see the results of the command.
e.g.:
import subprocess

def execute_function(name, arg1, arg2):
    output = subprocess.check_output(["echo", name])
    print(output)
All this does though is launch a new process, and waits for it to return. While that's technically two processes, it's not exactly what you'd call multi-threading.
To run multiple subprocesses concurrently, you might do something like this with the multiprocessing library:
import subprocess
from multiprocessing.dummy import Pool

def execute_function(name, arg1=None, arg2=None):  # defaults so pool.map can call it with just a name
    return subprocess.check_output(["echo", name])

names = ["alex", "bob", "chrissy"]

pool = Pool()
map_results = pool.map(execute_function, names)
This maps an iterable (the list of names) onto a function (execute_function) and runs them all at once; well, as many at a time as your machine has cores. map_results is a list of the return values from execute_function.

Run methods in parallel

I have a program that, among other things, parses some big files, and I would like to have this done in parallel to save time.
The code flow looks something like this:
if __name__ == '__main__':
    obj = program_object()
    obj.do_so_some_stuff(argv)
    obj.field1 = parse_file_one(f1)
    obj.field2 = parse_file_two(f2)
    obj.do_some_more_stuff()
I tried running the file parsing methods in separate processes like this:
p_1 = multiprocessing.Process(target=parse_file_one, args=(f1,))
p_2 = multiprocessing.Process(target=parse_file_two, args=(f2,))
p_1.start()
p_2.start()
p_1.join()
p_2.join()
There are 2 problems here. One is how to have the separate process modify the field, but more importantly, forking the process duplicates my whole main! I get an exception regarding argv when executing
do_so_some_stuff(argv)
a second time. That really is not what I wanted. It even happened when I ran only one of the Processes.
How could I get just the file parsing methods to run in parallel to each other, and then continue back with main process like before?
Try putting the parsing methods in a separate module.
First, I guess instead of:
obj = program_object()
program_object.do_so_some_stuff(argv)
you mean:
obj = program_object()
obj.do_so_some_stuff(argv)
Second, try using threading like this:
#!/usr/bin/python
import thread

if __name__ == '__main__':
    try:
        thread.start_new_thread(parse_file_one, (f1,))
        thread.start_new_thread(parse_file_two, (f2,))
    except:
        print "Error: unable to start thread"
But, as pointed out by Wooble, depending on the implementation of your parsing functions, this might not be a solution that executes truly in parallel, because of the GIL.
In that case, you should check the Python multiprocessing module that will do true concurrent execution:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine.
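As a rough sketch of how that could look for this question, reusing the asker's own names (program_object, argv, parse_file_one, parse_file_two, f1, f2, so it is not runnable on its own): the __main__ guard keeps child processes from re-running the module's top-level code, and the parent assigns the fields itself from the workers' return values instead of having the children modify obj:

from multiprocessing import Pool

if __name__ == '__main__':
    obj = program_object()
    obj.do_so_some_stuff(argv)

    pool = Pool(processes=2)
    # apply_async returns immediately, so both parsers run in parallel.
    r1 = pool.apply_async(parse_file_one, (f1,))
    r2 = pool.apply_async(parse_file_two, (f2,))
    # get() blocks until each parser returns; the parent then sets the fields.
    obj.field1 = r1.get()
    obj.field2 = r2.get()
    pool.close()
    pool.join()

    obj.do_some_more_stuff()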

basic multiprocessing with python

I have found information on multiprocessing and multithreading in python but I don't understand the basic concepts and all the examples that I found are more difficult than what I'm trying to do.
I have X independent programs that I need to run. I want to launch the first Y programs (where Y is the number of cores of my computer and X>>Y). As soon as one of the independent programs is done, I want the next program to run in the next available core. I thought that this would be straightforward, but I keep getting stuck on it. Any help in solving this problem would be much appreciated.
Edit: Thanks a lot for your answers. I also found another solution using the joblib module that I wanted to share. Suppose that you have a script called 'program.py' that you want to run with different combinations of the input parameters (a0, b0, c0), and you want to use all your cores. This is a solution.
import os
from joblib import Parallel, delayed
from numpy import arange

a0 = arange(0.1, 1.1, 0.1)
b0 = arange(-1.5, -0.4, 0.1)
c0 = arange(1., 5., 0.1)

params = []
for i in range(len(a0)):
    for j in range(len(b0)):
        for k in range(len(c0)):
            params.append((a0[i], b0[j], c0[k]))

def func(parameters):
    s = 'python program.py %g %g %g' % (parameters[0], parameters[1], parameters[2])
    command = os.system(s)
    return command

output = Parallel(n_jobs=-1, verbose=1000)(delayed(func)(i) for i in params)
You want to use multiprocessing.Pool, which represents a "pool" of workers (default one per core, though you can specify another number) that do your jobs. You then submit jobs to the pool, and the workers handle them as they become available. The easiest function to use is Pool.map, which runs a given function for each of the arguments in the passed sequence, and returns the result for each argument. If you don't need return values, you could also use apply_async in a loop.
import multiprocessing

def do_work(arg):
    pass  # do whatever you actually want to do

def run_battery(args):
    # args should be like [arg1, arg2, ...]
    pool = multiprocessing.Pool()
    ret_vals = pool.map(do_work, args)
    pool.close()
    pool.join()
    return ret_vals
If you're trying to call external programs and not just Python functions, use subprocess. For example, this will call cmd_name with the list of arguments passed, raise an exception if the return code isn't 0, and return the output:
import subprocess

def do_work(subproc_args):
    return subprocess.check_output(['cmd_name'] + list(subproc_args))
Hi, I'm using the QThread object from PyQt.
From what I understood, while your thread is running it can only use its own variables and methods; it cannot change your main object's variables.
So before you run it, be sure to define all the QThread variables you will need,
like this for example:
class Worker(QThread):
    def define(self, phase):
        print 'define'
        self.phase = phase
        self.start()  # will run your thread

    def continueJob(self):
        self.start()

    def run(self):
        self.launchProgramme(self.phase)
        self.phase += 1

    def launchProgramme(self, phase):
        print phase
I'm not well aware of how basic Python threads work, but in PyQt your thread emits a signal to your main object, like this:
class mainObject(QtGui.QMainWindow):
    def __init__(self):
        super(mainObject, self).__init__()
        self.numberProgramme = 4
        self.thread = Worker()
        # create
        self.connect(self.thread, QtCore.SIGNAL("finished()"), self.threadStopped)
        self.connect(self.thread, QtCore.SIGNAL("terminated()"), self.threadStopped)
Connected like this, when thread.run finishes it will call your threadStopped method in your main object, where you can get the values of your thread's variables:
def threadStopped(self):
    value = self.thread.phase
    if value < self.numberProgramme:
        self.thread.continueJob()
After that you just have to launch another thread or not, depending on the value you get.
This is for PyQt threading, of course; with basic Python threads, the way threadStopped gets executed could be different.

simplifying threading in python

I am looking for a way to ease my threaded code.
There are a lot of places in my code where I do something like:
for arg in array:
    t = Thread(target=lambda: myFunction(arg))
    t.start()
i.e. running the same function in threads, each time with different parameters.
This is of course a simplified version of the real code, and usually the code inside the for loop is ~10-20 lines long, that cannot be made simple by using one auxiliary function like myFunction in the example above (had that been the case, I could've just used a thread pool).
Also, this scenario is very, very common in my code, so there are tons of lines which I consider redundant. It would help me a lot if I didn't need to handle all this boilerplate code, but instead be able to do something like:
for arg in array:
    with threaded():
        myFunction(arg)
i.e. somehow threaded() takes every line of code inside it and runs it in a separate thread.
I know that context managers aren't supposed to be used in such situations, that it's probably a bad idea and will require an ugly hack, but nonetheless - can it be done, and how?
How about this:
for arg in array:
    def _thread():
        # code here
        print arg
    t = Thread(target=_thread)
    t.start()
additionally, with decorators, you can sugar it up a little:
def spawn_thread(func):
    t = Thread(target=func)
    t.start()
    return t

for arg in array:
    @spawn_thread
    def _thread():
        # code here
        print arg
Would a thread pool help you here? Many implementations for Python exist, for example this one.
P.S.: I'm still interested to know what your exact use case is.
What you want is a kind of "contextual thread pool".
Take a look at the ThreadPool class in this module, designed to be used in a manner similar to the one you've described. Usage would be something like this:
with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])
Failures in any task given to a ThreadPool will flag an error, and perform the standard error backtrace handling.
I think you're over-complicating it. This is the "pattern" I use:
# util.py
import threading

def start_thread(func, *args):
    thread = threading.Thread(target=func, args=args)
    thread.setDaemon(True)
    thread.start()
    return thread
# in another module
import util
...

for arg in array:
    util.start_thread(myFunction, arg)
I don't see the big deal about having to create myFunction. You could even define the function inline with the function that starts it.
def do_stuff():
    def thread_main(arg):
        print "I'm a new thread with arg=%s" % arg

    for arg in array:
        util.start_thread(thread_main, arg)
If you're creating a large number of threads, a thread pool definitely makes more sense. You can easily make your own with the Queue and threading modules. Basically create a jobs queue, create N worker threads, give each thread a "pointer" to the queue and have them pull jobs from the queue and process them.
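A minimal sketch of such a hand-rolled pool, assuming each job is just a callable plus its arguments (myFunction here is a placeholder):

import threading
from queue import Queue   # the module is named Queue (capital Q) on Python 2

def myFunction(arg):       # placeholder for the per-item work
    print("processing %s" % arg)

def worker(jobs):
    while True:
        func, args = jobs.get()
        try:
            func(*args)
        finally:
            jobs.task_done()

def start_pool(num_workers, jobs):
    for _ in range(num_workers):
        t = threading.Thread(target=worker, args=(jobs,))
        t.daemon = True    # workers die with the main program
        t.start()

jobs = Queue()
start_pool(4, jobs)
for arg in ['a', 'b', 'c', 'd']:
    jobs.put((myFunction, (arg,)))
jobs.join()                # block until every queued job has been processed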
