Using os.popen() or subprocess to execute functions - python

I'm currently studying the threading, multiprocessing, and os documentation to improve the structure of my program. To be honest, some of it is over my head: I can't get it to work in my program; it either crashes with a stack overflow, produces the wrong output, or produces no output at all. So here's my problem.
Let's say I have a list of names that gets passed into a function, and that function is what I want to run in another console (with a Python interpreter, of course) and have it run there for a full cycle.
Let's say I have this:
def execute_function(name, arg1, arg2):
    while True:
        # do something

for name in names:
    execute_function(name, arg1, arg2)
What should I use to open another console programmatically from Python and run this function there in its while True loop: subprocess, multiprocessing, threading, or perhaps os.popen()?
And how would I do it in this example? multiprocessing's Pool and Process always crash for me, so I don't think they are the right solution. From what I've searched so far, I haven't seen examples of threading or subprocess being used with functions. Is there a workaround for this, or perhaps a simple solution I might have missed? Thanks.
Edit:
Some similar code:
if symbols is not None and symbols1 is not None:
    symbols = [x for x in symbols if x is not None]
    symbols1 = [x for x in symbols1 if x is not None]
    if symbol != None and symbol in symbols and symbol in symbols1:
        with Pool(len(exchanges)) as p:
            p.map(bot_algorithm, (a, b, symbol, expent, amount))
http://prntscr.com/j4viat - what the error looks like

subprocess is generally preferred over os.system().
The docs contain a number of examples - in your case, your execute_function() function might want to use subprocess.check_output() if you want to see the results of the command.
e.g.:

import subprocess

def execute_function(name, arg1, arg2):
    output = subprocess.check_output(["echo", name])
    print(output)
All this does, though, is launch a new process and wait for it to return. While that's technically two processes, it's not exactly what you'd call multi-threading.
To run multiple subprocesses concurrently, you might do something like this with the multiprocessing library:
import subprocess
from multiprocessing.dummy import Pool

def execute_function(name):
    return subprocess.check_output(["echo", name])

names = ["alex", "bob", "chrissy"]

pool = Pool()
map_results = pool.map(execute_function, names)
This maps an iterable (the list of names) over a function (execute_function) and runs all the calls at once; well, as many at once as the pool has workers, which defaults to the number of cores on your machine. map_results is a list of the return values from execute_function.
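If your real execute_function also needs arg1 and arg2, as in the question, one way to keep using pool.map is to bind the fixed arguments with functools.partial. A minimal sketch, with placeholder values for the extra arguments:

import subprocess
from functools import partial
from multiprocessing.dummy import Pool

def execute_function(name, arg1, arg2):
    # arg1 and arg2 stand in for whatever extra parameters your real function needs
    return subprocess.check_output(["echo", name])

names = ["alex", "bob", "chrissy"]

with Pool() as pool:
    # partial() fixes arg1 and arg2, so the pool only has to supply each name
    results = pool.map(partial(execute_function, arg1="foo", arg2="bar"), names)

An alternative is pool.starmap(execute_function, [(n, "foo", "bar") for n in names]), which unpacks each tuple into positional arguments.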

Related

multiprocessing in python using pool.map_async

Hi, I don't feel like I have quite understood multiprocessing in Python correctly.
I want to run a function called 'run_worker' (which is simply code that runs and manages a subprocess) 20 times in parallel and wait for all of them to complete. Each run_worker should run on a separate core/thread. I don't mind what order the processes complete in, hence I used async, and I don't have a return value, so I used map.
I thought that I should use:
if __name__ == "__main__":
    num_workers = 20
    param_map = []
    for i in range(num_workers):
        param_map += [experiment_id]

    pool = mp.Pool(processes=num_workers)
    pool.map_async(run_worker, param_map)
    pool.close()
    pool.join()
However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map of the same experiment_id to pass to the worker? This seems like a hack just to control how many run_workers get created. Ideally I would like to run a function with no parameters and no return value over multiple cores.
Note: I am using Windows Server 2019 on AWS.
Edit: added run_worker, which calls a subprocess that writes to a file:
def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
            metadata=dict(hostname=hostname),
        )
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()
It seems that for this simple purpose you would be better off using pool.map instead of pool.map_async. Both run the work in parallel, but pool.map blocks until all operations are finished (see also this question). pool.map_async is meant for situations like this:
result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass

# blocking call to get the result
out = result.get()
Regarding your question about the parameters, the fundamental idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide any method to run multiple functions without parameters.
If you would also share your run_worker function, that might help to get better answers to your question. That might also clear up why you would run a function without any arguments and return values using a map operation in the first place.
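For completeness, a minimal sketch of the blocking variant for this case, assuming run_worker and experiment_id are defined at module level as in the question (the do_one_run wrapper is a hypothetical helper that lets a job with no parameters of its own be driven by map):

import multiprocessing as mp

def do_one_run(_ignored):
    # wrapper so a job with no parameters of its own can still be driven by map()
    run_worker(experiment_id)        # run_worker and experiment_id as defined in the question

if __name__ == "__main__":
    num_workers = 20
    with mp.Pool(processes=num_workers) as pool:
        # map() blocks until every call has finished, so no extra close()/join() bookkeeping is needed
        pool.map(do_one_run, range(num_workers))

On Windows the if __name__ == "__main__" guard matters, because each worker process re-imports the module.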

Running python in python block by block capturing the output

As a pet project, I want to build something similar to a Jupyter notebook. Given an array of strings, each of which is a piece of Python code, I would like to run each piece one by one in a single Python process and then associate blocks of output with each piece of code. I would also like to manage it all from another (parent) Python process.
To make the problem tangible, let's say I have a list of strings, each a piece of Python code. One string may use variables from the preceding piece of code, i.e. they should all be run in a single process. Now I want to run one piece of code, wait until it finishes, capture the output, then run the next piece, and so on.
Unfortunately, googling around only gave me an example where I can run a piece of code using subprocess.Popen('python', stdout=PIPE, ...), but with this approach it only starts executing my command after I close stdin, which effectively closes the whole Python process.
You can use contextlib.redirect_stdout from the standard library to capture the output of exec() calls. With that, your idea of code blocks (as I understand them) is straightforward to implement:
import io
from contextlib import redirect_stdout

class Block:
    def __init__(self, code=''):
        self.code = code
        self.stdout = io.StringIO()

    def run(self):
        with redirect_stdout(self.stdout):
            exec(self.code, globals())  # pass the global variable dict to allow modification

    @property
    def output(self):
        return self.stdout.getvalue()
>>> b1 = Block('a = 42; print(a)')
>>> b2 = Block('print(1/a)')
>>> b1.run()
>>> b2.run()
>>> b1.output
'42\n'
>>> b2.output
'0.023809523809523808\n'
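To tie this back to the question, a minimal sketch of the managing loop, where code_blocks stands in for the list of source strings the question describes:

code_blocks = [
    'a = 42; print(a)',
    'print(1 / a)',   # relies on the variable defined by the previous block
]

blocks = [Block(code) for code in code_blocks]
for block in blocks:
    block.run()                 # executes in this same process, so state carries over
    print(repr(block.output))   # the stdout captured for just this block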

basic multiprocessing with python

I have found information on multiprocessing and multithreading in Python, but I don't understand the basic concepts, and all the examples that I found are more complicated than what I'm trying to do.
I have X independent programs that I need to run. I want to launch the first Y programs (where Y is the number of cores of my computer and X>>Y). As soon as one of the independent programs is done, I want the next program to run in the next available core. I thought that this would be straightforward, but I keep getting stuck on it. Any help in solving this problem would be much appreciated.
Edit: Thanks a lot for your answers. I also found another solution using the joblib module that I wanted to share. Suppose that you have a script called 'program.py' that you want to run with different combinations of the input parameters (a0, b0, c0) and you want to use all your cores. Here is a solution.
import os
from numpy import arange
from joblib import Parallel, delayed

a0 = arange(0.1, 1.1, 0.1)
b0 = arange(-1.5, -0.4, 0.1)
c0 = arange(1., 5., 0.1)

params = []
for i in range(len(a0)):
    for j in range(len(b0)):
        for k in range(len(c0)):
            params.append((a0[i], b0[j], c0[k]))

def func(parameters):
    s = 'python program.py %g %g %g' % (parameters[0], parameters[1], parameters[2])
    command = os.system(s)
    return command

output = Parallel(n_jobs=-1, verbose=1000)(delayed(func)(i) for i in params)
You want to use multiprocessing.Pool, which represents a "pool" of workers (default one per core, though you can specify another number) that do your jobs. You then submit jobs to the pool, and the workers handle them as they become available. The easiest function to use is Pool.map, which runs a given function for each of the arguments in the passed sequence, and returns the result for each argument. If you don't need return values, you could also use apply_async in a loop.
import multiprocessing

def do_work(arg):
    pass  # do whatever you actually want to do

def run_battery(args):
    # args should be like [arg1, arg2, ...]
    pool = multiprocessing.Pool()
    ret_vals = pool.map(do_work, args)
    pool.close()
    pool.join()
    return ret_vals
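As mentioned above, if you don't need the return values you could use apply_async in a loop instead. A minimal sketch along the same lines, reusing the do_work placeholder:

import multiprocessing

def run_battery_async(args):
    pool = multiprocessing.Pool()
    for arg in args:
        # fire-and-forget: each call is queued and handled by the next free worker
        pool.apply_async(do_work, (arg,))
    pool.close()
    pool.join()   # wait until every queued call has finished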
If you're trying to call external programs and not just Python functions, use subprocess. For example, this will call cmd_name with the list of arguments passed, raise an exception if the return code isn't 0, and return the output:
import subprocess

def do_work(subproc_args):
    return subprocess.check_output(['cmd_name'] + list(subproc_args))
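Putting the two together for the original problem (X external programs, at most Y running at a time) might look like the sketch below; cmd_name and the argument tuples are placeholders for your actual programs:

import multiprocessing
import subprocess

def do_work(subproc_args):
    # run one external program and return its output
    return subprocess.check_output(['cmd_name'] + list(subproc_args))

if __name__ == '__main__':
    all_jobs = [('arg_a',), ('arg_b',), ('arg_c',)]   # placeholder argument tuples, one per program run
    with multiprocessing.Pool() as pool:              # defaults to one worker per core (your Y)
        outputs = pool.map(do_work, all_jobs)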
Hi, I'm using the QThread object from PyQt.
From what I understood, while your thread is running it can only use its own variables and procedures; it cannot change your main object's variables.
So before you run it, be sure to define all the QThread variables you will need,
like this for example:
class Worker(QThread):
    def define(self, phase):
        print('define')
        self.phase = phase
        self.start()  # will run your thread

    def continueJob(self):
        self.start()

    def run(self):
        self.launchProgramme(self.phase)
        self.phase += 1

    def launchProgramme(self, phase):
        print(phase)
I'm not very familiar with how basic Python threads work, but in PyQt your thread emits a signal to your main object, like this:
class mainObject(QtGui.QMainWindow):
    def __init__(self):
        super(mainObject, self).__init__()
        self.numberProgramme = 4
        self.thread = Worker()
        # create the signal connections
        self.connect(self.thread, QtCore.SIGNAL("finished()"), self.threadStopped)
        self.connect(self.thread, QtCore.SIGNAL("terminated()"), self.threadStopped)
Connected like this, when the thread's run() stops, it will call your threadStopped method in your main object, where you can get the values of your thread's variables:
def threadStopped(self):
    value = self.thread.phase
    if value < self.numberProgramme:
        self.thread.continueJob()
After that you just have to launch another thread (or not), depending on the value you get.
This is for PyQt threading, of course; with basic Python threads, the way to execute threadStopped could be different.
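For comparison, a rough sketch of the same handoff using the standard threading module; launchProgramme is a placeholder for the real per-phase work, and unlike Qt's finished() signal the callback here runs on the worker thread itself:

import threading

def launchProgramme(phase):
    print(phase)                     # placeholder for the real per-phase work

def threadStopped(next_phase):
    print('worker finished, next phase is', next_phase)

def run(phase, on_finished):
    launchProgramme(phase)
    on_finished(phase + 1)           # plays the role of the finished() signal

worker = threading.Thread(target=run, args=(0, threadStopped))
worker.start()
worker.join()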

Running a Python Subprocess

All,
I have read several threads on how to run subprocesses in Python and none of them seem to help me. It's probably because I don't know how to use them properly. I have several methods that I would like to run at the same time rather than in sequence, and I thought that the subprocess module would do this for me.
def services():
    services = [method1(),
                method2(),
                method3(),
                method4(),
                method5()]
    return services

def runAll():
    import subprocess
    for i in services():
        proc = subprocess.call(i, shell=True)
The problem with this approach is that method1() starts and method2() doesn't start until method1() finishes. I have tried several approaches, including using subprocess.Popen() in my services method, with no luck. Can anyone lend me a hand on how to get methods 1-5 running at the same time?
Thanks,
Adam
According to the Python documentation, subprocess.call() waits for the command to complete. You should use subprocess.Popen objects directly, which will give you the flexibility you need.
In Python 3.2+, the concurrent.futures module makes this sort of thing very easy.
Python threads are more appropriate for what you are looking for: http://docs.python.org/library/threading.html or even the multiprocessing module: http://docs.python.org/library/multiprocessing.html#module-multiprocessing.
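For reference, a minimal sketch of the concurrent.futures approach mentioned above, with method1 through method3 standing in for your real functions:

from concurrent.futures import ThreadPoolExecutor

def method1(): pass   # placeholders for your real methods
def method2(): pass
def method3(): pass

with ThreadPoolExecutor() as executor:
    futures = [executor.submit(m) for m in (method1, method2, method3)]
    results = [f.result() for f in futures]   # blocks until each one has finished

If the methods are CPU-bound rather than I/O-bound, ProcessPoolExecutor is a drop-in replacement.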
By saying method1(), you're calling the function and waiting for it to return. (It's a function, not a method.)
If you just want to run a bunch of heavy-duty functions in parallel and collect their results, you can use joblib:
from joblib import Parallel, delayed
functions = [fn1, fn2, fn3, fn4]
results = Parallel(n_jobs=4)(delayed(f)() for f in functions)
subprocess.call() blocks until the process completes.
multiprocessing sounds more appropriate for what you are doing.
for example:
from multiprocessing import Process

def f1():
    while True:
        print('foo')

def f2():
    while True:
        print('bar')

def f3():
    while True:
        print('baz')

if __name__ == '__main__':
    for func in (f1, f2, f3):
        Process(target=func).start()
You need to use & to execute them asynchronously. Here is an example:
subprocess.call("./foo1&", shell=True)
subprocess.call("./foo2&", shell=True)
This is just like the ordinary unix shell.
EDIT: Though there are multiple, much better ways to do this. See the other answers for some examples.
subprocess does not make the processes asynchronous. What you are trying to achieve can be done with the threading or multiprocessing modules.
I had a similar problem recently, and solved it like this:
from multiprocessing import Pool

def parallelfuncs(funcs, args, results, bad_value=None):
    p = Pool()
    waiters = []
    for f in funcs:
        waiters.append(p.apply_async(f, args, callback=results.append))
    p.close()
    for w in waiters:
        if w.get()[0] == bad_value:
            p.terminate()
    return p
The nice thing is that the functions in funcs are executed in parallel on args (kind of the reverse of map), and the results are returned. The multiprocessing Pool uses all processors and handles job execution.
w.get blocks, if that wasn't clear.
For your use case, you would call
results = []
parallelfuncs(services, args, results).join()
print(results)

simplifying threading in python

I am looking for a way to simplify my threaded code.
There are a lot of places in my code where I do something like:
for arg in array:
    t = Thread(target=lambda: myFunction(arg))
    t.start()
i.e. running the same function in threads, each time with different parameters.
This is of course a simplified version of the real code; usually the code inside the for loop is ~10-20 lines long and cannot be factored into a single auxiliary function like myFunction in the example above (had that been the case, I could've just used a thread pool).
Also, this scenario is very, very common in my code, so there are tons of lines which I consider redundant. It would help me a lot if I didn't need to handle all this boilerplate code, but instead be able to do something like:
for arg in array:
    with threaded():
        myFunction(arg)
i.e. somehow threaded() would take every line of code inside it and run it in a separate thread.
I know that context managers aren't supposed to be used in such situations, that it's probably a bad idea and will require an ugly hack, but nonetheless - can it be done, and how?
How about this:
for arg in array:
    def _thread():
        # code here
        print(arg)

    t = Thread(target=_thread)
    t.start()
Additionally, with decorators you can sugar it up a little:

def spawn_thread(func):
    t = Thread(target=func)
    t.start()
    return t

for arg in array:
    @spawn_thread
    def _thread():
        # code here
        print(arg)
Would a thread pool help you here? Many implementations for Python exist, for example this one.
P.S.: still interested to know what your exact use case is.
What you want is a kind of "contextual thread pool".
Take a look at the ThreadPool class in this module, designed to be used in a manner similar to the one you've given. Usage would be something like this:
with ThreadPool() as pool:
    for arg in array:
        pool.add_thread(target=myFunction, args=[arg])
Failures in any task given to a ThreadPool will flag an error, and perform the standard error backtrace handling.
I think you're over-complicating it. This is the "pattern" I use:
# util.py
import threading

def start_thread(func, *args):
    thread = threading.Thread(target=func, args=args)
    thread.daemon = True
    thread.start()
    return thread
# in another module
import util
...

for arg in array:
    util.start_thread(myFunction, arg)
I don't see the big deal about having to create myFunction. You could even define the function inline with the function that starts it.
def do_stuff():
    def thread_main(arg):
        print("I'm a new thread with arg=%s" % arg)

    for arg in array:
        util.start_thread(thread_main, arg)
If you're creating a large number of threads, a thread pool definitely makes more sense. You can easily make your own with the Queue and threading modules. Basically create a jobs queue, create N worker threads, give each thread a "pointer" to the queue and have them pull jobs from the queue and process them.
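A minimal sketch of that hand-rolled pool, assuming the work items are just the arguments passed to myFunction as in the question:

import queue
import threading

def worker(job_queue):
    while True:
        arg = job_queue.get()
        if arg is None:              # sentinel value: no more jobs for this worker
            break
        myFunction(arg)              # the same per-item work as in the question

def run_pool(args, num_workers=4):
    job_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(job_queue,)) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for arg in args:
        job_queue.put(arg)
    for _ in threads:
        job_queue.put(None)          # one sentinel per worker so every thread exits
    for t in threads:
        t.join()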
