I have problems with Python multiprocessing.
Python version: 3.6.6
Using the Spyder IDE on Windows 7.
1.
The queue is not being populated -> every time I try to read it, it's empty. Somewhere I read that I have to get() from it before the processes join(), but that did not solve it.
from multiprocessing import Process, Queue

# define an example function
def fnc(i, output):
    output.put(i)

if __name__ == '__main__':
    # Define an output queue
    output = Queue()
    # Setup a list of processes that we want to run
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    print('created')
    # Run processes
    for p in processes:
        p.start()
    print('started')
    # Exit the completed processes
    for p in processes:
        p.join()
    print(output.empty())
    print('finished')
>>>created
>>>started
>>>True
>>>finished
I would expect output to not be empty.
If I change it from .join() to
for p in processes:
    print(output.get())
    #p.join()
it freezes.
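For reference, the usual fix for the first snippet is the ordering mentioned above: drain the queue (one get() per item that was put) before joining the workers. A minimal sketch of that pattern, assuming it is run as a plain script rather than inside Spyder:

from multiprocessing import Process, Queue

def fnc(i, output):
    output.put(i)

if __name__ == '__main__':
    output = Queue()
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    for p in processes:
        p.start()
    # drain the queue first: one get() per item that was put
    results = [output.get() for _ in processes]
    # now it is safe to join; the workers have nothing left to flush
    for p in processes:
        p.join()
    print(results)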
2.
The next problem I have is with pool.map() - it freezes, and with such a tiny workload there is no chance it is hitting a memory limit. I don't even know how to debug such a simple piece of code.
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    print('Pool created')
    # print "[0, 1, 4,..., 81]"
    print(pool.map(f, range(10)))  # it freezes here
Hope it's not a big deal to have two questions in one topic.
Apparently the problem is Spyder's IPython console. When I run both from cmd, they execute properly.
Solution
For debugging in Spyder, add .dummy to the multiprocessing import:
from multiprocessing.dummy import Process, Queue
It will not actually run in separate processes (multiprocessing.dummy is backed by threads), but you will get results and can actually see the output. When debugging is done, simply delete .dummy, place the code in another file, import it, and call it, for example, as a function:
multiprocessing_my.py
from multiprocessing import Process, Queue

# define an example function
def fnc(i, output):
    output.put(i)
    print(i)

def test():
    # Define an output queue
    output = Queue()
    # Setup a list of processes that we want to run
    processes = [Process(target=fnc, args=(i, output)) for i in range(4)]
    print('created')
    # Run processes
    for p in processes:
        p.start()
    print('started')
    # Exit the completed processes
    for p in processes:
        p.join()
    print(output.empty())
    print('finished')
    # Get process results from the output queue
    results = [output.get() for p in processes]
    print('get results')
    print(results)
test_mp.py
Executed by selecting the code and pressing Ctrl+Enter:
import multiprocessing_my
multiprocessing_my.test()
...
In[9]: test()
created
0
1
2
3
started
False
finished
get results
[0, 1, 2, 3]
Related
I am learning about Python multiprocessing and trying to understand how I can make my code wait for all processes to finish and then continue with the rest of the code. I thought the join() method should do the job, but the output of my code is not what I expected from using it.
Here is the code:
from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

def fun2():
    print('starting fun2')
    time.sleep(5)
    print('finishing fun2')

def fun3():
    print('starting fun3')
    print('finishing fun3')

if __name__ == '__main__':
    processes = []
    print('starting main')
    for i in [fun, fun2, fun3]:
        p = Process(target=i)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
    print('finishing main')

g=0
print("g",g)
I expected all processes under if __name__ == '__main__': to finish before the lines g=0 and print(g) are called, so something like this was expected:
starting main
starting fun2
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
But the actual output indicates that there's something I don't understand about join() (or multiprocessing in general):
starting main
g 0
g 0
starting fun2
g 0
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
The question is: How do I write the code that finishes all processes first and then continues with the code without multiprocessing, so that I get the former output? I run the code from command prompt on Windows, in case it matters.
On waiting for the Process to finish:
You can just join each Process in your list, something like:
import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    processes = [
        multiprocessing.Process(target=func1),
        multiprocessing.Process(target=func2),
        multiprocessing.Process(target=func3),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()

if __name__ == '__main__':
    main()
But if you're thinking about giving your process more complexity, try using a Pool:
import multiprocessing
import time

def func1():
    time.sleep(1)
    print('func1')

def func2():
    time.sleep(2)
    print('func2')

def func3():
    time.sleep(3)
    print('func3')

def main():
    result = []
    with multiprocessing.Pool() as pool:
        result.append(pool.apply_async(func1))
        result.append(pool.apply_async(func2))
        result.append(pool.apply_async(func3))
        for r in result:
            r.wait()

if __name__ == '__main__':
    main()
More info on Pool
On why g 0 prints multiple times:
This is happening because you're using spawn or forkserver as the start method for your Process, and the g=0 assignment and the print("g", g) call sit outside any function and outside the __main__ if block.
From the docs:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
(...)
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
Similar restrictions apply if a pool or manager is created in the main module.
Basically, your .py file is imported as a module by each new interpreter, so any module-level code runs again in every child process.
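A minimal sketch of what the docs quote implies, assuming the goal is the output the question expected: keep every statement with side effects under the __main__ guard, so a spawned interpreter can import the file without re-running them (fun here stands in for the question's fun/fun2/fun3):

from multiprocessing import Process
import time

def fun():
    print('starting fun')
    time.sleep(2)
    print('finishing fun')

if __name__ == '__main__':
    # everything with side effects lives under the guard, so importing
    # this module in a child process does nothing extra
    processes = [Process(target=fun) for _ in range(3)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    g = 0
    print("g", g)  # now printed exactly once, after all children finish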
I've written the following code, which runs a function that performs a stochastic simulation of a series of chemical reactions:
import os
from datetime import datetime
from multiprocessing import Process

v = range(1, 51)

def parallelfunc(*v):
    gillespie_tau_leaping(start_state, LHS, stoch_rate, state_change_array)

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

if __name__ == '__main__':
    info('main line')
    start = datetime.utcnow()
    p = Process(target=parallelfunc, args=(v))
    p.start()
    p.join()
    end = datetime.utcnow()
    sim_time = end - start
    print(f"Simulation UTC time:\n{sim_time}")
I'm using the Process method from the multiprocessing library and am trying to run gillespie_tau_leaping 50 times.
Only I'm not sure if it's working. gillespie_tau_leaping prints out a number of values to the terminal, but these values are only printed out once; I'd expect them to be printed out 50 times.
I tried using the getpid etc. calls, and they return the following to the terminal:
main line
module name: __main__
parent process: 6188
process id: 27920
How can I tell if my code has worked, and how can I get it to print the values from gillespie_tau_leaping 50 times to the terminal?
Cheers
Your code is running just one process: the call to Process spawns a new process, but you are doing it only once (not in a loop).
I would suggest you use a multiprocessing Pool.
Your code can be something like this:
from multiprocessing import Pool

def parallelfunc(*args):
    do_something()

def main():
    # create a list of lists of args for the function invocations
    func_args = [['arg1call1', 'arg2call1', 'arg3call1'], ['arg1call2', 'arg2call2', 'arg3call2']]
    with Pool() as p:
        results = p.map(parallelfunc, func_args)
    # do something with results, which is a list of results
A multiprocessing Pool by default creates the same number of processes as your CPU has cores, and it manages the pool until the end of the processing, taking care of all the inter-process communication.
This is really handy, because synchronizing processes yourself can be hard.
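If you want to control the worker count explicitly, or have each inner argument list unpacked into separate parameters, a small sketch along these lines may help (parallelfunc and its string arguments here are placeholders, not the gillespie function from the question):

from multiprocessing import Pool, cpu_count

def parallelfunc(a, b, c):
    # placeholder body: combine the three arguments somehow
    return f"{a}-{b}-{c}"

if __name__ == '__main__':
    func_args = [('arg1call1', 'arg2call1', 'arg3call1'),
                 ('arg1call2', 'arg2call2', 'arg3call2')]
    # Pool() defaults to cpu_count() workers; pass processes= to override
    with Pool(processes=min(2, cpu_count())) as p:
        # starmap unpacks each tuple into separate positional arguments
        results = p.starmap(parallelfunc, func_args)
    print(results)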
Hope this helps
Below is my demo:
Thread xxx:
import xxx
import xxxx

def test_process1(hostname, proxy_param):
    # just never runs
    try:  # breakpoint 0
        with open("/xxx", "w+") as f:  # breakpoint 1
            f.write("something")
    except Exception as e:
        pass  # just never runs, breakpoint 3

def test():
    try:
        a = Process(target=test_process1, args=(hostname, proxy_param))
        a.start()
        a.join()  # it blocks here; test_process1 never runs and never quits
    except Exception as e:
        pass  # breakpoint 4
The function test_process1 just never runs. No error, no breakpoint hit.
The test function code is part of a big project; this is just a demo.
Hope this piece of code helps.
The workers list will get divided based on the number of processes in use.
Sample code with a Manager list:
from subprocess import PIPE, Popen
from multiprocessing import Pool, Pipe
from multiprocessing import Process, Queue, Manager

def child_process(child_conn, output_list, messenger):
    input_recvd = messenger["input"]
    output_list.append(input_recvd)
    print(input_recvd)
    child_conn.close()

def parent_process(number_of_process=2):
    workers_inputs = [{"input": "hello"}, {"input": "world"}]
    with Manager() as manager:
        processes = []
        output_list = manager.list()  # <-- can be shared between processes.
        parent_conn, child_conn = Pipe()
        for single_id_dict in workers_inputs:
            pro_obj = Process(target=child_process, args=(child_conn, output_list, single_id_dict))  # Passing the list
            pro_obj.start()
            processes.append(pro_obj)
        for p in processes:
            p.join()
        output_list = [single_feature for single_feature in output_list]
        return output_list

parent_process()
OUTPUT:
hello
world
['hello', 'world']
A Manager list is useful for collecting output from various parallel processes; it's like a built-in queue mechanism that is easy to use and safe from deadlocks.
I am running the following (example) code:
from multiprocessing import Pool

def f(x):
    return x*x

pool = Pool(processes=4)
print pool.map(f, range(10))
However, the code never finishes. What am I doing wrong?
The line
pool = Pool(processes=4)
completes successfully; it appears to stop at the last line. Not even pressing Ctrl+C interrupts the execution. I am running the code inside an IPython console in Spyder.
from multiprocessing import Pool

def f(x):
    return x * x

def main():
    pool = Pool(processes=3)  # set the processes max number 3
    result = pool.map(f, range(10))
    pool.close()
    pool.join()
    print(result)
    print('end')

if __name__ == "__main__":
    main()
The key step is to call pool.close() and pool.join() after the processes have finished; otherwise the pool is not released.
Besides, you should create the pool in the main process by putting the code within the if __name__ == "__main__": block.
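Alternatively, on Python 3 the pool can also be used as a context manager; the with block terminates the workers when it exits, and because map() blocks until all results are back, nothing is lost. A minimal sketch of that variant:

from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == "__main__":
    # the with-block calls pool.terminate() on exit; map() has already
    # returned all results by then, so nothing is lost
    with Pool(processes=3) as pool:
        result = pool.map(f, range(10))
    print(result)
    print('end')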
For some reason, your Pool constructor is throwing the interpreter off into a process-producing factory.
You first need to stop all the processes that are now running, and there will be tons of them. If you bring up the Task Manager you will see tons of rogue python.exe tasks. To kill them in bulk, try:
taskkill /F /IM python.exe
You may need to do the above a couple of times; make sure the Task Manager no longer shows any python.exe tasks. This will also kill your Spyder instance, so make sure you save your work first.
Now change your code to the following:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(4)
    print pool.map(f, range(10))
Note that I have removed the processes named argument.
My question is hopefully particular enough not to relate to any of the other ones that I've read. I want to use subprocess and multiprocessing to spawn a bunch of jobs serially and return the return codes to me. The problem is that I don't want to wait() so that I can spawn all the jobs at once, but I do want to know when each one finishes so I can get its return code. I'm having this weird problem where if I poll() the process it won't run. It just hangs out in the activity monitor without running (I'm on a Mac). I thought I could use a watcher thread, but I'm hanging on the q_out.get(), which leads me to believe that maybe I'm filling up the buffer and deadlocking. I'm not sure how to get around this. This is basically what my code looks like. If anyone has any better ideas on how to do this, I would be happy to completely change my approach.
import threading
from multiprocessing import Process, Queue
from subprocess import Popen

def watchJob(p1, out_q):
    while p1.poll() == None:
        pass
    print "Job is done"
    out_q.put(p1.returncode)

def runJob(out_q):
    LOGFILE = open('job_to_run.log', 'w')
    p1 = Popen(['../../bin/jobexe', 'job_to_run'], stdout=LOGFILE)
    t = threading.Thread(target=watchJob, args=(p1, out_q))
    t.start()

out_q = Queue()
outlst = []
for i in range(len(nprocs)):
    proc = Process(target=runJob, args=(out_q,))
    proc.start()
    outlst.append(out_q.get())  # This hangs indefinitely
    proc.join()
You need neither multiprocessing nor threading here. You could run multiple child processes in parallel and collect their statuses all in a single thread:
#!/usr/bin/env python3
from subprocess import Popen

def run(cmd, log_filename):
    with open(log_filename, 'wb', 0) as logfile:
        return Popen(cmd, stdout=logfile)

# start several subprocesses
processes = {run(['echo', c], 'subprocess.%s.log' % c) for c in 'abc'}
# now they all run in parallel
# report as soon as a child process exits
while processes:
    for p in processes:
        if p.poll() is not None:
            processes.remove(p)
            print('{} done, status {}'.format(p.args, p.returncode))
            break
p.args stores cmd in Python 3.3+; keep track of cmd yourself on earlier Python versions.
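For example, on older versions you could keep the mapping from each Popen object to its command yourself; a small sketch of that idea, not from the original answer:

from subprocess import Popen

def run(cmd, log_filename):
    with open(log_filename, 'wb', 0) as logfile:
        return Popen(cmd, stdout=logfile)

# map each Popen object to its command, since p.args is unavailable pre-3.3
processes = {run(['echo', c], 'subprocess.%s.log' % c): ['echo', c] for c in 'abc'}
while processes:
    for p in list(processes):
        if p.poll() is not None:
            cmd = processes.pop(p)
            print('{} done, status {}'.format(cmd, p.returncode))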
See also:
Python threading multiple bash subprocesses?
Python subprocess in parallel
Python: execute cat subprocess in parallel
Using Python's Multiprocessing module to execute simultaneous and separate SEAWAT/MODFLOW model runs
To limit the number of parallel jobs, a ThreadPool could be used (as shown in the first link):
#!/usr/bin/env python3
from multiprocessing.dummy import Pool  # use threads
from subprocess import Popen

def run_until_done(args):
    cmd, log_filename = args
    try:
        with open(log_filename, 'wb', 0) as logfile:
            p = Popen(cmd, stdout=logfile)
        return cmd, p.wait(), None
    except Exception as e:
        return cmd, None, str(e)

commands = ((('echo', str(d)), 'subprocess.%03d.log' % d) for d in range(500))
pool = Pool(128)  # 128 concurrent commands at a time
for cmd, status, error in pool.imap_unordered(run_until_done, commands):
    if error is None:
        fmt = '{cmd} done, status {status}'
    else:
        fmt = 'failed to run {cmd}, reason: {error}'
    print(fmt.format_map(vars()))  # or fmt.format(**vars()) on older versions
The thread pool in the example has 128 threads (no more, no fewer), so it can't execute more than 128 jobs concurrently. As soon as any of the threads frees up (is done with a job), it takes another one, and so on. The total number of jobs executed concurrently is limited by the number of threads. A new job doesn't wait for all 128 previous jobs to finish; it starts as soon as any of the old jobs is done.
If you're going to run watchJob in a thread, there's no reason to busy-loop with p1.poll(); just call p1.wait() to block until the process finishes. The busy loop requires the GIL to constantly be released and re-acquired, which slows down the main thread, and it also pegs the CPU, which hurts performance even more.
Also, if you're not using the stdout of the child process, you shouldn't send it to PIPE, because that could cause a deadlock if the process writes enough data to fill up the stdout buffer (which may actually be what's happening in your case). There's also no need for multiprocessing here; just call Popen in the main thread, and then have the watchJob thread wait for the process to finish.
import threading
from subprocess import Popen
from Queue import Queue

def watchJob(p1, out_q):
    p1.wait()
    out_q.put(p1.returncode)

out_q = Queue()
outlst = []
p1 = Popen(['../../bin/jobexe', 'job_to_run'])
t = threading.Thread(target=watchJob, args=(p1, out_q))
t.start()
outlst.append(out_q.get())
t.join()
Edit:
Here's how to run multiple jobs concurrently this way:
out_q = Queue()
outlst = []
threads = []
num_jobs = 3
for _ in range(num_jobs):
    p = Popen(['../../bin/jobexe', 'job_to_run'])
    t = threading.Thread(target=watchJob, args=(p, out_q))
    t.start()
    threads.append(t)
    # Don't consume from the queue yet.

# All jobs are running, so now we can start
# consuming results from the queue.
for _ in range(num_jobs):
    outlst.append(out_q.get())

for t in threads:
    t.join()