I have one function which will take different inputs and I want that function to run parallely. Below is what I tried , but it's not working due to time.sleep I think.
from multiprocessing import Process
from time import sleep
import time
def f(name):
print('hello', name)
time.sleep(10)
l1 = Queue()
a = Process(target=f('Tom'))
a.start()
l2 = Queue()
b = Process(target=f("Stock"))
b.start()
print (l1.get())
print (l2.get())
I want the function to run parallely. Currently the function waits for 10 seconds before it goes to the second execution.
See the comment posted by #juanpa.arrivillaga. Just specify as the target argument of the Process constructor the name of the function; do not call the function. The arguments to your target are specified separately as the args argument, either as a tuple or a list.
Also, since f is not returning a useful value, there is no reason to have Queue instances for the results. In any case, you are not passing the Queue instances to f and nothing is being put to the queue so I can't understand why you would be attempting to issue calls to get against these Queue instances; these calls will hang forever. The argument you do need to be passing to f is the name, as follows:
from multiprocessing import Process
#from time import sleep
import time
def f(name):
print('hello', name)
time.sleep(10)
start_time = time.time()
a = Process(target=f, args=('Tom',))
a.start()
b = Process(target=f, args=("Stock",))
b.start()
# Wait for processes to complete
a.join()
b.join()
elapsed_time = time.time() - start_time
print(elapsed_time)
Prints:
hello Tom
hello Stock
10.212252855300903
If you were running this on a platform such as Windows, then you would need to place the process-creating code in a special block as follows;
from multiprocessing import Process
#from time import sleep
import time
def f(name):
print('hello', name)
time.sleep(10)
# Required on platforms that create processes with the "spawn" method:
if __name__ == '__main__':
start_time = time.time()
a = Process(target=f, args=('Tom',))
a.start()
b = Process(target=f, args=("Stock",))
b.start()
# Wait for processes to complete
a.join()
b.join()
elapsed_time = time.time() - start_time
print(elapsed_time)
Use Pool and starmap:
import multiprocessing
import time
def f(name, place):
print('hello', name, place)
time.sleep(5)
if __name__ == '__main__':
data = [('Louis', 'val1'), ('Paul', 'val2'), ('Alexandre', 'val3'),
('John', 'val4'), ('Tom', 'val5'), ('Bob', 'val6')]
with multiprocessing.Pool(2) as pool:
pool.starmap(f, data)
Read carefully the multiprocessing guidlines
Safe importing of main module
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).
Related
I have a python script which calls a series of sub-processes. They need to run "for ever" - but they occasionally die, or get killed. When this happens I need to restart the process using the same arguments as the one which died.
This is a very simplified version:
[edit: this is the less simplified version, which includes "restart" code]
import multiprocessing
import time
import random
def printNumber(number):
print("starting :", number)
while random.randint(0, 5) > 0:
print(number)
time.sleep(2)
if __name__ == '__main__':
children = [] # list
args = {} # dictionary
for processNumber in range(10,15):
p = multiprocessing.Process(
target=printNumber,
args=(processNumber,)
)
children.append(p)
p.start()
args[p.pid] = processNumber
while True:
time.sleep(1)
for n, p in enumerate(children):
if not p.is_alive():
#get parameters dead child was started with
pidArgs = args[p.pid]
del(args[p.pid])
print("n,args,p: ",n,pidArgs,p)
children.pop(n)
# start new process with same args
p = multiprocessing.Process(
target=printNumber,
args=(pidArgs,)
)
children.append(p)
p.start()
args[p.pid] = pidArgs
I have updated the example to illustrate how I want the processes to be restarted if one crashes/killed/etc - keeping track of which pid was started with which args.
Is this the "best" way to do this, or is there a more "python" way of doing this?
I think I would create a separate thread for each Process and use a ProcessPoolExecutor. Executors have a useful function, submit, which returns a Future. You can wait on each Future and re-launch the Executor when the Future is done. Arguments to the function are tracked as class variables, so restarting is just a simple loop.
import threading
from concurrent.futures import ProcessPoolExecutor
import time
import random
import traceback
def printNumber(number):
print("starting :", number)
while random.randint(0, 5) > 0:
print(number)
time.sleep(2)
class KeepRunning(threading.Thread):
def __init__(self, func, *args, **kwds):
self.func = func
self.args = args
self.kwds = kwds
super().__init__()
def run(self):
while True:
with ProcessPoolExecutor(max_workers=1) as pool:
future = pool.submit(self.func, *self.args, **self.kwds)
try:
future.result()
except Exception:
traceback.print_exc()
if __name__ == '__main__':
for process_number in range(10, 15):
keep = KeepRunning(printNumber, process_number)
keep.start()
while True:
time.sleep(1)
At the end of the program is a loop to keep the main thread running. Without that, the program will attempt to exit while your Processes are still running.
For the example you provided I would just remove the exit condition from the while loop and change it to True.
As you said though the actual code is more complicated (why didn't you post that?). So if the process gets terminated by lets say an exception just put the code inside a try catch block. You can then put said block in an infinite loop.
I hope this is what you are looking for but that seems to be the right way to do it provided the goal and information you provided.
Instead of just starting the process immediately, you can save the list of processes and their arguments, and create another process that checks they are alive.
For example:
if __name__ == '__main__':
process_list = []
for processNumber in range(5):
process = multiprocessing.Process(
target=printNumber,
args=(processNumber,)
)
process_list.append((process,args))
process.start()
while True:
for running_process, process_args in process_list:
if not running_process.is_alive():
new_process = multiprocessing.Process(target=printNumber, args=(process_args))
process_list.remove(running_process, process_args) # Remove terminated process
process_list.append((new_process, process_args))
I must say that I'm not sure the best way to do it is in python, you may want to look at scheduler services like jenkins or something like that.
I am learning about Python multiprocessing and trying to understand how I can make my code wait for all processes to finish and then continue with the rest of the code. I thought join() method should do the job, but the output of my code is not what I expected from the using it.
Here is the code:
from multiprocessing import Process
import time
def fun():
print('starting fun')
time.sleep(2)
print('finishing fun')
def fun2():
print('starting fun2')
time.sleep(5)
print('finishing fun2')
def fun3():
print('starting fun3')
print('finishing fun3')
if __name__ == '__main__':
processes = []
print('starting main')
for i in [fun, fun2, fun3]:
p = Process(target=i)
p.start()
processes.append(p)
for p in processes:
p.join()
print('finishing main')
g=0
print("g",g)
I expected all processes under if __name__ == '__main__': to finish before the lines g=0 and print(g) are called, so something like this was expected:
starting main
starting fun2
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
But the actual output indicates that there's something I don't understand about join() (or multiprocessing in general):
starting main
g 0
g 0
starting fun2
g 0
starting fun
starting fun3
finishing fun3
finishing fun
finishing fun2
finishing main
g 0
The question is: How do I write the code that finishes all processes first and then continues with the code without multiprocessing, so that I get the former output? I run the code from command prompt on Windows, in case it matters.
On waiting the Process to finish:
You can just Process.join your list, something like
import multiprocessing
import time
def func1():
time.sleep(1)
print('func1')
def func2():
time.sleep(2)
print('func2')
def func3():
time.sleep(3)
print('func3')
def main():
processes = [
multiprocessing.Process(target=func1),
multiprocessing.Process(target=func2),
multiprocessing.Process(target=func3),
]
for p in processes:
p.start()
for p in processes:
p.join()
if __name__ == '__main__':
main()
But if you're thinking about giving your process more complexity, try using a Pool:
import multiprocessing
import time
def func1():
time.sleep(1)
print('func1')
def func2():
time.sleep(2)
print('func2')
def func3():
time.sleep(3)
print('func3')
def main():
result = []
with multiprocessing.Pool() as pool:
result.append(pool.apply_async(func1))
result.append(pool.apply_async(func2))
result.append(pool.apply_async(func3))
for r in result:
r.wait()
if __name__ == '__main__':
main()
More info on Pool
On why g0 prints multiple times:
This is happening because you're using spawn or forkserver to set your Process and the g0 and print declarations are outside a function or the __main__ if block.
From the docs:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).
(...)
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
Similar restrictions apply if a pool or manager is created in the main module.
It's basically interpreting again because it's importing your .py file as a module.
I've written the following code which runs a function that simulates a stochastic simulation of a series of chemical reactions. I've written the following code:
v = range(1, 51)
def parallelfunc(*v):
gillespie_tau_leaping(start_state, LHS, stoch_rate, state_change_array)
def info(title):
print(title)
print('module name:', __name__)
print('parent process:', os.getppid())
print('process id:', os.getpid())
if __name__ == '__main__':
info('main line')
start = datetime.utcnow()
p = Process(target=parallelfunc, args=(v))
p.start()
p.join()
end = datetime.utcnow()
sim_time = end - start
print(f"Simualtion utc time:\n{sim_time}")
I'm using the Process method from the multiprocessing library and am trying to run gillespie_tau_leaping 50 times.
Only I'm not sure if its working. gillespie_tau_leaping prints out a number of values to the terminal, but these values are only printed out once, I'd expect them to be printed out 50 times.
I tried using the getpid etc command and this returns the following to the terminal:
main line
module name: __main__
parent process: 6188
process id: 27920
How can I tell if my code as worked and how can I get it to print the values from gillepsie_tau_leaping 50 times to the terminal?
Cheers
Your code is running just one process, the call to Process, spawns a new thread but you are doing it only once (not in a loop).
I would suggest you to use multiprocessing pools
Your code can be something like this:
from multiprocess import Pool
def parallelfunc(*args):
do_something()
def main():
# create a list of list of args for the function invocation
func_args = [['arg1call1', 'arg2call1', 'arg3call1'], ['arg1call2', 'arg2call2', 'arg3call2']]
with Pool() as p:
results = p.map(parallelfunc, func_args)
# do something with results which is a list of results
multiprocessing pool by default create the same number of processes as your CPU cores and manage the process Pool till the end of the processing taking care of all the Inter Process Communication.
This is really handy because synchronizing processes can be hard.
Hope this helps
I need to terminate some processes after a while, so I've used sleeping another process for the waiting. But the new process doesn't have access to global variables from the main process I guess. How could I solve it please?
Code:
import os
from subprocess import Popen, PIPE
import time
import multiprocessing
log_file = open('stdout.log', 'a')
log_file.flush()
err_file = open('stderr.log', 'a')
err_file.flush()
processes = []
def processing():
print "processing"
global processes
global log_file
global err_file
for i in range(0, 5):
p = Popen(['java', '-jar', 'C:\\Users\\two\\Documents\\test.jar'], stdout=log_file, stderr=err_file) # something long running
processes.append(p)
print len(processes) # returns 5
def waiting_service():
name = multiprocessing.current_process().name
print name, 'Starting'
global processes
print len(processes) # returns 0
time.sleep(2)
for i in range(0, 5):
processes[i].terminate()
print name, 'Exiting'
if __name__ == '__main__':
processing()
service = multiprocessing.Process(name='waiting_service', target=waiting_service)
service.start()
You should be using synchronization primitives.
Possibly you want to set an Event that's triggered after a while by the main (parent) process.
You may also want to wait for the processes to actually complete and join them (like you would a thread).
If you have many similar tasks, you can use a processing pool like multiprocessing.Pool.
Here is a small example of how it's done:
import multiprocessing
import time
kill_event = multiprocessing.Event()
def work(_id):
while not kill_event.is_set():
print "%d is doing stuff" % _id
time.sleep(1)
print "%d quit" % _id
def spawn_processes():
processes = []
# spawn 10 processes
for i in xrange(10):
# spawn process
process = multiprocessing.Process(target=work, args=(i,))
processes.append(process)
process.start()
time.sleep(1)
# kill all processes by setting the kill event
kill_event.set()
# wait for all processes to complete
for process in processes:
process.join()
print "done!"
spawn_processes()
The whole problem was in Windows' Python. Python for Windows is blocking global variables to be seen in functions. I've switched to linux and my script works OK.
Special thanks to #rchang for his comment:
When I tested it, in both cases the print statement came up with 5. Perhaps we have a version mismatch in some way? I tested it with Python 2.7.6 on Linux kernel 3.13.0 (Mint distribution).
I've gone through (this SO thread)[ Synchronization issue using Python's multiprocessing module but it doesnt provide the answer.
The following code :-
rom multiprocessing import Process, Lock
def f(l, i):
l.acquire()
print 'hello world', i
l.release()
# do something else
if __name__ == '__main__':
lock = Lock()
for num in range(10):
Process(target=f, args=(lock, num)).start()
How do I get the processes to execute in order.? I want to hold up a lock for a few seconds and then release it and thereby moving forward with the P1 and P2 into the lock, and then P2 moving forward and P3 exceuting that lock. How would I get the processes to execute in order.?
It sounds like you just want to delay the start of each successive process. If that's the case, you can use a multiprocessing.Event to delay starting the next child in the main process. Just pass the event to the child, and have the child set the Event when its done doing whatever should run prior to starting the next child. The main process can wait on that Event, and once it's signalled, clear it and start the next child.
from multiprocessing import Process, Event
def f(e, i):
print 'hello world', i
e.set()
# do something else
if __name__ == '__main__':
event = Event()
for num in range(10):
p = Process(target=f, args=(event, num))
p.start()
event.wait()
event.clear()
this is not the purpose of locks. Your code architecture is bad for your use case. I think you should refactor your code to this:
from multiprocessing import Process
def f(i):
# do something here
if __name__ == '__main__':
for num in range(10):
print 'hello world', num
Process(target=f, args=(num,)).start()
in this case it will print in order and then will do the remaining part asynchronously