I currently have three processes, A, B, and C, all created in the main process. However, I would like to start B and C from within process A. Is that possible?
process.py
from multiprocessing import Process
import time
procs = {}
def test():
    print(procs)
    procs['B'].start()
    procs['C'].start()
    time.sleep(8)
    procs['B'].terminate()
    procs['C'].terminate()
    procs['B'].join()
    procs['C'].join()
def B():
    while True:
        print('+' * 10)
        time.sleep(1)
def C():
    while True:
        print('-' * 10)
        time.sleep(1)
procs['A'] = Process(target=test)
procs['B'] = Process(target=B)
procs['C'] = Process(target=C)
main.py
from process import *
print(procs)
procs['A'].start()
procs['A'].join()
And I got this error:
AssertionError: can only start a process object created by current process
Is there an alternative way to start processes B and C from A? Or can A send a signal asking the main process to start B and C?
I would recommend using Event objects for the synchronization. They let one process trigger actions in another. For instance:
from multiprocessing import Process, Event
import time
procs = {}
def test():
    print(procs)
    # Will let the main process know that it needs
    # to start the subprocesses
    procs['B'][1].set()
    procs['C'][1].set()
    time.sleep(3)
    # This will trigger the shutdown of the subprocess
    # This is cleaner than using terminate as it allows
    # you to clean up the processes if needed.
    procs['B'][1].set()
    procs['C'][1].set()
def B():
    # Event will be set once again when this process
    # needs to finish
    event = procs["B"][1]
    event.clear()
    while not event.is_set():
        print('+' * 10)
        time.sleep(1)
def C():
    # Event will be set once again when this process
    # needs to finish
    event = procs["C"][1]
    event.clear()
    while not event.is_set():
        print('-' * 10)
        time.sleep(1)
if __name__ == '__main__':
    procs['A'] = (Process(target=test), None)
    procs['B'] = (Process(target=B), Event())
    procs['C'] = (Process(target=C), Event())
    procs['A'][0].start()
    # Wait for events to be set before starting the subprocess
    procs['B'][1].wait()
    procs['B'][0].start()
    procs['C'][1].wait()
    procs['C'][0].start()
    # Join all the subprocess in the process that created them.
    procs['A'][0].join()
    procs['B'][0].join()
    procs['C'][0].join()
Note that this code is not really clean: only one event is needed in this case. But you should get the main idea.
Also, process A is no longer needed; you could consider using callbacks instead. See, for instance, the concurrent.futures module if you want to chain asynchronous actions.
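The callback idea could be sketched roughly as follows. This is only an illustration under assumptions: it uses a thread pool rather than processes, and first_step, follow_up, and run_chain are hypothetical names standing in for the work that A, B, and C do.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def first_step():
    # Hypothetical stand-in for whatever A did before asking for B and C
    return "ready"

def follow_up(name, token):
    # Hypothetical stand-in for the work done by B or C
    return f"{name} started after {token}"

def run_chain():
    follow_ups = []
    scheduled = threading.Event()
    with ThreadPoolExecutor(max_workers=3) as executor:
        def start_followers(done_future):
            # Called once first_step has finished: schedule B and C
            token = done_future.result()
            for name in ("B", "C"):
                follow_ups.append(executor.submit(follow_up, name, token))
            scheduled.set()

        executor.submit(first_step).add_done_callback(start_followers)
        scheduled.wait()  # do not leave the pool before B and C are submitted
        return [future.result() for future in follow_ups]

print(run_chain())
```

The extra Event only guards against leaving the with block (which shuts the pool down) before the callback has had a chance to submit the follow-up tasks.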
Related
I'm trying to run multiple functions with multiprocessing and am running into a bit of a wall. I want to run an initial function to completion on all processes/inputs and then run 2 or 3 other functions in parallel on the output of the first function. I've already got my search function; the code is there for the sake of explanation.
I'm not sure how to continue the code from here. I've put my initial attempt below. I want all instances of process1 to finish, and then process2 and process3 to start in parallel.
Code is something like:
import multiprocessing
from multiprocessing import Pool
def init(*args):
    global working_dir
    [working_dir] = args
def process1(InFile):
    python.DoStuffWith.InFile
    Output.save.in(working_dir)
def process2(queue):
    inputfiles2 = []
    python.searchfunction.appendOutputof.process1.to.inputfiles2
    python.DoStuffWith.process1.Output
    python.Output
def process3(queue):
    inputfiles2 = []
    python.searchfunction.appendOutputof.process1.to.inputfiles2
    python.DoStuffWith.process1.Output
    python.Output
def MCprocess():
    working_dir = input("enter input: ")
    inputfiles1 = []
    python.searchfunction.appendfilesin.working_dir.to.inputfiles1
    with Pool(initializer=init, initargs=[working_dir], processes=16) as pool:
        pool.map(process1, inputfiles1)
        pool.close()
    # Edited code
    queue = multiprocessing.Queue()
    queue.put(working_dir)
    queue.put(working_dir)
    ProcessTwo = multiprocessing.Process(target=process2, args=(queue,))
    ProcessThree = multiprocessing.Process(target=process3, args=(queue,))
    ProcessTwo.start()
    ProcessThree.start()
    # Old code
    # with Pool(initializer=init, initargs=[working_dir], processes=16) as pool:
    #     pool.map_async(process2)
    #     pool.map_async(process3)
if __name__ == '__main__':
    MCprocess()
Your best bet is to use an Event. The first process calls event.set() when it is done, to signal that the event has happened. The waiting processes call event.wait(), optionally with a timeout, and block until the event has been set.
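A minimal sketch of that set()/wait() handshake, assuming the real work is replaced by placeholders. The names first_stage, later_stage, and run_stages are made up for the illustration, and the results queue exists only so the demo has something to print.

```python
import multiprocessing as mp

def first_stage(done):
    # Placeholder for the stage-one work (process1 in the question)
    done.set()  # signal that stage one has finished

def later_stage(done, name, results):
    done.wait()  # block until first_stage has called done.set()
    results.put(name)  # placeholder for the real stage-two work

def run_stages():
    done = mp.Event()
    results = mp.Queue()
    waiters = [
        mp.Process(target=later_stage, args=(done, name, results))
        for name in ("process2", "process3")
    ]
    for proc in waiters:
        proc.start()
    first = mp.Process(target=first_stage, args=(done,))
    first.start()
    finished = sorted(results.get() for _ in waiters)
    first.join()
    for proc in waiters:
        proc.join()
    return finished

if __name__ == "__main__":
    print(run_stages())
```

Note that in the question's code pool.map() already blocks until every process1 call has returned, so an Event is mainly useful when the later stages are started before the first one has finished.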
I want to run a function in an infinite loop.
Here is my code:
def do_request():
    # my code here
    print(result)
while True:
    do_request()
Using while True like this is a little slow, so I want to use a thread pool to execute do_request() concurrently. How can I do this?
It would work much like using ab (Apache Bench) to test an HTTP server.
Finally, I've solved this problem. I use a variable to limit the number of threads.
Here is my final code, which solved my problem:
import threading
import time
thread_num = 0
lock = threading.Lock()
def do_request():
    global thread_num
    # -------------
    # my code here
    # -------------
    with lock:
        thread_num -= 1
while True:
    if thread_num <= 50:
        with lock:
            thread_num += 1
        t = threading.Thread(target=do_request)
        t.start()
    else:
        time.sleep(0.01)
Thanks for all replies.
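The same cap can also be expressed with a semaphore instead of a hand-maintained counter, which removes the polling sleep. The following is only a sketch under assumptions: run_limited and MAX_THREADS are hypothetical names, and the loop is finite (total_calls) so that the example terminates, whereas the code above runs forever.

```python
import threading

MAX_THREADS = 50  # assumed cap, mirroring the limit used above

def run_limited(task, total_calls, max_threads=MAX_THREADS):
    """Call task() total_calls times with at most max_threads calls running at once."""
    slots = threading.BoundedSemaphore(max_threads)
    threads = []

    def worker():
        try:
            task()
        finally:
            slots.release()  # free the slot even if task() raises

    for _ in range(total_calls):
        slots.acquire()  # blocks while max_threads calls are in progress
        thread = threading.Thread(target=worker)
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
```

Calling run_limited(do_request, total_calls=1000) would then issue 1000 requests with no more than 50 in flight at any moment.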
You can use Python's threading module to implement this.
It can look something like this (using only two extra threads):
import threading
# define threads
task1 = threading.Thread(target = do_request)
task2 = threading.Thread(target = do_request)
# start both threads
task1.start()
task2.start()
# wait for threads to complete
task1.join()
task2.join()
Basically, you start as many threads as you need (make sure you don't get too many, so your system can handle it), then you .join() them to wait for tasks to complete.
Or you can get fancier with Python's multiprocessing module.
Try the following code:
import multiprocessing as mp
import time
def do_request():
    while True:
        print("I'm making requests")
        time.sleep(0.5)
p = mp.Process(target=do_request)
p.start()
for ii in range(10):
    print("I'm also doing other things though")
    time.sleep(0.7)
print('Now it is time to kill the service process')
p.terminate()
The main process starts a service process that makes the requests and keeps running for as long as it is needed; the main process then terminates it.
Maybe you can use concurrent.futures.ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
import time
def wait_on_b(hello):
    time.sleep(1)
    print(hello)  # prints the argument passed to submit() (3 here)
    return 5
def wait_on_a():
    time.sleep(1)
    print(a.result())  # blocks until a has finished, then prints its result (5)
    return 6
executor = ThreadPoolExecutor(max_workers=2)
a = executor.submit(wait_on_b, 3)
b = executor.submit(wait_on_a)
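Applied to the original question, a thread pool could run do_request() many times concurrently along these lines. This is a sketch, not a drop-in solution: run_requests is a hypothetical helper, and it issues a fixed number of calls rather than looping forever.

```python
from concurrent.futures import ThreadPoolExecutor

def run_requests(do_request, total, max_workers=8):
    """Run do_request() `total` times with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(do_request) for _ in range(total)]
        # result() re-raises any exception that do_request raised
        return [future.result() for future in futures]
```

The pool keeps max_workers threads alive and reuses them, so there is no per-request thread creation as in the hand-rolled version.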
How about this?
from threading import Thread, Event
class WorkerThread(Thread):
    def __init__(self, logger, func):
        Thread.__init__(self)
        self.stop_event = Event()
        self.logger = logger  # any callable that takes a message, e.g. print
        self.func = func
    def run(self):
        self.logger("Going to start the infinite loop...")
        # Keep calling func until stop_event is set
        while not self.stop_event.is_set():
            self.func()
concur_task = WorkerThread(logger, func=do_request)
concur_task.start()
To end this thread...
concur_task.stop_event.set()
concur_task.join(10)  # or any value you like
I don't know why I'm having such a problem with this. Basically, I want to have a Queue-based "Worker" that is constantly running during the program. However, every 10 seconds or so, another function called "Process" comes in and processes the data. Let's assume that data is captured for 10 seconds (0, 1, 2, 3, ..., n); the "Process" function then receives it, processes the data, and ends, and the "Worker" goes back to work until the program has ended.
I have the following code:
import multiprocessing as mp
import time
DELAY_SIZE = 10
def Worker(q):
    print "I'm working..."
def Process(q):
    print "I'm processing.."
queue = mp.Queue(maxsize=DELAY_SIZE)
p = mp.Process(target=Worker, args=(queue,))
p.start()
while True:
    d = queue.get()
    time.sleep(10)
    Process()
In this example, it would look like the following:
I'm working...
I'm working...
I'm working...
...
...
...
I'm working...
I'm processing...
I'm processing...
I'm processing...
...
...
I'm working..
I'm working..
Any ideas?
Here is an alternative way using threads:
import threading
import Queue
import time
class Worker(threading.Thread):
    def __init__(self, q):
        threading.Thread.__init__(self)
        self._q = q
    def run(self):
        # here, worker does its job
        # results are pushed to the shared queue
        while True:
            print 'I am working'
            time.sleep(1)
            result = time.time()  # just an example
            self._q.put(result)
def process(q):
    while True:
        if q.empty():
            time.sleep(10)
        print 'I am processing'
        worker_result = q.get()
        # do whatever you want with the result...
        print " ", worker_result
if __name__ == '__main__':
    shared_queue = Queue.Queue()
    worker = Worker(shared_queue)
    worker.start()
    process(shared_queue)
I have a simple example script constructed that defines three separate processes using multiprocessing in python. My objective is to have one parent thread that spawns two smaller threads that will collect and process data.
Currently, my implementation looks like this:
from Queue import Queue, Empty
from multiprocessing import Process
import time
import hashlib
class FillQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q
    def run(self):
        i = 0
        while i is not 5:
            print 'putting'
            self.q.put('foo')
            i += 1
        self.q.put('|STOP|')
class ConsumeQueue(Process):
    def __init__(self, q):
        Process.__init__(self)
        self.q = q
    def run(self):
        print 'Consume'
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'
class Ripper(Process):
    q = Queue()
    def __init__(self):
        self.fq = FillQueue(self.q)
        self.cq = ConsumeQueue(self.q)
        self.fq.daemon = True
        self.cq.daemon = True
    def run(self):
        try:
            self.fq.start()
            self.cq.start()
        except KeyboardInterrupt:
            print 'exit'
if __name__ == '__main__':
    r = Ripper()
    r.start()
As it runs presently, the output from the script on CLI looks like this:
putting
putting
putting
putting
putting
Consume
foo
foo
foo
foo
foo
|STOP|
done
Obviously, the way I am starting my two threads is blocking, since the consumer doesn't even begin to process the items in the queue until the filler finishes adding items.
How should I rewrite this to make both threads begin immediately and not block, so the consumer will simply pass to the Empty except block while there is no work to process, but will exit completely when it receives the stop message?
EDIT: typo, had the start and run methods mixed up
You seem to be starting multiple processes using multiprocessing.Process.
However, you are using Queue.Queue which is only threadsafe, and not designed to be used by multiple processes.
shevek's answer is valid as well, but as a start, you should replace Queue.Queue with multiprocessing.Queue.
Try this:
from Queue import Empty
from multiprocessing import Process, Queue
import time
import hashlib
class FillQueue(object):
    def __init__(self, q):
        self.q = q
    def run(self):
        i = 0
        while i < 5:
            print 'putting'
            self.q.put('foo %d' % i)
            i += 1
            time.sleep(.5)
        self.q.put('|STOP|')
class ConsumeQueue(object):
    def __init__(self, q):
        self.q = q
    def run(self):
        while True:
            try:
                value = self.q.get(False)
                print value
                if value == '|STOP|':
                    print 'done'
                    break
            except Empty:
                print 'Nothing to process atm'
                time.sleep(.2)
if __name__ == '__main__':
    q = Queue()
    f = FillQueue(q)
    c = ConsumeQueue(q)
    p1 = Process(target=f.run)
    p1.start()
    p2 = Process(target=c.run)
    p2.start()
    p1.join()
    p2.join()
I think your program works fine. Each process only runs for a short time slice at a time, but the time required to put all your items in the queue is very short, so there is no reason the filler cannot do it all within a single time slice, before the consumer gets to run.
If you add some delays in the filler, I think you will see that it actually works as you expect.
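The effect of such a delay, together with the '|STOP|' sentinel, can be tried out with a small Python 3 sketch. To keep it short it uses threads and queue.Queue instead of processes, and fill, consume, and run_pipeline are hypothetical names; with multiprocessing the consumer would need another way to report what it received, since a plain list is not shared between processes.

```python
import queue
import threading
import time

STOP = "|STOP|"  # sentinel value, as in the code above

def fill(q, count=5, delay=0.01):
    for i in range(count):
        q.put(f"foo {i}")
        time.sleep(delay)  # small delay so the consumer gets to run in between
    q.put(STOP)

def consume(q, seen):
    while True:
        try:
            value = q.get(timeout=0.05)
        except queue.Empty:
            continue  # nothing to process right now; poll again
        if value == STOP:
            break
        seen.append(value)

def run_pipeline():
    q = queue.Queue()
    seen = []
    workers = [
        threading.Thread(target=fill, args=(q,)),
        threading.Thread(target=consume, args=(q, seen)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return seen

print(run_pipeline())
```

Using get() with a timeout rather than get(False) avoids spinning at full speed while the queue is empty.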
This may have been asked in a similar context, but I was unable to find an answer after about 20 minutes of searching, so I will ask.
I have written a Python script (let's say scriptA.py) and a second script (let's say scriptB.py).
In scriptB I want to call scriptA multiple times with different arguments. Each run takes about an hour (it's a huge script that does lots of stuff; don't worry about it). I want to be able to run scriptA with all the different arguments simultaneously, but I need to wait until ALL of them are done before continuing. My code:
import subprocess
#setup
do_setup()
#run scriptA
subprocess.call(scriptA + argumentsA)
subprocess.call(scriptA + argumentsB)
subprocess.call(scriptA + argumentsC)
#finish
do_finish()
I want to run all the subprocess.call() calls at the same time and then wait until they are all done. How should I do this?
I tried to use threading like the example here:
from threading import Thread
import subprocess
def call_script(args):
    subprocess.call(args)
#run scriptA
t1 = Thread(target=call_script, args=(scriptA + argumentsA))
t2 = Thread(target=call_script, args=(scriptA + argumentsB))
t3 = Thread(target=call_script, args=(scriptA + argumentsC))
t1.start()
t2.start()
t3.start()
But I do not think this is right.
How do I know they have all finished running before going to my do_finish()?
Put the threads in a list and then use the join() method:
threads = []
t = Thread(...)
threads.append(t)
# ...repeat as often as necessary...
# Start all threads
for x in threads:
    x.start()
# Wait for all of them to finish
for x in threads:
    x.join()
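A runnable version of this pattern might look like the sketch below. The child commands are hypothetical stand-ins for "scriptA plus arguments" (each one just exits with a given code), and run_all and the return_codes list are names introduced for the illustration.

```python
import subprocess
import sys
from threading import Thread

def call_script(args, return_codes, index):
    # subprocess.call blocks this thread until the child process exits
    return_codes[index] = subprocess.call(args)

def run_all(commands):
    return_codes = [None] * len(commands)
    threads = [
        Thread(target=call_script, args=(cmd, return_codes, i))
        for i, cmd in enumerate(commands)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # returns only once that thread's child has exited
    return return_codes

if __name__ == "__main__":
    # Hypothetical commands: each child simply exits with the given code
    commands = [[sys.executable, "-c", f"import sys; sys.exit({n})"] for n in (0, 1, 2)]
    print(run_all(commands))
```

Once run_all() returns, every child has finished, so do_finish() can safely follow it.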
You need to call the join method of each Thread object at the end of the script. Note that args must be a tuple, hence the trailing comma.
t1 = Thread(target=call_script, args=(scriptA + argumentsA,))
t2 = Thread(target=call_script, args=(scriptA + argumentsB,))
t3 = Thread(target=call_script, args=(scriptA + argumentsC,))
t1.start()
t2.start()
t3.start()
t1.join()
t2.join()
t3.join()
Thus the main thread will wait till t1, t2 and t3 finish execution.
Since Python 3.2 there is another way to achieve the same result, which I personally prefer to the traditional thread creation/start/join: the concurrent.futures package: https://docs.python.org/3/library/concurrent.futures.html
Using a ThreadPoolExecutor, the code would be:
from concurrent.futures import ThreadPoolExecutor
import time
def call_script(ordinal, arg):
    print('Thread', ordinal, 'argument:', arg)
    time.sleep(2)
    print('Thread', ordinal, 'Finished')
args = ['argumentsA', 'argumentsB', 'argumentsC']
with ThreadPoolExecutor(max_workers=2) as executor:
    ordinal = 1
    for arg in args:
        executor.submit(call_script, ordinal, arg)
        ordinal += 1
print('All tasks has been finished')
The output of the previous code is something like:
Thread 1 argument: argumentsA
Thread 2 argument: argumentsB
Thread 1 Finished
Thread 2 Finished
Thread 3 argument: argumentsC
Thread 3 Finished
All tasks has been finished
One of the advantages is that you can control the throughput by setting the maximum number of concurrent workers.
To use multiprocessing instead, you can use ProcessPoolExecutor.
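For the process-based variant, a minimal sketch could look like this. It assumes the work can be expressed as a module-level function (here a placeholder call_script that just returns a string), since the function is sent to the worker processes by reference; run_all is a hypothetical helper name.

```python
from concurrent.futures import ProcessPoolExecutor

def call_script(arg):
    # Placeholder for the real work; returns a value the parent can collect
    return f"{arg} finished"

def run_all(args, max_workers=2):
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the order of args, not in completion order
        return list(executor.map(call_script, args))

if __name__ == "__main__":
    print(run_all(["argumentsA", "argumentsB", "argumentsC"]))
```

The if __name__ == "__main__" guard matters here: on platforms that start workers by re-importing the main module, unguarded top-level code would run again in every worker.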
I prefer using list comprehensions based on an input list:
inputs = [scriptA + argumentsA, scriptA + argumentsB, ...]
threads = [Thread(target=call_script, args=(i,)) for i in inputs]
[t.start() for t in threads]
[t.join() for t in threads]
You can have a class like the one below, to which you can add any number of functions or console scripts that you want to execute in parallel, then start the execution and wait for all jobs to complete.
from multiprocessing import Process
class ProcessParallel(object):
    """
    Runs the given functions in parallel processes
    """
    def __init__(self, *jobs):
        """
        Stores the functions to run
        """
        self.jobs = jobs
        self.processes = []
    def fork_processes(self):
        """
        Creates the process objects for the given function delegates
        """
        for job in self.jobs:
            proc = Process(target=job)
            self.processes.append(proc)
    def start_all(self):
        """
        Starts all the function processes together.
        """
        for proc in self.processes:
            proc.start()
    def join_all(self):
        """
        Waits until all the functions have executed.
        """
        for proc in self.processes:
            proc.join()
def two_sum(a=2, b=2):
    return a + b
def multiply(a=2, b=2):
    return a * b
# How to run:
if __name__ == '__main__':
    # note: two_sum and multiply can be replaced with any Python console scripts
    # that you want to run in parallel
    procs = ProcessParallel(two_sum, multiply)
    # Add all the processes to the list
    procs.fork_processes()
    # Start process execution
    procs.start_all()
    # Wait until all the processes have finished
    procs.join_all()
I just came across the same problem, where I needed to wait for all the threads that were created in a for loop. I tried the following piece of code. It may not be the perfect solution, but I thought it would be a simple one to test:
for t in threading.enumerate():
    try:
        t.join()
    except RuntimeError as err:
        if 'cannot join current thread' in str(err):
            continue
        else:
            raise
From the threading module documentation:
There is a “main thread” object; this corresponds to the initial
thread of control in the Python program. It is not a daemon thread.
There is the possibility that “dummy thread objects” are created.
These are thread objects corresponding to “alien threads”, which are
threads of control started outside the threading module, such as
directly from C code. Dummy thread objects have limited functionality;
they are always considered alive and daemonic, and cannot be join()ed.
They are never deleted, since it is impossible to detect the
termination of alien threads.
So, to catch those two cases when you are not interested in keeping a list of the threads you create:
import threading as thrd
def alter_data(data, index):
    data[index] *= 2
data = [0, 2, 6, 20]
for i, value in enumerate(data):
    thrd.Thread(target=alter_data, args=[data, i]).start()
for thread in thrd.enumerate():
    if thread.daemon:
        continue
    try:
        thread.join()
    except RuntimeError as err:
        if 'cannot join current thread' in err.args[0]:
            # catches main thread
            continue
        else:
            raise
Whereupon:
>>> print(data)
[0, 4, 12, 40]
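As an alternative to matching on the error message, the calling thread can be identified up front with threading.current_thread() and simply skipped. The sketch below assumes that approach; join_other_threads and double_all are hypothetical names, and the main thread is skipped as well so that the helper stays safe when called from a worker thread.

```python
import threading

def join_other_threads():
    """Join every live non-daemon thread except the caller and the main thread."""
    current = threading.current_thread()
    main = threading.main_thread()
    for thread in threading.enumerate():
        if thread is current or thread is main or thread.daemon:
            continue
        thread.join()

def alter_data(data, index):
    data[index] *= 2

def double_all(data):
    for i in range(len(data)):
        threading.Thread(target=alter_data, args=(data, i)).start()
    join_other_threads()
    return data

print(double_all([0, 2, 6, 20]))
```

This avoids depending on the exact wording of the RuntimeError, which is not a documented part of the interface.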
Maybe something like:
for t in threading.enumerate():
    if t.daemon:
        t.join()
Relying on join() alone can give a false positive, because join() returning does not by itself mean the thread has finished. As the docs say:
When the timeout argument is present and not None, it should be a
floating point number specifying a timeout for the operation in
seconds (or fractions thereof). As join() always returns None, you
must call isAlive() after join() to decide whether a timeout happened
– if the thread is still alive, the join() call timed out.
And an illustrative piece of code:
threads = []
for name in some_data:
    new = threading.Thread(
        target=self.some_func,
        args=(name,)
    )
    threads.append(new)
    new.start()
over_threads = iter(threads)
curr_th = next(over_threads)
while True:
    curr_th.join()
    if curr_th.is_alive():
        continue
    try:
        curr_th = next(over_threads)
    except StopIteration:
        break
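The is_alive() check only matters when join() is given a timeout. A small sketch of that case, with join_with_timeout as a hypothetical helper name:

```python
import threading

def join_with_timeout(threads, timeout):
    """Wait up to `timeout` seconds for each thread; return those still running."""
    still_running = []
    for thread in threads:
        thread.join(timeout)  # returns None whether or not the thread finished
        if thread.is_alive():  # still alive means the join timed out
            still_running.append(thread)
    return still_running
```

The caller can then decide whether to keep waiting on the returned threads, log them, or give up.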