I can't seem to figure out why my results are not appending while using the multiprocessing package.
I've looked at many similar questions but can't seem to figure out what I'm doing wrong. This is my first attempt at multiprocessing (as you might be able to tell), so I don't quite understand all the jargon in the documentation, which might be part of the problem.
Running this in PyCharm prints an empty list instead of the desired list of row sums.
import numpy as np
from multiprocessing import Pool
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []

tic = timeit.default_timer()  # start timer
pool = Pool(3)
if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, row, callback=results.append)
toc = timeit.default_timer()  # end timer
print(toc - tic)
print(results)
EDIT: Closing and joining the pool, then printing results within the if __name__ == '__main__' block, results in the following error being raised repeatedly until I manually stop execution:
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
Code to reproduce error:
import numpy as np
from multiprocessing import Pool, freeze_support
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []

tic = timeit.default_timer()  # start timer
pool = Pool(3)
if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, (row,), callback=results.append)
    pool.close()
    pool.join()
    print(results)
toc = timeit.default_timer()  # end timer
print(toc - tic)
I think this would be a more correct way:
import numpy as np
from multiprocessing import Pool, TimeoutError

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []
if __name__ == '__main__':
    with Pool(processes=3) as pool:
        # Submit every row first so the tasks actually run in parallel,
        # then collect each result.
        async_results = [pool.apply_async(add_these, (row,)) for row in data]
        for async_result in async_results:
            try:
                results.append(async_result.get(timeout=1))
            except TimeoutError:
                print("Multiprocessing Timeout")
    print(results)
Related
Is it possible to have a progress bar with map_async from multiprocessing?
A toy example:
from multiprocessing import Pool
import tqdm

def f(x):
    print(x)
    return x * x

n_job = 4
with Pool(processes=n_job) as pool:
    results = pool.map_async(f, range(10)).get()
print(results)
I'd like something like this:
data = []
with Pool(processes=10) as pool:
    for d in tqdm.tqdm(
            pool.imap(f, range(10)),
            total=10):
        data.append(d)
There are a couple of ways of achieving what you want that I can think of:
Use apply_async with a callback argument to update the progress bar as each result becomes available.
Use imap and as you iterate the results you can update the progress bar.
There is a slight problem with imap: the results are returned in task-submission order, which is of course what you want. But that order does not necessarily reflect the order in which the submitted tasks complete, so the progress bar is not necessarily updated as frequently as it otherwise might be. Still, I will show that solution first, since it is the simplest and probably adequate:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x * x

# Required by Windows:
if __name__ == '__main__':
    pool_size = 4
    results = []
    with Pool(processes=pool_size) as pool:
        with tqdm.tqdm(total=10) as pbar:
            for result in pool.imap(f, range(10)):
                results.append(result)
                pbar.update()
    print(results)
The solution that uses apply_async:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x * x

# Required by Windows:
if __name__ == '__main__':
    def my_callback(_):
        # We don't care about the actual result.
        # Just update the progress bar:
        pbar.update()

    pool_size = 4
    with Pool(processes=pool_size) as pool:
        with tqdm.tqdm(total=10) as pbar:
            async_results = [pool.apply_async(f, args=(x,), callback=my_callback) for x in range(10)]
            results = [async_result.get() for async_result in async_results]
    print(results)
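If the bar should advance at the true completion rate regardless of result order, a variant not shown above (my sketch, under the same assumptions as the demos) is imap_unordered, which yields each result as soon as its task finishes; you then have to restore order yourself if it matters:

from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x * x

if __name__ == '__main__':
    results = []
    with Pool(processes=4) as pool:
        with tqdm.tqdm(total=10) as pbar:
            # Results arrive in completion order, so the bar ticks
            # as soon as any task finishes.
            for result in pool.imap_unordered(f, range(10)):
                results.append(result)
                pbar.update()
    print(sorted(results))  # completion order was lost; sort (or tag the inputs) if order matters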
I think this is it:
from multiprocessing import Pool
import tqdm

def f(x):
    return x * x

n_job = 4
data = []
with Pool(processes=n_job) as pool:
    # Note: .get() blocks until every task is done, so the bar only
    # tracks iteration over the finished results, not live progress.
    for d in tqdm.tqdm(
            pool.map_async(f, range(10)).get(),
            total=10):
        data.append(d)
print(data)
I'm trying to learn how to use threading, specifically concurrent.futures.ThreadPoolExecutor, because I need to return a numpy.array from a function I want to run concurrently.
The end goal is to have one process running a video loop of an application while another process does object detection and GUI interactions. The result() method from the concurrent.futures library allows me to do this.
The issue is that my code runs once and then seems to lock up. I'm actually unsure what happens, because when I step through it in the debugger it runs once, then the debugger goes blank; I literally cannot step through, and no error is thrown.
The code appears to lock up on the line: notepadWindow = pygetwindow.getWindowsWithTitle('Notepad')[0]
I get exactly one loop: the print statement prints once, the loop restarts, and then it halts at pygetwindow.
I don't know much about the GIL, but I have tried using the max_workers=1 argument on ThreadPoolExecutor(), which doesn't make a difference either way, and I was under the impression that concurrent.futures would allow me to bypass the lock.
How do I run videoLoop as a single thread, making sure to return DetectionWindow every iteration?
import cv2 as cv
import numpy as np
import concurrent.futures
from PIL import ImageGrab
import pygetwindow

def videoLoop():
    notepadWindow = pygetwindow.getWindowsWithTitle('Notepad')[0]
    x1 = notepadWindow.left
    y1 = notepadWindow.top
    height = notepadWindow.height
    width = notepadWindow.width
    x2 = x1 + width
    y2 = y1 + height
    haystack_img = ImageGrab.grab(bbox=(x1, y1, x2, y2))
    haystack_img_np = np.array(haystack_img)
    DetectionWindow = cv.cvtColor(haystack_img_np, cv.COLOR_BGR2GRAY)
    return DetectionWindow

def f1():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        f1 = executor.submit(videoLoop)
        notepadWindow = f1.result()
        cv.imshow("Video Loop", notepadWindow)
        cv.waitKey(1)
        print(f1.result())

while True:
    f1()
A ThreadPoolExecutor won't help you an awful lot here if you want a continuous stream of frames.
Here's a reworking of your code that uses a regular old threading.Thread and puts frames (and their capture timestamps, since this is asynchronous) in a queue.Queue you can then read in another (or the main) thread.
The thread has an otherwise infinite loop that can be stopped by setting the thread's exit_signal.
(I didn't test this, since I'm presently on a Mac, so there may be typos or other problems.)
import queue
import time
import cv2 as cv
import numpy as np
import threading
from PIL import ImageGrab
import pygetwindow

def do_capture():
    notepadWindow = pygetwindow.getWindowsWithTitle("Notepad")[0]
    x1 = notepadWindow.left
    y1 = notepadWindow.top
    height = notepadWindow.height
    width = notepadWindow.width
    x2 = x1 + width
    y2 = y1 + height
    haystack_img = ImageGrab.grab(bbox=(x1, y1, x2, y2))
    return cv.cvtColor(np.array(haystack_img), cv.COLOR_BGR2GRAY)

class VideoCaptureThread(threading.Thread):
    def __init__(self, result_queue: queue.Queue) -> None:
        super().__init__()
        self.exit_signal = threading.Event()
        self.result_queue = result_queue

    def run(self) -> None:
        while not self.exit_signal.wait(0.05):
            try:
                result = do_capture()
                self.result_queue.put((time.time(), result))
            except Exception as exc:
                print(f"Failed capture: {exc}")

def process_frames(result_queue: queue.Queue):
    start_time = time.time()
    while time.time() - start_time < 5:  # Run for five seconds
        frame = result_queue.get()
        print(frame)

def main():
    result_queue = queue.Queue()
    thread = VideoCaptureThread(result_queue=result_queue)
    thread.start()
    process_frames(result_queue)
    thread.exit_signal.set()
    thread.join()

if __name__ == "__main__":
    main()
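One small hardening, my suggestion rather than part of the answer above: process_frames blocks forever on result_queue.get() if the capture thread dies, so a timeout keeps the consumer responsive. A sketch of the same loop with that change:

import queue
import time

def process_frames(result_queue: queue.Queue, run_seconds: float = 5.0):
    start_time = time.time()
    while time.time() - start_time < run_seconds:
        try:
            # Wait at most one second so a dead producer can't hang us.
            timestamp, frame = result_queue.get(timeout=1)
        except queue.Empty:
            continue
        print(timestamp, frame.shape)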
I'm writing some bogus practice code so I can implement the ideas once I have a better idea of what I'm doing. The code is designed for multiprocessing in order to reduce the runtime by splitting the stack of three arrays into several pieces horizontally, which are then executed in parallel via map_async. However, the code seems to hang at the first .recv() call on a pipe, even though the Pipe() objects are fully defined, and I'm not sure why. If I manually define each Pipe() object, the code works just fine, but as soon as I create them in a loop, the code hangs at sec1 = pipes[0][0].recv(). How can I fix this?
from multiprocessing import Process, Pipe
import multiprocessing as mp
import numpy as np
import math

num_sections = 4

pipes_send = [None] * num_sections
pipes_recv = [None] * num_sections
pipes = zip(pipes_recv, pipes_send)
for i in range(num_sections):
    pipes[i] = list(pipes[i])
for i in range(num_sections):
    pipes[i][0], pipes[i][1] = Pipe()

def f(sec_num):
    for plane in range(3):
        hist_sec[sec_num][plane] += rand_sec[sec_num][plane]
    if sec_num == 0:
        pipes[0][1].send(hist_sec[sec_num])
        pipes[0][1].close()
    if sec_num == 1:
        pipes[1][1].send(hist_sec[sec_num])
        pipes[1][1].close()
    if sec_num == 2:
        pipes[2][1].send(hist_sec[sec_num])
        pipes[2][1].close()
    if sec_num == 3:
        pipes[3][1].send(hist_sec[sec_num])
        pipes[3][1].close()

hist = np.zeros((3, 512, 512))
hist_sec = []
randmat = np.random.rand(3, 512, 512)
rand_sec = []
for plane in range(3):
    hist_div = np.array_split(hist[plane], num_sections)
    hist_sec.append(hist_div)
    randmatsplit = np.array_split(randmat[plane], num_sections)
    rand_sec.append(randmatsplit)
hist_sec = np.rollaxis(np.asarray(hist_sec), 1, 0)
rand_sec = np.rollaxis(np.asarray(rand_sec), 1, 0)

if __name__ == '__main__':
    pool = mp.Pool(num_sections)
    args = np.arange(num_sections)
    pool.map_async(f, args, chunksize=1)
    sec1 = pipes[0][0].recv()
    sec2 = pipes[1][0].recv()
    sec3 = pipes[2][0].recv()
    sec4 = pipes[3][0].recv()
    for plane in range(3):
        hist_plane = np.concatenate((sec1[plane], sec2[plane], sec3[plane], sec4[plane]), axis=0)
        hist_full.append(hist_plane)
    pool.close()
    pool.join()
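(A note not from the original thread.) One common way to sidestep the hang entirely is to drop the hand-rolled pipes: a Pool can return each worker's section directly, and map delivers the results in submission order, so nothing has to be sent back manually. A minimal sketch under those assumptions:

import numpy as np
from multiprocessing import Pool

num_sections = 4

def add_section(args):
    hist_part, rand_part = args
    # Each worker returns its summed section instead of sending it
    # through a pipe.
    return hist_part + rand_part

if __name__ == '__main__':
    hist = np.zeros((3, 512, 512))
    randmat = np.random.rand(3, 512, 512)
    # Split along the row axis into one piece per worker.
    hist_parts = np.array_split(hist, num_sections, axis=1)
    rand_parts = np.array_split(randmat, num_sections, axis=1)
    with Pool(num_sections) as pool:
        sections = pool.map(add_section, zip(hist_parts, rand_parts))
    hist_full = np.concatenate(sections, axis=1)
    print(hist_full.shape)  # (3, 512, 512)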
Is it possible to create a Mayavi visualization that is updated on a timed basis rather than through Trait events?
I have a visualization that I need to update continually, but the data I am updating comes from an external source (i.e. not an event from user input in the graphical interface).
At the same time, I need to be running a separate thread, so the Mayavi visualization can't control the main loop.
Can this be done? And if so, how?
Any help would be greatly appreciated.
Some dummy code for how I'm trying to tackle this is below:
import numpy
from mayavi.sources.array_source import ArraySource
from pyface.api import GUI
from mayavi.modules.api import Surface
from mayavi.api import Engine
import threading
import time

# Class runs a given function on a given thread at a given scan time
class TimedThread(threading.Thread):
    def __init__(self, thread, scan_time, funct, *funct_args):
        threading.Thread.__init__(self)
        # Thread condition for the function to operate with
        self.thread = thread
        # Defines the scan time the function is to be run at
        self.scan_time = scan_time
        # Function to be run
        self.run_function = funct
        # Function arguments
        self.funct_args = funct_args

    def run(self):
        while True:
            # Locks the relevant thread
            self.thread.acquire()
            # Begins timer for elapsed time calculation
            start_time = time.time()
            # Runs the function that was passed to the thread
            self.run_function(*self.funct_args)
            # Wakes up relevant threads to listen for the thread release
            self.thread.notify_all()
            # Releases thread
            self.thread.release()
            # Calculates the elapsed process time & sleeps for the remainder of the scan time
            end_time = time.time()
            elapsed_time = end_time - start_time
            sleep_time = self.scan_time - elapsed_time
            if sleep_time > 0:
                time.sleep(sleep_time)
            else:
                print('Process time exceeds scan time')

# Function to update the visualisation
def update_visualisation(source):
    print("Updating Visualization...")
    # Pretend the data is being updated externally
    x = numpy.array([0, numpy.random.rand()])
    y = z = x
    data = [x, y, z]
    source.scalar_data = data
    GUI.invoke_later(source.update)

# Function to run the visualisation
def run_main():
    print('Running Main Controller')

if __name__ == '__main__':
    c = threading.Condition()
    # Create a new Engine for Mayavi and start it
    engine = Engine()
    engine.start()
    # Create a new Scene
    engine.new_scene()
    # Create the data
    x = numpy.linspace(0, 10, 2)
    y = z = x
    data = [x, y, z]
    # Create a new Source, map the data to the source and add it to the Engine
    src = ArraySource()
    src.scalar_data = data
    engine.add_source(src)
    # Create a Module
    surf = Surface()
    # Add the Module to the Engine
    engine.add_module(surf)
    # Create timed thread classes
    visualisation_thread = TimedThread(c, 2.0, update_visualisation, src)
    main_thread = TimedThread(c, 1.0, run_main)
    # Start & join the threads
    main_thread.start()
    visualisation_thread.start()
    main_thread.join()
    visualisation_thread.join()
Found the solution in the following link:
Animating a mayavi points3d plot
Solved by using the @mlab.animate decorator to call the update function, with a yield statement to hand control back to the animator so the scene still allows user manipulation.
Solution below:
import numpy as np
import threading
import time
from mayavi import mlab
from mayavi.api import Engine

# Class runs a given function on a given thread at a given scan time
class SafeTimedThread(threading.Thread):
    def __init__(self, thread_condition, scan_time, funct, *funct_args):
        threading.Thread.__init__(self)
        # Thread condition for the function to operate with
        self.tc = thread_condition
        # Defines the scan time the function is to be run at
        self.scan_time = scan_time
        # Function to be run
        self.run_function = funct
        # Function arguments
        self.funct_args = funct_args

    def run(self):
        while True:
            # Locks the relevant thread
            self.tc.acquire()
            # Begins timer for elapsed time calculation
            start_time = time.time()
            # Runs the function that was passed to the thread
            self.run_function(*self.funct_args)
            # Wakes up relevant threads to listen for the thread release
            self.tc.notify_all()
            # Releases thread
            self.tc.release()
            # Calculates the elapsed process time & sleeps for the remainder of the scan time
            end_time = time.time()
            elapsed_time = end_time - start_time
            sleep_time = self.scan_time - elapsed_time
            if sleep_time > 0:
                time.sleep(sleep_time)
            else:
                print('Process time exceeds scan time')

# Function to run the main controller
def run_main():
    print('Running Main Controller')

def init_vis():
    # Creates a new Engine, starts it and creates a new scene
    engine = Engine()
    engine.start()
    engine.new_scene()
    # Initialise Plot
    data = np.random.random((3, 2))
    x = data[0]
    y = data[1]
    z = data[2]
    drawing = mlab.plot3d(x, y, z, np.ones_like(x))
    return drawing

@mlab.animate(delay=500, ui=False)
def update_visualisation(drawing):
    while True:
        print('Updating Visualisation')
        # Pretend to receive data from external source
        data = np.random.random((3, 2))
        x = data[0]
        y = data[1]
        z = data[2]
        drawing.mlab_source.set(x=x, y=y, z=z)
        yield

if __name__ == '__main__':
    # Create Condition for Safe Threading
    c = threading.Condition()
    # Create display window
    dwg = init_vis()
    # Create safe timed thread for the main controller and start it
    main_thread = SafeTimedThread(c, 1.0, run_main)
    main_thread.start()
    # Update using the mlab animator
    vis_thread = update_visualisation(dwg)
    mlab.show()
I have code like this:

import numpy as np
from multiprocessing import Process, Pool

x = 3
y = 3
z = 10

ar = np.zeros((x, y, z))

para = []
process = []

def local_func(section):
    print("section %s" % str(section))
    ar[2, 2, section] = 255
    print("value set %d" % ar[2, 2, section])

pool = Pool(1)
run_list = range(0, 10)
list_of_results = pool.map(local_func, run_list)

print(ar)

The value in ar was not changed with multithreading. What might be wrong?
Thanks
You're using multiple processes here, not multiple threads. Because of that, each instance of local_func gets its own separate copy of ar. You can use a custom Manager to create a shared numpy array, which you can pass to each child process and get the results you expect:
import numpy as np
from functools import partial
from multiprocessing import Process, Pool
import multiprocessing.managers

x = 3
y = 3
z = 10

class MyManager(multiprocessing.managers.BaseManager):
    pass

MyManager.register('np_zeros', np.zeros, multiprocessing.managers.ArrayProxy)

para = []
process = []

def local_func(ar, section):
    print("section %s" % str(section))
    ar[2, 2, section] = 255
    print("value set %d" % ar[2, 2, section])

if __name__ == "__main__":
    m = MyManager()
    m.start()
    ar = m.np_zeros((x, y, z))

    pool = Pool(1)
    run_list = range(0, 10)
    func = partial(local_func, ar)
    list_of_results = pool.map(func, run_list)

    print(ar)
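Not from this thread, but worth knowing: another common pattern skips the Manager round-trip by backing the array with shared memory and re-wrapping it with numpy in each worker. A sketch using multiprocessing.RawArray (assuming a float64 array of the same shape):

import numpy as np
from multiprocessing import Pool, RawArray

x, y, z = 3, 3, 10
shared = RawArray('d', x * y * z)  # flat buffer of doubles in shared memory

def init_worker(buf):
    # Re-wrap the shared buffer as a numpy array in each worker process.
    global ar
    ar = np.frombuffer(buf, dtype=np.float64).reshape(x, y, z)

def local_func(section):
    ar[2, 2, section] = 255  # writes land in the shared buffer

if __name__ == '__main__':
    with Pool(1, initializer=init_worker, initargs=(shared,)) as pool:
        pool.map(local_func, range(z))
    result = np.frombuffer(shared, dtype=np.float64).reshape(x, y, z)
    print(result[2, 2])  # all 255s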
Well, multi-threading and multi-processing are different things.
With multi-threading threads share access to the same array.
With multi-processing each process has its own copy of the array.
multiprocessing.Pool is a process pool, not a thread pool.
If you want a thread pool, use multiprocessing.pool.ThreadPool:
Replace:
from multiprocessing import Pool
with:
from multiprocessing.pool import ThreadPool as Pool
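With that swap, the original example mutates the shared array as expected, since threads live in one process and share its memory. A minimal sketch (Python 3 syntax assumed):

import numpy as np
from multiprocessing.pool import ThreadPool as Pool

x, y, z = 3, 3, 10
ar = np.zeros((x, y, z))

def local_func(section):
    # Threads share memory, so this writes into the one real `ar`.
    ar[2, 2, section] = 255

if __name__ == '__main__':
    with Pool(4) as pool:
        pool.map(local_func, range(z))
    print(ar[2, 2])  # all 255s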