Multiprocessing - pass shared Queue and unique number for each worker - python

I can't quite find a solution for code where I pass each worker a shared Queue and also a unique number per worker.
My code:
The idea is to create several channels for playing audio songs. Each channel must be unique, so when a song arrives I put it on whichever channel is available.
from multiprocessing import Pool, Queue
from functools import partial
import pygame

queue = Queue()

def play_song(shared_queue, chnl):
    channel = pygame.mixer.Channel(chnl)
    while True:
        sound_name = shared_queue.get()
        channel.play(pygame.mixer.Sound(sound_name))

if __name__ == "__main__":
    channels = [0, 1, 2, 3, 4]
    func = partial(play_song, queue)
    p = Pool(5, func, (channels,))
This code of course doesn't raise any error, because it's multiprocessing, but the problem is that channels is passed to play_song as the whole list instead of being mapped across the workers.
So basically, instead of each worker initializing its channel like this:
channel = pygame.mixer.Channel(0)  # each worker would get one number from the list, so 0, 1, 2, 3, 4
I am getting this:
channel = pygame.mixer.Channel([0, 1, 2, 3, 4])  # for each worker
I tried playing with the partial function, but without success.
I was successful with the pool.map function, but while I could pass an individual number from the channels list to each worker, I couldn't share the Queue among the workers.

Eventually I found a solution to my Pygame problem that does not require threads or multiprocessing.
Background to the problem:
I was working with PyAudio, and since it is quite a low-level API for audio, I had problems mixing several sounds at the same time (and in general). The reasons are:
1) It is not easy (maybe impossible) to start several streams at the same time or to feed those streams at the same time (it looks like a hardware limitation).
2) Based on 1) I tried a different approach: have one stream where the audio waves from the different sounds are summed before entering the stream. That works, but it is unreliable, since summing waveforms doesn't really scale - adding too many waves results in 'sound cracking' because the amplitudes get too high (see the sketch after this list).
Because of 1) and 2) I wanted to try running the streams in different processes, hence this question.
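To illustrate 2), here is a minimal numpy sketch (the sine waves are just placeholders for real sounds): summing several full-scale int16 waveforms exceeds the int16 range, and the hard clipping that follows is what produces the 'cracking'.

import numpy as np

rate = 44100
t = np.linspace(0, 1, rate, endpoint=False)
# three full-scale int16 sine waves standing in for three sounds
waves = [(32767 * np.sin(2 * np.pi * f * t)).astype(np.int16) for f in (220, 440, 880)]

naive_mix = sum(w.astype(np.int32) for w in waves)              # the peak is well above 32767
clipped = np.clip(naive_mix, -32768, 32767).astype(np.int16)    # hard clipping -> audible cracking
scaled = (naive_mix / len(waves)).astype(np.int16)              # scaling avoids clipping but lowers volume
print(naive_mix.max(), clipped.max(), scaled.max())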
Pygame solution (single process):
for sound_file in sound_files:
    available_channel = pygame.mixer.find_channel()  # if there are 8 channels, it can play 8 sounds at the same time
    available_channel.play(sound_file)
If the sound_files are already loaded, this gives near-simultaneous results.
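For reference, a minimal sketch of that setup with the mixer initialised and the sounds preloaded up front (the file names are placeholders, and 8 channels is just pygame's default):

import pygame

pygame.mixer.init()
pygame.mixer.set_num_channels(8)   # 8 channels -> up to 8 sounds playing at once

# preload once, so play() does not pay the cost of decoding the file every time
sound_files = [pygame.mixer.Sound(name) for name in ("kick.wav", "snare.wav", "hat.wav")]

for sound_file in sound_files:
    available_channel = pygame.mixer.find_channel()   # first free channel, or None if all are busy
    if available_channel is not None:
        available_channel.play(sound_file)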
Multiprocessing solution
Thanks to Darkonaut, who pointed out the multiprocessing method, I managed to answer my initial question on multiprocessing, which I think is already answered on Stack Overflow, but I will include it anyway.
The example is not finished, because I didn't use it in the end, but it answers my initial requirement: processes with a shared queue but with different parameters.
import multiprocessing as mp

shared_queue = mp.Queue()

def channel(que, channel_num):
    que.put(channel_num)

if __name__ == '__main__':
    processes = [mp.Process(target=channel, args=(shared_queue, channel_num)) for channel_num in range(8)]
    for p in processes:
        p.start()
    for i in range(8):
        print(shared_queue.get())
    for p in processes:
        p.join()
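For completeness, the original Pool question can also be solved by handing the shared queue to each worker through the Pool initializer and mapping over the channel numbers. This is only a sketch I did not end up using; the file names and the simplified play_song body are placeholders.

from multiprocessing import Pool, Queue

def init_worker(shared_queue):
    # runs once in every worker process; stores the queue as a module-level global
    global queue
    queue = shared_queue

def play_song(chnl):
    # placeholder body: the real code would loop on queue.get() and play on channel chnl
    return chnl, queue.get()

if __name__ == "__main__":
    q = Queue()
    for name in ("a.wav", "b.wav", "c.wav", "d.wav", "e.wav"):
        q.put(name)
    with Pool(5, initializer=init_worker, initargs=(q,)) as pool:
        print(pool.map(play_song, [0, 1, 2, 3, 4]))   # each task gets its own channel number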

Related

How can I remove duplicates while multiprocessing?

I am very new to multiprocessing and I am only using it to find an image on the screen. The problem is that the code produces duplicates, which slow it down. I have tried using a "not in" check to only append proc to processes if it is not already in it, but this did not work. Any help or optimization would be welcome; I have no idea what I am doing, as this is just a personal project to learn multiprocessing.
from multiprocessing.context import Process
import pyautogui as auto

screenWidth, screenHeight = auto.size()
currentMouseX, currentMouseY = auto.position()

def bot(aim):
    while True:
        for aim in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
            auto.click(aim)
            print(aim)

def bot2(aim):
    while True:
        for aim in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
            auto.click(aim)
            print(aim)

def bot3(aim):
    while True:
        for aim in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
            auto.click(aim)
            print(aim)

if __name__ == "__main__":
    processes = []
    for t in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
        proc = Process(target=bot, args=(t,))
        processes.append(proc)
        proc.start()
    for z in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
        proc = Process(target=bot2, args=(z,))
        processes.append(proc)
        proc.start()
    for x in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
        proc = Process(target=bot3, args=(x,))
        processes.append(proc)
        proc.start()
    for p in processes:
        p.join()
Unless my eyes deceive me, you have three functions bot, bot2 and bot3 that appear to be identical. You have to ask yourself why you need three identical functions that differ only in a name. I certainly don't have an answer.
Presumably auto.locateAllOnScreen returns the locations of all occurrences of "dot.png" on your screen, and you would like to print out information on each occurrence in parallel. Your main process iterates over all of these occurrences 3 times and starts a new process for each occurrence. Then each process totally ignores the occurrence argument, aim, that is passed to it and instead iterates over all the occurrences itself. So if there were 5 occurrences on the screen, you would be creating 3 * 5 = 15 processes, and each process would print 5 lines of output (one for each occurrence), for a total of 15 * 5 = 75 lines of output, when in reality you should only be getting 5 lines of output if you were doing this correctly (I am ignoring that there is a while True: loop in which all the output is then repeated). You are also potentially creating more processes than the number of CPU cores on your computer, so they would not truly be running in parallel, on the assumption that the bot function(s) are CPU-intensive, which may not be the case.
I am not sure whether this problem is a candidate for multiprocessing, since there is a fair amount of overhead just to create processes and to pass arguments and results from one process to another, so you might not gain any improvement in performance. But if the idea is to see how you would solve this using multiprocessing, then I would suggest the following: since you do not know in advance how many elements the call to auto.locateAllOnScreen might return, and there is no point in creating more processes than the number of processors you actually have, it is probably best to use a multiprocessing pool of fixed size.
What you want to do is have your worker function bot (and you only need one of these) be passed a single occurrence that it will process. You then create a pool of processes whose size is the smaller of the number of CPUs you have and the number of tasks you actually have to submit. Finally, you submit to the pool a number of tasks, where each task specifies the worker function and the argument(s) it requires.
In the code below I have removed from function bot the while True: loop that never terminates. You can put it back in if you want.
from multiprocessing import Pool, cpu_count
import pyautogui as auto

def bot(aim):
    # do the work for the single occurrence of aim
    auto.click(aim)
    print(aim)

if __name__ == "__main__":
    aims = list(auto.locateAllOnScreen(r"dot.png", confidence=0.9795))
    # choose an appropriate pool size:
    pool = Pool(min(len(aims), cpu_count()))
    # bot will be called for each element returned by the call to auto.locateAllOnScreen
    pool.map(bot, aims)
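A minor variation on the same sketch (not in the original answer): the pool can also be used as a context manager so it is cleaned up even if bot raises, and skipping the pool when aims is empty avoids Pool(0), which raises a ValueError.

if __name__ == "__main__":
    aims = list(auto.locateAllOnScreen(r"dot.png", confidence=0.9795))
    if aims:   # only build a pool if there is actually work to do
        with Pool(min(len(aims), cpu_count())) as pool:
            pool.map(bot, aims)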

Creating a Queue delay in a Python pool without blocking

I have a large program (specifically, a function) that I'm attempting to parallelize using a JoinableQueue and the multiprocessing map_async method. The function that I'm working with does several operations on multidimensional arrays, so I break up each array into sections, and each section evaluates independently. However, I need to stitch one of the arrays back together early on, and the "stitch" happens before the "evaluate", so I need to introduce some kind of delay on the JoinableQueue. I've searched all over for a workable solution, but I'm very new to multiprocessing and most of it goes over my head.
This phrasing may be confusing - apologies. Here's an outline of my code (I can't post all of it because it's very long, but I can provide additional detail if needed):
import numpy as np
import multiprocessing as mp
from multiprocessing import Pool, Pipe, JoinableQueue

def main_function(section_number):
    #define section sizes
    array_this_section = array[:, start:end+1, :]
    histogram_this_section = np.zeros((3, dataset_size, dataset_size))
    #start and end are defined according to the size of the array
    #dataset_size is to show that the histogram is a different size than the array
    for m in range(1, num_iterations+1):
        #do several operations - each section of the array
        #corresponds to a section on the histogram
        hist_queue.put(histogram_this_section)
        #each process sends its own part of the histogram outside of the pool
        #to be combined with every other part - later operations
        #in this function must use the full histogram
        hist_queue.join()
        full_histogram = full_hist_queue.get()
        full_hist_queue.task_done()
        #do many more operations

hist_queue = JoinableQueue()
full_hist_queue = JoinableQueue()

if __name__ == '__main__':
    pool = mp.Pool(num_sections)
    args = np.arange(num_sections)
    pool.map_async(main_function, args, chunksize=1)
    #I need the map_async because the program is designed to display an output at the
    #end of each iteration, and each output must be a compilation of all processes
    #a few variable definitions go here
    for m in range(1, num_iterations+1):
        for i in range(num_sections):
            temp_hist = hist_queue.get()  #the code hangs here because the queue
                                          #is attempting to get before anything
                                          #has been put
            hist_full += temp_hist
        for i in range(num_sections):
            hist_queue.task_done()
        for i in range(num_sections):
            full_hist_queue.put(hist_full)  #the full histogram is sent back into
                                            #the pool
        full_hist_queue.join()
        #etc etc
    pool.close()
    pool.join()
I'm pretty sure that your issue is how you're creating the Queues and trying to share them with the child processes. If you just have them as global variables, they may be recreated in the child processes instead of inherited (the exact details depend on what OS and/or context you're using for multiprocessing).
A better way to go about solving this issue is to avoid using multiprocessing.Pool to spawn your processes and instead explicitly create Process instances for your workers yourself. This way you can pass the Queue instances to the processes that need them without any difficulty (it's technically possible to pass the queues to the Pool workers, but it's awkward).
I'd try something like this:
def worker_function(section_number, hist_queue, full_hist_queue): # take queues as arguments
# ... the rest of the function can work as before
# note, I renamed this from "main_function" since it's not running in the main process
if __name__ == '__main__':
hist_queue = JoinableQueue() # create the queues only in the main process
full_hist_queue = JoinableQueue() # the workers don't need to access them as globals
processes = [Process(target=worker_function, args=(i, hist_queue, full_hist_queue)
for i in range(num_sections)]
for p in processes:
p.start()
# ...
If the different stages of your worker function are more or less independent of one another (that is, the "do many more operations" step doesn't depend directly on the "do several operations" step above it, just on full_histogram), you might be able to keep the Pool and instead split the different steps into separate functions, which the main process calls via several calls to map on the pool. You don't need your own Queues in this approach, just the ones built in to the Pool. This might be best, especially if the number of "sections" you're splitting the work into doesn't correspond closely to the number of processor cores on your computer. You can let the Pool match the number of cores and have each one work on several sections of the data in turn.
A rough sketch of this would be something like:
def worker_make_hist(section_number):
    # do several operations, get a partial histogram
    return histogram_this_section

def worker_do_more_ops(section_number, full_histogram):
    # whatever...
    return some_result

if __name__ == "__main__":
    pool = multiprocessing.Pool()  # by default the size will be equal to the number of cores
    for temp_hist in pool.imap_unordered(worker_make_hist, range(number_of_sections)):
        hist_full += temp_hist
    some_results = pool.starmap(worker_do_more_ops, zip(range(number_of_sections),
                                                        itertools.repeat(hist_full)))

Python Multiprocessing pool.map unresponsive with too many worker processes

First question on Stack Overflow, so please bear with me. I am looking to calculate the variance for group ratings (long numpy arrays). Running the program without parallel processing works fine, but given that each group can be processed independently and there are 32 groups, I am looking to make use of multiprocessing to speed things up. This works OK for small numbers of groups (< 10), but beyond that the program will often just seemingly stop running, with no error messages, at an unspecified number of groups (usually between 20 and 30), although less frequently it will run all the way through. The arrays are quite large (21451 x 11462 user-item ratings), so I am wondering if the problem is caused by not having enough memory, although no error messages are printed.
import numpy as np
from functools import partial
import multiprocessing

def variance_parallel(extra_matrices, group_num):
    # do some variation calculation
    # print confirmation that we have entered function, and group number
    return single_group_var

def variance(extra_matrices, num_groups):
    variance_partial = partial(variance_parallel, extra_matrices)
    for g in list(range(num_groups)):
        group_var = pool.map(variance_partial, range(g))
    return(group_var)

num_cores = multiprocessing.cpu_count() - 1
pool = multiprocessing.Pool(processes=num_cores)
variance(extra_matrices, num_groups)
Running the above code shows the program progressively building the number of groups it is checking variance on ([0],[0,1],[0,1,2],...) before eventually printing nothing.
Thanks in advance for any help and apologies if my formatting / question is a bit off!
Multiple processes do not share data
Data sent to processes needs to be copied
Since the arrays are large, the issue is very likely to do with said copying of large arrays to the processes. Furthermore, in Python's multiprocessing, sending data to processes is done by serialisation, which is (a) CPU intensive and (b) takes extra memory in and of itself.
In short, multiprocessing is not a good fit for your use case. Since numpy is a native code extension (where the GIL does not apply) and is thread safe, it is best to use threading instead of multiprocessing. With threading, the worker threads can share data via their parent process's address space, which does away with having to copy.
That should stop the program from running out of memory.
However, for threads to share address space the data they share needs to be bound to an object, like in a python class.
Something like the below - untested as the code sample is incomplete.
import numpy as np
from functools import partial
from threading import Thread
from multiprocessing import cpu_count

class Variance(Thread):
    def __init__(self, extra_matrices, group_num):
        Thread.__init__(self)
        self.extra_matrices = extra_matrices
        self.group_num = group_num
        self.output = None

    def run(self):
        # do some variation calculation
        # print confirmation that we have entered function, and group number
        self.output = single_group_var

num_cores = cpu_count() - 1
results = []
# work through the groups in batches of num_cores threads, one group per thread
for batch_start in range(0, num_groups, num_cores):
    batch = range(batch_start, min(batch_start + num_cores, num_groups))
    workers = [Variance(extra_matrices, group_num) for group_num in batch]
    # Start threads
    for worker in workers:
        worker.start()
    # Wait for completion
    for worker in workers:
        worker.join()
    results.extend([w.output for w in workers])
print(results)
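The same idea also fits in a few lines with concurrent.futures; a sketch under the assumption that variance_parallel, extra_matrices and num_groups exist as in the question:

from concurrent.futures import ThreadPoolExecutor
from functools import partial
from multiprocessing import cpu_count

# threads share extra_matrices with the parent process, so nothing is copied
with ThreadPoolExecutor(max_workers=cpu_count() - 1) as executor:
    group_var = list(executor.map(partial(variance_parallel, extra_matrices),
                                  range(num_groups)))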

Better way to share memory for multiprocessing in Python?

I have been tackling this problem for a week now and it's been getting pretty frustrating because every time I implement a simpler but similar scale example of what I need to do, it turns out multiprocessing will fudge it up. The way it handles shared memory baffles me because it is so limited, it can become useless quite rapidly.
So the basic description of my problem is that I need to create a process that gets passed in some parameters to open an image and create about 20K patches of size 60x40. These patches are saved into a list 2 at a time and need to be returned to the main thread to then be processed again by 2 other concurrent processes that run on the GPU.
The process and the workflow and all that are mostly taken care of; the part that was supposed to be the easiest is turning out to be the most difficult. I have not been able to save the list with 20K patches and get it back to the main thread.
The first problem was that I was saving these patches as PIL images; I then found out that all data added to a Queue object has to be picklable.
The second problem: I then converted the patches to 60x40 arrays each and saved them to a list, and that still doesn't work. Apparently Queues have a limited amount of data they can hold; otherwise, when you call queue_obj.get(), the program hangs.
I have tried many other things, and every new thing I try does not work, so I would like to know if anyone has recommendations for a library I can use to share objects without all the fuss.
Here is a sample implementation of the kind of thing I'm looking at. Keep in mind this works perfectly fine, but the full implementation doesn't. I do have the code print informational messages to confirm that the data being saved has exactly the same shape and everything, but for some reason it doesn't work. In the full implementation the independent process completes successfully but freezes at q.get().
from PIL import Image
from multiprocessing import Queue, Process
import StringIO
import numpy

img = Image.open("/path/to/image.jpg")
q = Queue()
q2 = Queue()
#
#
# MAX Individual Queue limit for 60x40 images in BW is 31,466.
# Multiple individual Queues can be filled to the max limit of 31,466.
# A single Queue can only take up to 31,466, even if split up in different puts.
def rz(patch, qn1, qn2):
    totalPatchCount = 20000
    channels = 1
    patch = patch.resize((60,40), Image.ANTIALIAS)
    patch = patch.convert('L')
    # ImgArray = numpy.asarray(im, dtype=numpy.float32)
    list_im_arr = []
    # ----Create a 4D Array
    # returnImageArray = numpy.zeros(shape=(totalPatchCount, channels, 40, 60))
    imgArray = numpy.asarray(patch, dtype=numpy.float32)
    imgArray = imgArray[numpy.newaxis, ...]
    # ----End 4D array
    # list_im_arr2 = []
    for i in xrange(totalPatchCount):
        # returnImageArray[i] = imgArray
        list_im_arr.append(imgArray)
    qn1.put(list_im_arr)
    qn1.cancel_join_thread()
    # qn2.cancel_join_thread()
    print "PROGRAM Done"

# rz(img,q,q2)
# l = q.get()
#
p = Process(target=rz, args=(img, q, q2,))
p.start()
p.join()
#
# # l = []
# # for i in xrange(1000): l.append(q.get())
#
imdata = q.get()
Queue is for communication between processes. In your case, you don't really have this kind of communication. You can simply let the process return the result and use the .get() method to collect it. (Remember to add if __name__ == "__main__":; see the programming guidelines.)
from PIL import Image
from multiprocessing import Pool, Lock
import numpy

img = Image.open("/path/to/image.jpg")

def rz():
    totalPatchCount = 20000
    imgArray = numpy.asarray(patch, dtype=numpy.float32)
    list_im_arr = [imgArray] * totalPatchCount  # A more elegant way than a for loop
    return list_im_arr

if __name__ == '__main__':
    # patch = img.... Your code to generate the patch goes here
    patch = patch.resize((60,40), Image.ANTIALIAS)
    patch = patch.convert('L')
    pool = Pool(2)
    imdata = [pool.apply_async(rz).get() for x in range(2)]
    pool.close()
    pool.join()
Now, according to the first answer of this post, multiprocessing can only pass objects that are picklable. Pickling is probably unavoidable in multiprocessing because processes don't share memory. They simply don't live in the same universe. (They do inherit memory when they're first spawned, but they cannot reach outside of their own universe.) The PIL image object itself is not picklable. You can make it picklable by extracting only the image data stored in it, like this post suggested.
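A minimal sketch of that extraction (the helper names are mine, not from the linked post): convert the image to a numpy array of pixel data, which pickles cleanly, and rebuild a PIL image from it on the other side.

import numpy
from PIL import Image

def to_picklable(img):
    # plain pixel data plus the metadata needed to rebuild the image
    return numpy.asarray(img), img.mode, img.size

def from_picklable(arr, mode, size):
    return Image.frombytes(mode, size, arr.tobytes())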
Since your problem is mostly I/O bound, you can also try multi-threading. It might be even faster for your purpose. Threads share everything, so no pickling is required. If you're using Python 3, ThreadPoolExecutor is a wonderful tool. For Python 2, you can use ThreadPool. To achieve higher efficiency, you'll have to rearrange how you do things: you want to break up the process and let different threads do the job.
from PIL import Image
from multiprocessing.pool import ThreadPool
from multiprocessing import Lock
import numpy

img = Image.open("/path/to/image.jpg")
lock = Lock()
totalPatchCount = 20000

def rz(x):
    patch = ...
    return patch

pool = ThreadPool(8)
imdata = [pool.map(rz, range(totalPatchCount)) for i in range(2)]
pool.close()
pool.join()
You say "Apparently Queues have a limited amount of data they can save otherwise when you call queue_obj.get() the program hangs."
You're right and wrong there. There is a limited amount of information the Queue will hold without being drained. The problem is that when you do:
qn1.put(list_im_arr)
qn1.cancel_join_thread()
it schedules the communication to the underlying pipe (handled by a thread). The qn1.cancel_join_thread() then says "but it's cool if we exit without the scheduled put completing", and of course, a few microseconds later, the worker function exits and the Process exits, without waiting for the thread that is populating the pipe to actually do so. At best it might have sent the initial bytes of the object, but anything that doesn't fit in PIPE_BUF almost certainly gets dropped; you'd need some amazing race conditions to get anything at all, let alone the whole of a large object. So later, when you do:
imdata = q.get()
nothing has actually been sent by the (now exited) Process. When you call q.get() it's waiting for data that never actually got transmitted.
The other answer is correct that in the case of computing and conveying a single value, Queues are overkill. But if you're going to use them, you need to use them properly. The fix would be to:
Remove the call to qn1.cancel_join_thread() so the Process doesn't exit until the data has been transmitted across the pipe.
Rearrange your calls to avoid deadlock
Rearranging is just this:
p = Process(target=rz,args=(img, q, q2,))
p.start()
imdata = q.get()
p.join()
moving p.join() after q.get(); if you try to join first, your main process will be waiting for the child to exit, and the child will be waiting for the queue to be consumed before it will exit (this might actually work if the Queue's pipe is drained by a thread in the main process, but it's best not to count on implementation details like that; this form is correct regardless of implementation details, as long as puts and gets are matched).
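A self-contained toy version of both fixes (the payload here is just a big list, not your patch data): the producer keeps the default join behaviour so its feeder thread can flush, and the parent drains the queue before joining.

from multiprocessing import Process, Queue

def producer(q):
    q.put([0] * 1000000)   # large payload, far bigger than the pipe buffer
    # no q.cancel_join_thread() here: let the feeder thread flush before the process exits

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    data = q.get()   # drain the queue first...
    p.join()         # ...then join, so neither side deadlocks
    print(len(data))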

Python threading return values

I am new to threading and I have an existing application that I would like to make a little quicker using threading.
I have several functions that return to a main Dict and would like to send these to separate threads so that they run at the same time rather than one at a time.
I have done a little googling but I can't seem to find something that fits my existing code and could use a little help.
I have around six functions that return to the main Dict like this:
parsed['cryptomaps'] = pipes.ConfigParse.crypto(parsed['split-config'], parsed['asax'], parsed['names'])
The issue here is with the return value. I understand that I would need to use a queue for this, but would I need a queue for each of these six functions or one queue for all of them? If it is the latter, how would I separate the returns from the threads and assign them to the correct Dict entries?
Any help on this would be great.
John
You can push tuples of (worker, data) to the queue to identify the source.
Also please note that due to the Global Interpreter Lock, Python threading is not very useful. I suggest taking a look at the multiprocessing module, which offers an interface very similar to multithreading but will actually scale with the number of workers.
Edit:
Code sample.
import multiprocessing as mp

# py 3 compatibility
try:
    from future_builtins import range, map
except ImportError:
    pass
try:
    import Queue            # py2: provides Queue.Empty
except ImportError:
    import queue as Queue   # py3

data = [
    # input data
    # {split_config: ... }
]

def crypto(split_config, asax, names):
    # your code here
    pass

if __name__ == "__main__":
    terminate = mp.Event()
    input = mp.Queue()
    output = mp.Queue()

    def worker(id, terminate, input, output):
        # use the event here to gracefully exit;
        # using Process.terminate would leave the queues
        # in an undefined state
        while not terminate.is_set():
            try:
                x = input.get(True, timeout=1000)
                output.put((id, crypto(**x)))
            except Queue.Empty:
                pass

    workers = [mp.Process(target=worker, args=(i, terminate, input, output))
               for i in range(0, mp.cpu_count())]
    for w in workers:
        w.start()

    for x in data:
        input.put(x)

    # terminate workers
    terminate.set()

    # process results
    # make sure that the queues are emptied, otherwise Process.join can deadlock
    for w in workers:
        w.join()
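To get each return value into the correct Dict entry (the original concern), the same (id, result) tagging can use the dict key itself; a toy sketch with a stand-in worker function:

from multiprocessing import Process, Queue

def square(n):   # stand-in for one of the six parsing functions
    return n * n

def worker(key, value, output):
    # tag the result with the dict key it belongs to
    output.put((key, square(value)))

if __name__ == "__main__":
    output = Queue()
    jobs = {"a": 2, "b": 3, "c": 4}
    procs = [Process(target=worker, args=(k, v, output)) for k, v in jobs.items()]
    for p in procs:
        p.start()
    parsed = dict(output.get() for _ in jobs)   # one tagged result per job
    for p in procs:
        p.join()
    print(parsed)   # e.g. {'a': 4, 'b': 9, 'c': 16}, order may vary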
