It will be hard to explain without the whole code, but I will try my best to describe it in detail. If you need more information, please let me know.
So I have a Python program with 3 processes (multiprocessing) running in parallel. The first one is a video-preprocessing task, the second is an audio-preprocessing task, and the last is a DNN model call. All processes are structured roughly like this:
from multiprocessing import Process

class NameOfTheProcess(Process):
    def __init__(self, queue, videoBuffer, audioBuffer):
        super().__init__()
        # ....

    def run(self):
        while True:  # so that the process runs until I stop the program
            # ....
The video-preprocessing does simple face tracking and fills a queue (which I use to share data between the processes).
The audio-preprocessing is a process where I get audio frames using the jack library, downsample them, and put them in a buffer. After a delay of 20 jack callbacks I start the DNN model process.
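Roughly, the jack side looks like this (a minimal sketch, assuming the jack-client Python package; the downsampling and start_dnn_process are placeholders, not my exact code):

import jack
from multiprocessing import Queue

audioBuffer = Queue()
client = jack.Client("audio-preprocessing")
inport = client.inports.register("input_1")
callback_count = 0

@client.set_process_callback
def process(frames):
    global callback_count
    samples = inport.get_array()   # one block of audio from jack
    downsampled = samples[::3]     # placeholder for the real downsampling
    audioBuffer.put(downsampled)
    callback_count += 1
    if callback_count == 20:       # after 20 callbacks the DNN process is started
        start_dnn_process()        # placeholder

client.activate()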
In the DNN model process I currently have only 4 simple steps. First I check whether the audio queue is empty; if not, I get an element from the queue and then go through a "dummy" for loop over a range of 1000. After that, I take the last x elements of the audio data and put them in another queue to use later.
The video-preprocessing and audio-preprocessing work fine, I have no issues there, but when I also start the DNN process I lose a lot of audio, and in jack-client I get many messages like 16:00:12.435 XRUN callback (7 skipped). When I start just the audio-preprocessing and the DNN process I have the same issue, so in my mind there is no problem with the video-preprocessing.
After a while I figured out that when I remove the line audioBufferIn = self.audioBuffer.get() from the code below, the audio loss goes away, but I need to get the data out of the audio queue there somehow so I can work with it.
from multiprocessing import Process

class DnnModelCall(Process):
    def __init__(self, queue, audioBuffer):
        super().__init__()
        print("DnnModelCall: init")
        self.queue = queue
        self.audioBuffer = audioBuffer

    def run(self):
        print("DnnModelCall: run")
        while True:
            if not self.audioBuffer.empty():
                k = 0
                audioBufferIn = self.audioBuffer.get()
                # audioBufferIn = self.audioBuffer.get(block=False)
                for i in range(0, 1000):
                    k += 1
                outputDnnBackPart = audioBufferIn[-2560:]
                outputQueue = []
                outputQueue.extend(outputDnnBackPart)
                self.queue.put(outputQueue)
I have also tried it with block=False but I get the same result.
Does anyone have an idea?
And if you need more information let me know.
Thanks in advance.
We have ~15,000 nodes to log into and pull data from via Pexpect. To speed this up, I am doing multiprocessing - splitting the load equally between 12 cores. That works great, but this is still over 1000 nodes per core - processed one at a time.
The CPU utilization of each core as it does this processing is roughly 2%. And that sort of makes sense, as most of the time is just spent waiting to see the Pexpect expect value as the node streams output. To try and take advantage of this and speed things up further, I want to implement multi-threading within the multi-processing on each core.
To try to avoid any issues with shared variables, I put all data needed to log into a node in a dictionary (one key per node), and then slice the dictionary, with each thread receiving a unique slice. After the threads are done, I combine the dictionary slices back together.
However, I am still seeing one thread completely finish before moving to the next.
I am wondering what constitutes an idle state such that a core can be moved to work on another thread? Does the fact that it is always looking for the Pexpect expect value mean it is never idle?
Also, I use the same target function for each thread. I am not sure whether that target function being the same for each thread (with the same variables local to that function) is influencing this?
My multi-threading code is below, for reference.
Thanks for any insight!
import threading
import <lots of other stuff>

class ThreadClass(threading.Thread):
    def __init__(self, outputs_dict_split):
        super(ThreadClass, self).__init__()
        self.outputs_dict_split = outputs_dict_split

    def run(self):
        outputs_dict_split = get_output(self.outputs_dict_split)
        return outputs_dict_split

def get_output(outputs_dict):
    ### PEXPECT STUFF TO LOGIN AND RUN COMMANDS ####
    ### WRITE DEVICE'S OUTPUTS TO DEVICE'S OUTPUTS_DICT RESULTS SUB-KEY ###

def backbone(outputs_dict):
    filterbykey = lambda keys: {x: outputs_dict[x] for x in keys}

    num_threads = 2
    device_split = np.array_split(list(outputs_dict.keys()), num_threads)

    outputs_dict_split_list = []
    split_list1 = list(device_split[0])
    split_list2 = list(device_split[1])
    outputs_dict_split1 = filterbykey(split_list1)
    outputs_dict_split2 = filterbykey(split_list2)

    t1 = ThreadClass(outputs_dict_split1)
    t2 = ThreadClass(outputs_dict_split2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

    outputs_dict_split1 = t1.outputs_dict_split
    outputs_dict_split2 = t2.outputs_dict_split

    outputs_dict_split_list.append(outputs_dict_split1)
    outputs_dict_split_list.append(outputs_dict_split2)

    outputs_dict = ChainMap(*outputs_dict_split_list)

    ### Downstream Processing ###
This actually worked. However, I had to scale the number of devices being processed in order to see substantial improvements in overall processing time.
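For what it's worth, the same split-and-merge idea can also be sketched with concurrent.futures.ThreadPoolExecutor (a rough illustration; get_output is assumed to take a dict slice and return it with the results filled in, as above):

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def backbone(outputs_dict, num_threads=2):
    # split the keys into num_threads groups and build a sub-dict per group
    keys_split = np.array_split(list(outputs_dict.keys()), num_threads)
    slices = [{k: outputs_dict[k] for k in keys} for keys in keys_split]
    # each slice is processed by its own thread; map returns the processed slices
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        processed = list(pool.map(get_output, slices))
    # merge the slices back into a single dict
    merged = {}
    for part in processed:
        merged.update(part)
    return merged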
I am sharpening up my Python skills and have started learning about websockets as an educational tool.
Therefore, I'm working with real-time data received every millisecond via a websocket. I would like to separate its acquisition/processing/plotting in a clean and comprehensive way. Acquisition and processing are critical, whereas plotting can be updated every ~100 ms.
A) I am assuming that the raw data arrives at a constant rate, every ms.
B) If processing isn't quick enough (>1ms), skip the data that arrived while busy and stay synced with A)
C) Every ~100ms or so, get the last processed data and plot it.
I guess that a Minimal Working Example would start like this:
import threading

class ReceiveData(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def receive(self):
        pass

class ProcessData(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def process(self):
        pass

class PlotData(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def plot(self):
        pass
Starting with this (is it even the right way to go?), how can I pass the raw data from ReceiveData to ProcessData, and periodically to PlotData? How can I keep the executions synced, and repeat calls every 1 ms or 100 ms?
Thank you.
I think your general approach with threads for receiving and processing the data is fine. For communication between the threads, I would suggest a producer-consumer approach using a Queue as the data structure.
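A minimal sketch of that producer-consumer pattern might look like this (dummy data stands in for your websocket input):

import threading
import queue
import time

q = queue.Queue()

def producer():
    # stands in for the websocket receive loop
    for i in range(10):
        q.put(i)
        time.sleep(0.001)   # ~1 ms arrival rate
    q.put(None)             # sentinel: tell the consumer we are done

def consumer():
    while True:
        item = q.get()      # blocks until data is available
        if item is None:
            break
        print('Processing item', item)

t_prod = threading.Thread(target=producer)
t_cons = threading.Thread(target=consumer)
t_prod.start(); t_cons.start()
t_prod.join(); t_cons.join()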
In your case you want to skip unprocessed data and use only the most recent element. To achieve this, collections.deque (see the documentation) might be a better choice for you:
import collections

d = collections.deque(maxlen=1)
The producer side would then append data to the deque like this:
d.append(item)
And the main loop on the consumer side might look like this:
while True:
    try:
        item = d.pop()
        print('Getting item ' + str(item))
    except IndexError:
        print('Deque is empty')
    # time.sleep(s) if you want to poll the latest data every s seconds
Possibly, you can merge the ReceiveData and ProcessData functionalities into just one class / thread and use only one deque between this class and PlotData.
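A rough sketch of that merged layout, with a single deque between the two threads (again with dummy data in place of the websocket input):

import collections
import threading
import time

latest = collections.deque(maxlen=1)    # holds only the most recent processed sample

class ReceiveAndProcess(threading.Thread):
    def run(self):
        # stands in for: receive from the websocket, process, publish
        for i in range(1000):
            processed = i * 2           # placeholder for the real processing
            latest.append(processed)    # silently drops anything not yet plotted
            time.sleep(0.001)           # ~1 ms data rate

class PlotData(threading.Thread):
    def run(self):
        for _ in range(10):             # in practice: run until stopped
            try:
                print('Plotting', latest.pop())
            except IndexError:
                pass                    # nothing new to plot yet
            time.sleep(0.1)             # ~100 ms refresh

rx = ReceiveAndProcess()
plot = PlotData()
rx.start(); plot.start()
rx.join(); plot.join()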
I can't quite find a solution for code where I pass each worker a shared Queue but also a unique number per worker.
My code:
The idea is to create several channels for playing audio songs. Each channel must be unique, so if a song arrives I put it on a channel that is available.
from multiprocessing import Pool, Queue
from functools import partial
import pygame

queue = Queue()

def play_song(shared_queue, chnl):
    channel = pygame.mixer.Channel(chnl)
    while True:
        sound_name = shared_queue.get()
        channel.play(pygame.mixer.Sound(sound_name))

if __name__ == "__main__":
    channels = [0, 1, 2, 3, 4]
    func = partial(play_song, queue)
    p = Pool(5, func, (channels,))
This code of course doesn't return any error, because it's multiprocessing, but the problem is that channels is passed to play_song as a whole list instead of being mapped across the workers.
So basically, instead of each worker initializing its channel like this:
channel = pygame.mixer.Channel(0)  # each worker would get one number from the list, so 1, 2, 3, 4
I am getting this:
channel = pygame.mixer.Channel([0, 1, 2, 3, 4])  # for each worker
I tried playing with partial, but unsuccessfully.
I was successful with the pool.map function; while I could pass individual numbers from the channels list that way, I couldn't share the Queue among the workers.
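For reference, sharing the queue while still mapping one channel number per worker can apparently be done with a pool initializer that stores the inherited queue in a module-level global; a sketch I have not fully tested:

from multiprocessing import Pool, Queue
import pygame

queue = Queue()

def init_worker(shared_queue):
    # each worker receives the queue once, when the pool starts
    global worker_queue
    worker_queue = shared_queue

def play_song(chnl):
    # chnl is the per-worker channel number coming from pool.map
    channel = pygame.mixer.Channel(chnl)
    while True:
        sound_name = worker_queue.get()
        channel.play(pygame.mixer.Sound(sound_name))

if __name__ == "__main__":
    pygame.mixer.init()
    channels = [0, 1, 2, 3, 4]
    with Pool(5, initializer=init_worker, initargs=(queue,)) as p:
        p.map(play_song, channels)   # blocks; each worker loops on its own channel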
Eventually I found a solution to my Pygame problem that does not require threads or multiprocessing.
Background to the problem:
I was working with PyAudio, and since it is quite a low-level audio API, I had problems mixing several sounds at the same time, and in general. The reasons are:
1) It is not easy (maybe impossible) to start several streams at the same time or feed those streams at the same time (it looks like a hardware limitation).
2) Based on 1) I tried a different approach: have one stream where the audio waves from the different sounds are added up before entering the stream. That works, but it is unreliable, because adding up audio waves does not scale well; adding too many waves results in 'sound cracking' as the amplitudes get too high.
Based on 1) and 2) I wanted to try running the streams in different processes, hence this question.
Pygame solution (single process):
for sound_file in sound_files:
    available_channel = pygame.mixer.find_channel()  # if there are 8 channels, it can play 8 sounds at the same time
    available_channel.play(sound_file)
If the sound_files are already loaded, this gives near-simultaneous results.
Multiprocessing solution
Thanks to Darkonaut, who pointed out the multiprocessing method, I managed to answer my initial question on multiprocessing, which I think is already answered on Stack Overflow, but I will include it.
The example is not finished, because I didn't use it in the end, but it answers my initial requirement of processes with a shared queue but different parameters.
import multiprocessing as mp

shared_queue = mp.Queue()

def channel(que, channel_num):
    que.put(channel_num)

if __name__ == '__main__':
    processes = [mp.Process(target=channel, args=(shared_queue, channel_num))
                 for channel_num in range(8)]
    for p in processes:
        p.start()
    for i in range(8):
        print(shared_queue.get())
    for p in processes:
        p.join()
I have a script that I wrote that I am able to pass arguments to, and I want to launch multiple simultaneous iterations (maybe 100+) with unique arguments. My plan was to write another Python script which then launches these subscripts/processes; however, to be effective, I need that script to be able to monitor the subscripts for any errors.
Is there any straightforward way to do this, or a library that offers this functionality? I've been searching for a while and am not having good luck finding anything. Creating subprocesses and multiple threads seems straightforward enough, but I can't really find any guides or tutorials on how to then communicate with those threads/subprocesses.
A better way to do this would be to make use of threads. If you made the script you want to call into a function in this larger script, you could have your main function call that function as many times as you want and have the threads report back with information as needed. You can read a little bit about how threads work in the Python threading documentation.
I suggest using threading.Thread or multiprocessing.Process, depending on your requirements.
A simple way to communicate between threads/processes is to use a Queue. The multiprocessing module also provides other ways to communicate between processes (Queue, Event, Manager, ...).
You can see some elementary communication in the following example:
import threading
from queue import Queue
import random
import time

class Worker(threading.Thread):
    def __init__(self, name, queue_error):
        threading.Thread.__init__(self)
        self.name = name
        self.queue_error = queue_error

    def run(self):
        time.sleep(random.randrange(1, 10))
        # Do some processing ...
        # Report errors
        self.queue_error.put((self.name, 'Error state'))

class Launcher(object):
    def __init__(self):
        self.queue_error = Queue()

    def main_loop(self):
        # Start threads
        for i in range(10):
            w = Worker(i, self.queue_error)
            w.start()
        # Check for errors
        while True:
            while not self.queue_error.empty():
                error_data = self.queue_error.get()
                print('Worker #%s reported error: %s' % (error_data[0], error_data[1]))
            time.sleep(0.1)

if __name__ == '__main__':
    l = Launcher()
    l.main_loop()
Like someone else said, you have to use multiple processes for true parallelism instead of threads, because the GIL prevents threads from executing Python code concurrently.
If you want to use the standard multiprocessing library (which is based on launching multiple processes), I suggest using a pool of workers. If I understood correctly, you want to launch 100+ parallel instances. Launching 100+ processes on one host will generate too much overhead. Instead, create a pool of P workers where P is for example the number of cores in your machine and submit the 100+ jobs to the pool. This is simple to do and there are many examples on the web. Also, when you submit jobs to the pool, you can provide a callback function to receive errors. This may be sufficient for your needs (there are examples here).
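For instance, a pool-of-workers sketch along those lines might look like this (run_node and its arguments are placeholders for your per-job work):

from multiprocessing import Pool, cpu_count

def run_node(node_args):
    # placeholder for the real work done for one set of arguments
    return 'finished: {}'.format(node_args)

def on_error(exc):
    # called in the parent process for any job that raised an exception
    print('job failed:', exc)

if __name__ == '__main__':
    jobs = ['job-{}'.format(i) for i in range(120)]       # the 100+ argument sets
    with Pool(processes=cpu_count()) as pool:             # P workers, P = number of cores
        async_results = [pool.apply_async(run_node, (j,), error_callback=on_error)
                         for j in jobs]
        for r in async_results:
            r.wait()                                      # wait for every job to finish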
The Pool in multiprocessing however can't distribute work across multiple hosts (e.g. cluster of machines) last time I looked. So, if you need to do this, or if you need a more flexible communication scheme, like being able to send updates to the controlling process while the workers are running, my suggestion is to use charm4py (note that I am a charm4py developer so this is where I have experience).
With charm4py you could create N workers which are distributed among P processes by the runtime (works across multiple hosts), and the workers can communicate with the controller simply by doing remote method invocation. Here is a small example:
from charm4py import charm, Chare, Group, Array, ArrayMap, Reducer, threaded
import time

WORKER_ITERATIONS = 100


class Worker(Chare):

    def __init__(self, controller):
        self.controller = controller

    @threaded
    def work(self, x, done_future):
        result = -1
        try:
            for i in range(WORKER_ITERATIONS):
                if i % 20 == 0:
                    # send status update to controller
                    self.controller.progressUpdate(self.thisIndex, i, ret=True).get()
                if i == 5 and self.thisIndex[0] % 2 == 0:
                    # trigger NameError on even-numbered workers
                    test[3] = 3
                time.sleep(0.01)
            result = x**2
        except Exception as e:
            # send error to controller
            self.controller.collectError(self.thisIndex, e)
        # send result to controller
        self.contribute(result, Reducer.gather, done_future)


# This custom map is used to prevent workers from being created on process 0
# (where the controller is). Not strictly needed, but allows more timely
# controller output
class WorkerMap(ArrayMap):
    def procNum(self, index):
        return (index[0] % (charm.numPes() - 1)) + 1


class Controller(Chare):

    def __init__(self, args):
        self.startTime = time.time()
        done_future = charm.createFuture()
        # create 12 workers, which are distributed by charm4py among processes
        workers = Array(Worker, 12, args=[self.thisProxy], map=Group(WorkerMap))
        # start work
        for i in range(12):
            workers[i].work(i, done_future)
        print('Results are', done_future.get())  # wait for result
        exit()

    def progressUpdate(self, worker_id, current_step):
        print(round(time.time() - self.startTime, 3), ': Worker', worker_id,
              'progress', current_step * 100 / WORKER_ITERATIONS, '%')
        # the controller can return a value here and the worker would receive it

    def collectError(self, worker_id, error):
        print(round(time.time() - self.startTime, 3), ': Got error', error,
              'from worker', worker_id)


charm.start(Controller)
In this example, the Controller will print status updates and errors as they happen. It will print the final results from all workers when they are all done. The result for workers that have failed will be -1.
The number of processes P is given at launch. The runtime will distribute the N workers among the available processes. This happens when the workers are created and there is no dynamic load balancing in this particular example.
Also, note that in the charm4py model remote method invocation is asynchronous and returns a future which the caller can block on, but only the calling thread blocks (not the whole process).
Hope this helps.
My structure (massively simplified) is depicted below:
import multiprocessing

def creator():
    # creates files
    return

def relocator():
    # moves created files
    return

create = multiprocessing.Process(target=creator)
relocate = multiprocessing.Process(target=relocator)
create.start()
relocate.start()
What I am trying to do is have a bunch of files created by creator and as soon as they get created have them moved to another directory by relocator.
The reason I want to use multiprocessing here is:
I do not want creator to wait for the moving to be finished first, because moving takes time I don't want to waste.
Creating all the files first before starting to copy is not an option either, because there is not enough space on the drive for all of them.
I want both the creator and relocator processes to be serial (one file at a time each) but run in parallel. A "log" of the actions should look like this:
# creating file 1
# creating file 2 and relocating file 1
# creating file 3 and relocating file 2
# ...
# relocating last file
Based on what I have read, Queue is the way to go here.
Strategy: (maybe not the best one?!)
After a file gets created it will enter the queue, and after it has finished being relocated it will be removed from the queue.
I am, however, having issues coding it: multiple files being created at the same time (multiple instances of creator running in parallel), and other problems...
I would be very grateful for any ideas, hints, explanations, etc.
Let's take your idea and split it into these features:
Creator should create files (100 for example).
Relocator should move 1 file at a time until there are no more files to move.
Creator may end before Relocator, so it can also transform itself into a Relocator.
Both have to know when to finish.
So, we have 2 main functionalities:
import os
import shutil

def create(i):
    # creates a file and returns its path
    return os.path.join("some/path/based/on/stuff", "{}.ext".format(i))

def relocate(src, dst):
    # moves a created file
    shutil.move(src, dst)
Now let's create our processes:
from multiprocessing import Process, Queue

comm_queue = Queue()

# process that creates the files and pushes the data into the queue
def creator(comm_q):
    for i in range(100):
        comm_q.put(create(i))
    comm_q.put("STOP_FLAG")  # we tell the workers when to stop; we only push one flag since there is only one other worker

# the relocator works till it gets a stop flag
def relocator(comm_q):
    data = comm_q.get()
    while data != "STOP_FLAG":
        if data:
            relocate(data, to_path_you_may_want)
        data = comm_q.get()

creator_process = Process(target=creator, args=(comm_queue,))
relocator_process = Process(target=relocator, args=(comm_queue,))
creator_process.start()
relocator_process.start()
This way we now have a creator and a relocator. But let's say we want the Creator to start relocating once its creation job is done; it can just call relocator itself, but then we would need to push one more "STOP_FLAG", since we would have 2 processes relocating:
def creator(comm_q):
    for i in range(100):
        comm_q.put(create(i))
    for _ in range(2):
        comm_q.put("STOP_FLAG")
    relocator(comm_q)
Let's say we now want an arbitrary number of relocator processes. We need to adapt our code a bit to handle this: the creator method has to know how many stop flags to push to notify the other processes when to stop. The resulting code would look like this:
from multiprocessing import Process, Queue, cpu_count

comm_queue = Queue()

# process that creates the files and pushes the data into the queue
def creator(comm_q, number_of_subprocesses):
    for i in range(100):
        comm_q.put(create(i))
    for _ in range(number_of_subprocesses + 1):  # we need to count ourselves
        comm_q.put("STOP_FLAG")
    relocator(comm_q)

# the relocator works till it gets a stop flag
def relocator(comm_q):
    data = comm_q.get()
    while data != "STOP_FLAG":
        if data:
            relocate(data, to_path_you_may_want)
        data = comm_q.get()

num_of_cpus = cpu_count()  # we will spawn as many processes as we have CPU cores
creator_process = Process(target=creator, args=(comm_queue, num_of_cpus))
relocators = [Process(target=relocator, args=(comm_queue,)) for _ in range(num_of_cpus)]

creator_process.start()
for rp in relocators:
    rp.start()
Then you will have to WAIT for them to finish:
creator_process.join()
for rp in relocators:
    rp.join()
You may want to check the multiprocessing.Queue documentation, especially the get method (which is a blocking call by default):
Remove and return an item from the queue. If optional args block is True (the default) and timeout is None (the default), block if necessary until an item is available.
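For illustration, a non-blocking call has to handle the Empty exception itself (a small sketch, reusing the comm_queue from above):

import queue  # the Empty exception lives in the standard queue module

try:
    data = comm_queue.get(block=False)   # or: comm_queue.get(timeout=1)
except queue.Empty:
    data = None                          # nothing available right now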