How to use multiprocessing in a for loop - python

I am new to Python multiprocessing. I have a function that returns values and is supposed to act in parallel. In the following, you can find a sample code.
import multiprocessing as mp
import numpy as np
from tqdm import tqdm
def foo(self):
    # One (x, y, arg) triplet per grid point
    arg_triplets = [(self.loc_x[ii], self.loc_y[jj], self.arg)
                    for ii in np.arange(0, self.nx) for jj in np.arange(0, self.ny)]
    ctx = mp.get_context('fork')
    max_proc = mp.cpu_count()-1
    pool = ctx.Pool(processes=max_proc)
    return_values = list(tqdm(pool.imap(target_foo, arg_triplets), total=self.nx*self.ny))
    pool.close()
    pool.join()
When I run this routine once, everything works fine. The function target_foo takes a triplet of arguments, and the output values of all calls are collected in a list. I can monitor the status of my 8-core processor and see 7 cores working simultaneously. The problem starts when I use the function foo in a for loop. For example, I need to gather data from multiple calls to foo, and those calls do not need to run in parallel with each other, so I create a for loop that calls foo sequentially. In each call of foo, the function target_foo is supposed to work in parallel. The first call runs in parallel, but from the second call onward it does not. What am I doing wrong?
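The question does not show enough to say why the second call stops running in parallel, so the following is not a diagnosis. It is only a minimal sketch of one common pattern: create the pool once and reuse it for every sequential call. The worker `target_foo` below is a hypothetical stand-in, `foo` is written as a plain function rather than a method, and the default start method is used instead of `'fork'`.

```python
import multiprocessing as mp

def target_foo(triplet):
    # Hypothetical stand-in for the real worker: just sums the triplet.
    x, y, arg = triplet
    return x + y + arg

def foo(pool, xs, ys, arg):
    # Build the argument triplets and map them over an already running pool.
    arg_triplets = [(x, y, arg) for x in xs for y in ys]
    return list(pool.imap(target_foo, arg_triplets))

def main():
    n_proc = max(1, mp.cpu_count() - 1)
    with mp.Pool(processes=n_proc) as pool:
        # The same worker processes serve every sequential call to foo.
        return [foo(pool, range(3), range(2), arg) for arg in (10, 20)]

if __name__ == "__main__":
    print(main())
```

Reusing one pool also avoids paying the process start-up cost on every loop iteration.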

Related

Multiprocessing only using a single thread instead of multiple

This question has been asked and solved a few times recently but I have quite a specific example...
I have a multiprocessing function that was working absolutely fine in complete isolation yesterday (in an interactive notebook), however, I decided to parameterise so I can call it as part of a larger pipeline & for abstraction/cleaner notebook and now it's only using a single thread instead of 6.
import pandas as pd
import multiprocessing as mp
from multiprocessing import get_context
mp.set_start_method('forkserver')
def multiprocess_function(func, iterator, input_data):
    result_list = []
    def append_result(result):
        result_list.append(result)
    with get_context('fork').Pool(processes=6) as pool:
        for i in iterator:
            pool.apply_async(func, args = (i, input_data), callback = append_result)
        pool.close()
        pool.join()
    return result_list
multiprocess_function(count_live, run_weeks, base_df)
My previous version of the code executed differently, instead of a return / call I was using the following at the bottom of the function (which doesn't work at all now I've parameterised - even with the args assigned)
if __name__ == '__main__':
    multiprocess_function()
The function executes fine, just only operates across one thread as per the output in top.
Apologies if this is something incredibly simple - I'm not a programmer, I'm an analyst :)
edit: everything works absolutely fine if I include the if __name__ == '__main__': etc. at the bottom of the function and execute the cell. However, when I do this I have to remove the parameters, so maybe it is just something to do with scoping. If I execute by calling the function, whether it is parameterised or not, it only operates on a single thread.
You've got two problems:
You're not using an import guard.
You're not setting the default start method inside the import guard.
Between the two of them, you end up telling Python to spawn the forkserver inside the forkserver, which can only cause you grief. Change the structure of your code to:
import pandas as pd
import multiprocessing as mp
from multiprocessing import get_context
def multiprocess_function(func, iterator, input_data):
    result_list = []
    with get_context('fork').Pool(processes=6) as pool:
        for i in iterator:
            pool.apply_async(func, args=(i, input_data), callback=result_list.append)
        pool.close()
        pool.join()
    return result_list
if __name__ == '__main__':
    mp.set_start_method('forkserver')
    multiprocess_function(count_live, run_weeks, base_df)
Since you didn't show where you got count_live, run_weeks and base_df from, I'll just say that for the code as written, they should be defined in the guarded section (since nothing relies on them as a global).
There are other improvements to be made (apply_async is being used in a way that makes me think you really just wanted to listify the result of pool.imap_unordered, without the explicit loop), but that fixes the big issues that will wreck use of the spawn or forkserver start methods.
Using get_context('spawn') instead of get_context('fork') may solve your problem.

Most efficient way to multiprocess separate functions over same object

In a python script, I have a large dataset that I would like to apply multiple functions to. The functions are responsible for creating certain outputs that get saved to the hard drive.
A few things of note:
the functions are independent
none of the functions return anything
the functions will take variable amounts of time
some of the functions may fail, and that is fine
Can I multiprocess this in any way that each function and the dataset are sent separately to a core and run there? This way I do not need the first function to finish before the second one can kick off? There is no need for them to be sequentially dependent.
Thanks!
Since your functions are independent and only read the data, they are also thread safe, provided nothing modifies the data while a function is running.
Use a thread pool. You would have to create one task per function you want to run.
Note: for CPU-bound work to run on more than one core at once you must use Python multiprocessing. With threads, CPython's Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time. For more information, see "Python threads all executing on a single core".
Alternatively, you could use Dask, which splits the work into tasks that can run on multiple threads or processes. It adds some overhead, but it might be quicker for your needs.
I was in a similar situation as yours, and used Processes with the following function:
import multiprocessing as mp
def launch_proc(nproc, lst_functions, lst_args, lst_kwargs):
    n = len(lst_functions)
    r = 1 if n % nproc > 0 else 0
    for b in range(n//nproc + r):
        bucket = []
        for p in range(nproc):
            i = b*nproc + p
            if i == n:
                break
            proc = mp.Process(target=lst_functions[i], args=lst_args[i], kwargs=lst_kwargs[i])
            bucket.append(proc)
        for proc in bucket:
            proc.start()
        for proc in bucket:
            proc.join()
This has a major drawback: all Processes in a bucket have to finish before a new bucket can start. I tried to use a JoinableQueue to avoid this, but could not make it work.
Example:
def f(i):
    print(i)
nproc = 2
n = 11
lst_f = [f] * n
lst_args = [[i] for i in range(n)]
lst_kwargs = [{}] * n
launch_proc(nproc, lst_f, lst_args, lst_kwargs)
Hope it can help.
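The bucket drawback described above can be avoided with a process pool, since a free worker picks up the next task as soon as it finishes the previous one. The following is a minimal sketch, not a drop-in replacement: `write_report` and `failing_task` are hypothetical tasks, and the dataset is pickled and sent to the worker for each task, which may be costly for very large data.

```python
import multiprocessing as mp

def write_report(data):
    # Hypothetical task; a real one would save its output to disk.
    return len(data)

def failing_task(data):
    # Hypothetical task that always fails, to show failures are tolerated.
    raise ValueError("this task is allowed to fail")

def run_all(functions, data, processes=2):
    outcomes = {}
    with mp.Pool(processes=processes) as pool:
        # Submit everything up front; no task waits for a whole bucket.
        pending = {f.__name__: pool.apply_async(f, (data,)) for f in functions}
        for name, async_result in pending.items():
            try:
                outcomes[name] = async_result.get()
            except Exception as exc:
                # A failed task is recorded and does not stop the others.
                outcomes[name] = exc
    return outcomes

if __name__ == "__main__":
    print(run_all([write_report, failing_task], list(range(10))))
```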

Python Multiprocessing using Pool goes recursively haywire

I'm trying to make an expensive part of my pandas calculations parallel to speed up things.
I've already managed to make Multiprocessing.Pool work with a simple example:
import multiprocessing as mpr
import numpy as np
def Test(l):
    for i in range(len(l)):
        l[i] = i**2
    return l
t = list(np.arange(100))
L = [t,t,t,t]
if __name__ == "__main__":
    pool = mpr.Pool(processes=4)
    E = pool.map(Test,L)
    pool.close()
    pool.join()
No problems here. Now my own algorithm is a bit more complicated, I can't post it here in its full glory and terribleness, so I'll use some pseudo-code to outline the things I'm doing there:
import pandas as pd
import time
import datetime as dt
import multiprocessing as mpr
import MPFunctions as mpf  # self-written worker functions that get called for the multiprocessing
import ClassGetDataFrames as gd  # self-written class that reads in all the data and puts it into dataframes
# === Settings
# === Use ClassGetDataFrames to get data
# === Lots of single-thread calculations and manipulations on the dataframe
# === Cut dataframe into 4 evenly big chunks, make list of them called DDC
if __name__ == "__main__":
    pool = mpr.Pool(processes=4)
    LLT = pool.map(mpf.processChunks,DDC)
    pool.close()
    pool.join()
# === Join processed Chunks LLT back into one dataframe
# === More calculations and manipulations
# === Data Output
When I'm running this script the following happens:
It reads in the data.
It does all calculations and manipulations until the Pool statement.
Suddenly it reads in the data again, fourfold.
Then it goes into the main script fourfold at the same time.
The whole thing cascades recursively and goes haywire.
I have read before that this can happen if you're not careful, but I do not know why it happens here. My multiprocessing code is protected by the required name-main statement (I'm on Win7 64), it is only 4 lines long, it has close and join statements, and it calls one defined worker function which then calls a second worker function in a loop; that's it. As far as I know, it should just create the pool with four processes, call the four processes from the imported script, close the pool, wait until everything is done, and then continue with the script. On a side note, I first had the worker functions in the same script and the behaviour was the same. Instead of just doing what's in the pool, it seems to restart the whole script fourfold.
Can anyone enlighten me what might cause this behaviour? I seem to be missing some crucial understanding about Python's multiprocessing behaviour.
Also I don't know if it's important, I'm on a virtual machine that sits on my company's mainframe.
Do I have to use individual processes instead of a pool?
I managed to make it work by encasing the entire script in the if __name__ == "__main__": statement, not just the multiprocessing part.
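The reason this works is that on Windows multiprocessing starts children with the spawn method, and each spawned child imports the main script again, so any unguarded top-level code (data loading, calculations) runs once per worker. A minimal sketch of the resulting layout is below; `process_chunk`, `split` and the toy data are hypothetical stand-ins for the real worker functions and dataframe.

```python
import multiprocessing as mp

def process_chunk(chunk):
    # Worker functions stay at module level so child processes can import them.
    return [value ** 2 for value in chunk]

def split(values, n_chunks):
    # Cut a list into n_chunks nearly equal contiguous pieces.
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks

def main():
    # Data loading and single-process preparation go here, not at module level.
    data = list(range(10))
    chunks = split(data, 4)
    with mp.Pool(processes=4) as pool:
        processed = pool.map(process_chunk, chunks)
    # Join the processed chunks back into one list.
    return [value for chunk in processed for value in chunk]

if __name__ == "__main__":
    print(main())
```

Putting the script body in a `main()` function has the same effect as indenting everything under the guard, and keeps the module cheap to import in the children.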

Writing a parallel loop

I am trying to run a parallel loop on a simple example.
What am I doing wrong?
from joblib import Parallel, delayed
import multiprocessing
def processInput(i):
    return i * i
if __name__ == '__main__':
    # what are your inputs, and what operation do you want to
    # perform on each input. For example...
    inputs = range(1000000)
    num_cores = multiprocessing.cpu_count()
    results = Parallel(n_jobs=4)(delayed(processInput)(i) for i in inputs)
    print(results)
The problem with the code is that when executed under Windows in Python 3, it opens num_cores instances of Python to execute the parallel jobs, but only one is active. This should not be the case, since processor activity should be 100% instead of 14% (on an i7 with 8 logical cores).
Why are the extra instances not doing anything?
Following up on your request for working multiprocessing code, I suggest that you use Pool.map (if the delayed functionality is not important). I'll give you an example below; if you're working in Python 3, it's worth mentioning that you can also use starmap.
Also worth mentioning: map_async/starmap_async return immediately with an AsyncResult instead of blocking, although the results are still in input order. If the order of the returned results does not have to correspond to the order of the inputs, use imap_unordered.
import multiprocessing as mp
def processInput(i):
    return i * i
if __name__ == '__main__':
    # what are your inputs, and what operation do you want to
    # perform on each input. For example...
    inputs = range(1000000)
    # removing processes argument makes the code run on all available cores
    pool = mp.Pool(processes=4)
    results = pool.map(processInput, inputs)
    print(results)
On Windows, the multiprocessing module uses the 'spawn' method to start up multiple python interpreter processes. This is relatively slow. Parallel tries to be smart about running the code. In particular, it tries to adjust batch sizes so a batch takes about half a second to execute. (See the batch_size argument at https://pythonhosted.org/joblib/parallel.html)
Your processInput() function runs so fast that Parallel determines that it is faster to run the jobs serially on one processor than to spin up multiple python interpreters and run the code in parallel.
If you want to force your example to run on multiple cores, try setting batch_size to 1000 or making processInput() more complicated so it takes longer to execute.
Edit: Working example on windows that shows multiple processes in use (I'm using windows 7):
from joblib import Parallel, delayed
from os import getpid
def modfib(n):
    # print the process id to see that multiple processes are used, and
    # re-used during the job.
    if n%400 == 0:
        print(getpid(), n)
    # fibonacci sequence mod 1000000
    a,b = 0,1
    for i in range(n):
        a,b = b,(a+b)%1000000
    return b
if __name__ == "__main__":
    Parallel(n_jobs=-1, verbose=5)(delayed(modfib)(j) for j in range(1000, 4000))
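The batching trade-off described in this answer also shows up in the standard library: multiprocessing.Pool.map accepts a chunksize argument that groups many tiny tasks into one message per worker. The sketch below is only an analogue under that assumption; it does not reproduce joblib's automatic batch sizing, and the input size and chunk size are arbitrary illustrative values.

```python
import multiprocessing as mp

def process_input(i):
    return i * i

def main():
    inputs = range(100_000)
    with mp.Pool(processes=4) as pool:
        # chunksize sends 1,000 inputs to a worker at a time instead of one,
        # which cuts the inter-process communication overhead for tiny tasks.
        return pool.map(process_input, inputs, chunksize=1_000)

if __name__ == "__main__":
    results = main()
    print(results[:5], len(results))
```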

Multiprocessing with python3 only runs once

I have a problem running multiple processes in python3 .
My program does the following:
1. Takes entries from an SQLite database and passes them to an input_queue
2. Create multiple processes that take items off the input_queue, run it through a function and output the result to the output queue.
3. Create a thread that takes items off the output_queue and prints them (This thread is obviously started before the first 2 steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process,Queue
import multiprocessing
from threading import Thread
## This is the class that would be passed to the multi_processing function
class Processor:
    def __init__(self,out_queue):
        self.out_queue = out_queue
    def __call__(self,in_queue):
        data_entry = in_queue.get()
        result = data_entry*2
        self.out_queue.put(result)
#Performs the multiprocessing
def perform_distributed_processing(dbList,threads,processor_factory,output_queue):
    input_queue = Queue()
    # Create the Data processors.
    for i in range(threads):
        processor = processor_factory(output_queue)
        data_proc = Process(target = processor,
                            args = (input_queue,))
        data_proc.start()
    # Push entries to the queue.
    for entry in dbList:
        input_queue.put(entry)
    # Push stop markers to the queue, one for each thread.
    for i in range(threads):
        input_queue.put(None)
    data_proc.join()
    output_queue.put(None)
if __name__ == '__main__':
    output_results = Queue()
    def output_results_reader(queue):
        while True:
            item = queue.get()
            if item is None:
                break
            print(item)
    # Establish results collecting thread.
    results_process = Thread(target = output_results_reader,args = (output_results,))
    results_process.start()
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    # Perform multi processing
    perform_distributed_processing(dbList,10,Processor,output_results)
    # Wait for it all to finish.
    results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simple Python, and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor
def Processor(data_entry):
    return data_entry*2
def perform_distributed_processing(dbList, threads, processor_factory):
    # ProcessPoolExecutor takes max_workers, not processes
    with ProcessPoolExecutor(max_workers=threads) as executor:
        yield from executor.map(processor_factory, dbList)
if __name__ == '__main__':
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    for result in perform_distributed_processing(dbList, 8, Processor):
        print(result)
Or, if you want to handle them as they come instead of in order:
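from concurrent.futures import Future, ProcessPoolExecutor, as_completed
def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        fs = [executor.submit(processor_factory, db) for db in dbList]
        # as_completed yields each future as soon as its result is ready
        yield from map(Future.result, as_completed(fs))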
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library again. I think you can use any of the multiprocessing.Pool methods depending on your needs. If this is a batch job you can even use the synchronous multiprocessing.Pool.map(); instead of pushing to an input queue, you write a generator that yields input to the worker processes.
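For reference, the question's code appears to stop after one item per process because Processor.__call__ calls in_queue.get() only once and then returns, which ends that process. If the hand-rolled queue design is kept, the worker needs to loop until it sees the stop marker. A minimal sketch under that reading follows; the names `worker` and `process_all` are hypothetical, and `entries` is assumed to be a list containing no None values.

```python
from multiprocessing import Process, Queue

def worker(in_queue, out_queue):
    # Keep taking items until the stop marker (None) arrives.
    for data_entry in iter(in_queue.get, None):
        out_queue.put(data_entry * 2)

def process_all(entries, n_procs):
    in_queue, out_queue = Queue(), Queue()
    procs = [Process(target=worker, args=(in_queue, out_queue))
             for _ in range(n_procs)]
    for proc in procs:
        proc.start()
    for entry in entries:
        in_queue.put(entry)
    for _ in procs:
        in_queue.put(None)  # one stop marker per worker
    # Drain the output queue before joining so no worker blocks on a full pipe.
    results = [out_queue.get() for _ in entries]
    for proc in procs:
        proc.join()
    return results

if __name__ == "__main__":
    results = process_all(list(range(300)), n_procs=4)
    print(len(results))
```

Results come back in completion order, so they are not guaranteed to match the order of `entries`.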
