I'm trying to get all possible combinations with replacement and run some calculation on each of them. I'm using the code below:
from itertools import combinations_with_replacement

for seq in combinations_with_replacement('ABCDE', 500):
    ...  # some calculation
How can I parallelize this calculation using multiprocessing?
You can use the standard library concurrent.futures.
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations_with_replacement
def processing(combination):
    print(combination)
    # Compute interesting stuff

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=8)
    result = executor.map(processing, combinations_with_replacement('ABCDE', 25))
    for r in result:
        ...  # do stuff ...
A bit more explanation:
This code creates an executor that uses processes. Another possibility would be to use threads, but Python threads only run on one core at a time because of the GIL, so that is probably not what you want for heavy computation.
executor.map returns an iterator over the results. The call itself does not block, so you can do other computation before collecting the results in the for loop.
It is important to define the processing function outside the if __name__ == '__main__': block and to create and use the executor inside it. This prevents child processes from spawning executors recursively and allows the worker function to be pickled and passed to the child processes. Without this guard, the code is likely to fail.
I recommend this over multiprocessing.Pool, as it dispatches the work from your iterator in a smarter way.
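If you do go the executor.map route with a very large iterator, one knob worth knowing about is the chunksize argument (available for ProcessPoolExecutor.map since Python 3.5), which batches the items sent to each worker. A minimal sketch, reusing the processing function above with a placeholder body:
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations_with_replacement

def processing(combination):
    # placeholder computation
    return len(set(combination))

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=8) as executor:
        # chunksize batches items per inter-process round-trip, which greatly
        # reduces pickling/IPC overhead when the per-item work is cheap
        results = executor.map(processing,
                               combinations_with_replacement('ABCDE', 25),
                               chunksize=1000)
        for r in results:
            ...  # do stuff with each result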
Note the scale of your computation: combinations_with_replacement('ABCDE', 500) yields C(504, 4) ≈ 2.7 billion combinations (5**500 would be the count for itertools.product, not for combinations with replacement). Parallelizing only reduces the run time linearly by a factor of max_workers (8 here), so each worker still has to process hundreds of millions of 500-element tuples; make sure the per-item computation is worth the cost of pickling those tuples between processes.
Related
I am trying to figure out how to perfom a multiprocessing task with an unusual formulation.
Basically, given two lists containing 10 matrices for each list, I have to check if applying an operation (that I'll call fn) gives the same results if the input is (A, B) or vice versa (B, A).
With a sequential approach, the solution is straightforward:
#Given
A = [matrix_a1, ... , matrix_a10]
B = [matrix_b1, ... , matrix_b10]
AB_BA = [fn(A[i], B[i]) == fn(B[i], A[i]) for i in range(len(A))]
The next task is a bit strange because it requires using strictly more than ten threads while applying multiprocessing. The restriction is that you cannot simply assign the ten comparisons to ten different processes, because the remaining processes would be unused. I do not know why the assignment seems to use "process" and "thread" interchangeably.
This task seems a bit confusing because in multiprocessing, generally, you set the maximum number of workers, not the minimum.
I tried to use a solution that uses a ProcessPoolExecutor, as follows:
def equality(A, B, i):
    res = fn(A[i], B[i]) == fn(B[i], A[i])
    return res

with concurrent.futures.ProcessPoolExecutor(max_workers=20) as executor:
    idx = range(len(A))
    results = executor.map(equality, A, B, idx)
    for result in results:
        print(result)
My problem is that I am not sure how to check resource usage. I have naively tried to monitor CPU usage with the Ubuntu system monitor as well as top from the command line.
In addition, this solution is the most efficient among those I tried, but there is no direct way to require at least 11 workers, so it does not seem to satisfy the request.
I also tried other solutions, such as using multiprocessing.Pool directly. This spawns 10 Python instances according to top, but again, no more than 10. Here's what I tried:
def equality(A, B):
    res = fn(A, B) == fn(B, A)
    return res

with mp.Pool(20) as p:
    print(p.starmap(equality, ((A[i], B[i]) for i in range(len(A)))))
Do you have any suggestions to address this request as well as monitor the resource usage to be sure it is working as expected?
Thank you very much for your help in advance.
I wish you had published the actual problem word for word, since your description is a bit unclear. But this is what I know (or think I know):
Unless the CPU work done by your worker function equality is large enough that the gain from running it in parallel more than offsets the extra overhead of multiprocessing (starting processes, moving data between address spaces, etc.), your multiprocessing code will run more slowly than the serial version. Therefore, design your worker function to do as much work as possible per call and to pass as little data as possible.
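A rough, hedged way to check whether multiprocessing pays off for your actual fn is to time the serial loop against the pool version (the stand-in equality below is only illustrative):
import time
from concurrent.futures import ProcessPoolExecutor

def equality(a, b):
    # stand-in for the real fn-based comparison
    return sum(a) == sum(b)

if __name__ == "__main__":
    A = [list(range(100000)) for _ in range(10)]
    B = [list(range(100000)) for _ in range(10)]

    t0 = time.perf_counter()
    serial = [equality(a, b) for a, b in zip(A, B)]
    t1 = time.perf_counter()

    with ProcessPoolExecutor() as executor:
        parallel = list(executor.map(equality, A, B))
    t2 = time.perf_counter()

    print(f"serial:   {t1 - t0:.4f} s")
    print(f"parallel: {t2 - t1:.4f} s (same results: {serial == parallel})")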
When you specify ...
results = executor.map(equality, A, B, idx)
... your equality function will be invoked once for each element of A, B and idx. So what is being passed is not the entire lists A and B but rather individual elements (e.g. matrix_a1 and matrix_b1). Therefore, there is no point in even passing an idx argument:
def equality(matrix_a, matrix_b):
    """
    matrix_a and matrix_b are each single elements of
    lists A and B respectively.
    """
    return fn(matrix_a, matrix_b) == fn(matrix_b, matrix_a)

def main():
    from os import cpu_count
    from concurrent.futures import ProcessPoolExecutor

    A = [matrix_a1, ... , matrix_a10]
    B = [matrix_b1, ... , matrix_b10]
    # Do not create more processes than we have either
    # CPU cores or tasks to submit:
    pool_size = min(cpu_count(), len(A))
    with ProcessPoolExecutor(max_workers=pool_size) as executor:
        AB_BA = list(executor.map(equality, A, B))
    # This will be a list of 10 elements, each either `True` or `False`:
    print(AB_BA)

# Required for Windows (and any platform using the "spawn" start method):
if __name__ == '__main__':
    main()
So we will be submitting 10 tasks to a pool of size 10. Internally there is a "task queue" holding all the arguments being passed to equality:
matrix_a1, matrix_b1 # task 1
matrix_a2, matrix_b2 # task 2
...
matrix_a10, matrix_b10 # task 10
Any idle process in the pool will grab the next task from the queue, and the results are returned in task-submission order. But since equality is such a short-running function (unless fn is sufficiently complicated), the pool process that grabs the first task may finish it and grab the second task before some other pool process has even been dispatched by the operating system. So there is no guarantee that all 10 tasks will be worked on in parallel by 10 pool processes, even if fn is CPU-intensive. If you inserted a call to time.sleep(.1) at the beginning of equality, that would give the other pool processes a chance to "wake up" and grab their own tasks from the queue, but it would also slow your program down, since sleeping for this purpose is totally non-productive. The point is that you cannot ensure that all pool processes will always be active concurrently.
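To make that concrete, and to help with the monitoring question, here is a hedged sketch in which the worker reports its own PID; the number of distinct PIDs in the output tells you how many pool processes actually picked up work (the sleep simulates a non-trivial fn):
import os
import time
from concurrent.futures import ProcessPoolExecutor

def worker(i):
    time.sleep(0.1)  # simulate enough work for several pool processes to wake up
    return i, os.getpid()

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=10) as executor:
        results = list(executor.map(worker, range(10)))
    pids = {pid for _, pid in results}
    print(results)
    print(len(pids), "distinct worker processes handled the 10 tasks")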
Suppose I have the following in Python
# A loop
for i in range(10000):
Do Task A
# B loop
for i in range(10000):
Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process
def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")
if __name__ == '__main__':
Process(target=loop_a).start()
Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
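For instance, a short hedged sketch of the Queue idea, adapted to the two loops above (the tuples put on the queue are just an assumed result format):
from multiprocessing import Process, Queue

def loop_a(q):
    for i in range(3):
        q.put(("a", i))   # send a result back to the parent

def loop_b(q):
    for i in range(3):
        q.put(("b", i))

if __name__ == '__main__':
    q = Queue()
    pa = Process(target=loop_a, args=(q,))
    pb = Process(target=loop_b, args=(q,))
    pa.start()
    pb.start()
    for _ in range(6):    # each loop puts 3 items
        print(q.get())
    pa.join()
    pb.join()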
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate, but they offer true parallelism for CPU-bound work.
There are many possible options for what you wanted:
use loop
As many people have pointed out, this is the simplest way.
for i in range(10000):
    taskA()
    taskB()
Merits: easy to understand and use; no extra library needed.
Drawbacks: taskB can only run after taskA finishes (or the other way around); they can't run simultaneously.
multiprocessing
Another thought would be to run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process

p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
merits: tasks run simultaneously in the background; you can control them (terminate, join, etc.); they can exchange data and be synchronized when they compete for the same resources (see the sketch after these notes).
drawbacks: heavy! The OS has to switch between processes, and each one keeps its own copy of the data even when that data is redundant. If you have a lot of tasks (say 100 or more), this is not what you want.
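As a brief hedged sketch of the synchronization point above, a multiprocessing.Lock keeps the two processes from interleaving access to a shared resource (here, just stdout):
from multiprocessing import Process, Lock

def task(lock, name):
    with lock:                 # only one process holds the resource at a time
        for i in range(3):
            print(name, i)

if __name__ == '__main__':
    lock = Lock()
    p1 = Process(target=task, args=(lock, "A"))
    p2 = Process(target=task, args=(lock, "B"))
    p1.start()
    p2.start()
    p1.join()
    p2.join()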
threading
threading is like multiprocessing, just more lightweight. Check out this post. Their usage is quite similar:
import threading

p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide coroutines, which are supposed to be lighter-weight than threads; a minimal sketch follows after the merits and drawbacks below if you're interested.
merits: more flexible and lightweight.
drawbacks: an extra library is needed; there is a learning curve.
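A minimal sketch using gevent, assuming it is installed (pip install gevent); gevent.sleep(0) is the explicit cooperative yield that lets the two greenlets interleave:
import gevent

def task_a():
    for i in range(5):
        print("a", i)
        gevent.sleep(0)   # yield control to the other greenlet

def task_b():
    for i in range(5):
        print("b", i)
        gevent.sleep(0)

gevent.joinall([gevent.spawn(task_a), gevent.spawn(task_b)])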
Why do you want to run the two processes at the same time? Is it because you think they will go faster (there is a good chance that they won't)? Why not run the tasks in the same loop, e.g.
for i in range(10000):
doTaskA()
doTaskB()
The obvious answer to your question is to use threads; see the Python threading module. However, threading is a big subject with many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU-intensive, this will make better use of the multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP, etc., but without knowing more about Task A and Task B and why they need to run at the same time, it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()   # use start(), not run(); run() would execute in the current thread
threadB.start()

# Do work independent of loopA and loopB

threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about a single loop: for i in range(10000): do Task A, then do Task B? Without more information I don't have a better answer.
I find that using the Pool class from the multiprocessing module works amazingly well for executing multiple processes at once within a Python script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool

def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, let's try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...)  # Arguments to be passed into desired_function.
multiple_results = []
for i in range(processes):  # Submits each task asynchronously with option i+1 (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,) + args))
pool.close()
pool.join()  # Wait until every worker has finished.
results = np.array([res.get() for res in multiple_results])  # Retrieves results after
                                                             # every worker is finished!
for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between each process (hence why I added the variable "option".) Additionally, it doesn't have to be an array that is being populated in the end, but for my example, that's how I used it. Hope this simplifies or helps you better understand the power of multiprocessing in Python!
I'm doing a small task in Python:
1- write a simple add function that will add two given numbers and print their sum.
2- Declare an array of 10 numbers.
3- call the sum function in parallel threads and pass the numbers from the above array to the function as per the following rule:
--> sum(array[thread_id], array[9-thread_id])
I have done step 1 and step 2 but I don't know what to do in step 3. Here's the code below:
import array
import multiprocessing
import threading
from multiprocessing import Process

# 1- write a simple add function that will add two given numbers and print their sum
x = 5
y = 10

def add(a, b):
    sum = int(a) + int(b)
    print("The sum is: ", sum)

add(x, y)

# 2 Declare an array of 10 numbers
a_list = list(range(1, 11))
print('an array of 10 numbers: ' + str(a_list))

# 3 call sum function in parallel threads and pass the numbers from above array to function as per following rules
# --> sum(array[thread_id], array[9-thread_id])
Can anyone please help me with what to do in step 3?
I have little knowledge of parallel threads (or multiprocessing)
In Python, you can start a new thread like so
import threading
def function_to_run_in_thread(arg1, arg2): ...
threading.Thread(target=function_to_run_in_thread, args=("some", "arguments")).start()
and you can start a new process similarly like so
import multiprocessing
def function_to_run_in_subprocess(arg1, arg2): ...
multiprocessing.Process(target=function_to_run_in_subprocess, args=("some", "arguments")).start()
however, it is not particularly easy to get at the function's return value with threading/multiprocessing used this way. It would be better to use a ThreadPoolExecutor or ProcessPoolExecutor from the concurrent.futures module.
so in your case we can do
# Both the ThreadPoolExecutor and ProcessPoolExecutor are basically interchangeable from an API perspective
from concurrent.futures import ThreadPoolExecutor as Pool
# from concurrent.futures import ProcessPoolExecutor as Pool
pool = Pool(10) # the argument is the number of threads/processes available
sum_futures = [pool.submit(add, *pair) for pair in zip(a_list, reversed(a_list))]
# We must explicitly wait for the calculation to finish for each add call
sum_results = [future.result() for future in sum_futures]
Or, more concisely, we can do

sum_results = list(pool.map(lambda args: add(*args), zip(a_list, reversed(a_list))))

(Note that this lambda-based version only works with the ThreadPoolExecutor; a lambda cannot be pickled and sent to a process pool, as discussed below.)
In CPython (the reference implementation of Python, and the one that you're probably using), only one thread can be holding the Global Interpreter Lock at a given time. This means that multithreading in code running in the CPython interpreter only gives you a benefit for doing operations that do not require holding the GIL, such as
blocking network/IO operations
running certain code in a C extension that explicitly releases the GIL
Basically what this means is that the operation of adding two numbers together in python cannot truly execute in parallel when using multithreading.
However, multiprocessing comes with its own drawbacks, one of which is that serialization (using pickle) is required to pass values between processes. Some values cannot be pickled (e.g. lambda expressions), which can be a roadblock. Threads are not subject to this restriction because they share the same memory space, which is also a lot faster.
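Putting those pieces together, a hedged sketch of the picklable pattern for your exercise: a module-level add (here returning the sum rather than only printing it, which is an assumption on my part) works with a process pool, where a lambda would not, and map with two iterables removes the need for any wrapper:
from concurrent.futures import ProcessPoolExecutor

def add(a, b):           # module-level, so it can be pickled for the child processes
    return int(a) + int(b)

if __name__ == "__main__":
    a_list = list(range(1, 11))
    with ProcessPoolExecutor(max_workers=10) as pool:
        # map accepts multiple iterables, pairing a_list[i] with a_list[9 - i]
        results = list(pool.map(add, a_list, reversed(a_list)))
    print(results)       # [11, 11, ..., 11]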
I have searched the site but I am not sure precisely what terms would yield relevant answers, my apologies if this question is redundant.
I need to process a very very large matrix (14,000,000 * 250,000) and would like to exploit Python's multiprocessing module to speed things up. For each pair of columns in the matrix I need to apply a function which will then store the results in a proprietary class.
I will be implementing a double for loop which provides the necessary combinations of columns.
I do not want to load up a pool with 250,000 tasks, as I fear the memory usage will be significant. Ideally, I would like to have one column at a time be tasked out amongst the pool, i.e.
Process 1 takes Column A and Column B and a function F takes A,B and G and then stores the result in Class G[A,B]
Process 2 takes Column A and Column C and proceeds similarly
The processes will never access the same element of G.
So I would like to pause the for loop every N tasks. The set/get methods of G will be overridden to perform some back-end tasks.
What I do not understand is whether or not pausing the loop is necessary? I.e is Python smart enough to only take what it can work on? Or will it be populating a massive amount of tasks?
Lastly, I am unclear on how the results work. I just want them to be set in G and not return anything. I do not want to have to worry about .get() etc., but from my understanding the pool methods return a result object. Can I just ignore this?
Is there a better way? Am I completely lost?
First off, you will want to create a multiprocessing Pool. You set up how many workers you want and then use map to start tasks. I am sure you already know, but here are the Python multiprocessing docs.
You say that you don't want to return data because you don't need to, but how are you planning on viewing the results? Will each task write the data to disk? To pass data between your processes you will want to use something like a multiprocessing Queue.
Here is example code from the link on how to use process and queue:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print(q.get())    # prints "[42, None, 'hello']"
    p.join()
And this is an example of using the Pool:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print(result.get(timeout=1))          # prints "100" unless your computer is *very* slow
    print(pool.map(f, range(10)))         # prints "[0, 1, 4,..., 81]"
Edit: @goncalopp makes a very important point that you may not want to do heavy numerical calculations in pure Python because of how slow it is. NumPy is a great package for number crunching.
If you are heavily IO-bound due to writing to disk in each process, you should consider running something like 4 * num_processors workers so that you always have something to do. You should also make sure you have a very fast disk :)
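A hedged sketch of that over-subscription idea (the 4 * cpu_count factor and the /tmp paths are only assumptions to illustrate; tune for your workload and disk):
import os
from multiprocessing import Pool

def io_bound_task(job):
    path, data = job
    with open(path, "w") as f:   # the worker spends most of its time in disk I/O
        f.write(data)
    return path

if __name__ == "__main__":
    jobs = [(f"/tmp/out_{i}.txt", "x" * 1024) for i in range(100)]
    with Pool(processes=4 * (os.cpu_count() or 1)) as pool:
        written = pool.map(io_bound_task, jobs)
    print(len(written), "files written")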
I have generated permutations with the itertools.permutations function in python. The problem is that the result is very big and I would like to go through it with multiple threads but don't really know how to accomplish that here is what I have so far:
import itertools

perms = itertools.permutations('1234', r=4)

# I would like to iterate through 'perms' with multiple threads
for perm in perms:
    print(perm)
If the work you want to do with the items from the permutation generator is CPU intensive, you probably want to use processes rather than threads. CPython's Global Interpreter Lock (GIL) makes multithreading of limited value when doing CPU bound work.
Instead, use the multiprocessing module's Pool class, like so:
import multiprocessing
import itertools

def do_stuff(perm):
    # whatever
    return list(reversed(perm))

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:  # default is the optimal number of processes
        results = pool.map(do_stuff, itertools.permutations('1234', r=4))
    # do stuff with results
Note that if you will be iterating over results (rather than using it as a list), you can use imap instead of map to get an iterator and work on the results as they are produced by the worker processes. If the order of the returned items doesn't matter, you can use imap_unordered to (I think) save a bit of memory; a brief sketch follows.
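A brief hedged sketch of that variant, reusing do_stuff from above; the chunksize argument batches items per round-trip, which matters when each task is cheap:
import itertools
import multiprocessing

def do_stuff(perm):
    return list(reversed(perm))

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        it = pool.imap_unordered(do_stuff, itertools.permutations('1234', r=4), chunksize=64)
        for result in it:       # results arrive as workers finish, in no particular order
            print(result)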
The if __name__ == "__main__" boilerplate is required on Windows, where the multiprocessing module has to work around the lack of fork().
Split the range of permutation indices between the threads, then use a function that generates a permutation directly from its index in each thread, rather than generating all the permutations and splitting them between threads; a sketch of such an index-to-permutation function is below.
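A hedged sketch of that idea: lexicographic unranking computes the permutation at a given index directly, so each worker can generate only its own slice of indices (the helper name nth_permutation is mine):
from math import factorial

def nth_permutation(elements, index):
    """Return the index-th permutation of elements, in lexicographic order."""
    pool = list(elements)
    result = []
    for position in range(len(pool), 0, -1):
        block = factorial(position - 1)   # permutations sharing each leading element
        i, index = divmod(index, block)
        result.append(pool.pop(i))
    return tuple(result)

# e.g. worker k of N would handle indices range(k, factorial(len('1234')), N)
print(nth_permutation('1234', 0))    # ('1', '2', '3', '4')
print(nth_permutation('1234', 23))   # ('4', '3', '2', '1')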
Assuming your processing function is f(x), you want to do:

import itertools
from multiprocessing import Pool

def f(perm):
    # replace this with your real processing of one permutation
    return ''.join(perm)

if __name__ == '__main__':
    pool = Pool(processes=4)  # start 4 worker processes
    perms = itertools.permutations('1234', r=4)
    for r in pool.map(f, perms):
        print(r)
In fact, using threads would not execute the work in parallel unless it is IO-bound. If it is CPU-bound and you have a quad core, then multiprocessing is the way to go. If you don't have multiple cores and the work is CPU-bound, then I'm afraid that making it parallel will not improve your current situation.
Python's concurrent.futures module makes it easy to split work between threads. In this example, 4 threads will be used, but you can modify that to suit your needs.
from concurrent import futures

def thread_process(perm):
    pass  # do something with one permutation

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    for perm in perms:
        executor.submit(thread_process, perm)
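If you also need the return values, a hedged follow-up sketch collects them with futures.as_completed (reusing thread_process and perms from above):
from concurrent import futures

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    pending = [executor.submit(thread_process, perm) for perm in perms]
    for fut in futures.as_completed(pending):
        print(fut.result())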