Python permutations threads

I have generated permutations with the itertools.permutations function in Python. The problem is that the result is very big, and I would like to go through it with multiple threads, but I don't really know how to accomplish that. Here is what I have so far:
import itertools

perms = itertools.permutations('1234', r=4)
# I would like to iterate through 'perms' with multiple threads
for perm in perms:
    print perm

If the work you want to do with the items from the permutation generator is CPU intensive, you probably want to use processes rather than threads. CPython's Global Interpreter Lock (GIL) makes multithreading of limited value when doing CPU bound work.
Instead, use the multiprocessing module's Pool class, like so:
import multiprocessing
import itertools

def do_stuff(perm):
    # whatever
    return list(reversed(perm))

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:  # default is the optimal number of processes
        results = pool.map(do_stuff, itertools.permutations('1234', r=4))
        # do stuff with results
Note that if you will be iterating over the results (rather than using them as a list), you can use imap instead of map to get an iterator and work on the results as they are produced by the worker processes. If it doesn't matter what order the items are returned in, you can use imap_unordered to (I think) save a bit of memory.
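For example, a minimal sketch (assuming the same do_stuff as above) that consumes results as they arrive with imap_unordered:

import itertools
import multiprocessing

def do_stuff(perm):
    return "".join(reversed(perm))

if __name__ == "__main__":
    with multiprocessing.Pool() as pool:
        # results arrive as each worker finishes, not in input order
        for result in pool.imap_unordered(do_stuff, itertools.permutations('1234', r=4)):
            print(result)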
The if __name__ == "__main__" boilerplate is required on Windows, where the multiprocessing module has to work around the OS's limitations (no fork).

Split the range of permutation indices between the threads, then have each thread generate a permutation directly from its index, rather than generating all the permutations and splitting them between threads; a sketch of one such index-to-permutation function is below.
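Here is a minimal sketch of one such function (nth_permutation is a hypothetical helper, not from a library), based on the factorial number system:

from math import factorial

def nth_permutation(elements, index):
    # Hypothetical helper: builds the index-th permutation (in lexicographic
    # order) of `elements` without generating the earlier ones.
    elements = list(elements)
    result = []
    index %= factorial(len(elements))
    while elements:
        block = factorial(len(elements) - 1)
        pos, index = divmod(index, block)
        result.append(elements.pop(pos))
    return result

# Each thread can then be given a disjoint range of indices, e.g. thread k
# handles range(k * chunk_size, (k + 1) * chunk_size).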

Assuming your processing function is f(x), you want to do:
import itertools
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=4)  # start 4 worker processes
    perms = itertools.permutations('1234', r=4)
    for r in pool.map(f, perms):
        print(r)
In fact, using threads would not execute the work in parallel unless it is IO bound. If it is CPU bound and you have a quad core, then multiprocessing is the way to go. If you don't have multiple cores and the work is CPU bound, then I'm afraid that making it parallel will not improve your current situation.

Python's concurrent.futures module makes it easy to split work between threads. In this example, 4 threads will be used, but you can modify that to suit your needs.
from concurrent import futures

def thread_process(perm):
    # do something
    ...

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    for perm in perms:
        executor.submit(thread_process, perm)


How to implement multiprocessing on a specific function?

I am new to this multiprocessing concept. I am trying to apply multiprocessing to a spelling function to make it run faster. I tried the code below but did not get the results back in the original order; token here is the huge list of tokenized sentences.
from spellchecker import SpellChecker
from wordsegment import load, segment
from timeit import default_timer as timer
from multiprocessing import Process, Pool, Queue, Manager
import numpy as np

def text_similarity_spellings(self, token):
    """Uses spell checker to separate incorrect spellings and correct them"""
    spell = SpellChecker()
    unknown_words = [list(spell.unknown(word)) for word in token]
    known_words = [list(spell.known(word)) for word in token]
    load()
    segmented = [[segment(word) for word in sub] for sub in unknown_words]
    flat_list = list(self.unpacker(segmented))
    new_list = [[known_words[x], flat_list[x]] for x in range(len(known_words))]
    new_list = list(self.unpacker(new_list))
    newlist = [sorted(set(mylist), key=lambda x: mylist.index(x)) for mylist in new_list]
    return newlist

def run_all(self):
    tread_vta = Manager().list()
    processes = []
    arg_split = np.array_split(np.array(token), 10)
    arg_tr_cl = []
    finds = []
    trdclean1 = []
    for count, k in enumerate(arg_split):
        arg_tr_cl.append((k, [], tread_vta, token[t]))
    for j in range(len(arg_tr_cl)):
        p = Process(target=self.text_similarity_spellings, args=arg_tr_cl[j])
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
Can anyone suggest a better way to apply multiprocessing to a specific function and get the results back in the correct order?
First, there is a certain amount of overhead in creating processes and then again more overhead in passing arguments from the main process to a subprocess, which "lives" in another address space, and getting return values back (by the way, you have made no provisions for actually getting return values back from worker function text_similarity_spellings). So for you to profit from using multiprocessing, the gains from performing your tasks (invocations of your worker function) in parallel must be enough to offset the additional aforementioned costs. All of this is just a way of saying that your worker function has to be sufficiently CPU-intensive to justify multiprocessing.
Second, given the cost of creating processes, you do not want to be creating more processes than you can possibly use. If you have N tasks to complete (the length of arg_tr_cl) and M CPU processors to run them on and your worker function is pure CPU (no I/O involved), then you can never gain anything by trying to run these tasks using more than M processes. If, however, they do combine some I/O, then perhaps using more processes could be profitable. If there is a lot of I/O involved and only some CPU-intensive processing involved, then using a combination of multithreading and multiprocessing might be the way to go. Finally, if the worker function is mostly I/O, then multithreading is what you want.
There is a solution to using X processes (based on whatever value of X you have settled on) to complete N tasks and to be able to get return values back from your worker function, namely using a process pool of size X.
MULTITHREADING = False

n_tasks = len(arg_tr_cl)
if MULTITHREADING:
    from multiprocessing.dummy import Pool
    # To use multithreading instead (we can use a much larger pool size):
    pool_size = min(n_tasks, 100)  # 100 is fairly arbitrary
else:
    from multiprocessing import Pool, cpu_count
    # No point in creating a pool size larger than the number of tasks we have.
    # Otherwise, assuming we are mostly CPU-intensive, just create a pool size
    # equal to the number of CPU cores that we have:
    n_processors = cpu_count()
    pool_size = min(n_tasks, n_processors)
pool = Pool(pool_size)
return_values = pool.map(self.text_similarity_spellings, arg_tr_cl)
# You can now iterate over return_values to get the return values:
for return_value in return_values:
    ...
# or create a list, for example: return_values = list(return_values)
But it may be that the SpellChecker is doing lots of I/O if each invocation has to read in an external dictionary. If that is the case, is it not possible that your best performance is to initialize the SpellChecker once and then just loop checking each word and forget completely about multiprocessing (or multithreading)?
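A minimal sketch of that sequential alternative (assuming token is the list of tokenized sentences from the question):

from spellchecker import SpellChecker

spell = SpellChecker()  # the dictionary is loaded once, up front
unknown_words = [list(spell.unknown(words)) for words in token]
known_words = [list(spell.known(words)) for words in token]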

How to parallelize "for" loops? [duplicate]

Say I have a very large list and I'm performing an operation like so:
for item in items:
    try:
        api.my_operation(item)
    except:
        print 'error with item'
My issue is twofold:
There are a lot of items
api.my_operation takes forever to return
I'd like to use multi-threading to spin up a bunch of api.my_operations at once so I can process maybe 5 or 10 or even 100 items at once.
If my_operation() returns an exception (because maybe I already processed that item) - that's OK. It won't break anything. The loop can continue to the next item.
Note: this is for Python 2.7.3
First, in Python, if your code is CPU-bound, multithreading won't help, because only one thread can hold the Global Interpreter Lock, and therefore run Python code, at a time. So, you need to use processes, not threads.
This is not true if your operation "takes forever to return" because it's IO-bound—that is, waiting on the network or disk copies or the like. I'll come back to that later.
Next, the way to process 5 or 10 or 100 items at once is to create a pool of 5 or 10 or 100 workers and put the items into a queue that the workers service. Fortunately, the stdlib multiprocessing and concurrent.futures libraries both wrap up most of the details for you.
The former is more powerful and flexible for traditional programming; the latter is simpler if you need to compose future-waiting; for trivial cases, it really doesn't matter which you choose. (In this case, the most obvious implementation with each takes 3 lines with futures, 4 lines with multiprocessing.)
If you're using 2.6-2.7 or 3.0-3.1, futures isn't built in, but you can install it from PyPI (pip install futures).
Finally, it's usually a lot simpler to parallelize things if you can turn the entire loop iteration into a function call (something you could, e.g., pass to map), so let's do that first:
def try_my_operation(item):
    try:
        api.my_operation(item)
    except:
        print('error with item')
Putting it all together:
import concurrent.futures

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_my_operation, item) for item in items]
concurrent.futures.wait(futures)
If you have lots of relatively small jobs, the overhead of multiprocessing might swamp the gains. The way to solve that is to batch up the work into larger jobs. For example (using grouper from the itertools recipes, which you can copy and paste into your code, or get from the more-itertools project on PyPI):
def try_multiple_operations(items):
    for item in items:
        try:
            api.my_operation(item)
        except:
            print('error with item')

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_multiple_operations, group)
           for group in grouper(5, items)]
concurrent.futures.wait(futures)
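For reference, the grouper recipe from the Python 2 itertools documentation looks roughly like this (the argument order matches the grouper(5, items) call above):

from itertools import izip_longest  # zip_longest on Python 3

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)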
Finally, what if your code is IO bound? Then threads are just as good as processes, and with less overhead (and fewer limitations, but those limitations usually won't affect you in cases like this). Sometimes that "less overhead" is enough to mean you don't need batching with threads, but you do with processes, which is a nice win.
So, how do you use threads instead of processes? Just change ProcessPoolExecutor to ThreadPoolExecutor.
If you're not sure whether your code is CPU-bound or IO-bound, just try it both ways.
Can I do this for multiple functions in my Python script? For example, if I had another for loop elsewhere in the code that I wanted to parallelize, is it possible to do two multi-threaded functions in the same script?
Yes. In fact, there are two different ways to do it.
First, you can share the same (thread or process) executor and use it from multiple places with no problem. The whole point of tasks and futures is that they're self-contained; you don't care where they run, just that you queue them up and eventually get the answer back.
Alternatively, you can have two executors in the same program with no problem. This has a performance cost—if you're using both executors at the same time, you'll end up trying to run (for example) 16 busy threads on 8 cores, which means there's going to be some context switching. But sometimes it's worth doing because, say, the two executors are rarely busy at the same time, and it makes your code a lot simpler. Or maybe one executor is running very large tasks that can take a while to complete, and the other is running very small tasks that need to complete as quickly as possible, because responsiveness is more important than throughput for part of your program.
If you don't know which is appropriate for your program, usually it's the first.
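As a sketch of the first option, a single executor can be shared by two unrelated loops (some_other_operation and other_things are placeholder names, not from the question):

from concurrent.futures import ThreadPoolExecutor, wait

executor = ThreadPoolExecutor(max_workers=8)

# two different parts of the program submit work to the same pool
futures_a = [executor.submit(try_my_operation, item) for item in items]
futures_b = [executor.submit(some_other_operation, thing) for thing in other_things]

wait(futures_a + futures_b)  # the executor doesn't care which loop queued what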
There's multiprocessing.pool, and the following sample illustrates how to use one of its pools:
from multiprocessing.pool import ThreadPool as Pool
# from multiprocessing import Pool

pool_size = 5  # your "parallelness"

# define worker function before a Pool is instantiated
def worker(item):
    try:
        api.my_operation(item)
    except:
        print('error with item')

pool = Pool(pool_size)

for item in items:
    pool.apply_async(worker, (item,))

pool.close()
pool.join()
Now if you indeed identify that your process is CPU bound, as @abarnert mentioned, change ThreadPool to the process pool implementation (commented under the ThreadPool import). You can find more details here: http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
You can split the processing into a specified number of threads using an approach like this:
import threading

def process(items, start, end):
    for item in items[start:end]:
        try:
            api.my_operation(item)
        except Exception:
            print('error with item')

def split_processing(items, num_splits=4):
    split_size = len(items) // num_splits
    threads = []
    for i in range(num_splits):
        # determine the indices of the list this thread will handle
        start = i * split_size
        # special case on the last chunk to account for uneven splits
        end = None if i + 1 == num_splits else (i + 1) * split_size
        # create the thread
        threads.append(
            threading.Thread(target=process, args=(items, start, end)))
        threads[-1].start()  # start the thread we just created

    # wait for all threads to finish
    for t in threads:
        t.join()

split_processing(items)
import numpy as np
import threading

def threaded_process(items_chunk):
    """Your main work, which runs in a thread for each chunk"""
    for item in items_chunk:
        try:
            api.my_operation(item)
        except Exception:
            print('error with item')

n_threads = 20
# Splitting the items into chunks equal to the number of threads
array_chunk = np.array_split(items, n_threads)

thread_list = []
for thr in range(n_threads):
    # note the trailing comma: args must be a tuple, not a bare chunk
    thread = threading.Thread(target=threaded_process, args=(array_chunk[thr],))
    thread_list.append(thread)
    thread_list[thr].start()

for thread in thread_list:
    thread.join()

Python multiple processes consuming/iterating over single generator (divide and conquer)

I have a python generator that returns lots of items, for example:
import itertools

def generate_random_strings():
    chars = "ABCDEFGH"
    for item in itertools.product(chars, repeat=10):
        yield "".join(item)
I then iterate over this and perform various tasks; the issue is that I'm only using one thread/process for this:
my_strings = generate_random_strings()
for string in my_strings:
    # do something with string...
    print(string)
This works great, I'm getting all my strings, but it's slow. I would like to harness the power of Python multiprocessing to "divide and conquer" this for loop. However, of course, I want each string to be processed only once. While I've found much documentation on multiprocessing, I'm trying to find the most simple solution for this with the least amount of code.
I'm assuming each thread should take a big chunk of items every time and process them before coming back and getting another big chunk etc...
Many thanks,
The simplest solution with the least code? The multiprocessing context manager.
I assume that you can put "do something with string" into a function called "do_something"
from multiprocessing import Pool as ProcessPool

number_of_processes = 4
with ProcessPool(number_of_processes) as pool:
    pool.map(do_something, my_strings)
If you want to get the results of "do_something" back again, easy!
with ProcessPool(number_of_processes) as pool:
    results = pool.map(do_something, my_strings)
You'll get them in a list.
multiprocessing.dummy replicates the multiprocessing API, but it is a wrapper around the threading module, so you get the same pool syntax backed by threads. If you want threads instead of processes, just do this:
from multiprocessing.dummy import Pool as ThreadPool
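If, as the question assumes, you want each worker to grab a bigger chunk of items at a time rather than one string per task, map and imap accept a chunksize argument; a minimal sketch (do_something as above):

from multiprocessing import Pool

if __name__ == "__main__":
    my_strings = generate_random_strings()
    with Pool(4) as pool:
        # chunksize batches items per worker round-trip, reducing IPC overhead
        for result in pool.imap_unordered(do_something, my_strings, chunksize=1000):
            pass  # consume results as they are produced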
You may use multiprocessing.
import multiprocessing

def string_fun(string):
    # do something with string...
    print(string)

my_strings = generate_random_strings()
num_of_processes = 7
pool = multiprocessing.Pool(num_of_processes)
pool.map(string_fun, my_strings)
Assuming you're using the latest version of Python, you may want to read about the asyncio module. Multithreading is not easy to implement due to the GIL: "In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe."
So you can switch to multiprocessing, or, as reported above, take a look at the asyncio module.
asyncio — Asynchronous I/O > https://docs.python.org/3/library/asyncio.html
I'll integrate this answer with some code as soon as possible.
Hope it helps,
Hele
As @Hele mentioned, asyncio is the best of all; here is an example.
Code
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# python 3.7.2

from asyncio import ensure_future, gather, run
import random

alphabet = 'ABCDEFGH'
size = 1000

async def generate():
    tasks = list()
    result = None
    for el in range(1, size):
        task = ensure_future(generate_one())
        tasks.append(task)
    result = await gather(*tasks)
    return list(set(result))

async def generate_one():
    return ''.join(random.choice(alphabet) for i in range(8))

if __name__ == '__main__':
    my_strings = run(generate())
    print(my_strings)
Output
['CHABCGDD', 'ACBGAFEB', ...
Of course, you need to improve generate_one, this variant is very slow.
You can see source code here.

parallel computing combination_with_replacement using multiprocessing

I'm trying to get all possible combinations with replacement and do some calculation with each of them. I'm using the code below:
from itertools import combinations_with_replacement

for seq in combinations_with_replacement('ABCDE', 500):
    # some calculation
    ...
How can I parallelize this calculation using multiprocessing?
You can use the standard library concurrent.futures.
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations_with_replacement

def processing(combination):
    print(combination)
    # Compute interesting stuff

if __name__ == '__main__':
    executor = ProcessPoolExecutor(max_workers=8)
    result = executor.map(processing, combinations_with_replacement('ABCDE', 25))
    for r in result:
        # do stuff ...
        ...
A bit more explanation:
This code creates an executor that uses processes. Another possibility would be to use threads, but pure-Python threads only run on one core at a time, so that is probably not the solution of interest in your case, as you need to run heavy computation.
executor.map returns a lazy iterator. Thus, the executor.map(...) line is non-blocking, and you can do other computation before collecting the results in the for loop.
It is important to declare the processing function outside of the if __name__ == '__main__': block, and to declare and use the executor inside that block. This prevents infinite spawning of executors and allows the worker function to be pickled so it can be passed to the child processes. Without this block, the code is likely to fail.
I recommend this over multiprocessing.Pool as it dispatches the work more cleverly when you are feeding it an iterator.
Note that your computation over combinations of 500 drawn from the 5 elements ABCDE may still be impractical: combinations_with_replacement('ABCDE', 500) yields C(504, 4), roughly 2.7 billion combinations, each of length 500. Parallelizing only reduces the work linearly by a factor of max_workers (here 8), so make sure the per-combination calculation is cheap enough for that to be realistic.
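If the per-combination calculation is cheap relative to the cost of shipping items to the workers, Executor.map also accepts a chunksize argument (Python 3.5+) to batch items per round-trip; a minimal sketch, reusing the processing function from the answer above:

from concurrent.futures import ProcessPoolExecutor
from itertools import combinations_with_replacement

# assumes the processing() worker defined in the answer above
if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=8) as executor:
        # chunksize batches items per round-trip to each worker, cutting IPC overhead
        for r in executor.map(processing, combinations_with_replacement('ABCDE', 25), chunksize=256):
            pass  # do stuff with r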

Python multiprocessing an enormous amount of data

I have searched the site but I am not sure precisely what terms would yield relevant answers, my apologies if this question is redundant.
I need to process a very very large matrix (14,000,000 * 250,000) and would like to exploit Python's multiprocessing module to speed things up. For each pair of columns in the matrix I need to apply a function which will then store the results in a proprietary class.
I will be implementing a double for loop which provides the necessary combinations of columns.
I do not want to load up a pool with 250,000 tasks, as I fear the memory usage will be significant. Ideally, I would like to take one column at a time and have its pairings tasked out amongst the pool, i.e.
Process 1 takes Column A and Column B, and a function F takes A, B, and G and then stores the result in G[A, B]
Process 2 takes Column A and Column C and proceeds similarly
The processes will never access the same element of G.
So I would like to pause the for loop every N tasks. The set/get methods of G will be overridden to perform some back-end tasks.
What I do not understand is whether or not pausing the loop is necessary. I.e., is Python smart enough to only take what it can work on? Or will it be populating a massive amount of tasks?
Lastly, I am unclear on how the results work. I just want them to be set in G and not return anything. I do not want to have to worry about .get() etc., but from my understanding the pool method returns a result object. Can I just ignore this?
Is there a better way? Am I completely lost?
First off, you will want to create a multiprocessing Pool. You set up how many workers you want and then use map to start up tasks. I am sure you already know, but here are the Python multiprocessing docs.
You say that you don't want to return data because you don't need to, but how are you planning on viewing the results? Will each task write the data to disk? To pass data between your processes you will want to use something like a multiprocessing Queue.
Here is example code from the link on how to use process and queue:
from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()
And this is an example of using the Pool:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result = pool.apply_async(f, [10])    # evaluate "f(10)" asynchronously
    print result.get(timeout=1)           # prints "100" unless your computer is *very* slow
    print pool.map(f, range(10))          # prints "[0, 1, 4,..., 81]"
Edit: @goncalopp makes a very important point that you may not want to do heavy numerical calculations in pure Python due to how slow it is. Numpy is a great package for number crunching.
If you are heavily IO bound due to writing to disk in each process, you should consider running something like 4 * num_processors workers so that you always have something to do. You should also make sure you have a very fast disk :)
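A rough sketch of that oversubscription suggestion (process_column_pair and column_pairs are placeholders for your worker function and your iterable of column pairs):

from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    # oversubscribe so workers blocked on disk writes don't leave CPUs idle
    pool = Pool(processes=4 * cpu_count())
    pool.map(process_column_pair, column_pairs)
    pool.close()
    pool.join()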
