Let's suppose I have a complicated function I want to run on a list:
import concurrent.futures
import random
import numpy as np
inputData = random.sample(range(60000, 1000000), 50)
def complicated_function(x):
    """complicated stuff"""
    return x**x
I can process it directly in a loop this way (this is how I want the results at the end of my code, in the same order if possible):
#without parallelization
processedData = [complicated_function(x) for x in inputData]
For parallel processing I looked at tutorials and came up with this code:
#with parallelization
processedData2 = []
with concurrent.futures.ThreadPoolExecutor() as e:
    fut = [e.submit(complicated_function, inputData[i]) for i in range(len(inputData))]
    for r in concurrent.futures.as_completed(fut):
        processedData2.append(r.result())
The problem is that, watching my system monitor, only one core is working while this runs... so obviously it is not doing what I need...
Thanks a lot in advance for your help!
That is because you are using threads, and Python's global interpreter lock (GIL) prevents threads from actually running Python code in parallel across multiple cores.
Instead you can use multiprocessing. Just like ThreadPoolExecutor, there is a ProcessPoolExecutor, which will make sure multiple cores are utilised.
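For example, a minimal sketch reusing complicated_function and inputData from the question; executor.map also returns results in the same order as the inputs, which is what you asked for:

import concurrent.futures

# ProcessPoolExecutor runs the work in separate worker processes, so the GIL does
# not serialise it; map() yields the results in input order.
if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        processedData2 = list(executor.map(complicated_function, inputData))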
Related
Suppose I have the following in Python
# A loop
for i in range(10000):
    Do Task A

# B loop
for i in range(10000):
    Do Task B
How do I run these loops simultaneously in Python?
If you want concurrency, here's a very simple example:
from multiprocessing import Process
def loop_a():
    while 1:
        print("a")

def loop_b():
    while 1:
        print("b")

if __name__ == '__main__':
    Process(target=loop_a).start()
    Process(target=loop_b).start()
This is just the most basic example I could think of. Be sure to read http://docs.python.org/library/multiprocessing.html to understand what's happening.
If you want to send data back to the program, I'd recommend using a Queue (which in my experience is easiest to use).
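For example, a rough sketch of passing a result back through a multiprocessing.Queue (the worker function and its payload here are made up for illustration):

from multiprocessing import Process, Queue

def worker(q):
    # hypothetical worker: push a result onto the queue for the parent to read
    q.put("some result")

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    print(q.get())  # blocks until the worker has put something
    p.join()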
You can use a thread instead if you don't mind the global interpreter lock. Processes are more expensive to instantiate, but they offer true parallelism.
There are many possible options for what you want:
use loop
As many people have pointed out, this is the simplest way.
for i in xrange(10000):
    # use xrange instead of range
    taskA()
    taskB()
Merits: easy to understand and use, no extra library needed.
Drawbacks: taskB can only start after taskA has finished (or the other way round); the two can't run simultaneously.
multiprocess
Another thought would be: run two processes at the same time. Python provides the multiprocessing library; the following is a simple example:
from multiprocessing import Process
p1 = Process(target=taskA, args=args, kwargs=kwargs)
p2 = Process(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
merits: tasks can run simultaneously in the background, you can control them (start, stop, etc.), they can exchange data, and they can be synchronized if they compete for the same resources.
drawbacks: too heavy! The OS will frequently switch between them, and each process has its own data space even when the data is redundant. If you have a lot of tasks (say 100 or more), this is not what you want.
threading
Threading is like multiprocessing, just lighter-weight. Check out this post. Their usage is quite similar:
import threading
p1 = threading.Thread(target=taskA, args=args, kwargs=kwargs)
p2 = threading.Thread(target=taskB, args=args, kwargs=kwargs)
p1.start()
p2.start()
coroutines
Libraries like greenlet and gevent provide something called coroutines, which are supposed to be faster than threading. See the rough sketch after the drawbacks below, and google the libraries for more detail if you're interested.
merits: more flexible and lightweight
drawbacks: extra library needed, learning curve.
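As a rough illustration only (assuming gevent is installed), two cooperative loops could look like this:

import gevent

def loop_a():
    for i in range(5):
        print("a", i)
        gevent.sleep(0)  # yield so the other greenlet gets a turn

def loop_b():
    for i in range(5):
        print("b", i)
        gevent.sleep(0)

gevent.joinall([gevent.spawn(loop_a), gevent.spawn(loop_b)])

Note that this interleaves the two loops cooperatively on one core; it helps with I/O-bound work, not CPU-bound work.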
Why do you want to run the two processes at the same time? Is it because you think they will go faster? (There is a good chance that they won't.) Why not run the tasks in the same loop, e.g.
for i in range(10000):
    doTaskA()
    doTaskB()
The obvious answer to your question is to use threads - see the python threading module. However threading is a big subject and has many pitfalls, so read up on it before you go down that route.
Alternatively you could run the tasks in separate processes, using the Python multiprocessing module. If both tasks are CPU-intensive this will make better use of the multiple cores on your computer.
There are other options such as coroutines, stackless tasklets, greenlets, CSP, etc., but without knowing more about Task A and Task B and why they need to be run at the same time it is impossible to give a more specific answer.
from threading import Thread

def loopA():
    for i in range(10000):
        pass  # Do task A

def loopB():
    for i in range(10000):
        pass  # Do task B

threadA = Thread(target=loopA)
threadB = Thread(target=loopB)
threadA.start()  # use start(), not run(); run() would execute in the calling thread
threadB.start()

# Do work independent of loopA and loopB

threadA.join()
threadB.join()
You could use threading or multiprocessing.
How about: a loop for i in range(10000): Do Task A, Do Task B? Without more information I don't have a better answer.
I find that using the "pool" submodule within "multiprocessing" works amazingly well for executing multiple processes at once within a Python script.
See Section: Using a pool of workers
Look carefully at "# launching multiple evaluations asynchronously may use more processes" in the example. Once you understand what those lines are doing, the following example I constructed will make a lot of sense.
import numpy as np
from multiprocessing import Pool
def desired_function(option, processes, data, etc...):
    # your code will go here. option allows you to make choices within your script
    # to execute desired sections of code for each pool or subprocess.
    return result_array  # "for example"

result_array = np.zeros("some shape")  # This is normally populated by 1 loop, lets try 4.
processes = 4
pool = Pool(processes=processes)
args = (processes, data, etc...)  # Arguments to be passed into the desired function.
multiple_results = []
for i in range(processes):  # Launches each subprocess with its option (1-4 in this case).
    multiple_results.append(pool.apply_async(desired_function, (i+1,) + args))
results = np.array([res.get() for res in multiple_results])  # Retrieves results after
                                                             # every subprocess is finished!
for i in range(processes):
    result_array = result_array + results[i]  # Combines all datasets!
The code will basically run the desired function for a set number of processes. You will have to carefully make sure your function can distinguish between the processes (hence why I added the "option" variable). Additionally, it doesn't have to be an array that is being populated at the end, but for my example that's how I used it. Hope this simplifies things or helps you better understand the power of multiprocessing in Python!
I have a python generator that returns lots of items, for example:
import itertools
def generate_random_strings():
    chars = "ABCDEFGH"
    for item in itertools.product(chars, repeat=10):
        yield "".join(item)
I then iterate over this and perform various tasks, the issue is that I'm only using one thread/process for this:
my_strings = generate_random_strings()
for string in my_strings:
    # do something with string...
    print(string)
This works great, I'm getting all my strings, but it's slow. I would like to harness the power of Python multiprocessing to "divide and conquer" this for loop. However, of course, I want each string to be processed only once. While I've found much documentation on multiprocessing, I'm trying to find the most simple solution for this with the least amount of code.
I'm assuming each thread should take a big chunk of items every time and process them before coming back and getting another big chunk etc...
Many thanks,
Most simple solution with least code? multiprocessing context manager.
I assume that you can put "do something with string" into a function called "do_something".
from multiprocessing import Pool as ProcessPool
number_of_processes = 4
with ProcessPool(number_of_processes) as pool:
    pool.map(do_something, my_strings)
If you want to get the results of "do_something" back again, easy!
with ProcessPool(number_of_processes) as pool:
    results = pool.map(do_something, my_strings)
You'll get them in a list.
multiprocessing.dummy is a wrapper around the threading module that exposes the same Pool API as multiprocessing. If you want threads instead of processes, just do this:
from multiprocessing.dummy import Pool as ThreadPool
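The usage stays the same (a small sketch, assuming do_something and my_strings from above):

from multiprocessing.dummy import Pool as ThreadPool

with ThreadPool(number_of_processes) as pool:
    results = pool.map(do_something, my_strings)

If the per-string work is tiny, pool.map's chunksize argument lets each worker grab a large batch of strings at a time instead of one by one, which matches the "big chunk" idea in the question.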
You may use multiprocessing.
import multiprocessing
def string_fun(string):
    # do something with string...
    print(string)
my_strings = generate_random_strings()
num_of_threads = 7
pool = multiprocessing.Pool(num_of_threads)
pool.map(string_fun, my_strings)
Assuming you're using the latest version of Python, you may want to read something about the asyncio module. Multithreading is not easy to implement due to the GIL: "In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe."
So you can switch to multiprocessing, or, as mentioned above, take a look at the asyncio module.
asyncio — Asynchronous I/O > https://docs.python.org/3/library/asyncio.html
I'll integrate this answer with some code as soon as possible.
Hope it helps,
Hele
As #Hele mentioned, asyncio is the best option of all; here is an example.
Code
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# python 3.7.2
from asyncio import ensure_future, gather, run
import random
alphabet = 'ABCDEFGH'
size = 1000
async def generate():
    tasks = list()
    result = None
    for el in range(1, size):
        task = ensure_future(generate_one())
        tasks.append(task)
    result = await gather(*tasks)
    return list(set(result))

async def generate_one():
    return ''.join(random.choice(alphabet) for i in range(8))

if __name__ == '__main__':
    my_strings = run(generate())
    print(my_strings)
Output
['CHABCGDD', 'ACBGAFEB', ...
Of course, you need to improve generate_one, this variant is very slow.
You can see source code here.
In each task, I have ~500 images to convolve as the first step, and it seems that the filters under ndimage.filters only use 1 core. I have tried multiprocessing.Pool and multiprocessing.Process with multiprocessing.Queue. Both worked, but ran much slower than using a single process. The reason was very likely pickling and other overhead: if I generated fake data within each worker rather than passing real data to each worker, multiprocessing indeed boosted performance by a lot.
I am running Spyder on a Windows machine, and I will pass the code to someone else on a different machine, so recompiling Python or any other low-level tweak is not applicable.
In Matlab, convolution makes use of multiple cores transparently, and there is parfor, which handles the overhead decently. Any idea or suggestion on how to realize multiprocess convolution in Python? Many thanks in advance!
It seems that multiprocessing is a poor choice when the task is numpy/scipy-heavy. I should have used multithreading instead of multiprocessing: most numpy/scipy functions release the GIL, so multithreading outperforms multiprocessing thanks to its much lighter overhead.
And most importantly, multithreading is faster than a single thread.
import Queue
import threading
import numpy as np
from scipy import ndimage
import time
repeats = 24
def smooth_img(q, im):
    im_stack_tmp = ndimage.filters.gaussian_laplace(im, 2.)
    q.put(im_stack_tmp)

im_all = [None]*repeats
im_all_filtered = [None]*repeats

for j in range(repeats):
    im_all[j] = np.random.randn(2048, 2048)

start = time.time()
for j in range(repeats):
    im_all_filtered[j] = ndimage.filters.gaussian_laplace(im_all[j], 2.)
print('single thread: '+str(time.time()-start))

start = time.time()
q = Queue.Queue()
for im in im_all:
    t = threading.Thread(target=smooth_img, args=(q, im))
    t.daemon = True
    t.start()
for j in range(repeats):
    im_all_filtered[j] = q.get()
print('multi thread: '+str(time.time()-start))
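For comparison, a sketch of the same idea with concurrent.futures (Python 3), which keeps the results in input order and needs no explicit queue; it assumes the same im_all list and scipy import as above:

from concurrent.futures import ThreadPoolExecutor
from scipy import ndimage

# The threads run in parallel because, as noted above, the scipy filters release
# the GIL while they compute; map() returns the filtered images in input order.
with ThreadPoolExecutor() as executor:
    im_all_filtered = list(executor.map(
        lambda im: ndimage.gaussian_laplace(im, 2.), im_all))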
I'm a beginner in Python and machine learning. I'm trying to reproduce the code for CountVectorizer() using multi-threading. I'm working with the Yelp dataset to do sentiment analysis using LogisticRegression. This is what I've written so far:
Code snippet:
from multiprocessing.dummy import Pool as ThreadPool
from threading import Thread, current_thread
from functools import partial
data = df['text']
rev = df['stars']
y = []
def product_helper(args):
    return featureExtraction(*args)

def featureExtraction(p, t):
    temp = [0] * len(bag_of_words)
    for word in p.split():
        if word in bag_of_words:
            temp[bag_of_words.index(word)] += 1
    return temp

# function to be mapped over
def calculateParallel(threads):
    pool = ThreadPool(threads)
    job_args = [(item_a, rev[i]) for i, item_a in enumerate(data)]
    l = pool.map(product_helper, job_args)
    pool.close()
    pool.join()
    return l
temp_X = calculateParallel(12)
This is just part of the code.
Explanation:
df['text'] has all the reviews and df['stars'] has the ratings (1 through 5). I'm trying to find the word count vector temp_X using multi-threading. bag_of_words is a list of some frequent words of choice.
Question:
Without multi-threading, I was able to compute temp_X in around 24 minutes, and the above code took 33 minutes for a dataset of 100k reviews. My machine has 128GB of DRAM and 12 cores (6 physical cores with hyperthreading, i.e., 2 threads per core).
What am I doing wrong here?
Your whole code seems CPU-bound rather than IO-bound. You are just using threads, which are subject to the GIL, so effectively you are running just one thread plus overhead; it runs on only one core. To run on multiple cores, use multiprocessing:
import multiprocessing
pool = multiprocessing.Pool()
l = pool.map_async(product_helper, job_args).get()  # .get() blocks until all results are ready
from multiprocessing.dummy import Pool as ThreadPool is just a wrapper over the threading module. It utilises just one core, no more.
Python and threads don't really work together very well. There is a known issue called the GIL (global interpreter lock). Basically there is a lock in the interpreter that keeps threads from running in parallel (even if you have multiple CPU cores). Python simply gives each thread a few milliseconds of CPU time, one after another (and the reason your code became slower is the overhead of context switching between those threads).
Here is a really good document explaining how it works: http://www.dabeaz.com/python/UnderstandingGIL.pdf
To fix your problem I suggest you try multiprocessing:
https://pymotw.com/2/multiprocessing/basics.html
Note: multiprocessing is not 100% equivalent to multithreading. Multiprocessing runs in parallel, but the different processes don't share memory, so if you change a variable in one of them it will not be changed in the other process.
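A rough sketch of that conversion (assuming featureExtraction, data, and rev from the question are defined at module level so the child processes can import them):

from multiprocessing import Pool

def product_helper(args):
    return featureExtraction(*args)

if __name__ == '__main__':
    job_args = [(item_a, rev[i]) for i, item_a in enumerate(data)]
    with Pool() as pool:  # defaults to one worker process per CPU core
        temp_X = pool.map(product_helper, job_args)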
I'm trying to make an expensive part of my pandas calculations parallel to speed things up.
I've already managed to make Multiprocessing.Pool work with a simple example:
import multiprocessing as mpr
import numpy as np
def Test(l):
    for i in range(len(l)):
        l[i] = i**2
    return l

t = list(np.arange(100))
L = [t,t,t,t]

if __name__ == "__main__":
    pool = mpr.Pool(processes=4)
    E = pool.map(Test,L)
    pool.close()
    pool.join()
No problems here. Now my own algorithm is a bit more complicated, I can't post it here in its full glory and terribleness, so I'll use some pseudo-code to outline the things I'm doing there:
import pandas as pd
import time
import datetime as dt
import multiprocessing as mpr
import MPFunctions as mpf --> self-written worker functions that get called for the multiprocessing
import ClassGetDataFrames as gd --> self-written class that reads in all the data and puts it into dataframes
=== Settings
=== Use ClassGetDataFrames to get data
=== Lots of single-thread calculations and manipulations on the dataframe
=== Cut dataframe into 4 evenly big chunks, make list of them called DDC
if __name__ == "__main__":
    pool = mpr.Pool(processes=4)
    LLT = pool.map(mpf.processChunks,DDC)
    pool.close()
    pool.join()
=== Join processed Chunks LLT back into one dataframe
=== More calculations and manipulations
=== Data Output
When I'm running this script the following happens:
It reads in the data.
It does all calculations and manipulations until the Pool statement.
Suddenly it reads in the data again, fourfold.
Then it goes into the main script fourfold at the same time.
The whole thing cascades recursively and goes haywire.
I have read before that this can happen if you're not careful, but I do not know why it happens here. My multiprocessing code is protected by the required name-main statement (I'm on Win7 64-bit), it is only 4 lines long, it has close and join statements, and it calls one defined worker function which then calls a second worker function in a loop; that's it. As far as I know it should just create the pool with four processes, call the four processes from the imported script, close the pool, wait until everything is done, and then just continue with the script. As a side note, I first had the worker functions in the same script, and the behaviour was the same. Instead of just doing what's in the pool it seems to restart the whole script fourfold.
Can anyone enlighten me what might cause this behaviour? I seem to be missing some crucial understanding about Python's multiprocessing behaviour.
Also I don't know if it's important, I'm on a virtual machine that sits on my company's mainframe.
Do I have to use individual processes instead of a pool?
I managed to make it work by enclosing the entire script in the if __name__ == "__main__": statement, not just the multiprocessing part.
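In other words, the script structure on Windows ends up looking roughly like this (a sketch in which loadData, splitIntoChunks, and joinChunks are hypothetical placeholders for the real single-threaded steps):

import multiprocessing as mpr
import MPFunctions as mpf  # the self-written worker module from the question

if __name__ == "__main__":
    # On Windows there is no fork, so each child process re-imports this module;
    # only the code under this guard is protected from being re-run fourfold.
    df = mpf.loadData()                # read the data once, in the parent only
    DDC = mpf.splitIntoChunks(df, 4)   # cut the dataframe into 4 evenly sized chunks
    pool = mpr.Pool(processes=4)
    LLT = pool.map(mpf.processChunks, DDC)
    pool.close()
    pool.join()
    result = mpf.joinChunks(LLT)       # join the chunks and continue single-threaded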