I have two pieces of code, representative of a more complex scenario I am trying to debug. I am wondering if they are technically equivalent, and if not, why.
First one:
import time
from concurrent.futures import ThreadPoolExecutor

def cb(res):
    print("done", res)

def foo():
    time.sleep(3)
    res = 5
    cb(res)
    return res

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(foo)
    print(future.result())
Second one:
def cb2(fut):
    print("done", fut.result())

def foo2():
    time.sleep(3)
    return 5

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(foo2)
    future.add_done_callback(cb2)
    print(future.result())
The core of the issue is the following: I need to call a sync, slow operation (here, represented by the sleep). When that operation completes, I have to perform subsequent fast operations. In the first snippet, I put these operations immediately after the slow sync one; in the second, I put them in the callback.
In terms of implementation, I suspect the future creates a secondary thread, runs the code in the secondary thread, and this secondary thread will stop at the sync slow operation. Once this operation is completed, the secondary thread will keep going, and it can keep going either by executing the subsequent code or by calling the callbacks. I see no difference in these two pieces of code (apart from the fact that adding the callback allows injecting code from outside, an added flexibility), but I might be wrong, hence the question.
Note that I do understand that in the first case, the print is called when the future is still not resolved and in the second one it is, but it is assumed that the status is not relevant.
These two examples are not equivalent in terms of event ordering.
Let's walk through the lifecycle of a Future. It goes roughly like this (reverse engineered from CPython's source):
1. a Future is created
2. it is added to the executor's queue
3. it is popped from the queue by some free/idle thread from the thread pool
4. the function provided to submit() is called in that thread
5. the future is marked as FINISHED
6. the future broadcasts the 'state changed' event to all its waiters
7. callbacks are invoked (still in the same worker thread)
8. the worker thread becomes free/idle and may take another future from the queue
When you execute the statement print(future.result()), your main thread blocks and becomes the future's waiter. It becomes unblocked right after the future switches to FINISHED, but right before the callbacks start to execute. That means you cannot predict which print reaches the console first - the print in any of your callbacks, or print(future.result()) - they are now executing in parallel. If you touch the same data in your callbacks and in the main thread after waiting for future.result(), you are likely to get data corruption.
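A minimal sketch that makes the race observable (the sleep length is arbitrary; run it a few times and the order of the two prints can differ):

import time
from concurrent.futures import ThreadPoolExecutor

def cb(fut):
    print("callback:", fut.result())

def work():
    time.sleep(0.1)
    return 5

with ThreadPoolExecutor(max_workers=2) as executor:
    future = executor.submit(work)
    future.add_done_callback(cb)
    # the main thread wakes up when the future turns FINISHED,
    # while the callback runs concurrently in the worker thread
    print("main:", future.result())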
See also this gist on concurrent.futures usage patterns (https://gist.github.com/mangecoeur/9540178) and the documentation (https://docs.python.org/3.4/library/concurrent.futures.html). The executor map API mirrors multiprocessing.Pool.map:

with concurrent.futures.ProcessPoolExecutor() as executor:
    result = executor.map(function, iterable)

# e.g. executor.map(fun, [data] * 10)

# the equivalent multiprocessing idiom:
pool = multiprocessing.Pool()
pool.map(…)

with concurrent.futures.ThreadPoolExecutor() as executor:
    result = executor.map(function, iterable)
My script loops through each line of an input file and performs some actions using the string in each line. Since the tasks performed on each line are independent of each other, I decided to separate the task into threads so that the script doesn't have to wait for the task to complete to continue with the loop. The code is given below.
import threading

def myFunction(line, param):
    # Doing something with line and param
    # Sends multiple HTTP requests, parses the responses and produces outputs
    # Returns nothing
    ...

param = arg[1]
threads = []
with open(targets, "r") as listfile:
    for line in listfile:
        print("Starting a thread for: ", line)
        t = threading.Thread(target=myFunction, args=(line, param))
        threads.append(t)
        t.start()
I realized that this is a bad idea as the number of lines in the input file grows large. With this code, there would be as many threads as lines. I researched a bit and figured that queues would be the way to go.
I want to understand the optimal way of using queues for this scenario and if there are any alternatives which I can use.
To get around this problem, you can use the concept of thread pools: you define a fixed number of threads/workers, for example 5, and whenever a thread finishes executing, another submitted task takes its place automatically.
Example :
import concurrent.futures

def myFunction(line, param):
    print("Done with :", line, param)

param = "param_example"
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures = []
    with open("targets", "r") as listfile:
        for line in listfile:
            print("Starting a thread for: ", line)
            futures.append(executor.submit(myFunction, line=line, param=param))
    # waiting for the threads to finish and maybe print a result:
    for future in concurrent.futures.as_completed(futures):
        print(future.result())  # an Exception should be handled here!
Queues are one way to do it. The way to use them is to put function parameters on a queue, and use threads to get them and do the processing.
The queue size doesn't matter too much in this case because reading the next line is fast. In other cases, a more optimal solution is to set the queue size to at least twice the number of threads, so that if all threads finish processing an item at the same time, each has the next item already waiting in the queue.
To avoid complicating the code threads can be set as daemonic so that they don't stop the program from finishing after the processing is done. They will be terminated when the main process finishes.
The alternative is to put a special item (such as None) on the queue for each thread, make the threads exit after getting it from the queue, and then join the threads; a sketch of that variant follows the queue example below.
For the examples below, the number of worker threads is set using the workers variable.
Here is an example of a solution using a queue.
from queue import Queue
from threading import Thread

queue = Queue(workers * 2)

def work():
    while True:
        myFunction(*queue.get())
        queue.task_done()

for _ in range(workers):
    Thread(target=work, daemon=True).start()

with open(targets, 'r') as listfile:
    for line in listfile:
        queue.put((line, param))

queue.join()
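For completeness, the sentinel variant mentioned earlier could look roughly like this (a sketch reusing myFunction, workers, targets and param from the surrounding examples):

from queue import Queue
from threading import Thread

queue = Queue(workers * 2)

def work():
    while True:
        item = queue.get()
        if item is None:  # sentinel: this thread is done
            break
        myFunction(*item)

threads = [Thread(target=work) for _ in range(workers)]
for t in threads:
    t.start()

with open(targets, 'r') as listfile:
    for line in listfile:
        queue.put((line, param))

for _ in range(workers):
    queue.put(None)  # one sentinel per thread

for t in threads:
    t.join()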
A simpler solution might be using ThreadPoolExecutor. It is especially simple in this case because the function being called doesn't return anything that needs to be used in the main thread.
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=workers) as executor:
    with open(targets, 'r') as listfile:
        for line in listfile:
            executor.submit(myFunction, line, param)
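If the results were needed in the main thread after all, one possible variant (a sketch) is to keep the futures and read them back in submission order:

from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=workers) as executor:
    with open(targets, 'r') as listfile:
        futures = [executor.submit(myFunction, line, param) for line in listfile]
    results = [f.result() for f in futures]  # same order as the input lines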
Also, if it's not a problem to have all lines stored in memory, there is a solution which doesn't use anything other than threads. The work is split in such a way that the threads read some lines from a list and ignore other lines. A simple example with two threads is where one thread reads odd lines and the other reads even lines.
from threading import Thread

with open(targets, 'r') as listfile:
    lines = listfile.readlines()

def work_split(n):
    for line in lines[n::workers]:
        myFunction(line, param)

threads = []
for n in range(workers):
    t = Thread(target=work_split, args=(n,))
    t.start()
    threads.append(t)

for t in threads:
    t.join()
I have done a quick benchmark and the Queue is slightly faster than the ThreadPoolExecutor, but the solution with the split work is faster than both.
For the code you have posted, using threads makes no sense.
That is because there are no I/O operations, so the threads execute one after another without real multithreading. The GIL (Global Interpreter Lock) is never released by a thread in this case, so the application only appears to use multithreading; in reality the interpreter uses only one CPU core for the program and one thread at a time.
This way you gain no advantage from using threads; on the contrary, you can see performance degradation in this scenario, due to context switching and thread initialization overhead.
The only way to get better performance in this scenario, if applicable, is a multiprocess program. But pay attention to the number of processes you start, and remember that every process has its own interpreter.
It was a good answer by GitFront. This answer just adds one more option using the multiprocessing package.
Using concurrent.futures or multiprocessing depends on particular requirements. Multiprocessing has a lot more options comparatively but for the given question the results should be near identical in the simplest case.
from multiprocessing import cpu_count, Pool

PROCESSES = cpu_count()  # Warning: uses all cores

def pool_method(listfile, param):
    p = Pool(processes=PROCESSES)
    checker = [p.apply_async(myFunction, (line, param)) for line in listfile]
    ...
There are various other methods too other than "apply_async", but this should work well for your needs.
Say I have a very large list and I'm performing an operation like so:
for item in items:
    try:
        api.my_operation(item)
    except:
        print 'error with item'
My issue is twofold:
There are a lot of items
api.my_operation takes forever to return
I'd like to use multi-threading to spin up a bunch of api.my_operations at once so I can process maybe 5 or 10 or even 100 items at once.
If my_operation() returns an exception (because maybe I already processed that item) - that's OK. It won't break anything. The loop can continue to the next item.
Note: this is for Python 2.7.3
First, in Python, if your code is CPU-bound, multithreading won't help, because only one thread can hold the Global Interpreter Lock, and therefore run Python code, at a time. So, you need to use processes, not threads.
This is not true if your operation "takes forever to return" because it's IO-bound—that is, waiting on the network or disk copies or the like. I'll come back to that later.
Next, the way to process 5 or 10 or 100 items at once is to create a pool of 5 or 10 or 100 workers, and put the items into a queue that the workers service. Fortunately, the stdlib multiprocessing and concurrent.futures libraries both wrap up most of the details for you.
The former is more powerful and flexible for traditional programming; the latter is simpler if you need to compose future-waiting; for trivial cases, it really doesn't matter which you choose. (In this case, the most obvious implementation with each takes 3 lines with futures, 4 lines with multiprocessing.)
If you're using 2.6-2.7 or 3.0-3.1, futures isn't built in, but you can install it from PyPI (pip install futures).
Finally, it's usually a lot simpler to parallelize things if you can turn the entire loop iteration into a function call (something you could, e.g., pass to map), so let's do that first:
def try_my_operation(item):
    try:
        api.my_operation(item)
    except:
        print('error with item')
Putting it all together:
import concurrent.futures

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_my_operation, item) for item in items]
concurrent.futures.wait(futures)
If you have lots of relatively small jobs, the overhead of multiprocessing might swamp the gains. The way to solve that is to batch the work up into larger jobs. For example (using grouper from the itertools recipes, reproduced after the example below, which you can copy and paste into your code or get from the more-itertools project on PyPI):
def try_multiple_operations(items):
    for item in items:
        try:
            api.my_operation(item)
        except:
            print('error with item')

executor = concurrent.futures.ProcessPoolExecutor(10)
futures = [executor.submit(try_multiple_operations, group)
           for group in grouper(5, items)]
concurrent.futures.wait(futures)
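For reference, the grouper recipe looks roughly like this (note that it pads the last group with fillvalue entries, here None; the bare except inside try_multiple_operations happens to swallow the resulting errors, but a cleaner version should skip the padding explicitly):

try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

def grouper(n, iterable, fillvalue=None):
    "grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)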
Finally, what if your code is IO bound? Then threads are just as good as processes, and with less overhead (and fewer limitations, but those limitations usually won't affect you in cases like this). Sometimes that "less overhead" is enough to mean you don't need batching with threads, but you do with processes, which is a nice win.
So, how do you use threads instead of processes? Just change ProcessPoolExecutor to ThreadPoolExecutor.
If you're not sure whether your code is CPU-bound or IO-bound, just try it both ways.
Can I do this for multiple functions in my python script? For example, if I had another for loop elsewhere in the code that I wanted to parallelize. Is it possible to do two multi threaded functions in the same script?
Yes. In fact, there are two different ways to do it.
First, you can share the same (thread or process) executor and use it from multiple places with no problem. The whole point of tasks and futures is that they're self-contained; you don't care where they run, just that you queue them up and eventually get the answer back.
Alternatively, you can have two executors in the same program with no problem. This has a performance cost—if you're using both executors at the same time, you'll end up trying to run (for example) 16 busy threads on 8 cores, which means there's going to be some context switching. But sometimes it's worth doing because, say, the two executors are rarely busy at the same time, and it makes your code a lot simpler. Or maybe one executor is running very large tasks that can take a while to complete, and the other is running very small tasks that need to complete as quickly as possible, because responsiveness is more important than throughput for part of your program.
If you don't know which is appropriate for your program, usually it's the first.
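For illustration, sharing one executor between two unrelated loops might look like this (a sketch; process_a, process_b, items_a and items_b are placeholders):

import concurrent.futures

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    futures = [executor.submit(process_a, x) for x in items_a]
    futures += [executor.submit(process_b, y) for y in items_b]
    concurrent.futures.wait(futures)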
There's multiprocessing.pool, and the following sample illustrates how to use one of them:
from multiprocessing.pool import ThreadPool as Pool
# from multiprocessing import Pool

pool_size = 5  # your "parallelness"

# define worker function before a Pool is instantiated
def worker(item):
    try:
        api.my_operation(item)
    except:
        print('error with item')

pool = Pool(pool_size)

for item in items:
    pool.apply_async(worker, (item,))

pool.close()
pool.join()
Now if you indeed identify that your process is CPU bound as #abarnert mentioned, change ThreadPool to the process pool implementation (commented under ThreadPool import). You can find more details here: http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
You can split the processing into a specified number of threads using an approach like this:
import threading

def process(items, start, end):
    for item in items[start:end]:
        try:
            api.my_operation(item)
        except Exception:
            print('error with item')

def split_processing(items, num_splits=4):
    split_size = len(items) // num_splits
    threads = []
    for i in range(num_splits):
        # determine the indices of the list this thread will handle
        start = i * split_size
        # special case on the last chunk to account for uneven splits
        end = None if i + 1 == num_splits else (i + 1) * split_size
        # create the thread
        threads.append(
            threading.Thread(target=process, args=(items, start, end)))
        threads[-1].start()  # start the thread we just created
    # wait for all threads to finish
    for t in threads:
        t.join()

split_processing(items)
import numpy as np
import threading

def threaded_process(items_chunk):
    """Your main process which runs in a thread for each chunk"""
    for item in items_chunk:
        try:
            api.my_operation(item)
        except Exception:
            print('error with item')

n_threads = 20
# Splitting the items into chunks equal to the number of threads
array_chunk = np.array_split(input_image_list, n_threads)
thread_list = []
for thr in range(n_threads):
    # note the trailing comma: args must be a tuple
    thread = threading.Thread(target=threaded_process, args=(array_chunk[thr],))
    thread_list.append(thread)
    thread_list[thr].start()

for thread in thread_list:
    thread.join()
I am trying to use ThreadPoolExecutor() in a method of a class to create a pool of threads that will execute another method within the same class. I have the with concurrent.futures.ThreadPoolExecutor()... block; however, it does not wait, and an error is thrown saying the key I query after the with block was not in the dictionary. I understand why the error is thrown: the dictionary has not been updated yet because the threads in the pool have not finished executing. I know the threads have not finished executing because I have a print("done") in the method that is called within the ThreadPoolExecutor, and "done" is not printed to the console.
I am new to threads, so if any suggestions on how to do this better are appreciated!
def tokenizer(self):
    all_tokens = []
    self.token_q = Queue()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        for num in range(5):
            executor.submit(self.get_tokens, num)
        executor.shutdown(wait=True)
    print("Hi")
    results = {}
    while not self.token_q.empty():
        temp_result = self.token_q.get()
        results[temp_result[1]] = temp_result[0]
        print(temp_result[1])
    for index in range(len(self.zettels)):
        for zettel in results[index]:
            all_tokens.append(zettel)
    return all_tokens
def get_tokens(self, thread_index):
    print("!!!!!!!")
    switch = {
        0: self.zettels[:(len(self.zettels)/5)],
        1: self.zettels[(len(self.zettels)/5): (len(self.zettels)/5)*2],
        2: self.zettels[(len(self.zettels)/5)*2: (len(self.zettels)/5)*3],
        3: self.zettels[(len(self.zettels)/5)*3: (len(self.zettels)/5)*4],
        4: self.zettels[(len(self.zettels)/5)*4: (len(self.zettels)/5)*5],
    }
    new_tokens = []
    for zettel in switch.get(thread_index):
        tokens = re.split('\W+', str(zettel))
        tokens = list(filter(None, tokens))
        new_tokens.append(tokens)
    print("done")
    self.token_q.put([new_tokens, thread_index])
I expected to see all the print("!!!!!!!") and print("done") statements before the print("Hi") statement. It actually shows the !!!!!!!, then the Hi, then the KeyError for the results dictionary.
As you have already found out, the pool is waiting; print('done') is never executed because presumably a TypeError is raised earlier.
The pool does not directly wait for the tasks to finish, it waits for its worker threads to join, which implicitly requires the execution of the tasks to complete, one way (success) or the other (exception).
The reason you do not see that exception raising is because the task is wrapped in a Future. A Future
[...] encapsulates the asynchronous execution of a callable.
Future instances are returned by the executor's submit method and they allow you to query the state of the execution and access whatever its outcome is.
That brings me to some remarks I wanted to make.
The Queue in self.token_q seems unnecessary
Judging by the code you shared, you only use this queue to pass the results of your tasks back to the tokenizer function. That's not needed, you can access that from the Future that the call to submit returns:
def tokenizer(self):
    all_tokens = []
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(self.get_tokens, num) for num in range(5)]
        # executor.shutdown(wait=True) here is redundant, it is called when exiting the context:
        # https://github.com/python/cpython/blob/3.7/Lib/concurrent/futures/_base.py#L623
    print("Hi")
    results = {}
    for fut in futures:
        try:
            res = fut.result()
            results[res[1]] = res[0]
        except Exception:
            continue
    [...]

def get_tokens(self, thread_index):
    [...]
    # instead of self.token_q.put([new_tokens, thread_index])
    return new_tokens, thread_index
It is likely that your program does not benefit from using threads
From the code you shared, it seems like the operations in get_tokens are CPU bound, rather than I/O bound. If you are running your program in CPython (or any other interpreter using a Global Interpreter Lock), there will be no benefit from using threads in that case.
In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once.
That means for any Python process, only one thread can execute at any given time. This is not so much of an issue if your task at hand is I/O bound, i.e. it frequently pauses to wait for I/O (e.g. for data on a socket). If your tasks need to constantly execute bytecode on a processor, there's no benefit in pausing one thread to let another execute some instructions. In fact, the resulting context switches might even prove detrimental.
You might want to go for parallelism instead of concurrency. Take a look at ProcessPoolExecutor for this. However, I recommend benchmarking your code running sequentially, concurrently and in parallel. Creating processes or threads comes at a cost and, depending on the task to complete, doing so might take longer than just executing one task after the other in a sequential manner.
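A sketch of what the process-based version could look like, assuming the work is moved into a module-level function (tokenize_chunk and chunks are made-up names here) so that arguments and results can be pickled:

from concurrent.futures import ProcessPoolExecutor

def tokenize_chunk(chunk):
    # CPU-bound work runs in a separate process, outside the GIL
    return [token for item in chunk for token in str(item).split()]

with ProcessPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(tokenize_chunk, chunk) for chunk in chunks]
    results = [f.result() for f in futures]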
As an aside, this looks a bit suspicious:
for index in range(len(self.zettels)):
    for zettel in results[index]:
        all_tokens.append(zettel)
results seems to always have five items, because of for num in range(5). If the length of self.zettels is greater than five, I'd expect a KeyError to be raised here. If self.zettels is guaranteed to have a length of five, then I'd see potential for some code optimization here.
You need to loop over concurrent.futures.as_completed() as shown here. It will yield each future as its task completes.
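For example (a sketch; work stands in for your task function):

from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    return n * n

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = [executor.submit(work, n) for n in range(5)]
    for fut in as_completed(futures):
        print(fut.result())  # results arrive in completion order, not submission order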
I am fairly new to parallel processing with "concurrent.futures" and I am testing some simple experiments. The code I have written seems to work, but I am not sure how to store the results. I have tried to create a list ("futures") and append the results to that, but that considerably slow down the procedure. I am wondering if there is a better way to do that. Thank you.
import concurrent.futures
import time

couple_ods = []
futures = []
dtab = {}
for i in range(100):
    for j in range(100):
        dtab[i, j] = i + j / 2
        couple_ods.append((i, j))

avg_speed = 100

def task(i):
    origin = i[0]
    destination = i[1]
    time.sleep(0.01)
    distance = dtab[origin, destination] / avg_speed
    return distance

start1 = time.time()

def main():
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for number in couple_ods:
            future = executor.submit(task, number)
            futures.append(future.result())

if __name__ == '__main__':
    main()
end1 = time.time()
When you call future.result(), that blocks until the value is ready. So, you’re not getting any benefits out of parallelism here—you start one task, wait for it to finish, start another, wait for it to finish, and so on.
Of course your example won’t benefit from threading in the first place. Your tasks are doing nothing but CPU-bound Python computation, which means that (at least in CPython, MicroPython, and PyPy, which are the only complete implementations that come with concurrent.futures), the GIL (Global Interpreter Lock) will prevent more than one of your threads from progressing at a time.
Hopefully your real program is different. If it’s doing I/O-bound stuff (making network requests, reading files, etc.), or using an extension library like NumPy that releases the GIL around heavy CPU work, then it will work fine. But otherwise, you’ll want to use ProcessPoolExecutor here.
Anyway, what you want to do is append future itself to a list, so you get a list of all of the futures before waiting for any of them:
for number in couple_ods:
    future = executor.submit(task, number)
    futures.append(future)
And then, after you’ve started all of the jobs, you can start waiting for them. There are three simple options, and one complicated one when you need more control.
(1) You can just directly loop over them to wait for them in the order they were submitted:
for future in futures:
    result = future.result()
    dostuff(result)
(2) If you need to wait for them all to be finished before doing any work, you can just call wait:
futures, _ = concurrent.futures.wait(futures)
for future in futures:
    result = future.result()
    dostuff(result)
(3) If you want to handle each one as soon as it’s ready, even if they come out of order, use as_completed:
for future in concurrent.futures.as_completed(futures):
    dostuff(future.result())
Notice that the examples using this function in the docs provide some way to identify which task has finished. If you need that, it can be as simple as passing each one an index, then returning index, real_result, and looping with for index, result in …, as sketched below.
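For instance, a sketch with task and couple_ods as above (indexed_task is a made-up helper):

def indexed_task(index, args):
    # tag the result with the submission index so it can be matched up later
    return index, task(args)

futures = [executor.submit(indexed_task, i, args)
           for i, args in enumerate(couple_ods)]
for future in concurrent.futures.as_completed(futures):
    index, result = future.result()
    dostuff(index, result)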
(4) If you need more control, you can loop over waiting on whatever’s done so far:
while futures:
    done, futures = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    for future in done:
        result = future.result()
        dostuff(result)
That example does the same thing as as_completed, but you can write minor variations on it to do different things, like waiting for everything to be done but canceling early if anything raises an exception.
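For example, a variation that cancels everything still pending as soon as any task fails might look like this (a sketch):

while futures:
    done, futures = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    for future in done:
        if future.exception() is not None:
            for pending in futures:
                pending.cancel()  # best effort: only cancels tasks not yet running
            raise future.exception()
        dostuff(future.result())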
For many simple cases, you can just use the map method of the executor to simplify the first option. This works just like the builtin map function, calling the function once for each value in the iterable and then giving you something you can loop over to get the results in the same order, but doing it in parallel. So:
for result in executor.map(task, couple_ods):
    dostuff(result)
I have two different functions f, and g that compute the same result with different algorithms. Sometimes one or the other takes a long time while the other terminates quickly. I want to create a new function that runs each simultaneously and then returns the result from the first that finishes.
I want to create that function with a higher order function
h = firstresult(f, g)
What is the best way to accomplish this in Python?
I suspect that the solution involves threading. I'd like to avoid discussion of the GIL.
I would simply use a Queue for this. Start the threads and the first one which has a result ready writes to the queue.
Code
from threading import Thread
from time import sleep
from Queue import Queue

def firstresult(*functions):
    queue = Queue()
    threads = []
    for f in functions:
        def thread_main(f=f):  # bind f now; a plain closure would see the loop's last f
            queue.put(f())
        thread = Thread(target=thread_main)
        threads.append(thread)
        thread.start()
    result = queue.get()
    return result

def slow():
    sleep(1)
    return 42

def fast():
    return 0

if __name__ == '__main__':
    print firstresult(slow, fast)
Live demo
http://ideone.com/jzzZX2
Notes
Stopping the threads is an entirely different topic. For this you need to add some state variable to the threads which is checked at regular intervals. To keep this example short, I simply skipped that part and assumed that all workers get time to finish their work even though the extra results are never read.
Skipping the discussion about the GIL as requested by the questioner. ;-)
Now - unlike my suggestion on the other answer, this piece of code does exactly what you are requesting:
from multiprocessing import Process, Queue
import random
import time

def firstresult(func1, func2):
    queue = Queue()
    proc1 = Process(target=func1, args=(queue,))
    proc2 = Process(target=func2, args=(queue,))
    proc1.start(); proc2.start()
    result = queue.get()
    proc1.terminate(); proc2.terminate()
    return result

def algo1(queue):
    time.sleep(random.uniform(0, 1))
    queue.put("algo 1")

def algo2(queue):
    time.sleep(random.uniform(0, 1))
    queue.put("algo 2")

print firstresult(algo1, algo2)
Run each function in a new worker thread, the 2 worker threads send the result back to the main thread in a 1 item queue or something similar. When the main thread receives the result from the winner, it kills (do python threads support kill yet? lol.) both worker threads to avoid wasting time (one function may take hours while the other only takes a second).
Replace the word thread with process if you want.
You will need to run each function in another process (with multiprocessing) or in a different thread.
If both are CPU bound, multithreading won't help much - precisely because of the GIL - so multiprocessing is the way to go.
If the return value is a pickleable (serializable) object, I have this decorator I created that simply runs the function in the background, in another process:
https://bitbucket.org/jsbueno/lelo/src
It is not exactly what you want - both calls are non-blocking and start executing right away. The trick with this decorator is that it blocks (and waits for the function to complete) only when you try to use the return value.
But on the other hand - it is just a decorator that does all the work.
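The idea can be loosely sketched with the standard library alone; this is not the actual lelo implementation (in_background and compute are made-up names, and the caller gets an explicit AsyncResult instead of a transparent proxy):

from functools import wraps
from multiprocessing import Pool

def in_background(pool):
    """Return a decorator that runs the wrapped function in the given pool."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # returns immediately; .get() blocks until the result is ready
            return pool.apply_async(func, args, kwargs)
        return wrapper
    return decorate

def compute(x):
    return x * x

if __name__ == '__main__':
    pool = Pool(processes=2)
    # bind to a new name so the original compute stays picklable
    compute_bg = in_background(pool)(compute)
    result = compute_bg(7)   # does not block
    print(result.get())      # blocks here, as described above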