Python threading return values

I am new to threading and I have an existing application that I would like to make a little quicker using threading.
I have several functions that return to a main Dict and would like to send these to separate threads so that they run at the same time rather than one at a time.
I have done a little googling but I can't seem to find something that fits my existing code and could use a little help.
I have around six functions that return to the main Dict like this:
parsed['cryptomaps'] = pipes.ConfigParse.crypto(parsed['split-config'], parsed['asax'], parsed['names'])
The issue here is with the return value. I understand that I would need to use a queue for this, but would I need a queue for each of these six functions or one queue for all of them? If it is the latter, how would I separate the returns from the threads and assign them to the correct Dict entries?
Any help on this would be great.
John

You can push tuples of (worker, data) to the queue to identify the source.
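For example, a minimal sketch of that idea with plain threads and a single shared queue (the function names and dict keys below are made-up stand-ins for your six parser calls):

import threading
import queue

results = queue.Queue()

def run_and_report(name, func, *args):
    # each worker tags its result with a name so the parent
    # can assign it to the right Dict entry
    results.put((name, func(*args)))

# hypothetical stand-ins for your six parser functions
tasks = {
    'cryptomaps': lambda: 'crypto-result',
    'names': lambda: 'names-result',
}

threads = [threading.Thread(target=run_and_report, args=(name, func))
           for name, func in tasks.items()]
for t in threads:
    t.start()

parsed = {}
for _ in threads:
    name, value = results.get()   # blocks until some worker reports
    parsed[name] = value

for t in threads:
    t.join()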
Also please note that due to the Global Interpreter Lock, Python threading is not very useful for CPU-bound work. I suggest taking a look at the multiprocessing module, which offers an interface very similar to multithreading but will actually scale with the number of workers.
Edit:
Code sample.
import multiprocessing as mp

# py2/py3 compatibility for the Empty exception
try:
    import queue           # py3
except ImportError:
    import Queue as queue  # py2

data = [
    # input data
    # {split_config: ... }
]

def crypto(split_config, asax, names):
    # your code here
    pass

def worker(id, terminate, input, output):
    # use the event here to gracefully exit;
    # using Process.terminate would leave the queues
    # in an undefined state
    while not terminate.is_set():
        try:
            x = input.get(True, timeout=1)
            output.put((id, crypto(**x)))
        except queue.Empty:
            pass

if __name__ == "__main__":
    terminate = mp.Event()
    input = mp.Queue()
    output = mp.Queue()

    workers = [mp.Process(target=worker, args=(i, terminate, input, output))
               for i in range(mp.cpu_count())]
    for w in workers:
        w.start()

    for x in data:
        input.put(x)

    # terminate workers
    terminate.set()

    # process results
    # make sure that queues are emptied otherwise Process.join can deadlock
    for w in workers:
        w.join()

Related

How to return values from Process- or Thread instances?

So I want to run a function which can either search for information on the web or directly from my own mysql database.
The first process will be time-consuming, the second relatively fast.
With this in mind I create a process which starts this compound search (find_compound_view). If the process finishes relatively fast it means it's present in the database, so I can render the results immediately. Otherwise, I will render "drax_retrieving_data.html".
The stupid solution I came up with was to run the function twice, once to check if the process takes a long time, the other to actually get the return values of the function. This is pretty much because I don't know how to return the values of my find_compound_view function. I've tried googling but I can't seem to find how to return the values from the class Process specifically.
p = Process(target=find_compound_view, args=(form,))
p.start()
is_running = p.is_alive()
start_time = time.time()
while is_running:
    time.sleep(0.05)
    is_running = p.is_alive()
    if time.time() - start_time > 10:
        print('Timer exceeded, DRAX is retrieving info!', time.time() - start_time)
        return render(request, 'drax_internal_dbs/drax_retrieving_data.html')
compound = find_compound_view(form, use_email=False)
if compound:
    data = *****
    return render(request, 'drax_internal_dbs/result.html', data)
You will need a multiprocessing.Pipe or a multiprocessing.Queue to send the results back to your parent process. If you just do I/O, you should use a Thread instead of a Process, since it's more lightweight and most time will be spent on waiting. I'm showing you how it's done for processes and threads in general.
Process with Queue
The multiprocessing queue is built on top of a pipe and access is synchronized with locks/semaphores. Queues are thread- and process-safe, meaning you can use one queue for multiple producer/consumer processes and even multiple threads in these processes. Adding the first item to the queue will also start a feeder-thread in the calling process. The additional overhead of a multiprocessing.Queue makes using a pipe preferable and more performant for single-producer/single-consumer scenarios.
Here's how to send and retrieve a result with a multiprocessing.Queue:
from multiprocessing import Process, Queue

SENTINEL = 'SENTINEL'

def sim_busy(out_queue, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)

if __name__ == '__main__':

    out_queue = Queue()

    p = Process(target=sim_busy, args=(out_queue, 150e6))  # 150e6 == 150000000.0
    p.start()

    for result in iter(out_queue.get, SENTINEL):  # sentinel breaks the loop
        print(result)
The queue is passed as an argument into the function, results are .put() on the queue and the parent .get()s from the queue. .get() is a blocking call; execution does not resume until there is something to get (specifying a timeout parameter is possible). Note that the work sim_busy does here is CPU-intensive; that's when you would choose processes over threads.
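For example, if you don't want the parent to block forever, .get() accepts a timeout; a tiny sketch (not tied to the example above):

from queue import Empty  # multiprocessing.Queue raises queue.Empty on timeout

try:
    result = out_queue.get(timeout=5)   # wait at most 5 seconds
except Empty:
    result = None                       # nothing arrived in time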
Process & Pipe
For one-to-one connections a pipe is enough. The setup is nearly identical, just the methods are named differently and a call to Pipe() returns two connection objects. In duplex mode, both objects are read-write ends, with duplex=False (simplex) the first connection object is the read-end of the pipe, the second is the write-end. In this basic scenario we just need a simplex-pipe:
from multiprocessing import Process, Pipe

SENTINEL = 'SENTINEL'

def sim_busy(write_conn, x):
    for _ in range(int(x)):
        assert 1 == 1
    result = x
    write_conn.send(result)
    # If all results are sent, send a sentinel-value to let the parent know
    # no more results will come.
    write_conn.send(SENTINEL)

if __name__ == '__main__':

    # duplex=False because we just need one-way communication in this case.
    read_conn, write_conn = Pipe(duplex=False)

    p = Process(target=sim_busy, args=(write_conn, 150e6))  # 150e6 == 150000000.0
    p.start()

    for result in iter(read_conn.recv, SENTINEL):  # sentinel breaks the loop
        print(result)
Thread & Queue
For use with threading, you want to switch to queue.Queue. queue.Queue is built on top of a collections.deque, adding some locks to make it thread-safe. Unlike with multiprocessing's queue and pipe, objects put on a queue.Queue won't get pickled. Since threads share the same memory address-space, serialization for memory-copying is unnecessary, only pointers are transmitted.
from threading import Thread
from queue import Queue
import time

SENTINEL = 'SENTINEL'

def sim_io(out_queue, query):
    time.sleep(1)
    result = query + '_result'
    out_queue.put(result)
    # If all results are enqueued, send a sentinel-value to let the parent know
    # no more results will come.
    out_queue.put(SENTINEL)

if __name__ == '__main__':

    out_queue = Queue()

    p = Thread(target=sim_io, args=(out_queue, 'my_query'))
    p.start()

    for result in iter(out_queue.get, SENTINEL):  # sentinel-value breaks the loop
        print(result)
Read here why for result in iter(out_queue.get, SENTINEL):
should be preferred over a while True...break setup, where possible (a rough comparison sketch follows below).
Read here why you should use if __name__ == '__main__': in all your scripts and especially in multiprocessing.
More about get()-usage here.
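For comparison, this is roughly the while True...break version that the iter()-with-sentinel loop above replaces:

# noisier equivalent of: for result in iter(out_queue.get, SENTINEL)
while True:
    result = out_queue.get()
    if result == SENTINEL:
        break
    print(result)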

Python multi-threaded processing with limited CPU/ports

I have a dictionary of folder names that I would like to process in parallel. Under each folder, there is an array of file names that I would like to process in series:
folder_file_dict = {
    folder_name : {
        file_names_key : [file_names_array]
    }
}
Ultimately, I will be creating a folder named folder_name which contains the files with names len(folder_file_dict[folder_name][file_names_key]). I have a method like so:
def process_files_in_series(file_names_array, udp_port):
    for file_name in file_names_array:
        time_consuming_method(file_name, udp_port)
        # create "file_name"

udp_ports = [123, 456, 789]
Note the time_consuming_method() above, which takes a long time due to calls over a UDP port. I am also limited to using the UDP ports in the array above. Thus, I have to wait for time_consuming_method to complete on a UDP port before I can use that UDP port again. This means that I can only have len(udp_ports) threads running at a time.
Thus, I will ultimately create len(folder_file_dict.keys()) threads, with len(folder_file_dict.keys()) calls to process_files_in_series. I also have a MAX_THREAD count. I am trying to use the Queue and Threading modules, but I am not sure what kind of design I need. How can I do this using Queues and Threads, and possibly Conditions as well? A solution that uses a thread pool may also be helpful.
NOTE
I am not trying to increase the read/write speed. I am trying to parallelize the calls to time_consuming_method under process_files_in_series. Creating these files is just part of the process, but not the rate limiting step.
Also, I am looking for a solution that uses the Queue, Threading, and possibly Condition modules, or anything relevant to those modules. A threadpool solution may also be helpful. I cannot use processes, only threads.
I am also looking for a solution in Python 2.7.
Using a thread pool:
#!/usr/bin/env python2
from multiprocessing.dummy import Pool, Queue  # thread pool

folder_file_dict = {
    folder_name: {
        file_names_key: file_names_array
    }
}

def process_files_in_series(file_names_array, udp_port):
    for file_name in file_names_array:
        time_consuming_method(file_name, udp_port)
        # create "file_name"
    ...

def mp_process(filenames):
    udp_port = free_udp_ports.get()  # block until a free udp port is available
    args = filenames, udp_port
    try:
        return args, process_files_in_series(*args), None
    except Exception as e:
        return args, None, str(e)
    finally:
        free_udp_ports.put_nowait(udp_port)

free_udp_ports = Queue()  # in general, use initializer to pass it to children
for port in udp_ports:
    free_udp_ports.put_nowait(port)

pool = Pool(number_of_concurrent_jobs)
for args, result, error in pool.imap_unordered(mp_process, get_files_arrays()):
    if error is not None:
        print args, error
I don't think you need to bind the number of threads to the number of udp ports if the processing time may differ for different filenames arrays.
If I understand the structure of folder_file_dict correctly, then to generate the filenames arrays:
def get_files_arrays(folder_file_dict=folder_file_dict):
    for folder_name_dict in folder_file_dict.itervalues():
        for filenames_array in folder_name_dict.itervalues():
            yield filenames_array
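Regarding the "use initializer to pass it to children" comment in the code above, one way that could look (just a sketch, with number_of_concurrent_jobs and udp_ports as placeholders like before):

def init_worker(ports_queue):
    # make the shared queue visible to the pool workers as a global
    global free_udp_ports
    free_udp_ports = ports_queue

ports = Queue()
for port in udp_ports:
    ports.put_nowait(port)

pool = Pool(number_of_concurrent_jobs, initializer=init_worker, initargs=(ports,))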
Use the multiprocessing.pool.ThreadPool. It handles queue / thread management for you and can be easily changed to do multiprocessing instead.
EDIT: Added example
Here's an example... multiple threads may end up using the same udp port. I'm not sure if that's a problem for you.
import multiprocessing.pool
import itertools

def process_files_in_series(file_names_array, udp_port):
    for file_name in file_names_array:
        time_consuming_method(file_name, udp_port)
        # create "file_name"

udp_ports = [123, 456, 789]

folder_file_dict = {
    folder_name : {
        file_names_key : [file_names_array]
    }
}

def process_files_in_series_star(args):
    # pool.map passes a single argument, so unpack the (files, port) tuple here
    return process_files_in_series(*args)

def main(folder_file_dict, udp_ports):
    # number of threads - here I'm limiting to the smaller of udp_ports,
    # file lists to process and a cap I arbitrarily set to 4
    num_threads = min(len(folder_file_dict), len(udp_ports), 4)

    # the pool
    pool = multiprocessing.pool.ThreadPool(num_threads)

    # build files to be processed into a list. You may want to do other
    # things like join folder_name...
    file_arrays = [value['file_names_key'] for value in folder_file_dict.values()]

    # do the work
    pool.map(process_files_in_series_star,
             zip(file_arrays, itertools.cycle(udp_ports)))
    pool.close()
    pool.join()
This is kind of a blueprint for how you could use multiprocessing.Process
with JoinableQueues to deliver jobs to workers. You will
still be bound by I/O, but with Process you do have true concurrency,
which may prove to be useful, since threading may even be slower than
a normal script processing the files.
(Be aware that this will also prevent you from doing anything else with your Laptop
if you dare to start too many processes at once :P).
I tried to explain the code
as much as possible with comments.
import traceback

from multiprocessing import Process, JoinableQueue, cpu_count

# Number of CPUs on your PC
cpus = cpu_count()

# The worker function. Could also be modelled as a class
def Worker(q_jobs):
    while True:
        # try / except / finally may be necessary for error-prone tasks since the
        # processes may hang forever if the task_done() method is not called.
        try:
            # Get an item from the Queue
            item = q_jobs.get()
            # At this point the data should somehow be processed
        except:
            traceback.print_exc()
        else:
            pass
        finally:
            # Inform the Queue that the task has been done.
            # Without this the processes can not be killed
            # and will be left as zombies afterwards.
            q_jobs.task_done()

# A joinable queue to end the processes
q_jobs = JoinableQueue()

# Create processes depending on the number of CPUs
for i in range(cpus):
    # target function and arguments;
    # a tuple with multiple arguments does not need the trailing ',', e.g.
    # (q_jobs, 'bla')
    p = Process(target=Worker,
                args=(q_jobs,)
                )
    p.daemon = True
    p.start()

# fill the Queue with jobs
q_jobs.put(['Do'])
q_jobs.put(['Something'])

# End processes
q_jobs.join()
Cheers
EDIT
I wrote this with Python 3 in mind.
Removing the parentheses from the print function, i.e.
print item
should make this work for 2.7.

how to pass argument into threading?

I want to add 5 to every element in range(1,100) with the threading module,
to watch which result is in which thread.
I have finished most of the code, but how do I pass an argument into threading.Thread?
import threading,queue
x=range(1,100)
y=queue.Queue()
for i in x:
    y.put(i)

def myadd(x):
    print(x+5)

for i in range(5):
    print(threading.Thread.getName())
    threading.Thread(target=myadd,args=x).start() #it is wrong here
y.join()
Thanks to dano, it is OK now. In order to run it in an interactive way, I rewrote it as:
method 1: run in an interactive way.
from concurrent.futures import ThreadPoolExecutor
import threading

x = range(1, 100)

def myadd(x):
    print("Current thread: {}. Result: {}.".format(threading.current_thread(), x+5))

def run():
    t = ThreadPoolExecutor(max_workers=5)
    t.map(myadd, x)
    t.shutdown()

run()
method 2:
from concurrent.futures import ThreadPoolExecutor
import threading

x = range(1, 100)

def myadd(x):
    print("Current thread: {}. Result: {}.".format(threading.current_thread(), x+5))

def run():
    t = ThreadPoolExecutor(max_workers=5)
    t.map(myadd, x)
    t.shutdown()

if __name__=="__main__":
    run()
What if more args need to be passed into the ThreadPoolExecutor?
I want to calculate 1+3, 2+4, 3+5 up to 100+102 with the multiprocessing module.
And what about 20+1, 20+2, 20+3 up to 20+100 with the multiprocessing module?
from multiprocessing.pool import ThreadPool
do = ThreadPool(5)
def myadd(x,y):
    print(x+y)
do.apply(myadd,range(3,102),range(1,100))
How to fix it?
Here you need to pass a tuple rather than a single element.
To make a tuple, the code would be:
dRecieved = connFile.readline()
processThread = threading.Thread(target=processLine, args=(dRecieved,))
processThread.start()
Please refer here for more explanation.
It looks like you're trying to create a thread pool manually, so that five threads are used to add up all 100 results. If this is the case, I would recommend using multiprocessing.pool.ThreadPool for this:
from multiprocessing.pool import ThreadPool
import threading
import queue

x = range(1, 100)

def myadd(x):
    print("Current thread: {}. Result: {}.".format(
          threading.current_thread(), x+5))

t = ThreadPool(5)
t.map(myadd, x)
t.close()
t.join()
If you're using Python 3.x, you could use concurrent.futures.ThreadPoolExecutor instead:
from concurrent.futures import ThreadPoolExecutor
import threading

x = range(1, 100)

def myadd(x):
    print("Current thread: {}. Result: {}.".format(threading.current_thread(), x+5))

t = ThreadPoolExecutor(max_workers=5)
t.map(myadd, x)
t.shutdown()
I think there are two issues with your original code. First, you need to pass a tuple to the args keyword argument, not a single element:
threading.Thread(target=myadd,args=(x,))
However, you're also trying to pass the entire list (or range object, if using Python 3.x) returned by range(1,100) to myadd, which isn't really what you want to do. It's also not clear what you're using the queue for. Maybe you meant to pass that to myadd?
One final note: Python uses a Global Interpreter Lock (GIL), which prevents more than one thread from using the CPU at a time. This means that doing CPU-bound operations (like addition) in threads provides no performance boost in Python, since only one of the threads will ever run at a time. Therefore, in Python it's preferred to use multiple processes to parallelize CPU-bound operations. You could make the above code use multiple processes by replacing the ThreadPool in the first example with Pool (from multiprocessing import Pool). In the second example, you would use ProcessPoolExecutor instead of ThreadPoolExecutor. You would also probably want to replace threading.current_thread() with os.getpid().
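For reference, a minimal sketch of that process-based variant (swapping in multiprocessing.Pool and os.getpid(); otherwise the same toy example):

from multiprocessing import Pool
import os

x = range(1, 100)

def myadd(x):
    print("Current process: {}. Result: {}.".format(os.getpid(), x + 5))

if __name__ == "__main__":
    p = Pool(5)
    p.map(myadd, x)
    p.close()
    p.join()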
Edit:
Here's how to handle the case where there are two different args to pass:
from multiprocessing.pool import ThreadPool

def myadd(x,y):
    print(x+y)

def do_myadd(x_and_y):
    return myadd(*x_and_y)

do = ThreadPool(5)
do.map(do_myadd, zip(range(3, 102), range(1, 100)))
We use zip to create a list where we pair together each variable in the range:
[(3, 1), (4, 2), (5, 3), ...]
We use map to call do_myadd with each tuple in that list, and do_myadd uses tuple expansion (*x_and_y), to expand the tuple into two separate arguments, which get passed to myadd.
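On Python 3.3+ you could also skip the wrapper, since multiprocessing.pool.Pool (and therefore ThreadPool) has starmap(), which unpacks each tuple for you; a small sketch of the same example:

from multiprocessing.pool import ThreadPool

def myadd(x, y):
    print(x + y)

do = ThreadPool(5)
do.starmap(myadd, zip(range(3, 102), range(1, 100)))
do.close()
do.join()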
From:
import threading,queue
x=range(1,100)
y=queue.Queue()
for i in x:
    y.put(i)

def myadd(x):
    print(x+5)

for i in range(5):
    print(threading.Thread.getName())
    threading.Thread(target=myadd,args=x).start() #it is wrong here
y.join()
To:
import threading
import queue

# So print() in various threads doesn't garble text;
# I hear it is better to use RLock() instead of Lock().
screen_lock = threading.RLock()

# I think range() is an iterator or generator. Thread safe?
argument1 = range(1, 100)
argument2 = [100,] * 100  # will add 100 to each item in argument1

# I believe this creates a tuple (immutable).
# If it were a mutable object then perhaps it wouldn't be thread safe.
data = zip(argument1, argument2)

# object where multiple threads can grab data while avoiding deadlocks.
q = queue.Queue()

# Fill the thread-safe queue with mock data
for item in data:
    q.put(item)

# It could be wiser to use one queue for each inbound data stream.
# For example one queue for file reads, one queue for console input,
# one queue for each network socket. Remembering that rates of
# reading files and console input and receiving network traffic all
# differ and you don't want one I/O operation to block another.
# inbound_file_data = queue.Queue()
# inbound_console_data = queue.Queue() # etc.

# This function is a thread target
def myadd(thread_name, a_queue):

    # This thread-targeted function blocks only within each thread;
    # at a_queue.get() and at a_queue.put() (if the queue is full).
    #
    # Each thread targeting this function has its own copy of
    # this function's local() namespace. So each thread will
    # pause when the queue is empty, on queue.get(), or when
    # the queue is full, on queue.put(). With one queue, this
    # means all threads will block at the same time, when the
    # single queue is full or when the single queue is empty
    # unless we check for the number of remaining items in the
    # queue before we do a queue.get() and if none remain in the
    # queue just exit this function. This presumes the data is
    # not a continuous and slow stream like a network connection
    # or a rotating log file but limited like a closed file.

    # Let each thread continue to read from the global
    # queue until it is empty.
    #
    # This is a bad use-case for using threading.
    #
    # If each thread had a separate queue it would be
    # a better use-case. You don't want one slow stream of
    # data blocking the processing of a fast stream of data.
    #
    # For a single stream of data it is likely better just not
    # to use threads. However here is a single "global" queue
    # example...

    # presumes a_queue starts off not empty
    while a_queue.qsize():
        arg1, arg2 = a_queue.get()  # blocking call

        # prevent console/tty text garble
        if screen_lock.acquire():
            print('{}: {}'.format(thread_name, arg1 + arg2))
            print('{}: {}'.format(thread_name, arg1 + 5))
            print()
            screen_lock.release()
        else:
            # print anyway if lock fails to acquire
            print('{}: {}'.format(thread_name, arg1 + arg2))
            print('{}: {}'.format(thread_name, arg1 + 5))
            print()

        # allows .join() to keep track of when the queue is finished
        a_queue.task_done()

# create threads and pass in thread name and queue to thread-target function
threads = []
for i in range(5):
    thread_name = 'Thread-{}'.format(i)
    thread = threading.Thread(
        name=thread_name,
        target=myadd,
        args=(thread_name, q))

    # Recommended:
    # queues = [queue.Queue() for index in range(len(threads))] # put at top of file
    # thread = threading.Thread(
    #     target=myadd,
    #     name=thread_name,
    #     args=(thread_name, queues[i],))

    threads.append(thread)

# some applications should start threads after all threads are created.
for thread in threads:
    thread.start()

# Each thread will pull items off the queue. Because the while loop in
# myadd() ends with the queue.qsize() == 0 each thread will terminate
# when there is nothing left in the queue.

Multiprocessing with python3 only runs once

I have a problem running multiple processes in python3.
My program does the following:
1. Takes entries from an sqlite database and passes them to an input_queue
2. Create multiple processes that take items off the input_queue, run it through a function and output the result to the output queue.
3. Create a thread that takes items off the output_queue and prints them (This thread is obviously started before the first 2 steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process,Queue
import multiprocessing
from threading import Thread

## This is the class that would be passed to the multi_processing function
class Processor:
    def __init__(self,out_queue):
        self.out_queue = out_queue
    def __call__(self,in_queue):
        data_entry = in_queue.get()
        result = data_entry*2
        self.out_queue.put(result)

#Performs the multiprocessing
def perform_distributed_processing(dbList,threads,processor_factory,output_queue):
    input_queue = Queue()

    # Create the Data processors.
    for i in range(threads):
        processor = processor_factory(output_queue)
        data_proc = Process(target = processor,
                            args = (input_queue,))
        data_proc.start()

    # Push entries to the queue.
    for entry in dbList:
        input_queue.put(entry)

    # Push stop markers to the queue, one for each thread.
    for i in range(threads):
        input_queue.put(None)

    data_proc.join()
    output_queue.put(None)

if __name__ == '__main__':
    output_results = Queue()

    def output_results_reader(queue):
        while True:
            item = queue.get()
            if item is None:
                break
            print(item)

    # Establish results collecting thread.
    results_process = Thread(target = output_results_reader,args = (output_results,))
    results_process.start()

    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]

    # Perform multi processing
    perform_distributed_processing(dbList,10,Processor,output_results)

    # Wait for it all to finish.
    results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simple Python, and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor

def Processor(data_entry):
    return data_entry*2

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        yield from executor.map(processor_factory, dbList)

if __name__ == '__main__':
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    for result in perform_distributed_processing(dbList, 8, Processor):
        print(result)
Or, if you want to handle them as they come instead of in order:
from concurrent.futures import ProcessPoolExecutor, Future, as_completed

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        fs = (executor.submit(processor_factory, db) for db in dbList)
        yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library again. I think you can use any of the multiprocessing.Pool methods depending on your needs - if this is a batch job you can even use the synchronous multiprocessing.Pool.map() - only instead of pushing to an input queue, you need to write a generator that yields input to the pool.
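A rough sketch of that Pool.map-with-a-generator approach, using the same doubling function as the question (the generator below just stands in for reading rows from the sqlite database):

from multiprocessing import Pool

def process_entry(data_entry):
    return data_entry * 2

def db_entries():
    # stand-in for iterating over database rows
    for i in range(300):
        yield i

if __name__ == '__main__':
    pool = Pool(8)
    # Pool.map consumes the generator and returns results in order;
    # Pool.imap would stream them lazily instead.
    for result in pool.map(process_entry, db_entries()):
        print(result)
    pool.close()
    pool.join()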

Return whichever expression returns first

I have two different functions f, and g that compute the same result with different algorithms. Sometimes one or the other takes a long time while the other terminates quickly. I want to create a new function that runs each simultaneously and then returns the result from the first that finishes.
I want to create that function with a higher order function
h = firstresult(f, g)
What is the best way to accomplish this in Python?
I suspect that the solution involves threading. I'd like to avoid discussion of the GIL.
I would simply use a Queue for this. Start the threads and the first one which has a result ready writes to the queue.
Code
from threading import Thread
from time import sleep
from Queue import Queue

def firstresult(*functions):
    queue = Queue()
    threads = []
    for f in functions:
        def thread_main(f=f):  # bind f now, not when the thread runs
            queue.put(f())
        thread = Thread(target=thread_main)
        threads.append(thread)
        thread.start()
    result = queue.get()
    return result

def slow():
    sleep(1)
    return 42

def fast():
    return 0

if __name__ == '__main__':
    print firstresult(slow, fast)
Live demo
http://ideone.com/jzzZX2
Notes
Stopping the threads is an entirely different topic. For this you need to add some state variable to the threads which needs to be checked at regular intervals. As I want to keep this example short I simply skipped that part and assumed that all workers get the time to finish their work even though the result is never read.
Skipping the discussion about the GIL as requested by the questioner. ;-)
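A minimal sketch of that stop-flag idea, using a threading.Event the workers check between units of work (names made up, not part of the example above):

import threading
import time

stop = threading.Event()

def worker():
    while not stop.is_set():
        # do one small unit of work, then check the flag again
        time.sleep(0.1)

t = threading.Thread(target=worker)
t.start()
# ... once the first result has arrived elsewhere ...
stop.set()
t.join()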
Now - unlike my suggestion on the other answer, this piece of code does exactly what you are requesting:
from multiprocessing import Process, Queue
import random
import time

def firstresult(func1, func2):
    queue = Queue()
    proc1 = Process(target=func1, args=(queue,))
    proc2 = Process(target=func2, args=(queue,))
    proc1.start(); proc2.start()
    result = queue.get()
    proc1.terminate(); proc2.terminate()
    return result

def algo1(queue):
    time.sleep(random.uniform(0,1))
    queue.put("algo 1")

def algo2(queue):
    time.sleep(random.uniform(0,1))
    queue.put("algo 2")

print firstresult(algo1, algo2)
Run each function in a new worker thread, the 2 worker threads send the result back to the main thread in a 1 item queue or something similar. When the main thread receives the result from the winner, it kills (do python threads support kill yet? lol.) both worker threads to avoid wasting time (one function may take hours while the other only takes a second).
Replace the word thread with process if you want.
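For completeness, a sketch of the same first-result idea with concurrent.futures.wait and FIRST_COMPLETED; unlike the Process.terminate() version shown earlier, this cannot actually stop the slower worker, it only returns as soon as the winner finishes:

from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def firstresult(f, g):
    # run both functions in worker threads and return the first result;
    # the slower worker keeps running in the background until it is done
    executor = ThreadPoolExecutor(max_workers=2)
    futures = [executor.submit(f), executor.submit(g)]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    executor.shutdown(wait=False)  # don't block on the slower worker
    return next(iter(done)).result()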
You will need to run each function in another process (with multiprocessing) or in a different thread.
If both are CPU bound, multithreading won't help much - exactly due to the GIL -
so multiprocessing is the way to go.
If the return value is a pickleable (serializable) object, I have this decorator I created that simply runs the function in background, in another process:
https://bitbucket.org/jsbueno/lelo/src
It is not exactly what you want - as both are non-blocking and start executing right away. The trick with this decorator is that it blocks (and waits for the function to complete) when you try to use the return value.
But on the other hand - it is just a decorator that does all the work.
