Recently I have been doing a lot of network or IO bound operations and using threads helps speed up the code a lot. I noticed that I have been writing code like this over and over again:
import threading

threads = []
for machine, user, data in machine_list:
    mythread = threading.Thread(target=get_info, args=(machine, user, data))
    mythread.start()
    threads.append(mythread)

for mythread in threads:
    mythread.join()
This feels somewhat repetitive. It works, but I suspect there is likely a more "Pythonic" way to write this. Any suggestions?
What you are looking for is multiprocessing.pool.ThreadPool, which has the same semantics as multiprocessing.pool.Pool, but uses threads instead of processes.
You can do what you are currently doing more concisely like this:
from multiprocessing.pool import ThreadPool
pool = ThreadPool() # optionally pass the number of threads in the pool
res = pool.starmap_async(get_info, machine_list)
res.wait()
This is not exactly equivalent to your code, since ThreadPool creates a fixed number of threads (by default equal to the number of available CPUs) and distributes the work among them. If you want exactly one thread per item, pass the size explicitly, e.g. ThreadPool(len(machine_list)).
Then you can also create a function to easily do this multiple times:
def run_threads(func, arglist):
    ThreadPool(len(arglist)).starmap_async(func, arglist).wait()
Note: .starmap_async() is just one way to achieve this. There are multiple methods you can use. Take a look at the documentation for Pool linked above and choose the one you prefer.
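For instance, a minimal sketch (my variation, not from the original snippet) using the blocking starmap() with a context manager, which also collects the return values of get_info:
from multiprocessing.pool import ThreadPool

with ThreadPool(len(machine_list)) as pool:
    # starmap blocks until every call has returned, and the pool is cleaned up on exit
    results = pool.starmap(get_info, machine_list)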
In Python, there is also a simple way to work with many threads: concurrent.futures.ThreadPoolExecutor.
from concurrent.futures import ThreadPoolExecutor
from time import sleep
from tqdm import tqdm
def do_something(item):
    print(item)
    sleep(1)
    print(f"Finished {item}")

items = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

with ThreadPoolExecutor(max_workers=8) as executor:
    for item in items:
        executor.submit(do_something, item)

# With a progress bar:
with ThreadPoolExecutor(max_workers=8) as executor:
    list(tqdm(executor.map(do_something, items), total=len(items)))

print("finished")
Note: I tried other approaches, but this is the only one that worked with Docker multi-threading on a single vCPU (Google Cloud Run environment).
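If you also need the return values of the submitted calls, here is a small sketch (my addition, assuming do_something returns something you care about) using submit() together with as_completed():
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {executor.submit(do_something, item): item for item in items}
    for future in as_completed(futures):
        item = futures[future]
        print(f"{item} -> {future.result()}")  # result() re-raises any exception from the worker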
The Pythonic way would probably be to use asyncio; the problem you have is exactly what asyncio was designed for. The net result is broadly the same as using threading, but instead of threads you have tasks, and when a task is blocked the event loop switches to a different task. The program remains single-threaded, however, so it avoids the overhead the GIL adds when switching between threads.
import asyncio

async def get_info(machine, user, data):
    # NB. async declaration
    ...

async def main():
    tasks = [
        asyncio.create_task(get_info(machine, user, data))
        for machine, user, data in machine_list
    ]
    done, _pending = await asyncio.wait(tasks)

    # asyncio is more powerful in that it allows you to directly get results of tasks.
    # This is unlike threading, where you must use some form of signalling
    # (such as a queue) to get data back from a thread.
    results = {}
    for args, task in zip(machine_list, tasks):
        result = await task  # this returns immediately, since asyncio.wait has
                             # already completed
        results[args] = result

if __name__ == '__main__':
    asyncio.run(main())
The problem with this approach is that you'll have to start using asyncio-aware libraries and rewrite your own code to be asyncio-aware. To get started, though, you can use asyncio.to_thread(), which runs the given function in a separate thread:
import asyncio

def get_info(machine, user, data):
    # NB. no async declaration
    ...

async def main():
    tasks = [
        asyncio.create_task(asyncio.to_thread(get_info, machine, user, data))
        for machine, user, data in machine_list
    ]
    done, _pending = await asyncio.wait(tasks)

if __name__ == '__main__':
    asyncio.run(main())
concurrent.futures
If you're heavily invested in the threading model, and switching to asyncio would mean too much rework and too many new concepts, you can use concurrent.futures:
from concurrent.futures import ThreadPoolExecutor

def get_info(machine, user, data):
    ...

def get_info_helper(args):
    machine, user, data = args
    return get_info(machine, user, data)

with ThreadPoolExecutor() as executor:
    results = list(executor.map(get_info_helper, machine_list))
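As a side note, executor.map() accepts one iterable per positional argument, so the helper can be skipped by unzipping machine_list; a small sketch:
with ThreadPoolExecutor() as executor:
    # zip(*machine_list) splits the (machine, user, data) tuples into three parallel iterables
    results = list(executor.map(get_info, *zip(*machine_list)))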
I have some async code that looks like this:
There's a third-party function that performs some operations on a string and returns a modified string; for the purposes of this question it is something like non_async_func.
I have an async def async_func_single function that wraps non_async_func and performs a single operation.
Then another async def async_func_batch function wraps async_func_single to run it over a batch of data.
The code kind of works, but I would like to know more about why/how. My questions are:
Is it necessary to create the async_func_single and have async_func_batch wrap around it?
Can I directly just feed a batch of data to async_func_batch to call non_async_func?
I have a per_chunk function that feeds in the data in batches; is there any asyncio operation/function that would avoid having to pre-batch the data I want to send to async_func_batch?
import nest_asyncio
nest_asyncio.apply()

import asyncio
from itertools import zip_longest
from loremipsum import get_sentences

def per_chunk(iterable, n=1, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def non_async_func(text):
    return text[::-1]

async def async_func_single(text):
    # Perform some string operation.
    return non_async_func(text)

async def async_func_batch(batch):
    tasks = [async_func_single(text) for text in batch]
    return await asyncio.gather(*tasks)

# Create some random inputs
thousand_texts = get_sentences(1000)

# Loop through 20 sentences at a time.
for batch in per_chunk(thousand_texts, n=20):
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(async_func_batch(batch))
    for i, o in zip(thousand_texts, results):
        print(i, o)
Note that marking your functions with "async def" rather than "def" doesn't automatically make them asynchronous; you can have "async def" functions that are synchronous. The difference between asynchronous functions and synchronous ones is that asynchronous functions define places (using "await") where they wait on another asynchronous function or on an asynchronous IO operation.
Also note that asyncio is not magic - it is basically a scheduler that schedules asynchronous functions to be run based on whether the function/operation that is being "awaited" has completed. And, as the scheduler and the asynchronous functions all run on a single thread, then at any given moment, only a single asynchronous function can be running.
So, going back to your code, the only thing your "async_func_single" function does is call a synchronous function; despite being marked "async def", it is still effectively synchronous. The same logic applies to "async_func_batch": the "async_func_single" tasks passed to "asyncio.gather" are all synchronous, so "asyncio.gather" just runs each one in turn (offering no benefit over a simple for loop awaiting each task), and "async_func_batch" is therefore also synchronous. Because you are only calling synchronous functions, asyncio is not giving your program any benefit.
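To see the point concretely, here is a small illustrative sketch (not from the original code): an async function that blocks with time.sleep() holds up the whole event loop, while one that awaits asyncio.sleep() lets the other tasks run in the meantime.
import asyncio
import time

async def looks_async_but_blocks(i):
    time.sleep(1)           # blocking call: the event loop cannot switch away
    return i

async def truly_async(i):
    await asyncio.sleep(1)  # awaiting lets the event loop run other tasks
    return i

async def demo():
    t0 = time.perf_counter()
    await asyncio.gather(*(looks_async_but_blocks(i) for i in range(3)))
    print(f"blocking version: {time.perf_counter() - t0:.1f}s")   # ~3s

    t0 = time.perf_counter()
    await asyncio.gather(*(truly_async(i) for i in range(3)))
    print(f"awaiting version: {time.perf_counter() - t0:.1f}s")   # ~1s

asyncio.run(demo())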
If you want multiple synchronous functions that all run at the same time, you don't use asynchronous functions. You need to run them in parallel processes/threads:
import os
import itertools
import concurrent.futures
from loremipsum import get_sentences

executor = concurrent.futures.ProcessPoolExecutor(max_workers=os.cpu_count())

def per_chunk(iterable, n=1):
    iterator = iter(iterable)
    while True:
        chunk = tuple(itertools.islice(iterator, n))
        if chunk:
            yield chunk
        else:
            break

def non_async_func(text):
    return text[::-1]

def process_batches(batches):
    futures = [
        executor.submit(non_async_func, batch)
        for batch in batches
    ]
    concurrent.futures.wait(futures)

thousand_texts = get_sentences(1000)
process_batches(per_chunk(thousand_texts, n=20))
If you still want to use an asynchronous function to process the batches, then asyncio provides asynchronous wrappers around the concurrent futures:
import asyncio

async def process_batches(batches):
    event_loop = asyncio.get_running_loop()
    futures = [
        event_loop.run_in_executor(executor, non_async_func, batch)
        for batch in batches
    ]
    await asyncio.wait(futures)

thousand_texts = get_sentences(1000)
asyncio.run(process_batches(per_chunk(thousand_texts, n=20)))
but it gives no advantages unless you have other asynchronous functions that can be run while it is waiting.
I have tried to answer your questions below.
The code kind of works but I would like to know more about why/how, my questions are
Is it necessary to create the async_func_single and have async_func_batch wrap around it?
No, this is absolutely not necessary.
Can I directly just feed in a batch of data in async_func_batch to call non_async_func?
You could do something like example 1 below, where you feed in all the data directly.
I have a per_chunk function that feeds in the data in batches, is there any asyncio operations/functions that can avoid the use of pre-batching the data I want to send to async_func_batch?
It's possible to use asyncio Queues with a max size and then process data until the queue is empty and fill it up again. Check out example 2.
Example 1
import asyncio
from concurrent.futures import ThreadPoolExecutor
from loremipsum import get_sentences

def non_async_func(text):
    return text[::-1]

async def async_func_batch(batch):
    with ThreadPoolExecutor(max_workers=20) as executor:
        futures = [loop.run_in_executor(executor, non_async_func, text) for text in batch]
        return await asyncio.gather(*futures)

# Create some random inputs
thousand_texts = get_sentences(1000)

loop = asyncio.get_event_loop()
results = loop.run_until_complete(async_func_batch(thousand_texts))
for i, o in zip(thousand_texts, results):
    print(i, o)
Example 2
Queues can be unbounded. If you do not specify maxsize, the queue will accept all elements before any processing happens. If you remove maxsize, move the join outside of the for loop and drop the if taskQueue.full(): check; a sketch of that unbounded variant follows Example 2 below.
from loremipsum import get_sentences
import asyncio

async def async_func(text, taskQueue, resultsQueue):
    await resultsQueue.put(text[::-1])  # Add the result to the resultsQueue
    taskQueue.task_done()               # Tell the taskQueue that the task is finished
    taskQueue.get_nowait()              # Don't wait for it (unblocking)

async def main():
    taskQueue = asyncio.Queue(maxsize=20)
    resultsQueue = asyncio.Queue()
    thousand_texts = get_sentences(1000)
    results = []
    for text in thousand_texts:
        await taskQueue.put(asyncio.create_task(async_func(text, taskQueue, resultsQueue)))
        if taskQueue.full():        # If maxsize is reached
            await taskQueue.join()  # Will block until finished
    while not resultsQueue.empty():
        results.append(await resultsQueue.get())
    for i, o in zip(thousand_texts, results):
        print(i, o)

if __name__ == "__main__":
    asyncio.run(main())
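For reference, here is a rough sketch of the unbounded variant described in the note above, reusing async_func and the imports from Example 2:
async def main():
    taskQueue = asyncio.Queue()   # no maxsize: the queue grows as needed
    resultsQueue = asyncio.Queue()
    thousand_texts = get_sentences(1000)
    results = []
    for text in thousand_texts:
        await taskQueue.put(asyncio.create_task(async_func(text, taskQueue, resultsQueue)))
    await taskQueue.join()        # join moved outside the loop: wait for everything at once
    while not resultsQueue.empty():
        results.append(await resultsQueue.get())
    for i, o in zip(thousand_texts, results):
        print(i, o)

if __name__ == "__main__":
    asyncio.run(main())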
I want to write and run a directed acyclic graph (DAG) with several tasks running in serial or parallel. Ideally it would look like:
def task1():
    ...

def task2():
    ...

graph = Sequence([
    task1,
    task2,
    Parallel([
        task3,
        task4
    ]),
    task5
])

graph.run()
It would run 1 -> 2 -> (3 and 4 concurrently) -> 5. The tasks need to access the global scope to store results, write logs and access command line parameters.
My use case is writing a deployment script. Parallel tasks are IO-bound: typically waiting on a remote server to complete a step.
I looked into threading, asyncio, Airflow, but did not find any simple library that would allow this without some boilerplate code to traverse and control the graph's execution. Does anything like that exist?
Here's a quick proof-of-concept implementation. It can be used like:
graph = sequence(
    lambda: print(1),
    lambda: print(2),
    parallel(
        lambda: print(3),
        lambda: print(4),
        sequence(
            lambda: print(5),
            lambda: print(6))),
    lambda: print(7))

graph()
1
2
3
5
6
4
7
sequence produces a function that wraps a for loop, and parallel produces a function that wraps use of a thread pool:
from typing import Callable
from multiprocessing.pool import ThreadPool

Task = Callable[[], None]

_pool: ThreadPool = ThreadPool()

def sequence(*tasks: Task) -> Task:
    def run():
        for task in tasks:
            task()
    return run  # Returning "run" to be used as a task by other "sequence" and "parallel" calls

def parallel(*tasks: Task) -> Task:
    def run():
        _pool.map(lambda f: f(), tasks)  # Delegate to a pool used for IO tasks
    return run
Each call to sequence and parallel returns a new "Task" (a function taking no arguments and returning nothing). That task can then be called by other, outer calls to sequence and parallel.
Things to note about the ThreadPool:
While this does use a thread pool for parallel, due to the GIL, this will still only execute one thing at a time. This means parallel is essentially useless for CPU-bound tasks.
I haven't specified how many threads the pool should begin with. I think it defaults to the number of cores you have available to you. You could specify how many you want to start with using the first parameter to ThreadPool if you want more.
For brevity, I'm not cleaning up the ThreadPool. You should definitely do that if you use this (a minimal sketch follows this list).
Even though ThreadPool is a part of multiprocessing, confusingly it uses threads not processes.
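On the cleanup point, one minimal option, reusing graph and _pool from the snippets above:
try:
    graph()
finally:
    _pool.close()  # stop accepting new work
    _pool.join()   # wait for the worker threads to exit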
You mentioned that your tasks are IO-bound, which means that asyncio would be a good candidate for this. You can try the aiodag library, which is an extremely light interface on top of asyncio that lets you easily define asynchronous DAGs:
import asyncio
from aiodag import task

@task
async def task1(x):
    ...

@task
async def task2(x):
    ...

@task
async def task3(x):
    ...

@task
async def task4(x):
    ...

@task
async def task5(x, y):
    ...

# rest of task funcs

async def main():
    t1 = task1()
    t2 = task2(t1)
    t3 = task3(t2)  # t3/t4 take t2; when t2 finishes, they will run concurrently
    t4 = task4(t2)
    t5 = task5(t3, t4)  # will wait until t3/t4 finish to execute
    await t5

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
Check out the readme on the github page for aiodag for a bit of detail on how the dag is constructed/optimally executed.
https://github.com/aa1371/aiodag
If you don't want to be tied to async functions, then check out dask's delayed interface. The definition of the dag works the same way as aiodag's, where the dag is constructed by function invocations. Dask will seamlessly handle executing your dag in the optimal parallel scheme, and can distribute over an arbitrarily large cluster to perform the parallel executions as well.
https://docs.dask.org/en/latest/delayed.html
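For a flavour of what that looks like, here is a rough sketch with dask.delayed (the task bodies are placeholders, not taken from the aiodag example):
from dask import delayed

@delayed
def task1():
    return 1

@delayed
def task2(x):
    return x + 1

@delayed
def task3(x):
    return x * 2

@delayed
def task4(x):
    return x * 3

@delayed
def task5(x, y):
    return x + y

# Calling the decorated functions only records the graph; nothing runs yet.
t1 = task1()
t2 = task2(t1)
t3 = task3(t2)
t4 = task4(t2)
t5 = task5(t3, t4)
print(t5.compute())  # compute() executes the graph, running independent tasks in parallel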
I've written a library of objects, many of which make HTTP / IO calls. I've been looking at moving over to asyncio due to the mounting overheads, but I don't want to rewrite the underlying code.
I've been hoping to wrap asyncio around my code in order to perform functions asynchronously without replacing all of my deep / low level code with await / yield.
I began by attempting the following:
import asyncio

async def my_function1(some_object, some_params):
    # Lots of existing code which uses existing objects
    # No await statements
    return output_data

async def my_function2():
    # Does more stuff
    ...

while True:
    loop = asyncio.get_event_loop()
    tasks = my_function1(some_object, some_params), my_function2()
    output_data = loop.run_until_complete(asyncio.gather(*tasks))
    print(output_data)
I quickly realised that while this code runs, nothing actually happens asynchronously; the functions complete synchronously. I'm very new to asynchronous programming, but I think this is because neither of my functions uses the keyword await or yield, so these functions are not coroutines and never yield, and thus provide no opportunity to switch to a different coroutine. Please correct me if I am wrong.
My question is, is it possible to wrap complex functions (where deep within they make HTTP / IO calls ) in an asyncio await keyword, e.g.
async def my_function():
    print("Welcome to my function")
    data = await bigSlowFunction()
UPDATE - Following Karlson's Answer
Following, and thanks to, Karlson's accepted answer, I used the following code, which works nicely:
from concurrent.futures import ThreadPoolExecutor
import time

# Some vars
a_var_1 = 0
a_var_2 = 10

pool = ThreadPoolExecutor(3)
future = pool.submit(my_big_function, object, a_var_1, a_var_2)

while not future.done():
    print("Waiting for future...")
    time.sleep(0.01)

print("Future done")
print(future.result())
This works really nicely, and the future.done() / sleep loop gives you an idea of how many CPU cycles you get to use by going async.
The short answer is, you can't have the benefits of asyncio without explicitly marking the points in your code where control may be passed back to the event loop. This is done by turning your IO heavy functions into coroutines, just like you assumed.
Without changing existing code you might achieve your goal with greenlets (have a look at eventlet or gevent).
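As a rough illustration (my sketch, assuming your IO goes through the standard library so that monkey-patching can make it cooperative), gevent usage looks roughly like this:
from gevent import monkey
monkey.patch_all()  # patch sockets, time.sleep, etc. so blocking calls yield to other greenlets

import gevent

def my_function1(some_object, some_params):
    # existing synchronous code with HTTP / IO calls
    ...

jobs = [gevent.spawn(my_function1, some_object, some_params) for _ in range(10)]
gevent.joinall(jobs)                   # wait for all greenlets to finish
results = [job.value for job in jobs]  # each greenlet's return value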
Another possibility would be to use Python's Future machinery: submit calls to your already-written functions to a ThreadPoolExecutor and await the resulting Future. Be aware that this comes with all the caveats of multi-threaded programming, though.
Something along the lines of
import asyncio
from concurrent.futures import ThreadPoolExecutor
from thinair import big_slow_function

executor = ThreadPoolExecutor(max_workers=5)

async def big_slow_coroutine():
    # wrap_future turns the concurrent.futures.Future into an awaitable asyncio future
    await asyncio.wrap_future(executor.submit(big_slow_function))
As of Python 3.9 you can wrap a blocking (non-async) function in a coroutine to make it awaitable using asyncio.to_thread(). The example given in the official documentation is:
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")

async def main():
    print(f"started main at {time.strftime('%X')}")
    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))
    print(f"finished main at {time.strftime('%X')}")

asyncio.run(main())

# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
This seems like a more joined up approach than using concurrent.futures to make a coroutine, but I haven't tested it extensively.
I have successfully built a RESTful microservice with Python asyncio and aiohttp that listens to a POST event to collect realtime events from various feeders.
It then builds an in-memory structure to cache the last 24h of events in a nested defaultdict/deque structure.
Now I would like to periodically checkpoint that structure to disc, preferably using pickle.
Since the memory structure can be >100MB I would like to avoid holding up my incoming event processing for the time it takes to checkpoint the structure.
I'd rather create a snapshot copy (e.g. deepcopy) of the structure and then take my time to write it to disk and repeat on a preset time interval.
I have been searching for examples on how to combine threads (and is a thread even the best solution for this?) and asyncio for that purpose but could not find something that would help me.
Any pointers to get started are much appreciated!
It's pretty simple to delegate a method to a thread or sub-process using BaseEventLoop.run_in_executor:
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_bound_operation(x):
    time.sleep(x)  # This is some operation that is CPU-bound

@asyncio.coroutine  # legacy generator-based coroutine, removed in Python 3.11; use "async def" with "await" on modern Python
def main():
    # Run cpu_bound_operation in the ProcessPoolExecutor.
    # This will make your coroutine block, but won't block
    # the event loop; other coroutines can run in the meantime.
    yield from loop.run_in_executor(p, cpu_bound_operation, 5)

loop = asyncio.get_event_loop()
p = ProcessPoolExecutor(2)  # Create a ProcessPool with 2 processes
loop.run_until_complete(main())
As for whether to use a ProcessPoolExecutor or ThreadPoolExecutor, that's kind of hard to say; pickling a large object will definitely eat some CPU cycles, which initially would make you think ProcessPoolExecutor is the way to go. However, passing your 100MB object to a Process in the pool would require pickling the instance in your main process, sending the bytes to the child process via IPC, unpickling it in the child, and then pickling it again so you can write it to disk. Given that, my guess is the pickling/unpickling overhead will be large enough that you're better off using a ThreadPoolExecutor, even though you're going to take a performance hit because of the GIL.
That said, it's very simple to test both ways and find out for sure, so you might as well do that.
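To make that concrete for the checkpointing use case, a rough sketch (the structure name, interval, and file name are made up): take the deepcopy snapshot on the event loop, then hand the slow pickling and disk write to the executor.
import asyncio
import copy
import pickle
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)

def write_snapshot(snapshot, path):
    with open(path, "wb") as f:
        pickle.dump(snapshot, f)

async def checkpoint_periodically(cache, interval=300):
    loop = asyncio.get_running_loop()
    while True:
        await asyncio.sleep(interval)
        snapshot = copy.deepcopy(cache)  # quick-ish copy taken on the event loop
        # the slow pickle + disk write happens in the worker thread
        await loop.run_in_executor(executor, write_snapshot, snapshot, "events.pickle")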
I also used run_in_executor, but I found this function kinda gross under most circumstances, since it requires partial() for keyword args and I'm never calling it with anything other than a single executor and the default event loop. So I made a convenience wrapper around it with sensible defaults and automatic keyword argument handling.
from time import sleep
import asyncio as aio

loop = aio.get_event_loop()

class Executor:
    """In most cases, you can just use the 'execute' instance as a
    function, i.e. y = await execute(f, a, b, k=c) => run f(a, b, k=c) in
    the executor, assign result to y. The defaults can be changed, though,
    with your own instantiation of Executor, i.e. execute =
    Executor(nthreads=4)"""

    def __init__(self, loop=loop, nthreads=1):
        from concurrent.futures import ThreadPoolExecutor
        self._ex = ThreadPoolExecutor(nthreads)
        self._loop = loop

    def __call__(self, f, *args, **kw):
        from functools import partial
        return self._loop.run_in_executor(self._ex, partial(f, *args, **kw))

execute = Executor()

...

def cpu_bound_operation(t, alpha=30):
    sleep(t)
    return 20 * alpha

async def main():
    y = await execute(cpu_bound_operation, 5, alpha=-2)

loop.run_until_complete(main())
Another alternative is to use loop.call_soon_threadsafe along with an asyncio.Queue as the intermediate channel of communication.
The current documentation for Python 3 also has a section on Developing with asyncio - Concurrency and Multithreading:
import asyncio

# This method represents your blocking code
def blocking(loop, queue):
    import time
    while True:
        loop.call_soon_threadsafe(queue.put_nowait, 'Blocking A')
        time.sleep(2)
        loop.call_soon_threadsafe(queue.put_nowait, 'Blocking B')
        time.sleep(2)

# This method represents your async code
async def nonblocking(queue):
    await asyncio.sleep(1)
    while True:
        queue.put_nowait('Non-blocking A')
        await asyncio.sleep(2)
        queue.put_nowait('Non-blocking B')
        await asyncio.sleep(2)

# The main function sets up the queue as the communication channel and synchronizes them
async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()

    blocking_fut = loop.run_in_executor(None, blocking, loop, queue)
    nonblocking_task = loop.create_task(nonblocking(queue))

    running = True  # use whatever exit condition
    while running:
        # Get messages from both blocking and non-blocking in parallel
        message = await queue.get()
        # You could send any messages, and do anything you want with them
        print(message)

asyncio.run(main())
How to send asyncio tasks to loop running in other thread may also help you.
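For the reverse direction, scheduling coroutines onto a running loop from a plain thread, asyncio.run_coroutine_threadsafe is the usual tool; a small self-contained sketch:
import asyncio
import threading

async def fetch(x):
    await asyncio.sleep(1)  # stand-in for real async work
    return x * 2

def worker(loop):
    # Called from a normal thread: schedule the coroutine on the loop and wait for its result
    future = asyncio.run_coroutine_threadsafe(fetch(21), loop)
    print("result from the loop:", future.result(timeout=5))

async def main():
    loop = asyncio.get_running_loop()
    t = threading.Thread(target=worker, args=(loop,))
    t.start()
    await asyncio.sleep(2)  # keep the loop running while the thread uses it
    t.join()

asyncio.run(main())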
If you need a more "powerful" example, check out my Wrapper to launch async tasks from threaded code. It will handle the thread safety part for you (for the most part) and let you do things like this:
# See https://gist.github.com/Lonami/3f79ed774d2e0100ded5b171a47f2caf for the full example

async def async_main(queue):
    # your async code can go here
    while True:
        command = await queue.get()
        if command.id == 'print':
            print('Hello from async!')
        elif command.id == 'double':
            await queue.put(command.data * 2)

with LaunchAsync(async_main) as queue:
    # your threaded code can go here
    queue.put(Command('print'))
    queue.put(Command('double', 7))
    response = queue.get(timeout=1)
    print('The result of doubling 7 is', response)
I have a problem running multiple processes in Python 3.
My program does the following:
1. Takes entries from an sqlite database and passes them to an input_queue
2. Creates multiple processes that take items off the input_queue, run them through a function, and put the results on the output_queue.
3. Creates a thread that takes items off the output_queue and prints them (this thread is obviously started before the first two steps)
My problem is that currently the 'function' in step 2 is only run as many times as the number of processes set, so for example if you set the number of processes to 8, it only runs 8 times then stops. I assumed it would keep running until it took all items off the input_queue.
Do I need to rewrite the function that takes the entries out of the database (step 1) into another process and then pass its output queue as an input queue for step 2?
Edit:
Here is an example of the code, I used a list of numbers as a substitute for the database entries as it still performs the same way. I have 300 items on the list and I would like it to process all 300 items, but at the moment it just processes 10 (the number of processes I have assigned)
#!/usr/bin/python3
from multiprocessing import Process, Queue
import multiprocessing
from threading import Thread

## This is the class that would be passed to the multi_processing function
class Processor:
    def __init__(self, out_queue):
        self.out_queue = out_queue

    def __call__(self, in_queue):
        data_entry = in_queue.get()
        result = data_entry * 2
        self.out_queue.put(result)

# Performs the multiprocessing
def perform_distributed_processing(dbList, threads, processor_factory, output_queue):
    input_queue = Queue()

    # Create the data processors.
    for i in range(threads):
        processor = processor_factory(output_queue)
        data_proc = Process(target=processor,
                            args=(input_queue,))
        data_proc.start()

    # Push entries to the queue.
    for entry in dbList:
        input_queue.put(entry)

    # Push stop markers to the queue, one for each thread.
    for i in range(threads):
        input_queue.put(None)

    data_proc.join()
    output_queue.put(None)

if __name__ == '__main__':
    output_results = Queue()

    def output_results_reader(queue):
        while True:
            item = queue.get()
            if item is None:
                break
            print(item)

    # Establish results collecting thread.
    results_process = Thread(target=output_results_reader, args=(output_results,))
    results_process.start()

    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]

    # Perform multiprocessing
    perform_distributed_processing(dbList, 10, Processor, output_results)

    # Wait for it all to finish.
    results_process.join()
A collection of processes that service an input queue and write to an output queue is pretty much the definition of a process pool.
If you want to know how to build one from scratch, the best way to learn is to look at the source code for multiprocessing.Pool, which is pretty simple Python and very nicely written. But, as you might expect, you can just use multiprocessing.Pool instead of re-implementing it. The examples in the docs are very nice.
But really, you could make this even simpler by using an executor instead of a pool. It's hard to explain the difference (again, read the docs for both modules), but basically, a future is a "smart" result object, which means instead of a pool with a variety of different ways to run jobs and get results, you just need a dumb thing that doesn't know how to do anything but return futures. (Of course in the most trivial cases, the code looks almost identical either way…)
from concurrent.futures import ProcessPoolExecutor

def Processor(data_entry):
    return data_entry * 2

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        yield from executor.map(processor_factory, dbList)
if __name__ == '__main__':
    # Use this as a substitute for the database in the example
    dbList = [i for i in range(300)]
    for result in perform_distributed_processing(dbList, 8, Processor):
        print(result)
Or, if you want to handle them as they come instead of in order:
from concurrent.futures import Future, ProcessPoolExecutor, as_completed

def perform_distributed_processing(dbList, threads, processor_factory):
    with ProcessPoolExecutor(max_workers=threads) as executor:
        fs = (executor.submit(processor_factory, db) for db in dbList)
        yield from map(Future.result, as_completed(fs))
Notice that I also replaced your in-process queue and thread, because it wasn't doing anything but providing a way to interleave "wait for the next result" and "process the most recent result", and yield (or yield from, in this case) does that without all the complexity, overhead, and potential for getting things wrong.
Don't try to rewrite the whole multiprocessing library again. I think you can use any of the multiprocessing.Pool methods depending on your needs - if this is a batch job you can even use the synchronous multiprocessing.Pool.map() - only instead of pushing to an input queue, you write a generator that yields input to the worker processes.
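A rough sketch of that last suggestion (a sketch, not the asker's exact setup): replace the hand-rolled queues with multiprocessing.Pool and a generator that yields the database entries.
from multiprocessing import Pool

def process_entry(data_entry):
    return data_entry * 2

def read_entries():
    # Generator standing in for reading rows from the sqlite database
    for i in range(300):
        yield i

if __name__ == '__main__':
    with Pool(processes=8) as pool:
        # imap consumes the generator lazily and yields results as they become ready (in order)
        for result in pool.imap(process_entry, read_entries(), chunksize=10):
            print(result)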