Reserve cpu-time in multiprocessing or multithreading application

Reserve cpu-time in multiprocessing or multithreading application - python

I'm working on a project with a Raspberry Pi 3 for some environmental control with a number of simple recurring events in a continuous loop. The RP3 is way overqualified for this job, but it alows me focus on other stuff.
Characteristics of the application:
The application should collect sensordata (with variable interval n-seconds) from a dozen sensors (temperature, humidity, pH, ORP, etc).
Based on time and these sensordata, the controller calculates output (switches, valves and PWM-drivers).
Almost none events needs to run sequential.
Some events are in the category "safety" and should run instantly when triggered (fail-safe sensors, emergency button).
Most events run repetitive in a seconds interval (every second, till every 30 seconds).
Some events triggers an action, activating a relay during 1 to 120 seconds.
Some events use a time-bassed value. This value needs to be calculated several times a day and is fairly CPU intensive (uses a few itterative interpolating formula, and therefore has a variable runtime).
Display with environment status (in a continuous loop)
I'm familiar (not by profession) with VB.NET, but decided to do this project in Python 3.6.
The last few months I read a lot about subjecs like design patterns, threads, processes, events, paralel prcessing, etc.
Based on my reading I think Asyncio combined with some tasks in an Executor will do the job.
Most tasks/events are not time-critical. Controller output can use 'most recent' sensordata.
Some tasks, on the other hand, activating a relay for a certain period of time. I would like to know how to programm these tasks without the chance another 'time consuming' task is blocking the processor during the period of time (for example) a CO2 valve is open. This could be disastrous for my environment.
Herefore I need some advice.
See below for my code so far. I'm not sure I make correct use of the Asyncio functions in Python.
For the sake of readability, I will store the contents of the various tasks in separate modules.
import asyncio
import concurrent.futures
import datetime
import time
import random
import math
# define a task...
async def firstTask():
while True:
await asyncio.sleep(1)
print("First task executed")
# define another task...
async def secondTask():
while True:
await asyncio.sleep(5)
print("Second Worker Executed")
# define/simulate heavy CPU-bound task
def heavy_load():
while True:
print('Heavy_load started')
i = 0
for i in range(50000000):
f = math.sqrt(i)*math.sqrt(i)
print('Heavy_load finished')
time.sleep(4)
def main():
# Create a process pool (for CPU bound tasks).
processpool = concurrent.futures.ProcessPoolExecutor()
# Create a thread pool (for I/O bound tasks).
threadpool = concurrent.futures.ThreadPoolExecutor()
loop = asyncio.get_event_loop()
try:
# Add all tasks. (Correct use?)
asyncio.ensure_future(firstTask())
asyncio.ensure_future(secondTask())
loop.run_in_executor(processpool, heavy_load)
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
print("Loop will be ended")
loop.close()
if __name__ == '__main__':
main()

Most tasks/events are not time-critical. Controller output can use 'most recent' sensordata. Some tasks, on the other hand, activating a relay for a certain period of time. I would like to know how to programm these tasks without the chance another 'time consuming' task is blocking the processor during the period of time (for example) a CO2 valve is open. This could be disastrous for my environment.
Allow me to stress that Python is not a real-time language, and asyncio is not a real-time component. They sport neither the infrastructure for real-time execution (Python is garbage collected and typically runs on time-shared systems), nor have they been tested in such environments in practice. Consequently I would strongly advise against using them in any scenario where a misstep could be disastrous for your environment.
With that out of the way, your code has a problem: while the heavy_load calculation will not block the event loop, it will never complete either, nor will it provide information on its progress. The idea behind run_in_executor is that the calculation you are running will eventually halt, and that the event loop will want to be notified about it. Idiomatic usage of run_in_executor could look like this:
def do_heavy_calc(param):
print('Heavy_load started')
f = 0
for i in range(50000000):
f += math.sqrt(i)*math.sqrt(i)
return f
def heavy_calc(param):
loop = asyncio.get_event_loop()
return loop.run_in_executor(processpool, do_heavy_calc)
The expression heavy_calc(...) not only runs without blocking the event loop, but it is also awaitable. That means that asynchronous code can await its result, also without blocking other coroutines:
async def sum_params(p1, p2):
s1 = await heavy_calc(p1)
s2 = await heavy_calc(p2)
return s1 + s2
The above runs the two calculations one after the other. It can also be done in parallel:
async def sum_params_parallel(p1, p2):
s1, s2 = await asyncio.gather(heavy_calc(p1), heavy_calc(p2))
return s1 + s2
Another thing that could improve is the setup code:
asyncio.ensure_future(firstTask())
asyncio.ensure_future(secondTask())
loop.run_in_executor(processpool, heavy_load)
loop.run_forever()
Calling asyncio.ensure_future and then never awaiting the result is somewhat of an asyncio anti-pattern. Exceptions raised by unawaited tasks are silently swallowed, which is almost certainly not something you'd want. Sometimes people simply forget to write await, which is why asyncio complains about unawaited pending tasks when the loop is destroyed.
It is good coding practice to arrange for every task to be awaited by someone, either immediately with await or gather to combine it with other task, or at a later point. For instance, if the task needs to run in the background, you can store it somewhere and await or cancel it at the end of the application lifecycle. In your case, I would combine gather with loop.run_until_complete:
everything = asyncio.gather(firstTask(), secondTask(),
loop.run_in_executor(processpool, heavy_load))
loop.run_until_complete(everything)

Some events are in the category "safety" and should run instantly when triggered (fail-safe sensors, emergency button).
Then I strongly advise you to not rely on software to fulfil this function. Emergency stop buttons that cut off the power is the way that kind of thing is normally done. If you have software doing that, and you genuinely have a threat-to-life situation to handle, you're in for a whole pile of woe - there's almost certainly a ton of regulations with which you have to comply.

Related

Python Asyncio/Trio for Asynchronous Computing/Fetching

I am looking for a way to efficiently fetch a chunk of values from disk, and then perform computation/calculations on the chunk. My thought was a for loop that would run the disk fetching task first, then run the computation on the fetched data. I want to have my program fetch the next batch as it is running the computation so I don't have to wait for another data fetch every time a computation completes. I expect the computation will take longer than the fetching of the data from disk, and likely cannot be done truly in parallel due to a single computation task already pinning the cpu usage at near 100%.
I have provided some code below in python using trio (but could alternatively be used with asyncio to the same effect) to illustrate my best attempt at performing this operation with async programming:
import trio
import numpy as np
from datetime import datetime as dt
import time
testiters=10
dim = 6000
def generateMat(arrlen):
for _ in range(30):
retval= np.random.rand(arrlen, arrlen)
# print("matrix generated")
return retval
def computeOpertion(matrix):
return np.linalg.inv(matrix)
def runSync():
for _ in range(testiters):
mat=generateMat(dim)
result=computeOpertion(mat)
return result
async def matGenerator_Async(count):
for _ in range(count):
yield generateMat(dim)
async def computeOpertion_Async(matrix):
return computeOpertion(matrix)
async def runAsync():
async with trio.open_nursery() as nursery:
async for value in matGenerator_Async(testiters):
nursery.start_soon(computeOpertion_Async,value)
#await computeOpertion_Async(value)
print("Sync:")
start=dt.now()
runSync()
print(dt.now()-start)
print("Async:")
start=dt.now()
trio.run(runAsync)
print(dt.now()-start)
This code will simulate getting data from disk by generating 30 random matrices, which uses a small amount of cpu. It will then perform matrix inversion on the generated matrix, which uses 100% cpu (with openblas/mkl configuration in numpy). I compare the time taken to run the tasks by timing the synchronous and asynchronous operations.
From what I can tell, both jobs take exactly the same amount of time to finish, meaning the async operation did not speed up the execution. Observing the behavior of each computation, the sequential operation runs the fetch and computation in order and the async operation runs all the fetches first, then all the computations afterwards.
Is there a way to use asynchronously fetch and compute? Perhaps with futures or something like gather()? Asyncio has these functions, and trio has them in a seperate package trio_future. I am also open to solutions via other methods (threads and multiprocessing).
I believe that there likely exists a solution with multiprocessing that can make the disk reading operation run in a separate process. However, inter-process communication and blocking then becomes a hassle, as I would need some sort of semaphore to control how many blocks could be generated at a time due to memory constraints, and multiprocessing tends to be quite heavy and slow.
EDIT
Thank you VPfB for your answer. I am not able to sleep(0) in the operation, but I think even if I did, it would necessarily block the computation in favor of performing disk operations. I think this may be a hard limitation of python threading and asyncio, that it can only execute 1 thread at a time. Running two different processes simultaneously is impossible if both require anything but waiting for some external resource to respond from your CPU.
Perhaps there is a way with an executor for a multiprocessing pool. I have added the following code below:
import asyncio
import concurrent.futures
async def asynciorunAsync():
loop = asyncio.get_running_loop()
with concurrent.futures.ProcessPoolExecutor() as pool:
async for value in matGenerator_Async(testiters):
result = await loop.run_in_executor(pool, computeOpertion,value)
print("Async with PoolExecutor:")
start=dt.now()
asyncio.run(asynciorunAsync())
print(dt.now()-start)
Although timing this, it still takes the same amount of time as the synchronous example. I think I will have to go with a more involved solution as it seems that async and await are too crude of a tool to properly do this type of task switching.

I don't work with trio, my answer it asyncio based.
Under these circumstances the only way to improve the asyncio performance I see is to break the computation into smaller pieces and insert await sleep(0) between them. This would allow the data fetching task to run.
Asyncio uses cooperative scheduling. A synchronous CPU bound routine does not cooperate, it blocks everything else while it is running.
sleep() always suspends the current task, allowing other tasks to run.
Setting the delay to 0 provides an optimized path to allow other tasks
to run. This can be used by long-running functions to avoid blocking
the event loop for the full duration of the function call.
(quoted from: asyncio.sleep)
If that is not possible, try to run the computation in an executor. This adds some multi-threading capabilities to otherwise pure asyncio code.

The point of async I/O is to make it easy to write programs where there is lots of network I/O but very little actual computation (or disk I/O). That applies to any async library (Trio or asyncio) or even different languages (e.g. ASIO in C++). So your program is ideally unsuited to async I/O! You will need to use multiple threads (or processes). Although, in fairness, async I/O including Trio can be useful for coordinating work on threads, and that might work well in your case.
As VPfB's answer says, if you're using asyncio then you can use executors, specifically a ThreadPoolExecutor passed to loop.run_in_executor(). For Trio, the equivalent would be trio.to_thread.run_sync() (see also Threads (if you must) in the Trio docs), which is even easier to use. In both cases, you can await the result, so the function is running in a separate thread while the main Trio thread can continue running your async code. Your code would end up looking something like this:
async def matGenerator_Async(count):
for _ in range(count):
yield await trio.to_thread.run_sync(generateMat, dim)
async def my_trio_main()
async with trio.open_nursery() as nursery:
async for matrix in matGenerator_Async(testiters):
nursery.start_soon(trio.to_thread.run_sync, computeOperation, matrix)
trio.run(my_trio_main)
There's no need for the computation functions (generateMat and computeOperation) to be async. In fact, it's problematic if they are because you could no longer run them in a separate thread. In general, only make a function async if it needs to await something or use async with or async for.
You can see from the above example how to pass data to the functions running in the other thread: just pass them as parameters to trio.to_thread.run_sync(), and they will be passed along as parameters to the function. Getting the result back from generateMat() is also straightforward - the return value of the function called in the other thread is returned from await trio.to_thread.run_sync(). Getting the result of computeOperation() is trickier, because it's called in the nursery, so its return value is thrown away. You'll need to pass a mutable parameter to it (like a dict) and stash the result in there. But be careful about thread safety; the easiest way to do that is to pass a new object to each coroutine, and only inspect them all after the nursery has finished.
A few final footnotes that you can probably ignore:
Just to be clear, yield await in the code above isn't some sort of special syntax. It's just await foo(), which returns a value once foo() has finished, followed by yield of that value.
You can change the number of threads Trio uses for calls to to_thread.run_sync() by passing a CapacityLimiter object, or by finding the default one and setting the count on that. It looks like the default is currently 40, so you might want to turn that down a bit, but it's probably not too important.
There is a common myth that Python doesn't support threads, or at least can't do computation in multiple threads simultaneously, because it has a single global lock (the global interpreter lock, or GIL). That would mean that you need to use multiple processes, rather than threads, for your program to really compute thing in parallel. It's true there is a GIL in Python, but so long as you're doing your computation using something like numpy, which you are, then it doesn't stop multithreading from working effectively.
Trio actually has great support for async file I/O. But I don't think it would be helpful in your case.

To supplement my other answer (which uses Trio like you asked), here's how to do it use it just using threads without any async library. The easiest way to do this with Future objects and a ThreadPoolExecutor.
futures = []
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
for matrix in matGenerator(testiters):
futures.append(executor.submit(computeOperation, matrix))
results = [f.result() for f in futures]
The code is actually pretty similar to the async code, but if anything it's simpler. If you don't need to do network I/O, you're better off with this method.

I think the main issue with using multiprocessing and not seeing any improvement is the 100% utilization of the CPU. It essentially leaves you with an async-like behavior where resources are occasionally being freed up and used for the I/O process. You could set a limit to the number of workers for your ProcessPoolExecutor and that might allow the I/O the room it needs to be more at the ready.
Disclaimer: I'm still new to multiprocessing and threading.

How to properly use asyncio run_coroutine_threadsafe function?

I am trying to understand asyncio module and spend about one hour with run_coroutine_threadsafe function, I even came to the working example, it works as expected, but works with several limitations.
First of all I do not understand how should I properly call asyncio loop in main (any other) thread, in the example I call it with run_until_complete and give it a coroutine to make it busy with something until another thread will not give it a coroutine. What are other options I have?
What are situations when I have to mix asyncio and threading (in Python) in real life? Since as far as I understand asyncio is supposed to take place of threading in Python (due to GIL for not IO ops), if I am wrong, do not be angry and share your suggestions.
Python version is 3.7/3.8
import asyncio
import threading
import time
async def coro_func():
return await asyncio.sleep(3, 42)
def another_thread(_loop):
coro = coro_func() # is local thread coroutine which we would like to run in another thread
# _loop is a loop which was created in another thread
future = asyncio.run_coroutine_threadsafe(coro, _loop)
print(f"{threading.current_thread().name}: {future.result()}")
time.sleep(15)
print(f"{threading.current_thread().name} is Finished")
if __name__ == '__main__':
loop = asyncio.get_event_loop()
main_th_cor = asyncio.sleep(10)
# main_th_cor is used to make loop busy with something until another_thread will not send coroutine to it
print("START MAIN")
x = threading.Thread(target=another_thread, args=(loop, ), name="Some_Thread")
x.start()
time.sleep(1)
loop.run_until_complete(main_th_cor)
print("FINISH MAIN")

First of all I do not understand how should I properly call asyncio loop in main (any other) thread, in the example I call it with run_until_complete and give it a coroutine to make it busy with something until another thread will not give it a coroutine. What are other options I have?
This is a good use case for loop.run_forever(). The loop will run and serve the coroutines you submit using run_coroutine_threadsafe. (You can even submit such coroutines from multiple threads in parallel; you never need to instantiate more than one event loop.)
You can stop the loop from a different thread by calling loop.call_soon_threadsafe(loop.stop).
What are situations when I have to mix asyncio and threading (in Python) in real life?
Ideally there should be none. But in the real world, they do crop up; for example:
When you are introducing asyncio into an existing large program that uses threads and blocking calls and cannot be converted to asyncio all at once. run_coroutine_threadsafe allows regular blocking code to make use of asyncio.
When you are dealing with older "async" APIs which use threads under the hood and call the user-supplied APIs from other threads. There are many examples, such as Python's own multiprocessing.
When you need to call blocking functions that have no async equivalent from asyncio - e.g. CPU-bound functions, legacy database drivers, things like that. This is not a use case for run_coroutine_threadsafe, here you'd use run_in_executor, but it is another example of mixing threads and asyncio.

Python synchronous code example faster than async

I was migrating a production system to async when I realized the synchronous version is 20x faster than the async version. I was able to create a very simple example to demonstrate this in a repeatable way;
Asynchronous Version
import asyncio, time
data = {}
async def process_usage(key):
data[key] = key
async def main():
await asyncio.gather(*(process_usage(key) for key in range(0,1000000)))
s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")
This takes 19 seconds. The code loops through 1M keys and builds a dictionary, data with the same key and value.
$ python3.7 async_test.py
Took 19.08 seconds.
Synchronous Version
import time
data = {}
def process_usage(key):
data[key] = key
def main():
for key in range(0,1000000):
process_usage(key)
s = time.perf_counter()
results = main()
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")
This takes 0.17 seconds! And does exactly the same thing as above.
$ python3.7 test.py
Took 0.17 seconds.
Asynchronous Version with create_task
import asyncio, time
data = {}
async def process_usage(key):
data[key] = key
async def main():
for key in range(0,1000000):
asyncio.create_task(process_usage(key))
s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(f"Took {elapsed:0.2f} seconds.")
This version brings it down to 11 seconds.
$ python3.7 async_test2.py
Took 11.91 seconds.
Why does this happen?
In my production code I will have a blocking call in process_usage where I save the value of key to a redis database.

When comparing those benchmarks one should note that the asynchronous version is, well, asynchronous: asyncio spends a considerable effort to ensure that the coroutines you submit can run concurrently. In your particular case they don't actually run concurrently because process_usage doesn't await anything, but the system doesn't actually that. The synchronous version on the other hand makes no such provisions: it just runs everything sequentially, hitting the happy path of the interpreter.
A more reasonable comparison would be for the synchronous version to try to parallelize things in the way idiomatic for synchronous code: by using threads. Of course, you won't be able to create a separate thread for each process_usage because, unlike asyncio with its tasks, the OS won't allow you to create a million threads. But you can create a thread pool and feed it tasks:
def main():
with concurrent.futures.ThreadPoolExecutor() as executor:
for key in range(0,1000000):
executor.submit(process_usage, key)
# at the end of "with" the executor automatically
# waits for all futures to finish
On my system this takes ~17s, whereas the asyncio version takes ~18s. (The faster asyncio version takes ~13s.)
If the speed gain of asyncio is so small, one could ask why bother with asyncio? The difference is that with asyncio, assuming idiomatic code and IO-bound coroutines, you have at your disposal a virtually unlimited number of tasks that in a very real sense execute concurrently. You can create tens of thousands of asynchronous connections at the same time, and asyncio will happily juggle them all at once, using a high-quality poller and a scalable coroutine scheduler. With a thread pool the number of tasks executed in parallel is always limited by the number of threads in the pool, typically in the hundreds at most.
Even toy examples have value, for learning if nothing else. If you are using such microbenchmarks to make decisions, I suggest investing some more effort to give the examples more realism. The coroutine in the asyncio example should contain at least one await, and the sync example should use threads to emulate the same amount of parallelism you obtain with async. If you adjust both to match your actual use case, then the benchmark actually puts you in a position to make a (more) informed decision.

Why does this happen?
TL;DR
Because using asyncio itself doesn't speedup code. You need multiple gathered network I/O related operations to see the difference toward synchronous version.
Detailed
asyncio is not a magic that allows you to speedup arbitrary code. With or without asyncio your code is still being run by CPU with limit performance.
asyncio is a way to manage multiple execution flows (coroutines) in a nice, clear way. Multiple execution flows allow you to start next I/O-related operation (such as request to database) before waiting for other one to be completed. Please read this answer for more detailed explanation.
Please also read this answer for explanation when it makes sense to use asyncio.
Once you start to use asyncio right way overhead for using it should be much lower than benefits you get for parallelizing I/O operations.

Start processing async tasks while adding them to the event loop

A common pattern with asyncio, like the one shown here, is to add a collection of coroutines to a list, and then asyncio.gather them.
For instance:
async def some_task(i):
# Do something asynchronously with i
tasks = [some_task(i) for i in range(100)]
loop.run_until_complete(asyncio.gather(**tasks))
Here, the execution order of this code is such that none of the tasks are running while we build up the list. We add task 1 to the list, then task 2, etc. and then we add the tasks 1-100 to the event loop.
However, I want task creation itself to be part of the event loop. I want task 1 to be scheduled immediately as it's created, and then when task is waiting for something on another thread, return to task creation and create task 2 and add it to the event loop.
I believe this would give me better concurrency from my async code. Is this possible?
For example, my first thought would be to put task creation into a coroutine and schedule tasks as they are created:
async def some_task(i):
# Do something asynchronously with i
async def generate_tasks(loop):
tasks = []
for i in range(100):
task = loop.create_task(some_task(i))
tasks.append(loop)
await asyncio.gather(**tasks)
loop.run_until_complete(generate_tasks())
However, because my generate_tasks never uses await, execution is never passed back to the event loop, so the entirety of generate_tasks will run before some_task() is run at all.
But then, if I await each task as they are created, it will wait for each task to complete before moving on to the next task, giving me no concurrency at all!
async def generate_tasks(loop):
tasks = []
for i in range(100):
await some_task(i)
loop.run_until_complete(generate_tasks())

However, because my generate_tasks never uses await, execution is never passed back to the event loop
You can use await asyncio.sleep(0) to force yielding to the event loop inside for. But that is unlikely to make a difference, creating a task/coroutine pair is really efficient.
Before optimizing this, measure (with something as simple as time.time if need be) how much time it takes to execute the [some_task(i) for i in range(100)] list comprehension. Then consider whether dispersing that time (possibly making it take longer to finish due to increased scheduling overhead) will make any difference for your application. The results might surprise you.

asyncio with multiple processors [duplicate]

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently setup a 2MB cap on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?

You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor
def blocking_func(x):
time.sleep(x) # Pretend this is expensive calculations
return x * 5
#asyncio.coroutine
def main():
#pool = multiprocessing.Pool()
#out = pool.apply(blocking_func, args=(10,)) # This blocks the event loop.
executor = ProcessPoolExecutor()
out = yield from loop.run_in_executor(executor, blocking_func, 10) # This does not
print(out)
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
For the majority of cases, this is function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it), that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing
def func(queue, event, lock, items):
with lock:
event.set()
for item in items:
time.sleep(3)
queue.put(item+5)
queue.close()
#asyncio.coroutine
def example(queue, event, lock):
l = [1,2,3,4,5]
p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
p.start()
while True:
result = yield from queue.coro_get()
if result is None:
break
print("Got result {}".format(result))
yield from p.coro_join()
#asyncio.coroutine
def example2(queue, event, lock):
yield from event.coro_wait()
with (yield from lock):
yield from queue.coro_put(78)
yield from queue.coro_put(None) # Shut down the worker
if __name__ == "__main__":
loop = asyncio.get_event_loop()
queue = aioprocessing.AioQueue()
lock = aioprocessing.AioLock()
event = aioprocessing.AioEvent()
tasks = [
asyncio.async(example(queue, event, lock)),
asyncio.async(example2(queue, event, lock)),
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest to examine different approaches. For instance you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use less resources than forking a process for each connection. Spamassassin and Apache (mpm prefork) can use this worker model for instance. It might end up easier and more robust depending on your use case. Specifically you can make your workers die after serving a configured number of requests and be respawned by a master process thereby eliminating much of the negative effects of memory leaks.

Based on #dano's answer above I wrote this function to replace places where I used to use multiprocess pool + map.
def asyncio_friendly_multiproc_map(fn: Callable, l: list):
"""
This is designed to replace the use of this pattern:
with multiprocessing.Pool(5) as p:
results = p.map(analyze_day, list_of_days)
By letting caller drop in replace:
asyncio_friendly_multiproc_map(analyze_day, list_of_days)
"""
tasks = []
with ProcessPoolExecutor(5) as executor:
for e in l:
tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
return res

See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This documents clearly the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures, I suggest you also have a look there.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.