I've got a network application written in Python 3.5 which takes advantage of Python's asyncio to handle each incoming connection concurrently.
On every connection, I want to store the connected client's data in a list. I'm worried that if two clients connect at the same time (which is a possibility), both tasks will attempt to write to the list at the same time, which will surely raise an issue. How would I solve this?
asyncio does context switching only at yield points (await expressions), so two parallel tasks are never executed at the same time.
But if race conditions are still possible (it depends on your concrete code structure), you may use asyncio synchronization primitives and queues.
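For example, a minimal sketch of guarding the list with an asyncio.Lock (the names clients and handle_client are placeholders, not from the question):

    import asyncio

    clients = []                  # shared list of connected clients
    clients_lock = asyncio.Lock()

    async def handle_client(reader, writer):
        # A bare append is safe on a single-threaded event loop; the lock
        # only matters if an await happens between reading and writing
        # the shared list.
        async with clients_lock:
            clients.append(writer.get_extra_info('peername'))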
There is a lot of info missing from your question.
Is your app threaded? If yes, then you have to guard your list with a threading.Lock (see the sketch after this list).
Do you switch context (e.g. use await) between writes to the list in the request handler? If yes, then you have to guard your list with an asyncio.Lock.
Do you use multiprocessing? If yes, then you have to use a multiprocessing.Lock.
Is your app divided across multiple machines? Then you have to use some external shared database (e.g. Redis).
If the answer to all of those questions is no, then you don't have to do anything, since a single-threaded async app cannot update a shared resource in parallel.
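For the threaded case mentioned above, a minimal sketch (the handler name is hypothetical):

    import threading

    clients = []
    clients_lock = threading.Lock()

    def on_connect(client_data):
        # Serialize mutation of the shared list across threads.
        with clients_lock:
            clients.append(client_data)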
I'm currently working on a Python project that receives a lot of AWS SQS messages (more than 1 million each day), processes these messages, and sends them to another SQS queue with additional data. Everything works fine, but now we need to speed this process up a lot!
From what we have seen, our biggest bottleneck is the HTTP requests used to send and receive messages from the AWS SQS API. So basically, our code is mostly I/O bound due to these HTTP requests.
We are trying to scale this process with one of the following methods:
Using Python's multiprocessing: this seems like a good idea, but our workers run on small machines, usually with a single core. So creating different processes may still give some benefit, since the CPU will probably switch processes as one or another gets stuck at an I/O operation. But still, that seems like a lot of process-management and resource overhead for operations that don't need to run in parallel, only concurrently.
Using Python's threading: since the GIL confines all threads to a single core, and threads have less overhead than processes, this seems like a good option. As one thread sits waiting for an HTTP response, the CPU can take up another thread, and so on. This would get us our desired concurrent execution. But my question is: how does Python's threading know that it can swap one thread for another? Does it know that some thread is currently blocked on an I/O operation and can be swapped for another one? Will this approach fully maximize CPU usage and avoid busy waiting? Do I specifically have to give up control of the CPU inside a thread, or is this done automatically in Python?
Recently, I also read about a concept called green threads, using Eventlet in Python. From what I saw, they seem like a perfect match for my project. They have little overhead and don't create OS threads the way threading does. But will we have the same problems as with threading regarding CPU control? Does a green thread need to explicitly yield so that another one can run? I saw in some examples that Eventlet offers green versions of some standard libraries like urlopen, but not requests.
The last option we considered was using Python's asyncio and async libraries such as aiohttp. I have done some basic experimenting with asyncio and wasn't very pleased. But I can understand that most of that comes from the fact that Python is not a naturally asynchronous language. From what I saw, it would behave much like Eventlet.
So what do you think would be the best option here? Which library would allow me to maximize performance on a single-core machine, avoiding busy waits as much as possible?
I'm looking for a conceptual answer to this question.
I'm wondering whether using a ThreadPool in Python to perform concurrent tasks guarantees that data is not corrupted; that is, that multiple threads don't access critical data at the same time.
If so, how does ThreadPoolExecutor internally work to ensure that critical data is accessed by only one thread at a time?
Thread pools do not guarantee that shared data is not corrupted. Threads can swap at any bytecode execution boundary, and corruption is always a risk. Shared data should be protected by synchronization primitives such as locks, condition variables, and events. See the threading module docs.
concurrent.futures.ThreadPoolExecutor is a thread pool specialized to the concurrent.futures async task model. But all of the risks of traditional threading are still there.
If you are using the Python async model, things that fiddle with shared data should be dispatched on the main thread. The thread pool should be used for autonomous events, especially those that wait on blocking I/O.
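A minimal sketch of that division of labor, assuming the blocking work is an HTTP fetch (the URL is a placeholder):

    import asyncio
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    results = []  # only ever touched on the main (event loop) thread

    def fetch(url):
        # Blocking I/O runs in the pool and touches no shared state.
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    async def main():
        loop = asyncio.get_event_loop()
        with ThreadPoolExecutor() as pool:
            body = await loop.run_in_executor(pool, fetch, 'http://example.com')
            results.append(body)  # back on the main thread: no lock needed

    asyncio.get_event_loop().run_until_complete(main())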
If so, how does ThreadPoolExecutor internally work to ensure that critical data is accessed by only one thread at a time?
It doesn't; that's your job.
The high-level methods like map will use a safe work queue and not share work items between threads, but if you've got other resources which can be shared, the pool does not know or care; that's your problem as the developer.
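A minimal sketch of doing that job yourself, with an illustrative shared counter:

    import threading
    from concurrent.futures import ThreadPoolExecutor

    counter = 0
    counter_lock = threading.Lock()

    def work(_):
        global counter
        # Without the lock, the read-modify-write of counter can
        # interleave between threads and lose updates.
        with counter_lock:
            counter += 1

    with ThreadPoolExecutor(max_workers=8) as pool:
        pool.map(work, range(1000))

    print(counter)  # reliably 1000 only because of the lock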
I have seen a few variants of my question but not quite exactly what I am looking for, hence opening a new question.
I have a Flask/Gunicorn app that, for each request, inserts some data in a store and then kicks off an indexing job. The indexing takes 2-4 times longer than the main data write, and I would like to do it asynchronously to reduce the response latency.
The overall request lifespan is 100-150ms for a large request body.
I have thought about a few ways to do this as resource-efficiently as possible:
Use Celery. This seems the most obvious way to do it, but I don't want to introduce a large library and, most of all, a dependency on Redis or other system packages.
Use subprocess.Popen. This may be a good route, but my bottleneck is I/O, so threads could be more efficient.
Use threads? I am not sure how, or whether, that can be done. All I know is how to launch multiple tasks concurrently with ThreadPoolExecutor, but here I only need to spawn one additional task and return immediately without waiting for the result.
Use asyncio? This too I am not sure how to apply to my situation; asyncio always seems to involve a blocking call somewhere.
Launch the data write and the indexing concurrently: not doable. I have to wait for a response from the data write before launching the indexing.
Any suggestions are welcome!
Thanks.
Celery will be your best bet - it's exactly what it's designed for.
It's not a bad thing to have dependencies if you need them, just as long as you don't have unneeded ones.
Depending on your architecture, though, more advanced and more locked-in solutions might be available. If you're using AWS, for example, you could fire off an AWS SNS notification to launch an AWS Lambda function and have it handle what it needs to do. The sky is the limit.
I actually should have perused the Python manual's section on concurrency more carefully: the threading module does just what I needed: https://docs.python.org/3.5/library/threading.html
And I confirmed with some dummy sleep code that the sub-thread completes even after the Flask request has finished.
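For reference, a minimal sketch of the pattern; write_data and index_data are stand-ins for the real calls:

    import threading
    from flask import Flask

    app = Flask(__name__)

    def write_data():
        return 42    # stand-in for the real data write; returns a record id

    def index_data(record_id):
        pass         # stand-in for the slow indexing job

    @app.route('/store', methods=['POST'])
    def store():
        record_id = write_data()  # synchronous: the response waits on this
        # Fire and forget: the thread keeps running after the response is sent.
        threading.Thread(target=index_data, args=(record_id,)).start()
        return 'ok'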
I've been reading about the asyncio module in Python 3, and more broadly about coroutines in Python, and I can't see what makes asyncio such a great tool.
I have the feeling that everything you can do with coroutines, you can do better by using task queues based on the multiprocessing module (Celery, for example).
Are there use cases where coroutines are better than task queues?
Not a proper answer, but a list of hints that could not fit into a comment:
You mention the multiprocessing module (and let's consider threading too). Suppose you have to handle hundreds of sockets: can you spawn hundreds of processes or threads? (A sketch of the coroutine alternative follows these hints.)
Again, with threads and processes: how do you handle concurrent access to shared resources? What is the overhead of mechanisms like locking?
Frameworks like Celery also add significant overhead. Can you use it, for example, to handle every single request on a high-traffic web server? And in that scenario, who is responsible for handling sockets and connections (Celery by its nature can't do that for you)?
Be sure to read the rationale behind asyncio. That rationale (among other things) mentions a system call: writev() -- isn't that much more efficient than multiple write()s?
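To make the first hint concrete, a toy sketch in which asyncio.sleep stands in for socket I/O:

    import asyncio

    async def handle(conn_id):
        await asyncio.sleep(1)   # stand-in for waiting on a socket
        return conn_id

    loop = asyncio.get_event_loop()
    # 500 concurrent "connections" on one thread, finishing in about one
    # second total; spawning 500 processes or threads for the same job
    # would cost far more memory and scheduler overhead.
    results = loop.run_until_complete(
        asyncio.gather(*(handle(i) for i in range(500))))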
Adding to the above answer:
If the task at hand is I/O bound and operates on shared data, coroutines and asyncio are probably the way to go.
If, on the other hand, you have CPU-bound tasks where data is not shared, a multiprocessing system like Celery should be better.
If the task at hand is both CPU and I/O bound and sharing of data is not required, I would still use Celery. You can use async I/O from within Celery (see the sketch after this list)!
If you have a CPU-bound task that also needs to share data, the only viable option I see right now is to save the shared data in a database. There have been recent attempts like PyParallel, but they are still a work in progress.
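As a hedged sketch of that third point: a Celery task can drive async I/O internally. The broker URL and the sleep standing in for an async HTTP call are assumptions, not part of the answer above:

    import asyncio
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    async def fetch_one(url):
        await asyncio.sleep(0.1)  # stand-in for an async HTTP call
        return url

    async def fetch_all(urls):
        # Concurrent I/O inside a single worker process.
        return await asyncio.gather(*(fetch_one(u) for u in urls))

    @app.task
    def process(urls):
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(fetch_all(urls))
        finally:
            loop.close()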
I have a Python function which generates an image once it is accessed. I can either invoke it directly upon an HTTP request, or do it asynchronously using Gearman. There are a lot of requests.
Which way is better:
Inline - create the image inline; this will result in many images being generated at once
Asynchronous - queue jobs (with Gearman) and generate the images in a worker
Which option is better?
In this case "better" would mean the best speed / load combinations. The image generation example is symbolical, as this can also be applied to Database connections and other things.
I have a Python function which generates an image once it is accessed. I can either invoke it directly upon an HTTP request, or do it asynchronously using Gearman. There are a lot of requests.
You should not do it inside your request, because then you can't throttle (your server could get overloaded). All big sites use a message queue to do the processing offline.
Which option is better?
In this case "better" would mean the
best speed / load combinations. The
image generation example is
symbolical, as this can also be
applied to Database connections and
other things.
You should do it asynchronously, because the most compelling reason, besides speeding up your website, is that you can throttle your queue when you are under high load. You could execute the highest-priority tasks first.
I believe forking processes is expensive. I would create a couple of worker processes (maybe with a little threading inside each process) to handle the load. I would probably use Redis, because it is fast, actively developed (antirez/pietern commit almost every day), and has a very good/stable Python client library. blpop/rpush can be used to simulate a job queue.
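A minimal sketch of that Redis-backed queue; the key name and payload format are made up:

    import redis

    r = redis.Redis()

    def handle(payload):
        print('processing', payload)  # stand-in for the real image job

    # Producer side: push a job onto the right end of the list.
    r.rpush('jobs', 'generate-image:42')

    # Worker side: block until a job arrives, then pop from the left.
    while True:
        _key, payload = r.blpop('jobs')
        handle(payload)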
If your program is CPU bound in the interpreter then spawning multiple threads will actually slow down the result even if there are enough processors to run them all. This happens because the GIL (global interpreter lock) only allows one thread to run in the interpreter at a time.
If most of the work happens in a C library it's likely the lock is not held and you can productively use multiple threads.
If you are spawning threads yourself, you'll need to make sure not to create too many - 10K threads at once would be bad news - so you'd need to set up a work queue that the threads read from instead of just spawning them in a loop.
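A minimal sketch of such a work queue, assuming the work is blocking I/O:

    import queue
    import threading

    jobs = queue.Queue()

    def do_io(item):
        pass                     # stand-in for a blocking network or disk call

    def worker():
        while True:
            item = jobs.get()
            if item is None:     # sentinel: time to exit
                break
            do_io(item)          # the GIL is typically released while blocked here
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(10)]
    for t in threads:
        t.start()

    for item in range(1000):
        jobs.put(item)
    jobs.join()                  # wait until every queued item is processed

    for _ in threads:
        jobs.put(None)           # one sentinel per worker
    for t in threads:
        t.join()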
If I were doing this I'd just use the standard multiprocessing module.