python gevent in function that is not patchable - python

I'm working on a REST service that is basically a wrapper around a library. I'm using flask and gunicorn. Each endpoint in the service maps to a different function in the library.
It happens that some of the calls to the library can take a long time to return, and that is making my service run out of workers once the service starts receiving a few requests. Right now I'm using the default gunicorn workers (sync).
I wanted to use gevent workers in order to be able to receive more requests, because not every endpoint takes that long to execute. However the function in the library does not use any of the patchable gevent functions, meaning that it won't cooperatively schedule to another green thread.
I had this idea of using a pool of threads or processes to handle the calls to the library asynchronously, and then each green thread produced by gunicorn would sleep until the process has finished. Does this idea make sense at all?
Is it possible to use multiprocessing.Process with gevent, and have the join method give up control to another green thread, returning only when the process is finished?

Yes, it makes perfect sense to use (real) threads or processes from within gevent for code that needs to be asynchronous but can't be monkeypatched by gevent.
Of course it can be tricky to get right—first, because you may have monkeypatched threading, and second, because you want your cooperative threads to be able to block on a pool or a pool result without blocking the whole main thread.
But that's exactly what gevent.threadpool is for.
If you would have used concurrent.futures.ThreadPoolExecutor in a non-gevent app, monkeypatch threading and then use gevent.threadpool.ThreadPoolExecutor.
If you would have used multiprocessing.dummy.Pool in a non-gevent app, monkeypatch threading and then use gevent.threadpool.ThreadPool.
Either way, methods like map, submit, apply_async, etc. work pretty much the way you'd expect. The Future and AsyncResult objects play nice with greenlets; you can gevent.wait things, or attach callbacks (which will run as greenlets), etc. Most of the time it just works like magic, and the rest of the time it's not too hard to figure out.
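As a rough sketch of the pool pattern described above, here is the plain standard-library version using concurrent.futures.ThreadPoolExecutor, so it runs without gevent installed. The names slow_library_call, handle_request and handle_many are hypothetical stand-ins, not part of any library. Per the answer, in a gevent app you would monkeypatch threading and swap the executor for gevent.threadpool.ThreadPoolExecutor, so that waiting on the future lets other greenlets run.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_library_call(x):
    """Hypothetical stand-in for a blocking, unpatchable library function."""
    time.sleep(0.1)
    return x * 2


# One shared pool for the whole worker process (size is an arbitrary assumption).
pool = ThreadPoolExecutor(max_workers=4)


def handle_request(x):
    """Hand the blocking call to a real thread and wait for its result."""
    future = pool.submit(slow_library_call, x)
    return future.result()  # re-raises any exception from the worker thread


def handle_many(values):
    """Submit several calls at once so they overlap, then collect in order."""
    futures = [pool.submit(slow_library_call, v) for v in values]
    return [f.result() for f in futures]
```

A request handler would call handle_request(arg) instead of calling the library function directly; the pool size then caps how many library calls run at once.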
Using processes instead of threads is doable, but not as nice. AFAIK, there are no wrappers for anything as complete as multiprocessing.Process or multiprocessing.Pool, and trying to use the normal multiprocessing just hangs. You can manually fork if you're not on Windows, but that's about all that's built in. If you really need multiprocessing, you may need to do some multi-layered thing, where your greenlets don't talk to a process, but instead talk to a thread that creates a pipe, forks, execs, and then proxies between the gevent world and the child process.
If the calls are taking a long time because they're waiting on I/O from a backend service, or waiting on a subprocess, or doing GIL-releasing numpy work, I wouldn't bother trying to do multiprocessing. But if they're taking a long time because they're burning CPU… well, then you either need to get multiprocessing working, or go lower-level and just spin off a subprocess.Popen([sys.executable, 'workerscript.py']).
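The lower-level subprocess route could look something like the sketch below, which uses only the standard library. The inline worker source, the JSON-over-pipes protocol and the name run_in_child are assumptions made for illustration; the answer's version would launch a separate workerscript.py instead. In a gevent app, the gevent.subprocess module offers a Popen with the same interface that cooperates with greenlets.

```python
import json
import subprocess
import sys

# Hypothetical worker: reads a JSON request on stdin, burns some CPU,
# and writes a JSON reply on stdout. In practice this would live in its
# own file (the answer's 'workerscript.py').
WORKER_SOURCE = """
import json, sys
n = json.load(sys.stdin)["n"]
print(json.dumps({"result": sum(i * i for i in range(n))}))
"""


def run_in_child(n):
    """Run the CPU-bound work in a separate Python process and return its result."""
    proc = subprocess.Popen(
        [sys.executable, "-c", WORKER_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    # communicate() sends the request, closes stdin, and waits for exit.
    out, _ = proc.communicate(json.dumps({"n": n}))
    if proc.returncode != 0:
        raise RuntimeError(f"worker exited with status {proc.returncode}")
    return json.loads(out)["result"]
```

Because the child is a separate interpreter, its CPU work is not serialized by the parent's GIL; the cost is process startup and serializing arguments and results.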

Related

Concurrency: multiprocessing, threading, greenthreads and asyncio

I'm currently working on a Python project that receives a lot of AWS SQS messages (more than 1 million each day), processes these messages, and sends them to another SQS queue with additional data. Everything works fine, but now we need to speed up this process a lot!
From what we have seen, our biggest bottleneck is the HTTP requests used to send and receive messages through the AWS SQS API. So basically, our code is mostly I/O bound due to these HTTP requests.
We are trying to scale this process using one of the following methods:
Using Python's multiprocessing: this seems like a good idea, but our workers run on small machines, usually with a single core. Creating several processes may still give some benefit, since the CPU will probably switch to another process while one is stuck on an I/O operation. But still, that seems like a lot of process-management overhead and resources for operations that don't need to run in parallel, only concurrently.
Using Python's threading: since the GIL confines all threads to a single core, and threads have less overhead than processes, this seems like a good option. While one thread is stuck waiting for an HTTP response, the CPU can run another thread, and so on. This would give us the concurrent execution we want. But my question is: how does Python's threading know when it can switch from one thread to another? Does it know that a thread is currently blocked on an I/O operation and can be swapped out for another one? Will this approach maximize CPU usage and avoid busy waiting? Do I have to explicitly give up control of the CPU inside a thread, or does Python do this automatically?
Recently, I also read about a concept called green threads, using Eventlet in Python. From what I saw, they seem a perfect match for my project. They have little overhead and don't create OS threads the way threading does. But will we have the same problems as with threading regarding CPU control? Does a green thread need to signal that the CPU may switch to another one? I saw in some examples that Eventlet offers some built-in libraries like Urlopen, but not Requests.
The last option we considered was using Python's asyncio and async libraries such as aiohttp. I have done some basic experimenting with asyncio and wasn't very pleased. But I can understand that most of that comes from the fact that Python is not a naturally asynchronous language. From what I saw, it would behave something like Eventlet.
So what do you think would be the best option here? What library would allow me to maximize performance on a single-core machine while avoiding busy waits as much as possible?

Difference between multiprocessing, asyncio, threading and concurrent.futures in python

Being new to using concurrency, I am confused about when to use the different python concurrency libraries. To my understanding, multiprocessing, multithreading and asynchronous programming are part of concurrency, while multiprocessing is part of a subset of concurrency called parallelism.
I searched around on the web about different ways to approach concurrency in python, and I came across the multiprocessing library, concurrent.futures' ProcessPoolExecutor() and ThreadPoolExecutor(), and asyncio. What confuses me is the difference between these libraries, especially what the multiprocessing library does: since it has methods like pool.apply_async, does it also do the job of asyncio? If so, why is it called multiprocessing when it is a different method of achieving concurrency from asyncio (multiple processes vs cooperative multitasking)?
There are several different libraries at play:
threading: interface to OS-level threads. Note that CPU-bound work is mostly serialized by the GIL, so don't expect threading to speed up calculations. Use it when you need to invoke blocking APIs in parallel, and when you require precise control over thread creation. Avoid creating too many threads (e.g. thousands), as they are not free. If possible, don't create threads yourself, use concurrent.futures instead.
multiprocessing: interface to spawning multiple python processes with an API intentionally similar to threading. Multiple processes work in parallel, so you can actually speed up calculations using this method. The disadvantage is that you can't share in-memory datastructures without using multi-processing specific tools.
concurrent.futures: A modern interface to threading and multiprocessing, which provides convenient thread/process pools it calls executors. The pool's main entry point is the submit method which returns a handle that you can test for completion or wait for its result. Getting the result gives you the return value of the submitted function and correctly propagates raised exceptions (if any), which would be tedious to do with threading. concurrent.futures should be the tool of choice when considering thread or process based parallelism.
asyncio: While the previous options are "async" in the sense that they provide non-blocking APIs (this is what methods like apply_async refer to), they are still relying on thread/process pools to do their magic, and cannot really do more things in parallel than they have workers in the pool. Asyncio is different: it uses a single thread of execution and async system calls across the board. It has no blocking calls at all, the only blocking part being the asyncio.run() entry point. Asyncio code is typically written using coroutines, which use await to suspend until something interesting happens. (Suspending is different than blocking in that it allows the event loop thread to continue to other things while you're waiting.) It has many advantages compared to thread-based solutions, such as being able to spawn thousands of cheap "tasks" without bogging down the system, and being able to cancel tasks or easily wait for multiple things at once. Asyncio should be the tool of choice for servers and for clients connecting to multiple servers.
When choosing between asyncio and multithreading/multiprocessing, consider the adage that "threading is for working in parallel, and async is for waiting in parallel".
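To make "waiting in parallel" concrete, here is a small sketch in which asyncio.sleep stands in for a network request. The names fake_request, main and run_demo are hypothetical. Ten simulated requests of 0.2 seconds each finish in roughly 0.2 seconds in total, because each coroutine suspends at its await and lets the event loop advance the others.

```python
import asyncio
import time


async def fake_request(delay):
    """Simulated I/O: suspends this coroutine without blocking the event loop."""
    await asyncio.sleep(delay)
    return delay


async def main():
    start = time.perf_counter()
    # gather() schedules all coroutines at once; their waits overlap.
    results = await asyncio.gather(*(fake_request(0.2) for _ in range(10)))
    elapsed = time.perf_counter() - start
    return results, elapsed


def run_demo():
    """Blocking entry point: runs the event loop until main() completes."""
    return asyncio.run(main())
```

Replacing asyncio.sleep with time.sleep would block the single event-loop thread and make the same ten calls run back to back.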
Also note that asyncio can await functions executed in thread or process pools provided by concurrent.futures, so it can serve as glue between all those different models. This is part of the reason why asyncio is often used to build new library infrastructure.
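The glue role described above might be sketched as follows, using loop.run_in_executor to await blocking functions that run in a concurrent.futures thread pool. The function blocking_call is a hypothetical stand-in for any blocking API, and the pool size is arbitrary.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_call(x):
    """Hypothetical blocking function, e.g. a library with no async API."""
    time.sleep(0.1)
    return x + 1


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # run_in_executor returns awaitable futures; the event loop stays
        # free to run other tasks while the pool threads do the blocking work.
        first = loop.run_in_executor(pool, blocking_call, 1)
        second = loop.run_in_executor(pool, blocking_call, 2)
        return await asyncio.gather(first, second)


def run_demo():
    return asyncio.run(main())
```

Passing a ProcessPoolExecutor instead of a thread pool follows the same shape for CPU-bound functions, provided the function and its arguments can be pickled.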

Python Threading vs Gevent for High Volume Web Scraping

I'm trying to decide if I should use gevent or threading to implement concurrency for web scraping in python.
My program should be able to support a large (~1000) number of concurrent workers. Most of the time, the workers will be waiting for requests to come back.
Some guiding questions:
What exactly is the difference between a thread and a greenlet? What is the maximum number of threads / greenlets I should create in a single process (with regard to the spec of the server)?
A Python thread is an OS thread, scheduled by the OS, which makes it a lot heavier because switching between threads requires an OS-level context switch. Green threads are lightweight and live in userspace, so the OS neither creates nor manages them.
I think you can use gevent. gevent = event loop (libev) + coroutines (greenlet) + monkey patching. gevent gives you thread-like concurrency without using OS threads, so you can write normal-looking code and still get async I/O.
Make sure you don't have CPU bound stuff in your code.
I don't think you have thought this whole thing through. I have built some fairly substantial lightweight-thread apps with greenlets created from the gevent framework. As long as you allow control to switch between greenlets with appropriate sleeps or switches, everything tends to work fine. Rather than blocking or waiting indefinitely for a reply, it is recommended to let the wait time out, catch the resulting exception, sleep in the except block, and then loop again; otherwise you will not switch greenlets readily.
Also, take care to join and/or kill all greenlets, since you could otherwise end up with zombies that cause side effects you do not want.
However, I would not recommend this for your application. Rather, one of the following Websockets extensions that use Gevent... See this link
Websockets in Flask
and this link
https://www.shanelynn.ie/asynchronous-updates-to-a-webpage-with-flask-and-socket-io/
I have implemented a very nice app with Flask-SocketIO
https://flask-socketio.readthedocs.io/en/latest/
It runs through Gunicorn with Nginx very nicely from a Docker container. The SocketIO interfaces very nicely with Javascript on the client side.
(Be careful on the webscraping - use something like Scrapy with the appropriate ethical scraping enabled)

How to use Tornado with APScheduler?

I am running python's apscheduler and periodically want to do some work POST-ing to some http resources, which will involve using tornado's AsyncHTTPClient as a scheduled job. Each job will do several POSTs. When each http request gets a response, a callback is then called (I think that Tornado uses a future to accomplish this).
I am concerned with thread-safety here since Apscheduler runs jobs in various threads. I have not been able to find a well explained example of how tornado would best be used across multiple threads in this context.
How can I best use apscheduler with tornado in this manner?
Specific concerns:
Which tornado ioloop to use? The docs say that AsyncHTTPClient "works like magic". Well, magic scares me. Do I need to use AsyncHTTPClient from within the current thread or can I use the main one (it can be specified)?
Are there thread-safety issues with my callback with respect to which ioloop I use?
Not clear to me what happens when a thread completes but there is still a pending callback/future that needs to be called. Are there issues here?
Since apscheduler is run as threads in-process, and python has the GIL, then is it pretty much the same to have one IOLoop from the main thread - as opposed to multiple loops from different threads (with respect to performance)?
All of Tornado's utilities are built around Tornado's IOLoop, and this includes the AsyncHTTPClient as well. An IOLoop is not considered thread safe. Therefore, it is not a great idea to run AsyncHTTPClient from any thread other than the thread running your main IOLoop. For more details on how to use the IOLoop, read this.
If you use tornado.ioloop.IOLoop.instance(), then I suppose you will run into trouble if your intention is not to add callbacks to the main thread's IOLoop. You can use tornado.ioloop.IOLoop.current() to refer to the right IOLoop instance for the current thread. And you would have to do far too much bookkeeping to add a callback to a non-main thread's IOLoop from another non-main thread's IOLoop; it will just get too messy.
I don't quite get this. But the way I understand it, there are two scenarios. Either you are talking about a thread with an IOLoop or without an IOLoop. If the thread does not have an IOLoop running, then after whatever the thread does to reach completion, whatever callback has to be executed by the IOLoop in some other thread (perhaps main thread) will be executed. The other scenario is that the thread you are talking about has an IOLoop running. Then the thread won't complete unless you have stopped the IOLoop. And therefore, execution of the callback will really depend on when you stop the IOLoop.
Honestly, I don't see much point in using threads with Tornado. There won't be any performance gain unless you are running on PyPy, and I am not sure Tornado will play well with it (not everything is known to work on PyPy, and honestly I don't know about Tornado either). If your Tornado app is a web server, you might as well run multiple processes of it and use Nginx as a proxy and load balancer. Since you have brought in apscheduler, I would suggest using the IOLoop's add_timeout, which does pretty much what you need and, being native to Tornado, plays much nicer with it. Callbacks are difficult to debug anyway. Combine them with Python's threading and you can have a massive mess. If you are ready to consider another option, just move all the async processing out of this process; it will make life much easier. Think of something like Celery for this.

When to thread?

I have never written any code that uses threads.
I have a web application that accepts a POST request, and creates an image based on the data in the body of the request.
Would I want to spin off a thread for the image creation, so as to prevent the server from hanging until the image is created? Is this an appropriate use, or merely a solution looking for a problem?
Please correct any misunderstandings I may have.
Rather than thinking about handling this via threads or even processes, consider using a distributed task manager such as Celery to manage this sort of thing.
The usual approach to handling HTTP requests synchronously is to spawn a new thread (or reuse one from a pool) for each request as soon as it arrives.
However, Python threads are not very good for HTTP, due to the GIL and to some I/O and other calls blocking the whole app, including other threads.
You should look into the multiprocessing module for this use case. Spawn some worker processes, and then pass requests to them to process.
