So, I'm writing a Python web application using the Twisted web2 framework. There's a library that I need to use (SQLAlchemy, to be specific) that doesn't have asynchronous code. Would it be bad to spawn a thread to handle the request, fetch any data from the DB, and then return a response? I'm afraid that if there were a flood of requests, too many threads would be started and the server would be overwhelmed. Is there something built into Twisted that prevents this from happening (e.g. request throttling)?
See the Twisted documentation on threading, and specifically the thread pool, which lets you control how many threads are active at most. Spawning one new thread per request would definitely be the inferior idea!
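A minimal sketch of that pattern: cap the reactor's thread pool and push the blocking call onto it with deferToThread (the body of fetch_rows is a stand-in for your blocking SQLAlchemy query):

```python
from twisted.internet import reactor, threads

# Cap the reactor's thread pool: extra work queues until a thread frees up,
# so a flood of requests can't spawn unbounded threads.
reactor.suggestThreadPoolSize(10)

def fetch_rows(query):
    # Blocking SQLAlchemy work goes here; it runs on a pool thread,
    # so the reactor keeps serving other requests meanwhile.
    return "rows for %s" % query  # stand-in for session.query(...).all()

def handle_request(query):
    d = threads.deferToThread(fetch_rows, query)  # returns a Deferred
    d.addCallback(lambda rows: print(rows))
    return d

handle_request("users")
reactor.callLater(1, reactor.stop)
reactor.run()
```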
We're running a Django project with gunicorn and eventlet.
I'd like to use threading.local to stash some HTTP request data for use later in that thread (via some custom middleware). I'm wondering if this is safe with eventlet.
From the docs:
Eventlet is thread-safe and can be used in conjunction with normal Python threads. The way this works is that coroutines are confined to their ‘parent’ Python thread. It’s like each thread contains its own little world of coroutines that can switch between themselves but not between coroutines in other threads.
which sounds like it might be.
But I understand from reading their docs on 'How the Hubs Work' that eventlet may suspend a coroutine to process another one. Is it possible, with gunicorn, that the processing of one HTTP request gets suspended and another HTTP request gets picked up and processed by a coroutine in that same initial thread? And if so, does that mean the threading.local data could get shared between two requests?
Can I get away with using threading.local and be certain that each incoming request will get its own threading.local space?
I also saw this post
the simultaneous connections are handled by green threads. Green threads are not like real threads. In simple terms, green threads are functions (coroutines) that yield whenever the function encounters an I/O operation
which makes me think a single "thread" could process multiple requests. And if that is true, then I wonder where exactly threading.local lives: at the thread level, or in a coroutine, i.e. an eventlet "thread"?
Any pointers would be appreciated here.
Thanks
tl;dr: the answer is yes.
Eventlet's coroutines are treated as separate threads, so threading.local will work: each green thread gets its own storage.
A longer discussion is available on the eventlet GitHub issue.
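If you want to convince yourself, here's a minimal sketch, assuming eventlet's monkey patching is in effect (gunicorn's eventlet worker applies it for you):

```python
import eventlet
eventlet.monkey_patch()  # patches threading so locals are per-green-thread

import threading

ctx = threading.local()

def handle(request_id):
    ctx.request_id = request_id
    eventlet.sleep(0)  # yield, letting other green threads run in between
    # even though other "requests" ran on this OS thread, our value is intact
    assert ctx.request_id == request_id
    print("green thread", request_id, "kept its own local")

pool = eventlet.GreenPool()
for i in range(5):
    pool.spawn(handle, i)
pool.waitall()
```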
I have the following scenario:
There is one thread that manages a long-polling HTTP connection to an API (it runs non-stop). When a new message arrives, it must be processed with the special process() method.
I just want to design it in a way that incoming messages are processed concurrently, but there is another important point: at the end of each processing run, an answer should be passed to the outgoing queue, which is organized in a separate thread. From there the answers will be sent via HTTP.
Here is a scheme of the flow (diagram omitted).
Let's consider that there can be 30-50 messages per second, and the process() method will take from 1 up to 10 seconds.
The question is: what library or framework can I use to implement this architecture?
As far as I have researched, Python's Tornado has good benchmarks, but here I do not need a web framework, just a tool that can provide concurrent running of the message processors.
Your message rate is pretty low, so you may freely use "standard" tools like RabbitMQ/Redis, Celery, and asyncio.
RabbitMQ/Redis with Celery are great tools to implement queues and manage your tasks and processes.
Asyncio is faster than Tornado, but that doesn't matter for your task. What is more important is that asyncio gives you all the benefits of the modern async/await coroutine technique.
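For the Celery route, a minimal sketch of what the processing side could look like (the broker URL and task bodies are placeholders for your setup):

```python
# tasks.py -- start workers with: celery -A tasks worker
from celery import Celery

app = Celery("processor", broker="redis://localhost:6379/0")

@app.task
def process(message):
    # the 1-10 s of real work goes here
    answer = message.upper()  # stand-in for the actual processing
    deliver.delay(answer)     # chain the answer to the outgoing step

@app.task
def deliver(answer):
    # stand-in for the outgoing side that sends answers back via HTTP
    print("sending", answer)
```

The long-polling thread then just calls process.delay(message) for each incoming message, and a pool of workers absorbs 30-50 messages per second comfortably.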
Our server has a lot of CPUs, and some web requests could be faster if the request handlers did some parallel processing.
Example: some work needs to be done on N (about 1-20) pictures to serve one web request.
Caching, or doing the work before the request comes in, is not possible.
What can be done to use several CPUs of the hardware:
threads: I don't like them
multiprocessing: every request needs to start N processes. Many CPU cycles will be lost starting a new process and importing libraries.
special (hand made) service, which has N processes ready for processing
Celery (RabbitMQ): I don't know how big the communication overhead is...
Other solution?
Platform: Django (Python)
Regarding your second and third alternatives: you do not need to start a new process for every request. This is what process pools are for. New processes are created when your app starts up. When you submit a request to the pool, it is automatically queued until a worker is available. The disadvantage is that requests block: if no worker is available at the moment, your user will sit and wait.
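A minimal sketch of that pool approach with the standard library (process_picture is a stand-in for your real per-picture work; the Django view wiring is omitted):

```python
from concurrent.futures import ProcessPoolExecutor

def process_picture(picture):
    # stand-in for the real per-picture work
    return picture.lower()

# Created once at startup: the worker processes stay warm, so the
# cost of forking and importing libraries is paid only once.
pool = ProcessPoolExecutor(max_workers=8)

def handle_request(pictures):
    # Fan the N pictures out across the workers...
    futures = [pool.submit(process_picture, pic) for pic in pictures]
    # ...and block until all results are in (jobs queue if workers are busy).
    return [f.result() for f in futures]

if __name__ == "__main__":
    print(handle_request(["A.PNG", "B.PNG", "C.PNG"]))
```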
You could use the standard library module asyncore (note that asyncore has since been deprecated and was removed in Python 3.12; asyncio is its modern replacement).
This module provides the basic infrastructure for writing asynchronous socket service clients and servers.
There is an example of how to create a basic HTTP client.
Then there's Twisted; it can do lots and lots of things, which is why it's somewhat daunting. Here is an example using its HTTP client.
Twisted "speaks HTTP"; asyncore does not, so you'll have to implement the HTTP protocol yourself on top of it.
Other libraries:
Tornado's httpclient (see the sketch below)
asynchttp
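And a minimal sketch of Tornado's httpclient in its modern coroutine form (again, the URL is a placeholder):

```python
import asyncio
from tornado.httpclient import AsyncHTTPClient

async def main():
    response = await AsyncHTTPClient().fetch("http://example.com/")
    print(response.code, response.body[:80])

asyncio.run(main())
```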
Tornado is a non-blocking web server.
However, all of the operations are run in a single thread.
How does it stay non-blocking if everything is handled by a single thread?
If there is a long operation, will it block new coming request?
Is downloading a large file from Tornado a long blocking process?
Please kindly correct me if my understanding is not accurate.
Many Thanks
If there is a long operation, will it block new coming request?
Yes. No. It depends.
Anything that happens inside Tornado itself blocks. So if you call time.sleep(10) or do a computationally intensive operation, it will block.
What Tornado (and Twisted, and node.js) can do well is request data from another service (like Amazon, or Facebook, or a subprocess, or a database with an async library) and then serve other requests while it waits for a reply. See http://www.tornadoweb.org/documentation/overview.html#non-blocking-asynchronous-requests
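As an illustration, a minimal sketch of such a non-blocking handler in modern Tornado (the backend URL is a placeholder):

```python
import tornado.ioloop
import tornado.web
from tornado.httpclient import AsyncHTTPClient

class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        # While this await is pending, the single IOLoop thread is free
        # to accept and serve other requests.
        response = await AsyncHTTPClient().fetch("http://example.com/")
        self.write(response.body)

app = tornado.web.Application([(r"/", MainHandler)])
app.listen(8888)
tornado.ioloop.IOLoop.current().start()
```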
To do this, you need the server in front to be async too (so Nginx, not Apache).
I have never written any code that uses threads.
I have a web application that accepts a POST request, and creates an image based on the data in the body of the request.
Would I want to spin off a thread for the image creation, to prevent the server from hanging until the image is created? Is this an appropriate use, or merely a solution looking for a problem?
Please correct any misunderstandings I may have.
Rather than thinking about handling this via threads or even processes, consider using a distributed task queue such as Celery to manage this sort of thing.
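A minimal sketch of how that hand-off could look with Celery (the broker URL and the rendering body are placeholders):

```python
from celery import Celery

app = Celery("images", broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")

@app.task
def create_image(post_body):
    # stand-in for the real image-rendering work
    return "rendered:%s" % post_body

# In the POST handler: enqueue the job and respond immediately,
# so the web process never hangs while the image is created.
def handle_post(post_body):
    result = create_image.delay(post_body)  # AsyncResult, returns at once
    return {"task_id": result.id}           # client polls a status endpoint
```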
The usual approach for handling HTTP requests synchronously is to spawn a new thread (or re-use one from a pool) for each request as soon as it comes in.
However, Python threads are not very good for this, due to the GIL and the fact that some I/O and other calls block the whole app, including other threads.
You should look into the multiprocessing module for this use case. Spawn some worker processes, and then pass requests to them to process.
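A minimal sketch of that worker-process pattern (the request handling here is a stand-in for real work):

```python
from multiprocessing import Pool

def handle(raw_request):
    # per-request work runs in a separate process, so a blocking call
    # here cannot stall the rest of the app
    return b"HTTP/1.1 200 OK\r\n\r\nhello"

if __name__ == "__main__":
    with Pool(processes=4) as pool:  # workers are started once, up front
        reply = pool.apply_async(handle, (b"GET / HTTP/1.1\r\n\r\n",))
        print(reply.get())           # block until a worker returns the response
```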