I have a fairly simple problem that I would like to solve in Python.
I would like a webserver with the following behavior: if it receives a POST request for /work, it should add the request to a work queue and execute some function on the attached data. If it receives a POST request for /cancel, it should cancel whatever its current task is.
Unfortunately, the only way I can seem to get a BaseHTTPRequestHandler to handle multiple requests is to use a ThreadingMixIn, but that seems unnecessarily complicated, as I then have to use a set of locks to prevent multiple work tasks from executing concurrently.
I tried to use a BaseHTTPRequestHandler without a ThreadingMixIn and just spin off threads in do_POST, but that didn't work since apparently BaseHTTPRequestHandler closes its connection when the do_POST function returns.
Ideally, I'm looking for an interface that gives me the ability to close the connection to the client on my own terms, so I can do it in a worker thread, and manage the queue myself, rather than working around the ThreadingMixIn's behavior in this regard.
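To make the desired behavior concrete, here is a rough sketch of the /work and /cancel endpoints (the queue-plus-single-worker layout and the cooperative cancel flag are my own placeholders, not a settled design; it assumes Python 3.7+ for ThreadingHTTPServer):

    import queue
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

    work_queue = queue.Queue()
    cancel_event = threading.Event()

    def do_work(data):
        # placeholder for the real task; it must poll the cancel flag
        for _ in range(10):
            if cancel_event.is_set():
                return
            time.sleep(1)

    def worker():
        # a single consumer, so no locks are needed to serialize tasks
        while True:
            data = work_queue.get()
            cancel_event.clear()
            do_work(data)
            work_queue.task_done()

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get('Content-Length', 0))
            body = self.rfile.read(length)
            if self.path == '/work':
                work_queue.put(body)
            elif self.path == '/cancel':
                cancel_event.set()
            self.send_response(202)
            self.end_headers()

    threading.Thread(target=worker, daemon=True).start()
    ThreadingHTTPServer(('', 8000), Handler).serve_forever()

Note that the ThreadingMixIn only serves the requests here; because a single worker thread consumes the queue, the work tasks never run concurrently.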
Related
I'm currently trying to figure out how I can host a simple webserver to handle POST requests with Python 3.7. My problem is that I want to answer requests right after they are received, but the submitted POST data should be used to play back a specific audio file on my Raspberry Pi. In two days of googling I couldn't figure out how to have the webserver run constantly while processing the incoming requests in the background.
I tried to use the subprocess module to run the playback script in the background, but I never found a way to have it run independently from the webserver. I always end up with my webserver getting a request which is then handled, but while this happens the webserver is inaccessible.
I would appreciate it if someone pointed out a direction for me to look at.
"I always end up with my webserver getting a request which is then handled, but while this happens the webserver is inaccessible."
To solve this problem, you can create a separate thread or a process to handle the request while the main thread/process goes back to processing new requests.
The workflow will be something like this:
The main process receives a request
The main process creates a new process to handle the request.
The main process goes back to listening for new requests while the new process processes the received request.
Assuming you are unfamiliar with multithreading and multiprocessing, I'd suggest you go read a little about these topics. Most likely, multithreading will solve your problem, so you can start from there. Here's a good article about it: An Intro to Threading in Python
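For example, here is a minimal sketch of that workflow with http.server and threading (the request format and the aplay call are assumptions on my part; substitute whatever player and payload you actually use):

    import subprocess
    import threading
    from http.server import BaseHTTPRequestHandler, HTTPServer

    def play_audio(filename):
        # blocks only this background thread, not the server
        subprocess.run(['aplay', filename])

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get('Content-Length', 0))
            filename = self.rfile.read(length).decode()
            # hand the playback to a thread, then respond immediately
            threading.Thread(target=play_audio, args=(filename,), daemon=True).start()
            self.send_response(200)
            self.end_headers()

    HTTPServer(('', 8080), Handler).serve_forever()

Because do_POST returns as soon as the thread is started, the server is free to accept the next request while the audio plays.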
Short version: How can I prevent blocking Pika in a Remote Procedure Call situation?
Long version:
None of the Pika examples demonstrate my use case.
I have a Tornado server which communicates with other processes/machines over AMQP (RabbitMQ, Pika). These other processes are not very well-defined, but they will, for the most part, be returning data (see the RPC example on RabbitMQ's website). Sometimes, a process might need to take an extremely long time to process a large amount of information, but it shouldn't completely block smaller requests from being taken by the process. Or maybe the remote server is blocking because it sent out a web request. Think of it like a web server, but using AMQP instead of HTTP.
Since the Pika documentation claims that it's not thread-safe, I cannot pass the connection to multiple threads (or processes, for that matter). What I want to do is start a new process, and add a socket event (for the pipe to that program) to the Pika IOLoop, as I would be able to do with Tornado. The Pika IOLoop is quite different from the Tornado IOLoop, and it doesn't seem to support adding multiple handlers; it seems to operate using one "poller" on one socket.
I'd like to avoid requiring the Tornado package for this package, because I would only be using the IOLoop. It's not out of the question, but I want to see what my other options are, or if there is a solution to my problem by somehow connecting multiple Pika IOLoops/Pollers. RabbitMQ's documentation says that workers can often be "scaled up" by adding more. I'd like to avoid creating a connection for every request that comes in (if they're coming in fast).
From what you described, I believe you unfortunately either need a different communication model or need multiple Pika IOLoops/Pollers/Redundant Connections.
From the documentation and from other sites, it sounds like RPC in Pika is always blocking and cannot be passed around between threads. See http://www.rabbitmq.com/tutorials/tutorial-six-python.html where the author points out that RPC in Pika is inherently blocking once you actually call the ioloop.
"When in doubt avoid RPC. If you can, you should use an asynchronous pipeline - instead of RPC-like blocking"
If you want to keep sending multiple RPC calls on the same connection before one completes, you'll need a different asynchronous model. Multiple RPC calls on the same connection before completion isn't the usual implementation of the RPC model, though it's not technically forbidden ( http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.progcomm%2Fdoc%2Fprogcomc%2Frpc_mod.htm ). I don't think Pika operates with this model, though it does have asynchronous support via callbacks (not what you are looking for, I think).
If you just want to be able to generate new connections on the fly, you could use a thread or process wrapper on a connection: create and block on the RPC in the other context, and push the result to a common Queue which the main thread can monitor. Tornado might give you this, but I agree that it's a bit of overkill, and making such a connection wrapper shouldn't be all that difficult, as I've done something similar for other I/O ops in less than 100 lines of Python (see the Queue module for the threaded wrapper version). I think you already saw this possibility, though, based on your talk of multiple IOLoops.
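To illustrate the wrapper idea (blocking_rpc below is a stand-in for whatever blocking Pika round trip you actually do, not a Pika API):

    import queue  # the Queue module on Python 2
    import threading

    results = queue.Queue()

    def blocking_rpc(request):
        # stand-in for a pika.BlockingConnection publish/consume round trip
        return 'response to %r' % (request,)

    def rpc_in_thread(request):
        # block on the RPC in its own thread; push the answer to the queue
        threading.Thread(target=lambda: results.put(blocking_rpc(request))).start()

    rpc_in_thread('big job')
    rpc_in_thread('small job')

    # main context: collect answers without blocking on any single call
    for _ in range(2):
        print(results.get())

The main thread (or a Tornado IOLoop callback) can poll results with get(timeout=...) so one slow RPC never holds up the smaller ones.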
I'm quite new to python threading/network programming, but have an assignment involving both of the above.
One of the requirements of the assignment is that for each new request I spawn a new thread, but I need to both send to and receive from the browser at the same time.
I'm currently using the asyncore library in Python to catch each request, but as I said, I need to spawn a thread for each request, and I was wondering whether using both threads and the asynchronous library is overkill, or the correct way to do it?
Any advice would be appreciated.
Thanks
EDIT:
I'm writing a proxy server, and I'm not sure if my client is persistent. My client is my browser (using Firefox for simplicity).
It seems to reconnect for each request. My problem is that if I open one tab with http://www.google.com in it and another with http://www.stackoverflow.com in it, I only get one request at a time from each tab, instead of multiple concurrent requests from Google and from SO.
I answered a question that sounds amazingly similar to yours, where someone had a homework assignment to create a client/server setup, with each connection being handled in a new thread: https://stackoverflow.com/a/9522339/496445
The general idea is that you have a main server loop constantly looking for a new connection to come in. When it does, you hand it off to a thread which will then do its own monitoring for new communication.
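In skeleton form, that loop looks something like this (plain sockets; the port and the echo logic are placeholders):

    import socket
    import threading

    def handle_client(conn, addr):
        # each client gets its own thread for the whole conversation
        with conn:
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                conn.sendall(data)  # placeholder: echo the data back

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('', 9000))
    server.listen(5)

    while True:
        conn, addr = server.accept()  # blocks until a client connects
        threading.Thread(target=handle_client, args=(conn, addr)).start()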
An extra bit about asyncore vs threading
From the asyncore docs:
There are only two ways to have a program on a single processor do “more than one thing at a time.” Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.
As this quote suggests, using asyncore and threading should be for the most part mutually exclusive options. My link above is an example of the threading approach, where the server loop (either in a separate thread or the main one) does a blocking call to accept a new client. And when it gets one, it spawns a thread which will then continue to handle the communication, and the server goes back into a blocking call again.
In the pattern of using asyncore, you would instead use its async loop which will in turn call your own registered callbacks for various activity that occurs. There is no threading here, but rather a polling of all the open file handles for activity. You get the sense of doing things all concurrently, but under the hood it is scheduling everything serially.
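For contrast, here is a bare-bones asyncore version of an echo server, adapted from the pattern in the stdlib docs (note that asyncore has since been deprecated and was removed in Python 3.12):

    import asyncore

    class EchoHandler(asyncore.dispatcher_with_send):
        def handle_read(self):
            # called by the loop whenever this socket has data to read
            data = self.recv(8192)
            if data:
                self.send(data)

    class EchoServer(asyncore.dispatcher):
        def __init__(self, host, port):
            asyncore.dispatcher.__init__(self)
            self.create_socket()
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)

        def handle_accepted(self, sock, addr):
            EchoHandler(sock)  # register the new client with the loop

    EchoServer('', 9000)
    asyncore.loop()  # one loop polls every registered dispatcher serially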
I use Python to call a web service with many requests at the same time. To do so I create threads and use urllib (the first version; I use Python 2.6).
When I start the threads, all goes well until one reaches urllib.urlopen(). The second thread has to wait until the first one ends before passing through the urllib.urlopen() call. As I do a lot of work after having retrieved the JSON from the remote web service, I want the second thread to "urlopen" at the same time, or just after the first one closes its socket.
I tried closing the socket just after having collected the returned JSON, but it changes nothing: the second thread still has to wait for the first one to end. I verified this with prints.
I can understand that urllib isn't thread-safe (googling this doesn't give clear answers), but why does the second thread have to wait for the first one to end (and not just for the socket to close)?
Thanks for your help and hints
PS: I do not use Python 3 for compatibility with modules / packages I require
This does not sound like intended behavior, as two parallel urllib requests should be possible. Are you sure your remote server can handle two parallel requests (e.g. that it is not in debug mode with a single thread)?
In any case: threading is not the preferred approach to parallel programming in Python. Use processes or asynchronous I/O instead, especially on the server side (you didn't mention your use case or your platform, which may also be buggy).
I have had very good experiences processing and transforming JSON/XML with Spawning and Eventlet, which patch Python's socket code to be asynchronous.
http://pypi.python.org/pypi/Spawning/
http://eventlet.net/
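For example, with Eventlet the fetches run concurrently in green threads (the URLs here are placeholders; on Python 2.6 you'd import urlopen from urllib instead):

    import eventlet
    eventlet.monkey_patch()  # makes socket (and thus urllib) cooperative

    from urllib.request import urlopen

    urls = ['http://example.com/a', 'http://example.com/b']  # placeholders

    def fetch(url):
        return urlopen(url).read()

    pool = eventlet.GreenPool()
    for body in pool.imap(fetch, urls):
        print(len(body))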
I have never written any code that uses threads.
I have a web application that accepts a POST request, and creates an image based on the data in the body of the request.
Would I want to spin off a thread for the image creation, to prevent the server from hanging until the image is created? Is this an appropriate use, or merely a solution looking for a problem?
Please correct any misunderstandings I may have.
Rather than thinking about handling this via threads or even processes, consider using a distributed task manager such as Celery to manage this sort of thing.
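A sketch of what that looks like with Celery (the broker URL and the task body are placeholders):

    # tasks.py
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')

    @app.task
    def create_image(request_data):
        # placeholder for the actual image generation
        pass

    # in the web handler: queue the job and return to the client immediately
    # create_image.delay(request_data)

A separate worker process, started with celery -A tasks worker, picks the job up, so the web process never blocks on image creation.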
The usual approach for handling HTTP requests synchronously is to spawn a new thread (or re-use one from a pool) for each request as soon as it comes in.
However, Python threads are not very good for this, due to the GIL and to some I/O and other calls blocking the whole app, including other threads.
You should look into the multiprocessing module for this. Spawn some worker processes, and then pass requests to them to process.
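A rough sketch of that worker-process pattern (the queue wiring and worker count are placeholders):

    import multiprocessing

    def process(request_data):
        pass  # placeholder for the real work

    def worker(task_queue):
        # consume requests until a None sentinel arrives
        for request_data in iter(task_queue.get, None):
            process(request_data)

    if __name__ == '__main__':
        task_queue = multiprocessing.Queue()
        workers = [multiprocessing.Process(target=worker, args=(task_queue,))
                   for _ in range(4)]
        for w in workers:
            w.start()
        # the HTTP layer just does task_queue.put(request_data) per request
        for w in workers:
            task_queue.put(None)  # one sentinel per worker to shut down
        for w in workers:
            w.join()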