I need to create a Python server that can accept multiple job requests. It should process each job one at a time, but the server must keep accepting new jobs while a job is being processed.
Does anyone have any suggestions on how to do this?
Thanks
Sure. Create a multiprocessing.Pool, which by default spawns one worker process per core. Then use the original process to run an HTTP service, or something else that accepts jobs via some protocol, and have it submit each incoming job to the pool for asynchronous processing.
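A minimal sketch of that layout, assuming jobs arrive as short text payloads over a plain TCP socket (the port, wire protocol and handle_job are all illustrative, not part of any particular library):

```python
import socket
from multiprocessing import Pool

def handle_job(payload):
    # the CPU-bound work runs in a pool worker, not in the accepting process
    return payload.upper()

if __name__ == '__main__':
    pool = Pool(processes=1)          # one worker: jobs run strictly one at a time
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(('localhost', 9000))
    server.listen(5)
    while True:
        conn, _ = server.accept()     # keeps accepting while a job is running
        payload = conn.recv(4096).decode()
        conn.close()
        # apply_async returns immediately, so the accept loop never blocks
        pool.apply_async(handle_job, (payload,))
```

Dropping `processes=1` gives you the one-worker-per-core default if you later want jobs to run in parallel.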
Use Twisted. Twisted is an event-driven networking engine, and it supports many common network protocols out of the box, including SMTP, POP3, IMAP, SSHv2, and DNS.
Related
I want to write a Java server, maybe using Netty or anything else suggested.
The whole purpose is that I want to queue incoming HTTP requests for a while, because the systems I'm targeting run very memory- and compute-intensive tasks, so if they are burdened with heavy load they eventually crash.
I want a queue in place that allows at most 5 requests through to the destination at any given time and holds the rest of the requests in the queue.
Can this be achieved using Netty in Java? I'm equally open to an implementation in Scala, Python or Clojure.
I did something similar with Scala Akka actors. Instead of HTTP requests I had an unlimited number of job requests come in and get added to a queue (a regular Queue). A Worker Manager would manage that queue and dispatch work to worker actors whenever they finished their previous task. Workers would notify the Worker Manager when a task was complete, and it would send them a new one from the queue. So there is no busy waiting or looping; everything happens on message reception. You can do the same with your HTTP requests. Akka can be used from Scala or Java, and the process I described is easier to implement than it sounds.
As a web server you could use anything, really. It could be Jetty, some servlet container like Tomcat, or even spray-can. All it needs to do is receive a request and send a message to the Worker Manager. The whole system would be asynchronous and non-blocking.
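If it helps to see the shape of it, here is a rough sketch of the same queue-and-dispatch idea in Python, with threads standing in for the Akka actors purely for illustration (all names here are made up; the actor version replaces the shared queue with messages):

```python
import queue
import threading

MAX_WORKERS = 5               # at most 5 requests in flight at once
jobs = queue.Queue()          # unbounded queue holding the waiting requests

def process(job):
    print('handling', job)    # placeholder for forwarding to the target system

def worker():
    while True:
        job = jobs.get()      # blocks until work arrives: no busy waiting
        process(job)
        jobs.task_done()      # the "task complete" notification

for _ in range(MAX_WORKERS):
    threading.Thread(target=worker, daemon=True).start()

# a request handler would just enqueue and return:
for i in range(20):
    jobs.put(i)
jobs.join()                   # wait for the demo jobs to drain
```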
I need to share a queue between two applications on the same machine. One is Tornado, which will occasionally add messages to the queue; the other is a Python script run from cron, which adds new messages on every iteration. Can anyone suggest a module for this?
(Can this be solved with Redis? I'd like to avoid using MySQL for this purpose.)
I would use Redis with a list. You can LPUSH an element onto the head and RPOP to remove one from the tail.
See the Redis docs for RPOP and LPUSHX.
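A minimal sketch with the redis-py client (the queue name and connection details are illustrative):

```python
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# producer side (the cron script): push onto the head of the list
r.lpush('shared-queue', 'message from cron')

# consumer side (the Tornado app): pop from the tail, oldest message first
message = r.rpop('shared-queue')   # returns None when the list is empty
if message is not None:
    print(message)
```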
The purest way I can think of to do this is with IPC. Python has very good support for IPC between two processes when one process spawns the other, but that is not your scenario. There are Python modules for IPC such as sysv_ipc and posix_ipc. But if your main application is built on Tornado, why not just have it listen on a ZeroMQ socket for published messages?
Here is a link with more information; you want the publish-subscribe model.
http://zeromq.github.io/pyzmq/eventloop.html#tornado-ioloop
Your cron job will start and publish messages to a ZeroMQ socket. Your already-running application will receive them as a subscriber.
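A rough sketch of both sides with pyzmq (the address is illustrative, and the two branches would live in your two separate processes; note the PUB/SUB "slow joiner" caveat, where a short-lived publisher should pause briefly after connecting or its messages can be dropped):

```python
import sys
import time
import zmq

ADDR = 'tcp://127.0.0.1:5556'   # illustrative endpoint

if sys.argv[1:] == ['pub']:
    # --- cron side: connect and publish one message, then exit ---
    pub = zmq.Context().socket(zmq.PUB)
    pub.connect(ADDR)
    time.sleep(0.2)              # slow joiner: let the connection settle
    pub.send(b'new message from cron')
else:
    # --- Tornado side: bind, subscribe, receive on the IOLoop ---
    from zmq.eventloop.zmqstream import ZMQStream
    from tornado.ioloop import IOLoop
    sub = zmq.Context().socket(zmq.SUB)
    sub.bind(ADDR)
    sub.setsockopt(zmq.SUBSCRIBE, b'')                 # receive everything
    ZMQStream(sub).on_recv(lambda frames: print('got', frames))
    IOLoop.current().start()
```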
Try RabbitMQ for hosting the queue independently of your applications, then access it using Pika, which even comes with a Tornado adapter. Just pick the appropriate model (queue/exchange/topic) and message format (strings, JSON, XML, YAML) and you are set.
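For example, with Pika's blocking connection (the queue name and message body are illustrative, and the basic_consume signature shown is the modern keyword form, which differs in older Pika releases; the Tornado adapter follows the same declare/publish/consume flow in callback style):

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = conn.channel()
channel.queue_declare(queue='jobs')   # queue name is illustrative

# producer side: publish to the default exchange, routed by queue name
channel.basic_publish(exchange='', routing_key='jobs', body='{"task": 42}')

# consumer side: deliver queued messages to a callback
def on_message(ch, method, properties, body):
    print('received', body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue='jobs', on_message_callback=on_message)
channel.start_consuming()
```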
Our server has a lot of CPUs, and some web requests could be faster if the request handlers did some parallel processing.
Example: some work needs to be done on N (about 1-20) pictures to serve one web request.
Caching, or doing the work before the request comes in, is not possible.
What can be done to use several CPUs of the hardware:
threads: I don't like them
multiprocessing: every request would need to start N processes, and many CPU cycles would be lost starting new processes and importing libraries
a special (hand-made) service which has N processes ready for processing
Celery (RabbitMQ): I don't know how big the communication overhead is...
another solution?
Platform: Django (Python)
Regarding your second and third alternatives: you do not need to start a new process for every request. That is what process pools are for: new processes are created when your app starts up, and when you submit a request to the pool it is automatically queued until a worker is available. The disadvantage is that requests block; if no worker is available at the moment, your user will sit and wait.
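A hedged sketch of that idea for the picture case, with one pool created at startup and each request fanning its N pictures out with map (process_picture and handle_request are placeholders, not Django API):

```python
from multiprocessing import Pool

def process_picture(picture):
    # placeholder for the CPU-heavy per-picture work
    return picture[::-1]

pool = Pool(processes=8)   # created once at startup, not per request

def handle_request(pictures):
    # map() blocks this request until all N pictures are done, but the
    # work itself runs in parallel across the pool's worker processes
    return pool.map(process_picture, pictures)

if __name__ == '__main__':
    print(handle_request(['img1', 'img2', 'img3']))
```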
You could use the standard library module asyncore.
This module provides the basic infrastructure for writing asynchronous socket service clients and servers.
There is an example of how to create a basic HTTP client.
Then there's Twisted; it can do lots and lots of things, which is why it's somewhat daunting. Here is an example using its HTTP client.
Twisted "speaks HTTP"; asyncore does not, so you'll have to implement the HTTP handling yourself on top of it.
Other libraries:
Tornado's httpclient (sketched below)
asynchttp
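For instance, Tornado's client can fetch several URLs concurrently from a single thread. This is the callback style of older Tornado releases (the URLs are placeholders; Tornado 6 removed the callback argument in favor of coroutines):

```python
from tornado import httpclient, ioloop

urls = ['http://example.com/a', 'http://example.com/b']
pending = len(urls)

def handle_response(response):
    global pending
    print(response.code, len(response.body))
    pending -= 1
    if pending == 0:
        ioloop.IOLoop.instance().stop()   # all fetches finished

client = httpclient.AsyncHTTPClient()
for url in urls:
    client.fetch(url, callback=handle_response)   # returns immediately

ioloop.IOLoop.instance().start()
```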
Is there a way to push stdout into a queue broker or to a WebSocket?
So far I've been unable to find a clear explanation of how to do this.
I have several processes running in parallel, and the idea is to create a UI where you can switch from process to process and take a look at what each one is doing.
One approach that will work (it is non-blocking and can serve multiple clients) is to use Python, Twisted and Autobahn:
Connect one or multiple ProcessProtocol instances
http://twistedmatrix.com/documents/current/core/howto/process.html
to a Twisted WebSocket server
https://github.com/tavendo/AutobahnPython
https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/broadcast
https://github.com/tavendo/AutobahnPython/tree/master/examples/wamp/pubsub/simple
Disclosure: I am the author of Autobahn.
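A minimal sketch of the ProcessProtocol side, assuming a broadcast(data) helper backed by a WebSocket server like the Autobahn broadcast example linked above (the helper and the worker.py child command are illustrative):

```python
from twisted.internet import protocol, reactor

def broadcast(data):
    # placeholder: in the real setup this hands the bytes to the WebSocket
    # server factory, which pushes them to every connected UI client
    print('would broadcast:', data)

class StdoutForwarder(protocol.ProcessProtocol):
    def outReceived(self, data):
        broadcast(data)            # the child wrote to stdout

    def errReceived(self, data):
        broadcast(data)            # the child wrote to stderr

    def processEnded(self, reason):
        reactor.stop()

# '-u' keeps the child's stdout unbuffered so output arrives promptly
reactor.spawnProcess(StdoutForwarder(), 'python',
                     args=['python', '-u', 'worker.py'])
reactor.run()
```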
So, I'm writing a Python web application using the Twisted web2 framework. There's a library that I need to use (SQLAlchemy, to be specific) that doesn't have asynchronous code. Would it be bad to spawn a thread to handle the request, fetch any data from the DB, and then return a response? I'm afraid that if there were a flood of requests, too many threads would be started and the server would be overwhelmed. Is there something built into Twisted that prevents this from happening (e.g. request throttling)?
See the docs, and specifically the thread pool, which lets you control the maximum number of active threads. Spawning one new thread per request would definitely be an inferior idea!
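Something along these lines, as a sketch (query_db stands in for your blocking SQLAlchemy call):

```python
from twisted.internet import reactor
from twisted.internet.threads import deferToThread

reactor.suggestThreadPoolSize(10)   # cap the reactor's worker threads

def query_db(user_id):
    # blocking SQLAlchemy work happens here, off the reactor thread
    return {'id': user_id}

def on_result(row):
    print('got', row)

d = deferToThread(query_db, 42)     # queued if all 10 threads are busy
d.addCallback(on_result)
d.addCallback(lambda _: reactor.stop())
reactor.run()
```

Because the pool has a fixed size, a flood of requests just queues work for the existing threads instead of spawning new ones.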