Handle multiple HTTP connections and a heavy blocking function like SSH

Handle multiple HTTP connections and a heavy blocking function like SSH - python

Scenario:
My server/application needs to handle multiple concurrent requests and for each request the server opens an ssh link to another m/c, runs a long command and sends the result back.
1 HTTP request comes → server starts 1 SSH connection → waits long time → sends back the SSH result as HTTP response
This should happen simultaneously for > 200 HTTP and SSH connections in real time.
Solution:
The server just has to do one task, that is, open an SSH connection for each HTTP request and keep the connection open. I can't even write the code in an asynchronous way as there is just one task to do: SSH. IOLoop will get blocked for each request. Callback and deferred functions don't provide an advantage as the ssh task runs for a long time. Threading sounds the only way out with event driven technique.
I have been going through tornado examples in Python but none suit my particular need:
tornado with twisted
gevent/eventlet pseudo multithreading
python threads
established HTTP servers like Apache
Environment:
Ubuntu 12.04, high RAM & net speed
Note:
I am bound to use python for coding and please be limited to my scenario only. Opening up multiple SSH links while keeping HTTP connections open sounds all asynch work but I want it to look like a synchronous activity.

If by ssh you mean actually running the ssh process, try tornado.process.Subprocess. If you want something more integrated into the server, twisted has an asynchronous ssh implementation. If you're using something like paramiko, threads are probably your best bet.

Related

Best way for client to fire off separate process without blocking client/server communication

The end result I am trying to achieve is allow a server to assign specific tasks to a client when it makes it's connection. A simplified version would be like this
Client connects to Server
Server tells Client to run some network task
Client receives task and fires up another process to complete task
Client tells Server it has started
Server tells Client it has another task to do (and so on...)
A couple of notes
There would be a cap on how many tasks a client can do
The client would need to be able to monitor the task/process (running? died?)
It would be nice if the client could receive data back from the process to send to the server if needed
At first, I was going to try threading, but I have heard python doesn't do threading correctly (is that right/wrong?)
Then it was thought to fire of a system call from python and record the PID. Then send certain signals to it for status, stop, (SIGUSR1, SIGUSR2, SIGINT). But not sure if that will work, because I don't know if I can capture data from another process. If you can, I don't have a clue how that would be accomplished. (stdout or a socket file?)
What would you guys suggest as far as the best way to handle this?

Use spawnProcess to spawn a subprocess. If you're using Twisted already, then this should integrate pretty seamlessly into your existing protocol logic.

Use Celery, a Python distributed task queue. It probably does everything you want or can be made to do everything you want, and it will also handle a ton of edge cases you might not have considered yet (what happens to existing jobs if the server crashes, etc.)
You can communicate with Celery from your other software using a messaging queue like RabbitMQ; see the Celery tutorials for details on this.
It will probably be most convenient to use a database such as MySQL or PostgreSQL to store information about tasks and their results, but you may be able to engineer a solution that doesn't use a database if you prefer.

What's the best way of communication between tornado and Python based daemon?

I use Tornado as the web server. I write some daemons with Python, which run in the server hardware. Sometimes the web server needs to send some data to the daemon and receives some computed results. There are two working:
1. Asynchronous mode: the server sends some data to the daemons, and it doesn't need the results soon. Can I use message queue to do it perfectly?
2. Synchronous mode: the server sends data to the daemons, and it will wait until it get the results. Should Iuse sockets?
So what's the best way of communication between tornado and Python based daemon?

ZeroMQ can be used for this purpose. It has various sockets for different purposes and it's fast enough to never be your bottleneck. For asynchronous you can use DEALER/ROUTER sockets and for strict synchronous mode you can use REQ/REP sockets.
You can use the python binding for this --> http://www.zeromq.org/bindings:python.
For the async mode you can try something like this from the zguide chapter 3 Router-to-dealer async routing :
In your case, the "client" in the diagram will be your web server and your daemon will be the "worker".
For synchronous you can try a simple request-reply broker or some variant to suit your need.
The diagram above shows a strictly synchronous cycle of send/recv at the REQ/REP sockets. Read through the zguide link to understand how it works. They also have a python code snippet on the page.

Depending on the scale - the simple thing is to just use HTTP and the AsyncHTTPClient in Tornado. For the request<->response case in our application we're going 300 connections/second with such an approach.
For the first case Fire and forget, you could also use AsyncHTTP and just have the server close out the connection and continue working...

How-To - Update Live Running Python Application

I have a python application , to be more precise a Network Application that can't go down this means i can't kill the PID since it actually talks with other servers and clients and so on ... many € per minute of downtime , you know the usual 24/7 system.
Anyway in my hobby projects i also work a lot with WSGI frameworks and i noticed that i have the same problem even during off-peak hours.
Anyway imagine a normal server using TCP/UDP ( put here your favourite WSGI/SIP/Classified Information Server/etc).
Now you perform a git pull in the remote server and there goes the new python files into the server (these files will of course ONLY affect the data processing and not the actual sockets so there is no need to re-raise the sockets or touch in any way the network part).
I don't usually use File monitors since i prefer to use SIGNAL to wakeup the internal app updater.
Now imagine the following code
from mysuper.app import handler
while True:
data = socket.recv()
if data:
socket.send(handler(data))
Lets imagine that handler is a APP with DB connections, cache connections , etc.
What is the best way to update the handler.
Is it safe to call reload(handler) ?
Will this break DB connections ?
Will DB Connections survive to this restart ?
Will current transactions be lost ?
Will this create anti-matter ?
What is the best-pratice patterns that you guys usually use if there are any ?

It's safe to call reload(handler).
Depends where you initialize your connections. If you make the connections inside handler(), then yes, they'll be garbage collected when the handler() object falls out of scope. But you wouldn't be connecting inside your main loop, would you? I'd highly recommend something like:
dbconnection = connect(...)
while True:
...
socket.send(handler(data, dbconnection))
if for no other reason than that you won't be making an expensive connection inside a tight loop.
That said, I'd recommend going with an entirely different architecture. Make a listener process that does basically nothing more than listen for UDP datagrams, sends them to a messaging queue like RabbitMQ, then waits for the reply message to send the results back to the client. Then write your actual servers that get their requests from the messaging queue, process them, and send a reply message back.
If you want to upgrade the UDP server, launch the new instance listening on another port. Update your firewall rules to redirect incoming traffic to the new port. Reload the rules. Kill the old process. Voila: seamless cutover.
The real win is from uncoupling your backend. Since multiple processes can listen for the same messages from your frontend "proxy" service, you can run several in parallel - on different machines, if you want to. To upgrade the backend, start a new instance then kill the old one so that there's no time when at least one instance isn't running.
To scale your proxy, have multiple instances running on different ports or different hosts, and configure your firewall to randomly redirect incoming datagrams to one of the proxies.
To scale your backend, run more instances.

simultaneously sending/receiving info from a server, in python?

I'm trying to figure out how to make a server that can accept multiple clients at one time. While doing so, I need the client able to send and receive data from the server at the same time.
Would i have to make a threaded server? And have a thread for listening for data.
And then another thread for sending out information to the client?
Then for the client side, do i need use threads to send/get info?

Use async IO. There are dozen of async IO socket libs for python. Here is a brief benchmark.
I also tested gevent, eventlet, asyncore, twisted, pyev, pycurl, tornado.
Twsited
is stable but most slow and also not easy to start with.
gevent, eventlet (libevent)
easy to start and fast (code looks like blocking) but have some issues with forking.
pycurl (libcurl)
fast and easy (if you ok to do flags magic.. but there are example) but only http.
pyev (libev)
you must understand what you are doing almost like polling yourself.
tornado (polling in python)
fast enough and i think stable and also easy to start.
asyncore
really fast.. but don't use it.. it is ugly-ugly.
Don't use threads in python unless you are really know what you are doing.
Python and threads not really big friends (unless version <3.2 in 3.2 there must be a new gil).

On server-side you clearly need a Socket Server. This server creates a new thread for every incoming client connection.
Once a connection is established, both the client and the thread that was instantiated for the communication require an additional thread if they have to do other business in parallel than listening to the socket if the communication is synchronous. In case an asynchronous communication is what you need, then Python provides an excellent Asynchronous Socket Handler.

Use a asynchronous socket. Example server could be found here and the client code here. No direct hassle with threads. Depending on your needs, you probably don't need the asynchronous client.

You don't need threads for either client or server; you can instead select() to multiplex all the I/O inside a single thread.

Message queue proxy in Python + Twisted

I want to implement a lightweight Message Queue proxy. It's job is to receive messages from a web application (PHP) and send them to the Message Queue server asynchronously. The reason for this proxy is that the MQ isn't always avaliable and is sometimes lagging, or even down, but I want to make sure the messages are delivered, and the web application returns immediately.
So, PHP would send the message to the MQ proxy running on the same host. That proxy would save the messages to SQLite for persistence, in case of crashes. At the same time it would send the messages from SQLite to the MQ in batches when the connection is available, and delete them from SQLite.
Now, the way I understand, there are these components in this service:
message listener (listens to the messages from PHP and writes them to a Incoming Queue)
DB flusher (reads messages from the Incoming Queue and saves them to a database; due to SQLite single-threadedness)
MQ connection handler (keeps the connection to the MQ server online by reconnecting)
message sender (collects messages from SQlite db and sends them to the MQ server, then removes them from db)
I was thinking of using Twisted for #1 (TCPServer), but I'm having problem with integrating it with other points, which aren't event-driven. Intuition tells me that each of these points should be running in a separate thread, because all are IO-bound and independent of each other, but I could easily put them in a single thread. Even though, I couldn't find any good and clear (to me) examples on how to implement this worker thread aside of Twisted's main loop.
The example I've started with is the chatserver.py, which uses service.Application and internet.TCPServer objects. If I start my own thread prior to creating TCPServer service, it runs a few times, but the it stops and never runs again. I'm not sure, why this is happening, but it's probably because I don't use threads with Twisted correctly.
Any suggestions on how to implement a separate worker thread and keep Twisted? Do you have any alternative architectures in mind?

You're basically considering writing an ad-hoc extension to your messaging server, the job of which it is to provide whatever reliability guarantees you've asked of it.
Instead, perhaps you should take the hardware where you were planning to run this new proxy and run another MQ node on it. The new node should take care of persisting and relaying messages that you deliver to it while the other nodes are overloaded or offline.

Maybe it's not the best bang for your buck to use a separate thread in Twisted to get around a blocking call, but sometimes the least evil solution is the best. Here's a link that shows you how to integrate threading into Twisted:
http://twistedmatrix.com/documents/10.1.0/core/howto/threading.html
Sometimes in a pinch easy-to-implement is faster than hours/days of research which may all turn out to be for nought.

A neat solution to this problem would be to use the Key Value store Redis. Its a high speed persistent data store, with plenty of clients - it has a php and a python client (if you want to use a timed/batch process to process messages - it saves you creating a database, and also deals with your persistence stories. It runs fine on Cywin/Windows + posix environments.
PHP Redis client is here.
Python client is here.
Both have a very clean and simple API. Redis also offers a publish/subscribe mechanism, should you need it, although it sounds like it would be of limited value if you're publishing to an inconsistent queue.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.