Python Twisted multithreaded TCP proxy

I am trying to write a TCP proxy using Python's Twisted framework. I started with Twisted's port forward example, and it seems to do the job in a standard scenario. The problem is that I have a rather peculiar scenario. What we need to do is process each TCP data packet and look for a certain pattern.
If the pattern matches, we need to run a certain process. This process takes anywhere between 30 and 40 seconds (I know it's not a good design, but currently that's how things stand). The trouble is that once this process starts, all other packets are held up until it completes. So if there are 100 live connections and even one of them triggers the process, the remaining 99 connections are stuck.
Is there a standard 'Twisted' way in which each connection/session is handled in a separate thread, so that the blocking process does not interfere with the other live connections?
Example code:
from time import sleep
from twisted.internet import reactor
from twisted.protocols import portforward
from twisted.internet import threads
def processingOperation(data):
    # doing the processing operation here (this blocks the reactor thread)
    sleep(30)
    return data
def server_dataReceived(self, data):
    if data.find("pattern we need to test") != -1:
        data = processingOperation(data)
    portforward.Proxy.dataReceived(self, data)
portforward.ProxyServer.dataReceived = server_dataReceived
def client_dataReceived(self, data):
    portforward.Proxy.dataReceived(self, data)
portforward.ProxyClient.dataReceived = client_dataReceived
reactor.listenTCP(8383, portforward.ProxyFactory('xxx.yyy.uuu.iii', 80))
reactor.run()

Of course there is. You defer the processing to a thread. For example:
def render_POST(self, request):
    # some code you may have to run before processing
    d = threads.deferToThread(method_that_does_the_processing, request)
    return ''
There is a catch: this returns before the processing is done, so the client gets its answer back right away. You might therefore want to return 202/Accepted instead of 200/OK (or my dummy '').
If you need to return after the processing is complete, you can use inlineCallbacks (http://twistedmatrix.com/documents/10.2.0/api/twisted.internet.defer.inlineCallbacks.html).
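As a rough illustration of the same idea outside Twisted, here is a stdlib-only sketch (Python 3, using concurrent.futures rather than Twisted's deferToThread). Only the packets that match the pattern are handed to worker threads; everything else is forwarded immediately. The names PATTERN, processing_operation and handle_packets are made up for this example, and the short sleep stands in for the 30-40 second job.

```python
import time
from concurrent.futures import ThreadPoolExecutor

PATTERN = b"pattern we need to test"  # placeholder pattern, as in the question


def processing_operation(data, delay):
    """Stand-in for the slow job; the real one takes 30-40 seconds."""
    time.sleep(delay)
    return data.upper()


def handle_packets(packets, delay=0.2):
    """Forward packets, pushing slow ones onto worker threads.

    Returns the packets in the order they would be forwarded: fast ones
    first, then the processed ones once their worker has finished.
    """
    forwarded = []
    pending = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for data in packets:
            if PATTERN in data:
                # Does not block: the slow job runs on a worker thread.
                pending.append(pool.submit(processing_operation, data, delay))
            else:
                forwarded.append(data)
        for future in pending:
            forwarded.append(future.result())
    return forwarded
```

In Twisted itself, the forwarding would go in a callback added to the Deferred that deferToThread returns; that callback runs in the reactor thread, so it can safely write to the transport.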

Related

Implement gethostbyaddr() with asyncore

I was having fun with socket.gethostbyaddr(), looking for a way to speed up some really simple code that generates IP addresses randomly and tries to resolve them. The problem comes when no host can be found: there is a timeout that can be really long (about 10 seconds...).
By chance, I found this article, which solves the problem using multithreading: https://www.depier.re/attempts_to_speed_up_gethostbyaddr/
I was wondering whether it is possible to do something equivalent using asyncore? That's what I tried first, but I failed miserably...
Here is a template:
import socket
import random
def get_ip():
    a = str(random.randint(140,150))
    b = str(random.randint(145,150))
    c = str(random.randint(145,150))
    for d in range(100):
        addr = a + "." + b + "." + c +"."+ str(1 + d)
        yield addr
for addr in get_ip():
    try:
        o = socket.gethostbyaddr(addr)
        print addr + "...Ok :"
        print "---->"+ str(o[0])
    except:
        print addr + "...Nothing"

You are looking for a way to convert several IPs to names (or vice versa) in parallel. Basically this is a DNS request/response operation, and gethostbyaddr performs the lookup synchronously, i.e. in a blocking manner: it sends the request, waits for the response, and returns the result.
asyncio and similar libraries use so-called coroutines and cooperative scheduling. Cooperative means that coroutines are written to support concurrency. A running coroutine explicitly returns control (using await or yield from) to a waiting scheduler, which then selects another coroutine and runs it until that one returns control, and so on. Only one coroutine can run at a time. For things to run smoothly, a coroutine must not execute code for long without returning control. A blocking operation in a coroutine blocks the whole program, which rules out calling gethostbyaddr directly.
A solution requires support for asynchronous DNS lookups. A coroutine sends the DNS request, sets a timeout, arranges for the DNS response to be passed to it, and returns control. That way multiple coroutines can send their requests one after another before they wait for all the responses.
There are third-party libraries for async DNS, but I have never used them. Looking at the aiodns examples, it seems quite easy to write the code you are looking for. asyncio.gather would probably be the core of such a function.
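A minimal sketch of the asyncio.gather idea, using only the standard library (Python 3.7+). Note the swap: instead of a true async DNS client such as aiodns, it pushes the blocking gethostbyaddr call onto the event loop's default thread pool with run_in_executor, so the lookups overlap but each one still occupies a thread while it waits. The function names and the resolver parameter are assumptions made for this example.

```python
import asyncio
import socket


async def reverse_lookup(addr, resolver=socket.gethostbyaddr):
    """Resolve one address off the event loop thread.

    Returns (addr, hostname), or (addr, None) when the lookup fails.
    """
    loop = asyncio.get_running_loop()
    try:
        result = await loop.run_in_executor(None, resolver, addr)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return addr, None
    return addr, result[0]


async def resolve_all(addrs, resolver=socket.gethostbyaddr):
    """Start every lookup, then wait for all of them; order is preserved."""
    return await asyncio.gather(
        *(reverse_lookup(addr, resolver) for addr in addrs)
    )
```

Something like asyncio.run(resolve_all(list_of_addresses)) would then return a list of (address, hostname-or-None) pairs. The number of lookups actually in flight is capped by the size of the default thread pool.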

Maintaining log file from multiple threads in Python

I have a Python BaseHTTPServer server which handles POST requests.
I used ThreadingMixIn, and it now opens a thread for each connection.
I wish to do several multithreaded actions, such as:
1. Monitoring successful/failed connection activity by adding 1 to a counter for each.
I need a lock for that. My counter is in the global scope of the same file. How can I do that?
2. I wish to maintain some sort of queue and write it to a file, where the content of the queue is a set of strings written from my different threads, which simply send some information for logging. How can it be done? I fail to accomplish this since my threading is done "behind the scenes": each time I'm in the do_POST(..) method, I'm already in a different thread.
Successful_Logins = 0
Failed_Logins = 0
LogsFile = open(logfile)
class httpHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        ..
class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
    pass
server = ThreadingHTTPServer(('localhost', PORT_NUMBER), httpHandler)
server.serve_forever()
This is a small fragment of my server.
Another thing that bothers me is the fact that I want to first send the POST response back to the client, and only then possibly get delayed by the locking mechanism or whatever.

From your code, it looks like a new httpHandler is constructed in each thread? If that's the case, you can use a class variable for the count and a mutex to protect it, like:
class httpHandler(...):
    # Note that these are class variables and are therefore accessible
    # to all instances
    numSuccess = 0
    numSuccessLock = threading.Lock()
    def do_POST(self):
        # Update the class attribute itself; "self.numSuccess += 1" would
        # create a per-instance copy instead of changing the shared count
        with httpHandler.numSuccessLock:
            httpHandler.numSuccess += 1
As for writing to a file from different threads, there are a few options:
Use the logging module, "The logging module is intended to be thread-safe without any special work needing to be done by its clients." from http://docs.python.org/2/library/logging.html#thread-safety
Use a Lock object like above to serialize writes to the file
Use a thread safe queue to queue up writes and then read from the queue and write to the file from a separate thread. See http://docs.python.org/2/library/queue.html#module-Queue for examples.
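To make the counter and the queue options concrete, here is a small self-contained sketch in Python 3 (where the Queue module is named queue). Handler threads only bump a lock-protected counter and put a string on the queue, both of which return quickly, while a single writer thread owns the file. LoginStats, log_writer and run_demo are hypothetical names for this example, and the worker function merely stands in for do_POST.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to finish


class LoginStats:
    """Counters shared by all handler threads, guarded by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.successful = 0
        self.failed = 0

    def record(self, ok):
        with self._lock:
            if ok:
                self.successful += 1
            else:
                self.failed += 1


def log_writer(log_queue, path):
    """Runs in one dedicated thread: the only code that touches the file."""
    with open(path, "a") as log_file:
        while True:
            line = log_queue.get()
            if line is _STOP:
                break
            log_file.write(line + "\n")


def run_demo(path, n_threads=4, n_events=10):
    stats = LoginStats()
    log_queue = queue.Queue()
    writer = threading.Thread(target=log_writer, args=(log_queue, path))
    writer.start()

    def worker(worker_id):
        # Stands in for do_POST running in a per-connection thread.
        for i in range(n_events):
            ok = (i % 2 == 0)
            stats.record(ok)
            log_queue.put("worker %d event %d ok=%s" % (worker_id, i, ok))

    workers = [threading.Thread(target=worker, args=(n,)) for n in range(n_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    log_queue.put(_STOP)  # every worker is done, so this is the last item
    writer.join()
    return stats
```

Because put() does not wait for the disk, a handler can send its response first and enqueue the log line afterwards without being delayed by file I/O.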

How can I keep multiple gevent servers serving forever?

Currently, I have an application that has two servers: the first processes orders and responds individually, the second broadcasts results to other interested subscribers. They need to be served from different ports. I can start() both of them, but I can only get one or the other to serve_forever() as I read it is a blocking function. I am looking for ideas on how to keep both the servers from exiting. abbreviated code below:
def main():
    stacklist = []
    subslist = []
    stacklist.append(CreateStack('stuff'))
    subslist.append(Subscription('stuff'))
    bcastserver = BroadcastServer(subslist) # creates a new server
    tradeserver = TradeServer(stacklist) # creates a new server
    bcastserver.start() # start accepting new connections
    tradeserver.start() # start accepting new connections
    #bcastserver.serve_forever() #if I do it here, the first one...
    #tradeserver.serve_forever() #blocks the second one
class TradeServer(StreamServer):
    def __init__(self, stacklist):
        self.stacklist = stacklist
        StreamServer.__init__(self, ('localhost', 12345), self.handle)
        #self.serve_forever() #If I put it here in both, neither works
    def handle(self, socket, address):
        #handler here
class BroadcastServer(StreamServer):
    def __init__(self, subslist):
        StreamServer.__init__(self, ('localhost', 8000), self.handle)
        self.subslist = subslist
        #self.serve_forever() #If I put it here in both, neither works
    def handle(self, socket, address):
        #handler here
Perhaps I just need a way to keep the two from exiting, but I'm not sure how. In the end, I want both servers to listen forever for incoming connections and handle them.

I know this question has an accepted answer, but there is a better one. I'm adding it for people like me who find this post later.
As described in the gevent documentation about servers:
The BaseServer.serve_forever() method calls BaseServer.start() and then waits until interrupted or until the server is stopped.
So you can just do:
def main():
    stacklist = []
    subslist = []
    stacklist.append(CreateStack('stuff'))
    subslist.append(Subscription('stuff'))
    bcastserver = BroadcastServer(subslist) # creates a new server
    tradeserver = TradeServer(stacklist) # creates a new server
    bcastserver.start() # starts accepting bcast connections and returns
    tradeserver.serve_forever() # starts accepting trade connections and blocks until tradeserver stops
    bcastserver.stop() # stops also the bcast server
The gevent introduction documentation explains why this works:
Unlike other network libraries, though in a similar fashion as
eventlet, gevent starts the event loop implicitly in a dedicated
greenlet. There’s no reactor that you must call a run() or dispatch()
function on. When a function from gevent’s API wants to block, it
obtains the gevent.hub.Hub instance — a special greenlet that runs the
event loop — and switches to it (it is said that the greenlet yielded
control to the Hub).
When serve_forever() blocks, it does not prevent either server from continuing communication.
Note: In the above code the trader server is the one that decides when the whole application stops. If you want the broadcast server to decide this, you should swap them in the start() and serve_forever() calls.

OK, I was able to do this using threading together with gevent's monkey-patching module:
import threading
from gevent import monkey
def main():
    monkey.patch_thread()
    # etc, etc
    t = threading.Thread(target=bcastserver.serve_forever)
    t.setDaemon(True)
    t.start()
    tradeserver.serve_forever()

Start each server loop in its own instance of Python (one console per gevent). I've never understood trying to run multiple servers from one program. You can run the same server many times and use a reverse proxy like nginx to load balance and route accordingly.

Sending data through a socket from another thread does not work in Python

This is my 'game server'. It's nothing serious, I thought this was a nice way of learning a few things about python and sockets.
First, the Server class initializes the server.
Then, when someone connects, we create a client thread. In this thread we continually listen on our socket.
Once a certain command comes in (I12345001001, for example) it spawns another thread.
The purpose of this last thread is to send updates to the client.
But even though I see the server is performing this code, the data isn't actually being sent.
Could anyone tell where it's going wrong?
It's like I have to receive something before I'm able to send. So I guess somewhere I'm missing something
#!/usr/bin/env python
"""
An echo server that uses threads to handle multiple clients at a time.
Entering any line of input at the terminal will exit the server.
"""
import select
import socket
import sys
import threading
import time
import Queue
globuser = {}
queue = Queue.Queue()
class Server:
    def __init__(self):
        self.host = ''
        self.port = 2000
        self.backlog = 5
        self.size = 1024
        self.server = None
        self.threads = []
    def open_socket(self):
        try:
            self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            self.server.bind((self.host,self.port))
            self.server.listen(5)
        except socket.error, (value,message):
            if self.server:
                self.server.close()
            print "Could not open socket: " + message
            sys.exit(1)
    def run(self):
        self.open_socket()
        input = [self.server,sys.stdin]
        running = 1
        while running:
            inputready,outputready,exceptready = select.select(input,[],[])
            for s in inputready:
                if s == self.server:
                    # handle the server socket
                    c = Client(self.server.accept(), queue)
                    c.start()
                    self.threads.append(c)
                elif s == sys.stdin:
                    # handle standard input
                    junk = sys.stdin.readline()
                    running = 0
        # close all threads
        self.server.close()
        for c in self.threads:
            c.join()
class Client(threading.Thread):
    initialized=0
    def __init__(self,(client,address), queue):
        threading.Thread.__init__(self)
        self.client = client
        self.address = address
        self.size = 1024
        self.queue = queue
        print 'Client thread created!'
    def run(self):
        running = 10
        isdata2=0
        receivedonce=0
        while running > 0:
            if receivedonce == 0:
                print 'Wait for initialisation message'
                data = self.client.recv(self.size)
                receivedonce = 1
            if self.queue.empty():
                print 'Queue is empty'
            else:
                print 'Queue has information'
                data2 = self.queue.get(1, 1)
                isdata2 = 1
                if data2 == 'Exit':
                    running = 0
                    print 'Client is being closed'
                    self.client.close()
            if data:
                print 'Data received through socket! First char: "' + data[0] + '"'
                if data[0] == 'I':
                    print 'Initializing user'
                    user = {'uid': data[1:6] ,'x': data[6:9], 'y': data[9:12]}
                    globuser[user['uid']] = user
                    print globuser
                    initialized=1
                    self.client.send('Beginning - Initialized'+';')
                    m=updateClient(user['uid'], queue)
                    m.start()
                else:
                    print 'Reset receivedonce'
                    receivedonce = 0
                print 'Sending client data'
                self.client.send('Feedback: ' +data+';')
                print 'Client Data sent: ' + data
            data=None
            if isdata2 == 1:
                print 'Data2 received: ' + data2
                self.client.sendall(data2)
                self.queue.task_done()
                isdata2 = 0
            time.sleep(1)
            running = running - 1
        print 'Client has stopped'
class updateClient(threading.Thread):
    def __init__(self,uid, queue):
        threading.Thread.__init__(self)
        self.uid = uid
        self.queue = queue
        global globuser
        print 'updateClient thread started!'
    def run(self):
        running = 20
        test=0
        while running > 0:
            test = test + 1
            self.queue.put('Test Queue Data #' + str(test))
            running = running - 1
            time.sleep(1)
        print 'Updateclient has stopped'
if __name__ == "__main__":
    s = Server()
    s.run()

I don't understand your logic -- in particular, why you deliberately set up two threads writing at the same time on the same socket (which they both call self.client), without any synchronization or coordination, an arrangement that seems guaranteed to cause problems.
Anyway, a definite bug in your code is your use of the send method: you appear to believe that it is guaranteed to send all of its argument string, but that's very definitely not the case. See the docs:
Returns the number of bytes sent.
Applications are responsible for
checking that all data has been sent;
if only some of the data was
transmitted, the application needs to
attempt delivery of the remaining
data.
sendall is the method that you probably want:
Unlike send(), this method continues
to send data from string until either
all data has been sent or an error
occurs.
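To make the difference concrete, this is roughly the loop you would have to write around send() yourself, and which sendall() performs for you. It is a Python 3 sketch with bytes payloads; send_all is a made-up helper name, not part of the socket module.

```python
import socket


def send_all(sock, data):
    """Keep calling send() until every byte of data has been handed over.

    Returns the total number of bytes sent.
    """
    view = memoryview(data)
    total = 0
    while total < len(view):
        # send() may accept only part of what it is given; it returns
        # how many bytes it actually took.
        sent = sock.send(view[total:])
        total += sent
    return total
```

In practice, calling sock.sendall(data) is the simpler choice.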
Other problems include the fact that updateClient is apparently designed to never terminate (differently from the other two thread classes -- when those terminate, updateClient instances won't, and they'll just keep running and keep the process alive), redundant global statements (innocuous, just confusing), some threads trying to read a dict (via the iteritems method) while other threads are changing it, again without any locking or coordination, etc, etc -- I'm sure there may be even more bugs or problems, but, after spotting several, one's eyes tend to start to glaze over;-).

You have three major problems. The first problem is likely the answer to your question.
Blocking (Question Problem)
The socket.recv is blocking. This means that execution is halted and the thread goes to sleep until it can read data from the socket. So your third update thread just fills the queue up but it only gets emptied when you get a message. The queue is also emptied by one message at a time.
This is likely why it will not send data unless you send it data.
Message Protocol On Stream Protocol
You are trying to use the socket stream like a message stream. What I mean is you have:
self.server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
The SOCK_STREAM part says it is a byte stream, not a message-oriented socket such as SOCK_DGRAM. TCP has no notion of message boundaries, so you have to build messages yourself, such as:
data =struct.pack('I', len(msg)) + msg
socket.sendall(data)
The receiving end then looks for the length field and reads the data into a buffer. Once enough data is in the buffer, it can pull out the entire message.
Your current setup is working because your messages are small enough to all be placed into the same packet and also placed into the socket buffer together. However, once you start sending large data over multiple calls with socket.send or socket.sendall you are going to start having multiple messages and partial messages being read unless you implement a message protocol on top of the socket byte stream.
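Here is a minimal sketch of such a length-prefixed protocol, covering both ends (Python 3, bytes payloads). The helper names send_msg, recv_exact and recv_msg are my own, and I use '!I' (a 4-byte unsigned length in network byte order) instead of the native-order 'I' above so that both ends agree regardless of platform.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length, network byte order


def send_msg(sock, payload):
    """Send one message: length header followed by the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes, however many recv() calls that takes."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_msg(sock):
    """Read one complete message, as framed by send_msg."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this in place, two messages sent back to back always come out as two separate messages on the other side, no matter how TCP splits or merges the underlying packets.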
Threads
Even though threads can be easier to use when starting out, they come with a lot of problems and can degrade performance if used incorrectly, especially in Python. I love threads, so do not get me wrong. Python has a problem with the GIL (global interpreter lock), so you get bad performance when using threads that are CPU bound. Your code is mostly I/O bound at the moment, but in the future it may become CPU bound. You also have to worry about locking with threading. A thread can be a quick fix, but it may not be the best fix. There are circumstances where threading is quite simply the easiest way to break up some time-consuming process, so do not discard threads as evil or terrible. In Python they are considered bad mainly because of the GIL, and in any language because of concurrency issues, so most people recommend using multiple processes or asynchronous code with Python. Whether to use a thread or not is a complex subject: it depends on the language (the way your code is run), the system (single or multiple processors), contention (trying to share a resource with locking), and other factors. Generally, though, asynchronous code is faster because it makes better use of the CPU with less overhead, especially if you are not CPU bound.
The solution is the usage of the select module in Python, or something similar. It will tell you when a socket has data to be read, and you can set a timeout parameter.
You can gain more performance by doing asynchronous work (asynchronous sockets). To turn a socket into asynchronous mode you simply call socket.settimeout(0) which will make it not block. However, you will constantly eat CPU spinning waiting for data. The select module and friends will prevent you from spinning.
Generally, for performance you want to do as much as possible asynchronously (in the same thread), and then expand with more threads that also do as much asynchronously as possible. However, as previously noted, Python is an exception to this rule because of the GIL (global interpreter lock), which can actually degrade performance, from what I have read. If you are interested, you should try writing a test case and find out!
You should also check out the thread locking primitives in the threading module: Lock, RLock, and Condition. They can help multiple threads share data without problems.
lock = threading.Lock()
def myfunction(arg):
    with lock:
        arg.do_something()
Some Python objects are thread safe and others are not.
Sending Updates Asynchronously (Improvement)
Instead of using a third thread only to send updates, you could use the client thread to send them by comparing the current time with the last time an update was sent. This would remove the need for a Queue and a Thread. To do this you must convert your client code into asynchronous code and put a timeout on your select, so that you can check at intervals whether an update is due.
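A rough sketch of that single-thread loop in Python 3: select waits for incoming data but gives up when the next update is due, so the same thread both answers the client and pushes periodic updates. The function name serve_client, the interval and the message formats are placeholders for this example, and max_updates exists only so that the demo terminates.

```python
import select
import socket
import time


def serve_client(sock, update_interval=1.0, max_updates=3):
    """Echo incoming data and send timed updates from a single thread.

    Returns the number of updates sent.
    """
    next_update = time.monotonic() + update_interval
    sent_updates = 0
    while sent_updates < max_updates:
        # Wait for data, but no longer than the time left until the next update.
        timeout = max(0.0, next_update - time.monotonic())
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            data = sock.recv(1024)
            if not data:
                break  # the client closed the connection
            sock.sendall(b"Feedback: " + data + b";")
        if time.monotonic() >= next_update:
            sent_updates += 1
            sock.sendall(b"Update #%d;" % sent_updates)
            next_update += update_interval
    return sent_updates
```

Handling several clients in one thread follows the same pattern, with every client socket in the list passed to select and one next_update time per client.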
Summary
I would recommend you rewrite your code using asynchronous socket code. I would also recommend that you use a single thread for all clients and the server. This will improve performance and decrease latency. It would make debugging easier because you would have no possible concurrency issues like you have with threads. Also, fix your message protocol before it fails.

Proper way of cancelling accept and closing a Python processing/multiprocessing Listener connection

(I'm using the pyprocessing module in this example, but replacing processing with multiprocessing should probably work if you run python 2.6 or use the multiprocessing backport)
I currently have a program that listens on a Unix socket (using a processing.connection.Listener), accepts connections, and spawns a thread to handle each request. At a certain point I want to quit the process gracefully, but the accept() call is blocking and I see no nice way of cancelling it. I have one way that works here (OS X) at least: setting a signal handler and signalling the process from another thread, like so:
import processing
from processing.connection import Listener
import threading
import time
import os
import signal
import socket
import errno
# This is actually called by the connection handler.
def closeme():
    time.sleep(1)
    print 'Closing socket...'
    listener.close()
    os.kill(processing.currentProcess().getPid(), signal.SIGPIPE)
oldsig = signal.signal(signal.SIGPIPE, lambda s, f: None)
listener = Listener('/tmp/asdf', 'AF_UNIX')
# This is a thread that handles one already accepted connection, left out for brevity
threading.Thread(target=closeme).start()
print 'Accepting...'
try:
    listener.accept()
except socket.error, e:
    if e.args[0] != errno.EINTR:
        raise
# Cleanup here...
print 'Done...'
The only other way I've thought about is reaching deep into the connection (listener._listener._socket) and setting the non-blocking option...but that probably has some side effects and is generally really scary.
Does anyone have a more elegant (and perhaps even correct!) way of accomplishing this? It needs to be portable to OS X, Linux and BSD, but Windows portability etc is not necessary.
Clarification:
Thanks all! As usual, ambiguities in my original question are revealed :)
I need to perform cleanup after I have cancelled the listening, and I don't always want to actually exit that process.
I need to be able to access this process from other processes not spawned from the same parent, which makes Queues unwieldy
The reasons for threads are that:
They access a shared state. Actually more or less a common in-memory database, so I suppose it could be done differently.
I must be able to have several connections accepted at the same time, but the actual threads are blocking for something most of the time. Each accepted connection spawns a new thread; this in order to not block all clients on I/O ops.
Regarding threads vs. processes, I use threads for making my blocking ops non-blocking and processes to enable multiprocessing.

Isn't that what select is for?
Only call accept on the socket if select indicates it will not block.
select takes a timeout, so you can break out occasionally to check whether it's time to shut down.
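Something along these lines, sketched in Python 3 with a plain listening socket standing in for the Listener (an assumption on my part, since Listener does not expose its socket publicly). accept_until_stopped, stop_event and handler are hypothetical names; the stop flag is a threading.Event that any other thread can set.

```python
import select
import socket
import threading


def accept_until_stopped(server_sock, stop_event, handler, poll_interval=0.5):
    """Accept connections until stop_event is set.

    select() waits at most poll_interval seconds, so the loop notices a
    shutdown request promptly and accept() is only called when it will
    not block. Returns the number of connections accepted.
    """
    accepted = 0
    while not stop_event.is_set():
        readable, _, _ = select.select([server_sock], [], [], poll_interval)
        if server_sock in readable:
            conn, _addr = server_sock.accept()
            handler(conn)  # e.g. start a thread that serves this connection
            accepted += 1
    return accepted
```

Once the function returns, the caller is free to do its cleanup, close the listening socket, and keep the process running if it wants to.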

I thought I could avoid it, but it seems I have to do something like this:
from processing import connection
connection.Listener.fileno = lambda self: self._listener._socket.fileno()
import select
l = connection.Listener('/tmp/x', 'AF_UNIX')
r, w, e = select.select((l, ), (), ())
if l in r:
    print "Accepting..."
    c = l.accept()
    # ...
I am aware that this breaks the law of demeter and introduces some evil monkey-patching, but it seems this would be the most easy-to-port way of accomplishing this. If anyone has a more elegant solution I would be happy to hear it :)

I'm new to the multiprocessing module, but it seems to me that mixing the processing module and the threading module is counter-intuitive. Aren't they targeted at solving the same problem?
Anyway, how about wrapping your listen functions into a process itself? I'm not clear how this affects the rest of your code, but this may be a cleaner alternative.
from multiprocessing import Process
from multiprocessing.connection import Listener
class ListenForConn(Process):
    def run(self):
        listener = Listener('/tmp/asdf', 'AF_UNIX')
        listener.accept()
        # do your other handling here
listen_process = ListenForConn()
listen_process.start()
print listen_process.is_alive()
listen_process.terminate()
listen_process.join()
print listen_process.is_alive()
print 'No more listen process.'

Probably not ideal, but you can release the block by sending the socket some data from the signal handler or the thread that is terminating the process.
EDIT: Another way to implement this might be to use the Connection Queues, since they seem to support timeouts (apologies, I misread your code in my first read).

I ran into the same issue. I solved it by sending a "stop" command to the listener. In the listener's main thread (the one that processes the incoming messages), every time a new message is received, I just check to see if it's a "stop" command and exit out of the main thread.
Here's the code I'm using:
def start(self):
    """
    Start listening
    """
    # set the command being executed
    self.command = self.COMMAND_RUN
    # startup the 'listener_main' method as a daemon thread
    self.listener = Listener(address=self.address, authkey=self.authkey)
    self._thread = threading.Thread(target=self.listener_main, daemon=True)
    self._thread.start()
def listener_main(self):
    """
    The main application loop
    """
    while self.command == self.COMMAND_RUN:
        # block until a client connection is received
        with self.listener.accept() as conn:
            # receive the subscription request from the client
            message = conn.recv()
            # if it's a shut down command, return to stop this thread
            if isinstance(message, str) and message == self.COMMAND_STOP:
                return
            # process the message
def stop(self):
    """
    Stops the listening thread
    """
    self.command = self.COMMAND_STOP
    client = Client(self.address, authkey=self.authkey)
    client.send(self.COMMAND_STOP)
    client.close()
    self._thread.join()
I'm using an authentication key to prevent would-be hackers from shutting down my service by sending a stop command from an arbitrary client.
Mine isn't a perfect solution. It seems a better solution might be to revise the code in multiprocessing.connection.Listener, and add a stop() method. But, that would require sending it through the process for approval by the Python team.
