I am currently building a TCP server. It is going to be used by my company, so it has to be of production quality.
My requirements are:
The server must be fast (able to handle a large number of requests simultaneously, because our clients regularly send large files and this can become a bottleneck).
The server must be easy to maintain.
It must support as many users as possible at the same time.
It must be a TCP server, because it receives messages in a protocol our company developed, and I need to parse them myself.
After checking the options, I chose Python Twisted, because it seems to meet the first requirement (and since it is Python, the second requirement more or less takes care of itself).
After reading Twisted's documentation I hit a problem I have not yet found an elegant answer to: my clients send me fairly large packets, and I make decisions based on subsequences within these packets.
For example, if I receive 1,000 bytes that are all zeros followed by another 5,000 bytes that are all 0x10, I will send back "Hello world"; if I instead receive 2,000 bytes that are all 0x50, I will answer "Hello everyone".
My problem with Twisted is that the data eventually arrives at protocol.Protocol and is handled by the dataReceived(self, data) method, which is called once per chunk. That means that if I only get some of the bytes on the first call and the rest on a later call, I have nowhere to keep the data from the first call...
I cannot store the data in the protocol.Factory, because I will be talking to multiple clients simultaneously and one client would then clobber another's data; for the same reason I cannot use globals.
I'm pretty sure I'm not the first to run into this problem. I've seen several solutions online that involved re-implementing protocol.Protocol, and they really were not elegant...
Is there a simple and elegant way to solve this problem?
(The solution must be elegant because I will add multithreading on top of it, since what I send back to the client is much more than "Hello World" and I do not want to block the server.)
By the way, if someone with experience can recommend a better solution than Twisted, I am more than happy to hear it.
Thank you
yoko
It sounds like you need to maintain some per-connection state. That's the minimum you can store and still implement such a protocol. Your protocol class is instantiated once per connection, so you can use attributes on that instance to store things like zeros_received and twos_received.
def connectionMade(self):
    # per-connection counters live on the protocol instance
    self.zeros_received = 0
    self.twos_received = 0

def dataReceived(self, data):
    x = -1  # index of the last byte consumed by the zeros check
    if self.zeros_received != 1000:
        for x, b in enumerate(data):
            if b != 0x00:
                pass  # handle unexpected byte error
            self.zeros_received += 1
            if self.zeros_received == 1000:
                break
    if self.zeros_received == 1000 and self.twos_received != 5000:
        for b in data[x + 1:]:
            if b != 0x10:
                pass  # handle unexpected byte error
            self.twos_received += 1
            if self.twos_received == 5000:
                break
    if self.zeros_received == 1000 and self.twos_received == 5000:
        pass  # send hello...
An easier solution would be to buffer the data from the client and wait (with a timeout on the connection) until you've received the first 6,000 bytes. I'd also be careful about premature optimization: you assume now that this will be your bottleneck, but such assumptions are often wrong. Implement a naive solution first (using a buffered reader), then benchmark speed and memory usage and see what actually needs improving.
def dataReceived(self, data):
    self.data += data
    if len(self.data) >= 6000:
        # all() takes an iterable, so use generator expressions here
        assert all(b == 0x00 for b in self.data[:1000]), 'expected 0x00'
        assert all(b == 0x10 for b in self.data[1000:6000]), 'expected 0x10'
        # send hello
I'm building a live radio streamer, and I was wondering how I should handle multiple connections. In my experience, select blocks the audio from being streamed: it only plays 3 seconds and then stops. I will provide an example of what I mean.
import socket, select

# HTTP response header sent to each client before the audio data
headers = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: audio/mpeg\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)

filename = "/path/to/file.mp3"
bufsize = 4096  # no idea what this should really be, but python-shout uses this amount
sock = socket.socket()
cons = list()

def runMe():
    sock.bind(('', 8000))  # port number chosen arbitrarily for this example
    sock.listen(5)
    cons.append(sock)
    f = open(filename, 'rb')
    nbuf = f.read(bufsize)  # current buffer
    while True:
        buf = nbuf
        nbuf = f.read(bufsize)
        if len(buf) == 0:
            break
        rl, wl, xl = select.select(cons, [], [], 0.2)
        for s in rl:
            if s == sock:
                con, addr = s.accept()
                con.setblocking(0)
                cons.append(con)
                con.send(headers)
            else:
                data = s.recv(1024)
                if not data:
                    s.close()
                    cons.remove(s)
                else:
                    s.send(buf)
That is an example of how I'd use select, but the song will not play all the way through. If I send outside the select loop it plays, but then it dies on a second connection. Should I use threading?
You can do it either way, but if your select-implementation isn't working properly it's because your code is incorrect, not because a select-based implementation isn't capable of doing the job -- and I don't think a multithreaded solution will be easier to get right than a select-based solution.
Regardless of which implementation you choose, one issue you're going to have to think about is timing/throughput. Do you want your program to send out the audio data at approximately the same rate it is meant to be played back, or do you want to send out audio data as fast as the client is willing to read it, and leave it up to the client to read the data at the appropriate speed? Keep in mind that each TCP stream's send-rate will be different, depending on how fast the client chooses to recv() the information, as well as on how well the network path between your server and the client performs.
The next problem to deal with after that is the problem of a slow client -- what do you want your program to do when one of the TCP connections is very slow, e.g. due to network congestion? Right now your code just blindly calls send() on all sockets without checking the return value, which (given that the sockets are non-blocking) means that if a given socket's output-buffer is full, then some (probably many) bytes of the file will simply get dropped -- maybe that is okay for your purpose, I don't know. Will the clients be able to make use of an mp3 data stream that has arbitrary sections missing? I imagine that the person running that client will hear glitches, at best.
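For illustration only, here is a rough sketch of how per-connection output buffers could look with select(). The names (outbufs, queue_chunk, pump) are invented for this sketch, and pacing the file reads to the playback rate is still left out:

import select

# hypothetical structure: one outgoing buffer per connected client socket
outbufs = {}

def queue_chunk(chunk):
    # append the next block of the file to every client's buffer;
    # a slow client's buffer simply keeps growing here
    for s in outbufs:
        outbufs[s] += chunk

def pump(listener):
    clients = list(outbufs)
    want_write = [s for s in clients if outbufs[s]]
    rl, wl, _ = select.select([listener] + clients, want_write, [], 0.2)
    for s in wl:
        sent = s.send(outbufs[s])       # send() reports how many bytes the kernel accepted
        outbufs[s] = outbufs[s][sent:]  # keep whatever was not accepted yet
    for s in rl:
        if s is listener:
            con, addr = listener.accept()
            con.setblocking(0)
            outbufs[con] = ""
        elif not s.recv(1024):          # empty read means the client disconnected
            s.close()
            del outbufs[s]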
Implementation issues aside, if it was me I'd prefer the single-threaded/select() approach, simply because it will be easier to test and validate. Either approach is going to take some doing to get right, but with a single thread, your program's behavior is much more deterministic -- either it works right or it doesn't, and running a given test will generally give the same result each time (assuming consistent network conditions). In a multithreaded program, OTOH, the scheduling of the threads is non-deterministic, which makes it very easy to end up with a program that works correctly 99.99% of the time and then seriously malfunctions, but only once in a blue moon -- a situation that can be very difficult to debug, as you end up spending hours or days just reproducing the fault, let alone diagnosing and fixing it.
I have a really simple code that allows me to send an image from client to server. And it works.
As simple as this:
On the client side...
def sendFile(self):
    image = open(picname, 'rb')  # read the image as binary data
    data = image.read()
    self.transport.write(data)
On the server side...
def dataReceived(self, data):
print 'Received'
f = open("image.png",'wb')
f.write(data)
f.close()
The problem is that this only works if the image is up to 4-something kB; it stops working when the image is bigger (it definitely fails by the time it reaches 6 kB). That is when I see "Received" printed more than once, which makes me think the data is being split into smaller chunks. However, even though those chunks reach the server (I can see the repeated print from dataReceived), the image is corrupted and can't be opened.
I don't know that much about protocols, but I assumed TCP was supposed to be reliable, so packets arriving in a different order or the like shouldn't... happen? So I was thinking that maybe Twisted is doing something there that I'm not aware of, and maybe I should use another protocol.
So here is my question: is there something I could do now to make this work, or should I definitely switch to another protocol? If so, any ideas? My goal is to send bigger images, maybe on the order of hundreds of kB.
This is a variant of an entry in the Twisted FAQ:
Why is protocol.dataReceived called with only part of the data I called transport.write with?
TCP is a stream-based protocol. It is delivering a stream of bytes, which may be broken up into an arbitrary number of fragments. If you write one big blob of bytes, it may be broken up into an arbitrary number of smaller chunks, depending on the characteristics of your physical network connection. When you say that TCP should be "reliable", and that the chunks should arrive in order, you are roughly correct: however, what arrives in order is the bytes, not the chunks.
What you are doing in your dataReceived method is, upon receiving each chunk, opening a file and writing the contents of just that chunk to "image.png", then closing it. If you change it to open the file in connectionMade and close the file in connectionLost you should see at least vaguely the right behavior, although this will still cause you to get corrupted / truncated images if the connection is lost unexpectedly, with no warning. You should really use a framing protocol like AMP; although if you're just sending big blobs of data around, HTTP is probably a better choice.
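A minimal sketch of that suggestion (the class name and output file name are placeholders, and the truncation caveat above still applies):

from twisted.internet import protocol

class ImageReceiver(protocol.Protocol):
    """Write every chunk of the connection's byte stream into one file."""

    def connectionMade(self):
        self.outfile = open("image.png", "wb")

    def dataReceived(self, data):
        self.outfile.write(data)   # append each chunk as it arrives

    def connectionLost(self, reason):
        self.outfile.close()       # an unexpected disconnect still leaves a truncated file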
Here's a simple question and I'm surprised I haven't come across a similar one already.
I would like two processes to send strings (messages) to each other with send() and receive() functions. Here's a basic example:
# Process 1
# ... deal with sockets, connect to process 2 ...
msg = 'An arbitrarily long string\nMaybe with line breaks'
conn.send(msg)
msg = conn.receive()
if process1(msg):
conn.send('ok')
else:
conn.send('nok')
and
# Process 2
# ... deal with sockets, connect to process 1 ...
msg = conn.receive()
conn.send(process2(msg))
msg = conn.receive()
if msg == 'ok':
print('Success')
elif msg == 'nok':
print('Failure')
else:
print('Protocol error')
I know it is fairly easy with bare stream sockets, but that is still cumbersome and error-prone (calling conn.recv() in a loop and checking a size header, like HTTP, or watching for an end-of-stream marker, like SMTP, etc.).
By the way, it doesn't necessarily need to use sockets, as long as messages of any size can be reliably carried through the network in an efficient manner.
Am I doing something wrong? Isn't there a simple library (Twisted AMP doesn't look simple) doing exactly that? I've been searching the Internet for a few hours without success :)
You can use ZeroMQ; there is an excellent Python binding called pyzmq. It is a library for writing all kinds of distributed software, based on the concept of message queues. The project has received a lot of attention lately, and you will find numerous examples and tutorials on the web.
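For example, a minimal request/reply pair with pyzmq could look like this (the port and payloads are arbitrary); each recv() hands you one complete message, however it was split on the wire:

import zmq

# --- process 2: reply side ---
ctx = zmq.Context()
rep = ctx.socket(zmq.REP)
rep.bind("tcp://*:5555")
msg = rep.recv()          # one whole message, however long it is
rep.send(b"ok")

# --- process 1: request side (runs in the other process) ---
ctx = zmq.Context()
req = ctx.socket(zmq.REQ)
req.connect("tcp://localhost:5555")
req.send(b"An arbitrarily long string\nMaybe with line breaks")
print(req.recv())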
I've set up a server by reusing the code found in the documentation, where I have self.data = self.request.recv(1024).strip().
But how do I go from this to deserializing it into a protobuf message (Message.proto/Message_pb2.py)? Right now it seems to be receiving chunks of 1024 bytes, and more than one at a time... making it all rubbish :D
TCP is typically just a stream of data. Just because you sent each packet as a unit, doesn't mean the receiver gets that. Large messages may be split into multiple packets; small messages may be combined into a single packet.
The only way to interpret multiple messages over TCP is with some kind of "framing". With text-based protocols, a CR/LF/CRLF/zero-byte might signify the end of each frame, but that won't work with binary protocols like protobuf. In such cases, the most common approach is to simply prepend each message with the length, for example in a fixed-size (4 bytes?) network-byte-order chunk. Then the payload. In the case of protobuf, the API for your platform may also provide a mechanism to write the length as a "varint".
Then, reading is a matter of:
read an entire length-header
read (and buffer) that many bytes
process the buffered data
rinse and repeat
But keep in mind that a single packet might contain the end of one message, two complete messages, and the start of another message (maybe only half of its length-header, just to make it interesting). So keeping track of exactly where you are in the stream at any point becomes paramount.
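A sketch of that read loop in Python, assuming a fixed 4-byte big-endian length prefix (protobuf's varint framing would need a different header parser); the buffer survives across recv() calls precisely so that a frame can straddle packets:

import struct

buf = b""

def feed(newly_received, handle_message):
    """Accumulate bytes and peel off every complete length-prefixed frame."""
    global buf
    buf += newly_received
    while True:
        if len(buf) < 4:
            return                        # not even a whole length header yet
        (length,) = struct.unpack(">I", buf[:4])
        if len(buf) < 4 + length:
            return                        # header present, payload still incomplete
        payload, buf = buf[4:4 + length], buf[4 + length:]
        handle_message(payload)           # parse the protobuf message from payload here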
I have a machine control system in Python that currently looks roughly like this
goal = GoalState()
while True:
current = get_current_state()
move_toward_goal(current,goal)
Now, I'm trying to add in the ability to control the machine over the network. The code I want to write would be something like this:
goal = GoalState()
while True:
if message_over_network():
goal = new_goal_from_message()
current = get_current_state()
move_toward_goal(current,goal)
What would be the simplest and most Pythonic way of adding this sort of networking capability to my application? Sockets could work, though they don't feel particularly Pythonic. I've looked at XMLRPC and Twisted, but both seemed like they would require major revisions to the code. I also looked at ØMQ, but it felt like I was adding an external dependency that didn't offer anything I didn't already have with sockets.
I'm not opposed to using any of the systems that I've addressed above, as what I believe to be failings are probably misunderstandings on my part. I'm simply curious as to the idiomatic way of handling this simple, common issue.
There are at least two issues you need to decide on:
How to exchange messages?
In what format?
Regarding 1: TCP sockets are the lowest level, and you would need to deal with low-level details like recognizing message boundaries. Also, a TCP connection gives you reliable delivery, but only as long as the connection is not reset (due to, for example, a temporary network failure). If you want your application to recover gracefully when a TCP connection resets, you need to implement some form of message acknowledgement to keep track of what needs to be resent over the new connection. 0MQ gives you a higher level of abstraction than a plain TCP connection: you deal with whole messages rather than a stream of bytes. It still does not guarantee delivery (messages can get lost), but it offers several communication patterns that can be used to build reliable delivery. 0MQ is also highly performant; IMO it is a good choice.
Regarding 2: if interoperability with other languages is not needed, pickle is a very convenient and Pythonic choice. If interoperability is needed, you can consider JSON, or, if performance is an issue, a binary format such as Google protocol buffers. This last choice requires the most work (you'll need to define your message formats in .proto files) and would definitely not feel Pythonic.
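To make the format trade-off concrete, here is the same made-up goal-update message serialized both ways (the contents are illustrative only):

import cPickle
import json

goal_update = {'command': 'set_goal', 'x': 1.5, 'y': -0.25}

# pickle: handles almost any Python object, but only Python can read it back
pickled = cPickle.dumps(goal_update)
assert cPickle.loads(pickled) == goal_update

# JSON: readable from nearly any language, but limited to simple types
encoded = json.dumps(goal_update)
assert json.loads(encoded) == goal_update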
Take a look at how exchanging messages (any picklable Python object) over a plain socket can look:
import cPickle
import struct

def send(sockfd, message):
    # pickle the object, then send its length followed by the payload
    string_message = cPickle.dumps(message)
    write_int(sockfd, len(string_message))
    write(sockfd, string_message)

def write_int(sockfd, integer):
    # 4-byte big-endian length header
    integer_buf = struct.pack('>i', integer)
    write(sockfd, integer_buf)

def write(sockfd, data):
    # keep calling send() until every byte has been accepted
    data_len = len(data)
    offset = 0
    while offset != data_len:
        offset += sockfd.send(data[offset:])
Not bad, but as you can see, having to deal with serializing the message length yourself is quite low level.
And to receive such a message:
def receive(sockfd):
    # read the length header, then exactly that many payload bytes
    message_size = read_int(sockfd)
    if message_size is None:
        return None
    data = read(sockfd, message_size)
    if data is None:
        return None
    return cPickle.loads(data)

def read_int(sockfd):
    int_size = struct.calcsize('>i')
    intbuf = read(sockfd, int_size)
    if intbuf is None:
        return None
    return struct.unpack('>i', intbuf)[0]

def read(sockfd, size):
    # keep calling recv() until `size` bytes have arrived;
    # an empty recv() means the peer closed the connection
    data = ""
    while len(data) != size:
        newdata = sockfd.recv(size - len(data))
        if len(newdata) == 0:
            return None
        data = data + newdata
    return data
But this does not gracefully deal with errors (no attempt to determine which messages were delivered successfully).
If you're familiar with sockets, I would consider SocketServer.UDPServer (see http://docs.python.org/library/socketserver.html#socketserver-udpserver-example). UDP is definitely the simplest messaging system, but you will obviously have to deal with the fact that some messages can be lost, duplicated, or delivered out of order. If your protocol is very simple, that is relatively easy to handle. The advantage is that you don't need any additional threads and there are no external dependencies. It also might be a very good choice if your application doesn't have a concept of a session.
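A minimal sketch along those lines (the handler name, port, and reply text are placeholders), following the UDP example linked above:

import SocketServer

class GoalHandler(SocketServer.BaseRequestHandler):
    def handle(self):
        # for a UDPServer, self.request is a (data, socket) pair
        data, sock = self.request
        print "new goal message: %r" % data
        sock.sendto("ok", self.client_address)

server = SocketServer.UDPServer(('', 9999), GoalHandler)
server.timeout = 0.05
while True:
    server.handle_request()   # returns after one datagram or after the timeout
    # ... run one iteration of the existing control loop here ...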
That might be good for a start, but there are many more details to consider that are not covered in your question. I also wouldn't worry about sockets not being very Pythonic: in the end you're going to use sockets anyway; someone will just wrap them for you, and you'll have to learn the framework, which in the best case may be more than your requirements call for.
(Please note my opinion is highly biased, as I love dealing with raw sockets.)