Are receive socket size limits a good idea? - python

I am writing a program in Python that will act as a server and accept data from a client. Is it a good idea to impose a hard limit on the amount of data, and if so, why?
More info:
Certain chat programs limit the amount of text one can send per send (i.e. each time the user presses send), so the question comes down to: is there a legitimate reason for this, and if yes, what is it?

Most likely you've seen code which protects against "extra" incoming data. This is often due to the possibility of buffer overruns, where extra data copied into memory overruns the pre-allocated array and overwrites executable code with attacker code. Code written in languages like C typically has a lot of length checking to prevent this type of attack. Functions such as gets and strcpy are replaced with their safer counterparts, fgets and strncpy, which take a length argument to prevent buffer overruns.
If you use a dynamic language like Python, your arrays resize, so they won't overflow and clobber other memory, but you still have to be careful about sanitizing foreign data.
Chat programs likely limit the size of a message for reasons such as database field size. If 80% of your incoming messages are 40 characters or less, 90% are 60 characters or less, and 98% are 80 characters or less, why make your message text field allow 10k characters per message?

What is your question exactly?
What happens when you call recv on a socket is that the data currently available in the socket buffer is returned immediately. If you give recv (or read, I guess) a huge buffer size, such as 40000, it will likely never return that much data at once. If you give it a tiny buffer size like 100, it will return the 100 bytes it has immediately and still have more available. Either way, you're not imposing a limit on how much data the client sends you.
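As a minimal sketch of that behavior (assuming a connected TCP socket named conn; the function name drain is made up for the example), each recv call returns whatever the kernel happens to have buffered, up to the size you pass:

def drain(conn):
    # Read until the peer closes; chunk sizes are whatever the kernel had ready.
    received = b""
    while True:
        chunk = conn.recv(4096)  # an upper bound per call, not a message size
        if not chunk:            # empty result means the peer closed the connection
            break
        received += chunk
    return received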

I don't know what your actual application is; however, setting a hard limit on the total amount of data that a client can send could be useful in reducing your exposure to denial-of-service attacks, e.g. a client connects and sends 100 MB of data, which could load your application unacceptably.
But it really depends on what your application is. Are you after a per-line limit, a total per-connection limit, or something else?
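For illustration only, here is a hedged sketch of a per-connection cap (the 1 MB limit and the handle_client name are made up for the example):

MAX_PER_CONNECTION = 1024 * 1024  # arbitrary example cap: 1 MB per connection

def handle_client(conn, cap=MAX_PER_CONNECTION):
    total = 0
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break                # client finished normally
        total += len(chunk)
        if total > cap:
            conn.close()         # cap exceeded: treat it as abuse and drop
            break
        # ... hand the chunk to the application here ...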

Related

Maximum UDP-packet number which can be stored in the socket buffer? (Ubuntu)

Client:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
msg = b"X"
for i in range(1500):
    s.sendto(msg, ("<IP>", <PORT>))
Server:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("", <PORT>))
counter = 0
for i in range(1500):
    s.recv(1)      # reads at most 1 byte; the rest of each datagram is discarded
    counter += 1
I have two machines - the first one with Windows 7 and the second one with Ubuntu 16.04.
Now the problem:
If I try to send 1500 UDP packets (for example) from the client to the server, then:
Windows 7 is the client and Ubuntu 16.04 is the server:
the server only receives between 200 and 280 packets
Ubuntu 16.04 is the client and Windows 7 is the server:
the server receives all 1500 packets
My first question:
What is the reason for this? Are there limits imposed by the OS?
Second question:
Is it possible to optimize the sockets in Python?
I know it is possible for UDP packets to get lost - but up to 4/5 of all packets?
edit:
Why this kind of question?
Imagine I have a big sensor network and one server. Each sensor node should send its information to the server. The program on the server can only be written in an asynchronous way - the server is only able to read the data out of the socket at specific times. Now I want to calculate how many sensor nodes can send data via UDP packets to the server during the period of time in which the server is not able to read out its buffer. With the information on how many different UDP packets can be stored in the buffer, I can calculate how many sensor nodes I can use...
Instead of writing a cluttered comment trail, here are a few cents on the problem.
As documented by Red Hat, the default values for the different OSes at the time of writing are:
Linux: 131071
Windows: No known limit
Solaris: 262144
FreeBSD, Darwin: 262144
AIX: 1048576
These values should correspond to the output of:
import socket
s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
These numbers represent how many bytes can be held at any given moment in the socket receive buffer. They can be increased at any time, at the cost of RAM being reserved for this buffer (or at least that's what I remember).
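As a rough, back-of-the-envelope check against the numbers in the question (assuming, as Linux does, that each datagram is charged its full skb "truesize" - payload plus metadata, often several hundred bytes - against SO_RCVBUF; the 512-byte figure below is an assumed ballpark, not a fixed constant):

RCVBUF = 131071        # the default Linux receive buffer from the table above
PER_DATAGRAM = 512     # assumed per-datagram accounting overhead; varies by kernel/NIC
print(RCVBUF // PER_DATAGRAM)  # ~255 datagrams - the same ballpark as the 200-280 observed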
On Linux (and some BSD flavors), to increase the buffer you can use sysctl:
sudo sysctl -w net.core.rmem_max=425984
sudo sysctl -w net.core.rmem_default=425984
This sets the buffer to 416 KB. You can most likely increase this to a few megabytes if buffering is something you see a lot of.
However, full buffers usually indicate a problem, because your machine should rarely have much in the buffer at all. The buffer is a mechanism to handle sudden peaks and to serve as a tiny platter for your machine to store work load. If it gets too full, either you have really slow code that needs to get quicker, or you need to offload your server quite a bit - because if the buffer fills up, no matter how big it is, eventually it will fill up again.
Supposedly you can also increase the buffer size from Python via:
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1024)
However, again, if your OS caps the buffer at a certain roof, that cap will supersede any value you set in your Python program.
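As a quick sanity check (a sketch; note that on Linux the kernel doubles the value you request, to allow space for bookkeeping overhead, and caps it at net.core.rmem_max):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 425984)
# Print what the kernel actually granted rather than trusting the request
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))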
tl;dr:
Every OS has limitations, for optimization/performance reasons. Sockets, file handles (essentially any I/O operation) have them.
It's common, and you should find a lot of information on it. All the information above was found mostly via a search for "linux udp receive buffer".
Also, "windows increase udp buffer size" landed me on this: Change default socket buffer size under Windows
Final note
As you mentioned, the performance, the amount received, etc. can vary vastly because you're using UDP, which is prone to data loss in exchange for speed. Distance between servers, drivers, NICs (especially important - some NICs have a limited hardware buffer that can cause these things), etc. all impact the data you'll receive. Windows also does a lot of auto-magic in these situations; make sure you tune your Linux machine to the same parameters. A UDP packet consists not only of the amount of data you send, but also of all the parameters in the headers before it (in the IP packet, for instance TTL, fragmentation, ECN, etc.).
For instance, you can tune how much memory your UDP stack may use under certain loads, to find the lower threshold (below which UDP won't bother checking memory usage), the pressure threshold (where memory management kicks in under load) and the maximum number of pages all UDP sockets together may use:
sudo sysctl net.ipv4.udp_mem
Here's a good article on UDP tuning from ESnet:
https://fasterdata.es.net/network-tuning/udp-tuning/
Beyond this, you're tweaking yourself into your grave. Most likely, your problem can be solved by redesigning your code, because unless you're actually pushing 1-10 GB/s over the network, the kernel should be able to handle it - assuming you process the packets fast enough rather than letting them pile up in a buffer.

Sending big files in Twisted

I have some really simple code that allows me to send an image from client to server. And it works.
As simple as this:
On the client side...
def sendFile(self):
    image = open(picname, 'rb')  # binary mode, so the image bytes aren't mangled
    data = image.read()
    self.transport.write(data)
On the server side...
def dataReceived(self, data):
    print 'Received'
    f = open("image.png", 'wb')
    f.write(data)
    f.close()
The problem with this is that it only works if the image is up to 4.something kB - it stops working when the image is bigger (at least it doesn't work at 6 kB). That's when I see "Received" being printed more than once, which makes me think the data is being split into smaller chunks. However, even though those chunks of data get to the server (as I can see from the repeated prints in dataReceived), the image is corrupted and can't be opened.
I don't know that much about protocols, but I supposed TCP should be reliable, so the fact that the chunks arrive in a different order or so shouldn't... happen? So I was thinking that maybe Twisted is doing something there that I'm not aware of, and maybe I should use another protocol.
So here is my question: is there something I could do now to make it work, or should I definitely change to another protocol? If so... any idea which? My goal would be sending a bigger image, maybe on the order of hundreds of kB.
This is a variant of an entry in the Twisted FAQ:
Why is protocol.dataReceived called with only part of the data I called transport.write with?
TCP is a stream-based protocol. It is delivering a stream of bytes, which may be broken up into an arbitrary number of fragments. If you write one big blob of bytes, it may be broken up into an arbitrary number of smaller chunks, depending on the characteristics of your physical network connection. When you say that TCP should be "reliable", and that the chunks should arrive in order, you are roughly correct: however, what arrives in order is the bytes, not the chunks.
What you are doing in your dataReceived method is, upon receiving each chunk, opening a file and writing the contents of just that chunk to "image.png", then closing it. If you change it to open the file in connectionMade and close the file in connectionLost you should see at least vaguely the right behavior, although this will still cause you to get corrupted / truncated images if the connection is lost unexpectedly, with no warning. You should really use a framing protocol like AMP; although if you're just sending big blobs of data around, HTTP is probably a better choice.
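A minimal sketch of that suggested change (the class name ImageReceiver and the fixed output filename are illustrative; the rest of your Protocol subclass stays as it is):

from twisted.internet import protocol

class ImageReceiver(protocol.Protocol):
    def connectionMade(self):
        # Open the file once per connection instead of once per chunk
        self.f = open("image.png", "wb")

    def dataReceived(self, data):
        # Append each chunk; TCP delivers a byte stream, not whole messages
        self.f.write(data)

    def connectionLost(self, reason):
        # Close on disconnect; a dropped connection still means a truncated file
        self.f.close()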

What is the best way to receive a capped number of bytes from a socket?

I'm using a socket(AF_UNIX, SOCK_STREAM) socket to communicate between two processes.
I'd like the receiver to receive a capped number of bytes, where the cap is quite large (in the 16MB to 64MB range). That is, I'd like to receive a message of up to [cap] bytes, and if the message is larger than that, I'd like to stop receiving or discard the rest of the message.
From the docs, the way to do this would seem to be to use socket.recv(bufsize=[cap]). But there is the following note:
Note: For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096.
That suggests that the buffer size is not intended for this purpose. So, what would be the best way to achieve this?
You should use a loop, as demonstrated in this ActiveState recipe.
The bufsize argument is best left at its default, as per the doc you quoted. Simply call recv and add the received byte string to your own buffer until one of these conditions is met:
recv returns an empty value - this means the other end has closed the connection
your buffer reaches its maximum allowed size (the cap you are talking about)
an exception is raised - this probably means the connection has been closed ungracefully; inspect the error code
Note that recv will block if you do not know in advance how many bytes you want to receive. Have the client send the message size beforehand to avoid this issue. But keep in mind your recv can still block if the connection is laggy, so use timeouts and treat a timeout as a fourth stop condition in your loop. A sketch combining these conditions follows.
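Putting those stop conditions together (a hedged sketch; the function name recv_capped, the 16 MB cap and the 30-second timeout are examples, not fixed values):

import socket

def recv_capped(sock, cap=16 * 1024 * 1024, timeout=30.0):
    sock.settimeout(timeout)         # stop condition 4: guard against a laggy peer
    chunks = []
    total = 0
    while total < cap:               # stop condition 2: the cap
        try:
            chunk = sock.recv(4096)  # small power-of-two bufsize, per the docs
        except socket.timeout:
            break                    # connection stalled
        if not chunk:
            break                    # stop condition 1: peer closed the connection
        chunks.append(chunk)
        total += len(chunk)
    return b"".join(chunks)[:cap]    # truncate the final chunk to stay under the cap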

How can I deserialize incoming data on the TCP server?

I've set up a server reusing the code found in the documentation, where I have self.data = self.request.recv(1024).strip().
But how do I go from this to deserializing it into a protobuf message (Message.proto/Message_pb2.py)? Right now it seems that it's receiving chunks of 1024 bytes, and more than one at a time... making it all rubbish :D
TCP is typically just a stream of data. Just because you sent each packet as a unit, doesn't mean the receiver gets that. Large messages may be split into multiple packets; small messages may be combined into a single packet.
The only way to interpret multiple messages over TCP is with some kind of "framing". With text-based protocols, a CR/LF/CRLF/zero-byte might signify the end of each frame, but that won't work with binary protocols like protobuf. In such cases, the most common approach is to simply prepend each message with the length, for example in a fixed-size (4 bytes?) network-byte-order chunk. Then the payload. In the case of protobuf, the API for your platform may also provide a mechanism to write the length as a "varint".
Then, reading is a matter of:
read an entire length-header
read (and buffer) that many bytes
process the buffered data
rinse and repeat
But keep in mind that you might have (in a single packet) the end of one message, 2 complete messages, and the start of another message (maybe half of the length-header, just to make it interesting). So keeping track of exactly what you are reading at any point becomes paramount; see the sketch below.
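A sketch of that read loop in Python under those assumptions (the 4-byte "!I" network-byte-order header and the Message_pb2.Message class name are taken from the question as examples; the helper names are made up):

import struct
import Message_pb2  # assumed: generated from Message.proto, per the question

def read_exactly(sock, n):
    # Loop until exactly n bytes arrive; recv may return less than asked for
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf += chunk
    return buf

def read_message(sock):
    # Fixed-size network-byte-order length header, then the protobuf payload
    (length,) = struct.unpack("!I", read_exactly(sock, 4))
    payload = read_exactly(sock, length)
    msg = Message_pb2.Message()
    msg.ParseFromString(payload)
    return msg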

Pyserial buffer fills faster than I can read

I am reading data from a microcontroller via serial, at a baud rate of 921600. I'm reading a large amount of ASCII CSV data, and since it comes in so fast, the buffer gets filled and all the rest of the data is lost before I can read it. I know I could manually edit the pyserial source code for serialwin32 to increase the buffer size, but I was wondering if there is another way around it?
I can only estimate the amount of data I will receive, but it is somewhere around 200kB of data.
Have you considered reading from the serial interface in a separate thread that is already running before you send the command telling the uC to send its data?
This would remove some of the delay between the write command and the start of the read. Other SO users have had success with this method, granted they weren't having buffer overruns.
If this isn't clear let me know and I can throw something together to show this.
EDIT
Thinking about it a bit more: if you're trying to read from the buffer and write it out to the file system, even the standalone thread might not save you. To minimize the processing time, you might consider reading, say, 100 bytes at a time with serial.read(size=100) and pushing that data into a Queue, processing it all after the transfer has completed.
Pseudo Code Example
import queue

def thread_main_loop(myserialobj, data_queue):
    # Background thread: keep pulling small chunks off the serial port
    while myserialobj.is_open:
        data_queue.put_nowait(myserialobj.read(size=100))

def process_queue_when_done(data_queue):
    while True:
        try:
            popped_data = data_queue.get_nowait()
        except queue.Empty:
            break
        # Process popped_data as needed
There's a "Receive Buffer" slider that's accessible from the com port's Properties Page in Device Manager. It is found by following the Advanced button on the "Port Settings" tab.
More info:
http://support.microsoft.com/kb/131016 under heading Receive Buffer
http://tldp.org/HOWTO/Serial-HOWTO-4.html under heading Interrupts
Try knocking it down a notch or two.
You do not need to manually change pyserial code.
If you run your code on the Windows platform, you simply need to add one line to your code:
ser.set_buffer_size(rx_size = 12800, tx_size = 12800)
Where 12800 is an arbitrary number I chose. You can make the receiving (rx) and transmitting (tx) buffers as big as 2147483647.
See also:
https://docs.python.org/3/library/ctypes.html
https://msdn.microsoft.com/en-us/library/system.io.ports.serialport.readbuffersize(v=vs.110).aspx
You might be able to set up the serial port from the DLL:
// Setup serial
mySerialPort.BaudRate = 9600;
mySerialPort.PortName = comPort;
mySerialPort.Parity = Parity.None;
mySerialPort.StopBits = StopBits.One;
mySerialPort.DataBits = 8;
mySerialPort.Handshake = Handshake.None;
mySerialPort.RtsEnable = true;
mySerialPort.ReadBufferSize = 32768;
Property Value
Type: System.Int32
The buffer size, in bytes. The default value is 4096; the maximum value is that of a positive int, or 2147483647
And then open and use it in Python
I am somewhat surprised that nobody has yet mentioned the correct solution to such problems (when available), which is effective flow control, either software (XON/XOFF) or hardware, between the microcontroller and its sink. The issue is well described by this web article.
It may be that the source device doesn't honour such protocols, in which case you are stuck with a series of solutions that delegate the problem upwards to where more resources are available (moving it from the UART buffer to the driver and on up towards your application code). If you are losing data, it would certainly seem sensible to try a lower data rate, if that's a possibility.
For me the problem was that the buffer was being overloaded when receiving data from the Arduino.
All I had to do was call mySerialPort.flushInput() and it worked.
I don't know why mySerialPort.flush() didn't work; flush() must only flush the outgoing data?
All I know is that mySerialPort.flushInput() solved my problems.
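For what it's worth, in pySerial 3.x flushInput() was renamed reset_input_buffer() (and flushOutput() to reset_output_buffer()); flush() only waits until outgoing data has been written, which would explain why it didn't help here. A sketch (the port name is an example):

import serial

ser = serial.Serial("COM3", 921600, timeout=1)  # example port and baud rate
ser.reset_input_buffer()  # pySerial 3.x name for flushInput(): discard pending input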
