Multiple clients sending UDP data to a Python socket - packets are getting lost - python

I have a Python socket reader listening for incoming UDP packets from about 5000 clients every minute. While I was rolling it out it worked fine, but now that I'm up to about 4000 clients I'm losing about 50% of the incoming data. The VM has plenty of memory and CPU, so I assume the problem is my UDP socket listener on the server getting too much data at once. Via cron, every minute each client sends in this data:
site8385','10.255.255.255','1525215422','3.3.0-2','Jackel','00:15:65:20:39:10'
This is the socket reader portion of my listener script.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
port = 18000
s.bind(('', port))
while True:
    # Receive one datagram from a client (UDP has no connections to establish).
    d = s.recvfrom(1024)
Could it be the buffer size is too small? How do I determine the size of the packets coming in so I can adjust the 1024 value?

Every 60 seconds, you get a storm of ~5000 messages. You process them sequentially, and it takes "quite a bit" of time. So pretty quickly, one of your buffers gets full up and either your OS, your network card, or your router starts dropping packets. (Most likely it's the buffer your kernel sets aside for this particular socket, and the kernel is dropping the packets, but all of the other options are possible too.)
You could try increasing those buffers. That will give you a lot more "allowed lag time", so you can get further behind before the kernel starts dropping packets. If you want to go down this road, the first step is to call setsockopt to raise the SO_RCVBUF value, but you really need to learn about all the issues that could be involved here.1
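For example, the first step might look like this in the receiver (a sketch; the 4 MB figure is an arbitrary illustration, and the kernel will silently clamp the request to net.core.rmem_max):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Ask for a larger kernel receive buffer before binding; the value is an
# arbitrary example and must fit under the OS limit (net.core.rmem_max on Linux).
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
s.bind(('', 18000))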
If you control the client code, you could also have the clients stagger their packets (e.g., just sleeping for random.random() * 55 before the send).
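On the client side that could be as small as this (a sketch; the 55-second window just mirrors the example above):
import random
import time

# Spread the once-a-minute cron sends across the minute instead of
# having every client fire at second zero.
time.sleep(random.random() * 55)
# ...then build and send the datagram as before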
But it's probably better to try to actually service those packets as quickly as possible, and do the processing in the background.2
Trying to do all of this in a single thread could work, but it can be very fiddly to get right. A simpler solution is to just use a background thread, or a pool of them:
import concurrent.futures

def process_msg(d):
    # your actual processing code
    ...

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as x:
    while True:
        d = s.recvfrom(1024)
        x.submit(process_msg, d)
This may not actually help. If your processing is CPU-bound rather than I/O-bound, the background threads will just be fighting over the GIL with the main thread. If you're using Python 2.7 or 3.2 or something else old, even I/O-bound threads can interfere in some situations. But either way, there's an easy fix: Just change that ThreadPoolExecutor to a ProcessPoolExecutor (and maybe drop max_workers to 1 fewer than the number of cores you have, to make sure the receiving code can have a whole core to itself).
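The swap itself is mechanical (a sketch, reusing the same process_msg and socket s from above):
import os
import concurrent.futures

# process_msg must live at module top level so it can be pickled to the workers.
workers = max(1, (os.cpu_count() or 2) - 1)   # leave a core for the receive loop
with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as x:
    while True:
        d = s.recvfrom(1024)
        x.submit(process_msg, d)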
1. Red Hat has a nice doc on Network Performance Tuning. It's written more from the sysadmin's point of view than the programmer's, and it expects you to either know, or know how to look up, a lot of background information - but it should be helpful if you're willing to do that. You may also want to try searching Server Fault rather than Stack Overflow if you want to go down this road.
2. Of course if there's more than a minute's work to be done to process each minute's messages, the queue will just get longer and longer, and eventually everything will fail catastrophically, which is worse than just dropping some packets until you catch up… But hopefully that's not an issue here.

Related

Handling lots of UDP packets in python

I'm developing a program in Python that uses UDP to receive data from an FPGA (a data-collector device). The speed is very high, about 54 MB/s at the highest setting, which is why we use a dedicated gigabit Ethernet connection. My problem is that a lot of packets get lost. This is not a momentary problem: packets come in fine for a long time, then there's a pause of a few seconds, then everything seems fine again. The length of the pause depends on the speed (the faster the communication, the more is lost).
I've tried setting the buffers higher, but something seems to be missing. I've set self.sock_data.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**28) to increase the buffer size, along with the matching kernel option: sysctl -w net.core.rmem_max=268435456.
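To verify the setting took effect, I check what the kernel actually granted right after the setsockopt call (a rough sketch on a throwaway socket; on Linux the request is clamped to net.core.rmem_max and then doubled for bookkeeping):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**28)
# Roughly 2 * 2**28 here means the rmem_max change took effect;
# a much smaller number means the old limit is still in force.
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))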
Packets have an internal counter, so I know which ones got lost (I also use this to fix their order). An example: 11 s of data lost, around 357168 packets. (I've checked, and it's not a multiple of an internal buffer size in either my program or the FPGA's firmware.) I'm watching the socket on a separate thread and immediately putting incoming packets into a Queue so everything gets saved.
What else should I set or check?

Maximum number of UDP packets that can be stored in the socket buffer? (Ubuntu)

Client:
import socket
s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
msg = b"X"
for i in range(1500):
s.sendto(msg,("<IP>",<PORT>))
Server:
import socket
s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
s.bind(("",>PORT>))
counter = 0
for i in range(1500):
    s.recv(1)
    counter += 1
I have two machines - the first one with Windows7 and the second one with Ubuntu 16.04.
Now the problem:
If I try to send 1500 UDP-packets (for example) from the client to the server, then:
Windows7 is Client and Ubuntu16.04 is server:
server only receives between 200 and 280 packets
Ubuntu16.04 is Client and Windows7 is server:
server receives all 1500 packets
My first question:
What is the reason for this? Are there any limitations on the OS?
Second question:
Is it possible to optimize the sockets in Python?
I know that UDP packets can get lost - but up to 4/5 of all of them?
edit:
Why this kind of question?
Imagine I have a big sensor network and one server. Each sensor node should send its information to the server. The program on the server can only be written in an asynchronous way - the server is only able to read data out of the socket at specific times. Now I want to calculate how many sensor nodes can send data via UDP packets to the server during the period when the server is not able to read out its buffer. Knowing how many different UDP packets can be stored in the buffer, I can calculate how many sensor nodes I can use...
Instead of writing a cluttered comment trail, here are a few cents on the problem.
As documented by Red Hat, the default receive-buffer values for the different OSes at the time of writing are:
Linux: 131071
Windows: No known limit
Solaris: 262144
FreeBSD, Darwin: 262144
AIX: 1048576
These values should correspond to the output of:
import socket
s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
These numbers represent how many bytes can be held at any given moment in the socket receive buffer. They can be increased at any time, at the cost of RAM being reserved for this buffer (or at least that's what I remember).
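To put that in terms of whole datagrams (rough, made-up numbers for illustration; the exact count depends on kernel per-packet accounting):
rcvbuf_bytes = 212992   # hypothetical value reported by getsockopt above
payload_bytes = 100     # hypothetical datagram payload size

# Naive upper bound on how many datagrams can queue up; the real figure is
# considerably lower, because the kernel charges each packet's full sk_buff
# accounting size (skb->truesize), not just its payload, against this budget.
print(rcvbuf_bytes // payload_bytes)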
On Linux (And some BSD flavors), to increase the buffer you can use sysctl:
sudo sysctl -w net.core.rmem_max=425984
sudo sysctl -w net.core.rmem_default=425984
This sets the buffer to 416KB. You can most likely increase this to a few megabytes if buffering is something you see a lot of.
However, a full buffer usually indicates a problem, because your machine should rarely have much in the buffer at all. It's a mechanism to handle sudden peaks and to serve as a small staging area for your machine's workload. If it gets too full, either your code is too slow and needs to get quicker, or you need to offload your server quite a bit. Because if the buffer fills up - no matter how big it is - eventually it will fill up again.
Supposedly you can also increase the buffer size from Python via:
s.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF, 1024)
However, again, if your OS caps the value at a certain ceiling, that cap will override whatever value you set in your Python program.
tl;dr:
Every OS has limitations like this for optimization/performance reasons. Sockets, file handles (essentially any I/O operation) have them.
It's common, and you should find a lot of information on it. All the information above was mostly found via a search for "linux udp receive buffer".
Also, "windows increase udp buffer size" landed me on this: Change default socket buffer size under Windows
Final note
As you mentioned, the performance, the amount received, etc. can vary vastly because you're using UDP. It is prone to data loss for the benefit of speed. Distance between servers, drivers, NICs (especially important - some NICs have a limited hardware buffer that can cause these things), etc. all impact the data you'll receive. Windows also does a lot of auto-magic in these situations; make sure you tune your Linux machine to the same parameters. A UDP packet consists not only of the data you send, but also of all the parameters in the headers before it (in the IP packet, for instance TTL, fragmentation, ECN, etc.).
For instance, you can tune how much memory your UDP stack may eat under certain loads, to find the lower threshold (below which UDP won't bother checking memory usage), the pressure threshold (memory management under load), and the maximum amount of memory all UDP sockets may use in total:
sudo sysctl net.ipv4.udp_mem
Here's a good article on UDP tuning from ESnet:
https://fasterdata.es.net/network-tuning/udp-tuning/
Beyond this, you're tweaking yourself into your grave. Most likely, your problem can be solved by redesigning your code. Unless you're actually pushing 1-10 GB/s from your network, the kernel should be able to handle it, assuming you process the packets fast enough rather than letting them pile up in a buffer.

I need advice: should I use select or threading?

I'm building a live radio streamer, and I was wondering how I should handle multiple connections. From my experience, select blocks the audio from being streamed: it plays only 3 seconds and then stops. I will provide an example of what I mean.
import socket, select

headers = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: audio/mpeg\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)
path = "/path/to/file.mp3"
bufsize = 4096  # actually have no idea what this should be but python-shout uses this amount

sock = socket.socket()
sock.bind(("", 8000))  # bind/listen happen elsewhere in my script; the port here is a placeholder
sock.listen(5)
cons = list()

def runMe():
    cons.append(sock)
    f = open(path, "rb")        # the mp3 has to be read as bytes
    nbuf = f.read(bufsize)      # current buffer
    while True:
        buf = nbuf
        nbuf = f.read(bufsize)
        if len(buf) == 0:
            break
        rl, wl, xl = select.select(cons, [], [], 0.2)
        for s in rl:
            if s == sock:
                con, addr = s.accept()
                con.setblocking(0)
                cons.append(con)
                con.send(headers)
            else:
                data = s.recv(1024)
                if not data:
                    s.close()
                    cons.remove(s)
                else:
                    s.send(buf)
That is an example of how I'd use select. But the song will not play all the way through. If I send outside the select loop it plays, but then it dies on a second connection. Should I use threading?
You can do it either way, but if your select-implementation isn't working properly it's because your code is incorrect, not because a select-based implementation isn't capable of doing the job -- and I don't think a multithreaded solution will be easier to get right than a select-based solution.
Regardless of which implementation you choose, one issue you're going to have to think about is timing/throughput. Do you want your program to send out the audio data at approximately the same rate it is meant to be played back, or do you want to send out audio data as fast as the client is willing to read it, and leave it up to the client to read the data at the appropriate speed? Keep in mind that each TCP stream's send-rate will be different, depending on how fast the client chooses to recv() the information, as well as on how well the network path between your server and the client performs.
The next problem to deal with after that is the problem of a slow client -- what do you want your program to do when one of the TCP connections is very slow, e.g. due to network congestion? Right now your code just blindly calls send() on all sockets without checking the return value, which (given that the sockets are non-blocking) means that if a given socket's output-buffer is full, then some (probably many) bytes of the file will simply get dropped -- maybe that is okay for your purpose, I don't know. Will the clients be able to make use of an mp3 data stream that has arbitrary sections missing? I imagine that the person running that client will hear glitches, at best.
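If you want to avoid that, one option (a sketch only; pending and its helpers are hypothetical names, not part of the code above) is to keep a per-client backlog and only send what each non-blocking socket accepts:
# pending maps each connected client socket to a bytearray of not-yet-sent audio.
pending = {}

def queue_chunk(chunk):
    # Append the next audio chunk to every client's backlog.
    for backlog in pending.values():
        backlog.extend(chunk)

def flush(sockets):
    # Send as much backlog as each non-blocking socket will accept right now.
    for s in sockets:
        backlog = pending.get(s)
        if not backlog:
            continue
        try:
            sent = s.send(backlog)   # may take only part of the data
            del backlog[:sent]       # keep the rest for the next pass
        except BlockingIOError:
            pass                     # client is slow; try again later
        # A real server would also decide when a backlog is too large
        # and drop that client rather than buffer forever.
flush() would be driven by the writable list that select.select() returns, rather than by the readable one.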
Implementation issues aside, if it was me I'd prefer the single-threaded/select() approach, simply because it will be easier to test and validate. Either approach is going to take some doing to get right, but with a single thread, your program's behavior is much more deterministic -- either it works right or it doesn't, and running a given test will generally give the same result each time (assuming consistent network conditions). In a multithreaded program, OTOH, the scheduling of the threads is non-deterministic, which makes it very easy to end up with a program that works correctly 99.99% of the time and then seriously malfunctions, but only once in a blue moon -- a situation that can be very difficult to debug, as you end up spending hours or days just reproducing the fault, let alone diagnosing and fixing it.

ZMQ DEALER ROUTER loses messages at high frequency?

I am sending 20000 messages from a DEALER to a ROUTER using pyzmq.
When I pause 0.0001 seconds between messages they all arrive, but if I send them 10x faster, pausing only 0.00001 seconds per message, only around half of the messages arrive.
What is causing the problem?
The default setup of the ZMQ I/O thread - the part that is responsible for this mode of operation.
I would hesitate to call it a problem, all the more so if you invest your time and dive deeper into the excellent ZMQ concept and architecture.
Since the early versions of the ZMQ library, there have been some important parameters that help the central masterpiece (the I/O thread) keep things both stable and scalable, and thus give you this powerful framework.
Zero SHARING / Zero COPY / (almost) Zero LATENCY are the maxims that do not come at zero-cost.
The ZMQ.Context instance has quite a rich internal parametrisation that can be modified via API methods.
Let me quote from a marvelous and precious source -- Pieter HINTJENS' book, Code Connected, Volume 1.
(It is definitely worth spending the time to step through the PDF copy. The C-language code snippets do not hurt anyone's pythonic state of mind, as the key messages are in the text and stories that Pieter has crafted into his 300+ thrilling pages.)
High-Water Marks
When you can send messages rapidly from process to process, you soon discover that memory is a precious resource, and one that can be trivially filled up. A few seconds of delay somewhere in a process can turn into a backlog that blows up a server unless you understand the problem and take precautions.
...
ØMQ uses the concept of HWM (high-water mark) to define the capacity of its internal pipes. Each connection out of a socket or into a socket has its own pipe, and HWM for sending, and/or receiving, depending on the socket type. Some sockets (PUB, PUSH) only have send buffers. Some (SUB, PULL, REQ, REP) only have receive buffers. Some (DEALER, ROUTER, PAIR) have both send and receive buffers.
In ØMQ v2.x, the HWM was infinite by default. This was easy but also typically fatal for high-volume publishers. In ØMQ v3.x, it’s set to 1,000 by default, which is more sensible. If you’re still using ØMQ v2.x, you should always set a HWM on your sockets, be it 1,000 to match ØMQ v3.x or another figure that takes into account your message sizes and expected subscriber performance.
When your socket reaches its HWM, it will either block or drop data depending on the socket type. PUB and ROUTER sockets will drop data if they reach their HWM, while other socket types will block. Over the inproc transport, the sender and receiver share the same buffers, so the real HWM is the sum of the HWM set by both sides.
Lastly, the HWM-s are not exact; while you may get up to 1,000 messages by default, the real buffer size may be much lower (as little as half), due to the way libzmq implements its queues.
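With pyzmq the HWMs are plain socket options that have to be set before bind()/connect(); a minimal sketch (the 100000 figure is only an example, not a recommendation):
import zmq

ctx = zmq.Context()

dealer = ctx.socket(zmq.DEALER)
dealer.setsockopt(zmq.SNDHWM, 100000)   # outgoing pipe capacity, in messages
dealer.connect("tcp://127.0.0.1:5555")

router = ctx.socket(zmq.ROUTER)
router.setsockopt(zmq.RCVHWM, 100000)   # per the quote above, ROUTER drops rather than blocks at its HWM
router.bind("tcp://127.0.0.1:5555")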

Multithreaded Python socket server CPU usage spirals out of control

My Situation:
I have a python server that does the following:
Listens on port 500 (for firewall reasons), and every time it receives a connection, spawns a thread to handle it. Each thread that is started responds to client input with a few methods that are mostly database interaction (I actually use the Django ORM, as my server application is coupled with a Django website).
I'd like to use select at some point (and ultimately, Twisted), but right now I can't, so I'll have to go with this.
My problem:
For some reason I can't seem to understand, the server's CPU usage sometimes spirals totally out of control and goes up to 200% (we're running on a dual core), making it pretty difficult even to ssh in to stop it.
What I don't understand is that it usually does not happen during operation (If I have one or multiple clients connected, the CPU usage stays very low), but once all clients have disconnected, my server goes up to 200% CPU usage.
This led me to believe that the problem is not in the worker threads (If they didn't die properly, I'd rather expect a massive RAM usage than a CPU one), but in the server's accept method.
Up to now, I've been using this code:
s = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
s.bind((host,port))
s.listen(5)
logger.warning('Started - listening on {0}:{1}'.format(host, port))
while True:
    (newsocket, clientaddr) = s.accept()
    logger.info('Received connection from {0}:{1}'.format(clientaddr[0], clientaddr[1]))
    WorkerThread(newsocket, clientaddr, timeout).start()
I can't really grasp what could be going wrong here, but I thought that maybe I should use the following syntax:
while True:
    (newsocket, clientaddr) = s.accept()
    logger.info('Received connection from {0}:{1}'.format(clientaddr[0], clientaddr[1]))
    wk = WorkerThread(newsocket, clientaddr, timeout)
    wk.start()
I've seen this written a lot more often than what I've been doing up to now.
Does anyone know whether this could cause a problem like the one I'm describing?
Thanks in advance,
