Update to original post: A colleague pointed out what I was doing wrong.
I'll give the explanation at the bottom of the post, as it might be helpful
for others.
I am trying to get a basic understanding of the limits on network performance
of python programs and have run into an anomaly. The code fragment
while 1:
    sock.sendto("a", target)
sends UDP packets to a target machine as fast as the host will send them.
I measure a sending rate of just over 4000 packets per second, or 250 μs
per packet. This seems slow, even for an interpreted language like Python
(the program is running on a 2 GHz AMD Opteron, Linux, Python version 2.6.6).
I've seen much better performance in Python for TCP, so I find this a bit weird.
If I run this in the background and run top, I find that Python is using
just 25% of the CPU, suggesting that Python may be artificially delaying
the transmission of UDP packets.
Has anyone else experienced anything similar? Does anyone know if Python
does limit the rate of packet transmission, and if there is a way to turn
this off?
BTW, a similar C++ program can send well over 200,000 packets per second,
so it's not an intrinsic limit of the platform or OS.
So, it turns out I made a silly newbie mistake. I neglected to call gethostbyname
explicitly. Consequently, the target address in the sendto command contained
a symbolic name. This was triggering a name resolution every time a packet was
sent. After fixing this, I measure a maximum sending rate of about 120,000 packets per second.
Much better.
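The fix can be sketched in Python 3 as follows (the original post used Python 2.6; the b"a" literal works in both). The name is resolved once with socket.gethostbyname before the loop, so every sendto receives a numeric address. The host name, port and packet count are placeholders, and the demo sends to a throwaway receiver on the loopback interface so it runs on a single machine.

```python
import socket

def send_burst(host, port, count):
    """Send `count` one-byte UDP datagrams to host:port, resolving the name only once."""
    # Resolve the symbolic name a single time, outside the hot loop.
    target = (socket.gethostbyname(host), port)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for _ in range(count):
            # target already holds a numeric IPv4 address: no lookup per packet.
            sock.sendto(b"a", target)
    finally:
        sock.close()
    return target

if __name__ == "__main__":
    # A throwaway loopback receiver so the sketch runs on one machine.
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    print(send_burst("localhost", rx.getsockname()[1], 10))
    rx.close()
```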
You might want to post a more complete code sample so that others can repeat your benchmark. 250μs per loop iteration is too slow. Based on daily experience with optimizing Python, I would expect Python's interpreter overhead to be well below 1μs on a modern machine. In other words, if the C++ program is sending 200k packets per second, I would expect Python to be in the same order of magnitude of speed.
(In light of the above, the usual optimization suggestions such as moving the attribute lookup of sock.sendto out of the loop do not apply here because the slowness is coming from another source.)
A good first step would be to use strace to check what Python is actually doing. Is it a single-threaded program or a multithreaded application that might be losing time waiting on the GIL? Is sock a normal Python socket or part of a more elaborate API? Does the same happen when you call os.write directly on the socket's fileno?
Have you tried doing a connect() first, then using send() instead of sendto()? (UDP connect() just establishes the destination address, it doesn't actually make a "connection".) I'm rusty on this, but I believe Python does more interpretation on the address parameter than C sockets, which might be adding overhead.
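The connect()/send() variant suggested above might look like this minimal Python 3 sketch. The helper name send_connected and the loopback demo are illustrative; the point is that the destination is fixed once and each send() call carries no address to convert.

```python
import socket

def send_connected(addr, count, payload=b"a"):
    """Send `count` UDP datagrams to `addr` using connect() + send() instead of sendto()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # For UDP, connect() only records the default destination; nothing is sent.
        # A symbolic host name in addr is resolved once here, not once per packet.
        sock.connect(addr)
        for _ in range(count):
            sock.send(payload)  # no address argument to convert on each call
    finally:
        sock.close()

if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    send_connected(rx.getsockname(), 10)
    rx.close()
```

One side effect to be aware of: on Linux, a connected UDP socket reports ICMP "port unreachable" replies, so a later send() or recv() can raise ConnectionRefusedError if nothing is listening on the target port.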
Related
I am developing a Python application where a drone and a computer communicate over a local network (Wi-Fi). I need to stream the drone's camera to OpenCV-Python on the computer with the lowest possible latency at the highest possible resolution.
So far I have been trying rather naive approaches over TCP that give okay-ish results: I get around 0.1 s to 0.2 s of latency for VGA. This has merit for some use cases, since it enables lossless transmission, but since the most common scenario is aggressively controlling the drone in real time from the stream, I am aiming for much lower latency and hopefully higher resolution.
My advisor has recommended using WebRTC. I have done some research on the matter, found the aiortc library that implements WebRTC in python, but I am unsure this is the way to go for my use case as it seems to be more geared toward web developers.
I am a bit lost I think. Could you highlight the advantages of WebRTC in my application if any, or point me toward solutions that are more relevant for my problem please?
Thanks in advance!
WebRTC communicates peer to peer, as I think you already know. On a local network you won't need a STUN or TURN server to connect the two devices, which further reduces latency and keeps the code shorter. I haven't worked with drones, but I think your current stream's latency of under 0.2 s is already good.
FYI: protocol comparison
So I'm quite deep into this monitoring implementation, and I'm curious as to how to calculate the theoretical maximum it can handle.
I know python is not the most efficient language, and I'm honestly not too worried about missing a packet here or there - but how can I figure out how fast it's going?
My network isn't corporate-sized, but the sniffer can keep up with an nmap scan (or so it seems).
Its output matches Wireshark, so I'm curious about its limitations on a network with thousands of computers.
The Scapy documentation doesn't seem to go into this much, but I admit I may have missed something.
I create an async sniff with a callback that just puts the desired information into a hashtable/dictionary keyed by srcMac, if that affects anything.
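One rough way to estimate the ceiling is to time the callback itself: if it costs t seconds per packet, the sniffer cannot keep up with more than about 1/t packets per second, and Scapy's own capture and dissection work lowers the real limit further. This sketch uses a hypothetical FakePacket stand-in so it runs without capturing; with Scapy you would pass the same function as the prn callback, for example AsyncSniffer(prn=record, store=False). The names record, FakePacket and estimate_max_rate are illustrative.

```python
import time
from collections import defaultdict

# srcMac -> number of packets seen, like the dictionary described above
seen = defaultdict(int)

def record(pkt):
    """Callback: count packets per source MAC (pkt.src on an Ethernet frame)."""
    seen[pkt.src] += 1

class FakePacket:
    """Hypothetical stand-in for a Scapy Ether packet; it only has a .src attribute."""
    def __init__(self, src):
        self.src = src

def estimate_max_rate(callback, packets):
    """Time the callback over `packets`; return an upper bound in packets per second."""
    start = time.perf_counter()
    for pkt in packets:
        callback(pkt)
    elapsed = time.perf_counter() - start
    return len(packets) / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    sample = [FakePacket("aa:bb:cc:00:00:%02x" % (i % 256)) for i in range(100_000)]
    rate = estimate_max_rate(record, sample)
    print(f"callback alone handles about {rate:,.0f} packets/s")
```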
In my case, sending a 4 MB file from Host A to Host B using Python sockets, I was getting 2.3 Gb/s. Scapy was about 100x slower than the network, depending on what operations you perform after sniffing the packets. When I was doing too many operations on the sniffed packets, I could capture only 15 of 1000 packets (4 MB). After optimizing my code, the most I could sniff was still fewer than 200 of the 1000 packets. If you need more precise measurements, feel free to tag me; I'm happy to share what I know.
Background: I have a Python TCP socket script set up on a Raspberry Pi 4 to talk to an off-network machine. The machine has commands set up so that I send it a request over the socket, essentially asking "What is the value of A?", and the machine responds with the value of A. While I am not super familiar with how the machine is programmed, my understanding is that when queried this way, it prints out the value. I receive that data through the socket by calling socket.sendall("What is the value of A") followed immediately by socket.recv(SIZE). My analogy is that it's like throwing a ball against a wall with one hand and catching it with the other. The connection is made with an Ethernet cable approximately 100 ft long.
Problem: I can query across this socket quite a bit, to the point that I can send 60-entry arrays back and forth 5-10 times, but eventually the connection closes. I'm not sure why. It gets to the point where I have to unplug everything, close all the sockets, and give it some time before trying again. While the problem may be hard to diagnose without fully understanding the machine (which I can't really give more information about), I am leaning towards using threading to run the two tasks independently: one thread to query, one thread to receive. My guess is that recv() misses the proverbial ball and then sits waiting to hear back from the machine, but the machine never talks again, so the code is just left waiting. I don't have a lot of experience with threading, so I would appreciate some suggestions. Another thought is that the Ethernet cable is too long for the Pi to push that much data across. This feels more naive, but I am not a network engineer and don't claim to fully understand that process.
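One cause worth ruling out before reaching for threads: TCP is a byte stream, so a single recv(SIZE) can return only part of a reply (or parts of two replies), and the request/response pairs then drift out of step. A common remedy is to keep calling recv until a complete message has arrived, with a timeout so a missing reply raises an error instead of blocking forever. This minimal sketch assumes, hypothetically, that the machine ends each reply with a newline; if it uses another terminator or a length prefix, recv_until would need adjusting. The helper names query and recv_until are illustrative.

```python
import socket

def recv_until(sock, terminator=b"\n", max_size=65536):
    """Read from a TCP socket until `terminator` appears; return the message without it."""
    buf = bytearray()
    while terminator not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            # The peer closed the connection before sending a full reply.
            raise ConnectionError("connection closed mid-reply")
        buf += chunk
        if len(buf) > max_size:
            raise ValueError("reply exceeded max_size without a terminator")
    # Bytes after the terminator are discarded: fine only for strict one-request/one-reply use.
    message, _, _rest = bytes(buf).partition(terminator)
    return message

def query(sock, command, timeout=2.0):
    """Send one command (bytes) and wait for its reply; each recv waits at most `timeout` s."""
    sock.settimeout(timeout)  # socket.timeout is raised instead of hanging forever
    sock.sendall(command)
    return recv_until(sock)

if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"42\n")  # pretend the machine has already answered
    print(query(a, b"What is the value of A?"))
    a.close()
    b.close()
```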
Thanks in advance, feel free to ask any clarifying questions.
I'm developing a program in Python that uses UDP to receive data from an FPGA (a data collector device). The data rate is very high, about 54 MB/s at the highest setting, which is why we use a dedicated gigabit Ethernet connection. My problem is that a lot of packets get lost. This is not a momentary problem: the packets come in for a long time, then there's a pause a few seconds long, then everything seems fine again. The pause depends on the speed (the faster the communication, the more is lost).
I've tried setting buffers higher, but something seems to be missing. I've set self.sock_data.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF,2**28) to increase buffer size along with the matching kernel option: sysctl -w net.core.rmem_max=268435456.
Packets have an internal counter, so I know which ones got lost (I also use it to fix their order). An example: 11 s of data lost, around 357,168 packets. (I've checked, and it's not a multiple of an internal buffer size in either my program or the FPGA's firmware.) I'm watching the socket on a separate thread and immediately putting the packets into a Queue to save everything.
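Since the packets carry a counter, logging where each gap starts and how large it is (ideally with a timestamp) can help correlate the drops with what the receiving side was doing at that moment. This sketch assumes a hypothetical layout, a big-endian uint32 counter in the first 4 bytes that increments by 1 per packet and wraps at 2**32; counter_of and find_gaps are illustrative names.

```python
import struct

def counter_of(datagram):
    """Hypothetical layout: the packet counter is a big-endian uint32 in the first 4 bytes."""
    return struct.unpack_from(">I", datagram, 0)[0]

def find_gaps(counters, bits=32):
    """Return (last_counter_before_gap, missing_count) for each gap in `counters`.

    Assumes packets arrive in order; a reordered packet would show up
    as a spurious, very large gap.
    """
    modulus = 1 << bits
    gaps = []
    prev = None
    for c in counters:
        if prev is not None:
            step = (c - prev) % modulus  # 1 means consecutive; the modulo handles wraparound
            if step > 1:
                gaps.append((prev, step - 1))
        prev = c
    return gaps

if __name__ == "__main__":
    received = [struct.pack(">I", n) + b"payload" for n in (10, 11, 12, 20, 21)]
    gaps = find_gaps(counter_of(d) for d in received)
    print(gaps, "total lost:", sum(m for _, m in gaps))  # [(12, 7)] total lost: 7
```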
What else should I set or check?
Client:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
msg = b"X"
for i in range(1500):
    s.sendto(msg, ("<IP>", <PORT>))
Server:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("", <PORT>))
counter = 0
for i in range(1500):
    s.recv(1)
    counter += 1
I have two machines: the first runs Windows 7 and the second runs Ubuntu 16.04.
Now the problem:
If I try to send 1500 UDP-packets (for example) from the client to the server, then:
Windows 7 is the client and Ubuntu 16.04 is the server:
server only receives between 200 and 280 packets
Ubuntu 16.04 is the client and Windows 7 is the server:
server receives all 1500 packets
My first question:
What is the reason for this? Are there any limitations on the OS?
Second question:
Is it possible to optimize the sockets in Python?
I know that UDP packets can get lost, but up to 4/5 of all packets?
edit:
Why this kind of question?
Imagine I have a big sensor network... and one server. Each sensor node should send its information to the server. The program on the server can only be written in an asynchronous way: the server is only able to read the data out of the socket at specific times. Now I want to calculate how many sensor nodes can send data via UDP packets to the server during the period when the server is not able to read out its buffer. Knowing how many UDP packets can be stored in the buffer, I can calculate how many sensor nodes I can use...
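The quantity asked for here (how many datagrams fit in the buffer) can be measured directly instead of derived from the byte size: send a burst to a socket that nobody reads, then count what was queued. On Linux each queued datagram is charged its kernel bookkeeping size in addition to its payload, so a buffer holds far fewer tiny packets than buffer_bytes / payload_bytes would suggest. This is a loopback-only sketch; measure_capacity is an illustrative name, and a real network path adds NIC and driver buffers on top.

```python
import socket
import time

def measure_capacity(payload_size=1, attempts=5000):
    """Fill an unread loopback UDP socket; return (datagrams kept, reported SO_RCVBUF)."""
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        rx.bind(("127.0.0.1", 0))
        addr = rx.getsockname()
        payload = b"X" * payload_size
        for _ in range(attempts):
            try:
                tx.sendto(payload, addr)
            except OSError:
                # Some stacks report a full buffer as an error instead of dropping silently.
                pass
        time.sleep(0.2)  # give the kernel a moment to finish delivering
        rx.setblocking(False)
        kept = 0
        while True:
            try:
                rx.recv(65535)
            except BlockingIOError:
                break  # queue is empty
            kept += 1
        return kept, rx.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        rx.close()
        tx.close()

if __name__ == "__main__":
    for size in (1, 1024):
        kept, rcvbuf = measure_capacity(size)
        print(f"{size:>5}-byte payloads: {kept} datagrams fit in SO_RCVBUF={rcvbuf}")
```

If each sensor node sends k packets while the server isn't reading, the node budget is then roughly kept // k for that packet size.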
Instead of writing a cluttered comment trail, here are my few cents on the problem.
As documented by Red Hat, the default values for the different OSes at the time of writing are:
Linux: 131071
Windows: No known limit
Solaris: 262144
FreeBSD, Darwin: 262144
AIX: 1048576
These values should correspond to the output of:
import socket
s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
These numbers represent how many bytes can be held at any given moment in the socket receive buffer. The numbers can be increased at any time, at the cost of RAM being reserved for this buffer (or at least that's what I remember).
On Linux, you can increase the buffer with sysctl:
sudo sysctl -w net.core.rmem_max=425984
sudo sysctl -w net.core.rmem_default=425984
This sets the buffer to 416 KB. You can most likely increase this to a few megabytes if you see a lot of buffering.
However, a full buffer usually indicates a problem, because your machine should rarely have much in the buffer at all. The buffer is a mechanism for absorbing sudden peaks, a small staging area where your machine can park incoming work. If it gets too full, either your code is too slow and needs to get quicker, or you need to take a good deal of load off your server. If the buffer keeps filling up, then no matter how big it is, it will eventually fill up again.
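The argument above can be made concrete with a simple rate model (an idealization that ignores bursts): a buffer only grows while packets arrive faster than the application reads them, so its size merely postpones the overflow. seconds_until_full is an illustrative name.

```python
def seconds_until_full(capacity, arrival_rate, drain_rate):
    """Time for an initially empty buffer of `capacity` packets to overflow.

    Rates are in packets per second. If the application reads at least as fast
    as packets arrive, the buffer never fills and infinity is returned.
    """
    excess = arrival_rate - drain_rate
    if excess <= 0:
        return float("inf")
    return capacity / excess

if __name__ == "__main__":
    print(seconds_until_full(250, 10_000, 9_000))   # 0.25
    print(seconds_until_full(2500, 10_000, 9_000))  # 2.5: a 10x buffer only buys 10x time
```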
You can also request a bigger buffer from Python via:
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 425984)
However, again, if your OS caps the buffer size (net.core.rmem_max on Linux), that cap overrides any value you request from your Python program.
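To see what the OS actually granted, read the option back after setting it. On Linux, socket(7) documents that the kernel doubles the requested value to allow for bookkeeping overhead and reports the doubled value, and unprivileged requests are capped by net.core.rmem_max; other systems may reject an oversized request with an error instead. effective_rcvbuf is an illustrative helper.

```python
import socket

def effective_rcvbuf(requested):
    """Request an SO_RCVBUF of `requested` bytes; return (default, granted) as reported by the OS."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        default = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        return default, granted
    finally:
        s.close()

if __name__ == "__main__":
    for req in (425984, 2**28):
        try:
            print(req, "->", effective_rcvbuf(req))
        except OSError as exc:  # some systems reject oversized requests outright
            print(req, "rejected:", exc)
```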
tl;dr:
Every OS has limits for optimization/performance reasons. Sockets and file handles (essentially any I/O operation) have them.
This is common, and you should find a lot of information on it. Most of the information above came from a search for "linux udp receive buffer".
Also, "windows increase udp buffer size" landed me on this: Change default socket buffer size under Windows
Final note
As you mentioned, performance, packet counts, etc. can vary vastly because you're using UDP, which trades reliability for speed. Distance between servers, drivers, and NICs (especially important: some NICs have a limited hardware buffer that can cause exactly these symptoms) all affect the data you'll receive. Windows also does a lot of auto-magic tuning in these situations, so make sure you tune your Linux machine to the same parameters. Also keep in mind that a UDP packet consists not only of the data you send but also of the headers in front of it (in the IP packet, for instance TTL, fragmentation, ECN, etc.).
For instance, net.ipv4.udp_mem controls how much memory (counted in pages) the UDP stack may use under load: below the min threshold UDP doesn't bother checking its memory usage, above the pressure threshold it starts moderating its memory use, and max is the total number of pages that all UDP sockets together may use for queueing. You can read the current values with:
sudo sysctl net.ipv4.udp_mem
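Because udp_mem is counted in pages rather than bytes, converting it is an easy step to get wrong. This sketch reads the Linux-only /proc path when it exists and multiplies by the page size from mmap.PAGESIZE; udp_mem_bytes is an illustrative name.

```python
import mmap

def udp_mem_bytes(text, page_size=mmap.PAGESIZE):
    """Convert the three udp_mem values (min, pressure, max; counted in pages) to bytes."""
    low, pressure, high = (int(v) for v in text.split())
    return {"min": low * page_size, "pressure": pressure * page_size, "max": high * page_size}

if __name__ == "__main__":
    try:
        with open("/proc/sys/net/ipv4/udp_mem") as f:  # Linux-only path
            print(udp_mem_bytes(f.read()))
    except FileNotFoundError:
        print("udp_mem is not available on this system")
```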
Here's a good article on UDP tuning from ESnet:
https://fasterdata.es.net/network-tuning/udp-tuning/
Beyond this, you're tweaking yourself into an early grave. Most likely, your problem can be solved by redesigning your code: unless you're actually pushing 1-10 GB/s through your network, the kernel should be able to handle it, provided you process the packets fast enough rather than letting them pile up in a buffer.
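One common way to process packets fast enough is to give a single thread nothing to do except move datagrams from the kernel into an in-process queue, and do the parsing or saving elsewhere. This is a minimal sketch with an illustrative start_receiver helper; it only helps with short bursts. If the consumer is slower on average, the queue grows without bound instead, and in CPython heavy work in other threads still competes with this loop for the GIL.

```python
import queue
import socket
import threading

def start_receiver(sock, out_queue, bufsize=65535, poll=0.5):
    """Start a thread whose only job is to move datagrams from the kernel into out_queue.

    Returns (thread, stop_event); set the event to stop the thread.
    """
    stop = threading.Event()
    sock.settimeout(poll)  # wake up periodically to check the stop flag

    def loop():
        while not stop.is_set():
            try:
                data = sock.recv(bufsize)
            except socket.timeout:
                continue
            out_queue.put(data)  # parsing/saving happens in another thread

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop

if __name__ == "__main__":
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))
    q = queue.Queue()
    thread, stop = start_receiver(rx, q)
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for i in range(3):
        tx.sendto(b"packet %d" % i, rx.getsockname())
    for _ in range(3):
        print(q.get(timeout=2))
    stop.set()
    thread.join()
    tx.close()
    rx.close()
```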