I'm using Python's ftplib to transfer lots and lots of data (~100 files × 2 GB) across a local network to an FTP server. This code is running on Ubuntu. Here is my call (self is my FtpClient object, which is a wrapper around the ftplib client):
# Store file.
self.ftpClient.storbinary('STOR ' + destination, fileHandle, blocksize=self.blockSize, callback=self.__UpdateFileTransferProgress)
My question is, how do I choose an optimal block size? My understanding is that the optimal block size is dependent on a number of things, not the least of which are connection speed and latency. My code will be running on many different networks with different speeds and varying amounts of congestion throughout the day. Ideally, I would like to compute the optimal block size at run time.
Would the optimal FTP transfer block size be the same as the optimal TCP window size? If this is true, and TCP window scaling is turned on, is there a way to get the optimal TCP window size from the kernel? How/when does the linux kernel determine optimal window size? Ideally I could ask the linux kernel for the optimal block size, so as to avoid reinventing the wheel.
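One kernel-provided number you can query on Linux is the maximum segment size (MSS) the kernel negotiated for a TCP connection, via the TCP_MAXSEG socket option. This is not the window size, and it won't give you a single "optimal block size", but it does tell you how much payload fits in one segment. A minimal sketch (Linux-specific; the throwaway loopback listener only exists so there is a real connected socket to query):

```python
import socket

# Throwaway listener on loopback so there is a real, connected TCP
# socket to query (TCP_MAXSEG is Linux-specific, not portable).
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(listener.getsockname())

# The kernel reports the maximum segment size for this connection; on
# loopback it is large, on a real path it reflects the path MTU.
mss = client.getsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG)
print(mss)

client.close()
listener.close()
```

Note that the send/receive windows grow and shrink dynamically under the kernel's congestion control, so there is no single window value you can read once and reuse as a block size.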
This is an interesting question, and I had to dive in a bit deeper ;)
Anyway, here is a good example of how to determine the MTU: http://erlerobotics.gitbooks.io/erle-robotics-python-gitbook-free/content/udp_and_tcp/udp_fragmentation.html
But you should also consider the following: the MTU is a local phenomenon and may apply only to one segment of your local network. What you are really after is the Path MTU, the minimal MTU over the complete transport path. http://en.wikipedia.org/wiki/Path_MTU_Discovery
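On Linux you can ask the kernel for its current path-MTU estimate toward a given destination via the IP_MTU option on a connected socket. A sketch, with two caveats: the numeric constant comes from <linux/in.h> because not every Python version exports it, and the loopback destination below is just a placeholder:

```python
import socket

IP_MTU = 14  # from <linux/in.h>; not always exported by the socket module

# connect() on a UDP socket only pins the destination; no packets are sent.
s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.connect(("127.0.0.1", 9))

# The kernel reports the MTU it currently believes applies to this route.
path_mtu = s.getsockopt(socket.IPPROTO_IP, IP_MTU)
print(path_mtu)
s.close()
```

Against a remote host the estimate starts at the interface MTU and shrinks as path-MTU discovery learns about smaller links along the route.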
So you would have to know the MTU of every involved component. This can be a problem: for example, if you're using jumbo frames and a switch in the path doesn't support them, that switch has to split the frames. I have already run into a switch that did not understand jumbo frames and simply dropped them.
Now the most interesting question: the optimal blocksize. A lot of Python functions take arguments like blocksize or chunksize, but they don't address the block size of the underlying transport protocol. The blocksize defines a read buffer that holds the data to be sent/read. The default in ftplib is 8K (8192 bytes). So adjusting the blocksize should not really affect the speed of the transfer.
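To see why, you can time the Python-side read loop that storbinary effectively runs, in isolation from the network. A local micro-benchmark sketch (the numbers only reflect per-call Python overhead, which is tiny next to network transfer time):

```python
import io
import time

# 16 MB of dummy data standing in for the file being uploaded.
payload = b"x" * (16 * 1024 * 1024)

def read_all(blocksize):
    # Mimic storbinary's inner loop: read blocksize bytes until EOF.
    fh = io.BytesIO(payload)
    start = time.perf_counter()
    while fh.read(blocksize):
        pass
    return time.perf_counter() - start

for bs in (1024, 8192, 65536):
    print(f"blocksize={bs:6d}: {read_all(bs):.4f}s")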
Controlling the MTU of the underlying transport protocol is something that is handled by the operating system and its kernel.
Finally, some words about FTP. FTP is an old dinosaur that is easy to set up and use, but it is not always the best method to transfer files, especially if you transfer a lot of small files. I don't know your exact use case, so thinking about other transfer alternatives like rsync or bbcp could make sense. The latter seems to increase copy speed drastically. You really should have a look at http://moo.nac.uci.edu/~hjm/HOWTO_move_data.html
just my two cents...
I am developing a python application where a drone and a computer communicate over local network (wifi). My need is to stream the drone's camera to OpenCV-python on the computer with the lowest possible latency at the highest possible resolution.
Thus far I have been trying rather naive approaches over TCP that give okay-ish results; I get something like 0.1 s or 0.2 s latency for VGA format. TCP has a point for some use cases, as it enables lossless transmission, but since the most common scenario is to aggressively control the drone in real time from the stream, I am aiming for much lower latency and hopefully higher resolution.
My advisor has recommended using WebRTC. I have done some research on the matter, found the aiortc library that implements WebRTC in python, but I am unsure this is the way to go for my use case as it seems to be more geared toward web developers.
I am a bit lost I think. Could you highlight the advantages of WebRTC in my application if any, or point me toward solutions that are more relevant for my problem please?
Thanks in advance!
[1]
WebRTC communicates peer to peer; I think you knew that. And since you're on a local network, you won't need a STUN or TURN server to connect the two devices, which decreases latency further and keeps the code shorter. I haven't worked with drones, but I think your streaming method with latency < 0.2 s is already good.
FYI, a protocol comparison:
So I'm quite deep into this monitoring implementation, and I'm curious as to how to calculate the theoretical maximum it can handle.
I know Python is not the most efficient language, and I'm honestly not too worried about missing a packet here or there - but how can I figure out how fast it's going?
My network isn't corporate large, but it can keep up with an nmap scan. (Or so it seems)
It matches Wireshark, so I'm curious about its limitations on a network with thousands of computers.
The scapy documentation doesn't seem to go too far into it, but I admit I may have missed something.
I create an async sniff with a callback that just throws the desired information into a hashtable/dictionary with the srcMac as a key, if that would affect anything.
In my case, sending a 4 MB file from Host A to Host B using Python sockets, I was getting 2.3 Gb/s. Scapy was 100x slower than the network speed, depending on what kind of operations you do after sniffing the packets. When I did too much work on the sniffed packets, I could capture only 15 of the 1000 packets (4 MB). After optimizing my code, the maximum I could sniff was still fewer than 200 packets out of 1000. If you need more precise measurements, feel free to tag me; happy to share what I know.
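The optimization that helped most can be sketched without scapy itself: keep the sniff callback trivial (just enqueue the packet) and do the heavy per-packet work in a separate thread. The dicts below stand in for scapy packet objects, and the function names are made up for illustration:

```python
import queue
import threading

packet_queue = queue.Queue()
counts = {}

def on_packet(pkt):
    # Keep the capture callback as cheap as possible: just enqueue.
    packet_queue.put(pkt)

def worker():
    # Heavy per-packet work happens here, off the capture path.
    while True:
        pkt = packet_queue.get()
        if pkt is None:          # sentinel to stop the worker
            break
        src_mac = pkt["src"]     # stand-in for pkt.src on a scapy packet
        counts[src_mac] = counts.get(src_mac, 0) + 1

t = threading.Thread(target=worker)
t.start()

# Simulated capture: 1000 packets alternating between two MACs.
for i in range(1000):
    on_packet({"src": "aa:bb" if i % 2 else "cc:dd"})
packet_queue.put(None)
t.join()
print(counts)
```

With scapy you would pass the cheap callback as sniff(prn=on_packet, store=0); the queue decouples capture speed from processing speed.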
I'm developing a program in Python that uses UDP to receive data from an FPGA (a data collector device). The speed is very high, about 54 MB/s at the highest setting, which is why we use a dedicated gigabit Ethernet connection. My problem is that a lot of packets get lost. This is not a momentary problem: the packets come in fine for a long time, then there is a pause of a few seconds, then everything seems fine again. The length of the pause depends on the speed (the faster the communication, the more is lost).
I've tried setting buffers higher, but something seems to be missing. I've set self.sock_data.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF,2**28) to increase buffer size along with the matching kernel option: sysctl -w net.core.rmem_max=268435456.
Packets carry an internal counter, so I know which ones got lost (I also use this to restore their order). An example: 11 s of data lost, around 357168 packets. (I've checked, and it's not a multiple of any internal buffer size in either my program or the FPGA's firmware.) I watch the socket on a separate thread and immediately put packets into a Queue to save everything.
What else should I set or check?
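The counter scheme described above can be modeled on loopback. This is only a toy sketch (100 tiny datagrams will not stress the buffer the way 54 MB/s does), but it shows the mechanics of numbering datagrams and reporting gaps after draining the socket:

```python
import socket
import struct

# Receiver with an enlarged buffer, bound to an ephemeral loopback port.
recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
recv_sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**20)
addr = recv_sock.getsockname()

# Sender: each datagram carries a 4-byte big-endian sequence number.
send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for seq in range(100):
    send_sock.sendto(struct.pack("!I", seq), addr)

# Drain the socket and record which sequence numbers arrived.
received = set()
recv_sock.settimeout(1.0)
try:
    while len(received) < 100:
        data, _ = recv_sock.recvfrom(4)
        received.add(struct.unpack("!I", data)[0])
except socket.timeout:
    pass

lost = sorted(set(range(100)) - received)
print("lost:", lost)
send_sock.close()
recv_sock.close()
```

In the real setup the interesting signal is whether the lost ranges line up with bursts (buffer overflow while the reader thread is stalled) or are spread evenly (a sender- or NIC-side issue).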
Client:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
msg = b"X"
for i in range(1500):
    s.sendto(msg, ("<IP>", <PORT>))
Server:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.bind(("", <PORT>))
counter = 0
for i in range(1500):
    s.recv(1)
    counter += 1
I have two machines - the first one with Windows7 and the second one with Ubuntu 16.04.
Now the problem:
If I try to send 1500 UDP packets (for example) from the client to the server, then:
Windows7 is Client and Ubuntu16.04 is server:
server only receives between 200 and 280 packets
Ubuntu16.04 is Client and Windows7 is server:
server receives all 1500 packets
My first question:
What is the reason for this? Are there any limitations on the OS?
Second question:
Is it possible to optimize the sockets in Python?
I know that UDP packets can get lost, but up to 4/5 of all packets?
edit:
Why this kind of question?
Imagine I have a big sensor network and one server. Each sensor node should send its information to the server. The program on the server can only be written in an asynchronous way: the server is only able to read the data out of the socket at specific times. Now I want to calculate how many sensor nodes can send data via UDP packets to the server during the period in which the server is not able to read out its buffer. Knowing how many different UDP packets can be stored in the buffer, I can calculate how many sensor nodes I can use.
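That back-of-envelope calculation might look like the sketch below. All the numbers are assumptions for illustration; in particular, Linux charges the receive buffer for per-packet bookkeeping (sk_buff overhead), not just payload bytes, so the effective capacity is far below buffer_size / payload_size:

```python
# Rough capacity estimate: how many datagrams fit in the receive buffer
# before the kernel starts dropping.
RCVBUF_BYTES = 212992     # typical Linux default (net.core.rmem_default)
PAYLOAD_BYTES = 64        # sensor reading per datagram (made-up value)
KERNEL_OVERHEAD = 768     # rough per-packet sk_buff overhead (assumption)

per_packet = PAYLOAD_BYTES + KERNEL_OVERHEAD
capacity = RCVBUF_BYTES // per_packet
print(capacity)   # datagrams that can queue while the server isn't reading
```

The honest way to get the overhead number for your system is to measure: fill the buffer from a paused reader and count how many datagrams survive.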
Instead of writing a cluttered comment trail, here's a few cents to the problem.
As documented by Red Hat, the default values for the different operating systems at the time of writing are:
Linux: 131071
Windows: No known limit
Solaris: 262144
FreeBSD, Darwin: 262144
AIX: 1048576
These values should correspond to the output of:
import socket
s = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
These numbers represent how many bytes can be held in the socket receive buffer at any given moment. They can be increased at any time, at the cost of RAM being reserved for this buffer (or at least that's what I remember).
On Linux (and some BSD flavors), you can increase the buffer using sysctl:
sudo sysctl -w net.core.rmem_max=425984
sudo sysctl -w net.core.rmem_default=425984
This sets the buffer to 416KB. You can most likely increase this to a few megabytes if buffering is something you see a lot of.
However, a full buffer usually indicates a problem, because your machine should rarely have much sitting in the buffer at all. The buffer is a mechanism to absorb sudden peaks and to serve as a small platter where your machine can stage work. If it gets too full, either your code is too slow and needs to get quicker, or you need to offload your server quite a bit. Because if the buffer fills up, then no matter how big it is, eventually it will fill up again.
Supposedly you can also increase the buffer size from Python via:
s.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF, 1024)
However, again, if your OS caps the value at a certain ceiling, that cap will supersede any value you set in your Python program.
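You can observe that cap by asking for a buffer size and reading back what the kernel actually granted. A small sketch (on Linux the returned value is double the request, to account for bookkeeping overhead, and is silently clamped to net.core.rmem_max):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Request a 64 KB receive buffer...
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 65536)

# ...then read back what the kernel actually granted. On Linux this is
# typically double the request, unless clamped by net.core.rmem_max.
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(granted)
s.close()
```

If the read-back value is smaller than what you asked for, that is the kernel cap in action, and you need root plus sysctl to raise it.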
tl;dr:
Every OS has limitations on sockets, file handles, and essentially any I/O operation, for optimization/performance reasons.
This is common, and you should find a lot of information on it. All the information above was mostly found via a search for "linux udp receive buffer".
Also, "windows increase udp buffer size" landed me on this: Change default socket buffer size under Windows
Final note
As you mentioned, the performance, amount received, etc. can vary vastly because you're using UDP, which is prone to data loss in exchange for speed. Distance between servers, drivers, and NICs (especially important; some NICs have a limited hardware buffer that can cause exactly these problems) all impact the data you'll receive. Windows also does a lot of auto-magic in these situations; make sure you tune your Linux machine to the same parameters. A UDP packet consists not only of the amount of data you send, but also of all the parameters in the headers before it (in the IP packet, for instance TTL, fragmentation, ECN, etc.).
For instance, you can tune how much memory your UDP stack can eat under certain loads, to find out your lower threshold (UDP won't bother checking RAM usage), pressure threshold (memory management under load) and the max value UDP sockets can use per socket.
sudo sysctl net.ipv4.udp_mem
Here's a good article on UDP tuning from ESnet:
https://fasterdata.es.net/network-tuning/udp-tuning/
Beyond this, you're tweaking to your grave. Most likely, your problem can be solved by redesigning your code. Because unless you're actually pushing 1-10GB/s from your network, the kernel should be able to handle it assuming you process the packets fast enough, rather than piling them up in a buffer.
Update to original post: A colleague pointed out what I was doing wrong.
I'll give the explanation at the bottom of the post, as it might be helpful
for others.
I am trying to get a basic understanding of the limits on network performance
of python programs and have run into an anomaly. The code fragment
while 1:
    sock.sendto("a", target)
sends UDP packets to a target machine, as fast as the host will send.
I measure a sending rate of just over 4000 packets per second, or 250 µs
per packet. This seems slow, even for an interpreted language like Python
(the program is running on a 2 GHz AMD Opteron, Linux, Python version 2.6.6).
I've seen much better performance in python for TCP, so I find this a bit weird.
If I run this in the background and run top, I find that python is using
just 25% of the cpu, suggesting that python may be artificially delaying
the transmission of UDP packets.
Has anyone else experienced anything similar? Does anyone know if python
does limit the rate of packet transmission, and if there is a way to turn
this off?
BTW, a similar C++ program can send well over 200,000 packets per second,
so it's not an intrinsic limit of the platform or OS.
So, it turns out I made a silly newbie mistake. I neglected to call gethostbyname
explicitly. Consequently, the target address in the sendto command contained
a symbolic name. This was triggering a name resolution every time a packet was
sent. After fixing this, I measure a maximum sending rate of about 120,000 p/s.
Much better.
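The fix boils down to resolving the hostname once, outside the loop, so sendto() receives a numeric address. A sketch (the hostname and port are placeholders; port 9 is the discard service):

```python
import socket

# Resolve once, up front. Passing a symbolic hostname to sendto() can
# trigger a fresh name lookup on every single call.
target = (socket.gethostbyname("localhost"), 9)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = 0
for _ in range(1000):
    sock.sendto(b"a", target)   # numeric address: no lookup in the loop
    sent += 1
sock.close()
print(sent)
```

The same pitfall hides in anything that re-evaluates an address per call; hoisting the resolution out of the hot loop is the general cure.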
You might want to post a more complete code sample so that others can repeat your benchmark. 250μs per loop iteration is too slow. Based on daily experience with optimizing Python, I would expect Python's interpreter overhead to be well below 1μs on a modern machine. In other words, if the C++ program is sending 200k packets per second, I would expect Python to be in the same order of magnitude of speed.
(In light of the above, the usual optimization suggestions such as moving the attribute lookup of sock.sendto out of the loop do not apply here because the slowness is coming from another source.)
A good first step would be to use strace to check what Python is actually doing. Is it a single-threaded program or a multithreaded application that might be losing time waiting on the GIL? Is sock a normal Python socket or part of a more elaborate API? Does the same happen when you directly call os.write on the socket's fileno?
Have you tried doing a connect() first, then using send() instead of sendto()? (UDP connect() just establishes the destination address, it doesn't actually make a "connection".) I'm rusty on this, but I believe Python does more interpretation on the address parameter than C sockets, which might be adding overhead.
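A minimal sketch of that connect()-then-send() pattern, with a throwaway loopback receiver so it is self-contained:

```python
import socket

# Receiver bound to an ephemeral loopback port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))

# connect() on a UDP socket only pins the destination address; no
# handshake happens. After this, send() needs no address argument.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.connect(receiver.getsockname())
sender.send(b"hello")

data, _ = receiver.recvfrom(16)
print(data)
sender.close()
receiver.close()
```

Besides skipping per-call address handling, a connected UDP socket also lets the kernel deliver ICMP errors (like port unreachable) back to your program, which an unconnected socket silently discards.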