I am having problems with Python timing. I am creating a game with both a server and a client (using Twisted), and I am trying to measure the time it takes for a packet to come in so I can calculate speed correctly. I have tried both time.time() and time.clock(), with the same problem: they are just not reliable.
So what I do is send a movement update roughly every 0.2 seconds. I measured that the client sent a packet after 0.202649094381 seconds. The server takes the current time minus the time it last got a packet; it says only 0.19164764274 seconds passed since the previous packet. That doesn't make sense: the server shouldn't be getting packets QUICKER than the client can send them, especially once you factor in normal network latency. This happens intermittently (client: 0.202243417922 vs server: 0.20065745487). How can Python be that imprecise? I tried time.time(), which is supposed to use the system clock, and time.clock(), which is supposed to measure the CPU clock, but neither is reliable enough. Maybe I am going about this wrong? Any help would be appreciated.
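Two things are worth noting here: the client and server intervals come from two different machines' clocks, and network jitter can delay one packet more than the next, so inter-arrival times on the server can legitimately be shorter than inter-send times on the client. For measuring intervals on a single machine, a monotonic clock (Python 3's time.perf_counter() or time.monotonic()) is the safer tool than time.time() or time.clock(). A minimal sketch (the IntervalTimer class is made up for illustration):

```python
import time

class IntervalTimer:
    """Track elapsed time between successive events on one machine."""
    def __init__(self):
        self._last = None

    def mark(self):
        now = time.perf_counter()  # monotonic, high resolution, never jumps
        delta = None if self._last is None else now - self._last
        self._last = now
        return delta

t = IntervalTimer()
t.mark()
time.sleep(0.05)
delta = t.mark()  # roughly 0.05 seconds
```

Comparing a perf_counter() reading on the client to one on the server is still meaningless; monotonic clocks are only comparable within one process.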
I'm using a ZMQ pub-sub to pass messages from a C++ program to a Python script.
I'm passing around 5000 messages per second, and they vary in size from 100 to 4500 bytes, with an average of about 300 bytes each. On average, I'm sending about 2.5 megabytes of data per second through this socket.
My monitoring script works like this:
Bind C++ program to the socket and leave it running.
Add a timestamp to each message going out of the C++ program.
Start the Python script, connect to the socket, receive a message, take a timestamp, and subtract the embedded outgoing timestamp to get the latency measurement.
Keep track of the average latency rolling over x messages.
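A minimal sketch of steps 2-4 in Python, assuming the sender prepends the timestamp as an 8-byte little-endian double (the wire format here is an assumption, not something stated in the question; here both sides run on one machine, which is also what makes subtracting the two time.time() readings valid):

```python
import struct
import time

def stamp(payload: bytes) -> bytes:
    # sender side: prepend the current time as an 8-byte little-endian double
    return struct.pack("<d", time.time()) + payload

def latency_of(msg: bytes) -> float:
    # receiver side: local receive time minus the embedded send timestamp
    sent_ts, = struct.unpack_from("<d", msg)
    return time.time() - sent_ts

msg = stamp(b"tick")
lat = latency_of(msg)
```

If sender and receiver were on different machines, this subtraction would also measure the clock offset between them, not just the socket latency.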
My problem:
When I start the script, the socket latency hovers around 200 microseconds. That's within my expectations and seems correct. However, after 1 minute the average latency is around 1.5 milliseconds, around 10-15 ms after 4 minutes, and it stabilizes at around 50 ms after 10 minutes.
If I restart my Python script, latency goes back to 200 microseconds. I don't think the problem is message backlogging, because if that were the case the latency would keep increasing and occasionally decrease too, instead of climbing linearly to a fixed threshold in a predictable way.
What could possibly be the issue?
Is there a way to either change ZMQ settings to try and improve this, or see if it's having some internal backlogging problem?
When you mentioned 50 ms, I immediately thought of the Nagle algorithm.
However, since you say you use ZMQ on both sides, and ZMQ disables Nagle by default, I am somewhat taken aback.
Here are some ideas to debug:
Start the C++ process under strace with strace <binary> args and look at the logs, in particular the ones right after the socket creation. Check that TCP_NODELAY is set on the socket.
Start Wireshark as admin and capture on the interface you are publishing on. Follow the relevant TCP stream and check for blatant errors (e.g., retransmissions) and patterns.
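ZMQ is expected to set TCP_NODELAY on its TCP transports by default, but if you want to see what the option looks like at the plain-socket level (what strace should show being set), here is a quick sketch with an ordinary Python socket, not a ZMQ one:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm: small writes go out immediately
# instead of being coalesced into larger segments.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
s.close()
```

In the strace output, the corresponding call appears as setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, ...).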
I'm developing a program in Python that uses UDP to receive data from an FPGA (a data-collector device). The speed is very high, about 54 MB/s at the highest setting, which is why we use a dedicated gigabit Ethernet connection. My problem is that a lot of packets get lost. This is not a momentary problem: packets come in for a long time, then there's a pause of a few seconds, then everything seems fine again. The length of the pause depends on the speed (the faster the communication, the more packets are lost).
I've tried setting buffers higher, but something seems to be missing. I've set self.sock_data.setsockopt(socket.SOL_SOCKET,socket.SO_RCVBUF,2**28) to increase buffer size along with the matching kernel option: sysctl -w net.core.rmem_max=268435456.
Packets carry an internal counter, so I know which ones got lost (I also use this to restore their order). An example: 11 s of data lost, around 357168 packets. (I've checked, and it's not a multiple of any internal buffer size in either my program or the FPGA's firmware.) I'm watching the socket on a separate thread and immediately putting the packets into a Queue so everything is saved.
What else should I set or check?
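One quick check for the setup above: the kernel silently caps SO_RCVBUF at net.core.rmem_max (and on Linux, getsockopt reports roughly double the usable value you requested), so it's worth reading the option back to confirm what you actually got. A sketch:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 2**28)
# If this comes back far below ~2 * 2**28, the sysctl cap did not take effect.
effective = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
print(effective)
```

If the effective value is as expected and packets are still lost in bursts, the bottleneck is more likely the Python-side drain rate than the buffer size itself.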
I've whipped up a proof-of-concept for a multiplayer game with a script to simulate a heavy load, but I'm not quite sure how to see how many streams of data it can handle.
For reference I'm sending ~1kb of data every 20ms for realtime multiplayer. I'm trying to determine what kind of load channels can take because I'm currently weighing channels vs nodejs or golang.
What's the best way to do this? Maybe average time between WebSocket received and broadcast as a proxy for server performance? So when that drops to some slower value performance is beginning to degrade?
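As a rough sketch of that proxy metric (all names here are made up for illustration): timestamp each message when the server receives it and again when it finishes broadcasting, and keep a rolling average of the difference; when that average starts climbing, the server is falling behind.

```python
from collections import deque

class RollingLatency:
    """Rolling average of receive-to-broadcast delay over the last N samples."""
    def __init__(self, window=1000):
        self._samples = deque(maxlen=window)  # old samples fall off automatically

    def record(self, received_at, broadcast_at):
        self._samples.append(broadcast_at - received_at)

    def average(self):
        return sum(self._samples) / len(self._samples) if self._samples else 0.0

stats = RollingLatency(window=3)
stats.record(0.0, 0.010)
stats.record(1.0, 1.020)
stats.record(2.0, 2.030)
avg = stats.average()  # ~0.02
```

This measures server-side processing delay only; end-to-end latency as a client sees it additionally includes network time, which you'd have to measure from a client with round trips.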
I have a python socket reader to listen for incoming UDP packets from about 5000 clients every minute. As I started rolling it out it was working fine but now that I'm up to about 4000 clients I'm losing about 50% of the data coming in. The VM has plenty of memory and CPU so I assume it's something with my UDP socket listener on the server getting too much data at once. Via cron, every minute the clients send in this data:
'site8385','10.255.255.255','1525215422','3.3.0-2','Jackel','00:15:65:20:39:10'
This is the socket reader portion of my listener script.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
port = 18000
s.bind(('', port))

while True:
    # Wait for the next datagram; UDP has no connections.
    # recvfrom returns a (data, client_address) tuple.
    d = s.recvfrom(1024)
Could it be the buffer size is too small? How do I determine the size of the packets coming in so I can adjust the 1024 value?
Every 60 seconds, you get a storm of ~5000 messages. You process them sequentially, and it takes "quite a bit" of time. So pretty quickly, one of your buffers gets full up and either your OS, your network card, or your router starts dropping packets. (Most likely it's the buffer your kernel sets aside for this particular socket, and the kernel is dropping the packets, but all of the other options are possible too.)
You could try increasing those buffers. That will give you a lot more "allowed lag time", so you can get further behind before the kernel starts dropping packets. If you want to go down this road, the first step is a setsockopt call to raise the SO_RCVBUF value, but you really need to learn about all the issues that can be involved here.1
If you control the client code, you could also have the clients stagger their packets (e.g., just sleeping for random.random() * 55 before the send).
But it's probably better to try to actually service those packets as quickly as possible, and do the processing in the background.2
Trying to do this in-thread could be ideal, but it could also be very fiddly to get right. A simpler solution is to just use a background thread, or a pool of them:
import concurrent.futures

def process_msg(d):
    # your actual processing code
    ...

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as x:
    while True:
        d = s.recvfrom(1024)
        x.submit(process_msg, d)
This may not actually help. If your processing is CPU-bound rather than I/O-bound, the background threads will just be fighting over the GIL with the main thread. If you're using Python 2.7 or 3.2 or something else old, even I/O-bound threads can interfere in some situations. But either way, there's an easy fix: Just change that ThreadPoolExecutor to a ProcessPoolExecutor (and maybe drop max_workers to 1 fewer than the number of cores you have, to make sure the receiving code can have a whole core to itself).
1. Redhat has a nice doc on Network Performance Tuning. It's written more from the sysadmin's point of view than the programmer's, and it expects you to either know, or know how to look up, a lot of background information—but it should be helpful if you're willing to do that. You may also want to try searching Server Fault rather than Stack Overflow if you want to go down this road.
2. Of course if there's more than a minute's work to be done to process each minute's messages, the queue will just get longer and longer, and eventually everything will fail catastrophically, which is worse than just dropping some packets until you catch up… But hopefully that's not an issue here.
Update to original post: A colleague pointed out what I was doing wrong.
I'll give the explanation at the bottom of the post, as it might be helpful
for others.
I am trying to get a basic understanding of the limits on network performance
of python programs and have run into an anomaly. The code fragment
while 1:
    sock.sendto("a", target)
sends UDP packets to a target machine, as fast as the host will send.
I measure a sending rate of just over 4000 packets per second, or 250 us
per packet. This seems slow, even for an interpreted language like python
(the program is running on a 2 GHz AMD opteron, Linux, python version 2.6.6).
I've seen much better performance in python for TCP, so I find this a bit weird.
If I run this in the background and run top, I find that python is using
just 25% of the cpu, suggesting that python may be artificially delaying
the transmission of UDP packets.
Has anyone else experienced anything similar? Does anyone know if python
does limit the rate of packet transmission, and if there is a way to turn
this off?
BTW, a similar C++ program can send well over 200,000 packets per second,
so it's not an intrinsic limit of the platform or OS.
So, it turns out I made a silly newbie mistake. I neglected to call gethostbyname
explicitly. Consequently, the target address in the sendto command contained
a symbolic name. This was triggering a name resolution every time a packet was
sent. After fixing this, I measure a maximum sending rate of about 120,000 p/s.
Much better.
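The fix described above amounts to resolving the symbolic name once, outside the loop. A sketch (the throwaway receiver socket exists only so the demo has a live local destination; the hostname and count are made up):

```python
import socket

# Stand-in receiver so the sends below have a real destination.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
host, port = "localhost", rx.getsockname()[1]

# Resolve the symbolic name ONCE, not on every sendto.
addr = (socket.gethostbyname(host), port)

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sent = 0
for _ in range(1000):
    tx.sendto(b"a", addr)  # numeric address: no per-packet DNS lookup
    sent += 1
tx.close()
rx.close()
```

Passing ("localhost", port) directly to sendto would instead trigger a name resolution on every call, which is exactly the mistake described above.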
You might want to post a more complete code sample so that others can repeat your benchmark. 250μs per loop iteration is too slow. Based on daily experience with optimizing Python, I would expect Python's interpreter overhead to be well below 1μs on a modern machine. In other words, if the C++ program is sending 200k packets per second, I would expect Python to be in the same order of magnitude of speed.
(In light of the above, the usual optimization suggestions such as moving the attribute lookup of sock.sendto out of the loop do not apply here because the slowness is coming from another source.)
A good first step would be to use strace to check what Python is actually doing. Is it a single-threaded program, or a multithreaded application that might be losing time waiting on the GIL? Is sock a normal Python socket, or part of a more elaborate API? Does the same happen when you call os.write directly on the socket's fileno?
Have you tried doing a connect() first, then using send() instead of sendto()? (UDP connect() just establishes the destination address, it doesn't actually make a "connection".) I'm rusty on this, but I believe Python does more interpretation on the address parameter than C sockets, which might be adding overhead.
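The connect()-then-send() variant looks like the sketch below (again with a throwaway local receiver so the demo is self-contained and the count is arbitrary):

```python
import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.connect(rx.getsockname())  # fixes the destination; no handshake occurs
for _ in range(100):
    tx.send(b"a")  # no per-call address argument to parse or resolve

data, addr = rx.recvfrom(64)
tx.close()
rx.close()
```

A side effect worth knowing: on a connected UDP socket the kernel can report ICMP "port unreachable" errors back as ConnectionRefusedError on a later send, which unconnected sockets never see.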