Profiling Django Channels - Python

I've whipped up a proof-of-concept for a multiplayer game, with a script to simulate a heavy load, but I'm not quite sure how to see how many streams of data it can handle.
For reference, I'm sending ~1 kB of data every 20 ms for realtime multiplayer. I'm trying to determine what kind of load Channels can take, because I'm currently weighing Channels against Node.js or Go.
What's the best way to do this? Maybe use the average time between a WebSocket message being received and broadcast as a proxy for server performance, so that once it climbs past some threshold I know performance is beginning to degrade?
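One rough way to get that number, assuming your consumer echoes or rebroadcasts each message back to connected clients, is a small load-test client built on the third-party websockets package: it records the send time of each ~1 kB payload and measures how long the round trip takes, so you can watch latency climb as you add simulated players. The URI, payload size, and client count below are placeholders for whatever your routing and load script use.

import asyncio
import json
import time
import websockets  # third-party: pip install websockets

URI = "ws://localhost:8000/ws/game/"  # placeholder: your Channels route
PAYLOAD = "x" * 1024                  # roughly 1 kB of filler state
INTERVAL = 0.02                       # 20 ms between sends

async def one_client(samples):
    async with websockets.connect(URI) as ws:
        for _ in range(500):
            sent_at = time.monotonic()
            await ws.send(json.dumps({"sent_at": sent_at, "state": PAYLOAD}))
            await ws.recv()  # wait for the echo/broadcast to come back
            samples.append(time.monotonic() - sent_at)
            await asyncio.sleep(INTERVAL)

async def main(num_clients=50):
    samples = []
    await asyncio.gather(*(one_client(samples) for _ in range(num_clients)))
    samples.sort()
    print("clients:", num_clients,
          "median:", samples[len(samples) // 2],
          "p95:", samples[int(len(samples) * 0.95)])

asyncio.run(main())

Running this with progressively more clients and plotting the median/p95 round-trip times gives you the degradation curve you're after, and the same harness works against a Node.js or Go server for an apples-to-apples comparison.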

Related

Efficient way of sending a large number of images from client to server

I'm working on a project where one client needs to take several snapshots with a camera (i.e. it's actually capturing a short-duration video, hence a stream of frames), then send all the images to a server, which performs some processing on them and returns a result to the client.
Client and server both run Python 3 code.
The critical part is sending the images.
Some background first: the images are 640×480 JPEG files. JPEG was chosen as a default, but lower-quality encodings can be selected as well. The frames are captured in sequence by the camera, so we have approximately ~600 frames to send. Each frame is around 110 KiB.
The client is a Raspberry Pi 3 Model B+. It sends the frames via Wi-Fi to a 5c server. Server and client both reside on the same LAN for the prototype, but future deployments might differ, both in terms of connectivity medium (wired or wireless) and area (LAN or metro).
I've implemented several solutions for this:
Using Python sockets on the server and the client: I either send each frame right after it is captured, or I send all the images in sequence once the whole capture is done.
Using GStreamer: I launch a GStreamer endpoint on the client and send the frames directly to the server as I stream. On the server side I capture the stream with OpenCV compiled with GStreamer support, then save the frames to disk.
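For the plain-socket variant, one common pattern is to length-prefix each JPEG so the server knows where one frame ends and the next begins. A minimal sketch of the sending side, assuming a hypothetical server address and an iterable of JPEG byte strings:

import socket
import struct

SERVER = ("192.168.1.10", 5000)  # placeholder: address of the processing server

def send_frames(frames):
    # frames is an iterable of JPEG byte strings, e.g. straight from the camera
    with socket.create_connection(SERVER) as sock:
        for jpeg_bytes in frames:
            header = struct.pack("!I", len(jpeg_bytes))  # 4-byte big-endian length
            sock.sendall(header + jpeg_bytes)
        sock.shutdown(socket.SHUT_WR)  # signal the server that we're done

The server reads 4 bytes, unpacks the length, then reads exactly that many bytes per frame; this works the same whether each frame is sent right after capture or all of them are sent at the end.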
Now, the issue I'm facing is that even though both solutions work 'well' (they get the 'final' job done, which is to send data to a server and receive a result based on some remote processing), I'm convinced there is a better way to send a large amount of data to a server, using either the Python socket library or other available tools.
All the personal research I've done on the matter led me either to solutions similar to mine, using Python sockets, or to solutions that were out of context (relying on backends other than pure Python).
By a better way, I mean:
1. A solution that saves as much bandwidth as possible.
2. A solution that sends all the data as fast as possible.
For 1, I slightly modified my first solution to archive and compress all captured frames into a .tgz file that I send over to the server. It does decrease bandwidth usage, but it also increases the time spent on both ends (because of the compression/decompression steps). This is obviously especially true when the dataset is large.
For 2, GStreamer gave me a negligible delay between capture and reception on the server. However, I have no compression at all and, for the reasons stated above, I cannot really use this library for further development.
How can I send a large number of images from one client to one server with minimal bandwidth usage and delay in Python?
If you want to transfer the images as frames, you can use an existing app like MJPEG-Streamer, which encodes images from a webcam interface to JPEG and so reduces the image size. If you need a more robust transfer with more advanced encoding, you can use Linux tools like FFmpeg with streaming, which is documented here.
If you want a lighter-weight implementation and full control of the stream from your own code, you can use a web framework like Flask and transfer your images directly over HTTP; a minimal sketch of that approach is included at the end of this answer. You can find a good example here.
If you don't want to stream, you can convert the whole set of images into an encoded video format like H.264 and then transfer the bytes over the network. You can use OpenCV to do this.
There are also some good libraries written in Python, like pyffmpeg.
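To make the Flask suggestion concrete, here is a minimal sketch of that approach: the Pi POSTs each JPEG to an upload endpoint over HTTP. The route name, port, addresses, and the process_frame() hook are assumptions, not part of any existing project.

# server.py - runs on the processing server
from flask import Flask, request

app = Flask(__name__)

@app.route("/frames", methods=["POST"])
def receive_frame():
    jpeg_bytes = request.get_data()  # raw JPEG body of this request
    # process_frame(jpeg_bytes)      # hypothetical hook for your processing
    return "", 204

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

# client.py - runs on the Raspberry Pi
import requests

session = requests.Session()  # reuse one TCP connection for all ~600 frames

def send_frame(jpeg_bytes, url="http://192.168.1.10:5000/frames"):
    session.post(url, data=jpeg_bytes,
                 headers={"Content-Type": "image/jpeg"})

Reusing a single requests.Session avoids re-opening a TCP connection per frame, which matters when you are pushing hundreds of frames in quick succession.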
You can also restream the camera over the network using FFmpeg so that the client can read it either way; this will reduce delays.

Is client-side rendering of huge data with dc.js suitable for interactive visualization in analytics tools?

To develop an interactive dashboard like https://dc-js.github.io/dc.js/vc/index.html with dc.js, is client-side rendering of the whole dataset with Crossfilter the best option for an analytics platform?
Some insight into the analytics platform: it is a platform that can connect to any database table residing anywhere in the world, given the database connection details, fetch the columns of a specific table for analysis (summation, average, minimum, maximum), and then render them with a charting library for visualization on the client side.
I know there is a possible way with server-side (Node.js) rendering, where one can leverage the Crossfilter library, but the plan is to use Python as the backend.
Main concern: the interaction should be smooth on the client side even when the data size is huge. Now the questions are:
Is it a good idea to fetch all the data on the client side, regardless of its size, to make the app most interactive?
How much of a limitation does it impose on client memory?
Is there any better way to do it so we don't have to trade off between interactivity and client/server-side processing of data?
Is it a good idea to fetch all the data on the client side, regardless of its size, to make the app most interactive?
No.
How much of a limitation does it impose on client memory?
As big as your data. Chrome maxes out at around 1 GB of memory consumption, but it will grind to a halt long before you get there for most use cases. If you have more than ~10 MB of data compressed, it is time to start thinking about non-client-side options.
Is there any better way to do it so we don't have to trade off between interactivity and client/server-side processing of data?
You'll need to think about more advanced architectures that share the load of data processing between the client and the server; these generally don't have a simple off-the-shelf library implementation. I put together http://lcadata.info (it's open source) as one example of how to do this. It's Lift/Scala/Spark on the back end, but you could do something similar with Python.
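As a rough illustration of sharing that load with a Python backend, the sketch below exposes a small aggregation endpoint that returns only the summarised rows for the charts to render, so the browser never downloads the raw table. load_table() is a stand-in for whatever data-access layer you end up using; the query parameters are made up for the example.

from flask import Flask, jsonify, request
import pandas as pd

app = Flask(__name__)

def load_table(name):
    # stand-in for your real access layer (SQLAlchemy, a raw driver, etc.)
    raise NotImplementedError

@app.route("/aggregate")
def aggregate():
    df = load_table(request.args["table"])
    group_by = request.args["group_by"]  # e.g. a date or category column
    column = request.args["column"]      # the numeric column to summarise
    summary = (df.groupby(group_by)[column]
                 .agg(["sum", "mean", "min", "max"])
                 .reset_index())
    # the client only ever receives the reduced rows, not the raw data
    return jsonify(summary.to_dict(orient="records"))

The dc.js/Crossfilter front end then works against these pre-aggregated rows (or a sampled subset) instead of the full dataset.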

Streaming data to clients

I have a program that sniffs network data and stores it in a database using pcapy (based on this). I need to make the data available in real time over a network connection.
Right now when I run the program it starts a second thread for the sniffer and a Twisted server on the main thread; however, I have no idea how to let clients 'tap into' the sniffer running in the background.
The end result should be that a client opens a URL and the connection is kept open until the client disconnects (even when there's nothing to send); whenever there is network activity, the sniffer picks it up and the server sends it to the clients.
I'm a beginner with Python, so I'm quite overwhelmed; if anyone could point me in the right direction it would be greatly appreciated.
Without more information (a simple code sample that doesn't work as you expect, perhaps) it's tough to give a thorough answer.
However, here are two pointers which may help you:
Twisted Pair, an (unfortunately very rudimentary and poorly documented) low-level/raw-sockets networking library within Twisted itself, which may be able to implement the packet capture directly in a Twisted-friendly way, or
the recently released Crochet, which will let you manage the background Twisted thread and its interactions with your pcapy-based capture code.
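To make the Crochet route a little more concrete, here is a rough sketch (not tested against your code) of a Twisted TCP server whose connected clients receive whatever the background sniffer hands to broadcast(). crochet.setup() runs the reactor in a background thread for you, and run_in_reactor makes broadcast() safe to call from the pcapy thread; the port number and class names are made up.

from crochet import setup, run_in_reactor
from twisted.internet import reactor
from twisted.internet.protocol import Factory, Protocol

setup()  # start the Twisted reactor in a background thread

class TapProtocol(Protocol):
    def connectionMade(self):
        self.factory.clients.add(self)

    def connectionLost(self, reason):
        self.factory.clients.discard(self)

class TapFactory(Factory):
    protocol = TapProtocol

    def __init__(self):
        self.clients = set()

factory = TapFactory()

@run_in_reactor
def start_server(port=8000):
    reactor.listenTCP(port, factory)

@run_in_reactor
def broadcast(data):
    # called from the sniffer thread; run_in_reactor hops into the reactor thread
    for client in factory.clients:
        client.transport.write(data)  # data must be bytes

Your pcapy callback would then simply call broadcast(packet_bytes) for every captured packet, and every client that currently has the connection open receives it.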

UDP rate limits in Python?

Update to original post: a colleague pointed out what I was doing wrong. I'll give the explanation at the bottom of the post, as it might be helpful for others.
I am trying to get a basic understanding of the limits on network performance of Python programs and have run into an anomaly. The code fragment
while 1:
    sock.sendto("a", target)
sends UDP packets to a target machine as fast as the host can send them. I measure a sending rate of just over 4,000 packets per second, or 250 µs per packet. This seems slow, even for an interpreted language like Python (the program is running on a 2 GHz AMD Opteron under Linux, with Python 2.6.6). I've seen much better performance from Python with TCP, so I find this a bit weird.
If I run this in the background and run top, I find that Python is using just 25% of the CPU, suggesting that Python may be artificially delaying the transmission of the UDP packets.
Has anyone else experienced anything similar? Does anyone know whether Python limits the rate of packet transmission, and if so, whether there is a way to turn that off?
BTW, a similar C++ program can send well over 200,000 packets per second, so it's not an intrinsic limit of the platform or OS.
So, it turns out I made a silly newbie mistake: I neglected to call gethostbyname explicitly, so the target address in the sendto call contained a symbolic name, which triggered a name resolution every time a packet was sent. After fixing this, I measure a maximum sending rate of about 120,000 packets per second. Much better.
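For anyone who hits the same thing, the fix amounts to resolving the name once, outside the loop; the host name and port below are made up for illustration:

import socket

target = (socket.gethostbyname("receiver.example.com"), 9999)  # resolve once

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
while 1:
    sock.sendto(b"a", target)  # the address is already numeric, so no DNS per packet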
You might want to post a more complete code sample so that others can repeat your benchmark. 250 µs per loop iteration is far too slow. Based on daily experience with optimizing Python, I would expect Python's interpreter overhead to be well below 1 µs on a modern machine. In other words, if the C++ program sends 200k packets per second, I would expect Python to be within the same order of magnitude of speed.
(In light of the above, the usual optimization suggestions such as moving the attribute lookup of sock.sendto out of the loop do not apply here because the slowness is coming from another source.)
A good first step would be to use strace to check what Python is actually doing. Is it a single-threaded program, or a multithreaded application that might be losing time waiting on the GIL? Is sock a normal Python socket, or part of a more elaborate API? Does the same thing happen when you call os.write directly on the socket's fileno?
Have you tried doing a connect() first, then using send() instead of sendto()? (For UDP, connect() just establishes the destination address; it doesn't actually make a "connection".) I'm rusty on this, but I believe Python does more interpretation of the address parameter than C sockets do, which might be adding overhead.
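A quick sketch of that variation, with a placeholder address; besides skipping the per-call address handling, it pins the destination in the kernel once:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(("192.0.2.10", 9999))  # fixes the peer; no real handshake for UDP
while 1:
    sock.send(b"a")  # no address argument to interpret on every call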

How can I reduce memory usage of a Twisted server?

I wrote an audio broadcasting server with Python/Twisted. It works fine, but its memory usage grows too fast! I think that's because some users' network connections might not be good enough to download the audio in time.
My audio server broadcasts audio data to the different listeners' clients; if some of them can't download the audio in time, the server keeps the audio data buffered until those listeners have received it. What's more, since it is a broadcasting server, it receives audio data and sends it to many different clients, and I think Twisted copies that data into a separate buffer for each of them, even though it is the same piece of audio.
I want to reduce memory usage, so I need to know when the audio has actually been received by a client, so that I can decide when to drop slow clients. But I have no idea how to achieve that with Twisted. Does anyone have an idea?
And what else can I do to reduce memory usage?
Thanks.
Victor Lin.
You didn't say, but I'm going to assume you're using TCP. It would be hard to write a UDP-based system whose memory usage keeps growing because of clients that can't receive data as fast as you're trying to send it.
TCP has built-in flow-control capabilities. If a receiver cannot read data as fast as you'd like to send it, this information is made available to you and you can send more slowly. The way this works with the BSD socket API is that a send(2) call will block (or, on a non-blocking socket, fail with EWOULDBLOCK) when it cannot add any more bytes to the send buffer. The way it works in Twisted is through a system called "producers and consumers". The gist of this system is that you register a producer with a consumer. The producer calls write on the consumer repeatedly. When the consumer cannot keep up, it calls pauseProducing on the producer. When the consumer is again ready for more data, it calls resumeProducing on the producer.
You can read about this system in more detail in the producer/consumer howto, part of Twisted's documentation.
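As a rough sketch of what a push producer can look like in this situation (the class and method names are illustrative, not taken from your server):

from zope.interface import implementer
from twisted.internet.interfaces import IPushProducer

@implementer(IPushProducer)
class AudioStreamer:
    """Feed audio chunks to one client's transport and react to back-pressure."""

    def __init__(self, transport):
        self.transport = transport
        self.paused = False
        self.transport.registerProducer(self, True)  # True = push (streaming) producer

    def send_chunk(self, chunk):
        if not self.paused:
            self.transport.write(chunk)
        # else: drop the chunk, or record how long this client has been paused
        # and disconnect it if it stays too far behind

    def pauseProducing(self):
        self.paused = True

    def resumeProducing(self):
        self.paused = False

    def stopProducing(self):
        self.paused = True

Because pauseProducing fires as soon as a client's transport buffer backs up, it is exactly the signal you need to decide when a listener is too slow and should be dropped, which in turn bounds how much audio your server holds in memory per client.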
Make sure you're using Python's garbage collector and then go through and delete variables you aren't using.
