How to speed up Flask response download speed - python

My frontend web app is calling my python Flask API on an endpoint that is cached and returns a JSON that is about 80,000 lines long and 1.7 megabytes.
It takes my UI about 7.5 seconds to download all of it.
Chrome takes about 6.5 seconds when calling the path directly.
I know that I can split up this endpoint for performance gains, but out of curiosity, what are some other great options to improve the download speed of all this content?
Options I can think of so far:
1) compressing the content. But then I would have to decompress it on the frontend
2) Use something like gRPC
Further info:
My Flask server is using WSGIServer from gevent and the endpoint code is below. PROJECT_DATA_CACHE is the already jsonified data that is returned:
@blueprint_2.route("/projects")
def getInitialProjectsData():
    global PROJECT_DATA_CACHE
    if PROJECT_DATA_CACHE:
        return PROJECT_DATA_CACHE
    else:
        LOGGER.debug('No cache available for GET /projects')
        updateProjectsCache()
        return PROJECT_DATA_CACHE

Maybe you could stream the file? I cannot see any way to transfer a file 80,000 lines long without some kind of download or wait.
This would be an opportunity to compress and decompress it, like you suggested. Definitely make sure that the JSON is minified.
One way to minify a JSON: https://www.npmjs.com/package/json-minify
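As a rough sketch of the compress-and-minify idea (not the asker's actual code): the payload can be serialized once without whitespace and gzipped once, then served with a Content-Encoding: gzip header. Browsers decompress such responses transparently, so the frontend should not need its own decompression step. The names build_cached_payload and create_app are hypothetical, and the /projects route is borrowed from the question.

```python
import gzip
import json


def build_cached_payload(data):
    """Serialize once (minified) and gzip the bytes so both can be reused per request."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return raw, gzip.compress(raw)


def create_app(data):
    # Flask is imported lazily so the helper above also works without Flask installed.
    from flask import Flask, Response, request

    app = Flask(__name__)
    raw, zipped = build_cached_payload(data)

    @app.route("/projects")
    def projects():
        # Only send gzip to clients that advertise support for it.
        if "gzip" in request.headers.get("Accept-Encoding", ""):
            resp = Response(zipped, mimetype="application/json")
            resp.headers["Content-Encoding"] = "gzip"
            resp.headers["Vary"] = "Accept-Encoding"
            return resp
        return Response(raw, mimetype="application/json")

    return app
```

Compressing once at cache-build time, rather than on every request, keeps the per-request cost close to that of returning the cached bytes.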
Streaming a file:
https://blog.al4.co.nz/2016/01/streaming-json-with-flask/
It also really depends on the project; maybe you could get the users to download it completely?

The best way to do this is to break your JSON into chunks and stream it by passing a generator to the Response. You can then render the data as you receive it or show a progress bar displaying the percentage that is done. I have an example of how to stream data as a file is being downloaded from AWS S3 here. That should point you in the right direction.
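A minimal sketch of passing a generator to the Response, assuming the payload is already available as bytes. The names iter_chunks and create_app and the 64 KiB chunk size are assumptions made for illustration. Setting Content-Length up front is what would let a client compute a percentage for a progress bar.

```python
def iter_chunks(payload: bytes, chunk_size: int = 64 * 1024):
    """Yield successive slices of payload, each at most chunk_size bytes long."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]


def create_app(payload: bytes):
    # Flask is imported lazily so iter_chunks can be used without Flask installed.
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/projects")
    def projects():
        # Passing a generator makes Flask send the body piece by piece.
        resp = Response(iter_chunks(payload), mimetype="application/json")
        # The total size is known in advance, so the client can show progress.
        resp.headers["Content-Length"] = str(len(payload))
        return resp

    return app
```

Streaming does not make the transfer itself faster; it lets the client start working on the data before the last byte arrives.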

Related

Nuclio Streaming Contents Support? (Docker setup - Python)

Is there support for streaming back a response in Nuclio? The workflow I'm trying to achieve is to have the UI request a large file from a Nuclio function running inside a Docker container and have the function stream the large file back.
For example this is how Flask supports streaming contents:
https://flask.palletsprojects.com/en/2.2.x/patterns/streaming/
I can't seem to find anywhere that mentions how to have Nuclio stream back large data/file.
I do see they mention some stuff about stream triggers, but I don't know if that'll help with streaming back the response:
https://nuclio.io/docs/latest/concepts/architecture/
https://nuclio.io/docs/latest/reference/triggers/
If there's no support, would my best bet be to stream the data to some 3rd party platform and have the UI download the data/file from there?

Running python script concurrently based on trigger

What would be the best way to solve the following problem with Python?
I have a real-time data stream coming into my object storage from a user application (JSON files being stored in Amazon S3).
Upon receiving each JSON file, I have to process its data within a certain time (1 s in this instance) and generate a response that is sent back to the user. The data is processed by a simple Python script.
My issue is that the real-time data stream can generate even a few hundred JSON files from user applications at the same time, all of which I need to run through my Python script, and I don't know how best to approach this.
I understand that one way to tackle this would be trigger-based Lambdas that run a job on each file as soon as it is uploaded from the real-time stream, in a serverless environment. However, this option is quite expensive compared to having a single server instance running and somehow triggering jobs inside it.
Any advice is appreciated. Thanks.

Serverless can actually be cheaper than using a server. It is much cheaper when there are periods of no activity because you don't need to pay for a server doing nothing.
The hardest part of your requirement is sending the response back to the user. If an object is uploaded to S3, there is no easy way to send back a response and it isn't even obvious who is the user that sent the file.
You could process the incoming file and then store a response back in a similarly-named object, and the client could then poll S3 for the response. That requires the upload to use a unique name that is somehow generated.
An alternative would be for the data to be sent to AWS API Gateway, which can trigger an AWS Lambda function and then directly return the response to the requester. No server required, automatic scaling.
If you wanted to use a server, then you'd need a way for the client to send a message to the server with a reference to the JSON object in S3 (or with the data itself). The server would need to be running a web server that can receive the request, perform the work and provide back the response.
Bottom line: Think about the data flow first, rather than the processing.
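To make the API Gateway option a bit more concrete, here is a hedged sketch of a Lambda handler that receives the JSON in the request body and returns the result directly to the caller. It assumes the Lambda proxy integration, where the body arrives as a string under event["body"] and the handler returns statusCode, headers and body. The process function is a hypothetical stand-in for the existing script.

```python
import json


def process(record: dict) -> dict:
    """Hypothetical stand-in for the existing processing script."""
    return {"received_keys": sorted(record)}


def lambda_handler(event, context):
    # With the proxy integration the request body is delivered as a string.
    try:
        record = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}

    # The returned dict is turned into the HTTP response sent to the caller.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(process(record)),
    }
```

Each concurrent request gets its own invocation, so a burst of a few hundred files would not have to queue behind a single process.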

Transfer PDF files between servers in python

We have two servers (client-facing, and back-end database) between which we would like to transfer PDFs. Here's the data flow:
1) User requests PDF from website.
2) Site sends request to client server.
3) Client server requests PDF from back-end server (different IP).
4) Back-end server sends PDF to client server.
5) Client server sends PDF to website.
1-3 and 5 are all good, but #4 is the issue.
We're currently using Flask requests for our API calls and can transfer text and .csv easily, but binary files such as PDF are not working.
And no, I don't have any code, so take it easy on me. Just looking for a suggestion from someone who may have come across this issue.

As you said you have no code, that's fine, but I can only give a few suggestions.
I'm not sure how you're sending your files, but I'm assuming that you're using Python's open function.
1) Make sure you are reading the file as bytes (e.g. open('<pdf-file>','rb')).
2) Cut the file up into chunks and send them as one file; this way it doesn't freeze or get stuck.
3) Try smaller PDF files. If this works, definitely try suggestion #2.
4) Use threads; you can multitask with them.
5) Have a download server. This can save memory and potentially save bandwidth. It also lets you skip sending the PDF back from Flask.
6) Don't use PDF files if you don't have to.
7) Use a library to do it for you.
Hope this helps!

I wanted to share my solution to this, but give credit to @CoolqB for the answer. The key was including 'rb' to properly read the binary file and including the codecs library. Here are the final code snippets:
Client request:
response = requests.get('https://www.mywebsite.com/_api_call')
Server response:
# read the PDF as raw bytes
with codecs.open(file_name, 'rb') as f:
    return f.read()
Client handle:
# write in binary mode ('wb') so the PDF bytes are not altered
with codecs.open(file_to_write, 'wb') as f:
    f.write(response.content)
And all is right with the world.
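For larger PDFs, the chunking suggestion from the answer could look roughly like the sketch below, which uses the stream=True and iter_content features of the requests library so the whole file is never held in memory on the client server. The names write_chunks and download_pdf, the 8192-byte chunk size and the 30-second timeout are assumptions, not part of the original solution.

```python
def write_chunks(chunks, dest_path):
    """Write an iterable of bytes objects to dest_path; return the byte count."""
    total = 0
    # 'wb' keeps the binary content exactly as received.
    with open(dest_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
                total += len(chunk)
    return total


def download_pdf(url, dest_path, chunk_size=8192):
    # requests is third-party; imported here so write_chunks works without it.
    import requests

    # stream=True defers the body download until iter_content is consumed.
    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        return write_chunks(response.iter_content(chunk_size=chunk_size), dest_path)
```

On the back-end side, Flask's send_file is one existing way to return the PDF with a binary-safe response instead of returning the raw bytes by hand.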

tracking the current http transfer in python

Hello, I would like to do a simple bandwidth test. The size of the default HTML page is 10 MB. The upload speed of the server is 5 Mbps, so under no circumstances can the 10 MB transfer complete in 10 seconds (at most about 6.25 MB fits in that window). My plan is to start a timer when the GET request arrives and, 10 seconds later, read either the percentage or the total number of bytes sent to that particular client. So my question is: how do I get the percentage or the total number of bytes sent?

Approach 1
The simplest solution would be to use this open source speedtest, which shows in the browser the download speed a client is getting from your specific server. It also shows the upload speed to your server. This particular solution uses PHP and Perl on the server side. There is a related question, "Python speedtest.net, or equivalent", that didn't have an answer along the lines you are looking for.
Approach 2
You can do the download test using JavaScript and Ajax, which let you monitor the download progress of a file from the web browser. For the upload portion you can use the same technique of monitoring an Ajax upload.
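If the measurement has to happen on the server in Python instead, one possible approach is to wrap the response body in an iterator that counts the bytes the server has already asked for. This is only a sketch: CountingStream is a hypothetical helper, and the count reflects bytes handed to the server for sending, which can run slightly ahead of what the client has actually received because of socket buffering.

```python
import time


class CountingStream:
    """Wrap an iterable of byte chunks and track how many bytes were consumed."""

    def __init__(self, chunks, total_size):
        self.chunks = chunks
        self.total_size = total_size
        self.bytes_sent = 0
        self.started_at = None

    def __iter__(self):
        self.started_at = time.monotonic()
        for chunk in self.chunks:
            yield chunk
            # Execution resumes here only when the server requests the next
            # chunk, i.e. after it has finished writing the previous one.
            self.bytes_sent += len(chunk)

    @property
    def percent_sent(self):
        if not self.total_size:
            return 0.0
        return 100.0 * self.bytes_sent / self.total_size
```

A timer or a second request could then read bytes_sent or percent_sent 10 seconds after started_at, provided the wrapper object is kept somewhere both code paths can reach.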

Split up a massive WSDL file

I am working on interacting with the NetSuite Web Services layer with Python. Using suds to parse the WSDL takes close to two minutes. I was able to write a caching layer using Redis that solves a bit of the loading headaches once the client has been parsed, but it still takes a ton of time the first time around.
>>> # Takes several minutes to load
>>> client = suds.Client(huge_four_mb_wsdl_file)
Since I only use a small subset of the services, is there a way to pull only those services from the WSDL and put them into my own smaller WSDL?

If you look at the v2013_2 version of the WSDL source you'll see that it actually imports 38 other XSD files.
You can speed up your process by:
1) Creating a local WSDL that only imports some of the XSD files (saves download and parse time).
2) Serialising a ready client using pickle and loading it on boot (saves parse time).
Also make sure you only have to create the client once in your application's lifetime.
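The pickle suggestion could be sketched as a small load-or-build helper like the one below. load_or_build and the cache path are hypothetical names, and the sketch assumes the object returned by the build callable can be pickled, which has not been verified here for a suds client.

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return the object pickled at cache_path, building and saving it on a miss."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Cache miss: pay the expensive construction cost once, then persist it.
    obj = build()
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)
    return obj
```

In the setup from the question, build would be something like lambda: suds.Client(huge_four_mb_wsdl_file), called once at boot and reused for the lifetime of the application. Only load pickle files you created yourself, since unpickling untrusted data can execute arbitrary code.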
