How to speed up JSON responses in a Flask application? - python

I'm currently implementing a web app in Flask. It's an app that visualizes gathered data. Each page or section makes a GET call, and each call returns a JSON response that is then processed into the displayed data.
The current problem is that some calculation is needed before the function can return a JSON response. This causes some responses to arrive slower than others, making the page loads a bit slow. How do I properly deal with this? I have read about caching in Flask and wonder whether that is what the app needs right now. I have also researched implementing a Redis queue. I'm not really sure which is the correct approach.
Any help or insights would be appreciated. Thanks in advance.

Here are some ideas:
If the source data that you use for your calculations is not likely to change often then you can run the calculations once and save the results. Then you can serve the results directly for as long as the source data remains the same.
You can save the results back to your database, or as you suggest, you can save them in a faster storage such as Redis. Based on your description I suspect the big performance gain will be in not doing the calculations so often; the difference between storing in a regular database vs. Redis or similar is probably not significant in comparison.
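A minimal sketch of that first idea (compute once, serve the stored result), assuming a Redis client; the route name, the cache key, the REPORT_TTL value and the compute_report() helper are all illustrative stand-ins for the slow calculation described in the question:

import json

from flask import Flask, jsonify
from redis import Redis

app = Flask(__name__)
redis_client = Redis(host="localhost", port=6379)
REPORT_TTL = 300  # seconds to keep a cached result


def compute_report():
    # placeholder for the slow calculation described in the question
    return {"total": 42}


@app.route("/report")
def report():
    cached = redis_client.get("report")  # fast path: serve the stored result
    if cached is not None:
        return jsonify(json.loads(cached))
    result = compute_report()  # slow path: compute once, then store
    redis_client.set("report", json.dumps(result), ex=REPORT_TTL)
    return jsonify(result)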
If the data changes often then you will still need to do calculations frequently. For such a case an option that you have is to push the calculations to the client. Your Flask app can just return the source data in JSON format and then the browser can do the processing on the user's computer.
I hope this helps.

You can use copy_current_request_context together with Redis and a Thread.
This is helpful when building the JSON response takes a long time: the first request may be slow, but subsequent requests will be faster because the prepared response is read back from Redis.
Example
import json

from datetime import timedelta, datetime
from threading import Thread

from flask import Blueprint, request, jsonify, flash, after_this_request, \
    copy_current_request_context, current_app, send_from_directory

from . import dbb, redis_client
from .models import Shop, Customers


def save_customer_json_to_redis(request):
    response_json = {
        "have_customer": False,
        "status": False,
        "anythingelse": None,
        "message": "False, you have to check..."
    }
    # print(request.data)
    headers = request.headers
    Authorization = headers['Authorization']
    token = Authorization.replace("Bearer", "")
    phone = request.args.get('phone')
    if phone is not None and phone != "":
        print('token', token, "phone", phone)
        now = datetime.utcnow() + timedelta(hours=7)
        shop = Shop.query.filter(Shop.private_token == token, Shop.ended_date > now, Shop.shop_active == True).first()
        customer = Customers.query.filter_by(shop_id=shop.id, phone=phone).first()
        if customer:
            redis_name = f'{shop.id}_api_v2_customer_phone_{phone}_customer_id_{customer.id}'
            print(redis_name)
            response_json["anythingelse"] = ...  # do what you want; this is the part that takes a long time
            response_json["status"] = True
            response_json["message"] = "Successful"
            redis_client.set(redis_name, json.dumps(response_json))  # save the JSON to Redis
@app.route('/api/v2/customer', methods=['GET'])
def api_customer():
    @copy_current_request_context
    def do_update_customer_to_redis():  # saves the JSON you want to return next time to Redis
        save_customer_json_to_redis(request)

    Thread(target=do_update_customer_to_redis).start()

    response_json = {
        "have_customer": False,
        "status": False,
        "anythingelse": {},
        "message": "False, you have to check..."
    }
    # print(request.data)
    headers = request.headers
    Authorization = headers['Authorization']
    token = Authorization.replace("Bearer", "")
    phone = request.args.get('phone')
    if phone is not None and phone != "":
        print('token', token, "phone", phone)
        now = datetime.utcnow() + timedelta(hours=7)
        shop = Shop.query.filter(Shop.private_token == token, Shop.ended_date > now, Shop.shop_active == True).first()
        customer = Customers.query.filter_by(shop_id=shop.id, phone=phone).first()
        if customer:
            redis_name = f'{shop.id}_api_v2_customer_phone_{phone}_customer_id_{customer.id}'
            print(redis_name)
            try:
                response_json = json.loads(redis_client.get(redis_name))  # if the JSON is already in Redis
                print("json.loads(redis_client.get(redis_name))")
            except Exception as e:
                print("json.loads(redis_client.get(redis_name))", e)
                # do anything you want with the response JSON
                response_json["anythingelse"] = ...  # do what you want; this is the part that takes a long time
                response_json["message"] = ...  # do what you want
                # redis_client.set(redis_name, json.dumps(response_json))
            response_json["status"] = True
            response_json["message"] = "Successful"
    return jsonify(response_json)
In the __init__.py:
from flask import Flask
from flask_cors import CORS
from flask_mail import Mail
from flask_sqlalchemy import SQLAlchemy
from redis import Redis

# init SQLAlchemy so we can use it later in our models
dbb = SQLAlchemy(session_options={"autoflush": False})

redis_client = Redis(
    host='localhost',
    port=6379,
    password='your_redis_password'
)


def create_app():
    app = Flask(__name__)
    ...........

Related

Python Flask types of showing responses because response is too big

This is my first time using Flask to build an API.
I have a CSV file that I read into a dataframe and then convert to a dictionary so it can be returned as the response of a Flask API (using flask_restful) as JSON output.
The CSV file has around 230,000 rows, so when I try to connect to the endpoint it takes too much time to load the data. The data only loads fully if I use parameters to filter by date, so the dataframe returns fewer rows.
If I try to use Postman to test the endpoint I can see that the full load is 120 MB of data (only if I change the settings to show more than 50 MB), so Power BI or any other tool is not showing all the data either.
With full data from the endpoint (no parameters) the response takes too much time to load.
So my question is: is there any other way to return the JSON output from Flask and connect it to other tools with this number of rows?
Endpoint with full data: https://masterdimensions.herokuapp.com/dimensioncalendar
Endpoint with parameters: https://masterdimensions.herokuapp.com/dimensioncalendar?start_dt=2020-01-01&end_dt=2021-12-31
My application:
# app.py
from flask import Flask
from flask_restful import Resource, Api, reqparse
import pandas as pd
import os
import pathlib

# Launch a Flask app
app = Flask(__name__)
api = Api(app, default_mediatype='application/json')


# API class for the Index page
class Index(Resource):
    def get(self):
        index_data = [
            {
                "href": "/dimensioncalendar",
                "name": "Dimension Calendar",
                "description": "Dimension Calendar endpoint with date attributes"
            }
        ]
        return (index_data), 200


# API class for the DimensionCalendar page
class DimensionCalendar(Resource):
    def get(self):
        # Initialize reqparse from flask_restful
        parser = reqparse.RequestParser()
        # Add arguments
        parser.add_argument('start_dt')
        parser.add_argument('end_dt')
        parser.add_argument('lang')
        args = parser.parse_args()  # parse arguments to dictionary
        # Read CSV file
        data = pd.read_csv(os.path.join(os.path.sep, pathlib.Path(__file__).parent.resolve(), "static", "csv", "DimensionCalendar.csv"))
        # If the API parameters are not null then filter the dataframe by start date and end date
        if args['start_dt'] is not None and args['end_dt'] is not None:
            data = data.loc[(data['DT_SHORTDATE'] >= str(args['start_dt'])) & (data['DT_SHORTDATE'] <= str(args['end_dt']))]
        elif args['start_dt'] is not None and args['end_dt'] is None:
            data = data.loc[data['DT_SHORTDATE'] == str(args['start_dt'])]
        if args['lang'] == 'pt-BR':
            data.columns = ['DT_DATA', 'NR_ANO', 'NR_MES', 'NR_DIA', 'NM_MES', 'NM_DIADASEMANA', 'NR_MESANO', 'NM_MESANO', 'NM_DATALONGA', 'NR_DIADASEMANA', 'NR_DIADOANO', 'NR_SEMESTRE', 'NM_SEMESTRE', 'NR_QUADRIMESTRE', 'NR_TRIMESTRE', 'NM_TRIMESTRE', 'NR_BIMESTRE', 'NM_BIMESTRE', 'NR_ANORELATIVO', 'NR_MESRELATIVO', 'NR_DIARELATIVO', 'NR_DATA', 'NR_ANOMES', 'NR_ANOTRIMESTRE', 'IC_DIAUTIL', 'IC_FINALDESEMANA', 'IC_FERIADO', 'IC_ANOATUAL', 'IC_MESATUAL', 'IC_DATAATUAL', 'IC_ANOPASSADO', 'IC_MESPASSADO', 'IC_ONTEM', 'LINDATA', 'LINORIGEM']
        # Convert dataframe to dictionary
        data = data.to_dict()
        # Return data and a 200 OK code
        return (data), 200


# In case the page is not found
@app.errorhandler(404)
def page_not_found(e):
    return "<h1>404</h1><p>The resource could not be found.</p>", 404


# '/dimensioncalendar' is our entry point for DimensionCalendar
api.add_resource(DimensionCalendar, '/dimensioncalendar')
# '/' is our entry point for Index
api.add_resource(Index, '/')

# Run our Flask app
if __name__ == '__main__':
    app.run()
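One way to keep responses small for tools like Power BI, sketched here only as an illustration (it is not proposed anywhere in the post), is to page the output instead of returning all 230,000 rows at once. A minimal sketch that drops into the app.py above; the page and page_size parameters and the /dimensioncalendar/paged route are hypothetical additions:

# Sketch only: a paged variant of the endpoint, reusing the imports and `api` object from app.py above.
# 'page', 'page_size' and '/dimensioncalendar/paged' are hypothetical, not part of the original API.
class PagedDimensionCalendar(Resource):
    def get(self):
        parser = reqparse.RequestParser()
        parser.add_argument('page', type=int, default=1)
        parser.add_argument('page_size', type=int, default=5000)
        args = parser.parse_args()

        data = pd.read_csv(os.path.join(os.path.sep, pathlib.Path(__file__).parent.resolve(),
                                        "static", "csv", "DimensionCalendar.csv"))
        start = (args['page'] - 1) * args['page_size']
        chunk = data.iloc[start:start + args['page_size']]  # one slice of rows instead of the full table
        return {
            "page": args['page'],
            "rows": len(chunk),
            "data": chunk.to_dict(orient='records')  # row-oriented dicts are easier for most consumers
        }, 200


api.add_resource(PagedDimensionCalendar, '/dimensioncalendar/paged')

A client would then request /dimensioncalendar/paged?page=1&page_size=5000, then page=2, and so on until fewer rows than page_size come back.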

Flask and python does not allow recursive function Twitch API

I have a function that generates all video URLs of a given user on Twitch:
def get_videos(cursor=None):  # function to retrieve all vod URLs possible, kinda slow for now
    params_get_videos = {('user_id', userid_var)}  # params for requests.get
    if cursor is not None:  # check if there was a cursor value passed
        params_get_videos = list(params_get_videos) + list({('after', cursor)})  # add another param for pagination
    url_get_videos = 'https://api.twitch.tv/helix/videos'  # URL to request data
    response_get_videos = session.get(url_get_videos, params=params_get_videos, headers=headers)  # get the data
    reponse_get_videos_json = response_get_videos.json()  # parse and interpret data
    file = open(MyUsername + " videos.txt", "w")
    for i in range(0, len(reponse_get_videos_json['data'])):  # parse and interpret data
        file.write(reponse_get_videos_json['data'][i]['url'] + '\n')  # write each vod URL on its own line
    if 'cursor' in reponse_get_videos_json['pagination']:  # check if there are more pages
        get_videos(reponse_get_videos_json['pagination']['cursor'])  # recurse until there are no more pages
This works perfectly fine on its own (with other functions), but whenever I try to call it from a dummy Flask server like this:
from flask import Flask
from flask import render_template
from flask import request
from main import *

app = Flask(__name__)


@app.route('/')
def hello_world():
    return render_template("hello.html")


@app.route('/magic', methods=['POST', 'GET'])
def get_username():
    username = request.form.get('username')
    get_videos()
    return ("Success")
It no longer acts recursively and only writes the first 20 values. What am I doing wrong?
I'm also new, so I don't have enough reputation to make a comment; I have to post an answer. I'm confused by your get_username method: once you grab the username from the form, you're not sending it anywhere. It looks like in your get_videos function you may be hard-coding the username by relying on a variable called MyUsername defined outside of it.
What you should do, imo, is pass along the username you grab from the form and call get_videos like this:
get_videos(username)
And change your other function to this:
def get_videos(MyUsername, cursor=None):
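Putting those two changes together, a minimal sketch (assuming the Flask imports and app object from the server snippet above; the body of get_videos is otherwise the same as in the question):

# Sketch: the form value is passed through explicitly instead of relying on a module-level MyUsername.
@app.route('/magic', methods=['POST', 'GET'])
def get_username():
    username = request.form.get('username')
    get_videos(username)  # the username is now an explicit argument
    return "Success"


def get_videos(MyUsername, cursor=None):
    # same body as in the question, using the MyUsername parameter for the output file name
    ...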

Catch http-status code in Flask

I recently started using Flask in one of my projects to provide data via a simple route. So far I return a JSON file containing the data and some other information. When running my Flask app I see the status code of each request in the terminal. I would like to return the status code as part of my final JSON. Is it possible to catch the same code I see in the terminal?
A simple example might look like this:
from flask import Flask
from flask import jsonify

app = Flask(__name__)


@app.route('/test/<int:int1>/<int:int2>/')
def test(int1, int2):
    int_sum = int1 + int2
    return jsonify({"result": int_sum})


if __name__ == '__main__':
    app.run(port=8082)
And in the terminal I can see the request logged together with its status code.
You are the one who sets the response code (by default 200 on a successful response); you can't catch this value before the response is emitted. But since you know the result of your operation, you can put it in the final JSON.
@app.route('/test/<int:int1>/<int:int2>/')
def test(int1, int2):
    int_sum = int1 + int2
    response_data = {
        "result": int_sum,
        "success": True,
        "status_code": 200
    }
    # make sure the status_code in your JSON and in the return statement match
    return jsonify(response_data), 200  # <- the status code displayed in the console
By the way, if you access this endpoint from a request library, the response object gives you the status_code and all the related HTTP data, plus the JSON you need.
Python requests library example:
import requests

req = requests.get('your.domain/test/3/3')
print req.url          # your.domain/test/3/3
print req.status_code  # 200
print req.json()       # {u'result': 6, u'status_code': 200, u'success': True}
You can send the HTTP status code as follows:

@app.route('/test')
def test():
    status_code = 200
    return jsonify({'name': 'Nabin Khadka'}), status_code  # notice the second element of the returned tuple

This way you can control what status code to return to the client (typically a web browser).

How to send asynchronous request using flask to an endpoint with small timeout session?

I am new to backend development using Flask and I am stuck on a confusing problem. I am trying to send data to an endpoint whose timeout is 3000 ms. My code for the server is as follows:
from flask import Flask, request
from gitStat import getGitStat
import requests

app = Flask(__name__)


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    params = request.args["text"].split(" ")
    user_repo_path = "https://api.github.com/users/{}/repos".format(params[0])
    first_response = requests.get(
        user_repo_path, auth=(
            'Your Github Username', 'Your Github Password'))
    repo_commits_path = "https://api.github.com/repos/{}/{}/commits".format(params[0], params[1])
    second_response = requests.get(
        repo_commits_path, auth=(
            'Your Github Username', 'Your Github Password'))
    if first_response.status_code == 200 and params[2] < params[3] and second_response.status_code == 200:
        values = getGitStat(params[0], params[1], params[2], params[3])
        response_url = request.args["response_url"]
        payload = {
            "response_type": "in_channel",
            "text": "Github Repo Commits Status",
            "attachments": [
                {
                    "text": values
                }
            ]
        }
        headers = {'Content-Type': 'application/json',
                   'User-Agent': 'Mozilla /5.0 (Compatible MSIE 9.0;Windows NT 6.1;WOW64; Trident/5.0)'}
        response = requests.post(response_url, json=payload, headers=headers)
    else:
        return ("Please enter correct details. Check if the username or reponame exists, and/or Starting date < End date. "
                "Also, date format should be MM-DD")
My server code takes arguments from the request it receives and extracts the parameters it needs from that request. It then executes the getGitStat function and sends the JSON payload defined in the server code back to the endpoint the request came from.
My problem is that I need to send a text confirmation to the endpoint saying that I have received the request and that data will be coming soon. The issue is that the getGitStat function takes more than a minute to fetch and parse data from the GitHub API.
I searched the internet and found that I need to make this call asynchronous, and that I can do that using queues. I tried to understand how to do this with RQ and RabbitMQ, but I neither understood it nor was I able to convert my code to an asynchronous form. Can somebody give me pointers or any ideas on how I can achieve this?
Thank you.
------------Update------------
Threading was able to solve this problem. Create another thread and call the function in that thread.
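A minimal sketch of that threading fix (the route answers immediately and the slow GitHub work finishes in the background; the helper name run_git_stat is made up for this example, and the payload details are taken from the question):

from threading import Thread

from flask import Flask, request
import requests

from gitStat import getGitStat

app = Flask(__name__)


def run_git_stat(params, response_url):
    # the slow part: runs in the background after the confirmation has already been sent
    values = getGitStat(params[0], params[1], params[2], params[3])
    payload = {
        "response_type": "in_channel",
        "text": "Github Repo Commits Status",
        "attachments": [{"text": values}],
    }
    requests.post(response_url, json=payload,
                  headers={'Content-Type': 'application/json'})


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    params = request.args["text"].split(" ")
    response_url = request.args["response_url"]
    Thread(target=run_git_stat, args=(params, response_url)).start()
    # respond right away, well within the 3000 ms timeout
    return "Request received, data will be coming soon."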
If you are trying to run an async task from a request, you have to decide whether or not you want its result/progress:
1. You don't care about the result of the task, or whether there were any errors while processing it: you can just process it in a Thread and forget about the result.
2. You just want to know whether the task succeeded or failed: you can store the state of the task in a database and query it when needed.
3. You want the progress of the task (20% done ... 40% done): you have to use something more sophisticated like Celery or RabbitMQ.
For you I think option #2 fits better. You can create a simple table, GitTasks:
GitTasks
------------------------
Id(PK) | Status | Result
------------------------
1 |Processing| -
2 | Done | <your result>
3 | Error | <error details>
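A sketch of that table as a SQLAlchemy model; SQLAlchemy is just one possible choice here (the answer doesn't prescribe it; any ORM or plain SQL works), and the column sizes are arbitrary:

from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()


class GitTask(db.Model):
    # one row per submitted request; the worker thread updates status and result when it finishes
    id = db.Column(db.Integer, primary_key=True)
    status = db.Column(db.String(20), default="Processing")  # Processing / Done / Error
    result = db.Column(db.Text, nullable=True)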
You have to create a simple Thread subclass in Python to do the processing:

import threading


class AsyncGitTask(threading.Thread):
    def __init__(self, task_id, params):
        super().__init__()
        self.task_id = task_id
        self.params = params

    def run(self):
        ## Do the processing
        ## store the result in the table for id = self.task_id
        ...

You have to create another endpoint to query the status of your task:

@app.route('/TaskStatus/<int:task_id>')
def task_status(task_id):
    ## query the GitTasks table and return data accordingly
    ...
Now that we have built all the components, we have to put them together in your main request:

from flask import url_for


@app.route('/', methods=['POST', 'GET'])
def handle_data():
    .....
    ## create a new row in the GitTasks table, and use its PK (id) as task_id
    task_id = create_new_task_row()
    async_task = AsyncGitTask(task_id=task_id, params=params)
    async_task.start()
    task_status_url = url_for('task_status', task_id=task_id)
    ## From this request you can return text saying
    ## "Your task is being processed. To see the progress
    ## go to <task_status_url>"

Modifying MongoDB data in flask

I am running a Flask server which fetches data from MongoDB.
from flask import Flask
from flask import render_template
from pymongo import Connection
import json
from bson import json_util
from bson.json_util import dumps

app = Flask(__name__)

MONGODB_HOST = 'localhost'
MONGODB_PORT = 27017
DBS_NAME = 'donorschoose'
COLLECTION_NAME = 'projects'
FIELDS = {'school_state': True, 'resource_type': True, 'poverty_level': True, 'date_posted': True, 'total_donations': True, '_id': False}


@app.route("/")
def index():
    return render_template("index.html")


@app.route("/donorschoose/projects")
def donorschoose_projects():
    connection = Connection(MONGODB_HOST, MONGODB_PORT)
    collection = connection[DBS_NAME][COLLECTION_NAME]
    projects = collection.find(fields=FIELDS)
    json_projects = []
    for project in projects:
        json_projects.append(project)
    json_projects = json.dumps(json_projects, default=json_util.default)
    connection.disconnect()
    return json_projects


if __name__ == "__main__":
    app.run(host='0.0.0.0', port=5000, debug=True)
I got this code from the net and implemented it successfully, and I am feeding data to d3 apps via this app. My question is: is it possible to modify the data right here in the Flask environment using Python (in the code I have pasted above)? I only ask because Python allows a greater deal of flexibility than d3, as my expertise in d3 is limited. To the problem: the 'poverty_level' column has 4 fixed values, i.e. low, medium, high, unknown.
My aim is to calculate the percentage of the high poverty level,
i.e. for the column 'poverty_level' -> count(val = high) / count(all rows).
Essentially I need just one column to display my metric, and I had a tough time doing this in d3. Any d3- or Python-level help will be much appreciated :)
Thank you.
First you need to completely iterate the Cursor returned by find():
projects = list(collection.find(fields=FIELDS))
Then calculate the total number and the number of high-poverty projects:
high_poverty_count = sum(1 for p in projects if p['poverty_level'] == 'high')
high_poverty_ratio = float(high_poverty_count) / len(projects)
Then I'd combine this with the list of all projects into a single document:
result = {'high_poverty_ratio': high_poverty_ratio,
          'projects': projects}

return json.dumps(result, default=json_util.default)
Also, note that your application has two severe problems:
First, you use "Connection", which is obsolete. Do this:
from pymongo import MongoClient
client = MongoClient(MONGODB_HOST, MONGODB_PORT)
Second, you create a new client and disconnect it for each request. This is extremely slow. Instead, create the client when your application begins, and never disconnect it:
client = MongoClient(MONGODB_HOST, MONGODB_PORT)


@app.route("/donorschoose/projects")
def donorschoose_projects():
    collection = client[DBS_NAME][COLLECTION_NAME]
    # ... etc ....
    return json.dumps(result, default=json_util.default)
