Is the Flask Caching filesystem cache shared across processes?

Is the Flask Caching filesystem cache shared across processes? - python

Let us assume I use Flask with the filesystem cache in combination with uWSGI or gunicorn, either of them starting multiple processes or workers. Do all these processes share the same cache? Or asked differently, do the functions and parameters always evaluate to the same cache key regardless of process pid, thread state etc.?
For instance, consider the following minimal example:
import time
from flask import Flask, jsonify
from flask_caching import Cache
app = Flask(__name__)
cache = Cache(app, config={
'CACHE_TYPE': 'filesystem',
'CACHE_DIR': 'my_cache_directory',
'CACHE_DEFAULT_TIMEOUT': 3600,
})
#cache.memoize()
def compute(param):
time.sleep(5)
return param + 1
#app.route('/')
#app.route('/<int:param>')
def main(param=41):
expensive = compute(param)
return jsonify({"Hello expensive": expensive})
if __name__ == '__main__':
app.run()
Will www.example.com/41 just take 5 seconds once and then (for 3600 seconds) be available instantly regardless of the uWSGI or gunicorn workers?

If i run it locally on my machine, the cache is stable across different processes as well as even different restarts of the entire server.

I am finding that the flask-caching filesystem cache is kept per-worker. Simple example I just tried in my app (4 workers):
#app.route("/product/<id>", methods=["GET"])
#app.cache.cached()
def product(id):
product = Product.from_id(id)
app.pp.pprint(product.get_data())
I'm re-loading the page that calls that view and I see the pprint output 4 times in the console, then no more after that.

Related

How to make flask application run every 5 minutes?

I have this application that works how I want to but now I want it to grab live data from some virtual machines every 5 minutes. In the code below I have it set to renew every ten seconds just to see if it works but nothing is happening. I am using time.sleep. What else am I missing?
import time
from flask import Flask, render_template
from testapi import grab_cpu
app = Flask(__name__)
starttime = time.time()
while True:
machines =["build05", "build06", "build07","build08", "build09", "build10", "build11", "build12","build14","build15", "winbuild10","winbuild11", "winbuild12", "winbuild13", "wbuild14", "wbuild15", "winbuild16", "winbuild17", "winbuild18"]
cpu_percentage =[grab_cpu("build05"), grab_cpu("build06"),grab_cpu("build07"),
grab_cpu("build08"), grab_cpu("build09"), grab_cpu("build10"), grab_cpu("build11"), grab_cpu("build12"), grab_cpu("build13"), grab_cpu("build14"), grab_cpu("build15"), grab_cpu("winbuild10"), grab_cpu("winbuild11"), grab_cpu("winbuild12"), grab_cpu("winbuild14"), grab_cpu("winbuild15"), grab_cpu("winbuild16"), grab_cpu("winbuild17"), grab_cpu("winbuild18")]
#app.route("/") # this sets the route to this page
def home():
return render_template('testdoc.html', len = len(machines), machines = machines, cpu_percentage = cpu_percentage)
app.run(use_reloader = True, debug = True)
time.sleep(10.0 - ((time.time() - starttime) % 10.0))
Edit (this is an update with the suggestions, it's still not working as i'd like):
Edit 2 more info: I have one file with a function, grab_cpu, that does an api call to a vm and returns the percentage of usage. I have another file called test doc.html which just displays the html. From these responses i'm guessing I need to use some javascript and something with sockets. Can someone please drop a link to point me in the right direction?
import time
from flask import Flask, render_template
from testapi import grab_cpu
app = Flask(__name__)
#app.route("/") # this sets the route to this page
def home():
starttime = time.time()
while True:
machines =["build05", "build06", "build07","build08", "build09", "build10", "build11", "build12","build14","build15", "winbuild10","winbuild11", "winbuild12", "winbuild13", "wbuild14", "wbuild15", "winbuild16", "winbuild17", "winbuild18"]
cpu_percentage =[grab_cpu("build05"), grab_cpu("build06"),grab_cpu("build07"),
grab_cpu("build08"), grab_cpu("build09"), grab_cpu("build10"), grab_cpu("build11"), grab_cpu("build12"), grab_cpu("build13"), grab_cpu("build14"), grab_cpu("build15"), grab_cpu("winbuild10"), grab_cpu("winbuild11"), grab_cpu("winbuild12"), grab_cpu("winbuild14"), grab_cpu("winbuild15"), grab_cpu("winbuild16"), grab_cpu("winbuild17"), grab_cpu("winbuild18")]
return render_template('testdoc.html', len = len(machines), machines = machines, cpu_percentage = cpu_percentage)
time.sleep(10.0 - ((time.time() - starttime) % 10.0))
app.run(use_reloader = True, debug = True)
Here is the page:
Thank you.

I recommend against using Flask to handle the scheduling. The intended flow for Flask requests is:
Receive an HTTP request from the browser
Generate a response as quickly as possible (ideally in a few milliseconds, but almost always less than a few seconds)
Return the response to the browser
The intention of the code above seems to be to use Flask to push updates down to the browser, but Flask can only respond to incoming requests, not force the browser to change.
For the use case you're describing, a simpler solution would be to handle the refresh logic in the browser.
For a very rudimentary example, put this into your testdoc.html template:
<script type="text/javascript">
setTimeout(function () {
location.reload();
}, 10 * 1000);
</script>
That will reload the page every 10 seconds, which will generate a new request to your Flask server and display updated information in your browser.
If you want to get fancier and avoid reloading the entire page, you can use JavaScript XmlHttpRequests or the more modern Fetch API to update specific elements of the page asynchronously.
Also, here's a suggested simplification of the Python code:
from flask import Flask, render_template
from testapi import grab_cpu
app = Flask(__name__)
build_machines = list(map(lambda i: 'build%02d' % i, range(5, 16)))
win_build_machines = list(map(lambda i: 'winbuild%02d' % i, range(10, 19)))
machines = build_machines + win_build_machines
# Handle HTTP GET requests to / route
#app.route("/", methods=['GET'])
def home():
cpu_percentage = list(map(lambda b: grab_cpu(b), machines))
return render_template('testdoc.html', len = len(machines), machines = machines, cpu_percentage = cpu_percentage)
app.run(use_reloader = True, debug = True)

Your code needs some work, Flask views are not meant to be declared inside a loop. I would suggest the following:
Remove the Flask view from inside the while loop.
Declare the server outside the loop too.
Write your code inside a function.
Run the while loop just calling the function to grab the information from your sources.
Based on what I think is just an example made on the run, I will also make some assumptions about your requirement:
This is not exactly your code.
Your code works perfectly, but this implementation is lacking.
Your project has some level of complexity.
You need to show this data somewhere else.
Base on these (and perhaps additional conditions), I would say you have two ways to achieve in an effective way what you need:
(Not recommended) Create a Cron Job and run everything in a dedicated script.
(My favorite) Encapsulate the logic of your script inside a Flask API call, a method, or a function, and declare it as a Celery task, scheduled to run every 5 seconds, updating your database, and use a view with some JS reactivity to show the data in realtime.

issue where Flask app request handling is choked when a regex.serach is happening?

Setup :
Language i am using is Python
I am running a flask app with threaded =True and inside the flask app when an endpoint is hit, it starts a thread and returns a thread started successfully with status code 200.
Inside the thread, there is re.Search happening, which takes 30-40 seconds(worst case) and once the thread is completed it hits a callback URL completing one request.
Ideally, the Flask app is able to handle concurrent requests
Issue:
When the re.search is happening inside the thread, the flask app is not accepting concurrent requests. I am assuming some kind of thread locking is happening and unable to figure out.
Question :
is it ok to do threading(using multi processing) inside flask app which has "threaded = True"?
When regex is happening does it do any thread locking?
Code snippet: hit_callback does a post request to another api which is not needed for this issue.
import threading
from flask import *
import re
app = Flask(__name__)
#app.route(/text)
def temp():
text = request.json["text"]
def extract_company(text)
attrib_list = ["llc","ltd"] # realone is 700 in length
entity_attrib = r"((\s|^)" + r"(.)?(\s|$)|(\s|^)".join(attrib_list) + r"(\s|$))"
raw_client = re.search("(.*(?:" + entity_attrib + "))", text, re.I)
hit_callback(raw_client)
extract_thread = threading.Thread(target=extract_company, )
extract_thread.start()
return jsonify({"Response": True}), 200
if __name__ == "__main__":
app.run(debug=True, host='0.0.0.0', port=4557, threaded=True)

Please read up on the GIL - basically, Python can only execute ONE piece of code at the same time - even if in threads. So your re-search runs and blocks all other threads until run to completion.
Solutions: Use Gunicorn etc. pp to run multiple flask processes - do not attempt to do everything in one flask process.
To add something: Also your design is problematic - 40 seconds or so for a http answer is way to long. A better design would have at least two services: a web server and a search service. The first would register a search and give back an id. Your search service would communicate asynchronously with the web service and return resullt +id when the result is ready. Your clients could pull the web service until they get a result.

Flask REST-API within an alpine docker container memory leak

I'm trying to find a kind of memory leak in my flask REST-API for a few days now without any relevant progress.
I have a flask REST-API using a mysql database (packages like SQLAlchemy, connexion and marshmallow). It is available via a docker container which have a base image from alpine:latest.
The main problem i have: with every request to the REST-API the memory usage of the docker container increases and the memory is not released. The API do not cache the results.
Here is the code from the server.py (the main program of the RESt-API):
"""
Main module of the server file
"""
# 3rd party moudles
# local modules
import config
# Get the application instance
connex_app = config.connex_app
# Read the swagger.yml file to configure the endpoints
connex_app.add_api("swagger_2.0.yml")
# create a URL route in our application for "/"
#connex_app.route("/")
def home():
return None
if __name__ == "__main__":
connex_app.run(debug=True)
and the config file:
import os
import connexion
from flask_cors import CORS
from flask_marshmallow import Marshmallow
from flask_sqlalchemy import SQLAlchemy
from memory_profiler import memory_usage
basedir = os.path.abspath(os.path.dirname(__file__))
# Create the Connexion application instance
connex_app = connexion.App(__name__, specification_dir=basedir)
# Get the underlying Flask app instance
app = connex_app.app
CORS(app)
# Configure the SQLAlchemy part of the app instance
app.config['SQLALCHEMY_ECHO'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = "mysql://root:somepassword#someHostId/sponge"
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
#app.after_request
def add_header(response):
#response.cache_control.no_store = True
if 'Cache-Control' not in response.headers:
response.headers['Cache-Control'] = 'max-age=0'
print(memory_usage(-1, interval=.2, timeout=1), "after request")
return response
# Create the SQLAlchemy db instance
db = SQLAlchemy(app)
# Initialize Marshmallow
ma = Marshmallow(app)
An example for an endpoint you can see here:
from flask import abort
import models
def read(disease_name=None):
"""
This function responds to a request for /sponge/dataset/?disease_name={disease_name}
with one matching entry to the specifed diesease_name
:param disease_name: name of the dataset to find (if not given, all available datasets will be shown)
:return: dataset matching ID
"""
if disease_name is None:
# Create the list of people from our data
data = models.Dataset.query \
.all()
else:
# Get the dataset requested
data = models.Dataset.query \
.filter(models.Dataset.disease_name.like("%" + disease_name + "%")) \
.all()
# Did we find a dataset?
if len(data) > 0:
# Serialize the data for the response
return models.DatasetSchema(many=True).dump(data).data
else:
abort(404, 'No data found for name: {disease_name}'.format(disease_name=disease_name))
I tried to find the memory leak within the code with the memory_profiler tool, but since the same behavior (increasing memory usage of the docker container at each request) can be observed at each REST-API endpoint.
Can anyone explain what is happening oder have an idea of how i can fix the caching problem.

Problem is fixed. Actually it was no problem. The docker stats memory usage increases due to implementation of python. If the rest-api request is multiple GB big, then python allocates a certain percentage of that used memory and do not free it immidiatly. So the peaks at 500 GB were after a relly great answer. I added to the API endpoints a fixed limit and and hint for the user if he exceeds this limit he should download the whole database as a zip fornat and work locally with it.

Background tasks in flask

I am writing a web application which would do some heavy work. With that in mind I thought of making the tasks as background tasks(non blocking) so that other requests are not blocked by the previous ones.
I went with demonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished, Now if a user sends a request my code will immediately tell them that their request is in progress, it'll be running in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request
import threading
class threadClass:
def __init__(self):
thread = threading.Thread(target=self.run, args=())
thread.daemon = True # Daemonize thread
thread.start() # Start the execution
def run(self):
#
# This might take several minutes to complete
someHeavyFunction()
app = Flask(__name__)
#app.route('/start', methods=['POST'])
try:
begin = threadClass()
except:
abort(500)
return "Task is in progress"
def main():
"""
Main entry point into program execution
PARAMETERS: none
"""
app.run(host='0.0.0.0',threaded=True)
main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production)
Could I have done this better? Did I miss anything? I was going through python's multi-threading package and found this
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I demonize a process using multi-processing? How can I achieve better than what I have with threading module?
##EDIT
I went through the multi-processing package of python, it is similar to threading.
from flask import Flask
from flask import request
from multiprocessing import Process
class processClass:
def __init__(self):
p = Process(target=self.run, args=())
p.daemon = True # Daemonize it
p.start() # Start the execution
def run(self):
#
# This might take several minutes to complete
someHeavyFunction()
app = Flask(__name__)
#app.route('/start', methods=['POST'])
try:
begin = processClass()
except:
abort(500)
return "Task is in progress"
def main():
"""
Main entry point into program execution
PARAMETERS: none
"""
app.run(host='0.0.0.0',threaded=True)
main()
Does the above approach looks good?

Best practice
The best way to implement background tasks in flask is with Celery as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As #MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his Pycon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The below code is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the enpoint three times, it will take 30 seconds to finish.
Miguel Grinbergs decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with #flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps
from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError
app = Flask(__name__)
tasks = {}
def flask_async(f):
"""
This decorator transforms a sync route to asynchronous by running it in a background thread.
"""
#wraps(f)
def wrapped(*args, **kwargs):
def task(app, environ):
# Create a request context similar to that of the original request
with app.request_context(environ):
try:
# Run the route function and record the response
tasks[task_id]['result'] = f(*args, **kwargs)
except HTTPException as e:
tasks[task_id]['result'] = current_app.handle_http_exception(e)
except Exception as e:
# The function raised an exception, so we set a 500 error
tasks[task_id]['result'] = InternalServerError()
if current_app.debug:
# We want to find out if something happened so reraise
raise
# Assign an id to the asynchronous task
task_id = uuid.uuid4().hex
# Record the task, and then launch it
tasks[task_id] = {'task': threading.Thread(
target=task, args=(current_app._get_current_object(), request.environ))}
tasks[task_id]['task'].start()
# Return a 202 response, with an id that the client can use to obtain task status
return {'TaskId': task_id}, 202
return wrapped
#app.route('/foo')
#flask_async
def foo():
time.sleep(10)
return {'Result': True}
#app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
"""
Return results of asynchronous task.
If this request returns a 202 status code, it means that task hasn't finished yet.
"""
task = tasks.get(task_id)
if task is None:
abort(404)
if 'result' not in task:
return {'TaskID': task_id}, 202
return task['result']
if __name__ == '__main__':
app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests
task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
for task_id in task_ids]
# [{'Result': True}, {'Result': True}]

Does bottle handle requests with no concurrency?

At first, I think Bottle will handle requests concurrently, so I wrote test code bellow:
import json
from bottle import Bottle, run, request, response, get, post
import time
app = Bottle()
NUMBERS = 0
#app.get("/test")
def test():
id = request.query.get('id', 0)
global NUMBERS
n = NUMBERS
time.sleep(0.2)
n += 1
NUMBERS = n
return id
#app.get("/status")
def status():
return json.dumps({"numbers": NUMBERS})
run(app, host='0.0.0.0', port=8000)
Then I use jmeter to request /test url with 10 threads loops 20 times.
After that, /status gives me {"numbers": 200}, which seems like that bottle does not handle requests concurrently.
Did I misunderstand anything?
UPDATE
I did another test, I think it can prove that bottle deal with requests one by one(with no concurrency). I did a little change to the test function:
#app.get("/test")
def test():
t1 = time.time()
time.sleep(5)
t2 = time.time()
return {"t1": t1, "t2": t2}
And when I access /test twice in a browser I get:
{
"t2": 1415941221.631711,
"t1": 1415941216.631761
}
{
"t2": 1415941226.643427,
"t1": 1415941221.643508
}

Concurrency isn't a function of your web framework -- it's a function of the web server you use to serve it. Since Bottle is WSGI-compliant, it means you can serve Bottle apps through any WSGI server:
wsgiref (reference server in the Python stdlib) will give you no concurrency.
CherryPy dispatches through a thread pool (number of simultaneous requests = number of threads it's using).
nginx + uwsgi gives you multiprocess dispatch and multiple threads per process.
Gevent gives you lightweight coroutines that, in your use case, can easily achieve C10K+ with very little CPU load (on Linux -- on Windows it can only handle 1024 simultaneous open sockets) if your app is mostly IO- or database-bound.
The latter two can serve massive numbers of simultaneous connections.
According to http://bottlepy.org/docs/dev/api.html , when given no specific instructions, bottle.run uses wsgiref to serve your application, which explains why it's only handling one request at once.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Is the Flask Caching filesystem cache shared across processes? - python

If i run it locally on my machine, the cache is stable across different processes as well as even different restarts of the entire server.

Related

How to make flask application run every 5 minutes?

issue where Flask app request handling is choked when a regex.serach is happening?

Flask REST-API within an alpine docker container memory leak

Background tasks in flask

Does bottle handle requests with no concurrency?

Categories

Resources