I've been trying to track down what looks like a memory leak in my Flask REST API for a few days now, without any real progress.
The API uses a MySQL database (via packages like SQLAlchemy, connexion and marshmallow) and runs in a Docker container based on alpine:latest.
The main problem: with every request to the REST API the memory usage of the Docker container increases, and the memory is not released. The API does not cache results.
Here is the code from server.py (the main program of the REST API):
"""
Main module of the server file
"""
# 3rd party modules
# local modules
import config
# Get the application instance
connex_app = config.connex_app
# Read the swagger.yml file to configure the endpoints
connex_app.add_api("swagger_2.0.yml")
# create a URL route in our application for "/"
@connex_app.route("/")
def home():
    return None

if __name__ == "__main__":
    connex_app.run(debug=True)
and the config file:
import os
import connexion
from flask_cors import CORS
from flask_marshmallow import Marshmallow
from flask_sqlalchemy import SQLAlchemy
from memory_profiler import memory_usage
basedir = os.path.abspath(os.path.dirname(__file__))
# Create the Connexion application instance
connex_app = connexion.App(__name__, specification_dir=basedir)
# Get the underlying Flask app instance
app = connex_app.app
CORS(app)
# Configure the SQLAlchemy part of the app instance
app.config['SQLALCHEMY_ECHO'] = False
app.config['SQLALCHEMY_DATABASE_URI'] = "mysql://root:somepassword@someHostId/sponge"
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
@app.after_request
def add_header(response):
    # response.cache_control.no_store = True
    if 'Cache-Control' not in response.headers:
        response.headers['Cache-Control'] = 'max-age=0'
    print(memory_usage(-1, interval=.2, timeout=1), "after request")
    return response
# Create the SQLAlchemy db instance
db = SQLAlchemy(app)
# Initialize Marshmallow
ma = Marshmallow(app)
An example endpoint looks like this:
from flask import abort
import models
def read(disease_name=None):
    """
    This function responds to a request for /sponge/dataset/?disease_name={disease_name}
    with one matching entry for the specified disease_name
    :param disease_name: name of the dataset to find (if not given, all available datasets will be shown)
    :return: dataset matching ID
    """
    if disease_name is None:
        # Create the list of datasets from our data
        data = models.Dataset.query \
            .all()
    else:
        # Get the dataset requested
        data = models.Dataset.query \
            .filter(models.Dataset.disease_name.like("%" + disease_name + "%")) \
            .all()

    # Did we find a dataset?
    if len(data) > 0:
        # Serialize the data for the response
        return models.DatasetSchema(many=True).dump(data).data
    else:
        abort(404, 'No data found for name: {disease_name}'.format(disease_name=disease_name))
I tried to find the memory leak in the code with the memory_profiler tool, but without success: the same behavior (increasing memory usage of the Docker container on each request) can be observed at every REST API endpoint, so it does not seem to be tied to one particular piece of code.
Can anyone explain what is happening, or does anyone have an idea how I can fix this caching problem?
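For reference, a minimal sketch of how a single endpoint can be profiled with memory_profiler (the @profile decorator and the mprof commands are standard memory_profiler usage; the function body simply mirrors the read() example above):

from memory_profiler import profile
import models

@profile  # prints a line-by-line memory report every time the endpoint is called
def read(disease_name=None):
    query = models.Dataset.query
    if disease_name is not None:
        query = query.filter(models.Dataset.disease_name.like("%" + disease_name + "%"))
    data = query.all()
    return models.DatasetSchema(many=True).dump(data).data

Running the dev server with mprof run server.py and plotting afterwards with mprof plot shows the Python-side memory over time, independent of what docker stats reports.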
Problem is fixed. Actually it was not really a problem: the docker stats memory usage increases because of the way Python manages memory. If the result of a REST API request is several GB in size, Python allocates a certain share of that memory and does not free it immediately. So the peaks at 500 GB appeared after a really large response. I added a fixed limit to the API endpoints, plus a hint for the user: if the limit is exceeded, they should download the whole database as a zip file and work with it locally.
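A minimal sketch of what such a limit can look like (the limit parameter, the cap of 1000 and the error message are illustrative assumptions, not the exact deployed code; it also assumes a matching limit query parameter is declared in swagger_2.0.yml and reuses the imports from the endpoint example above):

def read(disease_name=None, limit=100):
    """Return at most `limit` datasets and reject requests above a hard cap."""
    MAX_LIMIT = 1000  # assumed hard cap
    if limit > MAX_LIMIT:
        abort(400, "Result limit exceeds {}. Please download the database dump "
                   "(zip) and work with it locally.".format(MAX_LIMIT))
    query = models.Dataset.query
    if disease_name is not None:
        query = query.filter(models.Dataset.disease_name.like("%" + disease_name + "%"))
    data = query.limit(limit).all()
    if len(data) > 0:
        return models.DatasetSchema(many=True).dump(data).data
    abort(404, 'No data found for name: {disease_name}'.format(disease_name=disease_name))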
Related
I am using Docker for a data-science project. The Docker image contains Python code that reads a file (an image file), processes it, and generates a vector of floats as output.
I am using the Flask micro-framework so that a user can interact with the container. Here is the server-side code (running inside the container). It is of course buggy!
from flask import Flask, request
from flask_restful import Api
app = Flask(__name__)
api = Api(app)
@app.route('/process_image', methods=['GET', 'POST'])
def process_image():
    params = request.json
    with open(params["file_name"], "r") as f:
        # the above access to a file on the host machine from the docker container
        # will definitely lead to an access error
        # do some processing
        pass
Here is the client-side code
import requests
file_name = "some-path/image.jpg" # on the host machine
requests.get('http://0.0.0.0:5000/process_image/', json={"file_name": file_name})
What is the right way to pass a file via requests to the container? I am looking for a solution where the client-side code is minimal and the user is able to send a file stored at any location on the host machine.
I am new to Docker as well as web programming; I would appreciate any feedback/suggestions. Thanks in advance!
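One common pattern (offered only as a sketch, not necessarily the approach ultimately used here) is to upload the file itself as multipart form data, so the container never needs to see host paths; the route and field names below are assumptions:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/process_image', methods=['POST'])
def process_image():
    uploaded = request.files['image']  # field name is an assumption
    data = uploaded.read()             # raw bytes of the image, now available inside the container
    # ... process `data` and build the vector of floats here ...
    return jsonify({"vector": []})     # placeholder result

The client then posts the bytes instead of the path:

import requests

file_name = "some-path/image.jpg"  # on the host machine
with open(file_name, "rb") as f:
    response = requests.post("http://0.0.0.0:5000/process_image", files={"image": f})
print(response.json())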
I chose to use server-side session management with Flask, using Flask-Session.
I store the data with the filesystem backend and, as expected, these files are stored under a /flask_session folder in my config directory.
Here is how I set this up in my __init__.py
# __init__.py
from flask_session import Session
[...]
app.config['SESSION_TYPE'] = 'filesystem'
app.config['SECRET_KEY'] = config.SECRET_KEY
sess = Session()
sess.init_app(app)
As expected, session files are generated and stored under /flask_session:
▾ flask_session/
1695e5cbf9b4edbbbb82a8ef1fad89ae
192761f7ce8e3cbf3ca11665133b7794
2029240f6d1128be89ddc32729463129
...
The question is: are these files automatically removed by Flask-Session after a specific amount of time (i.e. like a session stored client-side)? If yes, is it possible to decrease/increase this timing?
As Danila Ganchar commented, PERMANENT_SESSION_LIFETIME lets you control the session expiration time.
Flask-Session uses the same built-in configuration values as Flask itself (those related to sessions). From the Flask-Session docs:
The following configuration values are builtin configuration values
within Flask itself that are related to session. They are all
understood by Flask-Session, for example, you should use
PERMANENT_SESSION_LIFETIME to control your session lifetime.
Example:
# __init__.py
from flask_session import Session
from datetime import timedelta
app.config['SESSION_PERMANENT'] = True
app.config['SESSION_TYPE'] = 'filesystem'
app.config['PERMANENT_SESSION_LIFETIME'] = timedelta(hours=5)
# The maximum number of items the session stores
# before it starts deleting some, default 500
app.config['SESSION_FILE_THRESHOLD'] = 100
app.config['SECRET_KEY'] = config.SECRET_KEY
sess = Session()
sess.init_app(app)
I've been thinking for a while now about the factory pattern for WSGI applications, as recommended by the Flask docs. Specifically, about how the factory functions usually shown make use of objects that have been created at module import time, like db in the example, as opposed to objects created inside the factory function.
Would the factory function ideally create _everything_ anew or wouldn't that make sense for objects like the db engine?
(I'm thinking cleaner separation and better testability here.)
Here is some code where I'm trying to create all the objects needed by the WSGI app in its factory function.
# factories.py
from flask import Flask
from sqlalchemy import create_engine as sqlalchemy_create_engine  # aliased to match the name used below
from sqlalchemy.orm import scoped_session, sessionmaker

# ViewRegistrationBlueprint, flask_request_scope_func and logger are project-local
# helpers defined elsewhere in the package.


def create_app(config, engine=None):
    """Create WSGI application to be called by WSGI server. Full factory function
    that takes care to deliver an entirely new WSGI application instance with all
    new member objects like database engine etc.

    Args:
        config (dict): Dict to update the wsgi app. configuration.
        engine (SQLAlchemy engine): Database engine to use.
    """
    # flask app
    app = Flask(__name__)  # should be package name instead of __name__ acc. to docs
    app.config.update(config)

    # create blueprint
    blueprint = ViewRegistrationBlueprint('blueprint', __name__, )

    # register views for blueprint
    from myapp.views import hello_world
    # dynamically scrapes module and registers methods as views
    blueprint.register_routes(hello_world)

    # create engine and request scoped session for current configuration and store
    # on wsgi app
    if engine is not None:
        # delivers transactional scope when called
        RequestScopedSession = scoped_session(
            sessionmaker(bind=engine),
            scopefunc=flask_request_scope_func
        )

        # request teardown behaviour, always called, even on unhandled exceptions
        def request_scoped_session_teardown(*args, **kwargs):
            """Function to register and call by the framework when a request is
            finished and the session should be removed.
            """
            # wrapped in try/finally to make sure no error collapses call stack here
            try:
                RequestScopedSession.remove()  # rollback all pending changes, close and return conn. to pool
            except Exception as exception_instance:
                msg = "Error removing session in request teardown.\n{}"
                msg = msg.format(exception_instance)
                logger.error(msg)
            finally:
                pass

        app.config["session"] = RequestScopedSession
        blueprint.teardown_request(request_scoped_session_teardown)

    # register blueprint
    app.register_blueprint(blueprint)
    return app


def create_engine(config):
    """Create database engine from configuration

    Args:
        config (dict): Dict used to assemble the connection string.
    """
    # connection_string
    connection_string = "{connector}://{user}:{password}@{host}/{schema}"
    connection_string = connection_string.format(**config)

    # database engine
    return sqlalchemy_create_engine(
        connection_string,
        pool_size=10,
        pool_recycle=7200,
        max_overflow=0,
        echo=True
    )
# wsgi.py (served by WSGI server)
from myapp.factories import create_app
from myapp.factories import create_engine
from myapp.configuration.config import Config
config = Config()
engine = create_engine(config.database_config)
app = create_app(config.application_config, engine=engine)
# conftest.py
import pytest

from myapp.factories import create_app
from myapp.factories import create_engine
from myapp.configuration.config import TestConfig


@pytest.fixture
def app():
    config = TestConfig()
    engine = create_engine(config.database_config)
    app = create_app(config.application_config, engine=engine)
    with app.app_context():
        yield app
As you also tagged this with sanic, I'll answer with that background. Sanic is async and thus relies on an event loop. An event loop is a resource and must not be shared between tests but created anew for each one. Hence, the database connection etc. also needs to be created for each test and cannot be re-used, as it is async and depends on the event loop. Even without the async aspect it would be cleanest to create DB connections per test, because they have state (like temp tables).
So I ended up with a create_app() that creates everything, which allows me to create an arbitrary number of independent apps in a test run. (To be honest, there are some global resources like registered event listeners, but tearing those down is easy with py.test factories; see the sketch below.) For testability I'd try to avoid global resources that are created on module import, although I've seen it done differently in big and successful projects.
That's not really a definite answer, I know...
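A minimal sketch of how the per-test teardown can look, building on the conftest.py from the question (engine.dispose() is standard SQLAlchemy; the listener comment stands in for whatever global resources a project registers):

@pytest.fixture
def app():
    config = TestConfig()
    engine = create_engine(config.database_config)
    app = create_app(config.application_config, engine=engine)
    with app.app_context():
        yield app
    # everything after the yield runs as teardown, once per test:
    engine.dispose()  # close pooled DB connections so nothing leaks into the next test
    # unregister any globally registered event listeners here as well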
I have this route in a flask app:
@APP.route('/comparisons', methods=['POST'])
def save_comparison():
    if 'comparisons' not in session or not session['comparisons']:
        session['comparisons'] = []
    entity_id = request.form.get('entity_id')
    session['comparisons'].append(entity_id)
    session.modified = True
    return entity_id

APP.secret_key = 'speakfriend'
This adds entity_ids to session['comparisons'] as expected. However, when I refresh the page, the session object does not have a 'comparisons' property anymore, so the list of comparisons is empty. What am I missing?
Update:
I left out what I didn't know was the important information: the Vue app also makes calls to a Flask API, which sets its own session. The SECRET_KEYs were different, so when there was an API call between web-server calls (or vice versa) the session from one application would replace the session from the other, and neither could read the other's (different SECRET_KEYs). Since these are always deployed together using docker-compose, the solution was to use a common env variable to pass the same secret to both.
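A minimal sketch of the shared-secret setup (the variable name APP_SECRET_KEY and the docker-compose keys are illustrative assumptions):

# in both the web server app and the API app
import os

APP.secret_key = os.environ["APP_SECRET_KEY"]  # same value injected into both containers

# docker-compose.yml (excerpt), passing the same value to both services:
#   services:
#     web:
#       environment:
#         - APP_SECRET_KEY=${APP_SECRET_KEY}
#     api:
#       environment:
#         - APP_SECRET_KEY=${APP_SECRET_KEY}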
I have created a Flask application and am hosting it on an Ubuntu server. I know that my Apache config is correct, since I am able to serve the example Flask application. However, this one seems to be giving me trouble. The code is below:
from flask import Flask, render_template, request, url_for
import pickle
import engine
import config

# Initialize the Flask application
app = Flask(__name__)

model = pickle.load(open(config.MODEL_PATH, "rb"))
collection = engine.Collection(config.DATABASE_PATH)
search_engine = engine.SearchEngine(model, collection)


@app.route('/')
def form():
    return render_template('index.html')


@app.route('/search/', methods=['POST'])
def search():
    query = request.form['query']
    results = search_engine.query(query)
    return render_template('form_action.html', query=query, results=results)


@app.route('/retrieve/<int:item_number>', methods=['GET'])
def retrieve(item_number):
    item = engine.Product(item_number, collection.open_document(str(item_number)))
    return render_template('document.html', item=item)


if __name__ == '__main__':
    app.run()
When running the file directly through the Python interpreter, it works fine and I can access it. However, when starting it through Apache and mod_wsgi, I get no response from the server: it just hangs when making a request and nothing appears in the logs.
I suspect that my issue may have something to do with the three objects I initialize at the beginning of the program. Perhaps it gets stuck running those?
Update: I have tried commenting out certain parts of the code to see what is causing it to stall. Tracing it out to the engine module, importing NearestNeighbors seems to be causing the issue.
import sqlite3
import config
from sklearn.neighbors import NearestNeighbors
from preprocessor import preprocess_document
from collections import namedtuple
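One hypothesis worth checking (not confirmed by the logs above): C-extension packages such as scikit-learn are known to deadlock when imported in a mod_wsgi sub-interpreter, and the mod_wsgi documentation recommends forcing the application into the main interpreter with WSGIApplicationGroup %{GLOBAL}. A minimal sketch of the WSGI entry point, with the relevant Apache directives noted as comments (paths and module names are assumptions):

# myapp.wsgi - minimal mod_wsgi entry point (hypothetical paths/names)
#
# In the Apache virtual host, the documented workaround for hangs caused by
# C extensions (numpy/scipy/scikit-learn) running in a sub-interpreter is:
#   WSGIDaemonProcess myapp python-path=/var/www/myapp
#   WSGIProcessGroup myapp
#   WSGIApplicationGroup %{GLOBAL}
import sys

sys.path.insert(0, "/var/www/myapp")  # assumed location of the project

from app import app as application  # mod_wsgi serves the object named "application"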