Flask and best way to reduce MySQL queries, maybe Celery? - python

After researching for three days and playing with Redis and Celery, I'm no longer sure what the right solution to my problem is.
It's a simple problem. I have a simple Flask app returning the data of a MySQL query. But I don't want to query the database for every request made, as there might be 100 requests in a second. I want to set up a daemon that independently queries my database every five seconds; when someone makes a request, it should return the data of the previous query, and once those 5 seconds pass it should return the data from the latest query. All users receive the same data. Is Celery the solution?

The easiest way is to use Flask-Caching.
Just set a cache timeout of 5 seconds on your view and it will return a cached response containing the result of the query made by the first request, for every other request in the next 5 seconds. When the timeout expires, the next request regenerates the cache by running the query and the rest of your view.
If your view function takes arguments, use memoization instead of the cache decorator so that caching uses your arguments to build the cache key. For example, if you return a page of details and don't use memoization, you will return the same page detail to every user, no matter what id/slug is in the arguments.
The Flask-Caching documentation explains all of this better than I can.
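For example, a minimal sketch with Flask-Caching might look like this (get_data and query_page_by_slug are hypothetical placeholders for your own MySQL query code):

from flask import Flask, jsonify
from flask_caching import Cache

app = Flask(__name__)
# SimpleCache keeps results in process memory ("simple" in older Flask-Caching
# versions); use Redis or Memcached in production so all workers share the cache.
cache = Cache(app, config={"CACHE_TYPE": "SimpleCache"})

@app.route("/data")
@cache.cached(timeout=5)      # every request within 5 seconds gets the cached response
def data_view():
    rows = get_data()         # hypothetical helper running the expensive MySQL query
    return jsonify(rows)

# If the function takes arguments, memoize instead so the cache key includes them:
@cache.memoize(timeout=5)
def get_page(slug):
    return query_page_by_slug(slug)   # hypothetical query helper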

Related

Preserving value of variables between subsequent requests in Python Django

I have a Django application to log the character sequences from an autocomplete interface. Each time a call is made to the server, the parameters are added to a list and when the user submits the query, the list is written to a file.
Since I am not sure how to preserve the list between subsequent calls, I relied on a global variable, say query_logger. Now I can preserve the list in the following way:
query_logger = None

def log_query(query, completions, submitted=False):
    global query_logger
    if query_logger is None:
        query_logger = list()
    # append a single tuple; list.append only takes one argument
    query_logger.append((query, completions, submitted))
    if submitted:
        query_logger = None
While this hack works for a single client sending requests, I don't think this is a stable solution when requests come from multiple clients. My question is two-fold:
What is the order of execution of requests: do they follow first-come, first-served (especially if the requests are asynchronous)?
What is a better approach for doing this?
If your django server is single-threaded, then yes, it will respond to requests as it receives them. If you're running behind a WSGI server or another proxy with multiple workers, that becomes more complicated. Regardless, I think you'll want to use a db to store the information.
I encountered a similar problem and ended up using sqlite to store the data temporarily, because that's super simple and easy to manage. You'll want to use IP addresses or create a unique ID passed as a url parameter in order to identify clients on subsequent requests.
I also scheduled a daily task (using cron on ubuntu) that goes through and removes any requests that were never completed (excluding those started in the last hour).
You must not use global variables for this.
The proper answer is to use the session - that is exactly what it is for.
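A minimal sketch of the session-based approach, assuming the view passes request in and the list is small enough to keep in the session (write_log_to_file is a hypothetical helper):

def log_query(request, query, completions, submitted=False):
    log = request.session.get('query_log', [])
    log.append((query, completions, submitted))
    request.session['query_log'] = log   # reassign so Django marks the session as modified
    if submitted:
        write_log_to_file(log)           # hypothetical helper that writes the file
        del request.session['query_log']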
The simplest (bad) solution would be to have a global variable. In any case, you need some in-memory location or a db to store this info.

Logging a snapshot of online users for Django website (postgresql backend, nginx webserver)

In a Django + postgresql website of mine, I need to publicly show who is online at a point in time (it's a social website). How do I do this? For instance, is there a way to enumerate all logged-in users who hit my nginx webserver in the previous 10 minutes? Something like that could work. I'm a beginner and fishing for a viable solution at the moment.
Currently, to accomplish this, I store sessions in the database, using an external library to make sessions enumerable. This allows me to query how many unique users are online at a point in time.
But this scheme creates a lot of needless DB traffic. As a result, logging and pruning logs have become ineffective. Moreover, pgFouine shows me that session-related DB calls are the biggest performance bottleneck my website currently has.
There's a proposed solution here, but it uses the database.
Use django's cache framework to save the result of the db query to memory. That way you don't need to do the expensive database query for every page render.
from django.core.cache import cache

def count_current_users():
    users = cache.get('users')
    if users is None:
        # last count has timed out
        users = do_expensive_db_query()
        cache.set('users', users, timeout=500)
    return users
https://docs.djangoproject.com/en/1.10/topics/cache/#basic-usage
You can also use Template fragment caching and write a custom template tag that only runs the db query if the cache is stale. This will cache the result for 500 seconds.
{% load cache %}
{% cache 500 logged_in_users %}
    {% expensive_query_db_for_logged_in_users %}
{% endcache %}
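A sketch of what that custom template tag might look like (the module path is a placeholder, and counting unexpired sessions is just a stand-in for your real query):

# yourapp/templatetags/online_users.py
from django import template
from django.contrib.sessions.models import Session
from django.utils import timezone

register = template.Library()

@register.simple_tag
def expensive_query_db_for_logged_in_users():
    # only runs when the surrounding {% cache %} block has expired
    return Session.objects.filter(expire_date__gte=timezone.now()).count()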
If you want your user count to be more real time, you probably have to bypass django's cache framework, and communicate directly with Redis.
Store each logged in user as a key with a set time to live. Getting a list of currently active keys from Redis would be much cheaper than the equivalent query to a sql database. It can also be implemented with just a few lines of python code.
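A rough sketch of that idea with redis-py (the key prefix and the 10-minute TTL are assumptions):

import redis

r = redis.Redis()

def mark_online(user_id):
    # refresh a per-user key that expires after 10 minutes of inactivity
    r.set("online-user:%s" % user_id, 1, ex=600)

def count_online_users():
    # SCAN avoids blocking Redis the way KEYS would on a large keyspace
    return sum(1 for _ in r.scan_iter(match="online-user:*"))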
If you're using django-user-sessions, the Session model has a last_activity field.
You may be able to do something like:
from datetime import timedelta
from django.utils import timezone
from user_sessions.models import Session

# timezone-aware "now" so it compares cleanly with last_activity
time_threshold = timezone.now() - timedelta(minutes=10)
qs = Session.objects.filter(last_activity__gt=time_threshold)
Though, django-user-sessions does not have a database index on that field, which means that if you have a very large number of users / sessions, that query may be expensive and take a long time. A more complicated answer might involve creating a materialized view (if you're using postgres) that refreshes via a cron job.
Currently, I'm trying a different approach. I've written a middleware where upon each request, the user's user_id is stored in a global sorted set. I do this only if they're authenticated, and I use the redis key-value store to ensure everything is blazingly fast.
The solution isn't live yet. I'm going to report more here and give a full answer once I'm done. I'll also consider other answers given here before marking the correct solution.
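A minimal sketch of that middleware, assuming redis-py and new-style Django middleware (the key name and the 10-minute window are assumptions):

import time
import redis

r = redis.Redis()

class OnlineUsersMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.user.is_authenticated:
            # score is the last-seen timestamp, so stale members can be trimmed
            r.zadd("online-users", {str(request.user.id): time.time()})
            r.zremrangebyscore("online-users", 0, time.time() - 600)
        return self.get_response(request)

def count_online_users():
    return r.zcard("online-users")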

Django session race condition?

Summary: is there a race condition in Django sessions, and how do I prevent it?
I have an interesting problem with Django sessions which I think involves a race condition due to simultaneous requests by the same user.
It has occurred in a script for uploading several files at the same time, being tested on localhost. I think this makes simultaneous requests from the same user quite likely (low response times due to localhost, long requests due to file uploads). It's still possible for normal requests outside localhost, though, just less likely.
I am sending several (file post) requests that I think do this:
1. Django automatically retrieves the user's session*
2. Unrelated code that takes some time
3. Get request.session['files'] (a dictionary)
4. Append data about the current file to the dictionary
5. Store the dictionary in request.session['files'] again
6. Check that it has indeed been stored
7. More unrelated code that takes time
8. Django automatically stores the user's session
Here the check at step 6 will indicate that the information has indeed been stored in the session. However, future requests indicate that sometimes it has, sometimes it has not.
What I think is happening is that two of these requests (A and B) happen simultaneously. Request A retrieves request.session['files'] first, then B does the same, changes it and stores it. When A finally finishes, it overwrites the session changes by B.
Two questions:
Is this indeed what is happening? Is the django development server multithreaded? On Google I'm finding pages about making it multithreaded, suggesting that by default it is not? Otherwise, what could be the problem?
If this race condition is the problem, what would be the best way to solve it? It's an inconvenience but not a security concern, so I'd already be happy if the chance can be decreased significantly.
Retrieving the session data right before the changes and saving it right after should decrease the chance significantly, I think. However, I have not found a way to do this for request.session, only a workaround using django.contrib.sessions.backends.db.SessionStore directly. And I figure that if I change it that way, Django will just overwrite it with request.session at the end of the request.
So I need a request.session.reload() and request.session.commit(), basically.
Yes, it is possible for a request to start before another has finished. You can check this by printing something at the start and end of a view and launching a bunch of requests at the same time.
Indeed the session is loaded before the view and saved after the view. You can reload the session using request.session = engine.SessionStore(session_key) and save it using request.session.save().
Reloading the session, however, does discard any data added to the session before that point (in the view or before it). Saving before reloading would defeat the point of loading late. A better way would be to save the files to the database as a new model.
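Putting the reload/save pattern from this answer into a view might look roughly like this (a sketch; the file data is a placeholder, and this only narrows the race window rather than eliminating it):

from importlib import import_module
from django.conf import settings

engine = import_module(settings.SESSION_ENGINE)

def upload_view(request):
    # ... unrelated work that takes time ...

    # reload the session right before changing it
    request.session = engine.SessionStore(request.session.session_key)
    files = request.session.get('files', {})
    files['example.txt'] = {'size': 123}   # placeholder for the real file data
    request.session['files'] = files
    request.session.save()                 # write back immediately

    # ... more unrelated work; the session middleware saves again at the end ...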
The essence of the answer is in the discussion of Thomas' answer, which was incomplete so I've posted the complete answer.
Mark just nailed it; the only minor addition from me is how to reload that session:
for key in list(session.keys()):   # if you have potential removals; list() lets us delete while iterating
    del session[key]
session.update(session.load())
session.modified = False           # just making it clean
The first line is optional; you only need it if certain values might have been removed from the session in the meantime.
The last line is also optional; if you go on to update the session, it does not really matter.
That is true. You can confirm it by having a look at the django.contrib.sessions.middleware.SessionMiddleware.
Basically, request.session is loaded before request hits your view (in process_request), and it is updated in the session backend (if needed) after the response has left your view (in process_response).
If what I mean is unclear, you might want to have a look at the django documentation for Middleware.
The best way to solve the issue will depend on what you're trying to achieve with that information. I'll update my answer if you provide that information!

How to execute query only once in Django?

This is the following query that I have in my views.py:
#parse json function
parse = get_persistent_graph(request)
#guys
male_pic = parse.fql('SELECT name,uid,education FROM user WHERE sex="male" AND uid IN (SELECT uid2 FROM friend WHERE uid1 = me())')
This query currently takes approximately 10 seconds for me to load with about 800+ friends.
Is it possible to only query this once, update when needed and save to a variable to use instead of having to query every time the page is loaded/refreshed?
Some possible solutions that I can think of are:
Saving to the database - IMO that doesn't seem easily scalable if I save every query from every user of this application
Some function I have no knowledge of
Improved and more efficient query request
Could someone please point me in the most efficient direction? I'm hoping to get these 10-second requests down to under 1 second! Thanks!
Use a cache system for that (like Memcached).
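For example, with Django's low-level cache API the result could be kept per user for some interval (the 10-minute timeout and the cache key are assumptions):

from django.core.cache import cache

def get_male_friends(request):
    parse = get_persistent_graph(request)
    key = 'male_friends_%s' % request.user.pk   # one cache entry per user
    male_pic = cache.get(key)
    if male_pic is None:
        male_pic = parse.fql(
            'SELECT name,uid,education FROM user '
            'WHERE sex="male" AND uid IN '
            '(SELECT uid2 FROM friend WHERE uid1 = me())')
        cache.set(key, male_pic, 600)           # cache for 10 minutes
    return male_pic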

How to optimize for Django's paginator module

I have a question about how Django's paginator module works and how to optimize it. I have a list of around 300 items from information that I get from different APIs on the internet. I am using Django's paginator module to display the list for my visitors, 10 items at a time. The pagination does not work as well as I want it to. It seems that the paginator has to get all 300 items before pulling out the ten that need to be displayed each time the page is changed. For example, if there are 30 pages, then going to page 2 requires my website to query the APIs again, put all the information in a list, and then access the ten that the visitor's browser requests. I do not want to keep querying the APIs for the same information that I already have on each page turn.
Right now, my views.py has a function that looks at the GET request and queries the APIs based on it. Then it puts all that information into a list and passes it on to the template file. So this function runs whenever someone turns the page, querying the APIs again each time.
How should I fix this?
Thank you for your help.
The paginator will in this case need the full list in order to do its job.
My advice would be to update a cache of the feeds at a regular interval, and then use that cache as the input to the paginator module. Doing an intensive or lengthy task on each and every request is always a bad idea. If not for the page load times the users will experience, think of the vulnerability of your server to attack.
You may want to check out Django's low level cache API which would allow you to store the feed result in a globally accessible place under a key, which you can later use to retrieve the cache and paginate for each page request.
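A rough sketch of that idea (fetch_all_feeds is a placeholder for the code that calls the external APIs):

from django.core.cache import cache
from django.core.paginator import EmptyPage, PageNotAnInteger, Paginator
from django.shortcuts import render

def feed_page(request):
    items = cache.get('feed_items')
    if items is None:
        items = fetch_all_feeds()                 # hypothetical: calls the external APIs
        cache.set('feed_items', items, 60 * 15)   # keep the combined list for 15 minutes
    paginator = Paginator(items, 10)
    try:
        page = paginator.page(request.GET.get('page', 1))
    except (EmptyPage, PageNotAnInteger):
        page = paginator.page(1)
    return render(request, 'feed.html', {'page': page})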
ORMs do not load data until the rows are actually needed:
query_results = Foo.objects.filter(id=1)  # no SQL executed yet, just a lazy QuerySet
foo = query_results[0]                    # now it fires
or
for foo in query_results:                 # SQL fires when iteration starts
    foo.bar()
If you are using a custom data source that loads its results on initialization, then the pagination will not work as expected, since all feeds will be fetched at once. You may want to implement __getitem__ or __iter__ to do the actual fetch lazily (see the sketch below). It will then coincide with the way Django expects the results to be loaded.
Pagination is going to need to know how many results there are to do things like has_next(). In SQL it is usually inexpensive to get a count(*) with an index. So you would also want to know how many results there would be (or maybe just estimate, if it is too expensive to know exactly).
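A sketch of such a lazy wrapper (the api client and its count/fetch methods are hypothetical):

from django.core.paginator import Paginator

class LazyFeed:
    """Sequence-like wrapper so Paginator only fetches the slice it needs."""

    def __init__(self, api):
        self.api = api                  # hypothetical client for the external APIs

    def __len__(self):
        # Paginator uses the length for num_pages / has_next()
        return self.api.count()         # hypothetical cheap count call

    def __getitem__(self, key):
        if isinstance(key, slice):
            # Paginator asks for e.g. items[10:20] for page 2
            return self.api.fetch(key.start, key.stop)   # hypothetical ranged fetch
        return self.api.fetch(key, key + 1)[0]

# usage: paginator = Paginator(LazyFeed(api_client), 10)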
