I have a question about how Django's paginator module works and how to optimize it. I have a list of around 300 items from information that I get from different APIs on the internet. I am using Django's paginator module to display the list for my visitors, 10 items at a time. The pagination does not work as well as I want it to. It seems that the paginator has to get all 300 items before pulling out the ten that need to be displayed each time the page is changed. For example, if there are 30 pages, then going to page 2 requires my website to query the APIs again, put all the information in a list, and then access the ten that the visitor's browser requests. I do not want to keep querying the APIs for the same information that I already have on each page turn.
Right now, my views file has a function that looks at the GET request and queries the APIs for information based on the query. It then puts all that information into a list and passes it on to the template. So this function runs every time someone turns the page, and the APIs get queried again.
How should I fix this?
Thank you for your help.
The paginator will in this case need the full list in order to do its job.
My advice would be to update a cache of the feeds at a regular interval, and then use that cache as the input to the paginator module. Doing an intensive or lengthy task on every request is always a bad idea. If not for the page load times your users will experience, think of the vulnerability of your server to attack.
You may want to check out Django's low level cache API which would allow you to store the feed result in a globally accessible place under a key, which you can later use to retrieve the cache and paginate for each page request.
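A minimal sketch of that idea, assuming Django's low-level cache API and the standard Paginator; fetch_feeds(), the cache key, the template name, and the timeout are made up for illustration:

from django.core.cache import cache
from django.core.paginator import Paginator
from django.shortcuts import render_to_response

def feed_list(request):
    items = cache.get('api-feed-items')
    if items is None:
        items = fetch_feeds()                         # hit the external APIs once
        cache.set('api-feed-items', items, 60 * 15)   # reuse the result for 15 minutes
    paginator = Paginator(items, 10)                  # 10 items per page
    page = paginator.page(request.GET.get('page', 1))
    return render_to_response('feed_list.html', {'page': page})

Page turns then only read from the cache, and the APIs are queried at most once per cache timeout.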
ORMs do not load data until rows are actually selected:
query_results = Foo.objects.filter(id=1)  # no SQL executed yet, the queryset is lazy
foo = query_results[0]  # now the query fires
or
for foo in query_results:
    foo.bar()  # the query fires when iteration starts
If you are using a custom data source that loads all its results on initialization, pagination will not work as expected, since every feed is fetched at once. You may want to implement __getitem__ (and __len__) on your data source so the actual fetch happens lazily; that matches the way Django expects results to be loaded.
Pagination also needs to know how many results there are in order to do things like has_next(). In SQL it is usually inexpensive to get a count(*) with an index. So you will also want a way to know how many results there are (or maybe just an estimate, if an exact count is too expensive).
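A sketch of that idea for the API-backed list, assuming the external source can report a total count and fetch an arbitrary slice; api_count() and api_fetch() are hypothetical names:

class LazyFeedList(object):
    def __len__(self):
        # The paginator needs the total to compute num_pages and has_next().
        return api_count()

    def __getitem__(self, key):
        # The paginator slices object_list[bottom:top] for the current page,
        # so only that slice is actually fetched from the API.
        if isinstance(key, slice):
            start = key.start or 0
            stop = key.stop if key.stop is not None else start
            return api_fetch(offset=start, limit=stop - start)
        return api_fetch(offset=key, limit=1)[0]

Passing LazyFeedList() to Paginator(..., 10) then fetches only the ten items of the requested page.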
Related
I am loving Django, and liking its built-in pagination functionality. However, I encounter issues when attempting to split a randomly ordered queryset across multiple pages.
For example, I have 100 elements in a queryset and wish to display them 25 at a time. When the context object is a queryset ordered randomly (with .order_by('?')), a completely new queryset is loaded into the context each time a new page is requested (page 2, 3, 4).
Explicitly stated: how do I (or can I) request a single queryset, randomly ordered, and display it across digestible pages?
I ran into the same problem recently where I didn't want to have to cache all the results.
What I did to resolve this was a combination of .extra() and raw().
This is what it looks like:
raw_sql = str(queryset.extra(select={'sort_key': 'random()'})
                      .order_by('sort_key').query)
set_seed = "SELECT setseed(%s);" % float(random_seed)
queryset = self.model.objects.raw(set_seed + raw_sql)
I believe this will only work for postgres. Doing a similar thing in MySQL is probably simpler since you can pass the seed directly to RAND(123).
The seed can be stored in the session/a cookie/your frontend in the case of ajax calls.
Warning - There is a better way
This is actually a very slow operation. I found a blog post that describes a much better method, both for retrieving a single result and for sets of results.
In this case the seed will be used in your local random number generator.
I think this really good answer will be useful to you: How to have a "random" order on a set of objects with paging in Django?
Basically, it suggests caching the list of objects and referring to it with a session variable, so it can be maintained between pages (using Django pagination).
Or you could manually randomize the list and pass a seed to keep the randomization stable for the same user, as in the sketch below.
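A minimal sketch of the seeded approach, assuming the primary keys fit comfortably in memory; the session key name is made up:

import random

def randomized_ids(request, queryset):
    # Keep the same seed for the whole session so every page sees the same order.
    seed = request.session.setdefault('shuffle_seed', random.randint(0, 10 ** 9))
    ids = list(queryset.values_list('pk', flat=True))
    random.Random(seed).shuffle(ids)
    return ids

The returned list can be handed to the Paginator, and only the objects for the current page need to be looked up by primary key.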
The best way to achieve this is to use a pagination app like:
pure-pagination
django-pagination
django-infinite-pagination
Personally I use the first one; it integrates pretty well with Haystack.
""" EXAMPLE: (django-pagination) """
#paginate 10 results.
{% autopaginate my_query 10 %}
Very similar to this question, except that the answer is not suitable.
I populate a table from a datastore query, then there is a link allowing the user to delete a specific row. Clicking the link goes to a url that deletes the row from the datastore then redirects back to the table.
More often than not, the change isn't reflected in the table until it is reloaded again.
An easy solution is to redirect to another page that uses a JavaScript redirect to add a delay of a couple of seconds. The other alternative is to send details back to the page, like action=delete&key=###, and then make sure that item is omitted from the table. That's a pain, though.
The answer is with ancestor queries.
https://cloud.google.com/appengine/docs/python/datastore/queries#Python_Ancestor_queries
Create the entities with a parent. When one of the entities is deleted, you can run an ancestor query for your table list view which will have strong consistency when data is changed.
Example ancestor query:
tom = Person(key_name='Tom')
photo_query = Photo.all()
photo_query.ancestor(tom)  # only photos in Tom's entity group, with strong consistency
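For completeness, a sketch of the write side in the same old db style as above; the Photo field is made up:

tom = Person(key_name='Tom')
tom.put()
photo = Photo(parent=tom, caption='Sunset')  # same entity group as Tom
photo.put()
# After photo.delete(), re-running the ancestor query above is strongly
# consistent, so the deleted row disappears from the table immediately.
photo.delete()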
With the datastore, unless you can use ancestors, you can't guarantee when the indexes will be updated; you can only guarantee the entity itself (for getting it by key later) by doing a non-async put. Best is a combination of your suggestion, where the client takes its own action into account to patch the UI, plus maybe using memcache to remember recent actions and patch query results server-side before returning them to the client.
Here is a different approach. Use Javascript and AJAX. When the user clicks a link, you do two things:
Use Javascript/jQuery to remove the row from the DOM, and
Send an AJAX call to the server to do the appropriate datastore modifications.
It makes for a nice user experience because you are not reloading the page at all.
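The server side of that AJAX call could be as small as this sketch, assuming webapp2 and the db API; the handler name and the 'key' parameter are made up:

import webapp2
from google.appengine.ext import db

class DeleteRowHandler(webapp2.RequestHandler):
    def post(self):
        # The client sends the entity's encoded key and removes the row from the DOM itself.
        db.delete(db.Key(self.request.get('key')))
        self.response.headers['Content-Type'] = 'application/json'
        self.response.write('{"deleted": true}')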
You might want to consider that there is always room for displaying outdated info in the table: for example, if the table is displayed simultaneously in two different windows/tabs and a deletion is performed in one of them, the other will still display a delete link that will cause a 404 if followed.
With this in mind, I'd first focus on managing expectations (the user should know that the page may occasionally display outdated info) and then on the user's ability to bring an outdated page back in sync (a refresh button?). That might make the issue moot.
The delay-based "solutions" are bound to fail sooner or later in race-condition scenarios, so I wouldn't bother with the extra complexity. Especially on the page where the deletion is done: that's exactly where the user already knows the info is outdated (for free) and will likely refresh until the recent change becomes visible.
I'm using Django to create a website for a project. The user fills out a form, then I run some queries with this data and display the results on another page. Currently it's a two page site.
I want to warn the user if their query result data is very large. Say if a user ends up getting 1000 rows in the results table, I want to warn the user that queries of this size might take a long time to load. I imagine that between the form page and the results page, I could make a popup textbox that displays the warning. I could have this box show if the query object size is greater than 1000.
Does Django have a method for me implementing this? How can I get this textbox to appear before the result page template is shown?
Yes, the query object has a method for this. It is simply:
query.count()
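A sketch of how the warning could be wired up, assuming a hypothetical build_queryset() helper and made-up template names:

from django.shortcuts import render_to_response

def results(request):
    queryset = build_queryset(request.GET)  # run the user's query
    count = queryset.count()
    if count > 1000 and 'confirmed' not in request.GET:
        # Show the warning page first; its "continue" link re-submits with confirmed=1.
        return render_to_response('large_result_warning.html',
                                  {'count': count, 'query': request.GET.urlencode()})
    return render_to_response('results.html', {'results': queryset})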
No, I don't think Django has a built-in function for this, but you could easily do it yourself with Django and a bit of JavaScript.
Loading a page with 1000 results really isn't that many. If the number of results is affecting performance, paginate them.
I think it might be a bit cleaner to just load the results page directly, with either:
paginated results
no results, and have an AJAX request fetch the results after the page has loaded, so the page doesn't lag while loading everything
What will your user think of an intermediary popup? I believe that to maximize their experience, you should load the page in the fastest, least intrusive way possible.
I want to load info from another site (this part is done), but I am doing this every time the page is loaded, and that won't do. So I was thinking of having a variable in a settings table, like 'last checked BBC site', and when the page loads it would check whether enough time has passed since the last check to check again. Is there anything silly about doing it that way?
Also, do I absolutely have to use tables to store one-off variables like this setting?
I think there are two options that would work for you, besides creating an entity in the datastore to keep track of the "last visited time".
One way is to just check the external page periodically, using the cron api as described by jldupont.
The second way is to store the last visited time in memcache. Although memcache is not permanent, it doesn't have to be if you are only storing last refresh times. If your entry in memcache were to disappear for some reason, the worst that would happen would be that you would fetch the page again, and update memcache with the current date/time.
The first way would be best if you want to check the external page at regular intervals. The second way might be better if you want to check the external page only when a user clicks on your page, and you haven't fetched that page yourself in the recent past. With this method, you aren't wasting resources fetching the external page unless someone is actually looking for data related to it.
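A sketch of the memcache variant, assuming the App Engine Python memcache API; the key name and refresh interval are made up:

import datetime
from google.appengine.api import memcache

LAST_FETCH_KEY = 'bbc-last-fetch'
REFRESH_INTERVAL = datetime.timedelta(minutes=30)

def should_refetch():
    last_fetch = memcache.get(LAST_FETCH_KEY)
    if last_fetch is None or datetime.datetime.utcnow() - last_fetch > REFRESH_INTERVAL:
        # If memcache evicted the entry, the worst case is one extra fetch.
        memcache.set(LAST_FETCH_KEY, datetime.datetime.utcnow())
        return True
    return False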
You could also use Scheduled Tasks.
Also, you don't absolutely need to use the Datastore for configuration parameters: you could have this in a script / config file.
If you want some handler in your GAE app (including one for a scheduled task, reception of messages, web page visits, etc.) to store new information in such a way that some handler in the future can recover it, then GAE's datastore is the only good general-purpose way (memcache could expire from under you, for example). I'm not sure what you mean by "tables" (?!), but guessing that you actually mean GAE's datastore, the answer is "yes". (Under very specific circumstances you might want to put that data in some different place on the network, such as your visitor's browser, e.g. via cookies, or an Amazon storage instance, etc., but it does not appear to me that those circumstances apply to your use case.)
Suppose I have a simple view which needs to parse data from an external website.
Right now it looks something like this:
def index(request):
    source = urllib2.urlopen(EXTERNAL_WEBSITE_URL)
    bs = BeautifulSoup.BeautifulSoup(source.read())
    finalList = []  # do whatever with bs to populate the list
    return render_to_response('someTemplate.html', {'finalList': finalList})
First of all, is this an acceptable use?
Obviously, this is not good performance-wise. The external website page is pretty big, and I am only extracting a small part of it. I thought of two solutions:
Do all of this asynchronously. Load the rest of the page first, then populate it with the data once I get it. But I don't even know where to start. I'm just starting with Django and have never done anything async until now.
I don't care if this data is updated every 2-3 minutes, so caching is a good solution as well (also saves me the extra round-trips). How would I go about caching this data?
First, don't optimize prematurely. Get this to work.
Then, add enough logging to see what the performance problems (if any) really are.
You may find that the end-user's PC is the slowest part; getting data from another site may actually be remarkably fast when you are not fetching .JS libraries, .CSS and artwork and then rendering the entire thing in a browser.
Once you're absolutely sure that the fetch of the remote content really IS a problem. Really. Then you have to do the following.
Write a "crontab" script that does the remote fetch form time to time.
Design a place to cache the remote results. Database or file system, pick one.
Update your Django app to get the data from the cache (database or filesystem) instead of the remote URL.
Only after you have absolute proof that the urllib2 read of the remote site is the bottleneck.
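A sketch of what that periodic fetch could look like, using Django's low-level cache as the storage for brevity (the answer suggests a database or the file system, which works the same way); EXTERNAL_WEBSITE_URL is the constant from the question, and extract_items() is a hypothetical parsing helper:

import urllib2
import BeautifulSoup
from django.core.cache import cache

def refresh_remote_data():
    source = urllib2.urlopen(EXTERNAL_WEBSITE_URL)
    soup = BeautifulSoup.BeautifulSoup(source.read())
    final_list = extract_items(soup)  # keep only the small part you actually need
    cache.set('remote-final-list', final_list, 60 * 60)

The view then reads 'remote-final-list' from the cache instead of hitting the remote URL on every request.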
Caching with Django is pretty easy:
from django.core.cache import cache

key = 'some-key'
data = cache.get(key)
if data is None:
    # soupify the page and whatnot to build data, then cache it for 8 hours
    cache.set(key, data, 60 * 60 * 8)
    return render_to_response ...
return render_to_response ...
To answer your question: you can do this asynchronously, but then you would have to use something like django-cron to update the cache every so often. On the other hand, you could write this as a standalone Python script, replace the cache imported from Django with memcache, and it would work the same way. It would reduce some of the performance issues your site could have, and as long as you know the cache key, you can retrieve the data from the cache.
Like Jarret said, I would read Django's caching docs and memcached's docs for more information.
Django has robust, built-in support for caching views: http://docs.djangoproject.com/en/dev/topics/cache/#topics-cache.
It offers solutions for caching entire views (such as in your case), or just certain parts of data in the view. There are even controls for how often to update the cache, and so forth.
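A minimal sketch of the per-view cache applied to the view from the question; the timeout matches the 2-3 minute staleness you said you can tolerate:

import urllib2
import BeautifulSoup
from django.shortcuts import render_to_response
from django.views.decorators.cache import cache_page

@cache_page(60 * 3)  # serve the cached response for about 3 minutes
def index(request):
    # This body only runs when the cached response has expired.
    source = urllib2.urlopen(EXTERNAL_WEBSITE_URL)
    bs = BeautifulSoup.BeautifulSoup(source.read())
    finalList = []  # do whatever with bs to populate the list
    return render_to_response('someTemplate.html', {'finalList': finalList})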