Why did accessing a Django QuerySet become very slow? - python

I have a model query in Django:
Query = Details.objects.filter(name__iexact=nameSelected)
I filter it later:
Query2 = Query.filter(title__iexact=title0)
Then I access it using:
...Query2[0][0]...
A few days ago it worked very fast, but now it is at least 20 times slower.
I tested it on another PC, where it works very fast.
Update: the filtering is not the cause of the delay; Query[0][0] is.
Besides that, it became super slow suddenly, not gradually over time.
What could make it so slow on my first PC?

Maybe you could try making a list out of the QuerySet when you create it, so that you have a real list rather than a lazy QuerySet:
Query2 = list(Query.filter(title__iexact=title0))
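A minimal sketch of the difference, reusing the names from the question: each indexing of an unevaluated queryset can issue its own LIMIT 1 query, while a list is fetched once.
qs = Details.objects.filter(name__iexact=nameSelected).filter(title__iexact=title0)
first = qs[0]    # may run a fresh query with LIMIT 1 every time you index qs
rows = list(qs)  # evaluates the queryset exactly once
first = rows[0]  # plain list indexing, no database access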

The best way is to avoid looping to filter the query. What I did is create a hashmap (dictionary):
dict0 = {}
Then I added the list of items and the data corresponding to each item in the query:
dict0 = dict(zip(title0List, DataList))
Finally, I use dict0 instead of the query. It boosts the speed at least 10 times for me.
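A hypothetical sketch of building such a lookup table straight from the queryset; the 'data' field name is a placeholder, not from the original model:
# Build the dict once; every later lookup is in-memory instead of a query.
pairs = Details.objects.filter(name__iexact=nameSelected).values_list('title', 'data')
dict0 = dict(pairs)
value = dict0.get(title0)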

Related

Reducing database access when same query on multiple similar objects

I have an operation in one of my views
order_details = [order.get_order_details() for order in orders]
Now order.get_order_details() runs one database query, so depending on the size of orders, the number of database accesses will be huge.
Before resorting to caching, is there anything that can speed this up?
Is it possible to merge all the select operations into one single database operation?
Will making it an atomic transaction using transaction.atomic() improve performance, since technically the queries would be sent at once instead of individually?
Edit: are there any design changes/patterns that would avoid this situation?
Edit:
def get_order_details(self):
    items = Item.objects.filter(order=self)
    item_list = [item.serialize for item in items]
    return {
        'order_details': self.serialize,
        'item_list': item_list
    }
Assuming orders is a QuerySet, e.g. the result of Order.objects.filter(...), add:
.prefetch_related(Prefetch('item_set'))
to the end of the query (Prefetch is imported from django.db.models). Then use:
items = self.item_set.all()
in get_order_details; calling .all() on the related manager reads from the prefetch cache instead of hitting the database again.
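Putting it together, a minimal sketch, assuming the default reverse accessor item_set from a ForeignKey on Item to Order:
from django.db.models import Prefetch

# Two queries total: one for the orders, one for all their items.
orders = Order.objects.all().prefetch_related(Prefetch('item_set'))

def get_order_details(self):
    items = self.item_set.all()  # served from the prefetch cache, no extra query
    item_list = [item.serialize for item in items]
    return {
        'order_details': self.serialize,
        'item_list': item_list,
    }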
See the docs here: https://docs.djangoproject.com/en/1.11/ref/models/querysets/#prefetch-related

Django storing a lot of data in table

Right now, I use this code to save the data to the database:
for i in range(len(companies)):
    for j in range(len(final_prices)):
        linechartdata = LineChartData()
        linechartdata.foundation = company  # foreign key to a different model
        linechartdata.date = finald[j]
        linechartdata.price = finalp[j]
        linechartdata.save()
Now len(companies) can vary from 3 to 50 and len(final_prices) can be somewhere between 5000 and 10000. I know it's a very inefficient way to store this in the database and it takes a lot of time. What should I do to make it effective and less time-consuming?
If you really need to store them in the database, check out bulk_create. From the docs:
This method inserts the provided list of objects into the database in an efficient manner (generally only 1 query, no matter how many objects there are):
Although I have never personally used it with that many objects, the docs say it can handle them. This makes your code more efficient in terms of database hits, replacing the many individual save() calls.
Basically, build the list of objects (without saving) and then use bulk_create, like this:
arr = []
for i in range(len(companies)):
    for j in range(len(final_prices)):
        arr.append(
            LineChartData(
                foundation=company,
                date=finald[j],
                price=finalp[j],
            )
        )
LineChartData.objects.bulk_create(arr)
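With up to 50 × 10000 rows, you may also want to cap the size of each INSERT; batch_size is a standard bulk_create argument (the 5000 here is an arbitrary choice):
# Splits the insert into chunks of 5000 rows per query.
LineChartData.objects.bulk_create(arr, batch_size=5000)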

Django QuerySet vs Raw Query performance

I have noticed a huge timing difference between using Django's connection.cursor vs using the model interface, even with small querysets.
I have made the model interface as efficient as possible, with values_list so no objects are constructed, and so on. Below are the two functions tested (don't mind the Spanish names).
def t3():
    q = "select id, numerosDisponibles FROM samibackend_eventoagendado LIMIT 1000"
    with connection.cursor() as c:
        c.execute(q)
        return list(c)

def t4():
    return list(EventoAgendado.objects.all().values_list('id', 'numerosDisponibles')[:1000])
Then, timing both with a self-made helper (built on time.clock()):
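(A guess at that helper's shape, since it isn't shown in the question:)
import time

def timeme(fn):
    # Rough wall-clock timing of a single call, as the question describes.
    start = time.clock()
    fn()
    return time.clock() - start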
r1 = timeme(t3); r2 = timeme(t4)
The results are as follows:
0.00180384529631 for t3 and 0.00493390727024 for t4
And just to make sure the queries are identical and take the same time to execute:
connection.queries[-2::]
Yields:
[
{u'sql': u'select id, numerosDisponibles FROM samibackend_eventoagendado LIMIT 1000', u'time': u'0.002'},
{u'sql': u'SELECT `samiBackend_eventoagendado`.`id`, `samiBackend_eventoagendado`.`numerosDisponibles` FROM `samiBackend_eventoagendado` LIMIT 1000', u'time': u'0.002'}
]
As you can see, two identical queries, returning two identical lists (r1 == r2 returns True), take totally different timings (the difference grows with a bigger result set). I know Python is slow, but is Django doing so much work behind the scenes to make the query that much slower?
Also, just to make sure, I have tried building the queryset object first (outside the timer), but the results are the same, so I'm 100% sure the extra time comes from fetching and building the result structure.
I have also tried using the iterator() function at the end of the query, but that doesn't help either.
I know the difference is minimal and both execute blazingly fast, but this is being benchmarked with apache ab, and at 1k concurrent requests this minimal difference makes a night-and-day difference.
By the way, I'm using Django 1.7.10 with mysqlclient as the db connector.
EDIT: For the sake of comparison, the same test with an 11k-row result set makes the difference even bigger (3x slower, compared to around 2.6x in the first test):
r1 = timeme(t3); r2 = timeme(t4)
0.0149241530889
0.0437563529558
EDIT 2: Another funny test: if I convert the queryset object to its actual string query (with str(queryset.query)) and use that in a raw query instead, I get the same good performance as the raw query. The exception is that str(queryset.query) sometimes gives me an actually invalid SQL query (e.g. if the queryset filters on a date value, the date is not escaped with quotes in the string, giving an SQL error when executing it as a raw query; this is another mystery).
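For illustration, the kind of round trip being described (str(qs.query) interpolates parameters without proper quoting, hence the date problem):
qs = EventoAgendado.objects.all().values_list('id', 'numerosDisponibles')[:1000]
sql = str(qs.query)  # parameters are interpolated unescaped; dates may come out unquoted
with connection.cursor() as c:
    c.execute(sql)
    rows = list(c)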
-- EDIT 3:
Going through the code, it seems the difference is in how the result data is retrieved. A raw queryset simply calls iter(self.cursor), which, with a C-implemented connector, runs entirely in C code (iter is also a builtin), while ValuesListQuerySet is a Python-level for loop with a yield tuple(row) statement, which is quite slow. I guess there's nothing to be done here to match the raw queryset's performance :'(.
If anyone is interested, the slow loop is this one:
for row in self.query.get_compiler(self.db).results_iter():
    yield tuple(row)
-- EDIT 4: I have come up with a very hacky piece of code to convert a values-list queryset into data usable by a raw query, getting the same performance as running a raw query. I guess this is very bad and will only work with MySQL, but the speed-up is very nice while still letting me keep the model API for filtering and such. What do you think?
Here's the code.
def querysetAsRaw(qs):
    q = qs.query.get_compiler(qs.db).as_sql()
    with connection.cursor() as c:
        c.execute(q[0], q[1])
        return list(c)  # materialize before the with block closes the cursor
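Hypothetical usage, reusing the t4 queryset from above:
qs = EventoAgendado.objects.all().values_list('id', 'numerosDisponibles')[:1000]
rows = querysetAsRaw(qs)  # same rows as t4, fetched at raw-cursor speed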
The answer was simple: update to Django 1.8 or above, which changed the relevant code and no longer has this performance issue.

Reduce Queries by optimizing _sets in Django

The following results in 4 db hits. Since lines 3 and 4 just filter what I grabbed in line 2, what do I need to change so they don't hit the db again?
page = get_object_or_404(Page, url__iexact=page_url)
installed_modules = page.module_set.all()
navigation_links = installed_modules.filter(module_type=ModuleTypeCode.MODAL)
module_map = dict([(m.module_static_object.key, m) for m in installed_modules])
Django querysets are lazy, so the following line doesn't hit the database:
installed_modules = page.module_set.all()
The query isn't executed until you iterate over the queryset in this line:
module_map = dict([(m.module_static_object.key, m) for m in installed_modules])
So the code you posted only looks like 3 database hits to me, not 4.
Since you are fetching all of the modules from the database already, you could filter the navigation links using a list comprehension instead of another query:
navigation_links = [m for m in installed_modules if m.module_type == ModuleTypeCode.MODAL]
You would have to do some benchmarking to see if this improved performance. It looks like it could be premature optimisation to me.
You might also be doing one database query for each module when you fetch module_static_object.key. In that case, you could use select_related.
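A minimal sketch of that, assuming module_static_object is a ForeignKey on the module model:
# Follow the FK in the same query, so m.module_static_object.key
# does not hit the database once per module.
installed_modules = page.module_set.select_related('module_static_object').all()
module_map = {m.module_static_object.key: m for m in installed_modules}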
This is a case of premature optimization. 4 DB queries for a page load is not bad. The idea is to use as few queries as possible, but you're never going to get it down to 1 in every scenario. The code you have there doesn't seem off-the-wall in terms of needlessly creating queries, so it's highly probable that it's already as optimized as you'll be able to make it.

Large Sqlite database search

How can one implement an efficient search over a large SQLite db (more than 90000 entries)?
I'm using Python and SQLObject ORM:
import re
...
def search1():
    cr = re.compile(ur'foo')
    for item in Item.select():
        if cr.search(item.name) or cr.search(item.skim):
            print item.name
This function takes more than 30 seconds to run. How can I make it faster?
UPD: The test:
for item in Item.select():
    pass
... takes almost the same time as my initial function (0:00:33.093141 vs 0:00:33.322414). So the regexps eat no time.
A Sqlite3 shell query:
select '' from item where name like '%foo%';
runs in about a second. So the main time consumption is the ORM's inefficient data retrieval from the db. I guess SQLObject grabs entire rows here, while SQLite touches only the necessary fields.
The best way would be to rework your logic to do the selection in the database instead of in your python program.
Instead of doing Item.select(), you should rework it to do something like Item.select("""name LIKE ...""").
If you do this, and make sure you have the name and skim columns indexed, it will return very quickly. 90000 entries is not a large database.
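A minimal sketch of that in SQLObject terms (quoting is left naive on purpose; real code should escape the search string):
# Push the filtering into SQLite instead of fetching 90000 rows into Python.
for item in Item.select("name LIKE '%foo%' OR skim LIKE '%foo%'"):
    print item.name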
30 seconds to fetch 90,000 rows might not be all that bad.
Have you benchmarked the time required to do the following?
for item in Item.select():
    pass
Just to see if the time is DB time, network time or application time?
If your SQLite DB is physically very large, you could simply be looking at a lot of physical I/O to read all that database data in.
If you really need to use a regular expression, there's not much you can do to speed it up tremendously.
The best thing would be to write an SQLite function that performs the comparison for you in the db engine, instead of in Python.
You could also switch to a db server like PostgreSQL, which supports SIMILAR TO:
http://www.postgresql.org/docs/8.3/static/functions-matching.html
I would definitely take Reed's suggestion to push the filter into the SQL (forget the index part though; an index cannot help a leading-wildcard LIKE '%foo%' search).
I do not think that selecting only specific fields versus all fields makes much of a difference (unless you have a lot of large fields). I would bet that SQLObject creates/instantiates 80K objects and puts them into a Session/UnitOfWork for tracking. This could definitely take some time.
Also, if you do not need the objects in your session, there must be a way to select just the fields you need using custom query creation, so that no Item objects are created, only tuples.
Initially, doing the regex via Python was considered for y_serial, but that was dropped in favor of SQLite's GLOB (which is far faster).
GLOB is similar to LIKE except that its syntax is more conventional: * instead of %, ? instead of _.
See the Endnotes at http://yserial.sourceforge.net/ for more details.
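For example, the same search written with GLOB through SQLObject's raw-SQL select (note GLOB is case-sensitive, unlike the default LIKE):
# * and ? play the roles of % and _ here.
for item in Item.select("name GLOB '*foo*' OR skim GLOB '*foo*'"):
    print item.name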
Given your example and expanding on Reed's answer, your code could look a bit like the following:
import sqlalchemy.sql.expression as expr
...
def search1():
    searchStr = ur'foo'
    whereClause = expr.or_(itemsTable.c.nameColumn.contains(searchStr),
                           itemsTable.c.skimColumn.contains(searchStr))
    for item in Items.select().where(whereClause):
        print item.name
which translates to
SELECT * FROM items WHERE name LIKE '%foo%' or skim LIKE '%foo%'
This will have the database do all the filtering work for you instead of fetching all 90000 records and doing possibly two regex operations on each record.
You can find some info on the .contains() method here.
As well as the SQLAlchemy SQL Expression Language Tutorial here.
Of course, the example above assumes variable names for your itemsTable and its columns (nameColumn and skimColumn).
