I'm having trouble understanding under what circumstances .values() or .values_list() are better than just using Model instances.
I think the following are all equivalent:
results = SomeModel.objects.all()
for result in results:
    print(result.some_field)
results = SomeModel.objects.all().values()
for result in results:
    print(result['some_field'])
results = SomeModel.objects.all().values_list('some_field', 'another_field')
for some_field, another_field in results:
    print(some_field)
Obviously these are stupid examples. Could anyone point out a good reason for using .values() / .values_list() over just using Model instances directly?
Edit:
I did some simple profiling, using a noddy model that contained two CharField(max_length=100) fields.
Iterating over just 500 instances to copy 'first' to another variable, and taking the average of 200 runs, I got the following results:
Test.objects.all() time: 0.010061947107315063
Test.objects.all().values('first') time: 0.00578328013420105
Test.objects.all().values_list('first') time: 0.005257354974746704
Test.objects.all().values_list('first', flat=True) time: 0.0052023959159851075
Test.objects.all().only('first') time: 0.011166254281997681
So the answer is definitely: performance! (Mostly; see knbk's answer below.)
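The measurement above can be approximated without Django. What follows is only a rough stdlib analogue, not the original benchmark: it assumes an in-memory SQLite table named test with two text columns, and a hypothetical Row class standing in for a model instance. It compares reading one plain column with building one object per row. Absolute timings will differ from the numbers listed above.

```python
import sqlite3
import timeit


# Hypothetical stand-in for a model instance; this is not Django's Model class.
class Row:
    def __init__(self, pk, first, second):
        self.pk = pk
        self.first = first
        self.second = second


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, first TEXT, second TEXT)")
conn.executemany(
    "INSERT INTO test (first, second) VALUES (?, ?)",
    [(f"first-{i}", f"second-{i}") for i in range(500)],
)


def as_objects():
    # Comparable to iterating over Test.objects.all(): one object is built per row.
    rows = conn.execute("SELECT id, first, second FROM test ORDER BY id")
    return [Row(*r).first for r in rows]


def as_flat_values():
    # Comparable to values_list('first', flat=True): one column, no objects.
    rows = conn.execute("SELECT first FROM test ORDER BY id")
    return [r[0] for r in rows]


runs = 200
objects_time = timeit.timeit(as_objects, number=runs) / runs
flat_time = timeit.timeit(as_flat_values, number=runs) / runs
print(f"objects: {objects_time:.6f}s  flat values: {flat_time:.6f}s")
```

Both functions return the same 500 strings; only the amount of work done per row differs.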
On their own, .values() and .values_list() return one row per database row, but when they are followed by an aggregate annotation they translate to a GROUP BY query on the listed fields. This means that rows with duplicate values will be grouped into a single row. So say you have a model People with the following data:
+----+---------+-----+
| id | name    | age |
+----+---------+-----+
| 1  | Alice   | 23  |
| 2  | Bob     | 42  |
| 3  | Bob     | 23  |
| 4  | Charlie | 30  |
+----+---------+-----+
Then People.objects.values_list('name', flat=True) will return 4 values: ['Alice', 'Bob', 'Bob', 'Charlie'], just as People.objects.all() will return 4 rows. Once an aggregate is added, the rows with name 'Bob' are grouped into a single row.
This is especially useful when doing annotations. You can do e.g. People.objects.values('name').annotate(age_sum=Sum('age')), and it will return the following results:
+---------+---------+
| name | age_sum |
+---------+---------+
| Alice | 23 |
| Bob | 65 |
| Charlie | 30 |
+---------+---------+
As you can see, the ages of both Bobs have been summed and are returned in a single row. This is different from distinct(), which only applies after the annotations.
Performance is just a side-effect, albeit a very useful one.
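To make the grouping concrete, here is a small sketch that runs comparable SQL against an in-memory SQLite database. The table and column names are assumptions chosen to mirror the People example, and the exact SQL Django emits depends on the version and backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 23), (2, "Bob", 42), (3, "Bob", 23), (4, "Charlie", 30)],
)

# One row per person, comparable to People.objects.all().
all_rows = conn.execute("SELECT id, name, age FROM people").fetchall()

# One row per distinct name with the ages summed, comparable to grouping
# on 'name' and aggregating with Sum('age').
grouped = conn.execute(
    "SELECT name, SUM(age) AS age_sum FROM people GROUP BY name ORDER BY name"
).fetchall()

print(len(all_rows))
print(grouped)
```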
values() and values_list() are both intended as optimizations for a specific use case: retrieving a subset of data without the overhead of creating a model instance. A good explanation is given in the Django documentation.
I use values_list() to create a custom single-select dropdown box for Django Admin, as shown below:
# "admin.py"
from django.contrib import admin
from django import forms
from .models import Favourite, Food, Fruit, Vegetable
class FoodForm(forms.ModelForm):
    # (id, name) pairs used as the dropdown choices
    FRUITS = Fruit.objects.all().values_list('id', 'name')
    fruits = forms.ChoiceField(choices=FRUITS)
    # (id, name) pairs used as the dropdown choices
    VEGETABLES = Vegetable.objects.all().values_list('id', 'name')
    vegetables = forms.ChoiceField(choices=VEGETABLES)
class FoodInline(admin.TabularInline):
    model = Food
    form = FoodForm
@admin.register(Favourite)
class FavouriteAdmin(admin.ModelAdmin):
    inlines = [FoodInline]
Assume I have models like so:
class Story(...):
    name = models.CharField(...)
class Chapter(...):
    title = models.CharField(...)
    length = models.CharField(...)
    story = models.ForeignKey('Story', ..., related_name='chapters')
How can I filter for stories that have N specific chapters, i.e.:
titles = ['Beginning', 'Middle', 'End']
length = 'Long'
# is there a better way to do this?
stories_with_these_chapters = Story.objects.filter(
    chapters__title=titles[0],
    chapters__length=length
).filter(
    chapters__title=titles[1],
    chapters__length=length
).filter(
    chapters__title=titles[2],
    chapters__length=length
)
Edit:
So for example say I have this data:
Stories:
ID | Name
-- | ----
1 | First Story
2 | Second Story
3 | Third Story
Chapters:
ID | Story ID | Title | Length
-- | -------- | --------- | ------
1 | 1 | Beginning | Long
2 | 1 | End | Long
3 | 2 | Beginning | Short
4 | 2 | Middle | Short
5 | 2 | End | Short
6 | 3 | Beginning | Long
7 | 3 | Middle | Long
8 | 3 | End | Long
I want to filter for stories that have Chapters titled "Beginning", "Middle", and "End" that are all "Long". In this example that is only Story 3, because Story 1 does not have a chapter titled "Middle" and all of the Chapters in Story 2 are "Short".
OK, let me try to explain the ways of filtering and their equivalence.
If you're working with model fields or forward ForeignKey relationships:
.filter(A, B) means WHERE A AND B
.filter(Q(A) & Q(B)) means WHERE A AND B
.filter(A).filter(B) means WHERE A AND B
With reverse ForeignKey relationships:
.filter(A, B) means WHERE A AND B
.filter(Q(A) & Q(B)) means WHERE A AND B
.filter(A).filter(B) means WHERE A AND B, but A and B operate on a table with duplicates, due to the multiple INNER JOINs, so each condition can be satisfied by a different related row
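The difference for reverse ForeignKey relationships can be seen by running comparable SQL by hand. This is a sketch against an in-memory SQLite database loaded with the sample data from the question. The table names, column names and queries are assumptions that approximate what the ORM would generate, not its literal output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE story (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE chapter (id INTEGER PRIMARY KEY, story_id INTEGER, title TEXT, length TEXT);
INSERT INTO story VALUES (1, 'First Story'), (2, 'Second Story'), (3, 'Third Story');
INSERT INTO chapter VALUES
  (1, 1, 'Beginning', 'Long'), (2, 1, 'End', 'Long'),
  (3, 2, 'Beginning', 'Short'), (4, 2, 'Middle', 'Short'), (5, 2, 'End', 'Short'),
  (6, 3, 'Beginning', 'Long'), (7, 3, 'Middle', 'Long'), (8, 3, 'End', 'Long');
""")

# .filter(A, B): a single JOIN, so both conditions must hold for the same chapter row.
same_row = conn.execute("""
    SELECT s.id FROM story s
    JOIN chapter c ON c.story_id = s.id
    WHERE c.title = 'Beginning' AND c.title = 'Middle'
""").fetchall()

# .filter(A).filter(B): one JOIN per call, so each condition may match a different chapter.
separate_rows = conn.execute("""
    SELECT s.id FROM story s
    JOIN chapter c1 ON c1.story_id = s.id
    JOIN chapter c2 ON c2.story_id = s.id
    WHERE c1.title = 'Beginning' AND c2.title = 'Middle'
    ORDER BY s.id
""").fetchall()

print(same_row)
print(separate_rows)
```

No single chapter has two titles, so the one-join query finds nothing, while the two-join query finds the stories that contain both a "Beginning" and a "Middle" chapter.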
Having seen your edit, you need to chain Q objects.
from functools import reduce
import operator
from django.db.models import Q
query = reduce(operator.and_, (Q(chapters__title__contains=x) for x in titles))
qs = Story.objects.filter(query)
Putting this into the terminal to show what reduce is doing for you:
>>> from functools import reduce
>>>
>>> import operator
>>> from django.db.models import Q
>>>
>>> titles = ['Beginning', 'Middle', 'End']
>>> query = reduce(operator.and_, (Q(chapters__title__contains=x) for x in titles))
>>> query
<Q: (AND: ('chapters__title__contains', 'Beginning'), ('chapters__title__contains', 'Middle'), ('chapters__title__contains', 'End'))>
So reduce folds the list with the & operator to build a single Q object, which does an AND of a chapters__title__contains condition for each term in your list. Note that the lookup uses chapters, the related_name on the ForeignKey, and that conditions combined in a single filter() call are applied to the same related chapter.
Update for the addition of the length column
From the above, you can extend the query with the length condition:
>>> query.add(Q(chapters__length='Long'), query.connector)
<Q: (AND: ('chapters__title__contains', 'Beginning'), ('chapters__title__contains', 'Middle'), ('chapters__title__contains', 'End'), ('chapters__length', 'Long'))>
And as a side note, to see the SQL behind a Django query you can do:
>>> query = User.objects.all()
>>> query.query
<django.db.models.sql.query.Query object at 0x7f910f250eb0>
>>> print(query.query)
SELECT "authentication_user"."id", "authentication_user"."password", "authentication_user"."last_login", "authentication_user"."is_superuser", "authentication_user"."email", "authentication_user"."first_name", "authentication_user"."last_name", "authentication_user"."date_joined", "authentication_user"."is_active", "authentication_user"."is_staff" FROM "authentication_user"
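If the reduce call used in this answer looks opaque, the folding it performs can be shown with a tiny stand-in. The Cond class below is hypothetical and exists only to make the nesting visible; it is not Django's Q class and does not build queries.

```python
from functools import reduce
import operator


# Hypothetical stand-in for django.db.models.Q, used only to show the folding.
class Cond:
    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Cond(f"({self.text} AND {other.text})")


titles = ['Beginning', 'Middle', 'End']
conditions = [Cond(f"title contains {t!r}") for t in titles]

# reduce(operator.and_, [a, b, c]) evaluates (a & b) & c.
folded = reduce(operator.and_, conditions)
by_hand = (conditions[0] & conditions[1]) & conditions[2]

print(folded.text)
```

The same left-to-right folding is what combines the individual Q objects into one.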
Basically you want to write a query that does this:
1. Find all chapters matching the set of titles you are interested in and the desired length.
2. Group the result by story and count the number of records per story.
3. Exclude the ones where the number of records does not match the length of the list of chapter titles.
Something like this results in an efficient query with no joins:
from django.db.models import Count
titles = ['Beginning', 'Middle', 'End']
length = 'Long'
story_ids_with_matches = Chapter.objects.order_by().values('story_id').filter(title__in=titles, length=length).annotate(chpt_cnt=Count('story_id')).filter(chpt_cnt=len(titles))
This will produce an output result set with "story_id" and "chpt_cnt" as fields.
You can then print details about the story by wrapping that in a query like this:
Story.objects.filter(id__in=[i['story_id'] for i in story_ids_with_matches])
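The find, group, count and exclude steps described above correspond roughly to a GROUP BY ... HAVING query. Here is a sketch of that SQL run against an in-memory SQLite database with the chapter data from the question. The table and column names are assumptions, and the SQL Django generates may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chapter (id INTEGER PRIMARY KEY, story_id INTEGER, title TEXT, length TEXT);
INSERT INTO chapter VALUES
  (1, 1, 'Beginning', 'Long'), (2, 1, 'End', 'Long'),
  (3, 2, 'Beginning', 'Short'), (4, 2, 'Middle', 'Short'), (5, 2, 'End', 'Short'),
  (6, 3, 'Beginning', 'Long'), (7, 3, 'Middle', 'Long'), (8, 3, 'End', 'Long');
""")

titles = ['Beginning', 'Middle', 'End']
length = 'Long'
placeholders = ", ".join("?" for _ in titles)

# Keep matching chapters, count them per story, and keep only the stories
# whose count equals the number of requested titles.
rows = conn.execute(
    f"""
    SELECT story_id, COUNT(story_id) AS chpt_cnt
    FROM chapter
    WHERE title IN ({placeholders}) AND length = ?
    GROUP BY story_id
    HAVING COUNT(story_id) = ?
    """,
    [*titles, length, len(titles)],
).fetchall()

print(rows)
```

The count comparison only works if a story cannot contain two chapters with the same title; otherwise duplicates would inflate the count.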
To get the list of names in a single query you would have to accept having the count in the output. That query is also less efficient, since it does a join and groups on a text field:
titles = ['Beginning', 'Middle', 'End']
length = 'Long'
story_ids_with_matches = Story.objects.order_by().values('name').filter(chapters__title__in=titles, chapters__length=length).annotate(chpt_cnt=Count('id')).filter(chpt_cnt=len(titles))
You need the queryset "in" lookup. For further reference, please check the docs.
stories_with_these_chapters = Story.objects.filter(chapters__title__in=["Boo", "hoo"])
It will fetch all the Story records that have a Chapter whose title is either "Boo" or "hoo".
I'm trying to use a list, but it still shows all the values.
allexpelorer1 = Destination.objects.filter(destination=pk).order_by('-pk')
allexpelorer = []
for checkhu in allexpelorer1:
    if Destination.objects.filter(destination=pk, user_pk=checkhu.user_pk) not in allexpelorer:
        allexpelorer.append(checkhu)
From your question, this is what I have understood:
Tourists write about the experience of the countries that they have visited. So, obviously, tourists can write multiple reviews.
The table structure should look like the following:
CustomerReviewsTable / CountryProfileTable
Tourist_One has written 3 times for both countries.
Tourist_Two has written 1 time for each country.
Here is the query you should use:
from django.db.models import Max
country = 1 # Denmark
max_ids = Yourmodel.objects.filter(country=country).values('tourist', 'country').annotate(max_id=Max('pk')).values('max_id')
Then query again:
result = Yourmodel.objects.filter(id__in=max_ids)
The resulting queryset will contain your expected result.
When country 1 is passed:
Denmark | Review1 | Tourist_One
Denmark | Review4 | Tourist_Two
When country 2 is passed:
Italy | Review3 | Tourist_One
Italy | Review5 | Tourist_Two
This will not return any duplicate comments/reviews.
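The two-step query can also be read as a single SQL statement with a subquery. Below is a sketch against an in-memory SQLite database. The review table, its columns and the sample rows are invented for illustration and are not the data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE review (id INTEGER PRIMARY KEY, country TEXT, tourist TEXT, body TEXT)"
)
# Hypothetical sample rows: Tourist_One reviewed Denmark twice.
conn.executemany(
    "INSERT INTO review (id, country, tourist, body) VALUES (?, ?, ?, ?)",
    [
        (1, "Denmark", "Tourist_One", "A"),
        (2, "Denmark", "Tourist_One", "B"),
        (3, "Italy", "Tourist_One", "C"),
        (4, "Denmark", "Tourist_Two", "D"),
        (5, "Italy", "Tourist_Two", "E"),
    ],
)

country = "Denmark"
# Inner query: the highest id per (tourist, country) pair for the chosen country.
# Outer query: fetch exactly those rows, so each tourist appears once.
latest = conn.execute(
    """
    SELECT id, tourist, body FROM review
    WHERE id IN (
        SELECT MAX(id) FROM review
        WHERE country = ?
        GROUP BY tourist, country
    )
    ORDER BY id
    """,
    (country,),
).fetchall()

print(latest)
```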
I ran a grid search that contains 36 models.
For each model the confusion matrix is available with:
grid_search.get_grid(sort_by='a_metrics', decreasing=True)[index].confusion_matrix(valid=valid_set)
My problem is that I only want to access some parts of this confusion matrix in order to make my own ranking, which is not natively available with h2o.
Let's say we have the confusion_matrix of the first model of the grid_search below:
+---+-------+--------+--------+--------+------------------+
|   |       | 0      | 1      | Error  | Rate             |
+---+-------+--------+--------+--------+------------------+
| 0 | 0     | 766.0  | 2718.0 | 0.7801 | (2718.0/3484.0)  |
| 1 | 1     | 351.0  | 6412.0 | 0.0519 | (351.0/6763.0)   |
| 2 | Total | 1117.0 | 9130.0 | 0.2995 | (3069.0/10247.0) |
+---+-------+--------+--------+--------+------------------+
Actually, the only thing that really interests me is the precision of class 0, i.e. 766/1117 = 0.685765443, whereas h2o considers precision metrics across all the classes, to the detriment of what I am looking for.
I tried to convert it to a dataframe with:
model = grid_search.get_grid(sort_by='a_metrics', decreasing=True)[0]
model.confusion_matrix(valid=valid_set).as_data_frame()
Even though some topics on the internet suggest this works, it actually does not (or no longer does):
AttributeError: 'ConfusionMatrix' object has no attribute 'as_data_frame'
I have searched for a way to return a list of the confusion_matrix's attributes, without success.
According to the H2O documentation, there is no as_data_frame method on ConfusionMatrix: http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/_modules/h2o/model/confusion_matrix.html
I assume the easiest way is to call to_list().
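Once the counts are available as a plain nested list, the class 0 precision from the question is a one-line calculation. The sketch below assumes to_list() yields a 2x2 list laid out like the table in the question (rows are actual classes, columns are predicted classes). The counts are copied from that table rather than fetched from h2o.

```python
# Counts copied from the confusion matrix shown in the question.
# Assumption: to_list() returns the same 2x2 layout
# (rows = actual class, columns = predicted class).
matrix = [[766, 2718],
          [351, 6412]]


def precision(matrix, cls):
    """Correct predictions of `cls` divided by all predictions of `cls`."""
    predicted_as_cls = sum(row[cls] for row in matrix)
    return matrix[cls][cls] / predicted_as_cls


precision_class_0 = precision(matrix, 0)
print(round(precision_class_0, 6))
```

Applying the same function to each model in the grid would give a value to sort on for a custom ranking.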
The .table attribute gives an object that has an as_data_frame method.
model.confusion_matrix(valid=valid_set).table.as_data_frame()
If you need to access the table header, you can do
model.confusion_matrix(valid=valid_set).table._table_header
Hint: You can use dir() to check the valid attributes of a python object.
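As a small illustration of the dir() hint, the snippet below lists the public attribute names of an object. The Example class is a hypothetical stand-in; with h2o you would pass the confusion matrix object instead.

```python
# Hypothetical stand-in object; with h2o you would inspect the confusion matrix instead.
class Example:
    def __init__(self):
        self.table = "..."

    def to_list(self):
        return []


# dir() returns attribute names sorted alphabetically; drop the underscore-prefixed ones.
public_names = [name for name in dir(Example()) if not name.startswith("_")]
print(public_names)
```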
So basically I have a database table with photos. Each photo has a rating in the range [0, 1] and one or more categories. I need a way to efficiently choose x elements from this table at weighted random, but with respect to categories. I have to do this in Python 3 + Django (or a microservice communicating through Redis or exposing a REST API).
E.g.:
table:
.---------.--------.------------.
| photo | rating | categories |
:---------+--------+------------:
| Value 1 | 0.8 | art, cats |
:---------+--------+------------:
| value 2 | 0.5 | cats |
:---------+--------+------------:
| value 3 | 0.9 | night |
'---------'--------'------------'
And when I ask for 1 photo with categories (cats, dogs), the algorithm should return something like
numpy.random.choice(['Value 1', 'Value 2'], 1, replace=False, p=[0.8 / 1.3, 0.5 / 1.3])
Currently, every time I am asked for it I do something as follows:
import numpy
# wanted_categories: the list of wanted categories
photos = Photos.objects.filter(category__in=wanted_categories)
photos, weights = zip(*photos.values_list('photo', 'rating'))
probabilities = numpy.array(weights, dtype=float) / sum(weights)  # p must sum to 1
res = numpy.random.choice(photos, amount_wanted, replace=False, p=probabilities)
Is there a more efficient approach to this? I can use any AWS service to achieve it.
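You may be able to use something like
photo = random.sample(list(Photos.objects.filter(category__in=wanted_categories, rating__gte=random.random())), 1)
This line selects all photos in the categories you want, keeps only those whose rating is at least a randomly drawn threshold, and returns a random one of them. Here wanted_categories stands for your list of wanted categories. Note that random.sample raises ValueError if no photo passes the threshold.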
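As an alternative sketch, and not the approach from the answer above, weighted sampling without replacement can be done in pure Python with the Efraimidis-Spirakis keying trick. The function name and parameters are made up for illustration, and it works on plain lists such as the (photo, rating) pairs you would get from values_list.

```python
import random


def weighted_sample(items, weights, k, rng=random):
    """Pick k distinct items; items with higher weights are more likely to be picked.

    Uses the Efraimidis-Spirakis keying trick: each item gets the key
    u ** (1 / w) with u uniform in [0, 1), and the k largest keys win.
    """
    keyed = [
        (rng.random() ** (1.0 / w), item)
        for item, w in zip(items, weights)
        if w > 0  # zero-weight items can never be picked
    ]
    if k > len(keyed):
        raise ValueError("not enough items with positive weight")
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


photos = ["Value 1", "Value 2"]
ratings = [0.8, 0.5]
print(weighted_sample(photos, ratings, 1))
```

This needs only one pass over the candidates plus a sort, and it does not require the weights to be normalized first.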
Let's say I have a model like this:
+-----------+--------+--------------+
| Name | Amount | Availability |
+-----------+--------+--------------+
| Milk | 100 | True |
+-----------+--------+--------------+
| Chocolate | 200 | False |
+-----------+--------+--------------+
| Honey | 450 | True |
+-----------+--------+--------------+
Now in a second model I want to have a field (also named 'Amount') which is always equal to the sum of the amounts of the rows which have Availability = True. For example like this:
+-----------+-----------------------------------------------+
| Inventory | Amount |
+-----------+-----------------------------------------------+
| Groceries | 550 #this is the field I want to be dependent |
+-----------+-----------------------------------------------+
Is that possible? Or is there a better way of doing this?
Of course that is possible. I would recommend one of two things:
1. Do this "on the fly", as one person commented, then store the result in Django's cache mechanism so that it is only calculated once in a while (saving database/computation resources).
2. Create a database view that does the summation; again, it will let the database cache the results, etc., to save resources.
That said, I think #1 or #2 is only needed with a very large record set on a very busy site.
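For the database view option, a minimal sketch with an in-memory SQLite database could look like this. The table, column and view names are assumptions based on the example in the question, and in Django the view would typically be mapped through an unmanaged model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT, amount INTEGER, availability INTEGER);
INSERT INTO product VALUES ('Milk', 100, 1), ('Chocolate', 200, 0), ('Honey', 450, 1);
CREATE VIEW inventory_total AS
    SELECT COALESCE(SUM(amount), 0) AS amount FROM product WHERE availability = 1;
""")

total = conn.execute("SELECT amount FROM inventory_total").fetchone()[0]
print(total)

# The view is computed from the current rows, so the total cannot go stale.
conn.execute("UPDATE product SET availability = 1 WHERE name = 'Chocolate'")
updated_total = conn.execute("SELECT amount FROM inventory_total").fetchone()[0]
print(updated_total)
```

Because the sum is derived on read, there is no second stored field to keep in sync with the first model.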