Extracting a year from a DateField in django in the database - python

I know it's possible to use lookups to filter a DateField in the database by year, like so:
MyModel.objects.filter(date__year=2000) # Returns all objects with a year of 2000
But I want to extract the year when using a values() call:
MyModel.objects.all().values('label','date__year') # Fails!
FieldError: Cannot resolve keyword 'year' into field. Join on 'date' not permitted.
I've tried making a custom lookup, but that doesn't apply to a values call.
How can I extract just the year in this kind of query?
edit: To be clear, I know I could do this a million ways in Python by iterating over the queryset once it's back from the database. But I don't want to do this.

You can use a list comprehension if you want.
The syntax would be this:
[(x['label'], x['date'].year) for x in MyModel.objects.all().values('label', 'date')]
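If you want the year computed in the database instead (as the question asks), here is a minimal sketch using Django's ExtractYear database function, assuming Django 1.10 or newer; the alias date_year is just an example name:
from django.db.models.functions import ExtractYear

# Annotate each row with the year pulled out of the DateField in SQL,
# then select it alongside 'label' in the values() call.
MyModel.objects.annotate(date_year=ExtractYear('date')).values('label', 'date_year')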

Related

Django - Time aggregates of DatetimeField across queryset

(using django 1.11.2, python 2.7.10, mysql 5.7.18)
If we imagine a simple model:
class Event(models.Model):
    happened_datetime = models.DateTimeField()
    value = models.IntegerField()
What would be the most elegant (and quickest) way to run something similar to:
res = Event.objects.all().aggregate(
    Avg('happened_datetime')
)
But that would be able to extract the average time of day for all members of the queryset. Something like:
res = Event.objects.all().aggregate(
    AvgTimeOfDay('happened_datetime')
)
Would it be possible to do this on the DB directly, i.e., without running a long loop client-side over each queryset member?
EDIT:
There may be a solution, along those lines, using raw SQL:
select sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) from event_event;
Performance-wise, this runs in 0.015 seconds for ~23k rows on a laptop, not optimised, etc. Assuming that could yield accurate/correct results, and since time is only a secondary factor, could I be using that?
Add another integer field to your model that contains only the hour of the day extracted from the happened_datetime.
When creating/updating a model instance, you need to update this new field whenever happened_datetime is set/updated. You can extract the hour of the day, for example, by reading datetime.datetime.hour, or use strftime to build a value to your liking.
Aggregation should then work as you proposed yourself; a sketch of this follows below.
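A minimal sketch of that proposal, reusing the Event model from the question (the field name happened_hour is made up for illustration):
class Event(models.Model):
    happened_datetime = models.DateTimeField()
    value = models.IntegerField()
    happened_hour = models.IntegerField(editable=False)

    def save(self, *args, **kwargs):
        # Keep the extra column in sync whenever the datetime is set/updated.
        self.happened_hour = self.happened_datetime.hour
        super(Event, self).save(*args, **kwargs)

# Aggregation then works with a plain Avg:
# Event.objects.aggregate(Avg('happened_hour'))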
EDIT:
Django's ORM has Extract() as a function. Example from the docs adapted to your use case:
>>> # Annotate each event with the hour of day it happened:
>>> Event.objects.annotate(
...     happened_hour=Extract('happened_datetime', 'hour'))
(Not tested!)
https://docs.djangoproject.com/en/1.11/ref/models/database-functions/#extract
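To get the average hour of day directly in the database with the ORM, a sketch along the same lines (assuming Django 1.11, where aggregates accept expressions; note this averages only the hour, not minutes/seconds):
from django.db.models import Avg
from django.db.models.functions import Extract

# Average hour of day over all events, computed in SQL.
Event.objects.aggregate(avg_hour=Avg(Extract('happened_datetime', 'hour')))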
So after a little searching and a few tries, the below seems to work. Any comments on how to improve it (or hints as to why it is completely wrong) are welcome! :-)
res = Event.objects.raw('''
SELECT id, sec_to_time(avg(time_to_sec(extract(HOUR_SECOND from happened_datetime)))) AS average_time_of_day
FROM event_event
WHERE happened_datetime BETWEEN %s AND %s;''', [start_datetime, end_datetime])
print res[0].__dict__
# {'average_time_of_day': datetime.time(18, 48, 10, 247700), '_state': <django.db.models.base.ModelState object at 0x0445B370>, 'id': 9397L}
Now the ID returned is that of the last object falling in the datetime range for the WHERE clause. I believe Django just inserts that because of "InvalidQuery: Raw query must include the primary key".
Quick explanation of the SQL chain of function calls:
1. Extract HH:MM:SS from every datetime field (extract(HOUR_SECOND ...)).
2. Convert the time values to seconds via time_to_sec.
3. Average all the seconds values.
4. Convert the averaged seconds value back into time format (HH:MM:SS) via sec_to_time.
I don't know why Django insists on returning microseconds, but that is not really relevant (maybe the local ms at which the time object was instantiated?).
Performance note: this seems to be extremely fast, but then again I haven't tested that bit. Any insight would be kindly appreciated :)

order a dictionary of distinct extracted values from a queryset

How to order a dictionary of distinct values (extracted from a queryset) based on count of occurrence?
For example:
query = self.my_queryset.filter(category='rock').values('first_name').distinct()
I want to order the resulting 'query' by the number of occurrences of each first_name within the category 'rock'.
I am using postgresql as backend db, so open to ideas of doing this in postgres itself. :)
Thanks!!!
Convert your queryset into a list of values, where each value is your model object.
Then you can sort this list using Python's default sort function, passing a comparison function that compares two objects based on their count in your original queryset. A sketch of this approach is shown below.
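A minimal sketch of that Python-side approach, assuming the self.my_queryset and the first_name/category fields from the question:
from collections import Counter

# Count how often each first_name occurs among the 'rock' rows.
names = self.my_queryset.filter(category='rock').values_list('first_name', flat=True)
counts = Counter(names)

# Distinct first names, ordered by how often they occur (most frequent first).
ordered_names = sorted(counts, key=counts.get, reverse=True)

Doing the same in the database would typically be a values('first_name').annotate(Count(...)).order_by(...) query, but the sketch above follows the answer as written.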

How to update all object columns in SqlAlchemy?

I have a table of Users (more than 15 columns) and sometimes I need to completely update all the user attributes. For example, I want to replace
user_in_db = session.query(Users).filter_by(user_twitter_id=user.user_twitter_id).first()
with some other object.
I have found the following solution :
session.query(User).filter_by(id=123).update({"name": user.name})
but I feel that writing out all 15+ attributes is error-prone, and there should be a simpler solution.
You can write:
session.query(User).filter_by(id=123).update({column: getattr(user, column) for column in User.__table__.columns.keys()})
This will iterate over the columns of the User model (table) and it'll dynamically create a dictionary with the necessary keys and values.
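A slightly fuller sketch of the same idea, assuming the attribute names match the column names and that you don't want to overwrite the primary key (the id exclusion is just an example):
new_values = {
    column: getattr(user, column)
    for column in User.__table__.columns.keys()
    if column != 'id'  # skip the primary key
}
session.query(User).filter_by(id=123).update(new_values)
session.commit()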

What model should a SQLalchemy database column be to contain an array of data?

So I am trying to set up a database, the rows of which will be modified frequently. Every hour, for instance, I want to add a number to a particular part of my database. So if self.checkmarks is entered into the database equal to 3, what is the best way to update this part of the database with an added number to make self.checkmarks now equal 3, 2? I tried establishing the column as db.Array but got an attribute error:
AttributeError: 'SQLAlchemy' object has no attribute 'Array'
I have found how to update a database, but I do not know the best way to update by adding to a list rather than replacing. My approach was as follows, but I don't think append will work because the column cannot be an array:
ven = data.query.filter_by(venid=ven['id']).first()
ven.totalcheckins = ven.totalcheckins.append(ven['stats']['checkinsCount'])
db.session.commit()
Many thanks in advance
If you really want to have a Python list as a Column in SQLAlchemy, you will want to have a look at the PickleType:
array = db.Column(db.PickleType(mutable=True))
Please note that you will have to use the mutable=True parameter to be able to edit the column. SQLAlchemy will detect changes automatically and they will be saved as soon as you commit them.
If you want the pickle to be human-readable you can combine it with json or other converters that suffice your purposes.
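A minimal usage sketch of that idea, assuming a hypothetical Flask-SQLAlchemy Venue model; note that mutable=True is the older SQLAlchemy API referenced above, and newer SQLAlchemy versions track in-place changes with the sqlalchemy.ext.mutable extension (e.g. MutableList.as_mutable(PickleType)) instead:
class Venue(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    totalcheckins = db.Column(db.PickleType(mutable=True), default=list)

ven = Venue.query.first()
ven.totalcheckins.append(42)  # append to the pickled list in place
db.session.commit()           # the updated list is pickled and saved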

group by in django

How can I create a simple group by query in the trunk version of Django?
I need something like
SELECT name
FROM mytable
GROUP BY name
Actually, what I want to do is simply get all entries with distinct names.
If you need all the distinct names, just do this:
Foo.objects.values('name').distinct()
And you'll get a list of dictionaries, each one with a name key. If you need other data, just add more attribute names as parameters to the .values() call. Of course, if you add in attributes that may vary between rows with the same name, you'll break the .distinct().
This won't help if you want to get complete model objects back. But getting distinct names and getting full data are inherently incompatible goals anyway; how do you know which row with a given name you want returned in its entirety? If you want to calculate some sort of aggregate data for all the rows with a given name, aggregation support was recently added to Django trunk and can take care of that for you.
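For the aggregate case mentioned above, a minimal sketch using values() plus annotate() (the alias names are just examples):
from django.db.models import Count, Max

# One result per distinct name, with aggregate data computed per name.
Foo.objects.values('name').annotate(
    how_many=Count('name'),  # number of rows sharing this name
    latest_id=Max('id'),     # e.g. the highest id among them
)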
Add .distinct() to your queryset:
Entries.objects.filter(something='xxx').distinct()
This will not work, because every row has a unique id, so every record is distinct.
To solve my problem I used
foo = Foo.objects.all()
foo.query.group_by = ['name']
but this is not an official API.
