GROUP BY in Django - Python

How can I create a simple GROUP BY query in the trunk version of Django?
I need something like
SELECT name
FROM mytable
GROUP BY name
Actually, what I want to do is simply get all entries with distinct names.

If you need all the distinct names, just do this:
Foo.objects.values('name').distinct()
And you'll get a list of dictionaries, each one with a name key. If you need other data, just add more attribute names as parameters to the .values() call. Of course, if you add in attributes that may vary between rows with the same name, you'll break the .distinct().
This won't help if you want to get complete model objects back. But getting distinct names and getting full data are inherently incompatible goals anyway; how do you know which row with a given name you want returned in its entirety? If you want to calculate some sort of aggregate data for all the rows with a given name, aggregation support was recently added to Django trunk and can take care of that for you.
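As a sketch of that aggregation approach (assuming a Foo model with a name field; the count_of and newest field names here are made up for illustration):

```python
from django.db.models import Count, Max

# One result per distinct name, with aggregates computed per group
# (the equivalent of GROUP BY name in SQL)
Foo.objects.values('name').annotate(
    count_of=Count('id'),        # how many rows share this name
    newest=Max('id'),            # e.g. the highest id within each group
)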

Add .distinct() to your queryset:
Entries.objects.filter(something='xxx').distinct()

This will not work, because every row has a unique id, so every record is distinct.
To solve my problem I used
foo = Foo.objects.all()
foo.query.group_by = ['name']
but this is not an official API.

Related

Pyspark join conditions using dictionary values for keys

I'm working on a script that tests the contents of some newly generated tables against production tables. The newly generated tables may or may not have the same column names, and may have multiple columns that have to be used in join conditions. I'm attempting to write a function with the needed keys passed in as a dictionary.
something like this:
def check_subset_rel(self, remote_df, local_df, keys):
    join_conditions = []
    for key in keys:
        join_conditions.append(local_df.key['local_key'] == remote_df.key['remote_key'])
    missing_subset_df = local_df.join(remote_df, join_conditions, 'leftanti')
PySpark/Python doesn't like the dictionary usage in local_df.key['local_key'] and remote_df.key['remote_key']; I get a "'DataFrame' object has no attribute 'key'" error. I'm pretty sure it's expecting the actual name of the column instead of a variable, but I'm not sure if I can make that conversion between value and column name.
Does anyone know how I could go about this?
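One way this can work (a sketch, assuming each entry in keys is a dict with 'local_key' and 'remote_key' entries holding column names): PySpark DataFrames support bracket notation with a string, so the dictionary values can be used directly where attribute access cannot:

```python
def check_subset_rel(self, remote_df, local_df, keys):
    # df[name] accepts a column-name string, so the dictionary
    # *values* work here, unlike attribute access (df.key)
    join_conditions = [
        local_df[key['local_key']] == remote_df[key['remote_key']]
        for key in keys
    ]
    missing_subset_df = local_df.join(remote_df, join_conditions, 'leftanti')
    return missing_subset_df
```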

django sort queryset by newest dates from different fields

How can I sort queryset just like forums do?
I mean ordering by one date field, with another date field also used for sorting. For example, in forums, subjects get sorted by created date (or updated date), but if an old subject gets a new reply, it is shown before the other subjects. This is exactly what I'm trying to do.
What I tried so far:
subjects = Subjects.objects.filter(active=True).order_by('-sticky', '-updated_date', 'reply__updated_date')
but in this case the results are duplicated, once per reply.
I've also tried:
subjects = Subjects.objects.filter(active=True).order_by('-sticky', '-updated_date').order_by('reply__updated_date')
but in this case the second order_by overrides the first one.
Is there any guideline to follow?
BTW, I'm pretty sure you got it, but just to be clear, Subject is a model and Reply is another model, connected by a foreign key.
If I got it correctly, you can achieve this by using annotations and database functions.
One thing at a time.
You want to sort Subjects by reply date in descending order: in Django's own terms, you need to annotate() each Subject with the maximum updated_date of its related model Reply, and sort Subjects by the new field in reverse order, like this:
from django.db.models import Max
subjects = Subject.objects.annotate(lr=Max('reply__updated_date'))\
.order_by('-lr')
Now you have all subjects with at least one reply sorted, but what about the subjects without replies?
We can rephrase our sorting like this: "sort by last reply date if there are replies, and by last subject update otherwise".
The "otherwise" part can be achieved by the Coalesce() function, which replaces NULLs with another expression of our choice. In our case it'll replace what's inside Max() with the updated_date field of the Subjects:
from django.db.models import Max
from django.db.models.functions import Coalesce
subjects = Subject.objects\
.annotate(lr=Coalesce(Max('reply__updated_date'), 'updated_date'))\
.order_by('-lr')

Extracting a year from a DateField in django in the database

I know it's possible to use lookups to filter a DateField in the database by year, like so:
MyModel.objects.filter(date__year=2000) # Returns all objects with a year of 2000
But I want to extract the year when using a values() call:
MyModel.objects.all().values('label','date__year') # Fails!
FieldError: Cannot resolve keyword 'year' into field. Join on 'date' not permitted.
I've tried making a custom lookup, but that doesn't apply to a values call.
How can I extract just the year in this kind of query?
edit: To be clear, I know I could do this a million ways in Python iterating over the queryset once its back from the database. But I don't want to do this.
You can use a list comprehension if you want.
The syntax would be this:
[(x['label'], x['date'].year) for x in MyModel.objects.all().values('label', 'date')]
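Note that the list comprehension above still does the work in Python, which the question explicitly wants to avoid. If your Django version has database functions (ExtractYear was added in 1.10), the year can be computed in the database itself; a sketch:

```python
from django.db.models.functions import ExtractYear

# 'year' is an annotation name chosen here, not a model field;
# the extraction happens in SQL, not in Python
MyModel.objects.annotate(year=ExtractYear('date')).values('label', 'year')
```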

How to update all object columns in SqlAlchemy?

I have a table of Users (more than 15 columns), and sometimes I need to completely update all the user attributes. For example, I want to replace
user_in_db = session.query(Users).filter_by(user_twitter_id=user.user_twitter_id).first()
with some other object.
I have found the following solution :
session.query(User).filter_by(id=123).update({"name": user.name})
but I feel that writing out all 15+ attributes is error-prone, and there should be a simpler solution.
You can write:
session.query(User).filter_by(id=123).update({column: getattr(user, column) for column in User.__table__.columns.keys()})
This will iterate over the columns of the User model (table) and it'll dynamically create a dictionary with the necessary keys and values.
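The dictionary-building pattern itself is independent of SQLAlchemy; here is a minimal sketch with a plain class standing in for the mapped model (the column names are made up for illustration):

```python
class User:
    # Stand-in for the mapped model; in SQLAlchemy the names would
    # come from User.__table__.columns.keys()
    columns = ('name', 'email')

    def __init__(self, name, email):
        self.name = name
        self.email = email

user = User('alice', 'alice@example.com')

# Build the {column: value} dict that .update() expects
values = {column: getattr(user, column) for column in User.columns}
print(values)  # → {'name': 'alice', 'email': 'alice@example.com'}
```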

What model should a SQLalchemy database column be to contain an array of data?

So I am trying to set up a database whose rows will be modified frequently. Every hour, for instance, I want to append a number to a particular part of my database. If self.checkmarks is stored in the database as 3, what is the best way to update it so that self.checkmarks now equals 3, 2? I tried declaring the column as db.Array but got an attribute error:
AttributeError: 'SQLAlchemy' object has no attribute 'Array'
I have found how to update a row, but I do not know the best way to update by appending to a list rather than replacing it. My approach was as follows, but I don't think append will work because the column cannot be an array:
ven = data.query.filter_by(venid=ven['id']).first()
ven.totalcheckins = ven.totalcheckins.append(ven['stats']['checkinsCount'])
db.session.commit()
Many thanks in advance
If you really want to have a python list as a Column in SQLAlchemy you will want to have a look at the PickleType:
array = db.Column(db.PickleType(mutable=True))
Please note that you have to pass the mutable=True parameter to be able to edit the column. SQLAlchemy will detect changes automatically and save them as soon as you commit.
If you want the stored value to be human-readable, you can combine this with json or another converter that suits your purposes.
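One pitfall in the question's snippet, independent of SQLAlchemy: list.append() mutates the list in place and returns None, so assigning its result back to the column would wipe the value. A minimal plain-Python sketch:

```python
checkins = [3]

# Wrong: append() returns None, so this assignment loses the data
result = checkins.append(2)
print(result)    # → None

# Right: mutate in place; with PickleType(mutable=True), SQLAlchemy
# would pick up this change on commit
checkins.append(4)
print(checkins)  # → [3, 2, 4]
```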
