Selecting distinct values from a column in Peewee

I am looking to select all distinct values from one column using Peewee.
For example, if I had the table
Organization  Year
company_1     2000
company_1     2001
company_2     2000
....
I would like to return just the unique values in the organization column (i.e. company_1 and company_2).
I had assumed this was possible using the distinct option as documented at http://docs.peewee-orm.com/en/latest/peewee/api.html#SelectQuery.distinct
My current code:
organizations_returned = organization_db.select().distinct(organization_db.organization_column).execute()
for item in organizations_returned:
    print(item.organization_column)
This does not return distinct rows (for example, company_1 is returned twice).
The other option I tried was to wrap the column in [ ] inside the distinct call:
organization_db.select().distinct([organization_db.organization_column]).execute()
Although this appears to be more consistent with the documentation, it results in the error peewee.OperationalError: near "ON": syntax error
Am I correct in assuming that it is possible to return unique values directly from Peewee, and if so, what am I doing wrong?
Model structure:
cd_sql = SqliteDatabase(sql_location, threadlocals=True, pragmas=(("synchronous", "off"),))

class BaseModel(Model):
    class Meta:
        database = cd_sql

class organization_db(BaseModel):
    organization_column = CharField()
    year_column = CharField()

So what coleifer was getting at is that SQLite doesn't support DISTINCT ON. That's not a big issue though. I think you can accomplish what you want by selecting only that column and calling distinct() with no arguments:
organization_db.select(organization_db.organization_column).distinct()
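
To see the difference at the SQL level without Peewee, here is a minimal sketch using only the standard-library sqlite3 module. The in-memory table simply mirrors the question's example rows. The two SQL strings are hand-written approximations of what a plain DISTINCT query and a DISTINCT ON query look like, not output captured from Peewee.

```python
import sqlite3

# In-memory table mirroring the example rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organization_db (organization_column TEXT, year_column TEXT)"
)
conn.executemany(
    "INSERT INTO organization_db VALUES (?, ?)",
    [("company_1", "2000"), ("company_1", "2001"), ("company_2", "2000")],
)

# Plain DISTINCT over a single selected column is supported by SQLite.
rows = conn.execute(
    "SELECT DISTINCT organization_column FROM organization_db"
).fetchall()
distinct_names = sorted(name for (name,) in rows)

# DISTINCT ON (...) is PostgreSQL syntax; SQLite rejects it when parsing.
try:
    conn.execute(
        "SELECT DISTINCT ON (organization_column) * FROM organization_db"
    )
    distinct_on_supported = True
except sqlite3.OperationalError:
    distinct_on_supported = False
```

Selecting all columns and then asking for distinct rows would not help here either, because the two company_1 rows differ in their year.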

Related

How to get the latest (distinct) records filtered by a non unique field in Django

I'll demonstrate by using an example. This is the model (the primary key is implicit):
class Item(models.Model):
    sku = models.CharField(null=False)
    description = models.CharField(null=True)
I have a list of skus, and I need the latest description for every sku in that list that is stored in the table for the model Item. The latest item is the one with the greatest id.
I need a way to annotate the latest description per sku, something like:
Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).order_by("-id")
but this won't work for various reasons (besides the missing Latest aggregate function).

Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).latest("-id")
Or use this:
Item.objects.values("sku").filter(sku__in=list_of_skus).annotate(latest_descr=Latest('description')).order_by("-id").reverse()[0]

I used the PostgreSQL ArrayAgg aggregate function to aggregate the latest description like so:
from django.contrib.postgres.aggregates import ArrayAgg

class ArrayAggLatest(ArrayAgg):
    template = "(%(function)s(%(expressions)s ORDER BY id DESC))[1]"

Item.objects.filter(sku__in=skus).values("sku").annotate(descr=ArrayAggLatest("description"))
The aggregate collects all descriptions for each sku, ordered by descending id of the original table, and takes the first element (PostgreSQL arrays are 1-indexed, so element 0 would be NULL).
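
For comparison, the same "greatest id per sku" result can be expressed without array aggregation by using a MAX(id) subquery. The sketch below is a different technique from the ArrayAgg answer and runs against an in-memory SQLite table through the standard library rather than Django and PostgreSQL. The table name item and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, sku TEXT NOT NULL, description TEXT)"
)
# Hypothetical sample data; ids are assigned 1..4 in insertion order.
conn.executemany(
    "INSERT INTO item (sku, description) VALUES (?, ?)",
    [("A", "first A"), ("B", "first B"), ("A", "second A"), ("C", "only C")],
)

list_of_skus = ["A", "B"]
placeholders = ", ".join("?" for _ in list_of_skus)

# For each requested sku, keep only the row whose id is the maximum for that sku.
query = f"""
    SELECT sku, description
    FROM item
    WHERE id IN (
        SELECT MAX(id) FROM item WHERE sku IN ({placeholders}) GROUP BY sku
    )
    ORDER BY sku
"""
latest = dict(conn.execute(query, list_of_skus).fetchall())
```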

The answers from @M.J.GH.PY and @dekomote were not correct.
If you have a model:
class Item(models.Model):
    sku = models.CharField(null=False)
    description = models.CharField(null=True)
then last() orders by the primary key (id) when the queryset has no explicit ordering,
so you don't need to annotate anything. You can:
get the last object:
Item.objects.filter(sku__in=list_of_skus).last()
get the last value of description:
Item.objects.filter(sku__in=list_of_skus).values_list('description', flat=True).last()
Both variants return None if the queryset is empty.

Get data from one column in database django

I have table Users in my database:
id  name   last_name  status
1   John   Black      active
2   Drake  Bell       disabled
3   Pep    Guardiola  active
4   Steve  Salt       active
users_data = []
I would like to get the id and status of every row in this table and store them in users_data.
What kind of query should I use? filter, get, or something else?
And what if I would like to get one column, not two?

If you want the values of specific columns for all rows of a table:
id_status_list = Users.objects.values_list('id', 'status')
You can find more information in the official documentation.
Note that Django provides an ORM to ease database queries (see the documentation on making queries for more information):
To fetch all columns for all rows of your Users table:
users_list = Users.objects.all()
To fetch all columns for specific users in the table:
active_users_list = Users.objects.filter(status="active")
To fetch all columns for one specific user in the table:
user_33 = Users.objects.get(pk=33)

Use the .values() method:
>>> Users.objects.values('id', 'status')
[{'id': 1, 'status': 'active'}, {'id': 2, 'status': 'disabled'}, ...]
The result is a QuerySet, which mostly behaves like a list. To get an actual list object, wrap it in list():
users_data = list(Users.objects.values('id', 'status'))

yourmodelname.objects.values('id','status')
This returns only the id and status columns.
users_data = list(yourmodelname.objects.values('id','status'))
This stores the result as a list of dictionaries.

Suppose your model name is User. For the first part of the question, use this code:
User.objects.values('id', 'status') # QuerySet of dictionaries
User.objects.values_list('id', 'status') # QuerySet of tuples
For the second part of the question ('And what if I would like to get one column, not two?'), you can use:
User.objects.values('id') # QuerySet of dictionaries
User.objects.values_list('id') # QuerySet of 1-tuples
User.objects.values('status') # QuerySet of dictionaries
User.objects.values_list('status') # QuerySet of 1-tuples
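
The shapes of these results can be imitated in plain Python. The sketch below is not Django code: values and values_list here are hypothetical stand-in functions over a list of dicts, written only to show what each call's result looks like, including the flat=True option that values_list accepts for a single column.

```python
# Rows copied from the Users table in the question.
rows = [
    {"id": 1, "name": "John", "last_name": "Black", "status": "active"},
    {"id": 2, "name": "Drake", "last_name": "Bell", "status": "disabled"},
    {"id": 3, "name": "Pep", "last_name": "Guardiola", "status": "active"},
    {"id": 4, "name": "Steve", "last_name": "Salt", "status": "active"},
]

def values(rows, *fields):
    """Stand-in for the shape of QuerySet.values(): one dict per row."""
    return [{field: row[field] for field in fields} for row in rows]

def values_list(rows, *fields, flat=False):
    """Stand-in for the shape of QuerySet.values_list(): one tuple per row,
    or bare values when flat=True is used with exactly one field."""
    if flat:
        if len(fields) != 1:
            raise TypeError("flat=True only makes sense with a single field")
        return [row[fields[0]] for row in rows]
    return [tuple(row[field] for field in fields) for row in rows]

users_data = values(rows, "id", "status")
pairs = values_list(rows, "id", "status")
statuses = values_list(rows, "status", flat=True)
```

With a single column, flat=True avoids the 1-tuples and yields the bare values.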

Dealing with Arrays in Flask-SqlAlchemy and MySQL

I have a datamodel where I store a list of values separated by comma (1,2,3,4,5...).
In my code, in order to work with arrays instead of strings, I have defined the model like this:
class MyModel(db.Model):
    pk = db.Column(db.Integer, primary_key=True)
    __fake_array = db.Column(db.String(500), name="fake_array")

    @property
    def fake_array(self):
        if not self.__fake_array:
            return
        return self.__fake_array.split(',')

    @fake_array.setter
    def fake_array(self, value):
        if value:
            self.__fake_array = ",".join(value)
        else:
            self.__fake_array = None
This works perfectly, and from the point of view of my source code "fake_array" is an array. It is only transformed into a string when it is stored in the database.
The problem appears when I try to filter by that field. Expressions like this don't work:
MyModel.query.filter_by(fake_array="1").all()
It seems that I can't filter on it using the SQLAlchemy query model.
What can I do here? Is there any way to filter on this kind of field? Is there a better pattern for the "fake_array" problem?
Thanks!

What you're trying to do should really be replaced with a pair of tables and a relationship between them.
The first table (which I'll call A) contains everything BUT the array column, and it should have a primary key of some sort. You should have another table (which I'll call B) that contains a primary key, a foreign key column to A (which I'll call a_id), and an integer field.
Using this layout, each row in the A table has its associated array in table B where B's a_id == A.id via a join. You can add or remove values from the array by manipulating the rows in table B. You can filter by using a join.
If the order of the values is needed, then create an order column in table B.
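
The layout described above can be sketched with the standard-library sqlite3 module (the question uses MySQL through Flask-SQLAlchemy, so this only illustrates the schema and the join). The column names name, value and sort_order are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (
        id INTEGER PRIMARY KEY,
        a_id INTEGER NOT NULL REFERENCES a(id),
        value INTEGER NOT NULL,
        sort_order INTEGER NOT NULL  -- only needed if element order matters
    );
""")
conn.executemany("INSERT INTO a (id, name) VALUES (?, ?)", [(1, "first"), (2, "second")])
# Row 1 of A holds the "array" [1, 2, 3]; row 2 holds [4, 5].
conn.executemany(
    "INSERT INTO b (a_id, value, sort_order) VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 3, 2), (2, 4, 0), (2, 5, 1)],
)

# Filter with a join: which rows of A contain the value 1?
matching = conn.execute(
    "SELECT DISTINCT a.id FROM a JOIN b ON b.a_id = a.id WHERE b.value = ?",
    (1,),
).fetchall()

# Rebuild the ordered array for one row of A.
array_for_1 = [
    value
    for (value,) in conn.execute(
        "SELECT value FROM b WHERE a_id = ? ORDER BY sort_order", (1,)
    )
]
```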

GeoDjango: How to perform a query of spatially close records

I have two Django models (A and B) which are not related by any foreign key, but both have a geometry field.
class A(Model):
    position = PointField(geography=True)

class B(Model):
    position = PointField(geography=True)
I would like to relate them spatially, i.e. given a queryset of A, being able to obtain a queryset of B containing those records that are at less than a given distance to A.
I haven't found a way using pure Django's ORM to do such a thing.
Of course, I could write a property in A such as this one:
@property
def nearby(self):
    return B.objects.filter(position__dwithin=(self.position, 0.1))
But this only allows me to fetch the nearby records on each instance and not in a single query, which is far from efficient.
I have also tried to do this:
nearby = B.objects.filter(position__dwithin=(OuterRef('position'), 0.1))
query = A.objects.annotate(nearby=Subquery(nearby.values('pk')))
list(query) # error here
However, I get this error for the last line:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery
Does anybody know a better (more efficient) way of performing such a query, or the reason why my code is failing?
Any help is much appreciated.

I finally managed to solve it, but I had to perform a raw SQL query in the end.
This will return all A records with an annotation including a list of all nearby B records:
from collections import namedtuple
from django.db import connection

with connection.cursor() as cursor:
    # a.id is qualified because both tables have an id column; the join compares a.position with b.position
    cursor.execute('''SELECT a.id, array_agg(b.id) AS nearby FROM myapp_a a
                      LEFT JOIN myapp_b b ON ST_DWithin(a.position, b.position, 0.1)
                      GROUP BY a.id''')
    nt_result = namedtuple('Result', [col[0] for col in cursor.description])
    results = [nt_result(*row) for row in cursor.fetchall()]
References:
Raw queries: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Array aggregation: https://www.postgresql.org/docs/8.4/functions-aggregate.html
ST_DWithin: https://postgis.net/docs/ST_DWithin.html

Does Django ManyToManyField create table with a redundant index?

If I have a model Foo that has a simple M2M field to model Bar:
class Foo(Model):
    bar = ManyToManyField(Bar)
Django seems to create a table foo_bar which has the following indices:
index 1: primary, unique (id)
index 2: unique (foo_id, bar_id)
index 3: non_unique (foo_id)
index 4: non_unique (bar_id)
I recall from my basic knowledge of SQL, that if a query needs to look for conditions on foo_id, index 2 would suffice (since the left-most column can be used for lookup). index 3 seems to be redundant.
Am I correct to assume that index 3 does indeed take up index space while offering no benefit? That I'm better off using a through table and manually create a unique index on (foo_id, bar_id), and optionally, another index on (bar_id) if needed?
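
The left-most-column reasoning can be checked empirically. The sketch below uses SQLite's EXPLAIN QUERY PLAN through the standard library, purely as an illustration: the index name idx_pair is made up, the exact plan wording varies between SQLite versions, and other databases (such as the one Django is actually running against) may plan these queries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo_bar (
        id INTEGER PRIMARY KEY,
        foo_id INTEGER NOT NULL,
        bar_id INTEGER NOT NULL
    );
    -- Only the composite unique index; no separate single-column indexes.
    CREATE UNIQUE INDEX idx_pair ON foo_bar (foo_id, bar_id);
""")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# A lookup on the left-most column can seek into the composite index.
plan_foo = plan("SELECT * FROM foo_bar WHERE foo_id = 1")

# A lookup on the second column alone cannot seek into it.
plan_bar = plan("SELECT * FROM foo_bar WHERE bar_id = 1")
```

Printing plan_foo and plan_bar shows whether each query is a SEARCH through idx_pair or a full SCAN.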

The key to understanding how a many-to-many association is represented in the database is to realize that each row of the junction table (in this case, foo_bar) connects one row from the left table (foo) with one row from the right table (bar). Each pk of "foo" can be copied many times to "foo_bar"; each pk of "bar" can also be copied many times to "foo_bar". But the same pair of fk's in "foo_bar" can only occur once.
So if "foo_bar" had a unique index on only one of the keys (the pk of "foo" or of "bar"), each key could occur only once, and it would no longer be a many-to-many relation.
For example we have two models (e-commerce): Product, Order.
Each product can be in many orders and one order can contain many products.
class Product(models.Model):
    ...

class Order(models.Model):
    products = models.ManyToManyField(Product, through='OrderedProduct')

class OrderedProduct(models.Model):
    # Each (order, product) pair can occur only once, so within one order you can
    # calculate the price for each product (taking the ordered amount into account).
    # At the same time, a template or view can get all orders that contain the same product.
    order = models.ForeignKey(Order, on_delete=models.CASCADE)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    amount = models.PositiveSmallIntegerField()  # amount of ordered products
    price = models.IntegerField()  # just int in this simple example

    def save(self, *args, **kwargs):
        # Follow the foreign key with attribute access, not the double-underscore lookup syntax.
        self.price = self.product.price * self.amount
        super(OrderedProduct, self).save(*args, **kwargs)

For anyone else who is still wondering: this is a known issue, and there is an open ticket on the Django bug tracker:
https://code.djangoproject.com/ticket/22125
