SQLAlchemy: how to obtain all distinct values from an ARRAY field? - python

I have the following model of a blog post:
from sqlalchemy.dialects.postgresql import ARRAY

class BlogPost(db.Model):
    title = db.Column(db.String())
    content = db.Column(db.String())
    tags = db.Column(ARRAY(db.String))
The tags field can be an empty list.
Now I want to select all distinct tags from the database entries, with maximum performance, excluding empty arrays.
So, say I have 3 records with the following values of the tags field:
['database', 'server', 'connection']
[]
['connection', 'security']
The result would be ['database', 'server', 'connection', 'security']
The actual order is not important.

The distinct() method should still work fine with array columns.
from sqlalchemy import func

unique_vals = db.session.query(func.unnest(BlogPost.tags)).distinct().all()
https://docs.sqlalchemy.org/en/13/orm/query.html?highlight=distinct#sqlalchemy.orm.query.Query.distinct
This would be identical to running a query in postgres:
SELECT DISTINCT unnest(tags) FROM blog_posts
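As a side note, unnest() emits one row per array element, so posts whose tags array is empty contribute no rows at all; the empty arrays are excluded automatically. A minimal sketch, assuming the Flask-SQLAlchemy db and BlogPost model from the question:
from sqlalchemy import func

# each row's tags array is expanded to one row per tag; empty arrays
# produce no rows at all, so they are excluded automatically
query = db.session.query(func.unnest(BlogPost.tags)).distinct()

# rows come back as 1-tuples, so unpack them
unique_tags = [tag for (tag,) in query]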

If you can post-process the results (you usually can) and don't want to use a nested query for this sort of thing, I usually resort to something like:
func.array_agg(func.array_to_string(BlogPost.tags, "||")).label("tag_lists")
and then split on the join string (||) afterwards.
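A sketch of that post-processing approach, with the delimiter and label taken from the snippet above; note that posts with an empty tags array aggregate to empty strings, which need filtering out after the split:
from sqlalchemy import func

# aggregate every post's tags into one "||"-joined string per row,
# then collect those strings into a single array
row = db.session.query(
    func.array_agg(func.array_to_string(BlogPost.tags, "||")).label("tag_lists")
).one()

# split client-side and dedupe; `if tag` drops the empty strings
# produced by posts whose tags array is empty
unique_tags = {tag for joined in (row.tag_lists or []) for tag in joined.split("||") if tag}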

Related

Get data from one column in database django

I have a Users table in my database:
id | name  | last_name | status
---+-------+-----------+----------
1  | John  | Black     | active
2  | Drake | Bell      | disabled
3  | Pep   | Guardiola | active
4  | Steve | Salt      | active
users_data = []
I would like to get all the id and status values from this table and write them into the empty list above.
What kind of query should I use: filter, get, or something else?
And what if I would like to get one column, not two?
If you want to access the values of specific columns for all instances of a table:
id_status_list = Users.objects.values_list('id', 'status')
You can find more info in the official documentation.
Note that Django provides an ORM to ease querying the database (see this page for more info on queries):
To fetch all column values of all Users instances in the table:
users_list = Users.objects.all()
To fetch all column values of specific Users in the table:
active_users_list = Users.objects.filter(status="active")
To fetch all column values of a single User in the table:
user_33 = Users.objects.get(pk=33)
Use the .values() method:
>>> Users.objects.values('id', 'status')
[{'id': 1, 'status': 'active'}, {'id': 2, 'status': 'disabled'}, ...]
The result is a QuerySet that mostly behaves like a list; you can then do list(Users.objects.values('id', 'status')) to get an actual list object.
users_data = list(Users.objects.values('id', 'status'))
yourmodelname.objects.values('id', 'status')
This code shows the table as two columns, id and status.
users_data = list(yourmodelname.objects.values('id', 'status'))
This gives you the result as a list of dictionaries.
Suppose your model is named User. For the first part of the question, use:
User.objects.values('id', 'status') # a QuerySet of dictionaries
User.objects.values_list('id', 'status') # a QuerySet of tuples
And for the second part of the question ('And what if I would like to get one column, not two?') you can use:
User.objects.values('id') # a QuerySet of dictionaries
User.objects.values_list('id') # a QuerySet of 1-tuples
User.objects.values('status') # a QuerySet of dictionaries
User.objects.values_list('status') # a QuerySet of 1-tuples
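One follow-up for the single-column case: values_list() returns 1-tuples by default, and Django's flat=True flag unwraps them. A short sketch against the Users model from the question:
# each row comes back as a bare value instead of a 1-tuple
status_list = Users.objects.values_list('status', flat=True)
# e.g. <QuerySet ['active', 'disabled', 'active', 'active']>

# flat=True is only valid when a single field is requested
id_list = list(Users.objects.values_list('id', flat=True))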

Dealing with Arrays in Flask-SqlAlchemy and MySQL

I have a data model where I store a comma-separated list of values (1,2,3,4,5...).
In my code, in order to work with arrays instead of string, I have defined the model like this one:
class MyModel(db.Model):
    pk = db.Column(db.Integer, primary_key=True)
    __fake_array = db.Column(db.String(500), name="fake_array")

    @property
    def fake_array(self):
        if not self.__fake_array:
            return
        return self.__fake_array.split(',')

    @fake_array.setter
    def fake_array(self, value):
        if value:
            self.__fake_array = ",".join(value)
        else:
            self.__fake_array = None
This works perfectly; from the point of view of my source code, fake_array is an array, and it's only transformed into a string when stored in the database.
The problem appears when I try to filter by that field. Expressions like this don't work:
MyModel.query.filter_by(fake_array="1").all()
It seems that I can't filter using the SQLAlchemy query model.
What can I do here? Is there any way to filter this kind of fields? Is there is a better pattern for the "fake_array" problem?
Thanks!
What you're trying to do should really be replaced with a pair of tables and a relationship between them.
The first table (which I'll call A) contains everything BUT the array column, and it should have a primary key of some sort. You should have another table (which I'll call B) that contains a primary key, a foreign key column to A (which I'll call a_id), and an integer value column.
Using this layout, each row in the A table has its associated array in table B, where B.a_id == A.id via a join. You can add or remove values from the array by manipulating rows in table B, and you can filter by using a join.
If the order of the values is needed, then create an order column in table B.
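A minimal sketch of that layout in Flask-SQLAlchemy; the names A, B, a_id, value and position follow the answer's wording, and the "a.id" foreign-key target assumes Flask-SQLAlchemy's default lowercase table naming:
class A(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # ...every other column except the array...
    values = db.relationship("B", order_by="B.position")

class B(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    a_id = db.Column(db.Integer, db.ForeignKey("a.id"), nullable=False)
    value = db.Column(db.Integer, nullable=False)
    position = db.Column(db.Integer)  # only needed if element order matters

# filtering becomes a join instead of string matching, e.g. all A rows
# whose "array" contains the value 1:
result = A.query.filter(A.values.any(B.value == 1)).all()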

GeoDjango: How to perform a query of spatially close records

I have two Django models (A and B) which are not related by any foreign key, but both have a geometry field.
class A(Model):
    position = PointField(geography=True)

class B(Model):
    position = PointField(geography=True)
I would like to relate them spatially: given a queryset of A, obtain a queryset of B containing the records that are less than a given distance from A.
I haven't found a way to do this using Django's ORM alone.
Of course, I could write a property on A such as this one:
@property
def nearby(self):
    return B.objects.filter(position__dwithin=(self.position, 0.1))
But this only allows me to fetch the nearby records on each instance and not in a single query, which is far from efficient.
I have also tried to do this:
nearby = B.objects.filter(position__dwithin=(OuterRef('position'), 0.1))
query = A.objects.annotate(nearby=Subquery(nearby.values('pk')))
list(query) # error here
However, I get this error for the last line:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery
Does anybody know a better way (more efficient) of performing such a query or maybe the reason why my code is failing?
Any help is very much appreciated.
I finally managed to solve it, but I had to perform a raw SQL query in the end.
This will return all A records with an annotation including a list of all nearby B records:
from collections import namedtuple
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute('''SELECT a.id, array_agg(b.id) AS nearby FROM myapp_a a
                      LEFT JOIN myapp_b b ON ST_DWithin(a.position, b.position, 0.1)
                      GROUP BY a.id''')
    nt_result = namedtuple('Result', [col[0] for col in cursor.description])
    results = [nt_result(*row) for row in cursor.fetchall()]
References:
Raw queries: https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly
Array aggregation: https://www.postgresql.org/docs/8.4/functions-aggregate.html
ST_DWithin: https://postgis.net/docs/ST_DWithin.html
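If you would rather get model instances back, the same SQL can go through Manager.raw(), which attaches the extra column to each A object. A sketch assuming the myapp_a/myapp_b table names used above, with the distance pulled out as a bound parameter:
distance = 0.1

# raw() must select the primary key; extra columns become attributes
nearby_qs = A.objects.raw('''
    SELECT a.id, array_agg(b.id) AS nearby
    FROM myapp_a a
    LEFT JOIN myapp_b b ON ST_DWithin(a.position, b.position, %s)
    GROUP BY a.id
''', [distance])

for a in nearby_qs:
    print(a.id, a.nearby)  # a.nearby is [None] when no B is within range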

GQL query checking numerical id within a list of ids

I'm trying to compare the unique numerical id of an element in my database with a list of longs.
My GQL query should return the elements whose id appears in the list of longs I pass in.
I've tried using a statement of the form:
"SELECT * FROM Table WHERE id IN :1", list_of_stored_ids
I've also tried using this question: GQL query with numeric id in datastore viewer, but I still can't find any way to compare to a list.
Is there such a way? If not, what must I do?
You will need to build up a list of ndb keys, not numeric ids, in order to get this to work.
eg:
from google.appengine.ext import ndb

ids = [5918782761467904, 5624113645223936, 5463928544952320]
keys = [ndb.Key('<Entity>', id) for id in ids]
entities = ndb.gql("SELECT * FROM <Entity> WHERE __key__ IN :1", keys).fetch()
or (the non-GQL version):
entities = ndb.get_multi(keys)

SELECT * in SQLAlchemy?

Is it possible to do SELECT * in SQLAlchemy?
Specifically, SELECT * WHERE foo=1?
Is no one feeling the ORM love of SQLAlchemy today? The presented answers correctly describe the lower-level interface that SQLAlchemy provides. Just for completeness, this is the more likely (for me) real-world situation, where you have a session instance and a User class that is ORM-mapped to the users table.
for user in session.query(User).filter_by(name='jack'):
    print(user)
    # ...
And this does an explicit select on all columns.
The following selection works for me in the core expression language (returning a RowProxy object):
foo_col = sqlalchemy.sql.column('foo')
s = sqlalchemy.sql.select(['*']).where(foo_col == 1)
If you don't list any columns, you get all of them.
query = users.select()
query = query.where(users.c.name == 'jack')
result = conn.execute(query)
for row in result:
    print(row)
This should work.
You can always use raw SQL too:
str_sql = sql.text("YOUR STRING SQL")
# if you have some args:
args = {
    'myarg1': yourarg1,
    'myarg2': yourarg2,
}
# then call the execute method on your connection
results = conn.execute(str_sql, args).fetchall()
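For the question as asked, a concrete version of this could look like the sketch below; some_table is a placeholder standing in for your table, matching the foo=1 filter from the question:
from sqlalchemy import text

# bound parameters (:foo) keep the raw SQL safe from injection
stmt = text("SELECT * FROM some_table WHERE foo = :foo")
results = conn.execute(stmt, {"foo": 1}).fetchall()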
Where Bar is the class mapped to your table and session is your sa session:
bars = session.query(Bar).filter(Bar.foo == 1)
Turns out you can do:
sa.select('*', ...)
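Note that passing a plain string to select() is deprecated in SQLAlchemy 1.4 and removed in 2.0; a version-safe sketch of the same idea wraps the star in text(), reusing the users table and foo column from the answers above:
import sqlalchemy as sa

# SELECT * FROM users WHERE foo = 1, without listing the columns
stmt = sa.select(sa.text('*')).select_from(users).where(users.c.foo == 1)
rows = conn.execute(stmt).fetchall()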
I had the same issue: I was trying to get all columns from a table as a list instead of getting ORM objects back, so that I could convert that list to a pandas DataFrame and display it.
What works is to use .c on a subquery or cte as follows:
U = select(User).cte('U')
stmt = select(*U.c)
rows = session.execute(stmt)
Then you get a list of tuples, one per row, with an entry for each column.
Another option is to use __table__.columns in the same way:
stmt = select(*User.__table__.columns)
rows = session.execute(stmt)
In case you want to convert the results to a DataFrame, here is the one-liner:
pd.DataFrame.from_records(rows, columns=rows.keys())
For joins, if the columns are not listed manually, only the columns of the target table are returned. To get all columns for a join (the User table joined with the Group table):
sql = User.select().select_from(User.join(Group, User.c.group_id == Group.c.id))
# add all columns of the Group table to the SELECT list
sql = sql.add_columns(*Group.c)
session.connection().execute(sql)
If you're using the ORM, you can build a query using the normal ORM constructs and then execute its underlying statement directly to get raw column values:
query = session.query(User).filter_by(name='jack')
for cols in session.connection().execute(query.statement):
    print(cols)
every_column = User.__table__.columns
records = session.query(*every_column).filter(User.foo == 1).all()
When an ORM class is passed to the query function, e.g. query(User), the result is composed of ORM instances. In the majority of cases, this is what the dev wants and is easiest to deal with, as demonstrated by the popularity of the answer above that takes this approach.
In some cases, devs may instead want an iterable sequence of values. In these cases, one can pass the list of desired column objects to query(). This answer shows how to pass the entire list of columns without hardcoding them, while still working with SQLAlchemy at the ORM layer.
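For illustration, a short usage sketch of that column-based approach; the id and name attributes printed here are assumptions about the User model's columns:
every_column = User.__table__.columns
records = session.query(*every_column).filter(User.foo == 1).all()

# each record is a plain row with attribute access by column name,
# not a User instance
for record in records:
    print(record.id, record.name)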
