How Can I Index an Array Produced by array_agg in Peewee

I'm trying to access the first element of an array that is contained within a peewee .select_from and created by a postgresql array_agg.
Here is a simplified version of the query:
rows = list(ST.select_from(fn.array_agg(ST.c.sitename)[1].alias("sitename"))
            .join(LS, on=ST.c.id == LS.site)
            .join(L, on=LS.location == L.id)
            .group_by(L).with_cte(ST).objects().dicts())
Sites (ST) and locations (L) have a many-to-many relationship through an intermediary table LS. ST is a cte because it is a filtered down version of the Site table with certain criteria.
The relevant SQL being returned here is
SELECT (array_agg("ST"."sitename") = 1) AS "sitename"
Instead I want the sql to be
SELECT (array_agg("ST"."sitename"))[1] AS "sitename"
It seems that you can index into an ArrayField using [] from the Googling I did, but I'm assuming the result of fn.array_agg() isn't an ArrayField. I would like to know how to index into the results of an fn.array_agg(), or how to convert it into an ArrayField in order to index into it using [].

This is annoyingly obtuse with Peewee at present - my apologies. Part of this is due to Postgres' insistence that the function be wrapped in parentheses before it can be indexed -- peewee tries to eliminate redundant parentheses, which forces an additional workaround. At any rate, here is one way:
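from peewee import JOIN, NodeList, SQL
# Post and Comment are example models; Comment.post is a foreign key to Post.
p1, p2, p3 = [Post.create(content='p%s' % i) for i in '123']
Comment.create(post=p1, comment='p1-c1')
Comment.create(post=p1, comment='p1-c2')
Comment.create(post=p2, comment='p2-c1')
# Build "(array_agg(<column>))[1]" by hand so the outer parentheses survive.
idx = NodeList([
    SQL('(array_agg('),
    Comment.comment,
    SQL('))[%s]', (1,))])
query = (Post
         .select(Post, idx.alias('comment'))
         .join(Comment, JOIN.LEFT_OUTER)
         .group_by(Post)
         .order_by(Post.content))
for post in query:
    print(post.content, post.comment)
# p1 p1-c1
# p2 p2-c1
# p3 None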

Related

How does set_group_by work in Django?

I was writing the following query:
claim_query = ClaimBillSum.objects.filter(claim__lob__in = lobObj)\
.annotate(claim_count = Count("claim__claim_id", distinct=True))\
.annotate(claim_bill_sum = Sum("bill_sum"))\
.values("claim__body_part", "claim_count", "claim_bill_sum")\
.order_by("claim__body_part")
When I checked the query property, it was grouped by all properties of the tables related in this query, not only the ones selected in the values() function, when I only wanted to group by claim__body_part.
As I searched for a way to change the group by instruction, I found the query.set_group_by() function, that when applied, fixed the query in the way I wanted:
claim_query.query.set_group_by()
SELECT
"CLAIM"."body_part",
COUNT(DISTINCT "claim_bill_sum"."claim_id") AS "claim_count",
SUM("claim_bill_sum"."bill_sum") AS "claim_bill_sum"
FROM
"claim_bill_sum"
INNER JOIN "CLAIM" ON
("claim_bill_sum"."claim_id" = "CLAIM"."claim_id")
WHERE
"CLAIM"."lob_id" IN (SELECT U0."lob_id" FROM "LOB" U0 WHERE U0."client_id" = 1)
GROUP BY
"CLAIM"."body_part"
ORDER BY
"CLAIM"."body_part" ASC
But I couldn't find anything in the Django documentation or elsewhere that describes how this function works. Why does the default GROUP BY include all properties, and how does .set_group_by() end up selecting exactly the property I wanted?

Filter query by linked object key in SQLAlchemy

Judging by the title this would be the exact same question, but I can't see how any of the answers are applicable to my use case:
I have two classes and a relationship between them:
treatment_association = Table('tr_association', Base.metadata,
    Column('chronic_treatments_id', Integer, ForeignKey('chronic_treatments.code')),
    Column('animals_id', Integer, ForeignKey('animals.id'))
)
class ChronicTreatment(Base):
    __tablename__ = "chronic_treatments"
    code = Column(String, primary_key=True)
class Animal(Base):
    __tablename__ = "animals"
    treatment = relationship("ChronicTreatment", secondary=treatment_association, backref="animals")
I would like to be able to select only the animals which have undergone a treatment which has the code "X". I tried quite a few approaches.
This one fails with an AttributeError:
sql_query = session.query(Animal.treatment).filter(Animal.treatment.code == "chrFlu")
for item in sql_query:
    pass
mystring = str(session.query(Animal))
And this one happily returns a list of unfiltered animals:
sql_query = session.query(Animal.treatment).filter(ChronicTreatment.code == "chrFlu")
for item in sql_query:
    pass
mystring = str(session.query(Animal))
The closest thing to the example from the aforementioned thread I could put together:
subq = session.query(Animal.id).subquery()
sql_query = session.query(ChronicTreatment).join((subq, subq.c.treatment_id=="chrFlu"))
for item in sql_query:
    pass
mystring = str(session.query(Animal))
mydf = pd.read_sql_query(mystring,engine)
Also fails with an AttributeError.
Can you help me sort this list?

First, there are two issues with table definitions:
1) In treatment_association you have an Integer column pointing to chronic_treatments.code, while code is a String column.
I think it's better to have an integer id in chronic_treatments, so you don't duplicate the string code in another table and can also add more fields to chronic_treatments later.
Update: that is not exactly correct. You can still add more fields, but it will be more complex to change a 'code' if you decide to rename it.
2) In the Animal model you have a relationship named treatment. This is confusing because it is a many-to-many relationship, so the name should be plural: treatments.
After fixing the above two, it should be clearer why your queries did not work.
This one (I replaced treatment with treatments):
sql_query = session.query(Animal.treatments).filter(
    Animal.treatments.code == "chrFlu")
Animal.treatments represents a many-to-many relationship; it is not a SQLAlchemy model, so you can't pass it to the query nor use it in a filter like this.
The next one can't work for the same reason (you pass Animal.treatments into the query).
The last one is closer, you actually need join to get your results.
I think it is easier to understand the query as SQL (and you anyway need to know SQL to be able to use sqlalchemy):
from sqlalchemy import text
animals = session.query(Animal).from_statement(text(
    """
    select distinct animals.* from animals
    left join tr_association assoc on assoc.animals_id = animals.id
    left join chronic_treatments on chronic_treatments.id = assoc.chronic_treatments_id
    where chronic_treatments.code = :code
    """)
).params(code='chrFlu')
It will select animals and join chronic_treatments through the tr_association and filter the result by code.
Having this it is easy to rewrite it using SQL-less syntax:
sql_query = session.query(Animal).join(Animal.treatments).filter(
    ChronicTreatment.code == "chrFlu")
That will return what you want - a list of animals who have related chronic treatment with given code.
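To see what that join does at the SQL level, here is a minimal sketch that uses the standard-library sqlite3 module instead of SQLAlchemy. The schema follows the suggestion above (an integer id on chronic_treatments); the name column, the sample rows and the animals_with_treatment helper are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory schema: chronic_treatments has an integer id,
# and tr_association links animals to treatments (many-to-many).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE animals (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE chronic_treatments (id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE tr_association (
        chronic_treatments_id INTEGER REFERENCES chronic_treatments(id),
        animals_id INTEGER REFERENCES animals(id)
    );
    INSERT INTO animals VALUES (1, 'rex'), (2, 'tom'), (3, 'kit');
    INSERT INTO chronic_treatments VALUES (1, 'chrFlu'), (2, 'chrPain');
    -- rex has both treatments, tom only chrPain, kit has none
    INSERT INTO tr_association VALUES (1, 1), (2, 1), (2, 2);
""")


def animals_with_treatment(code):
    """Return the names of animals linked to a treatment with this code."""
    rows = conn.execute(
        """
        SELECT DISTINCT animals.name
        FROM animals
        JOIN tr_association assoc ON assoc.animals_id = animals.id
        JOIN chronic_treatments
            ON chronic_treatments.id = assoc.chronic_treatments_id
        WHERE chronic_treatments.code = ?
        ORDER BY animals.name
        """,
        (code,),
    ).fetchall()
    return [name for (name,) in rows]


print(animals_with_treatment("chrFlu"))
print(animals_with_treatment("chrPain"))
```

An inner join is enough here, because the WHERE clause discards animals without a matching treatment anyway.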

Iterate List of tables for specific column

I have a list of tables which I would like to iterate and find a specific row based on a foreign key column, then delete it.
This is what my list of tables look like:
subrep_tables = [ TCableResistance.__table__,
TCapacitorBankTest.__table__,
TCapAndDf.__table__,
TMeasuredData.__table__,
TMultiDeviceData.__table__,
TStepVoltage.__table__,
TTemperatureData.__table__,
TTransformerResistance.__table__,
TTransformerTurnsRatio.__table__,
TTripUnit.__table__,
TVectorDiagram.__table__,
TWithstandTest.__table__,
]
I called the list subrep_tables because all of those tables contain a foreign key called ixSubReport.
What I'm trying to do is iterate over the list, find all the rows that belong to a certain sub report, and delete them, instead of going to each table and running a separate delete query (very tedious).
This is what I've come up with thus far.
for report in DBSession.query(TReport).filter(TReport.ixDevice == device_id).all():
    for sub_report in DBSession.query(TSubReport).filter(TSubReport.ixReport == report.ixReport).all():
        for table in subrep_tables:
            for item in DBSession.query(table).filter(table.ixSubReport == sub_report.ixSubReport).all():
                print("item: " + str(item))
                #DBSession.delete(item)
I'm having some difficulty accessing the table's ixSubReport column for my WHERE clause. The code I have right now gives me an error saying: 'Table' Object has no attribute 'ixSubReport'.
How can I access my iterated table's ixSubReport column to use in my WHERE clause to find the specific row so I can delete it?

If you really want to query the tables, the columns are under the c attribute, use table.c.ixSubReport.
There's no reason to create a list of the __table__ attributes though; just query the models directly. Also, you can avoid a ton of overhead by not performing the first two queries; you can do all this in a single query per model. (This example assumes there are relationships set up between the models.)
from sqlalchemy.orm import contains_eager
has_subrep_models = [TCableResistance, TCapacitorBankTest, ...]
# assuming each has_subrep model has a relationship "subrep"
# assuming TSubReport has a relationship "report"
for has_subrep_model in has_subrep_models:
    query = (DBSession.query(has_subrep_model)
             .join(has_subrep_model.subrep, TSubReport.report)
             .filter(TReport.ixDevice == device_id)
             .options(contains_eager(has_subrep_model.subrep),
                      contains_eager(TSubReport.report)))
    for item in query:
        DBSession.delete(item)
This simply joins the related sub report and report when querying each model that has a sub report, and does the filtering on the report's device there. So you end up doing one query per model, rather than 1 + <num reports> + (<num reports> * <num models with sub reports>) = a lot.
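The "one statement per table" idea can also be sketched in plain SQL. The example below uses the standard-library sqlite3 module rather than SQLAlchemy; the table and foreign key names are borrowed from the question, while the id columns, the sample rows and the delete_device_rows helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TReport (ixReport INTEGER PRIMARY KEY, ixDevice INTEGER);
    CREATE TABLE TSubReport (ixSubReport INTEGER PRIMARY KEY, ixReport INTEGER);
    CREATE TABLE TTripUnit (id INTEGER PRIMARY KEY, ixSubReport INTEGER);
    CREATE TABLE TStepVoltage (id INTEGER PRIMARY KEY, ixSubReport INTEGER);
    INSERT INTO TReport VALUES (1, 10), (2, 20);
    INSERT INTO TSubReport VALUES (100, 1), (101, 1), (200, 2);
    INSERT INTO TTripUnit VALUES (1, 100), (2, 101), (3, 200);
    INSERT INTO TStepVoltage VALUES (1, 100), (2, 200);
""")

# The table names come from a fixed list in code, never from user input,
# so formatting them into the statement is acceptable here.
SUBREP_TABLES = ["TTripUnit", "TStepVoltage"]


def delete_device_rows(device_id):
    """Delete every row whose sub report belongs to the given device."""
    deleted = {}
    for table in SUBREP_TABLES:
        cursor = conn.execute(
            f"""
            DELETE FROM {table}
            WHERE ixSubReport IN (
                SELECT sr.ixSubReport
                FROM TSubReport sr
                JOIN TReport r ON r.ixReport = sr.ixReport
                WHERE r.ixDevice = ?
            )
            """,
            (device_id,),
        )
        deleted[table] = cursor.rowcount  # rows removed from this table
    conn.commit()
    return deleted


result = delete_device_rows(10)
print(result)
```

Each table is touched by exactly one DELETE, and no rows are loaded into Python first.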

Thanks to Denis for the input, I ended up with this:
for report in DBSession.query(TReport).filter(TReport.ixDevice == device_id).all():
    for sub_report in DBSession.query(TSubReport).filter(TSubReport.ixReport == report.ixReport).all():
        for table in subrep_tables:
            for item in DBSession.query(table).filter(table.c.ixSubReport == sub_report.ixSubReport).all():
                DBSession.delete(item)

SQLAlchemy WHERE IN single value (raw SQL)

I'm having trouble with SQLAlchemy when doing a raw SQL which checks against multiple values.
my_sess.execute(
"SELECT * FROM table WHERE `key`='rating' AND uid IN :uids",
params=dict(uids=some_list)
).fetchall()
There are 2 scenarios for this query, one that works and one that doesn't. If some_list = [1], it throws me an SQL error that I have a syntax error near ). But if some_list = [1, 2], the query executes successfully.
Any reason why this would happen?

No, SQL parameters only ever deal with scalar values. You'll have to generate the SQL here; if you need raw SQL, use:
statement = "SELECT * FROM table WHERE `key`='rating' AND uid IN ({})".format(
    ', '.join([':i{}'.format(i) for i in range(len(some_list))]))
my_sess.execute(
    statement,
    params={'i{}'.format(i): v for i, v in enumerate(some_list)}
).fetchall()
e.g. generate enough parameters to hold all values in some_list with string formatting, then generate matching parameters to fill them.
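The same placeholder-generation technique can be tried without a MySQL server. This is a minimal sketch using the standard-library sqlite3 module; the ratings table, its rows and the ratings_for helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (uid INTEGER, key TEXT, value INTEGER);
    INSERT INTO ratings VALUES
        (1, 'rating', 5), (2, 'rating', 3), (3, 'rating', 4);
""")


def ratings_for(uids):
    """Fetch rating rows for the given uids, one named placeholder per value."""
    if not uids:
        # "IN ()" is rejected by many databases, so short-circuit instead.
        return []
    names = ["i{}".format(i) for i in range(len(uids))]
    placeholders = ", ".join(":" + name for name in names)
    statement = (
        "SELECT uid, value FROM ratings "
        "WHERE key = 'rating' AND uid IN ({}) ORDER BY uid".format(placeholders)
    )
    # Map each generated placeholder name to its value.
    params = dict(zip(names, uids))
    return conn.execute(statement, params).fetchall()


print(ratings_for([1]))
print(ratings_for([1, 2]))
```

Only the placeholder names are formatted into the SQL text; the values themselves are always passed as bound parameters, so a one-element list behaves the same as a longer one.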
Better still would be to use a literal_column() object to do all the generating for you:
from sqlalchemy.sql import literal_column
uid_in = literal_column('uid').in_(some_list)
statement = "SELECT * FROM table WHERE `key`='rating' AND {}".format(uid_in)
my_sess.execute(
    statement,
    # the generated bind names are numbered from 1 (uid_1, uid_2, ...), as in the demo output
    params={'uid_{}'.format(i): v for i, v in enumerate(some_list, 1)}
).fetchall()
but then you could perhaps just generate the whole statement using the sqlalchemy.sql.expression module, as this would make supporting multiple database dialects much easier.
Moreover, the uid_in object already holds references to the right values for the bind parameters; instead of turning it into a string as we do with the str.format() action above, SQLAlchemy would have the actual object plus the associated parameters and you would no longer have to generate the params dictionary either.
The following should work:
from sqlalchemy.sql import table, literal_column, select
tbl = table('table')
key_clause = literal_column('key') == 'rating'
uid_clause = literal_column('uid').in_(some_list)
my_sess.execute(select('*', key_clause & uid_clause, [tbl]))
where the sqlalchemy.sql.select() takes a column spec (here hard-coded to *), a where clause (generated from the two clauses with & to generate a SQL AND clause) and a list of selectables; here your one sqlalchemy.sql.table() value.
Quick demo:
>>> from sqlalchemy.sql import table, literal_column, select
>>> some_list = ['foo', 'bar']
>>> tbl = table('table')
>>> key_clause = literal_column('key') == 'rating'
>>> uid_clause = literal_column('uid').in_(some_list)
>>> print select('*', key_clause & uid_clause, [tbl])
SELECT *
FROM "table"
WHERE key = :key_1 AND uid IN (:uid_1, :uid_2)
but the actual object tree generated from all this contains the actual values for the bind parameters too, so my_sess.execute() can access these directly.

How to construct a slightly more complex filter using "or_" or "and_" in SQLAlchemy

I'm trying to do a very simple search from a list of terms
terms = ['term1', 'term2', 'term3']
How do I programmatically go through the list of terms and construct the conditions from it, so that I can make the query using filter and or_ or and_?
query.filter(or_(#something constructed from terms))

If you have a list of terms and want to find rows where a field matches one of them, then you could use the in_() method:
terms = ['term1', 'term2', 'term3']
query.filter(Cls.field.in_(terms))
If you want to do something more complex, then or_() and and_() take ClauseElement objects as parameters. ClauseElement and its subclasses basically represent the SQL AST of your query. Typically, you create clause elements by invoking a comparison operator on Column or InstrumentedAttribute objects:
# Create the clause element
clause = (users_table.columns['name'] == "something")
# you can also use the shorthand users_table.c.name
# The clause is a binary expression ...
print(type(clause))
# <class 'sqlalchemy.sql.expression._BinaryExpression'>
# ... that compares a column for equality with a bound value.
print(type(clause.left), clause.operator, type(clause.right))
# <class 'sqlalchemy.schema.Column'>, <built-in function eq>,
# <class 'sqlalchemy.sql.expression._BindParamClause'>
# str() compiles it to SQL
print(str(clause))
# users.name = ?
# You can also do that with ORM attributes
clause = (User.name == "something")
print(str(clause))
# users.name = ?
You can handle clause elements representing your conditions like any Python objects, put them into lists, compose them into other clause elements, etc. So you can do something like this:
# Collect the separate conditions to a list
conditions = []
for term in terms:
    conditions.append(User.name == term)
# Combine them with or to a BooleanClauseList
condition = or_(*conditions)
# Can now use the clause element as a predicate in queries
query = query.filter(condition)
# or to view the SQL fragment
print(str(condition))
# users.name = ? OR users.name = ? OR users.name = ?

Assuming that your terms variable contains valid SQL statement fragments, you can simply pass terms preceded by an asterisk to or_ or and_:
>>> from sqlalchemy.sql import and_, or_
>>> terms = ["name='spam'", "email='spam@eggs.com'"]
>>> print or_(*terms)
name='spam' OR email='spam@eggs.com'
>>> print and_(*terms)
name='spam' AND email='spam@eggs.com'
Note that this assumes that terms contains only valid and properly escaped SQL fragments, so this is potentially unsafe if a malicious user can access terms somehow.
Instead of building SQL fragments yourself, you should let SQLAlchemy build parameterised SQL queries using other methods from sqlalchemy.sql. I don't know whether you have prepared Table objects for your tables or not; if so, assume that you have a variable called users which is an instance of Table and it describes your users table in the database. Then you can do the following:
from sqlalchemy.sql import select, or_, and_
terms = [users.c.name == 'spam', users.c.email == 'spam@eggs.com']
query = select([users], and_(*terms))
for row in conn.execute(query):
    # do whatever you want here
    pass
Here, users.c.name == 'spam' will create an sqlalchemy.sql.expression._BinaryExpression object that records that this is a binary equality relation between the name column of the users table and a string literal that contains spam. When you convert this object to a string, you will get an SQL fragment like users.name = :1, where :1 is a placeholder for the parameter. The _BinaryExpression object also remembers the binding of :1 to 'spam', but it won't insert it until the SQL query is executed. When it is inserted, the database engine will make sure that it is properly escaped. Suggested reading: SQLAlchemy's operator paradigm
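The difference between pasting a value into the SQL text and binding it as a parameter can be shown with a small sketch. It uses the standard-library sqlite3 module rather than SQLAlchemy, and the users table, its rows and the hostile input string are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users (name, email) VALUES
        ('spam', 'spam@eggs.com'),
        ('ham', 'ham@eggs.com');
""")

# A value an attacker might submit in place of a user name.
hostile = "nobody' OR '1'='1"

# Unsafe: the value becomes part of the SQL text and changes its meaning.
unsafe_sql = (
    "SELECT name FROM users WHERE name = '{}' ORDER BY name".format(hostile)
)
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the value is sent separately and is only ever compared as data.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ? ORDER BY name", (hostile,)
).fetchall()

print(unsafe_rows)
print(safe_rows)
```

The formatted query matches every user because the injected OR clause is always true, while the parameterised query matches none, since no user has that literal name.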
If you only have the database table but you don't have a users variable that describes the table, you can create it yourself:
from sqlalchemy import Table, MetaData, Column, Integer, String
metadata = MetaData()
users = Table('users', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String),
    Column('email', String),
    Column('active', Integer)
)
Alternatively, you can use autoloading which queries the database engine for the structure of the database and builds users automatically; obviously this is more time-consuming:
users = Table('users', metadata, autoload=True)

I had the same issue in "SQLAlchemy: an efficient/better select by primary keys?":
terms = ['one', 'two', 'three']
clauses = or_(*[Table.field == x for x in terms])
query = Session.query(Table).filter(clauses)

See the "Conjunctions" section of the SQLAlchemy documentation for how to combine conditions with and_(), or_() and not_().
