sqlalchemy scalar subquery conversion

The query I want to convert looks like this:
SELECT
sum(CASE
WHEN (countryCd3 = (
SELECT
countryCd3 as id2
FROM myTable
where a.countryCd3 = id2
GROUP BY countryCd3
HAVING count(countryCd3) > 1) AND countryCd3 IS NOT NULL) THEN 1 ELSE 0
END) AS unexpected_count
FROM myTable as a;
This runs on Teradata and works. The issue I have is converting it to SQLAlchemy, especially expressing the WHERE line with the appropriate aliases.
What I have tried so far:
import sqlalchemy as sa
column = 'countryCd3'
table_outer = sa.table('myTable', sa.column(column))
table_inner = sa.table('myTable', sa.column(column))
inner_query = (sa.select([sa.column(column)])
               .where(table_outer.c[column] == table_inner.c[column])
               .select_from(table_inner)
               .group_by(sa.column(column))
               .having(sa.func.count(sa.column(column)) > 1))
print(inner_query)
Output:
SELECT countryCd3
FROM "myTable", "myTable"
WHERE "myTable".countryCd3 = "myTable".countryCd3 GROUP BY countryCd3
HAVING count(countryCd3) > :count_1
I have not yet tried to construct the outer query because of this issue.
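One possible way to get distinct names for the two references to myTable is to alias each of them and correlate the inner SELECT to the outer alias. The following is a minimal sketch, not a confirmed answer from the thread. It assumes SQLAlchemy 1.4 or later (`select()` with positional columns, `scalar_subquery()`, `is_not()`), and the alias names `a` and `b` are chosen only for illustration.

```python
import sqlalchemy as sa

# Lightweight table construct; only the column used by the query is declared.
my_table = sa.table("myTable", sa.column("countryCd3"))

# Two aliases of the same table so the inner and outer references differ.
outer = my_table.alias("a")
inner = my_table.alias("b")

# Inner SELECT, correlated to the outer alias and used as a scalar value.
inner_query = (
    sa.select(inner.c.countryCd3)
    .where(outer.c.countryCd3 == inner.c.countryCd3)
    .group_by(inner.c.countryCd3)
    .having(sa.func.count(inner.c.countryCd3) > 1)
    .correlate(outer)       # keep alias "a" out of the inner FROM clause
    .scalar_subquery()
)

# Outer SELECT: sum(CASE WHEN ... THEN 1 ELSE 0 END) AS unexpected_count
unexpected_count = sa.func.sum(
    sa.case(
        (
            sa.and_(
                outer.c.countryCd3 == inner_query,
                outer.c.countryCd3.is_not(None),
            ),
            1,
        ),
        else_=0,
    )
).label("unexpected_count")

query = sa.select(unexpected_count).select_from(outer)
sql = str(query)
print(sql)
```

With the aliases in place, the inner FROM clause should list only the alias `b`, while the WHERE line compares `a` against `b`, which is what the original Teradata query expresses.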

Related

Translate SQL to SQLAlchemy Core

I have this sql
SELECT publishers.id, publishers.created_at, publishers.updated_at, publishers.name,
count(case books.is_active when true then 1 else null end) AS is_active,
count(books.id) as book_count
FROM publishers
LEFT OUTER JOIN books ON books.publisher_id = publishers.id
GROUP BY publishers.id
How can I use this in my Python project with SQLAlchemy Core?
I have this code now
book_count_alias = models.Book.__table__.alias('book_alias')
join = cls._model.__table__.outerjoin(
    models.Book.__table__,
    sa.and_(models.Book.publisher_id == models.Publisher.id, models.Book.is_active == True)
).outerjoin(
    book_count_alias, book_count_alias.c.publisher_id == models.Publisher.id
)
query = sa.select([cls._model, sa.func.count(models.Book.id).label('is_active'),
                   sa.func.count(book_count_alias.c.id).label('book_count')])
if q:
    query = query.where(cls._model.name.ilike(f'%{q}%'))
query = query.select_from(join).group_by(cls._model.id)
results = await database.fetch_all(query)
return [cls._get_parsed_object(x, schema=schemas.PublisherWithBookCountAndIsActive) for x in results]
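The SQL in this question joins books only once and counts the active rows with a CASE expression, whereas the attempt above joins books twice, which may inflate both counts. A minimal sketch of the single-join form follows. It assumes SQLAlchemy 1.4 or later, and the table definitions, the sample rows and the in-memory SQLite engine are hypothetical stand-ins for the real models.

```python
import sqlalchemy as sa

metadata = sa.MetaData()

# Hypothetical minimal tables standing in for the Publisher and Book models.
publishers = sa.Table(
    "publishers", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("name", sa.String),
)
books = sa.Table(
    "books", metadata,
    sa.Column("id", sa.Integer, primary_key=True),
    sa.Column("publisher_id", sa.ForeignKey("publishers.id")),
    sa.Column("is_active", sa.Boolean),
)

# count(CASE WHEN books.is_active THEN 1 END): inactive or missing rows give NULL,
# and count() skips NULLs.
active_count = sa.func.count(
    sa.case((books.c.is_active == sa.true(), 1))
).label("is_active")

query = (
    sa.select(publishers, active_count, sa.func.count(books.c.id).label("book_count"))
    .select_from(publishers.outerjoin(books, books.c.publisher_id == publishers.c.id))
    .group_by(publishers.c.id)
)

# Try it against an in-memory SQLite database with made-up rows.
engine = sa.create_engine("sqlite://")
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(publishers.insert(), [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Empty"}])
    conn.execute(books.insert(), [
        {"publisher_id": 1, "is_active": True},
        {"publisher_id": 1, "is_active": False},
    ])
    rows = conn.execute(query).fetchall()

# Map publisher name -> (active books, all books)
counts = {row.name: (row.is_active, row.book_count) for row in rows}
print(counts)
```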

SQL statement to sqlalchemy ORM query API

I have the following SQL statement, which works as expected, but I want to do the same thing using the SQLAlchemy Query API. I tried the following, but it returns an empty result. Any idea how I can build this SQL statement by composing Query API operations?
The raw SQL statement is:
SELECT COUNT(mid), mname
FROM(
SELECT missions._id AS mid, missions.name AS mname
FROM missions
INNER JOIN mission_ownership
ON missions._id = mission_ownership.mission_id
INNER JOIN mission_agencies
ON mission_agencies._id = mission_ownership.mission_agency_id
WHERE mission_agencies.name = 'Nasa'
)
GROUP BY mid
HAVING COUNT(mid) > 1
What I currently have using the ORM Query API:
nasa_and_esa_missions = session.query(func.count(Mission._id), Mission).\
join(mission_ownership). \
join(MissionAgency).\
filter(MissionAgency.name == 'Nasa').\
group_by(Mission._id).\
having(func.count(Mission._id) > 1)

If no relationship has been configured between mission_ownership and mission_agency at the ORM level, this can be done by modelling the inner SELECT as a subquery:
subq = (session.query(Mission._id.label('mid'), Mission.name.label('mname'))
        .join(mission_ownership)
        .join(MissionAgency)
        .filter(MissionAgency.name == 'Nasa')
        .subquery())
q = (session.query(subq.c.mid, Mission)
     .group_by(subq.c.mid)
     .having(sa.func.count(subq.c.mid) > 1))
for id_, m in q:
    print(id_, m.name)
Which generates this SQL:
SELECT anon_1.mid AS anon_1_mid, missions._id AS missions__id, missions.name AS missions_name
FROM (SELECT missions._id AS mid, missions.name AS mname FROM missions
JOIN mission_ownership ON missions._id = mission_ownership.mission_id
JOIN mission_agencies ON mission_agencies._id = mission_ownership.mission_agency_id
WHERE mission_agencies.name = ?) AS anon_1, missions
GROUP BY anon_1.mid
HAVING count(anon_1.mid) > ?
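Note that the generated SQL above lists missions a second time in the outer FROM clause, next to the subquery, which differs from the raw statement in the question. A variant that selects only from the subquery's columns might look like the sketch below. It uses SQLAlchemy Core with lightweight table constructs instead of the ORM models, assumes SQLAlchemy 1.4 or later, and only renders the statement as a string.

```python
import sqlalchemy as sa

# Lightweight stand-ins for the mapped tables (only the columns used here).
missions = sa.table("missions", sa.column("_id"), sa.column("name"))
mission_ownership = sa.table(
    "mission_ownership", sa.column("mission_id"), sa.column("mission_agency_id")
)
mission_agencies = sa.table("mission_agencies", sa.column("_id"), sa.column("name"))

# Inner SELECT with both joins and the agency filter.
subq = (
    sa.select(missions.c["_id"].label("mid"), missions.c["name"].label("mname"))
    .select_from(
        missions
        .join(mission_ownership, missions.c["_id"] == mission_ownership.c.mission_id)
        .join(mission_agencies,
              mission_agencies.c["_id"] == mission_ownership.c.mission_agency_id)
    )
    .where(mission_agencies.c["name"] == "Nasa")
    .subquery()
)

# Outer SELECT references only the subquery, so no second FROM entry appears.
query = (
    sa.select(sa.func.count(subq.c.mid), subq.c.mname)
    .group_by(subq.c.mid)
    .having(sa.func.count(subq.c.mid) > 1)
)
sql = str(query)
print(sql)
```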

read_sql query returns an empty dataframe after I pass parameters as a dict in python pandas

I am trying to parameterize some parts of a SQL Query using the below dictionary:
query_params = dict(
{'target':'status',
'date_from':'201712',
'date_to':'201805',
'drform_target':'NPA'
})
sql_data_sample = str("""select *
from table_name
where dt = %(date_to)s
and %(target)s in (%(drform_target)s)
----------------------------------------------------
union all
----------------------------------------------------
(select *
from table_name
where dt = %(date_from)s
and %(target)s in ('ACT')
order by random() limit 50000);""")
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)
However this returns a dataframe with no records at all. I am not sure what the error is since no error is being thrown.
df_data_sample.shape
Out[7]: (0, 1211)
The final PostgreSql query would be:
select *
from table_name
where dt = '201805'
and status in ('NPA')
----------------------------------------------------
union all
----------------------------------------------------
(select *
from table_name
where dt = '201712'
and status in ('ACT')
order by random() limit 50000);-- This part of random() is only for running it on my local and not on server.
Below is a small sample of data for replication. The original data has more than a million records and 1211 columns
service_change_3m  service_change_6m  dt      grp_m2     status
0                  -2                 201805  $50-$75    NPA
0                  0                  201805  < $25      NPA
0                  -1                 201805  $175-$200  ACT
0                  0                  201712  $150-$175  ACT
0                  0                  201712  $125-$150  ACT
-1                 1                  201805  $50-$75    NPA
Can someone please help me with this?
UPDATE:
Based on the suggestion by @shmee, I am now using:
target = 'status'
query_params = dict(
{
'date_from':'201712',
'date_to':'201805',
'drform_target':'NPA'
})
sql_data_sample = str("""select *
from table_name
where dt = %(date_to)s
and {0} in (%(drform_target)s)
----------------------------------------------------
union all
----------------------------------------------------
(select *
from table_name
where dt = %(date_from)s
and {0} in ('ACT')
order by random() limit 50000);""").format(target)
df_data_sample = pd.read_sql(sql_data_sample,con = cnxn,params = query_params)

Yes, I am quite confident that your issue results from trying to set column names in your query via parameter binding (and %(target)s in ('ACT')) as mentioned in the comments.
This results in your query restricting the result set to records where 'status' in ('ACT') (i.e. Is the string 'status' an element of a list containing only the string 'ACT'?). This is, of course, false, hence no record gets selected and you get an empty result.
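The effect can be reproduced on a small scale. The sketch below uses the standard-library sqlite3 module as a stand-in for the PostgreSQL connection, so the placeholder syntax is `:name` rather than `%(name)s`, and the three sample rows are made up. The binding behaviour it illustrates is the same: a bound parameter is always a value, never a column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (dt TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?)",
    [("201805", "NPA"), ("201805", "ACT"), ("201712", "ACT")],
)

# Column name passed as a bound parameter: the database compares the
# string 'status' with 'ACT', which is never true.
bound = conn.execute(
    "SELECT * FROM table_name WHERE dt = :dt AND :target IN ('ACT')",
    {"dt": "201712", "target": "status"},
).fetchall()

# Column name written into the SQL text: the column's values are compared.
literal = conn.execute(
    "SELECT * FROM table_name WHERE dt = :dt AND status IN ('ACT')",
    {"dt": "201712"},
).fetchall()

print(bound)    # no rows
print(literal)  # the matching 201712 row
```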
This should work as expected:
from psycopg2 import sql
col_name = 'status'
# Schema and table are given separately so each part is quoted on its own
# (passing several strings to sql.Identifier needs psycopg2 2.8 or later)
table_name = ('public', 'churn_data')
query_params = {'date_from':'201712',
'date_to':'201805',
'drform_target':'NPA'
}
sql_data_sample = """select *
from {0}
where dt = %(date_to)s
and {1} in (%(drform_target)s)
----------------------------------------------------
union all
----------------------------------------------------
(select *
from {0}
where dt = %(date_from)s
and {1} in ('ACT')
order by random() limit 50000);"""
sql_data_sample = sql.SQL(sql_data_sample).format(sql.Identifier(*table_name),
                                                  sql.Identifier(col_name))
# Render to a plain string; this assumes cnxn is a psycopg2 connection
df_data_sample = pd.read_sql(sql_data_sample.as_string(cnxn), con=cnxn, params=query_params)

SQLAlchemy reference a subquery from a case expression

I have a hybrid_property that returns a string based on some calculations made on a one-to-many relationship.
Here is the raw SQL for the hybrid_property expression:
SELECT
CASE
WHEN s.quantity_received = 0 THEN "unreceived"
WHEN s.dif = 0.0 THEN "received"
WHEN s.dif > 0.0 THEN "partially_received"
WHEN s.dif < 0.0 THEN "over_received"
END as status
FROM (
SELECT li.quantity_received, sum(li.quantity - li.received) as 'dif'
FROM line_items as li
WHERE li.o_id = xxx
) as s
The Models
class LineItem(BaseModel):
    __table__ = Table('line_items', autoload=True)
    order = relationship("Order", backref="line_items", primaryjoin="Order.id == foreign(LineItem.o_id)")
class Order(BaseModel):
    __table__ = Table('orders', autoload=True)
    @hybrid_property
    def status(self):
        qty_received, qty_ordered = 0, 0
        for li in self.line_items:
            if li.status != "cancelled":
                qty_ordered += li.quantity
                qty_received += li.quantity_received
        if qty_received == 0:
            status = "unreceived"
        elif qty_received == qty_ordered:
            status = "received"
        elif qty_received < qty_ordered:
            status = "partially_received"
        elif qty_received > qty_ordered:
            status = "over_received"
        return status
    @status.expression
    def status(cls):
        line_items_calc = select([LineItem.quantity_received,
                                  func.sum(LineItem.quantity - LineItem.quantity_received).label('dif')]) \
            .where(and_(LineItem.o_id == Order.id,
                        or_(LineItem.fulfillment_status != "cancelled",
                            LineItem.fulfillment_status == None))) \
            .alias()
        qq = select([
            case([
                (line_items_calc.c.quantity_received == 0, "unreceived"),
                (line_items_calc.c.dif == 0, "received"),
                (line_items_calc.c.dif > 0, "partially_received"),
                (line_items_calc.c.dif < 0, "over_received")]
            )]) \
            .select_from(line_items_calc) \
            .as_scalar()
        return qq
I have 2 orders, o1 and o2 with line items:
LineItem(o_id=o1.id, quantity=1, quantity_received=1)
LineItem(o_id=o2.id, quantity=1, quantity_received=0)
LineItem(o_id=o2.id, quantity=2, quantity_received=1)
Order1 should have status "received" and Order2 should have "partially_received".
But when I query for "received" I get nothing and when querying for "partially_received" I get 2 results instead of one.
It looks like it is not filtering the LineItems by Order.id and so it uses all to calculate the status (since total_qty would be 4 and total received would be 2, which will give "partially_received")
Order.query().filter(Order.status == 'received').all() # returns []
Order.query().filter(Order.status == 'partially_received').all() # returns [Order1, Order2]
If I add .correlate_except(LineItem) to the line_items_calc query, I get the following error:
OperationalError: (_mysql_exceptions.OperationalError) (1054,
"Unknown column 'orders.id' in 'where clause'") [SQL: u'SELECT
count(*) AS count_1 \nFROM (SELECT * \nFROM orders \nWHERE
orders.account_id = %s AND (SELECT CASE WHEN (a_3.quantity_received =
%s) THEN %s WHEN (a_3.dif = %s) THEN %s WHEN (a_3.dif > %s) THEN %s
WHEN (a_3.dif < %s) THEN %s END AS a_2 \nFROM (SELECT
line_items.quantity_received AS quantity_received,
sum(line_items.quantity - line_items.quantity_received) AS dif \nFROM
line_items \nWHERE line_items.o_id = orders.id AND
(line_items.fulfillment_status != %s OR line_items.fulfillment_status
IS NULL)) AS a_3) = %s) AS a_1'] [parameters: (1L, 0, 'unreceived', 0,
'received', 0, 'partially_received', 0, 'over_received', 'cancelled',
u'over_received')]

It seems you are trying to correlate the expression to the outermost query, but the nested subquery approach is not feasible in MySQL, because MySQL does not allow correlated subqueries in the FROM clause at all. Some other databases only disallow correlating with previous FROM list items unless LATERAL is used.
On the other hand the nested subquery is redundant, since you can use aggregates in a CASE expression in the SELECT list, but in your current subquery you mix non-aggregate and aggregate expressions:
SELECT li.quantity_received, sum(li.quantity - li.received) as 'dif'
which is more than likely not what you wanted. Some other databases would not even allow such a query to execute, but MySQL silently picks a value for li.quantity_received from an unspecified row in the group if ONLY_FULL_GROUP_BY is disabled. It has been enabled by default since MySQL 5.7.5, and you should consider enabling it. Judging by the other half of your hybrid property, you probably meant to take the sum of the received quantity as well.
Below is a version of status expression that fulfills the 2 test cases you've presented in your question:
@status.expression
def status(cls):
    qty_received = func.coalesce(func.sum(LineItem.quantity_received), 0)
    qty_ordered = func.coalesce(func.sum(LineItem.quantity), 0)
    return select([case([
        (qty_received == 0, "unreceived"),
        (qty_received == qty_ordered, "received"),
        (qty_received < qty_ordered, "partially_received"),
        (qty_received > qty_ordered, "over_received")])]).\
        where(and_(func.coalesce(LineItem.fulfillment_status, "") != "cancelled",
                   LineItem.o_id == cls.id)).\
        correlate_except(LineItem).\
        as_scalar()
I believe it's a closer representation of the Python side approach than your original. Note the use of COALESCE for NULL handling.
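To see what the COALESCE calls are for, the same CASE logic can be run as plain SQL. The sketch below uses the standard-library sqlite3 module rather than MySQL and the three line items from the question; the table layout and the helper name `status_for` are assumptions made for illustration. Without COALESCE, SUM over an order with no line items would be NULL, and a NULL fulfillment_status would fail the `!= 'cancelled'` test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line_items ("
    "o_id INTEGER, quantity INTEGER, quantity_received INTEGER, fulfillment_status TEXT)"
)
# The three line items from the question; fulfillment_status is left NULL.
conn.executemany(
    "INSERT INTO line_items VALUES (?, ?, ?, NULL)",
    [(1, 1, 1), (2, 1, 0), (2, 2, 1)],
)

STATUS_SQL = """
SELECT CASE
    WHEN COALESCE(SUM(quantity_received), 0) = 0 THEN 'unreceived'
    WHEN COALESCE(SUM(quantity_received), 0) = COALESCE(SUM(quantity), 0) THEN 'received'
    WHEN COALESCE(SUM(quantity_received), 0) < COALESCE(SUM(quantity), 0) THEN 'partially_received'
    WHEN COALESCE(SUM(quantity_received), 0) > COALESCE(SUM(quantity), 0) THEN 'over_received'
END
FROM line_items
WHERE COALESCE(fulfillment_status, '') != 'cancelled' AND o_id = ?
"""

def status_for(order_id):
    """Return the status string for one order (hypothetical helper)."""
    return conn.execute(STATUS_SQL, (order_id,)).fetchone()[0]

print(status_for(1), status_for(2), status_for(3))
```

Order 3 has no line items at all, so both sums are NULL before COALESCE turns them into 0.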

Python MySQLdb SELECT not returning proper value

Here's the code I'm working on:
poljeID = int(cursor.execute("SELECT poljeID FROM stanje"))
xkoord = cursor.execute("SELECT xkoord FROM polje WHERE poljeID = %s;", poljeID)
ykoord = cursor.execute("SELECT ykoord FROM polje WHERE poljeID = %s;", poljeID)
print xkoord, ykoord
It's a snippet from it, basically what it needs to do is fetch the ID of the field (poljeID) where an agent is currently on (stanje) and use it to get the x and y coordinates of that field (xkoord, ykoord).
The initial values for the variables are:
poljeID = 1
xkoord = 0
ykoord = 0
The values that I get with that code are:
poljeID = 1
xkoord = 1
ykoord = 1
What am I doing wrong?

cursor.execute does not return the result of the query, it returns the number of rows affected. To get the result, you need to do cursor.fetchone() (or cursor.fetchall()) for each query.
(Note: the second and third queries should really be combined into one: SELECT xkoord, ykoord FROM ...)
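The execute-then-fetch pattern can be sketched as follows. The example uses the standard-library sqlite3 module so that it runs without a MySQL server, which brings two differences from MySQLdb: the placeholder is `?` instead of `%s`, and sqlite3's execute() returns the cursor rather than a row count. The table contents are the initial values given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Recreate the two tables with the initial values from the question.
cursor.execute("CREATE TABLE stanje (poljeID INTEGER)")
cursor.execute("CREATE TABLE polje (poljeID INTEGER, xkoord INTEGER, ykoord INTEGER)")
cursor.execute("INSERT INTO stanje VALUES (1)")
cursor.execute("INSERT INTO polje VALUES (1, 0, 0)")

# Run the query, then fetch the row; the return value of execute() is not the data.
cursor.execute("SELECT poljeID FROM stanje")
(polje_id,) = cursor.fetchone()

# Both coordinates come back from a single query.
cursor.execute("SELECT xkoord, ykoord FROM polje WHERE poljeID = ?", (polje_id,))
xkoord, ykoord = cursor.fetchone()

print(xkoord, ykoord)
```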
