python traverse CTE in a double for loop? - python

I have 2 for loops within each-other. For each row 'A', 'B', 'C' in loop1, I need to access the hierarchical tree to find all the parents of a group 'X' in loop2. This makes me use CTE where I need to find the path for each row separately. Using CTE in a loop is not a solution for sure where I can match for each group id. Referred this link, but could not make out much Looping hierarchy CTE
Code snippet for the cron job using flask framework:
s = select([rt_issues]).\
where(
and_(
rt_issues.c.status !='Closed',
rt_issues.c.assigned_to != None
))
rs = conn.execute(s)
if rs.rowcount > 0:
s4 = text('with recursive rec_grp as(select id, parent_id, name, head, 1 as level, array[id] as path_info from groups union all select grp1.id, grp1.parent_id, grp1.name, grp1.head, rc.level + 1, rc.path_info||grp1.id from groups grp1 join rec_grp rc on grp1.id = rc.parent_id) select distinct id, parent_id, name, head, path_info from rec_grp order by id')
rs4 = conn.execute(s4)
for r in rs:
head_list = []
hierarchical_grps = []
for rr in rs4:
if ((rr['path_info'][0] == r[rt_issues.c.assignee_group])):
for g in rr['path_info']:
hierarchical_grps.append(g)
hierarchical_grps = list(set(hierarchical_grps))
send_pending_mail(hierarchical_grps, r['id'])
print hierarchical_grps, 'hierarchical_grps'
exit(0)
I need to send mail to all the group heads for the assignee_group in the hierarchy for the issue. How can this be achieved. How to use the loops correctly? I am using sqlalchemy core only, postgresql, python with flask. I need the exact code for the same.
What works is the snippet below:
mgroup = None
s = select([rt_issues]).\
where(
and_(
rt_issues.c.status !='Closed',
rt_issues.c.assigned_to != None
))
rs = conn.execute(s)
if rs.rowcount > 0:
for r in rs:
head_list = []
hierarchical_grps = []
mgroup = r[rt_issues.c.assignee_group]
s4 = text('with recursive rec_grp as(select id, parent_id, name, head, 1 as level, array[id] as path_info from groups where id=' +str(mgroup) + 'union all select grp1.id, grp1.parent_id, grp1.name, grp1.head, rc.level + 1, rc.path_info||grp1.id from groupsgrp1 join rec_grp rc on grp1.id = rc.parent_id) select distinct id,parent_id, name, head, path_info from rec_grp order by id')
rs4 = conn.execute(s4)
for rr in rs4:
if ((rr['path_info'][0] == r[rt_issues.c.assignee_group])):
for g in rr['path_info']:
hierarchical_grps.append(g)
hierarchical_grps = list(set(hierarchical_grps))
print hierarchical_grps, 'hierarchical_grps'
send_pending_mail(hierarchical_grps, r['id'])
exit(0)

Assuming that the head column is boolean, this will collect the groups with the head flag set:
rs4 = con.execute(s4)
for rr in rs4:
if rr['head']:
head_list.append(rr['id'])
print 'group heads:', head_list
This is assuming the query from your second example is used (note the correction to the from clause, "from groupsgrp1" should be "from groups grp1"):
WITH RECURSIVE rec_grp AS (
SELECT
id,
parent_id,
name,
head,
1 AS level,
ARRAY [id] AS path_info
FROM groups
WHERE id = 4
UNION ALL
SELECT
grp1.id,
grp1.parent_id,
grp1.name,
grp1.head,
rc.level + 1,
rc.path_info || grp1.id
FROM groups grp1
JOIN rec_grp rc ON grp1.id = rc.parent_id
)
SELECT DISTINCT
id,
parent_id,
name,
head,
path_info
FROM rec_grp
ORDER BY id;

Related

SQL statement to sqlalchemy ORM query API

I have the following SQL statement which works as expected, but I want to do the same thing using the Query API of sqlalchemy, I tried the following but it returns empty. Any idea how I can get this SQL statement by composing the Query API operations?
The raw SQL statement is:
SELECT COUNT(mid), mname
FROM(
SELECT missions._id AS mid, missions.name AS mname
FROM missions
INNER JOIN mission_ownership
ON missions._id = mission_ownership.mission_id
INNER JOIN mission_agencies
ON mission_agencies._id = mission_ownership.mission_agency_id
WHERE mission_agencies.name = 'Nasa'
)
GROUP BY mid
HAVING COUNT(mid) > 1
What I currently have using the ORM Query API:
nasa_and_esa_missions = session.query(func.count(Mission._id), Mission).\
join(mission_ownership). \
join(MissionAgency).\
filter(MissionAgency.name == 'Nasa').\
group_by(Mission._id).\
having(func.count(Mission._id) > 1)
If no relationship has been configured between mission_ownership and mission_agency at the ORM level, this can be done by modelling the inner SELECT as a subquery:
subq = (session.query(Mission._id.label('mid'), Mission.name.label('mname'))
.join(mission_ownership)
.join(MissionAgency)
.filter(MissionAgency.name == 'Nasa')
.subquery())
q = (session.query(subq.c.mid, Mission)
.group_by(subq.c.mid)
.having(sa.func.count(subq.c.mid) > 1))
for id_, m in q:
print(id_, m.name)
Which generates this SQL:
SELECT anon_1.mid AS anon_1_mid, missions._id AS missions__id, missions.name AS missions_name
FROM (SELECT missions._id AS mid, missions.name AS mname FROM missions
JOIN mission_ownership ON missions._id = mission_ownership.mission_id
JOIN mission_agencies ON mission_agencies._id = mission_ownership.mission_agency_id
WHERE mission_agencies.name = ?) AS anon_1, missions
GROUP BY anon_1.mid
HAVING count(anon_1.mid) > ?

Python variable in long SQL string

What is a safe way to replace the number in the second-to-last line of this SQL query with a variable?
Say my variable is customer_id. Can I use {} in place of 2 and put .format(customer_id) at the end of this string?
unlicensed_query = """
SELECT SUM(x.quantity), SUM(x.quantity * p.list_price)
FROM (
SELECT cu.customer_id, cu.product_id, cu.quantity
FROM csi_usage cu LEFT JOIN csi c
ON cu.customer_id = c.customer_id
AND cu.product_id = c.product_id
WHERE c.product_id IS NULL
AND cu.customer_id = 2) x, product p
WHERE x.product_id = p.id;
"""
As stated by thebjorn, the correct way to do this is to use bound parameters (http://docs.sqlalchemy.org/en/latest/core/tutorial.html#specifying-bound-parameter-behaviors). An example is here:
from sqlalchemy.sql import text
fully_utilized_query = text("""
SELECT SUM(x.quantity)
FROM (
SELECT cu.customer_id, cu.product_id, cu.quantity
FROM csi_usage cu
JOIN csi c
ON cu.customer_id = c.customer_id
AND cu.product_id = c.product_id
AND cu.quantity = c.licence_qty
WHERE cu.customer_id = :customer_id) x;
""")
fully_utilized = self.session.execute(fully_utilized_query, {'customer_id': current_user.customer_id}).scalar()

Build a dynamic update query in psycopg2

I have to construct a dynamic update query for postgresql.
Its dynamic, because beforehand I have to determine which columns to update.
Given a sample table:
create table foo (id int, a int, b int, c int)
Then I will construct programmatically the "set" clause
_set = {}
_set['a'] = 10
_set['c'] = NULL
After that I have to build the update query. And here I'm stuck.
I have to construct this sql Update command:
update foo set a = 10, b = NULL where id = 1
How to do this with the psycopg2 parametrized command? (i.e. looping through the dict if it is not empty and build the set clause) ?
UPDATE
While I was sleeping I have found the solution by myself. It is dynamic, exactly how I wanted to be :-)
create table foo (id integer, a integer, b integer, c varchar)
updates = {}
updates['a'] = 10
updates['b'] = None
updates['c'] = 'blah blah blah'
sql = "upgrade foo set %s where id = %s" % (', '.join("%s = %%s" % u for u in updates.keys()), 10)
params = updates.values()
print cur.mogrify(sql, params)
cur.execute(sql, params)
And the result is what and how I needed (especially the nullable and quotable columns):
"upgrade foo set a = 10, c = 'blah blah blah', b = NULL where id = 10"
There is actually a slightly cleaner way to make it, using the alternative column-list syntax:
sql_template = "UPDATE foo SET ({}) = %s WHERE id = {}"
sql = sql_template.format(', '.join(updates.keys()), 10)
params = (tuple(addr_dict.values()),)
print cur.mogrify(sql, params)
cur.execute(sql, params)
Using psycopg2.sql – SQL string composition module
The module contains objects and functions useful to generate SQL dynamically, in a convenient and safe way.
from psycopg2 import connect, sql
conn = connect("dbname=test user=postgres")
upd = {'name': 'Peter', 'age': 35, 'city': 'London'}
ref_id = 12
sql_query = sql.SQL("UPDATE people SET {data} WHERE id = {id}").format(
data=sql.SQL(', ').join(
sql.Composed([sql.Identifier(k), sql.SQL(" = "), sql.Placeholder(k)]) for k in upd.keys()
),
id=sql.Placeholder('id')
)
upd.update(id=ref_id)
with conn:
with conn.cursor() as cur:
cur.execute(sql_query, upd)
conn.close()
Running print(sql_query.as_string(conn)) before closing connection will reveal this output:
UPDATE people SET "name" = %(name)s, "age" = %(age)s, "city" = %(city)s WHERE id = %(id)s
No need for dynamic SQL. Supposing a is not nullable and b is nullable.
If you want to update both a and b:
_set = dict(
id = 1,
a = 10,
b = 20, b_update = 1
)
update = """
update foo
set
a = coalesce(%(a)s, a), -- a is not nullable
b = (array[b, %(b)s])[%(b_update)s + 1] -- b is nullable
where id = %(id)s
"""
print cur.mogrify(update, _set)
cur.execute(update, _set)
Output:
update foo
set
a = coalesce(10, a), -- a is not nullable
b = (array[b, 20])[1 + 1] -- b is nullable
where id = 1
If you want to update none:
_set = dict(
id = 1,
a = None,
b = 20, b_update = 0
)
Output:
update foo
set
a = coalesce(NULL, a), -- a is not nullable
b = (array[b, 20])[0 + 1] -- b is nullable
where id = 1
An option without python format using psycopg2's AsIs function for column names (although that doesn't prevent you from SQL injection over column names). Dict is named data:
update_statement = f'UPDATE foo SET (%s) = %s WHERE id_column=%s'
columns = data.keys()
values = [data[column] for column in columns]
query = cur.mogrify(update_statement, (AsIs(','.join(columns)), tuple(values), id_value))
Here's my solution that I have within a generic DatabaseHandler class that provides a lot of flexibility when using pd.DataFrame as your source.
def update_data(
self,
table: str,
df: pd.DataFrame,
indexes: Optional[list] = None,
column_map: Optional[dict] = None,
commit: Optional[bool] = False,
) -> int:
"""Update data in the media database
Args:
table (str): the "tablename" or "namespace.tablename"
df (pandas.DataFrame): dataframe containing the data to update
indexes (list): the list of columns in the table that will be in the WHERE clause of the update statement.
If not provided, will use df indexes.
column_map (dict): dictionary mapping the columns in df to the columns in the table
columns in the column_map that are also in keys will not be updated
Key = df column.
Value = table column.
commit (bool): if True, the transaction will be committed (default=False)
Notes:
If using a column_map, only the columns in the data_map will be updated or used as indexes.
Order does not matter. If not using a column_map, all columns in df must exist in table.
Returns:
int : rows updated
"""
try:
if not indexes:
# Use the dataframe index instead
indexes = []
for c in df.index.names:
if not c:
raise Exception(
f"Dataframe contains indexes without names. Unable to determine update where clause."
)
indexes.append(c)
update_strings = []
tdf = df.reset_index()
if column_map:
target_columns = [c for c in column_map.keys() if c not in indexes]
else:
column_map = {c: c for c in tdf.columns}
target_columns = [c for c in df.columns if c not in indexes]
for i, r in tdf.iterrows():
upd_params = ", ".join(
[f"{column_map[c]} = %s" for c in target_columns]
)
upd_list = [r[c] if pd.notna(r[c]) else None for c in target_columns]
upd_str = self._cur.mogrify(upd_params, upd_list).decode("utf-8")
idx_params = " AND ".join([f"{column_map[c]} = %s" for c in indexes])
idx_list = [r[c] if pd.notna(r[c]) else None for c in indexes]
idx_str = self._cur.mogrify(idx_params, idx_list).decode("utf-8")
update_strings.append(f"UPDATE {table} SET {upd_str} WHERE {idx_str};")
full_update_string = "\n".join(update_strings)
print(full_update_string) # Debugging
self._cur.execute(full_update_string)
rowcount = self._cur.rowcount
if commit:
self.commit()
return rowcount
except Exception as e:
self.rollback()
raise e
Example usages:
>>> df = pd.DataFrame([
{'a':1,'b':'asdf','c':datetime.datetime.now()},
{'a':2,'b':'jklm','c':datetime.datetime.now()}
])
>>> cls.update_data('my_table', df, indexes = ['a'])
UPDATE my_table SET b = 'asdf', c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1;
UPDATE my_table SET b = 'jklm', c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2;
>>> cls.update_data('my_table', df, indexes = ['a','b'])
UPDATE my_table SET c = '2023-01-17T22:13:37.095245'::timestamp WHERE a = 1 AND b = 'asdf';
UPDATE my_table SET c = '2023-01-17T22:13:37.095250'::timestamp WHERE a = 2 AND b = 'jklm';
>>> cls.update_data('my_table', df.set_index('a'), column_map={'a':'db_a','b':'db_b','c':'db_c'} )
UPDATE my_table SET db_b = 'asdf', db_c = '2023-01-17T22:13:37.095245'::timestamp WHERE db_a = 1;
UPDATE my_table SET db_b = 'jklm', db_c = '2023-01-17T22:13:37.095250'::timestamp WHERE db_a = 2;
Note however that this is not safe from SQL injection due to the way it generates the where clause.

Sqlalchemy: How to perform outer join with itself?

I want to perform this SQL query using Sqlalchemy (with model Evaluation):
select e1.user, sum(e1.points) as s from
(select e1.*
from evaluations e1 left outer join evaluations e2
on (e1.user = e2.user and e1.module = e2.module and e1.time < e2.time)
where e2.user is null and e1.module in (__another subquery__))
group by e1.user order by s limit 5
I don't know how to perform left outer join (especialy the renaming and referencing of renamed comlumns). Could you help me?
# sample sub-query for testing
_another_query = session.query(Evaluation.module).filter(Evaluation.module > 3)
# define aliases
E1 = aliased(Evaluation, name="e1")
E2 = aliased(Evaluation, name="e2")
# inner query
sq = (
session
# .query(E1)
# select columns explicitely to control labels
.query(E1.user.label("user"), E1.points.label("points"))
.outerjoin(E2, and_(
E1.user == E2.user,
E1.module == E2.module,
E1.time < E2.time,
))
.filter(E2.user == None)
.filter(E1.module.in_(_another_query))
)
sq = sq.subquery(name="sq")
# now lets group by
q = (
session
.query(sq.c.user, func.sum(sq.c.points))
.group_by(sq.c.user)
.order_by(func.sum(sq.c.points))
.limit(5)
)

Insert tree kind of data taken from a database into a python dictionary

I have a database table as follows. The data is in the form of a tree with
CREATE TABLE IF NOT EXISTS DOMAIN_HIERARCHY (
COMPONENT_ID INT NOT NULL ,
LEVEL INT NOT NULL ,
COMPONENT_NAME VARCHAR(127) NOT NULL ,
PARENT INT NOT NULL ,
PRIMARY KEY ( COMPONENT_ID )
);
The following data is in the table
(1,1,'A',0)
(2,2,'AA',1)
(3,2,'AB',1)
(4,3,'AAA',2)
(5,3,'AAB',2)
(6,3,'ABA',3)
(7,3,'ABB',3)
I have to retrieve the data and store in a python dictionary
I wrote the below code
conx = sqlite3.connect( 'nameofdatabase.db' )
curs = conx.cursor()
curs.execute( 'SELECT COMPONENT_ID, LEVEL, COMPONENT_NAME, PARENT FROM DOMAIN_HIERARCHY' )
rows = curs.fetchall()
cmap = {}
for row in rows:
cmap[row[0]] = row[2]
hrcy={}
for level in range( 1, maxl + 1 ):
for row in rows:
if row[1] == level:
if hrcy == {}:
hrcy[row[2]] = []
continue
parent = cmap[row[3]]
hrcy[parent].append( { row[2]: [] } )
The problem I'm facing is for nodes more than 2nd level ,they are getting added to the root instead of their parent ; where should I do the change in the code?
The problem is that you can't directly see the nodes for the second level after you insert them. Try this:
conx = sqlite3.connect( 'nameofdatabase.db' )
curs = conx.cursor()
curs.execute( 'SELECT COMPONENT_ID, LEVEL, COMPONENT_NAME, PARENT ' +
'FROM DOMAIN_HIERARCHY' )
rows = curs.fetchall()
cmap = {}
hrcy = None
for row in rows:
entry = (row[2], {})
cmap[row[0]] = entry
if row[1] == 1:
hrcy = {entry[0]: entry[1]}
# raise if hrcy is None
for row in rows:
item = cmap[row[0]]
parent = cmap.get(row[3], None)
if parent is not None:
parent[1][row[2]] = item[1]
print hrcy
By keeping each component's map of subcomponents in cmap, I can always reach each parent's map to add the next component to it. I tried it with the following test data:
rows = [(1,1,'A',0),
(2,2,'AA',1),
(3,2,'AB',1),
(4,3,'AAA',2),
(5,3,'AAB',2),
(6,3,'ABA',3),
(7,3,'ABB',3)]
The output was this:
{'A': {'AA': {'AAA': {}, 'AAB': {}}, 'AB': {'ABA': {}, 'ABB': {}}}}

Categories

Resources