Debugging: SQL inside Python Psycopg2

sql = "WITH users AS(SELECT * FROM stats.core_users cu LEFT JOIN XXXX.sent_hidden_users h USING(user_id)\
WHERE cu.status = 'hidden' AND h.user_id is null AND cu.country_code = 86 LIMIT 100)\
SELECT\
cu.user_id,\
CASE WHEN cu.gender = 'male' THEN 0 ELSE 1 END AS gender,\
CASE WHEN cu.looking_for_gender = cu.gender THEN 2 WHEN cu.looking_for_gender = 'both' THEN 1 ELSE 0 END AS sexual_orientation,\
CASE WHEN e2.os_name = 'iOS' THEN 0 ELSE 1 END AS device,\
ROUND((DATE(NOW()) - cu.birthdate)/365.25) AS user_age,\
SUM(dsb.likes) AS likes,\
SUM(dsb.dislikes) AS dislikes,\
SUM(dsb.blocks) AS blocks,\
SUM(dsb.matches) AS matches,\
SUM(dsb.received_likes) AS received_likes,\
SUM(dsb.received_dislikes) AS received_dislikes,\
SUM(dsb.received_blocks) AS received_blocks,\
cu.search_radius,\
cu.search_min_age,\
cu.search_max_age,\
'' AS recall_case,\
'' AS recall_retention\
FROM \
users cu\
LEFT JOIN \
yay.daily_swipes_by_users dsb ON (dsb.user_id = cu.user_id) \
LEFT JOIN LATERAL (\
SELECT \
cd.os_name \
FROM \
stats.core_devices cd \
WHERE \
cu.user_id = cd.user_id \
ORDER BY cd.updated_time DESC LIMIT 1) e2 ON TRUE \
GROUP BY 1,2,3,4,5,13,14,15,16,17\
;"
Error Information:
File "", line 5
sql = "WITH users AS(SELECT * FROM stats.core_users cu LEFT JOIN zhangqiao.sent_hidden_users h USING(user_id) WHERE cu.status = 'hidden' AND h.user_id is null AND cu.country_code = 86 LIMIT 100)SELECT cu.user_id, CASE WHEN cu.gender = 'male' THEN 0 ELSE 1 END AS gender, CASE WHEN cu.looking_for_gender = cu.gender THEN 2 WHEN cu.looking_for_gender = 'both' THEN 1 ELSE 0 END AS sexual_orientation, CASE WHEN e2.os_name = 'iOS' THEN 0 ELSE 1 END AS device, ROUND((DATE(NOW()) - cu.birthdate)/365.25) AS user_age, SUM(dsb.likes) AS likes, SUM(dsb.dislikes) AS dislikes, SUM(dsb.blocks) AS blocks, SUM(dsb.matches) AS matches,\
^
SyntaxError: EOL while scanning string literal

There seems to be a space after the \ in SUM(dsb.matches) AS matches,\. Get rid of that. As currently written, you are escaping the space with \ rather than the newline.
Your second error is because you need a space before the \ in this line:
'' AS recall_retention\
Because when you write:
'' AS recall_retention\
FROM \
users cu\
You get as a result:
'' AS recall_retentionFROM users cu
FROM is glued directly onto recall_retention, so the SQL no longer parses. Rather than mucking around with all these escapes, you can simplify your code by using a triple-quoted string (either ''' or """), like this:
sql = """WITH users AS(SELECT * FROM stats.core_users cu LEFT JOIN XXXX.sent_hidden_users h USING(user_id)
WHERE cu.status = 'hidden' AND h.user_id is null AND cu.country_code = 86 LIMIT 100)
SELECT
cu.user_id,
CASE WHEN cu.gender = 'male' THEN 0 ELSE 1 END AS gender,
CASE WHEN cu.looking_for_gender = cu.gender THEN 2 WHEN cu.looking_for_gender = 'both' THEN 1 ELSE 0 END AS sexual_orientation,
CASE WHEN e2.os_name = 'iOS' THEN 0 ELSE 1 END AS device,
ROUND((DATE(NOW()) - cu.birthdate)/365.25) AS user_age,
SUM(dsb.likes) AS likes,
SUM(dsb.dislikes) AS dislikes,
SUM(dsb.blocks) AS blocks,
SUM(dsb.matches) AS matches,
SUM(dsb.received_likes) AS received_likes,
SUM(dsb.received_dislikes) AS received_dislikes,
SUM(dsb.received_blocks) AS received_blocks,
cu.search_radius,
cu.search_min_age,
cu.search_max_age,
'' AS recall_case,
'' AS recall_retention
FROM
users cu
LEFT JOIN
yay.daily_swipes_by_users dsb ON (dsb.user_id = cu.user_id)
LEFT JOIN LATERAL (
SELECT
cd.os_name
FROM
stats.core_devices cd
WHERE
cu.user_id = cd.user_id
ORDER BY cd.updated_time DESC LIMIT 1) e2 ON TRUE
GROUP BY 1,2,3,4,5,13,14,15,16,17
;"""
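The gluing effect described above can be reproduced in isolation. This is a minimal sketch; the variable names glued and kept are illustrative only and do not appear in the question.

```python
# Backslash-newline inside a plain string literal removes the newline,
# so the two source lines are joined with nothing in between.
glued = "'' AS recall_retention\
FROM users cu"

# A triple-quoted literal keeps the newline, which SQL treats as whitespace.
kept = """'' AS recall_retention
FROM users cu"""

print(repr(glued))
print(repr(kept))
```

The first literal has no separator between recall_retention and FROM, while the second keeps a newline there, which is why the triple-quoted version is the less fragile choice for long SQL.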

Related

ASSERTION ERROR: Issue in running SQL query

Question #1
List all the directors who directed a 'Comedy' movie in a leap year. (You need to check that the genre is 'Comedy' and the year is a leap year.) Your query should return the director name, the movie name, and the year.
%%time
def grader_1(q1):
    q1_results = pd.read_sql_query(q1,conn)
    print(q1_results.head(10))
    assert (q1_results.shape == (232,3))
#m as movie , m_director as md,Genre as g,Person as p
query1 ="""SELECT m.Title,p.Name,m.year
FROM Movie m JOIN
M_director d
ON m.MID = d.MID JOIN
Person p
ON d.PID = p.PID JOIN
M_Genre mg
ON m.MID = mg.MID JOIN
Genre g
ON g.GID = mg.GID
WHERE g.Name LIKE '%Comedy%'
AND ( m.year%4 = 0
AND m.year % 100 <> 0
OR m.year % 400 = 0 ) LIMIT 2"""
grader_1(query1)
ERROR:
title Name year
0 Mastizaade Milap Zaveri 2016
1 Harold & Kumar Go to White Castle Danny Leiner 2004
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-17-a942fcc98f72> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'def grader_1(q1):\n q1_results = pd.read_sql_query(q1,conn)\n print(q1_results.head(10))\n assert (q1_results.shape == (232,3))\n\n#m as movie , m_director as md,Genre as g,Person as p\nquery1 ="""SELECT m.Title,p.Name,m.year\nFROM Movie m JOIN \n M_director d\n ON m.MID = d.MID JOIN \n Person p\n ON d.PID = p.PID JOIN\n M_Genre mg\n ON m.MID = mg.MID JOIN\n Genre g \n ON g.GID = mg.GID\n WHERE g.Name LIKE \'%Comedy%\'\nAND ( m.year%4 = 0\nAND m.year % 100 <> 0\nOR m.year % 400 = 0 ) LIMIT 2"""\ngrader_1(query1)')
2 frames
<decorator-gen-53> in time(self, line, cell, local_ns)
/usr/local/lib/python3.7/dist-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
1191 else:
1192 st = clock2()
-> 1193 exec(code, glob, local_ns)
1194 end = clock2()
1195 out = None
<timed exec> in <module>()
<timed exec> in grader_1(q1)
AssertionError:
I can run this SQL query on the IMDB dataset on its own, without the grader_1 function. However, when I run it through grader_1, I get an AssertionError.
How can I fix this?
Your query ends with LIMIT 2, so it returns only two rows, while the grader asserts that the result has shape (232, 3).
Just run it again without the LIMIT clause.
query1 = """ SELECT M.title,Pe.Name,M.year FROM Movie M JOIN M_Director MD ON M.MID = MD.MID JOIN M_Genre MG ON M.MID = MG.MID JOIN Genre Ge ON MG.GID = Ge.GID JOIN Person Pe ON MD.PID = Pe.PID WHERE Ge.Name LIKE '%Comedy%' AND CAST(SUBSTR(TRIM(M.year),-4) AS INTEGER) % 4 = 0 AND (CAST(SUBSTR(TRIM(M.year),-4) AS INTEGER) % 100 <> 0 OR CAST(SUBSTR(TRIM(M.year),-4) AS INTEGER) % 400 = 0) """
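Running this query should resolve the problem. It takes the last four characters of M.year and casts them to an integer before applying the leap-year test.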
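Both points from the answers above, the effect of LIMIT on the row count and the year parsing with SUBSTR/TRIM/CAST, can be tried on a toy table. This is only a sketch: the table contents are invented, and the value 'I 2004' is an assumed example of a year stored with a prefix, which the question does not confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year TEXT)")
# Invented rows; 'I 2004' stands in for a year stored with a text prefix.
conn.executemany("INSERT INTO Movie VALUES (?, ?)", [
    ("A", "2016"), ("B", "I 2004"), ("C", "1900"), ("D", "2000"), ("E", "2015"),
])

# Leap year: divisible by 4 and (not divisible by 100 or divisible by 400).
leap = """
    SELECT title FROM Movie
    WHERE CAST(SUBSTR(TRIM(year), -4) AS INTEGER) % 4 = 0
      AND (CAST(SUBSTR(TRIM(year), -4) AS INTEGER) % 100 <> 0
           OR CAST(SUBSTR(TRIM(year), -4) AS INTEGER) % 400 = 0)
    ORDER BY title
"""

all_rows = conn.execute(leap).fetchall()
limited = conn.execute(leap + " LIMIT 2").fetchall()

print(len(all_rows), len(limited))
```

A shape assertion like the grader's checks the full row count, so any LIMIT smaller than the expected count makes it fail even when the WHERE clause is correct.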

Avoid an SQL injection attack in my Python SQL API

I have designed a Python SQLite API which interfaces with a GUI. The GUI allows the user to select a given column whose data will be summed for each month. From what I have learned from https://docs.python.org/2/library/sqlite3.html I know that the way I’ve written this makes my code vulnerable to an SQL injection attack; I’ve assembled my query using Python’s string operations. However, I am unable to make this module work doing it the “right” way; using the DB-API’s parameter substitution to put a “?” as a placeholder wherever you want to use a value. I’m guessing the issue is that I want to make a table column the variable and not a value. Please help me to restructure this module so that it is more secure and less vulnerable to an SQL injection attack.
The code below works (it functions as I would like it to); I just know that it is not the correct or most secure way to do this.
def queryEntireCategoryAllEmployees(self, column):
table_column = 'Name_Data_AllDaySums.%s' % column
cursor = self.conn.execute("SELECT \
SUBSTR(data_date,1,7), \
SUM(%s) \
FROM ( \
SELECT \
SS_Installations.data_date AS 'data_date', \
SS_Installations.Installations_day_sum, \
SS_PM_Site_Visits.PM_Site_Visits_day_sum, \
SS_Rpr_Maint_Site_Visits.Inst_Repair_or_Maintenance_on_Site_day_sum, \
SS_Rmt_Hrdwr_Spt.Rmt_Hardware_Support_day_sum, \
SS_Rmt_Sftwr_Spt.Rmt_Software_Support_day_sum, \
SS_Rpr_Mant_RFB_in_House.Inst_Repair_Maint_Rfb_In_House_day_sum, \
Miscellaneous.Miscellaneous_day_sum, \
SS_Doc_Gen.Document_Generation_day_sum, \
SS_Inter_Dep_Spt.Inter_Dep_Spt_day_sum, \
SS_Online_Training.Online_Training_day_sum, \
SS_Onsite_Training.Onsite_Training_day_sum, \
SS_In_House_Training.In_House_Training_day_sum, \
Validation_Duties.Validation_Duties_day_sum \
FROM \
SS_Installations \
INNER JOIN SS_PM_Site_Visits ON \
SS_Installations.employee_clk_no = SS_PM_Site_Visits.employee_clk_no AND \
SS_Installations.data_date = SS_PM_Site_Visits.data_date \
INNER JOIN SS_Rpr_Maint_Site_Visits ON \
SS_Installations.employee_clk_no = SS_Rpr_Maint_Site_Visits.employee_clk_no AND \
SS_PM_Site_Visits.data_date = SS_Rpr_Maint_Site_Visits.data_date \
INNER JOIN SS_Rmt_Hrdwr_Spt ON \
SS_Installations.employee_clk_no = SS_Rmt_Hrdwr_Spt.employee_clk_no AND \
SS_Rpr_Maint_Site_Visits.data_date = SS_Rmt_Hrdwr_Spt.data_date \
INNER JOIN SS_Rmt_Sftwr_Spt ON \
SS_Installations.employee_clk_no = SS_Rmt_Sftwr_Spt.employee_clk_no AND \
SS_Rmt_Hrdwr_Spt.data_date = SS_Rmt_Sftwr_Spt.data_date \
INNER JOIN SS_Rpr_Mant_RFB_in_House ON \
SS_Installations.employee_clk_no = SS_Rpr_Mant_RFB_in_House.employee_clk_no AND \
SS_Rmt_Sftwr_Spt.data_date = SS_Rpr_Mant_RFB_in_House.data_date \
INNER JOIN Miscellaneous ON \
SS_Installations.employee_clk_no = Miscellaneous.employee_clk_no AND \
SS_Rpr_Mant_RFB_in_House.data_date = Miscellaneous.data_date \
INNER JOIN SS_Doc_Gen ON \
SS_Installations.employee_clk_no = SS_Doc_Gen.employee_clk_no AND \
Miscellaneous.data_date = SS_Doc_Gen.data_date \
INNER JOIN SS_Inter_Dep_Spt ON \
SS_Installations.employee_clk_no = SS_Inter_Dep_Spt.employee_clk_no AND \
SS_Doc_Gen.data_date = SS_Inter_Dep_Spt.data_date \
INNER JOIN SS_Online_Training ON \
SS_Installations.employee_clk_no = SS_Online_Training.employee_clk_no AND \
SS_Inter_Dep_Spt.data_date = SS_Online_Training.data_date \
INNER JOIN SS_Onsite_Training ON \
SS_Installations.employee_clk_no = SS_Onsite_Training.employee_clk_no AND \
SS_Online_Training.data_date = SS_Onsite_Training.data_date \
INNER JOIN SS_In_House_Training ON \
SS_Installations.employee_clk_no = SS_In_House_Training.employee_clk_no AND \
SS_Onsite_Training.data_date = SS_In_House_Training.data_date \
INNER JOIN Validation_Duties ON \
SS_Installations.employee_clk_no = Validation_Duties.employee_clk_no AND \
SS_In_House_Training.data_date = Validation_Duties.data_date \
WHERE \
(SS_Installations.Installations_day_sum != 0 OR \
SS_PM_Site_Visits.PM_Site_Visits_day_sum !=0 OR \
SS_Rpr_Maint_Site_Visits.Inst_Repair_or_Maintenance_on_Site_day_sum != 0 OR \
SS_Rmt_Hrdwr_Spt.Rmt_Hardware_Support_day_sum != 0 OR \
SS_Rmt_Sftwr_Spt.Rmt_Software_Support_day_sum != 0 OR \
SS_Rpr_Mant_RFB_in_House.Inst_Repair_Maint_Rfb_In_House_day_sum != 0 OR \
Miscellaneous.Miscellaneous_day_sum != 0 OR \
SS_Doc_Gen.Document_Generation_day_sum != 0 OR \
SS_Inter_Dep_Spt.Inter_Dep_Spt_day_sum != 0 OR \
SS_Online_Training.Online_Training_day_sum != 0 OR \
SS_Onsite_Training.Onsite_Training_day_sum != 0 OR \
SS_In_House_Training.In_House_Training_day_sum != 0 OR \
Validation_Duties.Validation_Duties_day_sum != 0)) Name_Data_AllDaySums \
GROUP BY SUBSTR(data_date,1,7) \
ORDER BY SUBSTR(data_date,1,7) ASC" % table_column)
dataList = cursor.fetchall()
return dataList
To start, I would read the very informative SO post on preventing SQL injection in PHP, as many of the principles apply: How can I prevent SQL Injection in PHP?
Additionally, if you were working with SQL Server, I would consider creating a stored procedure, running it with the EXEC command in T-SQL, and passing your column name as a parameter (since your query seems to change only in the column), similar to the MSSQL Docs example Execute a Stored Procedure and the SO thread on dynamically changing a query based on a parameter, Can I Pass Column Name As Input... Note that SQLite, which the question uses, has no stored procedures, so there the same check has to live in the Python code.
Doing it this way keeps the query text out of the caller's hands and secures it against injection attacks, because you can validate that the input matches what you expect.
Finally, consider using a drop-down list of columns so that the end user can only pick from a predefined set of inputs, which makes your application even more secure. This approach, like keeping the query in one place, also makes it easier to push out updates over time.
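The validation idea above can be sketched directly in Python with sqlite3. Placeholders such as ? can only stand for values, not for column names, so the column is checked against a fixed whitelist before it is interpolated. The table day_sums, the set ALLOWED_COLUMNS, and the function monthly_sum are hypothetical stand-ins for the much larger query in the question.

```python
import sqlite3

# Hypothetical whitelist: the only column names the GUI may request.
ALLOWED_COLUMNS = {"Installations_day_sum", "Miscellaneous_day_sum"}

def monthly_sum(conn, column):
    """Sum one whitelisted column per month (YYYY-MM)."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError("unexpected column: %r" % (column,))
    # Safe to interpolate: `column` is now one of our own constants.
    sql = (
        'SELECT SUBSTR(data_date, 1, 7), SUM("%s") '
        "FROM day_sums "
        "GROUP BY SUBSTR(data_date, 1, 7) "
        "ORDER BY SUBSTR(data_date, 1, 7) ASC" % column
    )
    return conn.execute(sql).fetchall()

# Toy data so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE day_sums (data_date TEXT, "
    "Installations_day_sum INTEGER, Miscellaneous_day_sum INTEGER)"
)
conn.executemany("INSERT INTO day_sums VALUES (?, ?, ?)", [
    ("2020-01-05", 2, 1), ("2020-01-20", 3, 0), ("2020-02-01", 4, 5),
])

result = monthly_sum(conn, "Installations_day_sum")
print(result)
```

If the GUI offers a drop-down, the same set can feed both the drop-down entries and the check, so the two cannot drift apart.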

Python variable in long SQL string

What is a safe way to replace the number in the second-to-last line of this SQL query with a variable?
Say my variable is customer_id. Can I use {} in place of 2 and put .format(customer_id) at the end of this string?
unlicensed_query = """
SELECT SUM(x.quantity), SUM(x.quantity * p.list_price)
FROM (
SELECT cu.customer_id, cu.product_id, cu.quantity
FROM csi_usage cu LEFT JOIN csi c
ON cu.customer_id = c.customer_id
AND cu.product_id = c.product_id
WHERE c.product_id IS NULL
AND cu.customer_id = 2) x, product p
WHERE x.product_id = p.id;
"""
As stated by thebjorn, the correct way to do this is to use bound parameters (http://docs.sqlalchemy.org/en/latest/core/tutorial.html#specifying-bound-parameter-behaviors). An example is here:
from sqlalchemy.sql import text
fully_utilized_query = text("""
SELECT SUM(x.quantity)
FROM (
SELECT cu.customer_id, cu.product_id, cu.quantity
FROM csi_usage cu
JOIN csi c
ON cu.customer_id = c.customer_id
AND cu.product_id = c.product_id
AND cu.quantity = c.licence_qty
WHERE cu.customer_id = :customer_id) x;
""")
fully_utilized = self.session.execute(fully_utilized_query, {'customer_id': current_user.customer_id}).scalar()
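To see why a bound parameter is safer than .format(), here is a small self-contained sketch. It uses the standard-library sqlite3 module instead of SQLAlchemy so that it runs without a database server; the table rows are invented, and only the :customer_id placeholder style is taken from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE csi_usage (customer_id INTEGER, product_id INTEGER, quantity INTEGER)"
)
conn.executemany("INSERT INTO csi_usage VALUES (?, ?, ?)", [
    (1, 10, 5), (2, 10, 3), (2, 11, 4),
])

# The value is sent separately from the SQL text, never spliced into it.
query = "SELECT SUM(quantity) FROM csi_usage WHERE customer_id = :customer_id"

total = conn.execute(query, {"customer_id": 2}).fetchone()[0]

# A hostile string is compared as plain data, so it matches no rows.
hostile = conn.execute(query, {"customer_id": "2 OR 1=1"}).fetchone()[0]

print(total, hostile)
```

With .format(), the second value would have become part of the SQL itself and turned the filter into a condition that is always true.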

python traverse CTE in a double for loop?

I have two nested for loops. For each row 'A', 'B', 'C' in loop 1, I need to walk the hierarchical tree in loop 2 to find all the parents of a group 'X'. That pushes me toward a CTE that finds the path for each row separately, but running a CTE inside a loop, matching one group id at a time, is surely not a good solution. I looked at the question "Looping hierarchy CTE" but could not make much of it.
Code snippet for the cron job using flask framework:
s = select([rt_issues]).\
where(
and_(
rt_issues.c.status !='Closed',
rt_issues.c.assigned_to != None
))
rs = conn.execute(s)
if rs.rowcount > 0:
s4 = text('with recursive rec_grp as(select id, parent_id, name, head, 1 as level, array[id] as path_info from groups union all select grp1.id, grp1.parent_id, grp1.name, grp1.head, rc.level + 1, rc.path_info||grp1.id from groups grp1 join rec_grp rc on grp1.id = rc.parent_id) select distinct id, parent_id, name, head, path_info from rec_grp order by id')
rs4 = conn.execute(s4)
for r in rs:
head_list = []
hierarchical_grps = []
for rr in rs4:
if ((rr['path_info'][0] == r[rt_issues.c.assignee_group])):
for g in rr['path_info']:
hierarchical_grps.append(g)
hierarchical_grps = list(set(hierarchical_grps))
send_pending_mail(hierarchical_grps, r['id'])
print hierarchical_grps, 'hierarchical_grps'
exit(0)
I need to send mail to all the group heads in the hierarchy above the issue's assignee_group. How can this be achieved, and how should the loops be structured? I am using SQLAlchemy Core only, with PostgreSQL and Python with Flask. I would appreciate exact code.
What works is the snippet below:
mgroup = None
s = select([rt_issues]).\
where(
and_(
rt_issues.c.status !='Closed',
rt_issues.c.assigned_to != None
))
rs = conn.execute(s)
if rs.rowcount > 0:
for r in rs:
head_list = []
hierarchical_grps = []
mgroup = r[rt_issues.c.assignee_group]
s4 = text('with recursive rec_grp as(select id, parent_id, name, head, 1 as level, array[id] as path_info from groups where id=' +str(mgroup) + 'union all select grp1.id, grp1.parent_id, grp1.name, grp1.head, rc.level + 1, rc.path_info||grp1.id from groupsgrp1 join rec_grp rc on grp1.id = rc.parent_id) select distinct id,parent_id, name, head, path_info from rec_grp order by id')
rs4 = conn.execute(s4)
for rr in rs4:
if ((rr['path_info'][0] == r[rt_issues.c.assignee_group])):
for g in rr['path_info']:
hierarchical_grps.append(g)
hierarchical_grps = list(set(hierarchical_grps))
print hierarchical_grps, 'hierarchical_grps'
send_pending_mail(hierarchical_grps, r['id'])
exit(0)
Assuming that the head column is boolean, this will collect the groups with the head flag set:
rs4 = conn.execute(s4)
for rr in rs4:
    if rr['head']:
        head_list.append(rr['id'])
print 'group heads:', head_list
This is assuming the query from your second example is used (note the correction to the from clause, "from groupsgrp1" should be "from groups grp1"):
WITH RECURSIVE rec_grp AS (
SELECT
id,
parent_id,
name,
head,
1 AS level,
ARRAY [id] AS path_info
FROM groups
WHERE id = 4
UNION ALL
SELECT
grp1.id,
grp1.parent_id,
grp1.name,
grp1.head,
rc.level + 1,
rc.path_info || grp1.id
FROM groups grp1
JOIN rec_grp rc ON grp1.id = rc.parent_id
)
SELECT DISTINCT
id,
parent_id,
name,
head,
path_info
FROM rec_grp
ORDER BY id;
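The per-issue lookup can also pass the group id as a bound parameter instead of concatenating str(mgroup) into the SQL. The following is a minimal sketch under stated assumptions: it runs against an in-memory SQLite database rather than PostgreSQL so that it is self-contained, the table name org_groups, its rows, and the helper ancestor_groups are hypothetical, and head is assumed here to hold the head's name rather than a boolean.

```python
import sqlite3

def ancestor_groups(conn, group_id):
    """Return (id, head) for group_id and each of its ancestors, nearest first."""
    sql = """
        WITH RECURSIVE rec_grp(id, parent_id, head, level) AS (
            SELECT id, parent_id, head, 1
            FROM org_groups
            WHERE id = :gid
            UNION ALL
            SELECT g.id, g.parent_id, g.head, rc.level + 1
            FROM org_groups g
            JOIN rec_grp rc ON g.id = rc.parent_id
        )
        SELECT id, head FROM rec_grp ORDER BY level
    """
    return conn.execute(sql, {"gid": group_id}).fetchall()

# Hypothetical hierarchy: 4 -> 2 -> 1, with 5 as an unrelated sibling branch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE org_groups (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT, head TEXT)"
)
conn.executemany("INSERT INTO org_groups VALUES (?, ?, ?, ?)", [
    (1, None, "root", "alice"),
    (2, 1, "ops", "bob"),
    (4, 2, "oncall", "carol"),
    (5, 1, "sales", "dave"),
])

rows = ancestor_groups(conn, 4)
heads = [head for _id, head in rows]
print(heads)
```

Because the anchor query already starts at the issue's group, every returned row belongs to that group's chain, so no extra filtering on path_info is needed in the Python loop.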

PLPGSQL using single quotes in function call (python)

I am having problems using single quotes in an input value for a plpgsql function.
It looks like this:
"AND (u.firstname LIKE 'koen') OR
(u.firstname LIKE 'dirk')"
This is done with python
I have tried \' and '' and ''' and '''' and ''''' and even '''''''
none of them seem to be working and return the following error:
[FAIL][syntax error at or near "koen"
LINE 1: ...'u.firstname', 'ASC', 'AND (u.firstname LIKE 'koe...
Any help is appreciated!
Thanks a lot!
======================== EDIT =========================
Sorry! here is my plpgsql function:
CREATE FUNCTION get_members(IN in_company_uuid uuid, IN in_start integer, IN in_limit integer, IN in_sort character varying, IN in_order character varying, IN in_querystring CHARACTER VARYING, IN in_filterstring CHARACTER VARYING, IN OUT out_status integer, OUT out_status_description character varying, OUT out_value character varying[]) RETURNS record
LANGUAGE plpgsql
AS $$DECLARE
temp_record RECORD;
temp_out_value VARCHAR[];
--temp_member_struct MEMBER_STRUCT;
temp_iterator INTEGER := 0;
BEGIN
FOR temp_record IN EXECUTE '
SELECT DISTINCT ON
(' || in_sort || ')
u.user_uuid,
u.firstname,
u.preposition,
u.lastname,
array_to_string_ex(ARRAY(SELECT email FROM emails WHERE user_uuid = u.user_uuid)) as emails,
array_to_string_ex(ARRAY(SELECT mobilenumber FROM mobilenumbers WHERE user_uuid = u.user_uuid)) as mobilenumbers,
array_to_string_ex(ARRAY(SELECT c.name FROM targetgroupusers AS tgu LEFT JOIN membercategories as mc ON mc.targetgroup_uuid = tgu.targetgroup_uuid LEFT JOIN categories AS c ON mc.category_uuid = c.category_uuid WHERE tgu.user_uuid = u.user_uuid)) as categories,
array_to_string_ex(ARRAY(SELECT color FROM membercategories WHERE targetgroup_uuid IN(SELECT targetgroup_uuid FROM targetgroupusers WHERE user_uuid = u.user_uuid))) as colors
FROM
membercategories AS mc
LEFT JOIN
targetgroups AS tg
ON
tg.targetgroup_uuid = mc.targetgroup_uuid
LEFT JOIN
targetgroupusers AS tgu
ON
tgu.targetgroup_uuid = tg.targetgroup_uuid
LEFT JOIN
users AS u
ON
u.user_uuid = tgu.user_uuid
WHERE
mc.company_uuid = ''' || in_company_uuid || '''
' || in_querystring || '
' || in_filterstring || '
ORDER BY
' || in_sort || ' ' || in_order || '
OFFSET
' || in_start || '
LIMIT
' || in_limit
LOOP
temp_out_value[temp_iterator] = ARRAY[temp_record.user_uuid::VARCHAR(36),
temp_record.firstname,
temp_record.preposition,
temp_record.lastname,
temp_record.emails,
temp_record.mobilenumbers,
temp_record.categories,
temp_record.colors];
temp_iterator = temp_iterator+1;
END LOOP;
out_status := 0;
out_status_description := 'Members retrieved';
out_value := temp_out_value;
RETURN;
END$$;
Here is how I call the function:
def get_members(companyuuid, start, limit, sort, order, querystring = None, filterstring = None):
    logRequest()
    def doWork(cursor):
        if not companyuuid:
            raise Exception("companyuuid cannot be None!")
        queryarray = [str(s) for s in querystring.split("|")]
        queryfields = ['firstname', 'preposition', 'lastname', 'emails', 'mobilenumbers']
        temp_querystring = ""
        for j in xrange(len(queryfields)):
            for i in xrange(len(queryarray)):
                temp_querystring += "(u.%s LIKE ''%%%s%'') OR "%(queryfields[j], queryarray[i])
        temp_querystring = "AND %s"%temp_querystring.rstrip(" OR ")
        temp_filterstring = filterstring
        print "querystring: %s"%temp_querystring
        heizoodb.call(cursor=cursor,
                      scheme="public",
                      function="get_members",
                      functionArgs=(companyuuid, start, limit, sort, order, temp_querystring, temp_filterstring),
                      returnsValue=True)
And my latest error =D
com.gravityzoo.core.libs.sql.PostgreSQLDB.runSQLTransaction: not enough arguments for format string, result=[None]
SQLinjection to be added later ;)
Thanks!
I don't know Python well, but it probably supports data binding, where you first prepare a statement with placeholders (you don't need quotes around the question marks):
prepare("..... AND (u.firstname LIKE ?) OR (u.firstname LIKE ?)")
and then you call execute with 'koen' and 'dirk' as the bound values, or whatever that function is called in Python.
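In Python the binding described above looks roughly like the sketch below. It uses the standard-library sqlite3 module and an invented users table so that it runs on its own; search_firstnames is a hypothetical helper. The question's heizoodb wrapper is not shown in enough detail to say how it passes parameters, and PostgreSQL drivers such as psycopg2 use %s rather than ? as the placeholder.

```python
import sqlite3

def search_firstnames(conn, terms):
    """Return first names that contain any of the given terms."""
    if not terms:
        return []
    # One "firstname LIKE ?" clause per term; only placeholders are
    # concatenated into the SQL, never the user's text.
    where = " OR ".join(["firstname LIKE ?"] * len(terms))
    # The % wildcards belong to the bound value, not to the SQL string.
    params = ["%" + term + "%" for term in terms]
    sql = "SELECT firstname FROM users WHERE " + where + " ORDER BY firstname"
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (firstname TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [
    ("koen",), ("dirk",), ("anna",), ("o'neil",),
])

found = search_firstnames(conn, ["koen", "dirk"])
quoted = search_firstnames(conn, ["o'n"])
print(found, quoted)
```

Since the search terms never become part of the SQL text, a single quote inside a term needs no escaping at all, which sidesteps the quoting problem in the question.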
