Query parameterization in PyMySQL for query with "IN" keyword - python

I have a table which looks like
mysql> select * from statements;
+----+----------------+--------+---------+
| id | account_number | amount | type    |
+----+----------------+--------+---------+
|  1 |              1 |   1000 | Deposit |
|  2 |              1 |    500 | Fees    |
+----+----------------+--------+---------+
2 rows in set (0.00 sec)
I have a PyMySQL connector through which I want to execute a query select * from statements where type in ('Deposit', 'Fees')
My question differs from possible duplicates in that it asks specifically about "IN"-type queries, where the list size can be dynamic and the query is slightly more difficult to write than the usual hardcoded select * from statements where type in (%s, %s) form.
I am wondering how to exactly write the query in a way that it is parameterized and relatively safe from SQL injection. My current code snippet is as follows:
import pymysql
connection = pymysql.connect('''SQL DB credentials''')
cur = connection.cursor()
l = ['Deposit', 'Fees']
st = 'select * from statements where type in (' + ','.join(['%s'] * len(l)) + ')'
cur.execute(st, l)
cur.fetchall()
Result:
((1, 1, 1000, 'Deposit'), (2, 1, 500, 'Fees'))
My question is: is this SQL statement parameterized well enough to be safe from basic SQL injection?
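For reference, only the placeholders here are built from len(l); the values themselves still go through execute() and are escaped by the driver, so the values cannot inject SQL. The same pattern can be wrapped in a small helper; a minimal sketch (the function name is illustrative, not from the original post):
def select_statements_by_type(cur, types):
    # Build one %s placeholder per value; the values themselves are bound by the driver.
    placeholders = ','.join(['%s'] * len(types))
    sql = 'select * from statements where type in (%s)' % placeholders
    cur.execute(sql, types)
    return cur.fetchall()

rows = select_statements_by_type(cur, ['Deposit', 'Fees'])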

Related

django mysql connector - allowing >1 entry per specific field in django

I've written some code to parse a website and insert the results into a MySQL db.
The problem is I am getting a lot of duplicates per FKToTech_id, like:
+----+------------------+-------------+
| id | ref              | FKToTech_id |
+----+------------------+-------------+
|  1 | website.com/path |           1 |
|  2 | website.com/path |           1 |
|  3 | website.com/path |           1 |
+----+------------------+-------------+
What I'm looking for instead is to have a single row in the table, based on whether that ref has already been entered for that FKToTech_id, rather than multiple copies of the same row, like:
+----+------------------+-------------+
| id | ref              | FKToTech_id |
+----+------------------+-------------+
|  1 | website.com/path |           1 |
+----+------------------+-------------+
How can I modify my code below to simply pass (in Python) if the above is true (i.e. a ref already exists with the same FKToTech_id)?
for i in elms:
    allcves = {cursor.execute("INSERT INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s)", (i.attrs["href"], row[1])) for row in cves}
    mydb.commit()
Thanks
Make ref a unique column, then use INSERT IGNORE to skip the insert if it would cause a duplicate key error.
ALTER TABLE TechBooks ADD UNIQUE INDEX (ref);
for i in elms:
    cursor.executemany("INSERT IGNORE INTO TechBooks (ref, FKToTech_id) VALUES (%s, %s)", [(i.attrs["href"], row[1]) for row in cves])
    mydb.commit()
I'm not sure what your intent was by assigning the results of cursor.execute() to allcves. cursor.execute() doesn't return a value unless you use multi=True. I've replaced the useless set comprehension with use of cursor.executemany() to insert many rows at once.

Inserting a list holding multiple values in MySQL using pymysql

I have a database holding names, and I have to create a new list which will hold values such as ID, name, and gender and insert them into the current database. I have to create a list of the names which are not in the database yet, so I simply picked 3 names and am trying to work with them.
I am not sure what sort of list I am supposed to create, or how I can loop through it to insert all the new values in the proper way.
That's what I have so far:
mylist = [["Betty Beth", "1", "Female"], ["John Cena", "2", "Male"]]
#get("/list_actors")
def list_actors():
with connection.cursor() as cursor:
sql = "INSERT INTO imdb VALUES (mylist)"
cursor.execute(sql)
connection.commit()
return "done"
I am very new to this material so I will appreciate any help. Thanks in advance!
vals = [["TEST1", 1], ["TEST2", 2]]
with connection.cursor() as cursor:
cursor.executemany("insert into test(prop, val) values (%s, %s)", vals )
connection.commit()
mysql> select * from test;
+----+-------+------+---------------------+
| id | prop  | val  | ts                  |
+----+-------+------+---------------------+
|  1 | TEST1 |    1 | 2017-05-19 09:46:16 |
|  2 | TEST2 |    2 | 2017-05-19 09:46:16 |
+----+-------+------+---------------------+
Adapted from https://groups.google.com/forum/#!searchin/pymysql-users/insert%7Csort:relevance/pymysql-users/4_D8bYusodc/EHFxjRh89XEJ
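Applied back to the question's own data, a minimal sketch (assuming the imdb table has columns matching each sublist, here called name, id, and gender; adjust to the real schema):
mylist = [["Betty Beth", "1", "Female"], ["John Cena", "2", "Male"]]

with connection.cursor() as cursor:
    # One parameterized row per sublist, sent in a single executemany() call.
    cursor.executemany("INSERT INTO imdb (name, id, gender) VALUES (%s, %s, %s)", mylist)
connection.commit()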

Create a temporary table in python to join with a sql table

I have the following data in a vertica db, Mytable:
+----+-------+
| ID | Value |
+----+-------+
| A  |     5 |
| B  |     9 |
| C  |    10 |
| D  |     7 |
+----+-------+
I am trying to create a query in Python to access a Vertica database. In Python I have a list:
ID_list= ['A', 'C']
I would like to create a query that basically inner joins Mytable with ID_list, and then I could add a WHERE clause.
So it would be basically something like this:
SELECT *
FROM Mytable
INNER JOIN ID_list AS temp_table
ON Mytable.ID = temp_table.ID
WHERE Value = 5
I don't have write rights on the database, so the table needs to be created locally. Or is there an alternative way of doing this?
If you have a small table, then you can do as Tim suggested and create an in-list.
I kind of prefer to do this using python ways, though. I would probably also make ID_list a set as well to keep from having dups, etc.
in_list = '(%s)' % ','.join(str(id) for id in ID_list)
or better use bind variables (depends on the client you are using, and probably not strictly necessary if you are dealing with a set of ints since I can't imagine a way to inject sql with that):
in_list = '(%s)' % ','.join(['%d'] * len(ID_list))
and send in your ID_list as a parameter list for your cursor.execute. This method is positional, so you'll need to arrange your bind parameters correctly.
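A sketch of that bind-variable variant end to end (the exact placeholder syntax depends on the DB-API client used to reach Vertica; %s is assumed here, and cursor is the already-open cursor from the question):
ID_list = ['A', 'C']

# One placeholder per element; the values themselves are passed separately and bound by the driver.
in_list = '(%s)' % ','.join(['%s'] * len(ID_list))
cursor.execute('SELECT * FROM Mytable WHERE ID IN ' + in_list + ' AND Value = 5', ID_list)
rows = cursor.fetchall()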
If you have a very, very large list... you could create a local temp and load it before doing your query with join.
CREATE LOCAL TEMP TABLE mytable ( id INTEGER );
COPY mytable FROM STDIN;
-- Or however you need to load the data. Using python, you'll probably need to stream in a list using `cursor.copy`
Then join to mytable.
I wouldn't bother doing the latter with a very small number of rows, too much overhead.
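For the large-list case, a rough sketch of the temp-table route, assuming the vertica_python client (whose cursor exposes a copy() method for COPY ... FROM STDIN); untested, so treat it as an outline rather than a finished implementation:
import vertica_python

ID_list = ['A', 'C']
conn_info = {'host': 'vertica-host', 'port': 5433, 'user': 'user',
             'password': 'secret', 'database': 'db'}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # PRESERVE ROWS keeps the temp data visible after the COPY commits.
    cur.execute("CREATE LOCAL TEMP TABLE id_list (id VARCHAR(10)) ON COMMIT PRESERVE ROWS")
    # Stream the Python list into the temp table via COPY.
    cur.copy("COPY id_list FROM STDIN DELIMITER ','", "\n".join(ID_list) + "\n")
    cur.execute("""
        SELECT Mytable.*
        FROM Mytable
        INNER JOIN id_list ON Mytable.ID = id_list.id
        WHERE Mytable.Value = 5
    """)
    rows = cur.fetchall()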
So I used the approach from Tim:
# create a string of all the IDs in ID_list so they can be inserted into a SQL query
Sql_string = '('
for ss in ID_list:
    Sql_string = Sql_string + " " + str(ss) + ","
Sql_string = Sql_string[:-1] + ")"

query = "SELECT * FROM (SELECT * FROM Mytable WHERE ID IN " + Sql_string + ") AS temp WHERE Value = 5"
It works surprisingly fast.

Copy the result of a selection grouped by into a table

Ok, first of all: I am quite new to PostgreSQL and programming in general.
So I have two tables. One table (cars) is:
 id | brand   | model | price
----+---------+-------+-------
  1 | Opel    | Astra | 12000
  2 | Citroen | C1    | 12000
  3 | Citroen | C2    | 15000
  4 | Citroen | C3    | 18000
  5 | Audi    | A3    | 20000
And the other is:
 id | brand   | max_price
----+---------+-----------
  4 | Opel    |
  5 | Citroen |
  6 | Audi    |
What I would like to do is make a selection on cars so that I have the max price grouped by brand, and then insert that price for the corresponding brand in max_price.
I tried to use python and this is what I have done:
cur = conn.cursor()
cur.execute("""DROP TABLE IF EXISTS temp""")
cur.execute("""CREATE TABLE temp (brand text, max_price integer)""")
conn.commit()
cur.execute("""SELECT cars.brand, MAX(cars.price) FROM cars GROUP BY brand;""")
results = cur.fetchall()
for results in results:
    cur.execute("""INSERT INTO temp (brand, max_price) VALUES %s""" % str(results))
    conn.commit()
cur.execute("""UPDATE max_price SET max_price.max_price=temp.max_price WHERE max_price.brand = temp.brand;""")
conn.commit()
It gets stuck in the update part, signalling an error at max_price.brand = temp.brand.
Can anybody help me?
EDIT: thanks to the suggestion of domino I changed the last line to cur.execute ("""UPDATE max_price SET max_price.max_price=temp.max_price_int from temp WHERE max_price.brand = temp.brand;"""). Now I have the problem that temp.max_price is recognised not as an integer but as a tuple. So, to solve the problem, I tried to add the following code before this last line:
for results in results:
    results = results[0]
    results = int(results)
    cur.execute("""INSERT INTO temp (max_price_int) VALUES %s""" % str(results))
    conn.commit()
It gives me an error:
cur.execute ("""INSERT INTO temp (max_price_int) VALUES %s""" % str(results))
psycopg2.ProgrammingError: syntax error at or near "12000"
LINE 1: INSERT INTO temp (max_price_int) VALUES 12000
12000 is exactly the first value I want it to insert!
When using cur.execute, you should never use the % operator. It opens up your queries to SQL injection attacks.
Instead, use the built-in query parameterization like so:
cur.execute ("""INSERT INTO temp (max_price_int) VALUES (%s)""",(results,))
See documentation here: http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries
A different approach would be to use SQL to do your update in a single query using a WITH clause. The single query would look like this:
with max (brand, max_price) as (
    select brand, max(price)
    from cars
    group by brand
)
update max_price
set max_price = max.max_price
from max
where max_price.brand = max.brand
;
Read more about Common Table Expressions (CTEs) here: https://www.postgresql.org/docs/9.5/static/queries-with.html
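For completeness, a sketch of issuing that single statement from the Python side (reusing the conn opened in the question; no loop or temp table is needed):
cur = conn.cursor()
cur.execute("""
    WITH max (brand, max_price) AS (
        SELECT brand, MAX(price) FROM cars GROUP BY brand
    )
    UPDATE max_price
    SET max_price = max.max_price
    FROM max
    WHERE max_price.brand = max.brand
""")
conn.commit()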

Inefficient SQL query while excluding results on QuerySet

I'm trying to figure out why the Django ORM has such strange (as I see it) behaviour. I have 2 basic models (simplified to convey the main idea):
class A(models.Model):
    pass

class B(models.Model):
    name = models.CharField(max_length=15)
    a = models.ForeignKey(A)
Now I want to select rows from table a that are referred to from table b and that don't have a certain value in the name column.
Here is sample SQL I expect Django ORM to produce:
SELECT * FROM inefficient_foreign_key_exclude_a a
INNER JOIN inefficient_foreign_key_exclude_b b ON a.id = b.a_id
WHERE NOT (b.name = '123');
With the filter() method of django.db.models.query.QuerySet it works as expected:
>>> from inefficient_foreign_key_exclude.models import A
>>> print A.objects.filter(b__name='123').query
SELECT `inefficient_foreign_key_exclude_a`.`id`
FROM `inefficient_foreign_key_exclude_a`
INNER JOIN `inefficient_foreign_key_exclude_b` ON (`inefficient_foreign_key_exclude_a`.`id` = `inefficient_foreign_key_exclude_b`.`a_id`)
WHERE `inefficient_foreign_key_exclude_b`.`name` = 123
But if I use the exclude() method (a negated Q object in the underlying logic) it creates a really strange SQL query:
>>> print A.objects.exclude(b__name='123').query
SELECT `inefficient_foreign_key_exclude_a`.`id`
FROM `inefficient_foreign_key_exclude_a`
WHERE NOT ((`inefficient_foreign_key_exclude_a`.`id` IN (
SELECT U1.`a_id` FROM `inefficient_foreign_key_exclude_b` U1 WHERE (U1.`name` = 123 AND U1.`a_id` IS NOT NULL)
) AND `inefficient_foreign_key_exclude_a`.`id` IS NOT NULL))
Why does the ORM make a subquery instead of just a JOIN?
UPDATE:
I've made a test to prove that using a subquery is not efficient at all.
I created 500401 rows in both a and b tables. And here what I got:
For join:
mysql> SELECT count(*)
-> FROM inefficient_foreign_key_exclude_a a
-> INNER JOIN inefficient_foreign_key_exclude_b b ON a.id = b.a_id
-> WHERE NOT (b.name = 'abc');
+----------+
| count(*) |
+----------+
|   500401 |
+----------+
1 row in set (0.97 sec)
And for subquery:
mysql> SELECT count(*)
-> FROM inefficient_foreign_key_exclude_a a
-> WHERE NOT ((a.id IN (
-> SELECT U1.`a_id` FROM `inefficient_foreign_key_exclude_b` U1 WHERE (U1.`name` = 'abc' AND U1.`a_id` IS NOT NULL)
-> ) AND a.id IS NOT NULL));
+----------+
| count(*) |
+----------+
|   500401 |
+----------+
1 row in set (3.76 sec)
Join is almost 4 times faster.
It looks like a kind of optimization.
While filter() can express 'any' condition, it makes the join and then applies the restriction.
exclude() is more restrictive, so you are not forced to join the tables and it can build the query using subqueries, which I suppose is intended to make the query faster (due to index usage).
If you are using MySQL you could use explain command on the queries and see if my suggestion is right.
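If the JOIN form measured above really matters for performance, one workaround (a sketch, not something the ORM generates for you) is to hand the hand-written SQL to the ORM via raw():
qs = A.objects.raw("""
    SELECT a.id
    FROM inefficient_foreign_key_exclude_a a
    INNER JOIN inefficient_foreign_key_exclude_b b ON a.id = b.a_id
    WHERE NOT (b.name = %s)
""", ['123'])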
