Parameterized startswith query in sqlite - python

I'm using the Python bindings for sqlite3 and I'm attempting to run a query something like this:
table1
col1  | col2
------+-----
aaaaa | 1
aaabb | 2
bbbbb | 3
test.py
def get_rows(db, ugc):
    # I want a startswith query, but want to protect against potential SQL injection
    # with the user-generated content.
    return db.execute(
        # Does not work :)
        "SELECT * FROM table1 WHERE col1 LIKE ? + '%'",
        [ugc],
    ).fetchall()
Is there a way to do this safely?
Expected behaviour:
>>> get_rows(db, 'aa')
[('aaaaa', 1), ('aaabb', 2)]

In SQL, + is used to add numbers.
Your SQL ends up as ... WHERE col1 LIKE 0.
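You can see the coercion directly; SQLite casts both operands of + to numbers (a quick demo, assuming a sqlite3 connection named db):
>>> db.execute("SELECT 'aa' + '%'").fetchone()
(0,)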
To concatenate strings, use ||:
db.execute(
    "SELECT * FROM table1 WHERE col1 LIKE ? || '%'",
    [ugc],
)
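One remaining caveat, beyond what the answer covers: parameterization prevents injection, but any % or _ inside the user content still acts as a LIKE wildcard. If that matters, escape them; a minimal sketch, assuming the same table and function shape as the question:
def get_rows(db, ugc):
    # Escape LIKE metacharacters in the user content, then tell SQLite
    # which escape character we used via the ESCAPE clause.
    escaped = ugc.replace("\\", "\\\\").replace("%", r"\%").replace("_", r"\_")
    return db.execute(
        "SELECT * FROM table1 WHERE col1 LIKE ? || '%' ESCAPE '\\'",
        [escaped],
    ).fetchall()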

Related

Python MySQL query using string comparison in where clause

I have a strange issue where I can't get an SQL query with a parameter to work with a string comparison in the WHERE clause: I don't get a row back. When I connect to the MySQL DB via bash, the query works.
Python 3.7.3
mysql-connector-python==8.0.11
MySQL 5.7
works (getting my row):
select * from my_table where my_column = 'my_string';
also works (getting my row):
cursor.execute(
    """
    select *
    from my_table
    where my_column = 'my_string'
    """
)
doesn't work (cursor.fetchall() is []):
cursor.execute(
    """
    select *
    from my_table
    where my_column = '%s'
    """,
    ('my_string')
)
Be careful with tuples. I think you need ('my_string',).
FYI, I wrote the original comment mentioned by @tscherg in his comment below the question.
Remove the quotes around the placeholder (the driver does the quoting itself), and note the trailing comma that makes the argument a real one-element tuple:
cursor.execute(
    """
    select *
    from my_table
    where my_column = %s
    """,
    ('my_string',)
)
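A minimal end-to-end check, assuming a table my_table with a text column my_column (the connection details are placeholders):
import mysql.connector

# Hypothetical connection parameters, for illustration only.
conn = mysql.connector.connect(host="localhost", user="root",
                               password="...", database="my_db")
cursor = conn.cursor()
cursor.execute("select * from my_table where my_column = %s", ("my_string",))
print(cursor.fetchall())  # should now return the matching row(s)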

Count the number of non-null values in each column of each table in MySQL

Is there a way to produce this output using SQL for all tables in a given database (using MySQL) without having to specify individual table names and columns?
Table   Column  Count
------  ------  -----
Table1  Col1    0
Table1  Col2    100
Table1  Col3    0
Table1  Col4    67
Table1  Col5    0
Table2  Col1    30
Table2  Col2    0
Table2  Col3    2
...     ...     ...
The purpose is to identify columns for analysis based on how much data they contain (a significant number of columns are empty).
The 'workaround' solution using Python (one table at a time):
# Libraries
import pymysql
import pandas as pd
import pymysql.cursors

# Connect to MariaDB
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='my_password',
                             db='my_database',
                             charset='latin1',
                             cursorclass=pymysql.cursors.DictCursor)

# Get column metadata
sql = """SELECT *
         FROM `INFORMATION_SCHEMA`.`COLUMNS`
         WHERE `TABLE_SCHEMA`='my_database'
      """
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()

# Store in dataframe
df = pd.DataFrame(result)
df = df[['TABLE_NAME', 'COLUMN_NAME']]

# Build SQL string (one table at a time for now)
my_table = 'my_table'
df_my_table = df[df.TABLE_NAME == my_table].copy()
cols = list(df_my_table.COLUMN_NAME)
col_strings = [''.join(['COUNT(', x, ') AS ', x, ', ']) for x in cols]
col_strings[-1] = col_strings[-1].replace(',', '')
sql = ''.join(['SELECT '] + col_strings + ['FROM ', my_table])

# Execute
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
The result is a list containing one dictionary of column names and counts.
Basically, no. See also this answer.
Also, note that the closest match in the answer above is actually the method you're already using, just implemented less efficiently in reflective SQL.
I'd do the same as you did: build SQL like
SELECT
    COUNT(*) AS `count`,
    SUM(IF(columnName1 IS NULL, 1, 0)) AS columnName1,
    ...
    SUM(IF(columnNameN IS NULL, 1, 0)) AS columnNameN
FROM tableName;
using information_schema as a source for table and column names, execute it for each table in MySQL, then disassemble the single row returned into N tuple entries (tableName, columnName, total, nulls).
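If it helps, here is a rough Python sketch of that approach, reusing the pymysql DictCursor connection from the workaround above (null_counts and the _total alias are made-up names):
def null_counts(connection, schema):
    # Identifiers can't be bound as parameters, so table/column names are
    # interpolated; they come from INFORMATION_SCHEMA, not from user input.
    results = []
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT TABLE_NAME, COLUMN_NAME "
            "FROM INFORMATION_SCHEMA.COLUMNS "
            "WHERE TABLE_SCHEMA = %s "
            "ORDER BY TABLE_NAME, ORDINAL_POSITION",
            (schema,))
        tables = {}
        for r in cursor.fetchall():
            tables.setdefault(r['TABLE_NAME'], []).append(r['COLUMN_NAME'])
        for table, cols in tables.items():
            sums = ', '.join(
                'SUM(IF(`{0}` IS NULL, 1, 0)) AS `{0}`'.format(c) for c in cols)
            cursor.execute('SELECT COUNT(*) AS `_total`, {0} FROM `{1}`.`{2}`'
                           .format(sums, schema, table))
            row = cursor.fetchone()
            # Disassemble the single row into (table, column, total, nulls).
            results.extend((table, c, row['_total'], row[c]) for c in cols)
    return results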
It is possible, but it's not going to be quick.
As mentioned in a previous answer, you can work your way through the columns table in information_schema to build queries that get the counts. It's then just a question of how long you are prepared to wait for the answer, because you end up counting every row, for every column, in every table. You can speed things up a bit by excluding columns that are defined as NOT NULL from the cursor (i.e. keeping only IS_NULLABLE = 'YES').
The solution suggested by LSerni is going to be much faster, particularly if you have very wide tables and/or high row counts, but would require more work handling the results.
e.g.
DELIMITER //

DROP PROCEDURE IF EXISTS non_nulls //

CREATE PROCEDURE non_nulls (IN sname VARCHAR(64))
BEGIN
    -- Parameters:
    -- Schema name to check
    -- call non_nulls('sakila');

    DECLARE vTABLE_NAME varchar(64);
    DECLARE vCOLUMN_NAME varchar(64);
    DECLARE vIS_NULLABLE varchar(3);
    DECLARE vCOLUMN_KEY varchar(3);
    DECLARE done BOOLEAN DEFAULT FALSE;
    DECLARE cur1 CURSOR FOR
        SELECT `TABLE_NAME`, `COLUMN_NAME`, `IS_NULLABLE`, `COLUMN_KEY`
            FROM `information_schema`.`columns`
            WHERE `TABLE_SCHEMA` = sname
            ORDER BY `TABLE_NAME` ASC, `ORDINAL_POSITION` ASC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;

    DROP TEMPORARY TABLE IF EXISTS non_nulls;

    CREATE TEMPORARY TABLE non_nulls(
        table_name VARCHAR(64),
        column_name VARCHAR(64),
        column_key CHAR(3),
        is_nullable CHAR(3),
        rows BIGINT,
        populated BIGINT
    );

    OPEN cur1;

    read_loop: LOOP
        FETCH cur1 INTO vTABLE_NAME, vCOLUMN_NAME, vIS_NULLABLE, vCOLUMN_KEY;
        IF done THEN
            LEAVE read_loop;
        END IF;
        SET @sql := CONCAT('INSERT INTO non_nulls ',
            '(table_name,column_name,column_key,is_nullable,rows,populated) ',
            'SELECT \'', vTABLE_NAME, '\',\'', vCOLUMN_NAME, '\',\'', vCOLUMN_KEY, '\',\'',
            vIS_NULLABLE, '\', COUNT(*), COUNT(`', vCOLUMN_NAME, '`) ',
            'FROM `', sname, '`.`', vTABLE_NAME, '`');
        PREPARE stmt1 FROM @sql;
        EXECUTE stmt1;
        DEALLOCATE PREPARE stmt1;
    END LOOP;

    CLOSE cur1;

    SELECT * FROM non_nulls;
END //

DELIMITER ;

call non_nulls('sakila');
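Calling the procedure from Python would look something like this (a sketch, reusing the pymysql connection from earlier):
with connection.cursor() as cursor:
    cursor.execute("CALL non_nulls(%s)", ('sakila',))
    for row in cursor.fetchall():
        print(row)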

Passing Array Parameter to SQL command

In Python 2.7, I can pass a parameter to an SQL command like this:
cursor.execute("select * from my_table where id = %s", [2])
I cannot get the array equivalent working like this:
cursor.execute("select * from my_table where id in %s", [[10,2]])
Obviously, I could just do string formatting, but I would like to use a proper parameter if possible. I'm using a PostgreSQL database, if that matters.
cursor.execute("select * from my_table where id = ANY(%s);", [[10, 20]])
See note. To use IN, see the section below.
cursor.execute(cursor.mogrify("select * from my_table where id in %s",
                              [tuple([10, 20])]))
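Why both forms work: psycopg2 adapts a Python list to a PostgreSQL ARRAY (hence = ANY(%s)), and a Python tuple to a parenthesized value list suitable for IN. So the IN variant can also be written without mogrify (a small sketch):
ids = (10, 20)  # tuple, not list
cursor.execute("select * from my_table where id in %s", (ids,))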

django querying multiple tables - passing parameters to the query

I am using Django 1.7.
I am trying to implement search functionality. When a search term is entered, I need to search all the tables and all the columns in the DB for that term (I have only 7 tables and probably 40 columns in total, and the DB is not very large). I am using MySQL as the DB.
I can query one table, all columns, with the following code:
query = Q(term__contains=tt) | Q(portal__contains=tt) | ...  # and so on
data = ABC.objects.filter(query)
I tried to use a UNION, writing SQL like
select * from table A where col1 like %s OR col2 like %s .....
UNION
select * from table B where col1 like %s OR col2 like %s .....
When I tried to implement this as below, I got the error "not enough arguments for format string":
cursor = connection.cursor()
cursor.execute("""select * from table A where col1 like %s OR col2 like %s
                  UNION
                  select * from table B where col1 like %s OR col2 like %s""", tt)
So how do I pass the parameters for multiple variables (even though in this case they are the same)? I tried passing it multiple times, too.
Thanks.
You should pass a list of parameters. The number of parameters must match the number of %s placeholders:
cursor.execute("""select * from table A where col1 like %s OR col2 like %s
                  UNION
                  select * from table B where col1 like %s OR col2 like %s""",
               [tt] * 4)  # four `%s`
As an alternative, you can try the numeric paramstyle for the query. In this case a list with a single parameter is sufficient:
cursor.execute("""select * from table A where col1 like :1 OR col2 like :1
                  UNION
                  select * from table B where col1 like :1 OR col2 like :1""",
               [tt])
UPDATE: Note that the tt variable should contain % signs at the start and end:
tt = u'%' + string_to_find + u'%'
UPDATE 2: cursor.fetchall() returns a list of tuples (not dicts), so you should access the data by index:
{% for row in data %}
<div>Col1: {{ row.0 }} - Col2: {{ row.1 }}</div>
{% endfor %}
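Putting the pieces together, a view-side sketch (search_all, table_A, and table_B are hypothetical names):
from django.db import connection

def search_all(string_to_find):
    tt = u'%' + string_to_find + u'%'
    sql = """select * from table_A where col1 like %s OR col2 like %s
             UNION
             select * from table_B where col1 like %s OR col2 like %s"""
    cursor = connection.cursor()
    try:
        cursor.execute(sql, [tt] * 4)  # one entry per %s placeholder
        return cursor.fetchall()
    finally:
        cursor.close()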

Return all values when a condition value is NULL

Suppose I have the following very simple query:
query = 'SELECT * FROM table1 WHERE id = %s'
And I'm calling it from a Python SQL wrapper, in this case psycopg:
cur.execute(query, (row_id,))
The thing is that if row_id is None, I would like to get all the rows, but that query would return an empty table instead.
The easy way to approach this would be:
if row_id is not None:
    cur.execute(query, (row_id,))
else:
    cur.execute("SELECT * FROM table1")
Of course, this is unidiomatic and gets unnecessarily complex with non-trivial queries. I guess there is a way to handle this in the SQL itself, but I couldn't find anything. What is the right way?
Try the COALESCE function, as below:
query = 'SELECT * FROM table1 WHERE id = COALESCE(%s,id)'
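With that version the original call covers both cases; psycopg binds None as SQL NULL (a quick sketch):
query = 'SELECT * FROM table1 WHERE id = COALESCE(%s, id)'
cur.execute(query, (7,))     # returns only the row with id = 7
cur.execute(query, (None,))  # returns all rows (except any whose id is NULL,
                             # since id = id is not true for NULL)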
SELECT * FROM table1 WHERE id = %s OR %s IS NULL
But depending on how the variable is forwarded to the query, it might be better to turn it into 0 if it is None:
SELECT * FROM table1 WHERE id = %s OR %s = 0
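Since the placeholder appears twice in either form, the value has to be supplied twice (sketch):
query = 'SELECT * FROM table1 WHERE id = %s OR %s IS NULL'
cur.execute(query, (row_id, row_id))  # None makes the second clause true for every row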
