I need to write an INSERT statement that first checks whether the data already exists. The current code is in Python, using psycopg2 to connect to a PostgreSQL database:
sql = """IF NOT EXISTS (SELECT * FROM table \
WHERE col_1 = (%s) AND col_2 = (%s) ) \
INSERT INTO table (col_1, col_2) \
VALUES (%s, %s);"""
data = ( col1_data, col2_data, col1_data, col2_data)
try:
CURSOR.execute(sql, data)
DB.commit()
except:
print "Cursor failed INSERT INTO table.\n"
which does not work (and I haven't done quality error handling so I don't get any good information).
So, I went into psql and tried just:
IF NOT EXISTS (SELECT * FROM t WHERE c1=d1 AND c2=d2)
INSERT INTO t (c1, c2) VALUES (d1,d2);
and I got the following error:
ERROR: syntax error at or near "IF"
LINE 1: IF NOT EXISTS (SELECT * FROM table WHERE c1 = d1...
^
So I BELIEVE my error is in the SQL, not the Python (though I could be wrong), since this works:
sql = """INSERT INTO t2 (col_0, col_1, col_2) \
VALUES (%s, %s, %s);"""
data = (d1, d2, time.time())
try:
CURSOR.execute(sql, data)
DB.commit()
except:
print "Cursor failed to INSERT INTO t2.\n"
For table 1, my CREATE was:
db=> CREATE TABLE table (
col_0 SERIAL PRIMARY KEY,
col_1 varchar(16),
col_2 smallint
);
NOTICE: CREATE TABLE will create implicit sequence "table_col_0_seq" for serial column "table.col_0"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "table_pkey" for table "table"
CREATE TABLE
I am grateful for any help and guidance.
I used PL/pgSQL for a similar requirement in my project:
insert_function = """
CREATE LANGUAGE plpgsql;
CREATE FUNCTION insert_if_unique (sql_insert TEXT)
RETURNS VOID
LANGUAGE plpgsql
AS $$
BEGIN
EXECUTE sql_insert;
RETURN;
EXCEPTION WHEN unique_violation THEN
RETURN;
-- do nothing
END;
$$;
"""
cursor.execute(insert_function)
You can then call it with something like:
cursor.execute("SELECT insert_if_unique(%s)", (sql % data,))
The above query is not parameterized. So please be wary of SQL injection if you are getting the input from an external source.
Note: You can use cursor.mogrify() to guard against SQL injection:
sql = cursor.mogrify(sql, data).decode()
cursor.execute("SELECT insert_if_unique(%s)", (sql,))
Try reversing the structure: make the INSERT read from a SELECT that carries the NOT EXISTS condition:
INSERT INTO t (c1, c2)
SELECT d1, d2
WHERE NOT EXISTS (SELECT * FROM t WHERE c1 = d1 AND c2 = d2);
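From psycopg2 this version can be parameterized directly. A small sketch, assuming hypothetical my_table / col_1 / col_2 names; the same value pair is passed twice, once for the inserted row and once for the existence check:

sql = """INSERT INTO my_table (col_1, col_2)
         SELECT %s, %s
         WHERE NOT EXISTS (
             SELECT 1 FROM my_table WHERE col_1 = %s AND col_2 = %s
         );"""
data = (col1_data, col2_data, col1_data, col2_data)
CURSOR.execute(sql, data)
DB.commit()

Note that without a unique constraint this check-then-insert pattern is not race-free under concurrent writers.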
I am using teradatasql to connect to the database and get a table definition. Below is my code, which returns the definition of the table. I am trying to find a built-in function that returns the column_name and data_type of a table.
import teradatasql

with teradatasql.connect ('{"host":"whomooz","user":"guest","password":"please"}') as con:
    with con.cursor () as cur:
        try:
            sRequest = "show table MYTABLE"
            print(sRequest)
            cur.execute(sRequest)
            for row in sorted(cur.fetchall()):
                print(row)
        except Exception as ex:
            print("Ignoring", str(ex).split("\n")[0])
Here I am looking for any built-in function which can return column_name and data_type. The output should look like:
customer_name VARCHAR
address VARCHAR
type SMALLINT
I looked at the teradatasql docs but did not find any reference to such a function.
We offer a sample program that demonstrates how to prepare a SQL request and use the fake_result_sets feature to obtain result set column metadata and question-mark parameter marker metadata.
See below for example code that prepares a select * for a table in conjunction with fake_result_sets and prints the result set column metadata.
import json
import teradatasql
with teradatasql.connect ('{"host":"whomooz","user":"guest","password":"please"}') as con:
    with con.cursor () as cur:
        cur.execute ("{fn teradata_rpo(S)}{fn teradata_fake_result_sets}select * from dbc.dbcinfo")
        for m in json.loads (cur.fetchone () [7]):
            print ("DatabaseName={} TableName={} ColumnName={} TypeName={}".format (
                m ["DatabaseName"], m ["ObjectName"], m ["Name"], m ["TypeName"]))
Prints the following:
DatabaseName=DBC TableName=dbcinfo ColumnName=InfoKey TypeName=VARCHAR
DatabaseName=DBC TableName=dbcinfo ColumnName=InfoData TypeName=VARCHAR
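The same fake_result_sets technique can be pointed at your own table to produce the column_name / data_type listing you describe; a small sketch, assuming the MYTABLE name from your question:

import json
import teradatasql

with teradatasql.connect('{"host":"whomooz","user":"guest","password":"please"}') as con:
    with con.cursor() as cur:
        # Prepare-only request plus fake result sets: returns the column
        # metadata without reading any rows from the table.
        cur.execute("{fn teradata_rpo(S)}{fn teradata_fake_result_sets}select * from MYTABLE")
        for m in json.loads(cur.fetchone()[7]):
            print(m["Name"], m["TypeName"])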
I am struggling to generate a DELETE query where the parameters are actually a set of values. I need to delete rows where the parameters are pairs of values, for example:
delete from table where col1 = %s and col2 = %s
which can be executed in Python like:
cur = conn.cursor()
cur.execute(query, (col1_value, col2_value))
Now I would like to run a query:
delete from table where (col1, col2) in ( (col1_value1, col2_value1), (col1_value2, col2_value2) );
I can generate the query and values and execute the exact SQL, but I can't quite produce a prepared statement.
I tried:
delete from table where (col1, col2) in %s
and
delete from table where (col1, col2) in (%s)
But when I try to execute:
cur.execute(query, list_of_col_value_tuples)
or
cur.execute(query, tuple_of_col_value_tuples)
I get an exception that indicates that psycopg2 cannot convert arguments to strings.
Is there any way to use psycopg2 to execute a query like this?
You could dynamically add %s placeholders to your query:
cur = con.cursor()
query = "delete from table where (role, username) in (%s)"
options = [('admin', 'foo'), ('user', 'bar')]
placeholders = '%s,' * len(options)
query = query % placeholders[:-1] # remove last comma
print(query)
print(cur.mogrify(query, options).decode('utf-8'))
Out:
delete from table where (role, username) in (%s,%s)
delete from table where (role, username) in (('admin', 'foo'),('user', 'bar'))
Alternatively, build the query with the psycopg2.sql module.
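For reference, a sketch of what the psycopg2.sql version might look like, using the same example table and column names as above:

from psycopg2 import sql

options = [('admin', 'foo'), ('user', 'bar')]

# One %s placeholder per tuple in options; identifiers are quoted safely.
query = sql.SQL("delete from {} where ({}, {}) in ({})").format(
    sql.Identifier('table'),
    sql.Identifier('role'),
    sql.Identifier('username'),
    sql.SQL(', ').join(sql.Placeholder() * len(options)),
)

cur.execute(query, options)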
Actually the resolution is quite easy if the call is constructed carefully.
Among the miscellaneous goodies of psycopg2 (psycopg2.extras) there is a function execute_values.
All the examples given by psycopg2 deal with inserts, but the function simply converts the list of arguments into a VALUES list, so if the DELETE is formatted like so:
qry = "delete from table where (col1, col2) in (%s)"
The call:
execute_values(cur, qry, argslist=<list of value tuples>)
will make the delete perform exactly as required.
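Putting it together, a minimal sketch (the table and column names are placeholders, and cur/conn are an open psycopg2 cursor and connection):

from psycopg2.extras import execute_values

pairs = [('admin', 'foo'), ('user', 'bar')]
qry = "delete from my_table where (col1, col2) in (%s)"

# execute_values() replaces the single %s with a comma-separated list of
# value tuples built from pairs, then executes the resulting statement.
execute_values(cur, qry, pairs)
conn.commit()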
I have a strange issue where I can't get a parameterized SQL query with a string comparison in the WHERE clause to work: I don't get a row back. When I connect to the MySQL DB from the command-line client, the query works.
python 3.7.3
mysql-connector-python==8.0.11
mysql 5.7
works (getting my row):
select * from my_table where my_column = 'my_string';
also works (getting my row):
cursor.execute(
"""
select *
from my_table
where my_column = 'my_string'
"""
)
doesn't work (cursor.fetchall() is []):
cursor.execute(
"""
select *
from my_table
where my_column = '%s'
""",
('my_string')
)
Be careful with tuples. I think you need ('my_string',).
FYI I wrote the original comment mentioned by @tscherg in his comment below the question.
Remove the quotes around %s (and note the trailing comma, which makes the parameter a one-element tuple):
cursor.execute(
"""
select *
from my_table
where my_column = %s
""",
('my_string',)
)
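Putting both fixes together (placeholder without quotes, parameters as a real tuple), a sketch with hypothetical connection details:

import mysql.connector

# Hypothetical connection settings; adjust for your environment.
conn = mysql.connector.connect(host="localhost", user="me",
                               password="secret", database="mydb")
cursor = conn.cursor()

cursor.execute(
    """
    select *
    from my_table
    where my_column = %s
    """,
    ('my_string',)   # trailing comma: a one-element tuple, not a bare string
)
print(cursor.fetchall())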
Is there a way to produce this output using SQL for all tables in a given database (using MySQL) without having to specify individual table names and columns?
Table    Column   Count
------   ------   -----
Table1   Col1     0
Table1   Col2     100
Table1   Col3     0
Table1   Col4     67
Table1   Col5     0
Table2   Col1     30
Table2   Col2     0
Table2   Col3     2
...      ...      ...
The purpose is to identify columns for analysis based on how much data they contain (a significant number of columns are empty).
The 'workaround' solution using Python (one table at a time):
# Libraries
import pymysql
import pandas as pd
import pymysql.cursors
# Connect to mariaDB
connection = pymysql.connect(host='localhost',
user='root',
password='my_password',
db='my_database',
charset='latin1',
cursorclass=pymysql.cursors.DictCursor)
# Get column metadata
sql = """SELECT *
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA`='my_database'
"""
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
# Store in dataframe
df = pd.DataFrame(result)
df = df[['TABLE_NAME', 'COLUMN_NAME']]
# Build SQL string (one table at a time for now)
my_table = 'my_table'
df_my_table = df[df.TABLE_NAME==my_table].copy()
cols = list(df_my_table.COLUMN_NAME)
col_strings = [''.join(['COUNT(', x, ') AS ', x, ', ']) for x in cols]
col_strings[-1] = col_strings[-1].replace(',','')
sql = ''.join(['SELECT '] + col_strings + ['FROM ', my_table])
# Execute
with connection.cursor() as cursor:
    cursor.execute(sql)
    result = cursor.fetchall()
The result is a list containing a single dictionary of column names and counts.
Basically, no. See also this answer.
Also, note that the closest match in that answer is actually the method you're already using, just implemented less efficiently in reflective SQL.
I'd do the same as you did: build SQL like
SELECT
COUNT(*) AS `count`,
SUM(IF(columnName1 IS NULL,1,0)) AS columnName1,
...
SUM(IF(columnNameN IS NULL,1,0)) AS columnNameN
FROM tableName;
using information_schema as the source of table and column names, then executing it for each table in MySQL and disassembling the single row returned into N tuples (tableName, columnName, total, nulls).
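A rough sketch of how that could be driven from Python with pymysql, in the spirit of your workaround (connection details and schema name are placeholders, not a drop-in script):

import pymysql
import pymysql.cursors

conn = pymysql.connect(host='localhost', user='root', password='my_password',
                       db='my_database', cursorclass=pymysql.cursors.DictCursor)

results = []  # (table_name, column_name, total_rows, null_count)
with conn.cursor() as cur:
    cur.execute("""SELECT TABLE_NAME, COLUMN_NAME
                   FROM INFORMATION_SCHEMA.COLUMNS
                   WHERE TABLE_SCHEMA = %s
                   ORDER BY TABLE_NAME, ORDINAL_POSITION""", ('my_database',))
    columns_by_table = {}
    for row in cur.fetchall():
        columns_by_table.setdefault(row['TABLE_NAME'], []).append(row['COLUMN_NAME'])

    for table, cols in columns_by_table.items():
        # One SUM(IF(... IS NULL,1,0)) per column, plus a row count.
        select_list = ', '.join(
            "SUM(IF(`{0}` IS NULL, 1, 0)) AS `{0}`".format(c) for c in cols)
        cur.execute("SELECT COUNT(*) AS `__total`, {} FROM `{}`".format(select_list, table))
        row = cur.fetchone()
        total = row.pop('__total')
        results.extend((table, col, total, nulls) for col, nulls in row.items())

for table_name, column_name, total_rows, null_count in results:
    print(table_name, column_name, total_rows, null_count)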
It is possible, but it's not going to be quick.
As mentioned in a previous answer you can work your way through the columns table in the information_schema to build queries to get the counts. It's then just a question of how long you are prepared to wait for the answer because you end up counting every row, for every column, in every table. You can speed things up a bit if you exclude columns that are defined as NOT NULL in the cursor (i.e. IS_NULLABLE = 'YES').
The solution suggested by LSerni is going to be much faster, particularly if you have very wide tables and/or high row counts, but would require more work handling the results.
e.g.
DELIMITER //

DROP PROCEDURE IF EXISTS non_nulls //

CREATE PROCEDURE non_nulls (IN sname VARCHAR(64))
BEGIN
    -- Parameters:
    -- Schema name to check
    -- call non_nulls('sakila');
    DECLARE vTABLE_NAME varchar(64);
    DECLARE vCOLUMN_NAME varchar(64);
    DECLARE vIS_NULLABLE varchar(3);
    DECLARE vCOLUMN_KEY varchar(3);
    DECLARE done BOOLEAN DEFAULT FALSE;
    DECLARE cur1 CURSOR FOR
        SELECT `TABLE_NAME`, `COLUMN_NAME`, `IS_NULLABLE`, `COLUMN_KEY`
        FROM `information_schema`.`columns`
        WHERE `TABLE_SCHEMA` = sname
        ORDER BY `TABLE_NAME` ASC, `ORDINAL_POSITION` ASC;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done := TRUE;

    DROP TEMPORARY TABLE IF EXISTS non_nulls;
    CREATE TEMPORARY TABLE non_nulls(
        table_name  VARCHAR(64),
        column_name VARCHAR(64),
        column_key  CHAR(3),
        is_nullable CHAR(3),
        rows        BIGINT,
        populated   BIGINT
    );

    OPEN cur1;
    read_loop: LOOP
        FETCH cur1 INTO vTABLE_NAME, vCOLUMN_NAME, vIS_NULLABLE, vCOLUMN_KEY;
        IF done THEN
            LEAVE read_loop;
        END IF;
        SET @sql := CONCAT('INSERT INTO non_nulls ',
            '(table_name,column_name,column_key,is_nullable,rows,populated) ',
            'SELECT \'', vTABLE_NAME, '\',\'', vCOLUMN_NAME, '\',\'', vCOLUMN_KEY, '\',\'',
            vIS_NULLABLE, '\', COUNT(*), COUNT(`', vCOLUMN_NAME, '`) ',
            'FROM `', sname, '`.`', vTABLE_NAME, '`');
        PREPARE stmt1 FROM @sql;
        EXECUTE stmt1;
        DEALLOCATE PREPARE stmt1;
    END LOOP;
    CLOSE cur1;

    SELECT * FROM non_nulls;
END //

DELIMITER ;

call non_nulls('sakila');
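If you want the procedure's output back in Python, it can be executed like any other statement; a small sketch reusing the pymysql connection from the question (column names match the temporary table defined above):

with connection.cursor() as cursor:
    cursor.execute("call non_nulls('sakila')")
    for row in cursor.fetchall():
        print(row['table_name'], row['column_name'], row['rows'], row['populated'])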
Suppose I have the following very simple query:
query = 'SELECT * FROM table1 WHERE id = %s'
And I'm calling it from a python sql wrapper, in this case psycopg:
cur.execute(query, (row_id,))
The thing is that if row_id is None, I would like to get all the rows, but that query would return an empty table instead.
The easy way to approach this would be:
if row_id:
    cur.execute(query, (row_id,))
else:
    cur.execute("SELECT * FROM table1")
Of course this is non-idiomatic and gets unnecessarily complex with non-trivial queries. I guess there is a way to handle this in the SQL itself, but I couldn't find anything. What is the right way?
Try using the COALESCE function, as below:
query = 'SELECT * FROM table1 WHERE id = COALESCE(%s,id)'
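With this form the Python call site does not have to change; a small sketch of the usage (row_id may be an integer or None):

query = 'SELECT * FROM table1 WHERE id = COALESCE(%s, id)'

# None is sent as SQL NULL, so COALESCE(NULL, id) falls back to id and the
# condition matches every row (except rows where id itself is NULL).
cur.execute(query, (row_id,))
rows = cur.fetchall()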
SELECT * FROM table1 WHERE id = %s OR %s IS NULL
But depending on how the variable is forwarded to the query, it might be better to make it 0 if it is None:
SELECT * FROM table1 WHERE id = %s OR %s = 0
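Either way the parameter appears twice in the statement, so it has to be supplied twice from Python, or a named placeholder can be used so psycopg2 fills every occurrence from a single value. A small sketch (psycopg2 supports the %(name)s style):

# Positional: the same value is passed once per %s occurrence.
cur.execute("SELECT * FROM table1 WHERE id = %s OR %s IS NULL", (row_id, row_id))

# Named: %(id)s can be repeated and is filled from one dictionary entry.
cur.execute("SELECT * FROM table1 WHERE id = %(id)s OR %(id)s IS NULL", {'id': row_id})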