Efficient way to pass this variable multiple times - python

I'm using Pyodbc in Python to run some SQL queries. What I'm working with is actually longer than this, but this example captures what I'm trying to do:
connection = pyodbc.connect(...)
cursor = connection.cursor(...)
dte = '2018-10-24'
#note the placeholders '{}'
query = """select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{}'"""
#this is where I need help as explained below
cursor.execute(query.format(dte, dte))
output = pd.read_sql("""select *
from #output"""
, connection)
In the above, since there are only two '{}', I'm passing dte to query.format() twice. However, the more complicated version I'm working with has 19 '{}', so I'd imagine I need to pass dte to query.format() 19 times. I tried passing it as a list, but that didn't work. Do I really need to write out the variable 19 times when passing it to the function?

Consider a UNION ALL query, which avoids the need for a temp table, combined with parameterization: put qmark (?) placeholders in the query and bind values to them in a subsequent step. Because every placeholder takes the same value, multiply a one-element list/tuple by the number of placeholders:
dte = '2018-10-24'
# NOTE THE QMARK PLACEHOLDERS
query = """select invoice_id
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = ?
union all
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = ?"""
output = pd.read_sql(query, connection, params=(dte,)*2)
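As a minimal sketch of the same idea that runs without a SQL Server connection, the example below uses the standard-library sqlite3 module (an assumption for illustration; it happens to share the qmark placeholder style with pyodbc) and derives the repeat count from the query text, so the tuple stays in sync whether there are 2 placeholders or 19. The tables and rows are made up.

```python
import sqlite3

# In-memory stand-in for the real database; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (invoice_id integer, system_id text, posting_date text);
    create table table2 (invoice_id integer, system_id text, posting_date text);
    insert into table1 values (1, 'PrimaryColor', '2018-10-24'),
                              (2, 'PrimaryColor', '2018-10-25');
    insert into table2 values (3, 'PrimaryColor', '2018-10-24'),
                              (4, 'Other', '2018-10-24');
""")

dte = '2018-10-24'
query = """select invoice_id
from table1
where system_id = 'PrimaryColor'
and posting_date = ?
union all
select invoice_id
from table2
where system_id = 'PrimaryColor'
and posting_date = ?"""

# Count the placeholders instead of hard-coding 2 (or 19).
# Assumes '?' never appears inside a string literal in the query.
params = (dte,) * query.count("?")
invoice_ids = [row[0] for row in conn.execute(query, params)]
print(invoice_ids)
```

Counting placeholders is only a convenience; it breaks down if the query ever contains a literal question mark, in which case the count should be written out explicitly.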

I agree with the comments: pandas.read_sql has a params argument, which protects against SQL injection.
See this post to understand how to use it depending on the database.
Pyodbc has the same parameter on the execute method.
# standard
cursor.execute("select a from tbl where b=? and c=?", (x, y))
# pyodbc extension
cursor.execute("select a from tbl where b=? and c=?", x, y)
To answer the initial question, even though string formatting is bad practice for building SQL queries:
Do I really need to write out the variable 19 times when passing it to the function?
Of course you don't :
query = """select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'""".format(**{'dte': dte})
or :
query = """select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{0}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{0}'""".format(dte)
Python 3.6+ :
query = f"""select invoice_id
into #output
from table1 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'
insert into #output
select invoice_id
from table2 with (nolock)
where system_id = 'PrimaryColor'
and posting_date = '{dte}'"""
Note the usage of f before """ ... """

Related

Python MySQL SELECT WHERE with list

I have the following Python MySQL code.
cursor = mydb.cursor()
cursor.execute('SELECT id FROM table1 WHERE col1=%s AND col2=%s', (val1, val2))
ids = cursor.fetchall()
for id in ids:
    cursor.execute('SELECT record_key FROM table2 WHERE id=%s limit 1', (id[0], ))
    record_keys = cursor.fetchall()
    print(record_keys[0][0])
How can I make this more efficient? I am using 5.5.60-MariaDB and Python 2.7.5. I have approximately 350 million entries in table1 and 15 million entries in table2.
Happily, you can do this in a single query using a LEFT JOIN.
cursor = mydb.cursor()
cursor.execute(
    "SELECT t1.id, t2.record_key FROM table1 t1 "
    "LEFT JOIN table2 t2 ON (t1.id = t2.id) "
    # col1 and col2 both belong to table1, as in the original query
    "WHERE t1.col1=%s AND t1.col2=%s",
    (val1, val2),
)
for id, record_key in cursor.fetchall():
    pass  # do something...
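To see how the single-query approach behaves, here is a small sketch against an in-memory SQLite database (an assumption for illustration: sqlite3 uses ? placeholders where the MySQL drivers use %s, and the rows are invented). It filters only on table1's columns, so ids with no match in table2 still come back, paired with None.

```python
import sqlite3

# Hypothetical miniature versions of table1 and table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer, col1 text, col2 text);
    create table table2 (id integer, record_key text);
    insert into table1 values (1, 'a', 'b'), (2, 'a', 'b'), (3, 'x', 'y');
    insert into table2 values (1, 'k1'), (3, 'k3');
""")

val1, val2 = 'a', 'b'
rows = conn.execute(
    "SELECT t1.id, t2.record_key FROM table1 t1 "
    "LEFT JOIN table2 t2 ON t1.id = t2.id "
    "WHERE t1.col1 = ? AND t1.col2 = ? "
    "ORDER BY t1.id",
    (val1, val2),
).fetchall()

for id_, record_key in rows:
    # record_key is None when table2 has no row for this id
    print(id_, record_key)
```

Unlike the original loop with limit 1, a join returns one row per matching table2 row, so duplicates in table2 would need to be handled separately.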

Fetch a DB table's blueprint, like the "describe table_name" command, from Redshift and DB2 in Python

I want to fetch a table's definition from my Python code, as the describe [tableName] statement does. I want to do that on Redshift and DB2.
I tried Pandas and cursors with the following commands:
set search_path to SCHEMA; select * from pg_table_def where schemaname = 'schema' and LOWER(tablename) = 'TableName';
describe schema.tableName
select column_name, data_type, character_maximum_length from information_schema.columns where table_schema = 'Schema' and table_name = 'TableName';
\d or \d+
import psycopg2
import numpy as np
import pandas as pd
con=psycopg2.connect(dbname= 'DBNAME', host="Host-Link",
port= '5439', user= 'username', password= 'password')
print(con)
cur = con.cursor()
query = "set search_path to Schema; select * from pg_table_def where schemaname = 'Schema' and LOWER(tablename) = 'TableName';"
cur.execute(query)
temp = cur.fetchall()
print(temp)
data_frame = pd.read_sql("set search_path to Schema; select * from pg_table_def where schemaname = 'Schema' and LOWER(tablename) = 'TableName';", con)
print(data_frame)
con.close()
I want the output as following:
COLUMN_NAME DATA_TYPE PK NULLABLE DEFAULT AUTOINCREMENT COMPUTED REMARKS POSITION
col1 varchar(10) YES NO NO NO 1
col2 varchar(50) NO NO NO NO 2
col3 smallint NO NO NO NO 3
A lot of this data is included in the SVV_COLUMNS system view. You can query that view using the table_name and table_schema columns.
https://docs.aws.amazon.com/redshift/latest/dg/r_SVV_COLUMNS.html
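A rough sketch of such a query follows. The helper name build_describe_query is hypothetical, the column list is taken from the SVV_COLUMNS documentation linked above and should be checked against it, and the lowercasing assumes the table was created with default (case-insensitive) identifiers. The %s placeholders follow psycopg2's parameter style; nothing here connects to a database.

```python
def build_describe_query(schema, table):
    """Return (sql, params) listing a table's columns from SVV_COLUMNS."""
    sql = (
        "select column_name, data_type, character_maximum_length, "
        "is_nullable, column_default, remarks, ordinal_position "
        "from svv_columns "
        "where table_schema = %s and table_name = %s "
        "order by ordinal_position"
    )
    # Assumption: identifiers are stored in lowercase (the Redshift default),
    # so the names are lowercased before comparison.
    return sql, (schema.lower(), table.lower())


sql, params = build_describe_query("Schema", "TableName")
print(sql)
print(params)
# With a live connection this could be run as:
#   data_frame = pd.read_sql(sql, con, params=params)
```

Binding the schema and table names as parameters avoids the quoting and case problems of pasting them into the string.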

Python filter one list based on values that do not exist in another list

Trying to filter results of a query on a Table A by 2 values not found in a Table B. What would be the proper syntax and approach?
import pyodbc
MDB = 'C:/db/db1.mdb'; DRV = '{Microsoft Access Driver (*.mdb)}'; PWD = 'pw'
con = pyodbc.connect('DRIVER={};DBQ={};PWD={}'.format(DRV,MDB,PWD))
cur = con.cursor()
SQLA = 'SELECT * FROM TABLE1;' # your query goes here
SQLB = 'SELECT * FROM TABLE2;' # your query goes here
rows1 = cur.execute(SQLA).fetchall()
rows2 = cur.execute(SQLB).fetchall()
cur.close()
con.close()
for rit in rows1:
    for git in rows2:
        if (rit[1] and rit[2]) not in (git[1] and git[2]):
            print ((rit[1]) (rit[2]))
Simply use a pure SQL solution with the familiar LEFT JOIN... IS NULL / NOT EXISTS / NOT IN. Below are equivalent queries, compliant in MS Access, returning rows in TableA not in TableB based on col1 and col2.
LEFT JOIN...IS NULL
SELECT a.*
FROM TABLEA a
LEFT JOIN TABLEB b
ON a.col1 = b.col1 AND a.col2 = b.col2
WHERE b.col1 IS NULL AND b.col2 IS NULL
NOT EXISTS
SELECT a.*
FROM TABLEA a
WHERE NOT EXISTS
(SELECT 1 FROM TABLEB b
WHERE a.col1 = b.col1 AND a.col2 = b.col2)
NOT IN
SELECT a.*
FROM TABLEA a
WHERE a.col1 NOT IN (SELECT col1 FROM TABLEB)
AND a.col2 NOT IN (SELECT col2 FROM TABLEB)
The SQL statements offered by Parfait are the preferred solution, but if you really wanted to use your double-loop approach it would need to be more like this:
for rit in rows1:
    match_found = False
    for git in rows2:
        if (rit[1] == git[1]) and (rit[2] == git[2]):
            match_found = True
            break
    if not match_found:
        print(rit)
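If the filtering does have to happen in Python, a set of (col1, col2) pairs avoids rescanning rows2 for every row of rows1. This is only a sketch: the sample tuples are invented stand-ins for the fetchall() results, and positions 1 and 2 are assumed to be the two columns being compared, as in the loop above.

```python
# Hypothetical rows standing in for the two fetchall() results.
rows1 = [(1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z')]
rows2 = [(10, 'a', 'x'), (11, 'c', 'q')]

# Build the lookup once; set membership tests are O(1) on average.
seen = {(git[1], git[2]) for git in rows2}

# Keep rows of rows1 whose (col1, col2) pair never occurs in rows2.
unmatched = [rit for rit in rows1 if (rit[1], rit[2]) not in seen]
print(unmatched)
```

The values must be hashable for this to work, which holds for the usual strings, numbers and dates returned by a database driver.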

How do I escape multiple query params for an IN() clause?

I am trying to perform a SELECT query with an IN() clause, and have sqlalchemy perform the
parameter escaping for me. I am using pyodbc as my database connector.
This is the code I have written so far:
tables = ['table1', 'table2', ... ]
sql = "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME IN(:tables)"
result = session.execute(sql, {"tables": tables})
Unfortunately, this fails with an error:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('Invalid parameter type. param-index=0 param-type=list', 'HY105')
Is there any way I can have sqlalchemy escape the whole list of parameters and join them with ,
without manually adding a :tableX placeholder for each item of the list?
Try something like this....
DECLARE @string Varchar(100) = 'Table1,table2,table3'
declare @xml xml
set @xml = N'<root><r>' + replace(@string,',','</r><r>') + '</r></root>'
SELECT * FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME IN( select
    r.value('.','varchar(max)') as item
    from @xml.nodes('//root/r') as records(r)
)
For good reasons it is not possible to expand a list of arguments as you wish.
If you really would like to create a raw SQL query, then you can just enumerate over your list and dynamically create the query:
vals = {"param{}".format(i): table for i, table in enumerate(tables)}
keys = ", ".join([":{}".format(k) for k in vals])
sql = "SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME IN ({keys})".format(keys=keys)
result = session.execute(sql, vals)
for tbl in result:
    print(tbl)
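The placeholder-generation trick can be tried without SQLAlchemy or SQL Server. The sketch below is an illustration under stated assumptions: it uses the standard-library sqlite3 module, which also accepts :name placeholders with a dict, and queries sqlite_master in place of INFORMATION_SCHEMA.TABLES. The table names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (x integer);
    create table table2 (x integer);
    create table table3 (x integer);
""")

tables = ['table1', 'table3', 'missing']

# One named placeholder per list item: param0, param1, ...
vals = {"param{}".format(i): t for i, t in enumerate(tables)}
keys = ", ".join(":{}".format(k) for k in vals)

# Only placeholder names are formatted into the SQL; values stay bound.
sql = ("SELECT name FROM sqlite_master "
       "WHERE type = 'table' AND name IN ({}) ORDER BY name".format(keys))
found = [row[0] for row in conn.execute(sql, vals)]
print(found)
```

Because only generated placeholder names reach the SQL text, the values themselves are still escaped by the driver.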
But you could ask sqlalchemy to do this for you. Here we make a fake mapping of the INFORMATION_SCHEMA.tables view, and query it using sqlalchemy toolset:
# definition (done just once)
class ISTable(Base):
    __tablename__ = 'tables'
    __table_args__ = {'schema': 'INFORMATION_SCHEMA'}
    _fake_id = Column(Integer, primary_key=True)
    table_catalog = Column(String)
    table_schema = Column(String)
    table_name = Column(String)
    table_type = Column(String)
# actual usage
result = session.query(
    ISTable.table_catalog, ISTable.table_schema,
    ISTable.table_name, ISTable.table_type,
).filter(
    ISTable.table_name.in_(tables))
for tbl in result:
    print(tbl)
One gotcha: you cannot query for the whole mapped class (as in query(ISTable)) because the primary key column does not actually exist, so an exception will be raised. But querying only the columns we care about (as shown above) is good enough for the purpose.

select multiple columns using SQLite3 in Python

I have a list that contains the name of columns I want to retrieve from a table in the database.
My question is how to make the cursor select the columns specified in the list. Do I have to convert nameList to a string before including it in the select statement? Thanks
nameList = ['A','B','C','D',...]
with sqlite3.connect(db_fileName) as conn:
    cursor = conn.cursor()
    cursor.execute("""
        select * from table
    """)
As long as you can be sure your input is sanitized -- to avoid SQL injection attack -- you can do:
...
qry = "select {} from table;"
# str.format returns a new string, so the result must be assigned
qry = qry.format(','.join(nameList))
cursor.execute(qry)
If you're on a really old version of Python do instead:
...
qry = "select %s from table;"
# the % operator also returns a new string
qry = qry % ','.join(nameList)
cursor.execute(qry)
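One way to make "sanitized" concrete is to check the requested names against the columns the table actually has before formatting them into the query. The sketch below assumes SQLite's PRAGMA table_info for that lookup; the helper name select_columns and the demo table t are hypothetical, and the table name itself is assumed to come from trusted code.

```python
import sqlite3


def quote_ident(name):
    # Double-quote an identifier, doubling any embedded double quotes.
    return '"{}"'.format(name.replace('"', '""'))


def select_columns(conn, table, wanted):
    """Select only `wanted` columns, rejecting names the table lacks."""
    info = conn.execute("PRAGMA table_info({})".format(quote_ident(table)))
    actual = {row[1] for row in info}  # row[1] is the column name
    unknown = [name for name in wanted if name not in actual]
    if unknown:
        raise ValueError("unknown columns: {}".format(unknown))
    cols = ", ".join(quote_ident(name) for name in wanted)
    sql = "select {} from {}".format(cols, quote_ident(table))
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('create table t ("A" integer, "B" integer, "C(pct)" real)')
conn.execute("insert into t values (1, 2, 0.5)")

rows = select_columns(conn, "t", ["A", "C(pct)"])
print(rows)
```

Column names cannot be bound as query parameters, which is why the check against the real schema is done in Python before the string is built.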
nameList = ["'A(pct)'",'B','C','D',...]
with sqlite3.connect(db_fileName) as conn:
    cursor = conn.cursor()
    cursor.execute("""
        select {} from table
    """.format(", ".join(nameList)))
