Write variable after reading .sql query - python

When I have to pass a parameter before running a sql query, I usually do
date = '20220101'
query = f'''SELECT * FROM TABLE WHERE DATE = '{date}''''
In an attempt to reduce the length of the code, I created a query.sql file with the query above, but I'm failing to pass the date variable into the query before running it.
To read the file I'm using
sql_query = open("query.sql", "r")
sql_as_string = sql_query.read()
df = pd.read_sql(sql_as_string, conn)
Is there a way around this, instead of pasting the whole SQL query into my .py code?
I'm using pyodbc, ODBC Driver 17 for SQL Server

Use a parametrized query, not string formatting.
The file should just contain the query, with a ? placeholder for the variable.
SELECT * FROM TABLE WHERE DATE = ?
Then you can do
with open("query.sql", "r") as f:
    sql_query = f.read()

df = pd.read_sql(sql_query, conn, params=(date,))
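If several scripts need this pattern, the read-and-execute step can be wrapped in a small helper. A minimal sketch, assuming an open pyodbc connection and that every .sql file uses ? placeholders (the function name run_sql_file is just illustrative):
import pandas as pd

def run_sql_file(path, conn, params=()):
    """Read a .sql file and run it as a parametrized query."""
    with open(path, "r") as f:
        sql = f.read()
    # pandas forwards the parameters to the pyodbc driver, which
    # substitutes them for the ? placeholders safely
    return pd.read_sql(sql, conn, params=params)

df = run_sql_file("query.sql", conn, params=("20220101",))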

Related

More efficient way to query this SQL table from python?

I need to query rows where a column matches my list of ~60K IDs from a table that contains millions of IDs. Normally, I think, you would load the IDs into a temporary table in the database and join on that, but I can't modify this database. I'm doing it like this with a loop in a Python wrapper, and it works, but is there a better way?
import pyodbc
import pandas as pd

# connect to the database using Windows authentication
conn = pyodbc.connect('DRIVER={SQL Server Native Client 11.0};SERVER=my_fav_server;DATABASE=my_fav_db;Trusted_Connection=yes;')
cursor = conn.cursor()

# read in all the ids
ids_list = [...60K ids in here..]

# query in 10K chunks to prevent memory error
def chunks(l, n):
    # split the list into chunks of at most n items
    n = max(1, n)
    return [l[i:i+n] for i in range(0, len(l), n)]

chunked_ids_lists = chunks(ids_list, 10000)

# loop through the chunks, retrieve all columns, and dump each result to disk
for chunk_num, chunked_ids_list in enumerate(chunked_ids_lists):
    temp_ids_string = "('" + "','".join(chunked_ids_list) + "')"
    temp_sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN {temp_ids_string};"
    temp_data = pd.read_sql_query(temp_sql, conn)
    temp_path = f"temp_chunk_{chunk_num}.txt"
    temp_data.to_csv(temp_path, sep='\t', index=None)

# read the query chunks back in and combine them
all_data_list = []
for chunk_num in range(len(chunked_ids_lists)):
    temp_path = f"temp_chunk_{chunk_num}.txt"
    temp_data = pd.read_csv(temp_path, sep='\t')
    all_data_list.append(temp_data)

all_data = pd.concat(all_data_list)
Another way, if your database is PostgreSQL, is to use Psycopg's cursor.
import psycopg2

# connect to an existing database
conn = psycopg2.connect("dbname=test user=postgres")

# open a cursor to perform database operations
cur = conn.cursor()

# get the data in one query -- no need to construct the 'SQL-correct syntax'
# filter string yourself; psycopg2 adapts a Python tuple to an IN (...) list
cur.execute("SELECT * FROM my_fav_table WHERE ID IN %(filter)s;", {"filter": tuple(ids_list)})

# loop over the fetched rows
for record in cur:
    # we got one record
    print(record)  # or process it however you need
Use parameters rather than concatenating strings.
I don't see the need for the CSV files, if you're just going to read them all into Python in the next loop. Just put everything into all_data_list during the query loop.
all_data_list = []
for chunk in chunked_ids_lists:
    placeholders = ','.join(['?'] * len(chunk))
    sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN ({placeholders});"
    cursor.execute(sql, chunk)
    all_data_list.extend(cursor.fetchall())

# take the column names from the cursor so the DataFrame keeps them
columns = [col[0] for col in cursor.description]
all_data = pd.DataFrame([tuple(row) for row in all_data_list], columns=columns)
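If you would rather stay in pandas end to end, the same placeholder idea works through read_sql; a minimal sketch, assuming the pyodbc connection and chunked_ids_lists from above:
import pandas as pd

df_chunks = []
for chunk in chunked_ids_lists:
    placeholders = ','.join(['?'] * len(chunk))
    sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN ({placeholders});"
    # pandas passes the parameter values through to pyodbc for safe substitution
    df_chunks.append(pd.read_sql(sql, conn, params=chunk))

all_data = pd.concat(df_chunks, ignore_index=True)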

How do I issue a SQL query using pyodbc that involves python variables as the parameters?

I am writing a program that searches a database and writes the data into an Excel file based on the user's selection from two calendars (on the GUI). I have a start and end date, and I am passing those variables as parameters into a function for my SQL query.
How do I format the SQL query to select the values from the given dates?
Here is my code so far,
def load_database(startDate, endDate):
    conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                          'Server=DESKTOP-KVR7GNJ\SQLBULKROLL;'
                          'Database=db_MasterRoll;'
                          'UID=sa;'
                          'PWD=Z28#6420d')
    wb = Workbook()
    ws = wb.active
    ws.title = "Star Ledger"
    cursor = conn.cursor()
    cursor.execute('SELECT ID, BarCode, DateTm, EntryDoor FROM dbo.tab_Rolls WHERE (DateTm >= #startDate) AND (DateTm <= #endDate)')
A button invokes this code by running the function call_db_code():
def call_db_code():
    firstDate = start_dateCalendar.selection_get()
    print(firstDate)
    finalDate = end_dateCalendar.selection_get()
    print(finalDate)
    load_database(firstDate, finalDate)
Use ? as placeholders in the query, and provide a tuple of values to substitute for them.
cursor.execute('SELECT ID, BarCode, DateTm, EntryDoor FROM dbo.tab_Rolls WHERE (DateTm >= ?) AND (DateTm <= ?)', (startDate, endDate))
See the examples in the cursor.execute() documentation.
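Putting that into the original function, a minimal sketch that also writes the rows into the worksheet (assuming Workbook comes from openpyxl, as the question implies; the output filename is just an example):
import pyodbc
from openpyxl import Workbook

def load_database(startDate, endDate):
    conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                          'Server=DESKTOP-KVR7GNJ\SQLBULKROLL;'
                          'Database=db_MasterRoll;'
                          'UID=sa;'
                          'PWD=Z28#6420d')
    wb = Workbook()
    ws = wb.active
    ws.title = "Star Ledger"

    cursor = conn.cursor()
    cursor.execute('SELECT ID, BarCode, DateTm, EntryDoor FROM dbo.tab_Rolls '
                   'WHERE (DateTm >= ?) AND (DateTm <= ?)', (startDate, endDate))

    # copy the result set into the worksheet, one row at a time
    ws.append(['ID', 'BarCode', 'DateTm', 'EntryDoor'])
    for row in cursor.fetchall():
        ws.append(list(row))

    wb.save('star_ledger.xlsx')  # example filename, not from the question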

How to conduct SQL queries on multiple .db files and store the results in a .csv?

I have about 100 .db files stored on my Google Drive which I want to run the same SQL query on. I'd like to store these query results in a single .csv file.
I've managed to use the following code to write the results of a single SQL query into a .csv file, but I am unable to make it work for multiple files.
conn = sqlite3.connect('/content/drive/My Drive/Data/month_2014_01.db')
df = pd.read_sql_query("SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'", conn)
df.to_csv('/content/drive/My Drive/Data/Query_Results.csv')
This is the code that I have used so far to try and make it work for all files, based on this post.
databases = []
directory = '/content/drive/My Drive/Data/'
for filename in os.listdir(directory):
    flname = os.path.join(directory, filename)
    databases.append(flname)

for database in databases:
    try:
        with sqlite3.connect(database) as conn:
            conn.text_factory = str
            cur = conn.cursor()
            cur.execute(row["SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'"])
            df.loc[index, 'Results'] = cur.fetchall()
    except sqlite3.Error as err:
        print("[INFO] %s" % err)
But this throws me an error: TypeError: tuple indices must be integers or slices, not str.
I'm obviously doing something wrong and I would much appreciate any tips that would point towards an answer.
Consider building a list of data frames, then concatenating them into a single data frame with pandas.concat:
import os
import sqlite3
import pandas as pd

gdrive = "/content/drive/My Drive/Data/"

sql = """SELECT * FROM messages
         INNER JOIN users ON messages.id = users.id
         WHERE text LIKE '%house%'
      """

def build_df(db):
    with sqlite3.connect(os.path.join(gdrive, db)) as conn:
        df = pd.read_sql_query(sql, conn)
    return df

# BUILD LIST OF DFs WITH LIST COMPREHENSION
df_list = [build_df(db) for db in os.listdir(gdrive) if db.endswith('.db')]

# CONCATENATE ALL DFs INTO SINGLE DF FOR EXPORT
final_df = pd.concat(df_list, ignore_index=True)
final_df.to_csv(os.path.join(gdrive, 'Query_Results.csv'), index=False)
Better yet, consider SQLite's ATTACH DATABASE and append the query results into a master table. This avoids pulling in pandas, a heavy third-party data science library, for a simple data migration need. Plus, all the data stays inside SQLite, so there are no data type conversion or I/O transfer issues to worry about.
import csv
import os
import sqlite3

gdrive = "/content/drive/My Drive/Data/"

with sqlite3.connect(os.path.join(gdrive, 'month_2014_01.db')) as conn:
    cur = conn.cursor()

    # CREATE MASTER TABLE FROM THIS DATABASE'S OWN TABLES
    cur.execute("DROP TABLE IF EXISTS master_query")
    cur.execute("""CREATE TABLE master_query AS
                   SELECT * FROM messages
                   INNER JOIN users
                       ON messages.id = users.id
                   WHERE text LIKE '%house%'
                """)
    conn.commit()

    # ITERATIVELY ATTACH THE OTHER FILES AND APPEND THEIR RESULTS
    for db in os.listdir(gdrive):
        if db.endswith('.db') and db != 'month_2014_01.db':
            cur.execute("ATTACH DATABASE ? AS tmp", [os.path.join(gdrive, db)])
            cur.execute("""INSERT INTO master_query
                           SELECT * FROM tmp.messages
                           INNER JOIN tmp.users
                               ON tmp.messages.id = tmp.users.id
                           WHERE text LIKE '%house%'
                        """)
            cur.execute("DETACH DATABASE tmp")
            conn.commit()

    # WRITE THE ROWS OUT TO CSV
    data = cur.execute("SELECT * FROM master_query")
    with open(os.path.join(gdrive, 'Query_Results.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([i[0] for i in cur.description])  # HEADERS
        writer.writerows(data)                            # DATA

    cur.close()

Python - Apply a filter on query using a Python Parameter

I have a script that runs a query against my MySQL database, and my question is whether I can pass a parameter to that query from Python.
For example, in the following script I want to calculate date_filter in Python and then apply that filter in the query.
now = dt.datetime.now()
date_filter = now - timedelta(days=3)
dq_connection = mysql.connector.connect(user='user', password='pass', host='localhost', database='db')
engine = create_engine('localhost/db')
cursor = connection.cursor(buffered=True)
query = ('''
SELECT *
FROM myTable
WHERE date >= ''' + date_filter + '''
''')
I tried it that way but I got the following error:
builtins.TypeError: can only concatenate str (not "datetime.datetime") to str
Is it possible to apply the filter like that?
Thanks!
Yes, you can do it. To avoid SQL injection, the best way is not to use Python's string formatting facilities but SQL parameters and placeholders (note that you don't need the single quotes ' around the placeholder; the driver does that job and converts the type of the variable):
import datetime as dt
from datetime import timedelta

import mysql.connector

now = dt.datetime.now()
date_filter = now - timedelta(days=3)

db_connection = mysql.connector.connect(user='user', password='pass', host='localhost', database='db')
cursor = db_connection.cursor(buffered=True)

query = "SELECT * FROM myTable WHERE date >= %s"
cursor.execute(query, (date_filter,))
Also, you had a mistake in your cursor: it should be created from your connection object (db_connection.cursor above). The trailing comma after date_filter is needed because execute() expects a tuple of parameters.
In case you need more than one parameter, you can use more than one placeholder:
query = "SELECT * FROM myTable WHERE date >= %s AND date <= %s"
cursor.execute(query, (date_filter, other_date))
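If the end goal is a DataFrame (the create_engine line in the question hints at that), one straightforward option is to fetch the parametrized result and build the frame yourself; a minimal sketch, reusing the cursor and date_filter from above:
import pandas as pd

query = "SELECT * FROM myTable WHERE date >= %s"
cursor.execute(query, (date_filter,))

# build the DataFrame from the fetched rows, taking column names from the cursor
df = pd.DataFrame(cursor.fetchall(), columns=[col[0] for col in cursor.description])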
You can just do something like:
WHERE date >= ''' + str(date_filter) + '''
to represent the date as a string rather than a datetime object.
You can try with this:
date_f = str(date_filter)
query = ('''
SELECT *
FROM myTable
WHERE date >= "{}"
'''.format(date_f))

reading and executing sql queries into pandas data frame

I have a very long SQL query that runs fine from Python into a data frame, but I have hundreds of them, so I tried creating a function that reads my files and executes them.
The sql statements look like this:
"SELECT IIf(Left([Milestone_Next_Expected],4)='Proc',1, \
....\
120 lines
....\
dbo.MY_data_value"
This is the function
def Execute_SQL_from_a_File(filename, home, conn1):
    FORMAT1 = '%Y%m%d%H%M'
    fd = open(filename, 'r')
    sqlFile = fd.read()
    fd.close()
    KIC53 = pd.read_sql(sqlFile, conn1)
    f_out = home + out1 + ".xls"
    writer = pd.ExcelWriter(f_out)
    KIC53.to_excel(writer, f_out)
    writer.save()
This is what calls the function:
Execute_SQL_from_a_File(QRYHOME + "qryBook" + str(BNUM) + "_" + str(IND) + ".sql", BNUM, home, conn1)
When I run the query through the function I get this error:
: ('42000', "[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]The
identifier that starts with 'SELECT
IIf(Left([Milestone_Next_Expected],4)='Proc',1,
\\\nIIf(Left([Milestone_Next_Expected],4)='Subm',2,
\\\nIIf(Left([Milestone_N' is too long. Maximum length is 128.")
I can't figure out why I'm getting the length error, because I can run the same query by creating sqlFile as one long string:
"SELECT IIf(Left([Milestone_Next_Expected],4)='Proc',1, \
....\
120 lines
....\
dbo.MY_data_value"
ANY help would be greatly appreciated!
The correct method:
1. The SQL script does not require any line-continuation symbols ("\") and does not need to be enclosed in quotes.
2. The correct way to read the input file is:
file = open(filename, 'r')
SQLfile = " ".join(file.readlines())
Now, when the code is executed via pd.read_sql_query(SQLfile, conn1), there are no errors.
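Putting both points together, a minimal sketch of the cleaned-up function (the function name is kept from the question; writing the result to .xlsx is assumed here, since recent pandas versions no longer write legacy .xls files):
import pandas as pd

def Execute_SQL_from_a_File(filename, conn1):
    # the .sql file holds plain SQL text: no surrounding quotes, no trailing backslashes
    with open(filename, 'r') as f:
        sql = " ".join(f.readlines())
    return pd.read_sql_query(sql, conn1)

# usage, reusing the names from the question
df = Execute_SQL_from_a_File(QRYHOME + "qryBook" + str(BNUM) + "_" + str(IND) + ".sql", conn1)
df.to_excel(QRYHOME + "qryBook" + str(BNUM) + "_" + str(IND) + ".xlsx", index=False)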
