I've built a query that retrieves multiple rows from the database, and it works great when run directly from SQL Server Management Studio. When I try the same query through Python's pyodbc I keep getting an empty DataFrame. The cursor works fine and all parameters are strings. What am I missing?
def get_general_tables(conn, cursor, boletim, area, year, city):
    print(boletim, area, year, city)
    get_t1_query = f'''
        SELECT t2.Acao as Acao,
               SUM(t1.ValorD) as Dot,
               SUM(t1.ValorE) as Emp,
               SUM(t1.ValorL) as Liq,
               SUM(t1.ValorP) as Pag
        FROM View_General as t1
        INNER JOIN Boletins as t2
            ON (t1.DescProg = t2.Prog
                AND t1.Desc = t2.Acao)
        WHERE (t1.NameCity = ? AND t1.Year = ? AND t2.Area = ? AND t2.Boletim = ?)
        GROUP BY t2.Acao
    '''
    params = [city, ano, area, boletim]
    cursor.execute(get_t1_query, params)  # .fetchall()
    df = pd.DataFrame(cursor.fetchall())
    print(df)
If I use a single parameter I get the expected result. The problem persists with the pd.read_sql approach:
df = pd.DataFrame(pd.read_sql(get_t1_query, con = conn, params=params))
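For completeness, here is a stripped-down sketch of how I turn the cursor results into a DataFrame (pulling column names from cursor.description is just so that an empty result still shows named columns):

cursor.execute(get_t1_query, params)
columns = [col[0] for col in cursor.description]  # column names reported by the driver
df = pd.DataFrame.from_records(cursor.fetchall(), columns=columns)
print(df)  # prints an empty frame (with headers) when no rows come back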
When I have to pass a parameter to a SQL query, I usually do
date = '20220101'
query = f'''SELECT * FROM TABLE WHERE DATE = '{date}''''
In an attempt to reduce the length of the code, I created a query.sql file containing the query above, but I'm failing to pass the date variable into the query before running the SQL.
For reading I'm using
sql_query = open("query.sql", "r")
sql_as_string = sql_query.read()
df = pd.read_sql(sql_as_string, conn)
Is there a way around this, instead of pasting the whole SQL query into my .py code?
I'm using pyodbc, ODBC Driver 17 for SQL Server
Use a parametrized query, not string formatting.
The file should just contain the query, with a ? placeholder for the variable.
SELECT * FROM TABLE WHERE DATE = ?
Then you can do
with open("query.sql", "r") as f:
    sql_query = f.read()

df = pd.read_sql(sql_query, conn, params=(date, ))
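If the query in the file ever needs more than one value, the same pattern extends naturally: pyodbc's ? placeholders are positional, so the values just have to be passed in the order they appear in the file. A sketch (the start_date/end_date names are only illustrative):

# query.sql might contain, for example:
#   SELECT * FROM TABLE WHERE DATE BETWEEN ? AND ?
with open("query.sql", "r") as f:
    sql_query = f.read()

# values are matched to the ? placeholders left to right
df = pd.read_sql(sql_query, conn, params=(start_date, end_date))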
I need to query rows where a column matches my list of ~60K IDs out of a table that contains millions of IDs. I think normally you would insert a temporary table into the database and join on that, but I can't edit this database. I am doing it like this, using a loop in a Python wrapper, but is there a better way? It works, but still:
import pyodbc
import pandas as pd

# connect to the database using windows authentication
conn = pyodbc.connect('DRIVER={SQL Server Native Client 11.0};SERVER=my_fav_server;DATABASE=my_fav_db;Trusted_Connection=yes;')
cursor = conn.cursor()

# read in all the ids
ids_list = [...60K ids in here..]

# query in 10K chunks to prevent memory error
def chunks(l, n):
    # split list into chunks of size n (the last one may be smaller)
    n = max(1, n)
    return [l[i:i+n] for i in range(0, len(l), n)]

chunked_ids_lists = chunks(ids_list, 10000)

# looping through to retrieve all cols
for chunk_num, chunked_ids_list in enumerate(chunked_ids_lists):
    temp_ids_string = "('" + "','".join(chunked_ids_list) + "')"
    temp_sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN {temp_ids_string};"
    temp_data = pd.read_sql_query(temp_sql, conn)
    temp_path = f"temp_chunk_{chunk_num}.txt"
    temp_data.to_csv(temp_path, sep='\t', index=None)

# read the query chunks
all_data_list = []
for chunk_num in range(len(chunked_ids_lists)):
    temp_path = f"temp_chunk_{chunk_num}.txt"
    temp_data = pd.read_csv(temp_path, sep='\t')
    all_data_list.append(temp_data)

all_data = pd.concat(all_data_list)
Another way is to use Psycopg's cursor.
import psycopg2

# Connect to an existing database
conn = psycopg2.connect("dbname=test user=postgres")

# Open a cursor to perform database operations
cur = conn.cursor()

# get data from query
# no need to construct an 'SQL-correct syntax' filter by hand:
# psycopg2 adapts a Python tuple to an IN (...) list
cur.execute("SELECT * FROM dbo.my_fav_table WHERE ID IN %(filter)s;", {"filter": tuple(ids_list)})

# loop over the fetched rows
for record in cur:
    # we got one record
    print(record)  # or apply whatever other processing you need
Use parameters rather than concatenating strings.
I don't see the need for the CSV files, if you're just going to read them all into Python in the next loop. Just put everything into all_data_list during the query loop.
all_data_list = []
for chunk in chunked_ids_lists:
    placeholders = ','.join(['?'] * len(chunk))
    sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN ({placeholders});"
    cursor.execute(sql, chunk)
    rows = cursor.fetchall()
    all_data_list.extend(rows)

all_data = pd.DataFrame(all_data_list)
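If you'd rather keep the pandas workflow from the question, the same placeholder trick works with read_sql_query. This is only a sketch, and note that SQL Server accepts at most around 2100 parameters per statement, so the chunks must be smaller than the 10K used above:

# sketch: parameterized IN (...) per chunk, concatenated at the end;
# chunk size kept well under SQL Server's ~2100-parameter limit
chunked_ids_lists = chunks(ids_list, 2000)

frames = []
for chunk in chunked_ids_lists:
    placeholders = ','.join(['?'] * len(chunk))
    sql = f"SELECT * FROM dbo.my_fav_table WHERE ID IN ({placeholders});"
    frames.append(pd.read_sql_query(sql, conn, params=chunk))

all_data = pd.concat(frames, ignore_index=True)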
I am writing a program that searches a database and writes the data into an Excel file based on the user's selection from two calendars (on the GUI). I have a start and end date, and I am passing those variables as parameters into a function for my SQL query.
How do I format the SQL query to select the rows between the given dates?
Here is my code so far:
def load_database(startDate, endDate):
    conn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                          'Server=DESKTOP-KVR7GNJ\SQLBULKROLL;'
                          'Database=db_MasterRoll;'
                          'UID=sa;'
                          'PWD=Z28#6420d')
    wb = Workbook()
    ws = wb.active
    ws.title = "Star Ledger"
    cursor = conn.cursor()
    cursor.execute('SELECT ID, BarCode, DateTm, EntryDoor FROM dbo.tab_Rolls WHERE (DateTm >= #startDate) AND (DateTm <= #endDate)')
A button invokes this code by running the function call_db_code():
def call_db_code():
    firstDate = start_dateCalendar.selection_get()
    print(firstDate)
    finalDate = end_dateCalendar.selection_get()
    print(finalDate)
    load_database(firstDate, finalDate)
Use ? as placeholders in the query, and provide a tuple of values to substitute for them.
cursor.execute('SELECT ID, BarCode, DateTm, EntryDoor FROM dbo.tab_Rolls WHERE (DateTm >= ?) AND (DateTm <= ?)', (startDate, endDate))
See the examples in the cursor.execute() documentation.
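Beyond the query itself, the rows still have to land in the workbook that the question sets up. A minimal sketch of that step, to go inside load_database after the execute call (the header row from cursor.description and the output filename are my own additions, not from the original code):

# write a header row followed by the query results into the worksheet
ws.append([col[0] for col in cursor.description])   # column names: ID, BarCode, DateTm, EntryDoor
for row in cursor.fetchall():
    ws.append(list(row))                             # openpyxl needs a plain sequence, not a pyodbc Row
wb.save("star_ledger.xlsx")                          # hypothetical output path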
I have about 100 .db files stored on my Google Drive which I want to run the same SQL query on. I'd like to store these query results in a single .csv file.
I've managed to use the following code to write the results of a single SQL query into a .csv file, but I am unable to make it work for multiple files.
conn = sqlite3.connect('/content/drive/My Drive/Data/month_2014_01.db')
df = pd.read_sql_query("SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'", conn)
df.to_csv('/content/drive/My Drive/Data/Query_Results.csv')
This is the code that I have used so far to try and make it work for all files, based on this post.
databases = []
directory = '/content/drive/My Drive/Data/'
for filename in os.listdir(directory):
    flname = os.path.join(directory, filename)
    databases.append(flname)

for database in databases:
    try:
        with sqlite3.connect(database) as conn:
            conn.text_factory = str
            cur = conn.cursor()
            cur.execute(row["SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'"])
            df.loc[index, 'Results'] = cur.fetchall()
    except sqlite3.Error as err:
        print("[INFO] %s" % err)
But this throws me an error: TypeError: tuple indices must be integers or slices, not str.
I'm obviously doing something wrong and I would much appreciate any tips that would point towards an answer.
Consider building a list of data frames, then concatenating them into a single data frame with pandas.concat:
gdrive = "/content/drive/My Drive/Data/"
sql = """SELECT * FROM messages
INNER JOIN users ON messages.id = users.id
WHERE text LIKE '%house%'
"""
def build_df(db):
    with sqlite3.connect(os.path.join(gdrive, db)) as conn:
        df = pd.read_sql_query(sql, conn)
    return df
# BUILD LIST OF DFs WITH LIST COMPREHENSION
df_list = [build_df(db) for db in os.listdir(gdrive) if db.endswith('.db')]
# CONCATENATE ALL DFs INTO SINGLE DF FOR EXPORT
final_df = pd.concat(df_list, ignore_index = True)
final_df.to_csv(os.path.join(gdrive, 'Query_Results.csv'), index = False)
Better yet, consider SQLite's ATTACH DATABASE and append each file's query results into a master table. This also avoids pulling in pandas, a heavyweight third-party data science library, for a simple data migration task. Plus, all of the data stays inside SQLite, so there is no need to worry about data type conversion or I/O transfer issues.
import csv
import os
import sqlite3

# CONNECT TO A SEPARATE "MASTER" DATABASE FILE THAT COLLECTS ALL RESULTS
with sqlite3.connect(os.path.join(gdrive, 'master.db')) as conn:
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS master_query")

    # ITERATIVELY ATTACH EACH DATABASE AND APPEND ITS RESULTS
    db_files = [db for db in os.listdir(gdrive) if db.endswith('.db') and db != 'master.db']
    for i, db in enumerate(db_files):
        cur.execute("ATTACH DATABASE ? AS tmp", [os.path.join(gdrive, db)])
        if i == 0:
            # FIRST FILE CREATES THE MASTER TABLE (AND DEFINES ITS SCHEMA)
            cur.execute("""CREATE TABLE master_query AS
                           SELECT * FROM tmp.messages
                           INNER JOIN tmp.users
                              ON tmp.messages.id = tmp.users.id
                           WHERE text LIKE '%house%'
                        """)
        else:
            cur.execute("""INSERT INTO master_query
                           SELECT * FROM tmp.messages
                           INNER JOIN tmp.users
                              ON tmp.messages.id = tmp.users.id
                           WHERE text LIKE '%house%'
                        """)
        cur.execute("DETACH DATABASE tmp")
        conn.commit()

    # WRITE ROWS (WITH HEADERS) TO CSV
    data = cur.execute("SELECT * FROM master_query")
    with open(os.path.join(gdrive, 'Query_Results.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # HEADERS
        writer.writerows(data)                                # DATA

    cur.close()
I have a script that runs a query against my MySQL database, and my question is whether I can pass a parameter to that query through Python.
For example, in the following script I want to calculate date_filter in Python and then apply that filter in the query.
now = dt.datetime.now()
date_filter = now - timedelta(days=3)
dq_connection = mysql.connector.connect(user='user', password='pass', host='localhost', database='db')
engine = create_engine('localhost/db')
cursor = connection.cursor(buffered=True)
query = ('''
SELECT *
FROM myTable
WHERE date >= ''' + date_filter + '''
''')
I tried it that way but I got the following error:
builtins.TypeError: can only concatenate str (not "datetime.datetime") to str
Is it possible to apply the filter like that?
Thanks!
Yes, you can do it. To avoid SQL injection, the best way is not to use Python's string formatting but SQL parameters and placeholders (note that you don't need the single quotes ' around the placeholder, since the driver handles the conversion of the variable's type):
import datetime as dt
from datetime import timedelta

import mysql.connector

now = dt.datetime.now()
date_filter = now - timedelta(days=3)

dq_connection = mysql.connector.connect(user='user', password='pass', host='localhost', database='db')
cursor = dq_connection.cursor(buffered=True)

query = "SELECT * FROM myTable WHERE date >= %s"
cursor.execute(query, (date_filter,))
Also, you had a mistake when creating your cursor: it should be dq_connection.cursor (the connection variable you actually defined), not connection.cursor. The trailing comma after date_filter is fine because execute() expects the parameters as a tuple.
In case you need more than one parameter, you can use more than one placeholder:
query = "SELECT * FROM myTable WHERE date >= %s AND date <= %s"
cursor.execute(query, (date_filter, other_date))
You can just do something like:
WHERE date >= ''' + str(date_filter) + '''
to pass the date as a string rather than a datetime object.
You can try with this:
date_f = str(date_filter)
query = ('''
SELECT *
FROM myTable
WHERE date >= "{}"
'''.format(date_f))