I am trying to retrieve data from sqlite3 with the help of variables. It works fine with an execute() statement, but I would also like to retrieve the column names, and for that purpose I am using read_sql_query(). However, I am unable to pass variables to read_sql_query(). Please see the code below:
import pandas

# conn is an existing sqlite3 connection
def cal():
    tab = ['LCOLOutput']
    column_name = 'CUSTOMER_EMAIL_ID'
    xyz = '**AVarma1@ra.rockwell.com'
    for index, m in enumerate(tab):
        table_name = m
        # this is the part that does not work
        sq = "SELECT * FROM ? where ?=?;", (table_name, column_name, xyz,)
        df = pandas.read_sql_query(sq, conn)
        writer = pandas.ExcelWriter('D:\pandas_simple.xlsx', engine='xlsxwriter')
        df.to_excel(writer, sheet_name='Sheet1')
        writer.save()
You need to change the syntax of your call to the pandas method read_sql_query(); check the doc.
For sqlite, it should work with:
sq = "SELECT * FROM ? where ?=?;"
param = (table_name, column_name, xyz,)
df = pandas.read_sql_query(sq, conn, params=param)
EDIT:
Otherwise, try the following formatting for the table and column names (parameter placeholders can only bind values, not identifiers):
sq = "SELECT * FROM {} where {}=?;".format(table_name, column_name)
param = (xyz,)
df = pandas.read_sql_query(sq, conn, params=param)
Check this answer explaining why a table name cannot be passed as a parameter directly; the same applies to column names.
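Since identifiers can only go in via string formatting, it is worth validating them first. Here is a minimal sketch (assuming the sqlite3 connection conn from the question; safe_query is a made-up helper name) that checks the table and column against the database's own metadata before formatting them in:

import pandas

def safe_query(conn, table_name, column_name, value):
    # Validate the table name against sqlite_master.
    tables = {r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table_name not in tables:
        raise ValueError("unknown table: " + table_name)
    # Validate the column name against the table's schema.
    columns = {r[1] for r in conn.execute(
        "PRAGMA table_info({})".format(table_name))}
    if column_name not in columns:
        raise ValueError("unknown column: " + column_name)
    # Identifiers are now safe to format in; the value is still bound.
    sq = "SELECT * FROM {} WHERE {} = ?;".format(table_name, column_name)
    return pandas.read_sql_query(sq, conn, params=(value,))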
When I have to pass a parameter before running a SQL query, I usually do
date = '20220101'
query = f"SELECT * FROM TABLE WHERE DATE = '{date}'"
In an attempt to reduce the length of the code, I created a query.sql file with the query above, but I'm failing to pass the date variable into my query before running the SQL.
For reading I'm using
sql_query = open("query.sql", "r")
sql_as_string = sql_query.read()
df = pd.read_sql(sql_as_string, conn)
Is there a way around this, instead of pasting the whole SQL query into my .py code?
I'm using pyodbc, ODBC Driver 17 for SQL Server
Use a parametrized query, not string formatting.
The file should just contain the query, with a ? placeholder for the variable.
SELECT * FROM TABLE WHERE DATE = ?
Then you can do
with open("query.sql", "r") as f:
sql_query = f.read()
df = pd.read_sql(sql_query, conn, params=(date, ))
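If several queries live in .sql files, a small helper keeps the call sites short. This is just a sketch under the same assumptions (a pyodbc connection conn, ? placeholders in the file; read_sql_file is a made-up name):

import pandas as pd

def read_sql_file(path, conn, params=()):
    # Load the statement from disk and let the driver bind the values.
    with open(path, "r") as f:
        sql = f.read()
    return pd.read_sql(sql, conn, params=params)

df = read_sql_file("query.sql", conn, params=(date,))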
I am trying to create a view that contains a variable in Snowflake SQL. The whole thing is done in a Python script. Initially I tried the binding-variable approach, but binding does not work in view-creation SQL. Is there any other way I can proceed with this? I have given the code below.
Code:
import snowflake.connector as sf
import pandas

ctx = sf.connect(
    user='floatinginthecloud89',
    password='',
    account='nq13914.southeast-asia.azure',
    warehouse='compute_wh',
    database='util_db',
    schema='public'
)
print("Got the context object")
cs = ctx.cursor()
print("Got the cursor object")
column1 = 'attr_name'
try:
    row = cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
    rows = cs.fetchall()
    for row in rows:
        print(row)
    print(rows)
    row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
        SELECT * FROM (SELECT MSTR.PROD_CODE AS PROD_CODE, DIM.ATTR_NAME AS ATTR_NAME, MSTR.ATTR_VALUE AS ATTR_VALUE
        FROM TBL_DIM DIM INNER JOIN TBL_MSTR MSTR ON DIM.ATTR_KEY=MSTR.ATTR_KEY) Q
        PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (*row))
        AS P
        ORDER BY P.PROD_CODE;""")
    rows1 = cs.fetchall()
    for row1 in rows1:
        print(row1)
finally:
    cs.close()
    ctx.close()
Error:
File "C:\Users\Anand Singh\anaconda3\lib\site-packages\snowflake\connector\errors.py", line 179, in default_errorhandler
raise error_class(
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 2 at position 65 unexpected 'row'.
Looking at the Python binding example and your code, it appears you need
row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (%s))
AS P
ORDER BY P.PROD_CODE;""", row)
but *row would pass many separate arguments, so I have changed it to build a single comma-separated string.
A more Pythonic way to implement this is using an f-string:
row1 = cs.execute(f"""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN ({row}))
AS P
ORDER BY P.PROD_CODE;""")
It is also more readable, especially if you have multiple parameters in the f-string.
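For completeness, here is a minimal sketch of how the LISTAGG result can be pulled out of the cursor before interpolating it (assuming, as in the question's query, that the first column of the first row holds the comma-separated list):

cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
attr_list = cs.fetchone()[0]  # e.g. "'COLOR','SIZE','WEIGHT'" (made-up values)
cs.execute(f"""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
    SELECT * FROM (
        SELECT MSTR.PROD_CODE AS PROD_CODE, DIM.ATTR_NAME AS ATTR_NAME, MSTR.ATTR_VALUE AS ATTR_VALUE
        FROM TBL_DIM DIM
        INNER JOIN TBL_MSTR MSTR ON DIM.ATTR_KEY = DIM.ATTR_KEY
    ) Q
    PIVOT (MAX(Q.ATTR_VALUE) FOR Q.ATTR_NAME IN ({attr_list}))
    AS P
    ORDER BY P.PROD_CODE;""")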
Issue resolved! Thanks a lot, Simeon, for your help.
import snowflake.connector as sf
import pandas

ctx = sf.connect(
    user='floatinginthecloud89',
    password='AzureSn0flake#123',
    account='nq13914.southeast-asia.azure',
    warehouse='compute_wh',
    database='util_db',
    schema='public'
)
print("Got the context object")
cs = ctx.cursor()
print("Got the cursor object")
column1 = 'attr_name'
try:
    row = cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
    rows = cs.fetchall()
    for row in rows:
        print(row)
    print(rows)
    row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
        SELECT * FROM (
            SELECT MSTR.PROD_CODE AS PROD_CODE, DIM.ATTR_NAME AS ATTR_NAME, MSTR.ATTR_VALUE AS ATTR_VALUE
            FROM TBL_DIM DIM
            INNER JOIN TBL_MSTR MSTR
            ON DIM.ATTR_KEY=MSTR.ATTR_KEY
        ) Q
        PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (%s))
        AS P
        ORDER BY P.PROD_CODE;""", ','.join(row))
    rows1 = cs.fetchall()
    for row1 in rows1:
        print(row1)
finally:
    cs.close()
    ctx.close()
I am running a Redshift query that returns 40 million records, but when I save them to a CSV file only about 7 thousand records show up. Could you please help me figure out how to solve this?
Example:
Code:
conn = gcso_conn1()
with conn.cursor() as cur:
    query = "select * from (select a.src_nm Source_System ,b.day_id Date,b.qty Market_Volume,b.cntng_unt Volume_Units,b.sls_in_lcl_crncy Market_Value,b.crncy_cd Value_Currency,a.panel Sales_Channel,a.cmpny Competitor_Name,a.lcl_mnfcr Local_Manufacturer ,a.src_systm_id SKU_PackID_ProductNumber,upper(a.mol_list) Molecule_Name,a.brnd_nm BrandName_Intl,a.lcl_prod_nm BrandName_Local,d.atc3_desc Brand_Indication,a.prsd_strngth_1_nbr Strength,a.prsd_strngth_1_unt Strength_Units,a.pck_desc Pack_Size_Number,a.prod_nm Product_Description,c.iso3_cntry_cd Country_ISO_Code,c.cntry_nm Country_Name from gcso_prd_cpy.dim_prod a join gcso_prd_cpy.fct_sales b on (a.SRC_NM='IMS' and b.SRC_NM='IMS' and a.prod_id = b.prod_id) join gcso_prd_cpy.dim_cntry c on (a.cntry_id = c.cntry_id) left outer join gcso_prd_cpy.dim_thrc_area d on (a.prod_id = d.prod_id) WHERE a.SRC_NM='IMS' and c.iso3_cntry_cd in ('JPN','IND','CAN','USA') and upper(a.mol_list) in ('AMBRISENTAN', 'BERAPROST','BOSENTAN') ORDER BY b.day_id ) a"
    #print(query)
    cur.execute(query)
    result = cur.fetchall()
    conn.commit()
    column = [i[0] for i in cur.description]
    sqldf = pd.DataFrame(result, columns=column)
    print(sqldf.count())
    #print(df3)
    sqldf.to_csv(Output_Path, index=False, sep='\001', encoding='utf-8')
Everything should work correctly. I think the main problem is that you are debugging with count(). You expect the number of records, but the docs say:
Count non-NA cells for each column or row.
When debugging a DataFrame, it is better to use:
print(len(df))
print(df.shape)
print(df.info())
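To illustrate the difference on a made-up frame with missing values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [np.nan, np.nan, 6]})
print(df.count())  # a: 2, b: 1 -- non-NA cells per column, not the row count
print(len(df))     # 3 -- the actual number of rows
print(df.shape)    # (3, 2)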
You can also do this more simply with read_sql, reading in chunks:
import pandas as pd
from sqlalchemy import create_engine

header = True
for chunk in pd.read_sql(
    'your query here - SELECT * FROM ... ',
    con=create_engine('creds', echo=True),  # set creds - postgres+psycopg2://user:password@host:5432/db_name
    chunksize=1000,  # read by chunks
):
    file_path = '/tmp/path_to_your.csv'
    chunk.to_csv(
        file_path,
        header=header,
        mode='a',
        index=False,
    )
    header = False
I am writing a SQL query where I want to pass a WHERE condition with parameters to pandas.read_sql_query.
It works fine for the value, but I run into problems with the variable (the column name).
My workaround is a concatenated string that I pass to pandas, but I don't like how my code looks this way.
I already figured out that the column name ends up written wrongly in the query: it is e.g. 'colname' instead of colname.
I wrote the SQL as a string:
command = ("SELECT * FROM review r "
           "WHERE 1=1 "
           "AND " + selected_var + "= " + selected_val)
And then I passed it to pandas:
self.reviews = pd.read_sql_query(command, con=self.cnxn)
But I would like to do this without the workaround.
import pandas as pd
import mysql.connector

self.reviews = pd.read_sql_query("""
    SELECT *
    FROM review r
    WHERE 1=1
    AND %(sel_var)s = %(sel_val)s;
    """, con=self.cnxn, params={'sel_var': selected_var,
                                'sel_val': selected_val})
I expect the query to return results without writing everything as a command string.
What about string formatting?
input_params = {'sel_var': selected_var,
                'sel_val': selected_val}
self.reviews = pd.read_sql_query("""SELECT * FROM review r WHERE 1=1
    AND {sel_var}={sel_val};""".format(**input_params),
    con=self.cnxn)
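Note that formatting the value in as well breaks for strings (no quotes around {sel_val}) and is open to SQL injection. A safer variant, sketched here, formats only the column name and lets the driver bind the value (%s is the mysql.connector placeholder style):

# Only the identifier is formatted in; the value is bound by the driver.
query = "SELECT * FROM review r WHERE 1=1 AND {} = %s;".format(selected_var)
self.reviews = pd.read_sql_query(query, con=self.cnxn, params=(selected_val,))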
So sorry, I have one sqlite file and it includes many tables (like table_A to table_Z).
I only know how to run c.execute('SELECT ST_Name FROM table_A') and then do it again for table_B.
How can I use a loop to do this? I have already searched all day, but I don't get the answer. Please help me.
Thx!!
Please refer to my code below:
import sqlite3
import numpy as np
Sqlite_Path = 'D:\Student.sqlite'
conn = sqlite3.connect(Sqlite_Path)
c = conn.cursor()
c.execute('SELECT ST_Name FROM Table_A')
data = c.fetchall()
# do something
c.execute('SELECT ST_Name FROM Table_B')
data = c.fetchall()
# do something again
This looks like homework to me, but anyway:
tables = ["Table_A", "Table_B", "Table_C"]
for table in tables:
    c.execute('SELECT ST_Name FROM {}'.format(table))
    data = c.fetchall()
    # do something
If you don't know all the tables, or for some other reason, you can do something like this:
tables = [r[0] for r in db.execute('select name from sqlite_master where name like "table_%" and type = "table"')]
for table in tables:
    stmt = 'SELECT * FROM {};'.format(table)
    c = db.execute(stmt)
    rows = c.fetchall()
    # ... do something with the results
This queries sqlite_master, which lists ALL the table names in your sqlite file (here filtered to names starting with table_).
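As a variant, the LIKE pattern itself can be bound as an ordinary value, since only the table names are identifiers that need formatting. A sketch assuming the same db connection:

for (table,) in db.execute(
        "select name from sqlite_master where type = 'table' and name like ?",
        ('table_%',)):
    c = db.execute('SELECT ST_Name FROM {}'.format(table))
    for row in c.fetchall():
        # do something with each row
        print(row)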