I'm using pandas to read SQL output into a dataframe. I'm calling a stored procedure which returns a table output, and the following code works fine. If my stored procedure returns more than one table output[1], how can I read those into dataframes? I want to write the different table outputs into different Excel sheets.
query = 'exec [aa].[dbo].[sp_cc] ?,?'
df = pd.read_sql(query, cnxn, params=[start, end])
writer = pd.ExcelWriter('output.xlsx')
df.to_excel(writer, index=False, sheet_name='customers')
writer.save()
[1]
CREATE procedure [dbo].[usp_vvvv] (....)
BEGIN
SET NOCOUNT ON
BEGIN TRY
.....
select *
FROM #_temp_client_details
select *
FROM #_temp_address_details
select *
FROM #_temp_invoice_details
drop table #_temp_client_details
drop table #_temp_address_details
drop table #_temp_invoice_details
....
END TRY
BEGIN CATCH
..
END CATCH
END
I hope this can help you:
import pandas as pd
import pyodbc

conn = pyodbc.connect('driver={SQL Server};server=xxx.xxx.x.xxx;uid=myuser;pwd=mypass;database=mybd;autocommit=True')
cursor = conn.cursor()
cursor.execute('exec usp_with_2_select')
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# first result set
column_names = [col[0] for col in cursor.description]
df1_data = []
for row in cursor.fetchall():
    df1_data.append({name: row[i] for i, name in enumerate(column_names)})
df1 = pd.DataFrame(df1_data)
print(df1)
df1.to_excel(writer, sheet_name='sheet1')

# this moves the cursor to the next result set
cursor.nextset()

# re-read the column names: they belong to the new result set
column_names = [col[0] for col in cursor.description]
df2_data = []
for row in cursor.fetchall():
    df2_data.append({name: row[i] for i, name in enumerate(column_names)})
df2 = pd.DataFrame(df2_data)
print(df2)
df2.to_excel(writer, sheet_name='sheet2')

writer.save()
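If the procedure returns an arbitrary number of result sets, you can also loop over them with nextset() instead of handling each one by hand. A minimal sketch under the same connection as above ('pandas_multi.xlsx' is just an example output name):

import pandas as pd
import pyodbc

conn = pyodbc.connect('driver={SQL Server};server=xxx.xxx.x.xxx;uid=myuser;pwd=mypass;database=mybd;autocommit=True')
cursor = conn.cursor()
cursor.execute('exec usp_with_2_select')

with pd.ExcelWriter('pandas_multi.xlsx', engine='xlsxwriter') as writer:
    sheet = 1
    while True:
        if cursor.description is not None:  # skip results that carry no rows (e.g. row counts)
            columns = [col[0] for col in cursor.description]
            df = pd.DataFrame.from_records(cursor.fetchall(), columns=columns)
            df.to_excel(writer, sheet_name='sheet%d' % sheet, index=False)
            sheet += 1
        if not cursor.nextset():  # returns False once all result sets are consumed
            break

Using ExcelWriter as a context manager also saves the file on exit, so no explicit save() call is needed.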
Why do you need pandas for this? You can go from SQL Server directly to Excel in many different ways. Here is one concept that will work for you; there are many ways to skin this cat.
Sub ADOExcelSQLServer()
' Carl SQL Server Connection
'
' FOR THIS CODE TO WORK
' In the VBE you need to go to Tools > References and check the Microsoft ActiveX Data Objects 2.x Library
'
Dim Cn As ADODB.Connection
Dim Server_Name As String
Dim Database_Name As String
Dim User_ID As String
Dim Password As String
Dim SQLStr As String
Dim rs As ADODB.Recordset
Dim iCols As Long
Set rs = New ADODB.Recordset
Server_Name = "your_server_name" ' Enter your server name here
Database_Name = "NORTHWND" ' Enter your database name here
User_ID = "" ' enter your user ID here
Password = "" ' Enter your password here
SQLStr = "SELECT * FROM [Customers]" ' Enter your SQL here
Set Cn = New ADODB.Connection
Cn.Open "Driver={SQL Server};Server=" & Server_Name & ";Database=" & Database_Name & _
";Uid=" & User_ID & ";Pwd=" & Password & ";"
rs.Open SQLStr, Cn, adOpenStatic
' Dump to spreadsheet
For iCols = 0 To rs.Fields.Count - 1
Worksheets("Sheet1").Cells(1, iCols + 1).Value = rs.Fields(iCols).Name
Next
With Worksheets("Sheet1").Range("A2:Z500") ' Enter your sheet name and range here
'.ClearContents
.CopyFromRecordset rs
End With
' Tidy up
rs.Close
Set rs = Nothing
Cn.Close
Set Cn = Nothing
End Sub
I am trying to create a view that contains a variable in Snowflake SQL. The whole thing is being done in a Python script. Initially I tried the binding-variable approach, but binding does not work in view-creation SQL. Is there any other way I can proceed with this? I have given the code below.
Code:
import snowflake.connector as sf
import pandas
ctx = sf.connect(
user = 'floatinginthecloud89',
password = '',
account = 'nq13914.southeast-asia.azure',
warehouse = 'compute_wh',
database = 'util_db',
schema = 'public'
)
print("Got the context object")
cs = ctx.cursor()
print("Got the cursor object")
column1 = 'attr_name'
try:
row = cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
rows = cs.fetchall()
for row in rows:
print(row)
print(rows)
row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS SELECT * FROM (SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE FROM TBL_DIM DIM INNER JOIN TBL_MSTR MSTR ON DIM.ATTR_KEY=MSTR.ATTR_KEY ) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (*row))
AS P
ORDER BY P.PROD_CODE;""")
rows1 = cs.fetchall()
for row1 in rows1:
print(row1)
finally:
cs.close()
ctx.close()
Error:
File "C:\Users\Anand Singh\anaconda3\lib\site-packages\snowflake\connector\errors.py", line 179, in default_errorhandler
raise error_class(
ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 2 at position 65 unexpected 'row'.
Looking at the Python binding examples and your code, it appears you need:
row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (%s))
AS P
ORDER BY P.PROD_CODE;""", row)
but *row would pass many separate arguments, so I changed it to build the values into a single comma-separated string instead.
A more Pythonic way to implement this is with an f-string:
row1 = cs.execute(f"""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN ({row}))
AS P
ORDER BY P.PROD_CODE;""")
It is also more readable, especially if you have multiple parameters in the f-string.
Issue resolved! Thanks a lot, Simeon, for your help.
import snowflake.connector as sf
import pandas
ctx = sf.connect(
user = 'floatinginthecloud89',
password = '',
account = 'nq13914.southeast-asia.azure',
warehouse = 'compute_wh',
database = 'util_db',
schema = 'public'
)
print("Got the context object")
cs = ctx.cursor()
print("Got the cursor object")
column1 = 'attr_name'
try:
row = cs.execute("select listagg(('''' || attr_name || ''''), ',') from util_db.public.TBL_DIM;")
rows = cs.fetchall()
for row in rows:
print(row)
print(rows)
row1 = cs.execute("""CREATE OR REPLACE table util_db.public.HIERARCHY_VIEW_2 AS
SELECT * FROM (
SELECT MSTR.PROD_CODE AS PROD_CODE,DIM.ATTR_NAME AS ATTR_NAME,MSTR.ATTR_VALUE AS ATTR_VALUE
FROM TBL_DIM DIM
INNER JOIN TBL_MSTR MSTR
ON DIM.ATTR_KEY=MSTR.ATTR_KEY
) Q
PIVOT (MAX (Q.ATTR_VALUE) FOR Q.ATTR_NAME IN (%s))
AS P
ORDER BY P.PROD_CODE;""", ','.join(row))
rows1 = cs.fetchall()
for row1 in rows1:
print(row1)
I'm using pd.read_sql to generate dataframes, looping over parameters in a list to generate separate dataframes based on filters. I ran the same query from my Python script directly against the database and it worked.
In the SQL database I always get data back for months 1, 2, 3 and/or projects x, y. But Python SOMETIMES brings back nothing, and I don't know why; this generates an empty dataframe.
If I put the months like [1,'2',3], sometimes the where condition works. But in my database the field is varchar; I don't understand why the data comes back if I put an int in the list, or why the filter fails depending on the type.
server = 'xxxxx'
database = 'mydatabase'
username = 'user'
password = '*************'
driver = '{SQL Server Native Client 11.0}'
turbodbc_options = 'options'
timeout = '1'
project = ['x','y']
months = ['1','2','3']
def generatefile():
    for pr in project:
        for index in months:
            print(type(index))
            print(index)
            print(pr)
            db = pd.read_sql('''select * from table WHERE monthnumber = (?) and pro = (?)''', conn, params={str(index), (pr)})
            print(db)
            print("generate...")
            db.to_excel('C:\\localfile\\' + pr + '\\' + pr + tp + str(index) + 'file.xlsx', index=False, engine='xlsxwriter')
generatefile()
consider the difference between:
1.
sql = '''
select * from table WHERE monthnumber = '1' and pro = 'x'
'''
2.
sql = '''
select * from table WHERE monthnumber = 1 and pro = 'x'
'''
So, if the field is varchar, the SQL should be something like this:
db = pd.read_sql(f'''
select * from table WHERE monthnumber = '{index}' and pro = '{pr}'
''', conn)
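Note also that the original code passes params as a set ({str(index), (pr)}); a Python set has no guaranteed order, which could explain the intermittent empty results. If you prefer to keep parameter binding rather than an f-string (it is also safer against SQL injection), a minimal sketch would be:

db = pd.read_sql(
    '''select * from table WHERE monthnumber = ? and pro = ?''',
    conn,
    params=[str(index), pr],  # a list preserves parameter order; a set does not
)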
I am running a Redshift query which returns about 40 million records, but when I save it into a CSV file it shows only about 7 thousand records. Could you please help me figure out how to solve this?
Example:
Code:
conn = gcso_conn1()
with conn.cursor() as cur:
    query = "select * from (select a.src_nm Source_System ,b.day_id Date,b.qty Market_Volume,b.cntng_unt Volume_Units,b.sls_in_lcl_crncy Market_Value,b.crncy_cd Value_Currency,a.panel Sales_Channel,a.cmpny Competitor_Name,a.lcl_mnfcr Local_Manufacturer ,a.src_systm_id SKU_PackID_ProductNumber,upper(a.mol_list) Molecule_Name,a.brnd_nm BrandName_Intl,a.lcl_prod_nm BrandName_Local,d.atc3_desc Brand_Indication,a.prsd_strngth_1_nbr Strength,a.prsd_strngth_1_unt Strength_Units,a.pck_desc Pack_Size_Number,a.prod_nm Product_Description,c.iso3_cntry_cd Country_ISO_Code,c.cntry_nm Country_Name from gcso_prd_cpy.dim_prod a join gcso_prd_cpy.fct_sales b on (a.SRC_NM='IMS' and b.SRC_NM='IMS' and a.prod_id = b.prod_id) join gcso_prd_cpy.dim_cntry c on (a.cntry_id = c.cntry_id) left outer join gcso_prd_cpy.dim_thrc_area d on (a.prod_id = d.prod_id) WHERE a.SRC_NM='IMS' and c.iso3_cntry_cd in ('JPN','IND','CAN','USA') and upper(a.mol_list) in ('AMBRISENTAN', 'BERAPROST','BOSENTAN') ORDER BY b.day_id ) a"
    #print(query)
    cur.execute(query)
    result = cur.fetchall()
    conn.commit()
    column = [i[0] for i in cur.description]
    sqldf = pd.DataFrame(result, columns=column)
    print(sqldf.count())
    #print(df3)
    sqldf.to_csv(Output_Path, index=False, sep='\001', encoding='utf-8')
Everything should work correctly. I think the main problem is that you are debugging with count(). You expect the number of records, but the docs say:
Count non-NA cells for each column or row.
Better options when debugging a DataFrame:
print(len(df))
print(df.shape)
print(df.info())
You can also do it more easily using read_sql:
import pandas as pd
from sqlalchemy import create_engine

header = True
for chunk in pd.read_sql(
    'your query here - SELECT * FROM ... ',
    con=create_engine('creds', echo=True),  # set creds - e.g. postgresql+psycopg2://user:password@host:5432/db_name
    chunksize=1000,  # read in chunks
):
    file_path = '/tmp/path_to_your.csv'
    chunk.to_csv(
        file_path,
        header=header,
        mode='a',
        index=False,
    )
    header = False
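For 40 million rows it may also be worth letting Redshift write the file itself with UNLOAD to S3 instead of pulling everything through the client. A rough sketch (the bucket path and IAM role are placeholders you would substitute; by default Redshift writes several part files in parallel):

with conn.cursor() as cur:
    cur.execute("""
        UNLOAD ('select * from my_table')
        TO 's3://my-bucket/export/part_'
        IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
        CSV HEADER
    """)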
The objective is to write Excel column data into a Postgres table, but the column names in the Excel sheet don't all match the table columns.
So what I am doing is trying to insert only the common columns.
I am able to get the common columns in a set.
I am stuck on how to insert the data in a single query.
I am using a pandas dataframe.
#Getting table columns in a list
conn = psycopg2.connect(dbname=dbname, host=host, port=port, user=user, password=pwd)
print("Connecting to Database")
cur = conn.cursor()
cur.execute("SELECT * FROM " + table_name + " LIMIT 0")
table_columns = [desc[0] for desc in cur.description]
#print table_columns
#Getting excel sheet columns in a list
df = pd.read_excel('/Users/.../plans.xlsx', sheet_name='plans')
engine = create_engine('postgresql://postgres:postgres@localhost:5432/test_db')
column_list = df.columns.values.tolist()
#print(column_list)
s = set(column_list).intersection(set(table_columns))
for x in df['column_1']:
    sql = "insert into test_table(column_1) values ('" + x + "')"
    cur.execute(sql)
cur.execute("commit;")
conn.close()
Update: I changed the code based on the answers, but with this, every time I run the program it inserts new records. Is there any option where I can just do an update instead?
s = set(column_list).intersection(set(table_columns))
df1 = df[df.columns.intersection(table_columns)]
#print df1
df1.to_sql('medical_plans', con=engine, if_exists='append', index=False, index_label=None)
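pandas' to_sql has no built-in update/upsert mode, so one common workaround is to stage the dataframe into a staging table and merge it with INSERT ... ON CONFLICT. A sketch, assuming the target table has a unique key column (here hypothetically named plan_id, with a unique constraint on it):

from sqlalchemy import create_engine, text

engine = create_engine('postgresql://postgres:postgres@localhost:5432/test_db')
df1 = df[df.columns.intersection(table_columns)]
key = 'plan_id'  # hypothetical unique key; use your table's real key column

with engine.begin() as conn:
    # stage the data; if_exists='replace' recreates the staging table each run
    df1.to_sql('medical_plans_stage', con=conn, if_exists='replace', index=False)
    cols = ', '.join(df1.columns)
    updates = ', '.join('{c} = EXCLUDED.{c}'.format(c=c) for c in df1.columns if c != key)
    conn.execute(text(
        'INSERT INTO medical_plans ({cols}) SELECT {cols} FROM medical_plans_stage '
        'ON CONFLICT ({key}) DO UPDATE SET {updates}'.format(cols=cols, key=key, updates=updates)
    ))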
I am trying to retrieve data from sqlite3 with the help of variables. It works fine with an execute() statement, but I would also like to retrieve the column names, and for that purpose I am using read_sql_query(). However, I am unable to pass variables to read_sql_query(); please see the code below:
def cal():
    tab = ['LCOLOutput']
    column_name = 'CUSTOMER_EMAIL_ID'
    xyz = '**AVarma1#ra.rockwell.com'
    for index, m in enumerate(tab):
        table_name = m
        sq = "SELECT * FROM ? where ?=?;" , (table_name, column_name, xyz,)
        df = pandas.read_sql_query(sq, conn)
        writer = pandas.ExcelWriter('D:\pandas_simple.xlsx', engine='xlsxwriter')
        df.to_excel(writer, sheet_name='Sheet1')
        writer.save()
You need to change the syntax to match the read_sql_query() method from pandas; check the docs.
For sqlite, it should work with:
sq = "SELECT * FROM ? where ?=?;"
param = (table_name, column_name, xyz,)
df = pandas.read_sql_query(sq,conn, params=param)
EDIT:
Otherwise, try the following formatting for the table and column names (placeholders can only bind values, not identifiers):
sq = "SELECT * FROM {} where {} = ?;".format(table_name, column_name)
param = (xyz,)
df = pandas.read_sql_query(sq, conn, params=param)
Check this answer, which explains why a table name cannot be passed as a parameter directly.
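Putting the pieces together, a corrected version of the function might look like this (a sketch; since table and column names cannot be bound as parameters, they are formatted into the string and must come from trusted code, while the value is bound):

import sqlite3
import pandas

conn = sqlite3.connect('mydb.sqlite')  # hypothetical database file

def cal():
    tab = ['LCOLOutput']
    column_name = 'CUSTOMER_EMAIL_ID'
    xyz = '**AVarma1#ra.rockwell.com'
    writer = pandas.ExcelWriter(r'D:\pandas_simple.xlsx', engine='xlsxwriter')
    for index, table_name in enumerate(tab):
        # only the value is a bound parameter; the names are interpolated
        sq = "SELECT * FROM {} WHERE {} = ?;".format(table_name, column_name)
        df = pandas.read_sql_query(sq, conn, params=(xyz,))
        df.to_excel(writer, sheet_name='Sheet{}'.format(index + 1))
    writer.save()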