I am querying a SQL database and I want to use pandas to process the data. However, I am not sure how to move the query results into a pandas DataFrame. Below are my code and its output.
import pyodbc
import pandas
from pandas import DataFrame

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\users\bartogre\desktop\CorpRentalPivot1.accdb;UID="";PWD="";')
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
cursor = cnxn.cursor()
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
cursor.execute(sql)
for data in cursor.fetchall():
    print(data)
('C:\\users\\bartogre\\desktop\\CorpRentalPivot1.accdb', None, 'Data', 'TABLE', None)
('C:\\users\\bartogre\\desktop\\CorpRentalPivot1.accdb', None, 'SFDB', 'TABLE', None)
(Decimal('78071898.71'), Decimal('82192672.29'), 'A')
(Decimal('12120663.79'), Decimal('13278814.52'), 'B')
A shorter, more concise answer:
import pyodbc
import pandas as pd
cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
r'DBQ=C:\users\bartogre\desktop\data.mdb;')
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
data = pd.read_sql(sql, cnxn) # without parameters [non-prepared statement]
# with a prepared statement, use list/tuple/dictionary of parameters depending on DB
#data = pd.read_sql(sql=sql, con=cnxn, params=query_params)
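For the parameterized case, here is a minimal sketch; the WHERE clause and the value "A" are invented for illustration and not part of the original query:

import pyodbc
import pandas as pd

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};'
                      r'DBQ=C:\users\bartogre\desktop\data.mdb;')

# pyodbc uses ? placeholders; parameter values are passed as a list/tuple
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Where BRAND = ? Group By BRAND"
data = pd.read_sql(sql=sql, con=cnxn, params=["A"])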
I was way overthinking this one!
cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\users\bartogre\desktop\CorpRentalPivot1.accdb;UID="";PWD="";')
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
cursor = cnxn.cursor()
sql = "Select sum(CYTM), sum(PYTM), BRAND From data Group By BRAND"
cursor.execute(sql)
data = cursor.fetchall()
print(data)
Data = pandas.DataFrame(data)
print(Data)
Another, faster method: see the data = pd.read_sql(sql, cnxn) line below.
import pyodbc
import pandas as pd
from pandas import DataFrame
from pandas import plotting  # pandas.tools was removed; plotting now lives under pandas.plotting
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

cnxn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)}; DBQ=C:\users\bartogre\desktop\data.mdb;UID="";PWD="";')
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
sql = "Select *"
sql = sql + " From data"
print(sql)
# pd.read_sql runs the query itself, so no separate cursor.execute(sql) is needed
data = pd.read_sql(sql, cnxn)
Related
Scenario:
1. I am trying to insert the DataFrame directly into a SQL table:
import sqlalchemy
from sqlalchemy.pool import NullPool

engine_azure = sqlalchemy.create_engine(sqlalchemy_conn_str, echo=True, fast_executemany=True, poolclass=NullPool)
conn = engine_azure.connect()
df_final_result.to_sql('Employee', engine_azure, schema='dbo', index=False, if_exists='replace')
2. So is there any alternative to the above .to_sql using a pypyodbc connection?
3. The code below works, but I have 90 columns, so I want to avoid spelling out every column in the iteration (a sketch that builds the statement from the DataFrame follows the snippet).
import pyodbc

cnxn = pyodbc.connect(driver='{ODBC Driver 17 for SQL Server}', server='xyz', database='xyz',
                      trusted_connection='yes')
cursor = cnxn.cursor()
for index, row in df2.iterrows():
    # four columns, so four ? placeholders
    cursor.execute("INSERT INTO Employee(ContactNumber,Name,Salary,Address) values(?,?,?,?)",
                   row.ContactNumber, row.Name, row.Salary, row['Address'])
cnxn.commit()
cnxn.close()
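To avoid writing out all 90 columns by hand, the INSERT statement and its placeholders can be generated from the DataFrame itself. A minimal sketch with pyodbc; the table name Employee is taken from the question, and this assumes all columns map to simple types:

import pyodbc

# cnxn and df2 as in the snippet above
cursor = cnxn.cursor()

# build "col1,col2,..." and "?,?,..." from the DataFrame columns
columns = ",".join(df2.columns)
placeholders = ",".join("?" * len(df2.columns))
insert_sql = f"INSERT INTO Employee({columns}) VALUES ({placeholders})"

cursor.fast_executemany = True  # pyodbc >= 4.0.19, speeds up executemany
cursor.executemany(insert_sql, df2.values.tolist())
cnxn.commit()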
I'm trying to use Python to create a dataframe which consists of certain rows (based on condition criteria) extracted from an MS Access table.
I can't seem to get the condition to work.
The MS Access table has column names such as Date, Course, Horse etc.
I want to, for example, get all the rows with Date = "01-Dec-2021" and Course = "Kempton".
I have managed to get the following code working with one criterion:
import pyodbc
connStr = (r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};" r"DBQ=C:\Users\chris\Documents\UKHR\SFF_Cum\SFFCum_py.accdb;")
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
sql = "select * FROM SFF_cumQ_O where Course = ?"
cursor.execute(sql, ["Kempton"])
#print(cursor.fetchone())
print(cursor.fetchall())
cursor.close()
conn.close()
Here is my attempt to import the rows based on Date = "01-Dec-2021" and Course = "Kempton":
import pyodbc
connStr = (r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};" r"DBQ=C:\Users\chris\Documents\UKHR\SFF_Cum\SFFCum_py.accdb;")
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
sql = "select * FROM SFF_cumQ_O WHERE Date = '01-Dec-2021' and Course = 'Kempton'"
cursor.execute(sql)
print(cursor.fetchall())
However, when I try to import the rows based on Date = "01-Dec-2021" and Course = "Kempton", I run into this error:
"Exception has occurred: Error
('07002', '[07002] [Microsoft][ODBC Microsoft Access Driver] Too few parameters. Expected 1. (-3010) (SQLExecDirectW)')"
I found the problem: the criteria needed to be bracketed. (It likely also mattered that the working query uses the actual column name RaceDate; Date is a reserved word in Access, and the "Too few parameters" error typically means the driver did not recognize an identifier as a column.)
Final code looks like this:
Note the table name is not necessary with the field name, so SFF_cumQ_O.Course can just be Course.
import pyodbc
connStr = (r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};" r"DBQ=C:\Users\chris\Documents\UKHR\SFF_Cum\SFFCum_py.accdb;")
conn = pyodbc.connect(connStr)
cursor = conn.cursor()
sql = "select * FROM SFF_cumQ_O WHERE ((SFF_cumQ_O.Course)='Kempton') AND ((SFF_cumQ_O.RaceDate)='01-Dec-21')"
#sql = "select * FROM SFF_cumQ_O WHERE Date = '01-Dec-21' and Course = ?"
#cursor.execute(sql, ["Kempton"])
cursor.execute(sql)
print(cursor.fetchall())
cursor.close()
conn.close()
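For what it's worth, a parameterized variant of the final query should also work and avoids quoting issues; this is a sketch assuming RaceDate is stored as text, as the string comparison above suggests:

# run before conn.close(); both criteria passed as ? parameters
sql = "SELECT * FROM SFF_cumQ_O WHERE Course = ? AND RaceDate = ?"
cursor.execute(sql, ["Kempton", "01-Dec-21"])
print(cursor.fetchall())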
OK, I have tried several kinds of solutions recommended by others on this site and other sites. However, I can't get it to work as I would like.
I get an XML response which I normalize and then save to a CSV. This first part works fine.
Instead of saving it to a CSV I would like to save it into an existing table in an Access database. The second part below:
I would like to use an existing table instead of creating a new one.
The result is not separated by ";" into different columns; everything ends up in the same column.
import requests
import pandas as pd
import pyodbc

# u, h, now and response_list are defined earlier in the script
response = requests.get(u, headers=h).json()
dp = pd.json_normalize(response, 'Units')
response_list.append(dp)
export = pd.concat(response_list)
export.to_csv(r'C:\Users\username\Documents\Python Scripts\Test\Test2_'+str(now)+'.csv', index=False, sep=';', encoding='utf-8')

access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};" \
                      .format(access_path))
strSQL = "SELECT * INTO projects2 FROM [text;HDR=Yes;FMT=sep(;);" + \
         "Database=C:\\Users\\username\\Documents\\Python Scripts\\Test].Testdata.csv;"
cur = conn.cursor()
cur.execute(strSQL)
conn.commit()
conn.close()
If you already have the data in a well-formed pandas DataFrame then you don't really need to dump it to a CSV file; you can use the sqlalchemy-access dialect to push the data directly into an Access table using pandas' to_sql() method:
from pprint import pprint
import urllib
import pandas as pd
import sqlalchemy as sa
connection_string = (
r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
r"DBQ=C:\Users\Public\Database1.accdb;"
r"ExtendedAnsiSQL=1;"
)
connection_uri = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
engine = sa.create_engine(connection_uri)
with engine.begin() as conn:
    # existing data in table
    pprint(
        conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
    )
    """
    [('gord', 'gord@example.com'),
     ('jennifer', 'jennifer@example.com')]
    """

# DataFrame to insert
df = pd.DataFrame(
    [
        ("newdev", "newdev@example.com"),
        ("newerdev", "newerdev@example.com"),
    ],
    columns=["username", "email"],
)
df.to_sql("user_table", engine, index=False, if_exists="append")

with engine.begin() as conn:
    # updated table
    pprint(
        conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
    )
    """
    [('gord', 'gord@example.com'),
     ('jennifer', 'jennifer@example.com'),
     ('newdev', 'newdev@example.com'),
     ('newerdev', 'newerdev@example.com')]
    """
(Disclosure: I am currently the maintainer of the sqlalchemy-access dialect.)
Solved with the following code:
SE_export_Tuple = list(zip(SE_export.Name,SE_export.URL,SE_export.ImageUrl,......,SE_export.ID))
print(SE_export_Tuple)
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};" \
.format(access_path))
cursor = conn.cursor()
mySql_insert_query="INSERT INTO Temp_table (UnitName,URL,ImageUrl,.......,ID) VALUES (?,?,?,......,?)"
cursor.executemany(mySql_insert_query,SE_export_Tuple)
conn.commit()
conn.close()
However, when I add many fields I get an error at "executemany", saying:
cursor.executemany(mySql_insert_query,SE_export_Tuple)
Error: ('HY004', '[HY004] [Microsoft][ODBC Microsoft Access Driver]Invalid SQL data type (67) (SQLBindParameter)')
I have been trying to insert data from a DataFrame in Python into a table already created in SQL Server. The DataFrame has 90K rows and I wanted the best possible way to quickly insert the data into the table. I only have read, write, and delete permissions on the server and I cannot create any tables.
Below is the code which is inserting the data but it is very slow. Please advise.
import pandas as pd
import xlsxwriter
import pyodbc

df = pd.read_excel(r"Url path\abc.xlsx")
conn = pyodbc.connect('Driver={ODBC Driver 11 for SQL Server};'
                      'SERVER=Server Name;'
                      'Database=Database Name;'
                      'UID=User ID;'
                      'PWD=Password;'
                      'Trusted_Connection=no;')
cursor = conn.cursor()
# Deleting existing data in SQL Table:
cursor.execute("DELETE FROM database.schema.TableName")
conn.commit()
# Inserting data in SQL Table:
for index, row in df.iterrows():
    cursor.execute("INSERT INTO [Table Name]([A],[B],[C]) values (?,?,?)", row['A'], row['B'], row['C'])
conn.commit()
cursor.close()
conn.close()
To insert data much faster, try using sqlalchemy and df.to_sql. This requires you to create an engine using sqlalchemy, and to make things faster use the option fast_executemany=True
import urllib
import sqlalchemy

connect_string = urllib.parse.quote_plus(f'DRIVER={{ODBC Driver 11 for SQL Server}};Server=<Server Name>,<port>;Database=<Database name>')
engine = sqlalchemy.create_engine(f'mssql+pyodbc:///?odbc_connect={connect_string}', fast_executemany=True)
with engine.connect() as connection:
    df.to_sql(<table name>, connection, index=False)
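One caveat: to_sql defaults to if_exists='fail', so when the target table already exists (as in this question) you would pass if_exists='append' inside the same with block. The table name here is a placeholder:

# append to the existing table rather than erroring out (pandas defaults to if_exists="fail")
df.to_sql("TableName", connection, index=False, if_exists="append")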
Here is the script and hope this works for you.
import pandas as pd
import pyodbc as pc
connection_string = "Driver=SQL Server;Server=localhost;Database={0};Trusted_Connection=Yes;"
cnxn = pc.connect(connection_string.format("DataBaseNameHere"), autocommit=True)
cur=cnxn.cursor()
df= pd.read_csv("your_filepath_and_filename_here.csv").fillna('')
query = 'insert into TableName({0}) values ({1})'
query = query.format(','.join(df.columns), ','.join('?' * len(df.columns)))
cur.fast_executemany = True
cur.executemany(query, df.values.tolist())
cnxn.close()
This should do what you want...very generic example...
# Insert from dataframe to table in SQL Server
import time
import pandas as pd
import pyodbc

# create timer
start_time = time.time()

df = pd.read_csv("C:\\your_path\\CSV1.csv")

conn_str = (
    r'DRIVER={SQL Server Native Client 11.0};'
    r'SERVER=Excel-PC\SQLEXPRESS;'
    r'DATABASE=NORTHWND;'
    r'Trusted_Connection=yes;'
)
cnxn = pyodbc.connect(conn_str)
cursor = cnxn.cursor()

for index, row in df.iterrows():
    cursor.execute('INSERT INTO dbo.Table_1([Name],[Address],[Age],[Work]) values (?,?,?,?)',
                   row['Name'],
                   row['Address'],
                   row['Age'],
                   row['Work'])
cnxn.commit()
cursor.close()
cnxn.close()

# see total time to do insert
print("%s seconds ---" % (time.time() - start_time))
Try that and post back if you have additional questions/issues/concerns.
Replace df.iterrows() with df.apply() for one thing. Remove the loop for something much more efficient.
Try to populate a temp table with one index or none, then insert it into your good table all at once.
That might speed things up, since the target table's indexes don't have to be updated after each insert (a rough sketch follows below).
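A rough sketch of that idea against SQL Server; the staging table #tmp_employee, the target dbo.Employee, and the three columns are all hypothetical names for illustration:

import pyodbc

# cnxn and df as in the earlier snippets
cursor = cnxn.cursor()
cursor.fast_executemany = True  # pyodbc >= 4.0.19

# 1) bulk-load the rows into an index-free temp table
cursor.execute("CREATE TABLE #tmp_employee (Name nvarchar(100), Age int, Salary money)")
cursor.executemany(
    "INSERT INTO #tmp_employee (Name, Age, Salary) VALUES (?, ?, ?)",
    df[["Name", "Age", "Salary"]].values.tolist(),
)

# 2) move everything into the indexed target table in one statement
cursor.execute("INSERT INTO dbo.Employee (Name, Age, Salary) SELECT Name, Age, Salary FROM #tmp_employee")
cnxn.commit()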
I am trying to save my SQL output to a pandas DataFrame; using that, I have to apply some logic and save the output to a table.
How can I save the result set to a pandas DataFrame?
Code:
import pyodbc

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=DESKTOP-XXXXX;"
                      "Database=MOVIE_INFO;"
                      "Trusted_Connection=yes;")
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM MOVIE_SRC')
for row in cursor:
    print('row = %r' % (row,))
Thanks
I tried another approach, like:
import pyodbc
import pandas as pd

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=DESKTOP-XXXX;"
                      "Database=MOVIE;"
                      "Trusted_Connection=yes;")
cnxn = cnxn.cursor()  # note: this rebinds cnxn to a cursor, which is what triggers the error below
crsr = cnxn.cursor()
for table_name in crsr.tables(tableType='TABLE'):
    print(table_name)
cursor = cnxn.cursor()
sql = "Select *"
sql = sql + " From MOVIE"
print(sql)
cursor.execute(sql)
data = pd.read_sql(sql, cnxn)
but getting error
AttributeError: 'pyodbc.Cursor' object has no attribute 'cursor'
Please share your suggestion.
Thanks
Although there are direct read methods in pandas like pandas.read_sql(), you should be able to take your successful cursor object, define new variables as empty Python lists, append the rows, and then create a pandas DataFrame. Assuming your table is set up with columns as separate variables, here is some example code:
import pandas as pd

# create some empty lists:
var1 = []
var2 = []
var3 = []

# append rows from the cursor object:
for row in cursor:
    var1.append(row[0])
    var2.append(row[1])
    var3.append(row[2])

# Create a dictionary with header names if desired:
my_data = {'header1': var1,
           'header2': var2,
           'header3': var3}

# Make a pandas DataFrame:
df = pd.DataFrame(data=my_data)
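As a shorter alternative, the column names can be pulled from cursor.description and the fetched rows passed straight to the DataFrame constructor; a sketch, assuming the cursor from the question:

import pandas as pd

# cursor.description holds one tuple per result column; the first element is its name
columns = [col[0] for col in cursor.description]
df = pd.DataFrame.from_records(cursor.fetchall(), columns=columns)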