Pandas Change To_SQL Column Mappings - python

I have a slight problem regarding pd.to_sql(). My task is to load Excel files into an MSSQL database (the import wizard is not an option). I've used SQLAlchemy along with pandas in the past with success, but I can't seem to crack this.
from sqlalchemy import create_engine
import pandas as pd
# Parameters for SQL
ServerName = "SERVER_NAME_HERE"
Database = "MY_NAME_HERE"
Driver = "driver=SQL Server Native Client 11.0"
# Create the connection
engine = create_engine('mssql+pyodbc://' + ServerName + '/' + Database + "?" + Driver)
df1 = pd.read_excel('MY_PATH_HERE')
# do my manipulations below (producing df2) and make sure the dtypes are correct...
# ... end my manipulations
df2.to_sql('Auvi-Q_Evzio_Log', engine, if_exists='append', index=False)
ERROR:
pyodbc.ProgrammingError: ('42S22', "[42S22] [Microsoft][SQL Server Native
Client 11.0][SQL Server]Invalid column name 'Created On'. (207)
(SQLExecDirectW)")
My issue is that the schema of the database is already set up and I cannot change it. I have a column in my dataframe named Created On, but the column name in the database is CreatedOn. I have a handful of columns where this issue arises. Is there a way to set the mappings or schema correctly in to_sql? There is a schema parameter in the documentation, but I can't find a valid example.
I could just change the column names of my dataframe to match the schema, but my interest has been piqued otherwise.

I'd try the following approach:
db_tab_cols = pd.read_sql("select * from [Auvi-Q_Evzio_Log] where 1=2", engine) \
    .columns.tolist()
df2.columns = db_tab_cols
df2.to_sql('Auvi-Q_Evzio_Log', engine, if_exists='append', index=False)
P.S. This solution assumes that df2 has the same column order as the Auvi-Q_Evzio_Log table.
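An alternative that does not depend on column order is an explicit rename map. A minimal sketch, assuming the Created On/CreatedOn pair from the question and a hypothetical Last Updated/LastUpdated pair standing in for the other mismatched columns:
col_map = {'Created On': 'CreatedOn', 'Last Updated': 'LastUpdated'}  # second pair is hypothetical
df2 = df2.rename(columns=col_map)
df2.to_sql('Auvi-Q_Evzio_Log', engine, if_exists='append', index=False)
Columns not listed in the map keep their names, so only the mismatched ones need entries.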

Related

django panda read sql query map parameters

I am trying to connect to a SQL Server database within the Django framework and read a SQL query result into a pandas DataFrame:
from django.db import connections
query = """SELECT * FROM [dbo].[table] WHERE project=%(Name)s"""
data = pd.read_sql(query, connections[database], params={'Name': input})
The error message I get is 'format requires a mapping'.
If I do it as below, it works, but I really want to be able to map each parameter by name:
from django.db import connections
query = """SELECT * FROM [dbo].[table] WHERE project=%s"""
data = pd.read_sql(query, connections[database], params=[input])
I was using ODBC Driver 17 for SQL Server.
You can format the query at the string level and then run pd.read_sql:
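A minimal sketch of that approach, reusing the query, input, and database names from the question; note that plain string formatting bypasses driver-side parameter escaping, so it is only safe with trusted input:
import pandas as pd
from django.db import connections
# interpolate the value directly into the SQL string
query = "SELECT * FROM [dbo].[table] WHERE project='{Name}'".format(Name=input)
data = pd.read_sql(query, connections[database])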

Utility to find join columns

I have been given several tables in SQL Server and am trying to figure out the best way to join them.
What I've done is:
1) open a connection in R to the database
2) pull all the column names from the INFORMATION_SCHEMA.COLUMNS table
3) build loops in R to try every combination of columns and check the row count of the inner join on each pair of columns
I'm wondering if there's a better way to do this or if there's a package or utility that helps with this type of problem.
You could do your joins in Python using pandas. Pandas has a powerful IO engine, so you could import from SQL Server into a pandas DataFrame, perform your joins in Python, and write back to SQL Server.
Below is a script I use to perform an import from SQL Server and an export to a MySQL table. I use the Python package SQLAlchemy for my ORM connections. You could follow this example and read up on joins in pandas; a small merge sketch follows the script.
import pyodbc
import pandas as pd
from sqlalchemy import create_engine
# MySQL info
username = 'user'
password = 'pw'
sqlDB = 'mydb'
# Create MSSQL PSS Connector
server = 'server'
database = 'mydb'
connMSSQL = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};' +
    f'SERVER={server};PORT=1433;DATABASE={database};Trusted_Connection=yes;')
# Read Table into pandas dataframe
tsql = '''
SELECT [Index],
       Tag
FROM [dbo].[Tags]
'''
df = pd.read_sql(tsql, connMSSQL, index_col='Index')
# Write df to MySQL db
engine = create_engine(
    f'mysql+mysqldb://{username}:{password}@localhost/{sqlDB}', pool_recycle=3600)
with engine.connect() as connMySQL:
df.to_sql('pss_alarms', connMySQL, if_exists='replace')
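With both tables loaded, step 3 of the question reduces to counting the rows of an inner join. A minimal sketch, assuming hypothetical DataFrames df_a and df_b and one hypothetical candidate column pair:
# inner join on one candidate column pair
joined = df_a.merge(df_b, left_on='col_a', right_on='col_b', how='inner')
print(len(joined))  # row count of the inner join for this candidate pair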

python cx_oracle hang when storing as DataFrame?

I'm trying to store the results of an Oracle SQL query in a DataFrame and the execution hangs indefinitely. But when I print the query results instead, they come out instantly. What is causing the error when saving this as a DataFrame?
import cx_Oracle
import pandas as pd
dsn_tns = cx_Oracle.makedsn('HOST', 'PORT', service_name='SID')
conn = cx_Oracle.connect(user='USER', password='PASSWORD', dsn=dsn_tns)
curr = conn.cursor()
curr.execute('alter session set current_schema= apps')
df = pd.read_sql('select * from TABLE', curr)
#### THE ALTERNATIVE CODE TO PRINT THE RESULTS
# curr.execute('select * from TABLE')
# for line in curr:
#     print(line)
curr.close()
conn.close()
Pandas' read_sql requires a connection object for its con argument, not the result of a cursor's execute, so the minimal fix is to pass conn instead of curr (sketched below). Also, consider using SQLAlchemy, the recommended interface between pandas and databases, where you define the schema in the engine connection assignment. The engine also allows to_sql calls.
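A minimal sketch of that first fix, reusing conn from the question:
df = pd.read_sql('select * from TABLE', conn)  # pass the connection itself, not the cursor
With SQLAlchemy, the equivalent read looks like this: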
engine = create_engine("oracle+cx_oracle://user:pwd#host:port/dbname")
df = pd.read_sql('select * from TABLE', con=engine)
engine.dispose()
And as mentioned in this DBA post, in Oracle users and schemas are essentially the same thing (unlike in other RDBMSs). Therefore, try passing apps as the user in the create_engine call with the needed credentials:
engine = create_engine("oracle+cx_oracle://apps:PASSWORD#HOST:PORT/SID")
df = pd.read_sql('select * from TABLE', con=engine)
engine.dispose()

Load table to Oracle through pandas io SQL

I'm executing the following code; the purpose of the execution is to create a lookup table in the Oracle database to speed up my load of data. The table I want to load is simply a vector of ID values, so only one column is loaded.
The code is written as below:
lookup = df.id_variable.drop_duplicates()
conn = my_oracle_connection()
obj = lookup.to_sql(name='lookup', con=conn, if_exists='replace')
I get the following error when executing this:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': ORA-01036: illegal variable
name/number
I can execute a psql.read_sql() query, but the above fails.
Now, I don't exactly know how to go about fixing it; I'm quite new to the technical aspects of getting this to work, so any pointers on what direction to take would be greatly appreciated.
Thanks for any time and input!
I had the same issue when using a cx_Oracle connection (I was able to use the .read_sql function, but not .to_sql).
Use a SQLAlchemy connection instead:
import sqlalchemy as sa
oracle_db = sa.create_engine('oracle://username:password@database')
connection = oracle_db.connect()
dataframe.to_sql('table_name', connection, schema='schema_name', if_exists='append', index=False)
I think the problem happens when writing to the Oracle DB using a connection object created by cx_Oracle: pandas treats a raw DBAPI connection as SQLite, which is why the error shows a query against sqlite_master. SQLAlchemy has a workaround:
import cx_Oracle
from sqlalchemy import types, create_engine
conn = create_engine('oracle+cx_oracle://Jeremy:SuperSecret@databasehost:1521/?service_name=gdw')
df.to_sql('TEST', conn, if_exists='replace')

Export pandas dataframe to Microsoft SQL Server 2014

I'm trying to parse an API response, which is in JSON format, into a tabular format and write it to a table residing on Microsoft SQL Server.
Each JSON API response has 51 columns, and there are around 15 million rows of data to be written to SQL Server.
I have tried combinations of pyodbc and sqlalchemy as mentioned in other posts on SO.
Using the traditional SQL "insert into" involves hitting the database millions of times, which doesn't seem right.
My current pandas version is 0.14+, Python version 2.7.9.
I get the following error when I try to write a sample DataFrame to the SQL table on Continuum.io's Wakari server:
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.
When I run the same locally, the following error is returned:
pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW)")
Following is the code:
import sqlalchemy
import pyodbc
import pandas as pd
from pandas.io.sql import frame_query
from sqlalchemy import create_engine
engine = create_engine('mssql+pyodbc://<userid>:<password>@pydsn')
cnxn = engine.raw_connection()
df = pd.DataFrame()
data = pd.DataFrame({"A": range(9),"B":range(9)})
df = df.append(data)
print df
df.to_sql("dbo.xyz_test",cnxn,if_exists = 'replace')
print 'done'
Appreciate any help here.
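The sqlite_master error here is the same symptom as in the previous question: handed a raw DBAPI connection instead of a SQLAlchemy connectable, pandas falls back to SQLite-flavored SQL, and SQL Server has no sqlite_master table. A minimal sketch of the likely fix, passing the engine itself rather than engine.raw_connection(); note that to_sql's schema argument only arrived in later pandas versions, so on 0.14 the 'dbo.' prefix is simply dropped and the table lands in the default schema:
df.to_sql('xyz_test', engine, if_exists='replace')  # engine, not raw_connection()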
