Export pandas dataframe to Microsoft SQL Server 2014 - python

I'm trying to parse a JSON API response into a tabular format and write it to a table on Microsoft SQL Server.
Each JSON response has 51 columns, and there are around 15 million rows of data to be written to SQL Server.
I have tried combinations of pyodbc and sqlalchemy as mentioned in other posts on SO.
Using a traditional SQL "INSERT INTO" per row means hitting the database millions of times, which doesn't seem right.
My current pandas version is 0.14+, Python version 2.7.9.
I get the following error when I try to write a sample DataFrame to the SQL table on Continuum.io's Wakari server:
sqlalchemy.exc.ResourceClosedError: This result object does not return rows. It has been closed automatically.
When I run the same code locally, the following error is returned:
pyodbc.ProgrammingError: ('42S02', "[42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW)")
Following is the code:
import sqlalchemy
import pyodbc
import pandas as pd
from pandas.io.sql import frame_query
from sqlalchemy import create_engine
engine = create_engine('mssql+pyodbc://<userid>:<password>@pydsn')
cnxn = engine.raw_connection()
df = pd.DataFrame()
data = pd.DataFrame({"A": range(9),"B":range(9)})
df = df.append(data)
print df
df.to_sql("dbo.xyz_test",cnxn,if_exists = 'replace')
print 'done'
Appreciate any help here.
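Both errors above are the signature of handing to_sql a raw DBAPI connection: pandas then falls back to its SQLite code path and queries sqlite_master, which does not exist on SQL Server (the same diagnosis given in the answers further down this page). A minimal sketch, reusing the placeholder DSN and credentials from the question, of passing the SQLAlchemy engine instead:
import pandas as pd
from sqlalchemy import create_engine

# Placeholder DSN/credentials from the question.
engine = create_engine('mssql+pyodbc://<userid>:<password>@pydsn')

data = pd.DataFrame({"A": range(9), "B": range(9)})

# Pass the engine itself (not engine.raw_connection()) so pandas uses the
# MSSQL dialect; drop the "dbo." prefix from the table name, otherwise the
# table is created with "dbo." as part of its literal name.
data.to_sql("xyz_test", engine, if_exists='replace', index=False)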

Related

"Invalid SQL data type (0) (SQLBindParameter)" when querying cosmos data and using to_sql from pandas

First I query my cosmos data and put it into a dataframe. Then I connect to my database and try using to_sql:
params = urllib.parse.quote_plus(r'Driver=SQL Server;Server={SERVERNAME},1433;Database=cosmosTest;Trusted_Connection=yes;TrustServerCertificate=no;Connection Timeout=0;')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str,echo=True)
conn = engine.connect()
df.to_sql('Document', conn, if_exists='replace', index=False)
When I try to run my script, I run into this error message:
('HY004', '[HY004] [Microsoft][ODBC SQL Server Driver]Invalid SQL data type (0) (SQLBindParameter)')
I'm not sure if I'm receiving this error because of something I'm doing wrong or because the data I'm querying from Cosmos doesn't play well with SQL Server. I've checked a couple of other posts about this issue, but they didn't really pertain to me. Any suggestions on what I can try here?
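One thing worth checking, sketched here as an assumption rather than a confirmed fix: HY004 at SQLBindParameter usually means a column whose inferred dtype the old "SQL Server" ODBC driver cannot bind, for example an all-None or nested (dict/list) object column coming back from Cosmos. Inspecting and coercing those columns before to_sql sometimes clears it:
# df and conn are the objects from the snippet above.
print(df.dtypes)

# Hypothetical workaround: force object columns to plain strings so the
# driver gets a type it can bind (adjust or drop columns as appropriate).
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype(str)

df.to_sql('Document', conn, if_exists='replace', index=False)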

"Connection is busy" error message using petl with pyodbc connection

With Python, I am trying to read a table from SQL Server, populate one field, and load the resulting table back to SQL Server as a new table. I'm using petl.
I can read the table with no problems, I populate the field successfully, but then I get an error when I try to load the resulting table back to SQL Server.
This is the python code I'm using:
import petl as etl
import pyodbc
def populate_field(z, row):
    ...
con = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server}', server=r'my_server', database='my_db')
query = r'SELECT TOP 10 * FROM my_table'
tab = etl.fromdb(con, query)
tab = etl.convert(tab, 'my_field', populate_field, pass_row=True)
etl.todb(tab, con, 'my_new_table')
And this is the error message I get on "etl.todb(tab, con, 'my_new_table')":
Error: ('HY000', '[HY000] [Microsoft][ODBC SQL Server Driver]Connection is busy with results for another hstmt (0) (SQLExecDirectW)')
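"Connection is busy with results for another hstmt" means the single pyodbc connection still has an open result set from etl.fromdb() when etl.todb() tries to execute on it. One common workaround, sketched under the assumption that the same connection parameters work for both sides: use two separate connections, one for reading and one for writing (populate_field is the function defined in the question above).
import petl as etl
import pyodbc

# Hypothetical split: read on one connection, write on another, so the open
# read cursor from etl.fromdb() does not block the INSERTs issued by etl.todb().
con_read = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server}',
                          server=r'my_server', database='my_db')
con_write = pyodbc.connect('Trusted_Connection=yes', driver='{SQL Server}',
                           server=r'my_server', database='my_db')

tab = etl.fromdb(con_read, r'SELECT TOP 10 * FROM my_table')
tab = etl.convert(tab, 'my_field', populate_field, pass_row=True)
etl.todb(tab, con_write, 'my_new_table')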

select rows from teradata database using python based on another column in df

I am trying to limit my query to a table in a Teradata database using Python.
import teradata
import pandas as pd

# 1. params = dfm['cust'].astype(str).to_list()
params = tuple(list(dfm['cust'].astype(str)))

# Make a connection
udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="ODBC", system=host, username=uid, password=pwd,
                     driver="Teradata Database ODBC Driver 16.20",
                     authentication="LDAP") as connect:
    query = "SELECT Cust, Flag FROM DBName.Tablename where Cust in %(params)s"
    # Read query into df
    df = pd.read_sql(query, connect, params=params)
I tried to supply params as a list and it did not work; I tried a tuple and it still did not work. I got the following error:
Execution failed on sql 'SELECT Cust,Flag FROM DBName.TableName where Cust in %(params)s': (3704, "[42000] [Teradata][ODBC Teradata Driver][Teradata Database](-3704)'%' ('25'X) is not a valid Teradata SQL token.")
What is wrong with the code?
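The Teradata ODBC driver does not understand the pyformat placeholder %(params)s, which is why it rejects the '%' token. A hedged sketch of one way to parameterize the IN list, assuming the qmark ('?') paramstyle used by the teradata module's ODBC connections and the dfm/host/uid/pwd names from the question:
import teradata
import pandas as pd

# Build one "?" placeholder per customer id.
params = list(dfm['cust'].astype(str))
placeholders = ','.join(['?'] * len(params))
query = ("SELECT Cust, Flag FROM DBName.Tablename "
         "WHERE Cust IN ({})".format(placeholders))

udaExec = teradata.UdaExec(appName="test", version="1.0", logConsole=False)
with udaExec.connect(method="ODBC", system=host, username=uid, password=pwd,
                     driver="Teradata Database ODBC Driver 16.20",
                     authentication="LDAP") as connect:
    df = pd.read_sql(query, connect, params=params)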

Utility to find join columns

I have been given several tables in SQL Server and am trying to figure out the best way to join them.
What I've done is:
1) open a connection in R to the database
2) pull all the column names from the INFORMATION_SCHEMA.COLUMNS table
3) build loops in R to try every combination of columns and see what the row count is of the inner join of the 2 columns
I'm wondering if there's a better way to do this or if there's a package or utility that helps with this type of problem.
You could do your joins in Python using pandas. pandas has a powerful IO layer, so you could import from SQL Server into a pandas DataFrame, perform your joins in Python, and write back to SQL Server.
Below is a script I use to perform an import from SQL Server and an export to a MySQL table. I use the Python package sqlalchemy for my ORM connections. You could follow this example and read up on joins in pandas.
import pyodbc
import pandas as pd
from sqlalchemy import create_engine
# MySQL info
username = 'user'
password = 'pw'
sqlDB = 'mydb'
# Create MSSQL PSS Connector
server = 'server'
database = 'mydb'
connMSSQL = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};' +
    f'SERVER={server};PORT=1433;DATABASE={database};Trusted_Connection=yes;')

# Read table into pandas dataframe
tsql = '''
    SELECT [Index],
           Tag
    FROM [dbo].[Tags]
    '''
df = pd.read_sql(tsql, connMSSQL, index_col='Index')

# Write df to MySQL db
engine = create_engine(
    f'mysql+mysqldb://{username}:{password}@localhost/mydb', pool_recycle=3600)
with engine.connect() as connMySQL:
    df.to_sql('pss_alarms', connMySQL, if_exists='replace')
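The script above only covers the import/export legs; the joining itself would happen in pandas in between. A small illustrative sketch, with a hypothetical second table and column names, of the row-count test the question describes:
# Hypothetical second table read over the same MSSQL connection; the column
# names 'Tag' and 'TagName' are illustrative only.
df_other = pd.read_sql('SELECT * FROM [dbo].[OtherTable]', connMSSQL)

# Inner join on one candidate column pair; len(joined) is the row count the
# question's R loop was measuring for each column combination.
joined = df.merge(df_other, left_on='Tag', right_on='TagName', how='inner')
print(len(joined))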

to_sql pandas data frame into SQL server error: DatabaseError

While trying to write a pandas DataFrame to SQL Server, I get this error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW); [42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Statement(s) could not be prepared. (8180)")
It seems pandas is looking for SQLite instead of the real database.
It's not a connection problem, since I can read from SQL Server with the same connection using pandas.read_sql.
The connection has been set using
sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
It's not a database permission problem either, since I can write row by row using the same connection parameters:
cursor = conn.cursor()
cursor.execute("insert into test values (1, 'test', 10)")
conn.commit()
I could just write a loop to insert row by row, but I would like to know why to_sql isn't working for me, and I am afraid it won't be as efficient.
Environment:
Python: 2.7
Pandas: 0.20.1
sqlalchemy: 1.1.12
Thanks in advance.
runnable example:
import pandas as pd
from sqlalchemy import create_engine
import urllib
params = urllib.quote_plus("DRIVER={SQL Server Native Client 11.0};SERVER=<servername>;DATABASE=<databasename>;UID=<username>;PWD=<password>")
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
test = pd.DataFrame({'col1':1, 'col2':'test', 'col3':10}, index=[0])
conn = engine.connect().connection
test.to_sql("dbo.test", con=conn, if_exists="append", index=False)
According to the to_sql doc, the con parameter is either an SQLAlchemy engine or the legacy DBAPI2 connection (sqlite3). Because you are passing the connection object rather than the SQLAlchemy engine object as the parameter, pandas infers that you're passing a DBAPI2 connection, i.e. an SQLite3 connection, since it's the only one supported. To remedy this, just do:
myeng = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
# Code to create your df
...
# Now write to DB
df.to_sql('table', myeng, index=False)
Try this. It works well for connecting to MS SQL Server (SQL Authentication) and updating data:
import urllib.parse
from sqlalchemy import create_engine

params = urllib.parse.quote_plus(
    'DRIVER={ODBC Driver 13 for SQL Server};' +
    'SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password)
engine = create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
#df: pandas.dataframe; mTableName:table name in MS SQL
#warning: discard old table if exists
df.to_sql(mTableName, con=engine, if_exists='replace', index=False)
So I ran into this same thing. I tried looking through the code and couldn't figure out why it wasn't working, but it looks like it gets stuck on this call:
pd.io.sql._is_sqlalchemy_connectable(engine)
I found that if I run this first it returns True, but as soon as I run it after running df.to_sql() it returns False. Right now I'm running it before I do the df.to_sql() and it actually works.
Hope this helps.
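For reference, a sketch of the call order that workaround describes, reusing engine, conn and test from the runnable example above:
import pandas as pd

# Probe the engine first (as described above), then call to_sql as before.
pd.io.sql._is_sqlalchemy_connectable(engine)  # returns True on the first call
test.to_sql("dbo.test", con=conn, if_exists="append", index=False)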
