Why is to_sql not working with pyodbc in pandas? - python

I have an Excel file. I'm importing it into a DataFrame and trying to update a database table using the data.
import os
import pandas as pd
import pyodbc

def get_sale_file():
    try:
        cnxn = pyodbc.connect('DRIVER=ODBC Driver 17 for SQL Server;'
                              'SERVER=' + server + ';DATABASE=' + database +
                              ';UID=' + uname + ';PWD=' + pword,
                              autocommit=False)
        files = os.listdir(sap_file_path)
        df = pd.DataFrame()
        for f in files:
            if f.endswith('.xlsx') or f.endswith('.xls'):
                df = pd.read_excel(os.path.join(sap_file_path, f))
        df.to_sql('temptable', cnxn, if_exists='replace')
        query = "UPDATE MList AS mas" + \
                " SET TTY = temp.[Territory Code]," + \
                " Freq = temp.[Frequency Code]" + \
                " FROM temptable AS temp" + \
                " WHERE mas.SiteCode = temp.[ri a]"
When I execute the above code block, I get:
1/12/2019 10:19:45 AM ERROR: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW)")
Am I going about this the right way? Does pandas have any function other than to_sql to update an MS SQL table?
How can I overcome the above error?
Edit
Do I have to create temptable beforehand to load the DataFrame? If so, my files contain hundreds of columns, and they may vary (except for a few columns). How can I make sure pandas loads only a few columns into temptable?

According to the documentation for pandas.DataFrame.to_sql (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html), the connection is expected to be of type sqlalchemy.engine.Engine or sqlite3.Connection, so it is necessary to change your code to use a connection like this:
import sqlalchemy
import pyodbc

cnxn = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")
df.to_sql("table_name", cnxn, if_exists='replace')
UPDATE: using urllib
import urllib
import sqlalchemy
import pyodbc

params = urllib.parse.quote_plus("DRIVER={ODBC Driver 17 for SQL Server};SERVER=yourserver;DATABASE=yourdatabase;UID=user;PWD=password")  # on Python 2 this is urllib.quote_plus
cnxn = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
df.to_sql("table_name", cnxn, if_exists='replace')

You can try another package instead of pyodbc, e.g. pytds or adodbapi.
The first one is very simple; with adodbapi, the connection config looks like this:
from adodbapi import adodbapi as adba
raw_config_adodbapi = f"PROVIDER=SQLOLEDB.1;Data Source={server};Initial Catalog={database};trusted_connection=no;User ID={user};Password={password};"
conn = adba.connect(raw_config_adodbapi, timeout=120, autocommit=True)
Besides, it seems like the parameters in the pyodbc connection string should be enclosed in {}, but maybe it's not mandatory.
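For illustration, here is a minimal sketch of the same pyodbc connection string with the driver name enclosed in braces (server, database, and credentials are placeholders):
import pyodbc

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};'
                      'SERVER=yourserver;DATABASE=yourdatabase;'
                      'UID=user;PWD=password')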

Related

df.to_sql to Teradata in Python

I'm trying to load a CSV file into a Teradata table with the df.to_sql method.
So far, with the Teradata Python modules, I was able to connect, but I can't manage to load my CSV file.
Here is my code:
import teradata
import pandas as pd

global udaExec
global session
global host
global username
global password

def Connexion_Teradata(usernames, passwords):
    host = 'FTGPRDTD'
    udaExec = teradata.UdaExec(appName="TEST", version="1.0", logConsole=False)
    session = udaExec.connect(method="odbc", system=host, username=usernames, password=passwords, driver="Teradata")
    print('connection ok')
    df = pd.read_csv(r'C:/Users/c92434/Desktop/Load.csv')
    print('df loaded ok')
    df.to_sql(name='DB_FTG_SRS_DATALAB.mdc_load', con=session, if_exists="replace", index=False)
    print('done')

Connexion_Teradata("******", "****")
When I run my script, all I get is:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': (3707, "[42000] [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error, expected something like '(' between the 'type' keyword and '='. ")
What can I do?
Please try this:
from teradataml.context.context import create_context
from teradataml.dataframe.copy_to import copy_to_sql
from sqlalchemy import create_engine
import pandas as pd

sqlalchemy_engine = create_engine('teradatasql://' + user + ':' + passwd + '@' + host)
td_context = create_context(tdsqlengine=sqlalchemy_engine)
df = pd.read_csv(r'/Users/abc/test.csv')
Use the copy_to_sql() function to create a table in Vantage based on a teradataml DataFrame or a pandas DataFrame:
copy_to_sql(df, 'testtable', if_exists='replace')

I would like to write the most efficient code to import Microsoft SQL data to a data frame

I would like to load a SQL query into a data frame as efficiently as possible. I have read different sources, and everyone seems to use a different approach; I am not sure why. Some are using cursors, some aren't.
Currently I have:
import pandas as pd
import pyodbc

con = pyodbc.connect('Driver={something};'
                     'Server=something;'
                     'Database=something;'
                     'Trusted_Connection=yes;'
                     )
sql = """
SQL CODE
"""
df = pd.read_sql_query(con, sql)
And for some reason, this doesn't work on my machine.
Just pack it in a function. I also add a username and password (just in case):
import pandas as pd
import pyodbc

def GetSQLData(dbName, query):
    sPass = 'MyPassword'
    sServer = 'MyServer\\SQL1'
    uname = 'MyUser'
    cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=" + sServer + ";"
                          "Database=" + dbName + ";"
                          "uid=" + uname + ";pwd=" + sPass)
    df = pd.read_sql(query, cnxn)
    return df
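A hypothetical call (the database name and query are placeholders) would then look like:
df = GetSQLData('MyDatabase', 'SELECT TOP 10 * FROM MyTable')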
Try this solution:
import pyodbc
import pandas as pd

cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                      "Server=something;"
                      "Database=something;"
                      "Trusted_Connection=yes;")
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM something')
for row in cursor:
    print(row)
import pandas as pd
import pyodbc

con = pyodbc.connect('Driver={something};'
                     'Server=something;'
                     'Database=something;'
                     'Trusted_Connection=yes;'
                     )
cursor = con.cursor()
cursor.execute("SQL Syntax")
Not quite sure what your last line is doing, but the cursor method works well and runs efficiently with minimal code.
This should run. You can test it by adding sqllist = list(cursor.fetchall()) and then print(sqllist).
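If the goal is a DataFrame rather than printed rows, the cursor results can be converted directly. Here is a minimal sketch, assuming the cnxn and query above; the column names come from cursor.description, which pyodbc populates after execute:
import pandas as pd

cursor.execute('SELECT * FROM something')
rows = cursor.fetchall()
columns = [col[0] for col in cursor.description]  # first field of each descriptor is the column name
df = pd.DataFrame.from_records(rows, columns=columns)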

Getting pypyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Must declare the scalar variable "@Id".')

I am trying to execute a .sql file from Python 3.
Below is the Python code I am trying:
import time
userdate = time.strftime("%m_%d_%H_%M%S")
import pypyodbc as pyodbc

db_host = 'hostname\\DBTEST'
db_name = 'dbname'
conn = ('Driver={SQL Server};Server=' + db_host + ';Database=' + db_name +
        ';Trusted_Connection=yes;')
db = pyodbc.connect(conn)
cursor = db.cursor()
file = open('C:\\abc\\xyz.sql', 'r')
line = file.read()
sql_cmd = line.split('\n')
for x in sql_cmd:
    cursor.execute(x)
Below is the xyz.sql script:
DECLARE @XML XML;
DECLARE @FileName VARCHAR(1000);
DECLARE @Id UNIQUEIDENTIFIER
SELECT @Id = NEWID()
SELECT @FileName = 'ggg.xml'
SELECT @XML = '<Model>
....xml tags here...
....
</Model>'
IF EXISTS (SELECT * FROM tablename CM WHERE CM.columnname = 'test') BEGIN
    UPDATE CM
    SET CM.pn = '01-00001',
        CM.rev = '06',
        CM.Model = @XML,
        CM.ModifiedOn = GETUTCDATE()
    FROM cm.tablename CM
    WHERE CM.columnname = 'test'
    PRINT 'Updated ' + @FileName
END ELSE BEGIN
    INSERT INTO cm.tablename (cmID, MN, CMType, Description, PN, Rev, CM,
        RowStatus, ModifiedBy, ModifiedOn)
    SELECT @Id, 'test123', 'abc.1', '', '01-00011', '01', @XML, 'A',
        '74E8A3E0-E5CA-4563-BD49-12DFD210ED92', GETUTCDATE()
    PRINT 'Inserted ' + @FileName
END
I get the below error when I run the Python code:
pypyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Must declare the scalar variable "@Id".')
DECLARE @XML XML;
DECLARE @FileName VARCHAR(1000);
DECLARE @Id UNIQUEIDENTIFIER
SELECT @Id = NEWID()
Process finished with exit code 1
Note: if I run the SQL query from MS SQL Server Management Studio (SQL Server 2016), it runs successfully.
Any help on this would be appreciated.
The key is to NOT execute the script one command at a time; instead, use the simpler approach of passing the entire query to cursor.execute as one unedited script.
The great thing about doing it this way is that you can fully develop/debug the query in MS SQL Server and then just copy that procedure into a file (making the simple adjustments for passing arguments, of course).
As a (Python 3.x, with pyodbc) example, I use:
SQL_QueryName = SQL_Folder + AllAreasReportName + ".sql"
Textfile = open(SQL_QueryName, 'r')
SQL_COMMANDS = Textfile.read()
cursor.execute(SQL_COMMANDS, ParameterList)
The same approach should work with pypyodbc.
BTW, if the query must appear within the Python procedure, put the entire query inside a (triple-quoted) string and pass that string to cursor.execute.
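Applied to the question's code, a minimal sketch of this approach (assuming the conn string and xyz.sql from the question) might be:
import pypyodbc as pyodbc

db = pyodbc.connect(conn)
cursor = db.cursor()
with open('C:\\abc\\xyz.sql', 'r') as f:
    sql_script = f.read()
cursor.execute(sql_script)  # one call for the whole script; no splitting on newlines
db.commit()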

Syntax error when creating table in Vertica with PYODBC

I am trying to load a big list of SQL queries into a table in Vertica using PYODBC. Here's my code:
tablename = 'DVXTEMP.my_table_name'
sql = my_sql_query.strip().strip(';')
samplesize = 1000
createstring = 'CREATE TABLE %s AS %s \n limit %s;' % (tablename, sql, samplesize)
cursor.execute(createstring)
When I print createstring and run it in Toad, it works fine. When I try to execute it in pyodbc, it gives me the following error:
'Syntax error at or near "DVXTEMP" at character 1\n (4856) (SQLExecDirectW)'
We are using Vertica Analytic Database v7.1.2-6
Any ideas what might be causing this?
Thanks
1) Did you import pyodbc?
2) Did you define "cursor" from "pyodbc.connect"?
import pyodbc
DB = '[string for dbfile]'
DRV = '[string of which driver you are going to use]'
con = pyodbc.connect('DRIVER={};DBQ={}'.format(DRV,DB))
cursor = con.cursor()
##build SQL code and execute as you have done
Try SQL commands only after you can connect without an error.
3) I use pyodbc for .mdb files (MS Access), and some of my queries will not run unless I put single quotes outside double quotes on table/field names.
mytbl_1 = "mytbl"
SQL = 'SELECT * FROM ' + mytbl_1
print SQL
print result -> SELECT * FROM mytbl
(this fails)
mytbl_2 = '"mytbl"' #single quotes outside of double quote
SQL = 'SELECT * FROM ' + mytbl_2
print SQL
print result -> SELECT * FROM "mytbl"
(this string gets passed w/o error works for me with MDB files)
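Applying point 3 to the Vertica question above, a hypothetical fix (the schema and table names are taken from the question) would quote the identifiers:
tablename = '"DVXTEMP"."my_table_name"'  # double-quoted identifiers inside the Python string
createstring = 'CREATE TABLE %s AS %s LIMIT %s;' % (tablename, sql, samplesize)
cursor.execute(createstring)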

Python - pypyodbc - Importing CSV into MS SQL Server

I have a Python script that I am running to take data from a CSV and insert it into my MS SQL Server. The CSV is about 35 MB and contains about 200,000 records with 15 columns. It takes the SQL Server Import Tool less than 5 minutes to insert all the data into the server. The Python script, using pypyodbc, takes 30 minutes or longer.
What am I doing wrong with my code that is causing this to take so long?
import pypyodbc
import csv
import datetime
now = datetime.datetime.now()
cnxn = pypyodbc.connect('DRIVER={SQL Server};SERVER=;DATABASE=')
cursor = cnxn.cursor()
cursor.execute("""
DELETE FROM DataMaster
""")
cnxn.commit()
FileName = "Data - " + str('{:02d}'.format(now.month)) + "-" + str('{:02d}'.format(now.day-1)) + "-" + str(now.year) + ".csv"
myCSV = open(FileName)
myCSV = csv.reader(myCSV)
next(myCSV, None) # this skips the headers
listlist = list(myCSV)
cursor.executemany('''
    INSERT INTO DataMaster (Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,Column9,Column10,Column11,Column12,Column13,Column14,Column15)
    VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)''',
    listlist)
cursor.commit() # commits any changes
cnxn.close() # closes the connection
print "Import Completed."
The code below should work in your case, and it should not take much time.
import sys
import glob
import logging
import pymysql

logger = logging.getLogger()

try:
    conn = pymysql.connect(host='', user='user', passwd='password', db='dbname', connect_timeout=5)
except Exception as e:
    logger.error("ERROR: Unexpected error: Could not connect to MySql instance.")
    logger.error(e)
    sys.exit()
logger.info("SUCCESS: Connection to SQL Server instance succeeded")
cursor = conn.cursor()
path = 'path/*/csv'
for files in glob.glob(path + "/*.csv"):
    add_csv_file = """LOAD DATA LOCAL INFILE '%s' INTO TABLE tablename FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' IGNORE 1 LINES;""" % (files)
    cursor.execute(add_csv_file)
conn.commit()
cursor.close()
conn.close()
Please select the answer if this works.
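Since the question targets MS SQL Server rather than MySQL, one more option worth noting: with pyodbc (rather than pypyodbc), bulk inserts via executemany can usually be sped up by enabling fast_executemany. A hedged sketch, reusing DataMaster and listlist from the question (server and database are placeholders):
import pyodbc

cnxn = pyodbc.connect('DRIVER={ODBC Driver 17 for SQL Server};SERVER=yourserver;DATABASE=yourdatabase')
cursor = cnxn.cursor()
cursor.fast_executemany = True  # send parameter batches in bulk instead of one round trip per row
cursor.executemany(
    'INSERT INTO DataMaster (Column1,Column2,Column3,Column4,Column5,Column6,Column7,Column8,'
    'Column9,Column10,Column11,Column12,Column13,Column14,Column15) '
    'VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)',
    listlist)
cnxn.commit()
cnxn.close()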
