DataFrame.to_sql to Teradata in Python

I'm trying to load a CSV file into a Teradata table with the df.to_sql method.
So far, with the Teradata Python modules, I was able to connect, but I can't manage to load my CSV file.
Here is my code:
import teradata
import pandas as pd
global udaExec
global session
global host
global username
global password
def Connexion_Teradata(usernames, passwords):
    host = 'FTGPRDTD'
    udaExec = teradata.UdaExec(appName="TEST", version="1.0", logConsole=False)
    session = udaExec.connect(method="odbc", system=host, username=usernames, password=passwords, driver="Teradata")
    print('connection ok')
    df = pd.read_csv(r'C:/Users/c92434/Desktop/Load.csv')
    print('chargement df ok')
    df.to_sql(name='DB_FTG_SRS_DATALAB.mdc_load', con=session, if_exists="replace", index=False)
    print('done')
Connexion_Teradata("******", "****")
When I run my script, all I get is:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': (3707, "[42000] [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error, expected something like '(' between the 'type' keyword and '='. ")
What can I do?

Please try this:
from teradataml import create_context
from teradataml.dataframe.copy_to import copy_to_sql
from sqlalchemy import create_engine
import pandas as pd
sqlalchemy_engine = create_engine('teradatasql://' + user + ':' + passwd + '@' + host)
td_context = create_context(tdsqlengine=sqlalchemy_engine)
df = pd.read_csv(r'/Users/abc/test.csv')
Use the copy_to_sql() function to create a table in Vantage from a teradataml DataFrame or a pandas DataFrame.
copy_to_sql(df, 'testtable', if_exists='replace')
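To double-check the load, you can read the table back through teradataml. A minimal sketch, assuming the create_context() call above succeeded and reusing the 'testtable' name:
from teradataml import DataFrame
tdf = DataFrame('testtable')    # teradataml DataFrame backed by the new Vantage table
print(tdf.to_pandas().head())   # pull a small preview back into pandas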

Related

How to send Excel data to MySQL using pandas and PyMySQL?

I'm having issues importing data with Python into a table in my database directly from an Excel file.
I have this code:
import os
import pandas as pd
import pymysql
if os.path.exists("env.py"):
    import env
print(os.environ)
# Open the connection to the database
db = pymysql.connect(
    host=os.environ.get("MY_DATABASE_HOST"),
    user=os.environ.get("MY_USERNAME"),
    password=os.environ.get("MY_PASSWORD"),
    database=os.environ.get("MY_DATABASE_NAME")
)
##################################################
################# READ THE EXCEL #################
tabla_azul = "./excelFiles/tablaAzul.xlsx"
dAzul = pd.read_excel(tabla_azul, sheet_name='Órdenes')
dAzul.to_sql(con=db, name='tablaazul', if_exists='append', schema='str')
# print(type(dAzul))
tabla_verde = "./excelFiles/tablaVerde.xlsx"
dVerde = pd.read_excel(tabla_verde, sheet_name='Órdenes')
dVerde.to_sql(con=db, name='tablaverde', if_exists='append', schema='str')
I'm not sure what table name I have to put into the name parameter.
Do I have to use sqlalchemy no matter what?
If question 2 is yes: is it possible to connect sqlalchemy with pymysql?
If question 3 is no: how do I use the .env variables, like host, with the sqlalchemy connection?
Thank you!
When I run the code above, it gives me this error:
pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
As stated in the pandas documentation, for any database other than SQLite, .to_sql() requires a SQLAlchemy Connectable object, which is either an Engine object or a Connection object. You can create an Engine object for PyMySQL like so:
import sqlalchemy as sa
connection_url = sa.engine.URL.create(
    "mysql+pymysql",
    username=os.environ.get("MY_USERNAME"),
    password=os.environ.get("MY_PASSWORD"),
    host=os.environ.get("MY_DATABASE_HOST"),
    database=os.environ.get("MY_DATABASE_NAME")
)
engine = sa.create_engine(connection_url)
Then you can call .to_sql() and pass it the engine:
dVerde.to_sql(con=engine, name='tablaverde', if_exists='append', schema='str')
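If you want both writes committed as a single transaction, you can also hand .to_sql() a Connection from engine.begin() instead of the Engine itself. A sketch reusing the engine and table names above:
with engine.begin() as conn:  # commits on success, rolls back on error
    dAzul.to_sql(con=conn, name='tablaazul', if_exists='append', schema='str')
    dVerde.to_sql(con=conn, name='tablaverde', if_exists='append', schema='str')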

Why is to_sql not working with pyodbc in pandas?

I have an Excel file. I'm importing it into a dataframe and trying to update a database table using the data.
import os
import pandas as pd
import pyodbc
def get_sale_file():
    try:
        cnxn = pyodbc.connect('DRIVER=ODBC Driver 17 for SQL Server;'
                              'SERVER=' + server + ';DATABASE=' + database + ';UID=' + uname + ';PWD=' + pword,
                              autocommit=False)
        files = os.listdir(sap_file_path)
        df = pd.DataFrame()
        for f in files:
            if f.endswith('.xlsx') or f.endswith('.xls'):
                df = pd.read_excel(os.path.join(sap_file_path, f))
        df.to_sql('temptable', cnxn, if_exists='replace')
        query = "UPDATE MList AS mas" + \
                " SET TTY = temp.[Territory Code]," + \
                " Freq = temp.[Frequency Code]" + \
                " FROM temptable AS temp" + \
                " WHERE mas.SiteCode = temp.[ri a]"
When I execute the above code block, I get:
1/12/2019 10:19:45 AM ERROR: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': ('42S02', "[42S02] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Invalid object name 'sqlite_master'. (208) (SQLExecDirectW)")
Am I trying this the right way? Does pandas have any other function to update an MSSQL table, other than to_sql?
How can I overcome the above error?
Edit
Do I have to create temptable beforehand to load the dataframe? If so, my file contains hundreds of columns, and they may vary (except for a few columns). How can I make sure pandas loads only a few columns into temptable?
According to the pandas.DataFrame.to_sql documentation (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html), the connection is expected to be a sqlalchemy.engine.Engine or a sqlite3.Connection, so it is necessary to change your code to use a connection like this:
import sqlalchemy
import pyodbc
cnxn = sqlalchemy.create_engine("mssql+pyodbc://<username>:<password>@<dsnname>")
df.to_sql("table_name", cnxn, if_exists='replace')
UPDATE: Using urllib
import urllib
import sqlalchemy
import pyodbc
params = urllib.parse.quote_plus("DRIVER={ODBC Driver 17 for SQL Server};SERVER=yourserver;DATABASE=yourdatabase;UID=user;PWD=password")
cnxn = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
df.to_sql("table_name", cnxn, if_exists='replace')
You can try another package, too, instead of pyodbc, e.g. pytds or adodbapi.
The first one is very simple; with adodbapi the connection config looks like this:
from adodbapi import adodbapi as adba
raw_config_adodbapi = f"PROVIDER=SQLOLEDB.1;Data Source={server};Initial Catalog={database};trusted_connection=no;User ID={user};Password={password};"
conn = adba.connect(raw_config_adodbapi, timeout=120, autocommit=True)
Besides, it seems like the parameters in the connection string for pyodbc should be enclosed in {}, but maybe that's not mandatory.
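For pytds, a connection is roughly this simple (a sketch based on the python-tds examples, reusing the server/database/uname/pword variables from the question; not tested here):
import pytds
with pytds.connect(server, database, uname, pword) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT TOP 5 * FROM MList")   # any quick sanity-check query
        print(cur.fetchall())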

I would like to write the most efficient code to import Microsoft SQL data to a data frame

I would like to load a SQL query into a data frame as efficiently as possible. I read different sources and everyone seems to use a different approach. I am not sure why... Some are using cursors, some aren't.
Currently I have:
import pandas as pd
import pyodbc
con = pyodbc.connect('Driver={something};'
                     'Server=something;'
                     'Database=something;'
                     'Trusted_Connection=yes;'
                     )
sql="""
SQL CODE
"""
df = pd.read_sql_query(con,sql)
And for some reason, this doesn't work on my machine.
Just wrap it in a function. I also added username and password (just in case):
import pandas as pd
import pyodbc
def GetSQLData(dbName, query):
    sPass = 'MyPassword'
    sServer = 'MyServer\\SQL1'
    uname = 'MyUser'
    cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
                          "Server=" + sServer + ";"
                          "Database=" + dbName + ";"
                          "uid=" + uname + ";pwd=" + sPass)
    df = pd.read_sql(query, cnxn)
    return df
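Then call it with the database name and the query you want (the names here are just placeholders):
df = GetSQLData('MyDatabase', 'SELECT TOP 100 * FROM SomeTable')
print(df.head())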
Try this solution
import pyodbc
import pandas as pd
cnxn = pyodbc.connect("Driver={SQL Server Native Client 11.0};"
"Server=something;"
"Database=something;"
"Trusted_Connection=yes;")
cursor = cnxn.cursor()
cursor.execute('SELECT * FROM something')
for row in cursor:
print(row)
import pandas as pd
import pyodbc
con = pyodbc.connect('Driver={something};'
                     'Server=something;'
                     'Database=something;'
                     'Trusted_Connection=yes;'
                     )
cursor = con.cursor()
cursor.execute("SQL Syntax")
Not quite sure what your last line is doing, but the cursor method works well and runs efficiently with minimal code.
This should run. You can test it by adding sqllist = list(cursor.fetchall()) and then print(sqllist).
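If you go the cursor route but still want a DataFrame at the end, you can build one yourself from the fetched rows. A small sketch, taking the column names from cursor.description:
cursor.execute('SELECT * FROM something')
rows = [tuple(r) for r in cursor.fetchall()]        # pyodbc Row objects -> plain tuples
columns = [col[0] for col in cursor.description]    # column names reported by the driver
df = pd.DataFrame.from_records(rows, columns=columns)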

Pandas import SQL query - keep column type

I have imported a SQL query into pandas by doing the following:
import pandas as pd
import numpy as np
import pyodbc
con = pyodbc.connect(
    'Trusted_Connection=yes',
    driver='{SQL Server Native Client 11.0}',
    server='SERVER',
    database='DATABASE')
Receivables = pd.read_sql_query("select * from receivables",con)
This works fine, but most columns are now of type "object"; some have been recognized as float. Is there no method for just keeping the column types from the SQL server, where they are already defined correctly?
Try using the read_sql function from pandas.
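If the columns still come back as object, one option after the read is to let pandas re-infer nullable dtypes. A sketch (needs pandas 1.0+; it re-infers the types rather than copying them from SQL Server):
Receivables = pd.read_sql("select * from receivables", con)
Receivables = Receivables.convert_dtypes()   # re-infer the best available dtypes
print(Receivables.dtypes)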

Python mysql connector with multiple statements

I'm trying to run a SQL query through mysql.connector that requires a SET command in order to query a specific table:
import mysql.connector
import pandas as pd
cnx = mysql.connector.connect(host=ip,
                              port=port,
                              user=user,
                              passwd=pwd,
                              database="")
sql="""SET variable='Test';
SELECT * FROM table """
df = pd.read_sql(sql, cnx)
When I run this I get the error "Use multi=True when executing multiple statements". But where do I put multi=True?
Passing the parameters as a dictionary into the params argument should do the trick; documentation here:
pd.read_sql(sql, cnx, params={'multi': True})
The parameters are passed to the underlying database driver.
After many hours of experimenting, I figured out how to do this. Forgive me if this is not the most succinct way, but it's the best I could come up with:
import mysql.connector
import pandas as pd
cnx = mysql.connector.connect(host=ip,
                              port=port,
                              user=user,
                              passwd=pwd,
                              database="")
sql1="SET variable='Test';"
sql2="""SELECT * FROM table """
cursor=cnx.cursor()
cursor.execute(sql1)
cursor.close()
df = pd.read_sql(sql2, cnx)
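Alternatively, the multi=True that the error message mentions belongs on cursor.execute(), not on read_sql(). A sketch of that route, assuming a mysql-connector-python version that still supports it:
sql = """SET variable='Test';
SELECT * FROM table """
cursor = cnx.cursor()
frames = []
# execute(..., multi=True) yields one result cursor per statement in the string
for result in cursor.execute(sql, multi=True):
    if result.with_rows:                              # only the SELECT produces rows
        frames.append(pd.DataFrame(result.fetchall(),
                                   columns=result.column_names))
cursor.close()
df = frames[0]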
