Storing a dataframe with sqlalchemy, pyodbc: SQL syntax error - python

I would like to store a dataframe in a Teradata database using pandas.to_sql, but I get a SQL syntax error. The error appears to come from the built-in method, and I don't know how to deal with it.
My code:
import pandas as pd
import datetime as dt
import sqlalchemy, pyodbc
todays_date = dt.datetime.now().date()
index = pd.date_range(todays_date-dt.timedelta(10), periods=10, freq='D')
columns = ['A','B', 'C']
df_ = pd.DataFrame(index=index, columns=columns)
df_ = df_.fillna(0)
engine = sqlalchemy.create_engine("mssql+pyodbc://" + user + ":" + passwd + "#" +dsnname)
df_.to_sql(name= 'TableTest', con = engine, if_exists='replace')
And the error I get:
ProgrammingError: (pyodbc.ProgrammingError) ('42000', "[42000] [Teradata][ODBC Teradata Driver][Teradata Database] Syntax error: expected something between '(' and ')'. (-3706) (SQLExecDirectW)") [SQL: 'SELECT schema_name()']

Here's a two-part answer:
Install sqlalchemy-teradata.
Create the engine and the table as follows:
engine = sqlalchemy.create_engine("teradata://" + user + ":" + passwd + "#" +dsnname)
df.to_sql(name= 'TableTest', con = engine, index=False, schema='database_name', if_exists='replace')
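For context, a minimal end-to-end sketch of the corrected flow; the user, passwd, and dsnname values here are placeholders for illustration, not from the original post:
import pandas as pd
import sqlalchemy

# placeholder credentials / DSN name, replace with real values
user, passwd, dsnname = "myuser", "mypassword", "mydsn"

engine = sqlalchemy.create_engine("teradata://" + user + ":" + passwd + "@" + dsnname)

df_ = pd.DataFrame({'A': [0], 'B': [0], 'C': [0]})
df_.to_sql(name='TableTest', con=engine, index=False,
           schema='database_name', if_exists='replace')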

Related

Snowflake: SQL compilation error: error line invalid identifier '"dateutc"'

I'm moving data from Postgres to Snowflake. Originally it worked; however, I've added:
df_postgres["dateutc"]= pd.to_datetime(df_postgres["dateutc"])
because the date format was loading incorrectly into Snowflake, and now I see this error:
SQL compilation error: error line 1 at position 87 invalid identifier
'"dateutc"'
Here is my code:
from sqlalchemy import create_engine
import pandas as pd
import glob
import os
from config import postgres_user, postgres_pass, host, port, postgres_db, snow_user, snow_pass, snow_account, snow_warehouse
from snowflake.connector.pandas_tools import pd_writer
from snowflake.sqlalchemy import URL
from sqlalchemy.dialects import registry
registry.register('snowflake', 'snowflake.sqlalchemy', 'dialect')
engine = create_engine(f'postgresql://{postgres_user}:{postgres_pass}@{host}:{port}/{postgres_db}')
conn = engine.connect()
#reads query
df_postgres = pd.read_sql("SELECT * FROM rok.my_table", conn)
#dropping these columns
drop_cols=['RPM', 'RPT']
df_postgres.drop(drop_cols, inplace=True, axis=1)
#changed columns to lowercase
df_postgres.columns = df_postgres.columns.str.lower()
df_postgres["dateutc"]= pd.to_datetime(df_postgres["dateutc"])
print(df_postgres.dateutc.dtype)
sf_conn = create_engine(URL(
    account = snow_account,
    user = snow_user,
    password = snow_pass,
    database = 'test',
    schema = 'my_schema',
    warehouse = 'test',
    role = 'test',
))
df_postgres.to_sql(name='my_table',
                   index = False,
                   con = sf_conn,
                   if_exists = 'append',
                   chunksize = 300,
                   method = pd_writer)
Moving Ilja's answer from comment to answer for completeness:
Snowflake is case sensitive.
When writing "unquoted" SQL, Snowflake will convert table names and fields to uppercase.
This usually works, until someone decides to start quoting their identifiers in SQL.
pd_writer adds quotes to identifiers.
Hence, when you have df_postgres["dateutc"], it remains lowercase when it's transformed into a fully quoted query.
Writing df_postgres["DATEUTC"] in Python should fix the issue.

JSON response into database

OK, I have tried several kinds of solutions recommended by others on this site and other sites. However, I can't get it to work the way I would like.
I get an XML response which I normalize and then save to a CSV. This first part works fine.
Instead of saving it to CSV, I would like to save it into an existing table in an Access database. The second part is below:
I would like to use an existing table instead of creating a new one.
The result is not separated by ";" into different columns. Everything ends up in the same column, not separated (see image below).
response = requests.get(u,headers=h).json()
dp = pd.json_normalize(response,'Units')
response_list.append(dp)
export = pd.concat(response_list)
export.to_csv(r'C:\Users\username\Documents\Python Scripts\Test\Test2_'+str(now)+'.csv', index=False, sep=';',encoding='utf-8')
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};" \
.format(access_path))
strSQL = "SELECT * INTO projects2 FROM [text;HDR=Yes;FMT=sep(;);" + \
"Database=C:\\Users\\username\\Documents\\Python Scripts\\Test].Testdata.csv;"
cur = conn.cursor()
cur.execute(strSQL)
conn.commit()
conn.close()
If you already have the data in a well-formed pandas DataFrame then you don't really need to dump it to a CSV file; you can use the sqlalchemy-access dialect to push the data directly into an Access table using pandas' to_sql() method:
from pprint import pprint
import urllib
import pandas as pd
import sqlalchemy as sa
connection_string = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\Users\Public\Database1.accdb;"
    r"ExtendedAnsiSQL=1;"
)
connection_uri = f"access+pyodbc:///?odbc_connect={urllib.parse.quote_plus(connection_string)}"
engine = sa.create_engine(connection_uri)

with engine.begin() as conn:
    # existing data in table
    pprint(
        conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
    )
    """
    [('gord', 'gord@example.com'),
     ('jennifer', 'jennifer@example.com')]
    """

# DataFrame to insert
df = pd.DataFrame(
    [
        ("newdev", "newdev@example.com"),
        ("newerdev", "newerdev@example.com"),
    ],
    columns=["username", "email"],
)
df.to_sql("user_table", engine, index=False, if_exists="append")

with engine.begin() as conn:
    # updated table
    pprint(
        conn.execute(sa.text("SELECT * FROM user_table")).fetchall(), width=30
    )
    """
    [('gord', 'gord@example.com'),
     ('jennifer', 'jennifer@example.com'),
     ('newdev', 'newdev@example.com'),
     ('newerdev', 'newerdev@example.com')]
    """
(Disclosure: I am currently the maintainer of the sqlalchemy-access dialect.)
Solved with the following code:
SE_export_Tuple = list(zip(SE_export.Name,SE_export.URL,SE_export.ImageUrl,......,SE_export.ID))
print(SE_export_Tuple)
access_path = r"C:\Users\username\Documents\Python Scripts\Test\Test_db.accdb"
conn = pyodbc.connect("DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={};" \
.format(access_path))
cursor = conn.cursor()
mySql_insert_query="INSERT INTO Temp_table (UnitName,URL,ImageUrl,.......,ID) VALUES (?,?,?,......,?)"
cursor.executemany(mySql_insert_query,SE_export_Tuple)
conn.commit()
conn.close()
However, when I add many fields, I get an error at executemany saying:
cursor.executemany(mySql_insert_query,SE_export_Tuple)
Error: ('HY004', '[HY004] [Microsoft][ODBC Microsoft Access Driver]Invalid SQL data type (67) (SQLBindParameter)')
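The HY004 error comes from the ODBC layer when pyodbc tries to bind a parameter value whose type the Access driver cannot map. One frequent (though not the only) cause when the tuples are built from a pandas DataFrame is numpy scalar types and NaN/NaT values rather than plain Python objects. The sketch below shows that workaround; it is an assumption about the cause, not a confirmed fix:
# convert numpy scalars to plain Python objects and NaN/NaT to None before binding
SE_export_clean = SE_export.astype(object).where(SE_export.notna(), None)

# assumes the DataFrame columns are already in the same order as the INSERT column list
SE_export_Tuple = list(SE_export_clean.itertuples(index=False, name=None))

cursor = conn.cursor()
cursor.executemany(mySql_insert_query, SE_export_Tuple)
conn.commit()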

SQL query not running in Python

I have the following Python code:
import pandas as pd
from sqlalchemy import create_engine
import mysql.connector
# Give the location of the file
loc = ("C:\\Users\\27826\\Desktop\\11Sixteen\\Models and Reports\\Historical results files\\EPL 1993-94.csv")
df = pd.read_csv(loc)
# Remove empty columns then rows
df = df.dropna(axis=1, how='all')
df = df.dropna(axis=0, how='all')
# Create DataFrame and then import to db (new game results table)
engine = create_engine("mysql://root:xxx#localhost/11sixteen")
df.to_sql('new_game_results', con=engine, if_exists="replace")
# Move from new games results table to game results table
db = mysql.connector.connect(host="localhost",
                             user="root",
                             passwd="xxx",
                             database="11sixteen")
my_cursor = db.cursor()
my_cursor.execute("INSERT INTO 11sixteen.game_results "
                  "SELECT * FROM 11sixteen.new_game_results WHERE "
                  "NOT EXISTS (SELECT date, HomeTeam "
                  "FROM 11sixteen.game_results WHERE "
                  "11sixteen.game_results.date = 11sixteen.new_game_results.date AND "
                  "11sixteen.game_results.HomeTeam = 11sixteen.new_game_results.HomeTeam)")
print("complete")
Basically, the objective is to copy data from several Excel files to a SQL table (one at a time) and then transfer it from there to the fuller table, where ALL the data will be aggregated (hopefully without duplicates).
Everything works 100% except the SQL query below:
INSERT INTO 11sixteen.game_results
SELECT * FROM 11sixteen.new_game_results
WHERE NOT EXISTS (SELECT date, HomeTeam
                  FROM 11sixteen.game_results WHERE
                  11sixteen.game_results.date = 11sixteen.new_game_results.date AND
                  11sixteen.game_results.HomeTeam = 11sixteen.new_game_results.HomeTeam)
If I run the same query in MySQL Workbench it works perfectly. Any ideas why I can't get Python to execute the query as expected?
Add a commit at the end:
db.commit()
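mysql.connector does not autocommit by default, so without the commit the INSERT is rolled back when the connection closes. The tail of the script then looks roughly like this (insert_query stands for the INSERT ... SELECT statement from the question):
my_cursor = db.cursor()
my_cursor.execute(insert_query)  # the INSERT ... SELECT shown above
db.commit()                      # persist the change; without it the insert is discarded
db.close()
print("complete")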

How to execute multiple queries in pandas?

How to execute the following queries with sqlalchemy?
import pandas as pd
import urllib
from sqlalchemy import create_engine
from sqlalchemy.types import NVARCHAR
params = urllib.parse.quote_plus(r'DRIVER={SQL Server};SERVER=localhost\SQLEXPRESS;Trusted_Connection=yes;DATABASE=my_db;autocommit=true;MultipleActiveResultSets=True')
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine = create_engine(conn_str, encoding = 'utf-8-sig')
with engine.connect() as con:
    con.execute('Declare @latest_date nvarchar(8);')
    con.execute('SELECT @latest_date = max(date) FROM my_table')
    df = pd.read_sql_query('SELECT * from my_db where date = @latest_date', conn_str)
However, an error occured:
sqlalchemy.exc.ProgrammingError: (pyodbc.ProgrammingError) ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL Server]Must declare the scalar variable "@latest_date". (137) (SQLExecDirectW)')
How to solve this problem?
Thanks.
You don't need to declare a variable and use so many queries; you can do it with just one query:
SELECT *
FROM my_db
WHERE date = (SELECT max(date)
              FROM my_db)
And then you can use the following; the date column is quoted with square brackets because date is a reserved word in SQL Server:
with engine.connect() as con:
    query = "SELECT * FROM my_db WHERE [date] = (SELECT max([date]) FROM my_db)"
    df = pd.read_sql(query, con=con)
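For completeness, the original error occurs because each con.execute() call is sent to SQL Server as its own batch, and T-SQL local variables do not survive across batches. If the variable approach is really needed, everything has to go in one string; a sketch using the names from the question (SET NOCOUNT ON keeps the intermediate statements from emitting row counts, so only the final SELECT returns a result set):
with engine.connect() as con:
    df = pd.read_sql_query(
        "SET NOCOUNT ON; "
        "DECLARE @latest_date nvarchar(8); "
        "SELECT @latest_date = max([date]) FROM my_table; "
        "SELECT * FROM my_db WHERE [date] = @latest_date;",
        con,
    )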

How to import all the tables from a Postgres schema using python

I know I can do it manually using sqlalchemy and pandas
dbschema = 'myschema'
engine = create_engine('postgresql://XX:YY@localhost:5432/DB',
                       connect_args={'options': '-csearch_path={}'.format(dbschema)})
df = psql.read_sql('Select * from myschema."df"', con=engine)
But is it possible to do a loop and get all the tables?
I tried something like:
tables = engine.table_names()
print(tables)
['A', 'B']
for table in tables:
    table = psql.read_sql('Select * from myschema."%(name)s"', con=engine, params={'name': table})
I get this message:
LINE 1: Select * from myschema.'A'
I guess the problem is caused by my quotes but I am not so sure.
EDIT:
So I tried the example here: Passing table name as a parameter in psycopg2
import psycopg2
from psycopg2 import sql

try:
    conn = psycopg2.connect("dbname='DB' user='XX' host='localhost' password='YY'")
except:
    print('I am unable to connect to the database')
print(conn)
cur = conn.cursor()
for table in tables:
    table = cur.execute(sql.SQL("Select * from myschema.{}").format(sql.Identifier(table)))
But my tables are None, so I am doing something wrong, but I can't see what.
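For reference, psycopg2's cursor.execute() always returns None, which is why the loop variable ends up as None; the rows have to be fetched from the cursor afterwards. A minimal sketch along those lines, reusing the tables list and connection details from the question:
import pandas as pd
import psycopg2
from psycopg2 import sql

conn = psycopg2.connect("dbname='DB' user='XX' host='localhost' password='YY'")
cur = conn.cursor()

dfs = {}
for table in tables:
    # compose the identifier safely, then pull the rows from the cursor
    cur.execute(sql.SQL("SELECT * FROM myschema.{}").format(sql.Identifier(table)))
    columns = [desc[0] for desc in cur.description]
    dfs[table] = pd.DataFrame(cur.fetchall(), columns=columns)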
