Writing to MySQL database with pandas using SQLAlchemy, to_sql - python

I'm trying to write a pandas DataFrame to a MySQL table using to_sql. I had previously been using flavor='mysql', but since it will be deprecated in the future I wanted to start the transition to using a SQLAlchemy engine.
sample code:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
cnx = engine.raw_connection()
data = pd.read_sql('SELECT * FROM sample_table', cnx)
data.to_sql(name='sample_table2', con=cnx, if_exists = 'append', index=False)
The read works fine but the to_sql has an error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': Wrong number of arguments during
string formatting
Why does it look like it is trying to use sqlite? What is the correct use of a sqlalchemy connection with mysql and specifically mysql.connector?
I also tried passing the engine in as the connection, which gave me an error referencing no cursor object.
data.to_sql(name='sample_table2', con=engine, if_exists = 'append', index=False)
>>AttributeError: 'Engine' object has no attribute 'cursor'

Using the engine in place of the raw_connection() worked:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data = pd.read_sql('SELECT * FROM sample_table', engine)
data.to_sql(name='sample_table2', con=engine, if_exists='append', index=False)
I'm not clear on why this gave me the earlier error when I tried it yesterday.

Alternatively, use the pymysql package:
import pandas as pd
import pymysql
from sqlalchemy import create_engine
cnx = create_engine('mysql+pymysql://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data = pd.read_sql('SELECT * FROM sample_table', cnx)
data.to_sql(name='sample_table2', con=cnx, if_exists = 'append', index=False)

Using pymysql and sqlalchemy, this works for Pandas v0.22:
import os
import pandas as pd
import pymysql
from sqlalchemy import create_engine
user = 'yourUserName'
passw = 'password'
host = 'hostName' # either localhost or ip e.g. '172.17.0.2' or hostname address
port = 3306
database = 'dataBaseName'
mydb = create_engine('mysql+pymysql://' + user + ':' + passw + '@' + host + ':' + str(port) + '/' + database, echo=False)
directory = r'directoryLocation' # path of csv file
csvFileName = 'something.csv'
df = pd.read_csv(os.path.join(directory, csvFileName ))
df.to_sql(name=csvFileName[:-4], con=mydb, if_exists = 'replace', index=False)
"""
if_exists: {'fail', 'replace', 'append'}, default 'fail'
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
"""

I know the title of the question includes the word SQLAlchemy; however, I see in the questions and answers the need to import pymysql or mysql.connector, and it is also possible to do the job with pymysql alone, without calling SQLAlchemy.
import pymysql
user = 'root'
passw = 'my-secret-pw-for-mysql-12ud' # In previous posts variable "pass"
host = '172.17.0.2'
port = 3306
database = 'sample_table' # In previous posts similar to "schema"
conn = pymysql.connect(host=host,
                       port=port,
                       user=user,
                       passwd=passw,
                       db=database)
data.to_sql(name=database, con=conn, if_exists='append', index=False, flavor='mysql')
I think this solution could be good, although it is not using SQLAlchemy.
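One caveat: newer pandas versions removed the flavor argument altogether, and a raw DBAPI connection passed to to_sql is then assumed to be SQLite. If you want to keep the plain pymysql.connect call from above, a minimal sketch that wraps it in an engine via SQLAlchemy's creator hook (reusing the variables defined above):
import pymysql
from sqlalchemy import create_engine

def get_conn():
    # SQLAlchemy calls this whenever it needs a new raw connection
    return pymysql.connect(host=host, port=port, user=user,
                           passwd=passw, db=database)

engine = create_engine('mysql+pymysql://', creator=get_conn)
data.to_sql(name=database, con=engine, if_exists='append', index=False)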

Related

SQLalchemy query PostgreSQL database view ERROR: function schema_name() does not exist

I am trying to use python sqlalchemy to query our PostgreSQL database view using ODBC but I am getting the error
{ProgrammingError}(pyodbc.ProgrammingError) ('42883', '[42883] ERROR: function schema_name() does not exist;\nError while executing the query (1) (SQLExecDirectW)')
[SQL: SELECT schema_name()]
(Background on this error at: https://sqlalche.me/e/14/f405)
Using the code below, I successfully create the connection engine but executing the query seems to be the problem.
When using 'pyodbc' or 'psycopg2', establishing the connection and querying data does work perfectly, but with the warning
UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested, please consider using SQLAlchemy.
which is why we are looking into establishing the connection the SQLAlchemy way.
import config
import pandas as pd
import sqlalchemy

if __name__ == '__main__':
    connection_string = (config.odbc('database_odbc.txt'))
    connection_url = sqlalchemy.engine.url.URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
    conn = sqlalchemy.create_engine(connection_url)
    query_string = """SELECT [column name in view] FROM public.[name of view]"""
    df1 = pd.read_sql(query_string, conn)
    print(df1.to_string())
    conn.close()
    print('Database connection closed.')
As mentioned, the query runs perfectly using the other methods. I already tried different syntaxes for the database view, including
SELECT [column name in view] FROM [database name].public.[name of view]
SELECT [column name in view] FROM [name of view]
and more without success.
Any help is appreciated, thank you!
Thank you @Gord Thompson,
I followed the default postgresql syntax at https://docs.sqlalchemy.org/en/14/core/engines.html
engine = create_engine('postgresql://scott:tiger@localhost/mydatabase')
now the code looks like
import pandas as pd
from sqlalchemy import create_engine

if __name__ == '__main__':
    engine = create_engine('postgresql://[user]:[password]@[host]/[db]')
    conn = engine.connect()
    query_string = """SELECT [column name in view] FROM public.[name of view]"""
    df1 = pd.read_sql(query_string, conn)
    print(df1.to_string())
    conn.close()
    print('Database connection closed.')
and now it works perfectly, thank you!

"SELECT name FROM sqlite_master" error while uploading DataFrame using .to_sql()

Context: I'd like to send a concatenated DataFrame (I joined several DataFrames from individual stock data) to a MySQL database; however, I can't seem to create a table and send the data there.
Problem: When I run this code df.to_sql(name='stockdata', con=con, if_exists='append', index=False) (source: Writing a Pandas Dataframe to MySQL), I keep getting this error: pandas.io.sql.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting.
I'm new to MySQL as well so any help is very welcome! Thank you
from __future__ import print_function
import pandas as pd
from datetime import date, datetime, timedelta
import numpy as np
import yfinance as yf
import mysql.connector
import pymysql as pymysql
import pandas_datareader.data as web
from sqlalchemy import create_engine
import yahoo_fin.stock_info as si
######################################################
# PyMySQL configuration
user = '...'
passw = '...'
host = '...'
port = 3306
database = 'stockdata'
# connect first, then create/select the database
con = pymysql.connect(host=host,
                      port=port,
                      user=user,
                      passwd=passw,
                      charset='utf8')
con.cursor().execute("CREATE DATABASE IF NOT EXISTS {0}".format(database))
con.select_db(database)
df.to_sql(name='stockdata', con=con, if_exists='append', index=False)
.to_sql() expects the second argument to be either a SQLAlchemy Connectable object (Engine or Connection) or a DBAPI connection object. If it is the latter, pandas assumes that it is a SQLite connection, which is why the error message refers to sqlite_master.
You need to use SQLAlchemy to create an engine object,
engine = create_engine("mysql+pymysql://…")
and pass that to to_sql().
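Putting that together for the stockdata example above, a minimal sketch (the credentials are the placeholders from the question, and df is assumed to be the concatenated DataFrame):
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    'mysql+pymysql://{0}:{1}@{2}:{3}/{4}'.format(user, passw, host, port, database),
    echo=False)
# pandas now knows it is talking to MySQL and emits MySQL-compatible statements
df.to_sql(name='stockdata', con=engine, if_exists='append', index=False)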

How can I read data from AWS- Aurora postgresql as python pandas DataFrame and write the same to Oracle table?

I need to read from an AWS Aurora table and write the content to an Oracle table.
My code is:
import pandas as pd
import psycopg2
from sqlalchemy import types, create_engine
import cx_Oracle
import sys
# Connect to Aurora
host = sys.argv[1]
username = sys.argv[2]
password = sys.argv[3]
database = sys.argv[4]
db_conn = psycopg2.connect(host=host, database=database, user=username, password=password)
sql = "SELECT * FROM Table_Name;"
data_df = pd.io.sql.read_sql(sql, db_conn)
print(data_df.head(2))
db_conn.close()
# Connect to Oracle and write data_df dataframe
dsn = cx_Oracle.makedsn('10.z.y.xx', '1521', service_name='abcd')
u_name = sys.argv[5]
pwd = sys.argv[6]
conn = cx_Oracle.connect(user=u_name, password=pwd, dsn=dsn)
ora_engine = create_engine(f'oracle+cx_oracle://{u_name}:{pwd}@{dsn}', echo=True)
ora_engine.connect()
data_df.to_sql(name='oracle_table_name', con=conn)
conn.close()
The connection to Aurora is working, but I'm unable to create the engine for Oracle and write the DataFrame!
The code turned out to be correct; it was failing because of the high volume of data and the low amount of RAM configured.
Thanks.
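If the data volume is the constraint, one option worth trying (a sketch only, not tested against Aurora or Oracle) is to stream the transfer with chunksize, reusing sql, db_conn and ora_engine from the code above (with db_conn still open), so the whole table never has to fit in RAM at once:
import pandas as pd

# read the Aurora table in chunks and append each chunk to the Oracle table
for chunk in pd.read_sql(sql, db_conn, chunksize=10000):  # hypothetical chunk size
    chunk.to_sql(name='oracle_table_name', con=ora_engine,
                 if_exists='append', index=False)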

python cx_oracle hang when storing as DataFrame?

I'm trying to store the results of an Oracle SQL query in a DataFrame and the execution hangs indefinitely. But when I print the query results, they come out instantly. What is causing the error when saving this as a DataFrame?
import cx_Oracle
import pandas as pd
dsn_tns = cx_Oracle.makedsn('HOST', 'PORT', service_name='SID')
conn = cx_Oracle.connect(user='USER', password='PASSWORD', dsn=dsn_tns)
curr = conn.cursor()
curr.execute('alter session set current_schema = apps')
df = pd.read_sql('select * from TABLE', curr)
#### THE ALTERNATIVE CODE TO PRINT THE RESULTS
# curr.execute('select * from TABLE')
# for line in curr:
#     print(line)
curr.close()
conn.close()
Pandas' read_sql requires a connection object for its con argument, not the result of a cursor's execute. Also, consider using SQLAlchemy, the recommended interface between pandas and databases, where you define the schema in the engine connection assignment. This engine also allows to_sql calls.
engine = create_engine("oracle+cx_oracle://user:pwd#host:port/dbname")
df = pd.read_sql('select * from TABLE', con=engine)
engine.dispose()
And as mentioned on this DBA post, in Oracle, users and schemas are essentially the same thing (unlike in other RDBMSs). Therefore, try passing apps as the user in the create_engine call with the needed credentials:
engine = create_engine("oracle+cx_oracle://apps:PASSWORD#HOST:PORT/SID")
df = pd.read_sql('select * from TABLE', con=engine)
engine.dispose()

impala connection via sqlalchemy

I'm new to Hadoop and Impala. I managed to connect to Impala by installing impyla and executing the following code. This is a connection via LDAP:
from impala.dbapi import connect
from impala.util import as_pandas
conn = connect(host="server.lrd.com", port=21050, database='tcad',
               auth_mechanism='PLAIN', user="alexcj", use_ssl=True,
               timeout=20, password="secret1pass")
I'm then able to grab a cursor and execute queries as:
cursor = conn.cursor()
cursor.execute('SELECT * FROM tab_2014_m LIMIT 10')
df = as_pandas(cursor)
I'd like to be able to use sqlalchemy to connect to Impala and be able to use some nice sqlalchemy functions. I found a test file in the impyla source code that illustrates how to create a sqlalchemy engine with the impala driver, like:
engine = create_engine('impala://localhost')
I'd like to be able to do that, but I'm not able to because my call to the connect function above has a lot more parameters, and I do not know how to pass those to sqlalchemy's create_engine to get a successful connection. Has anyone done this? Thanks.
As explained at https://github.com/cloudera/impyla/issues/214
import sqlalchemy
from impala.dbapi import connect

def conn():
    return connect(host='some_host',
                   port=21050,
                   database='default',
                   timeout=20,
                   use_ssl=True,
                   ca_cert='some_pem',
                   user=user, password=pwd,
                   auth_mechanism='PLAIN')

engine = sqlalchemy.create_engine('impala://', creator=conn)
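The engine built this way behaves like any other SQLAlchemy connectable, so it can be handed straight to pandas; for example (table name taken from the question):
import pandas as pd

df = pd.read_sql('SELECT * FROM tab_2014_m LIMIT 10', engine)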
If your Impala is secured by Kerberos, the script below works (for some reason I need to use hive:// instead of impala://):
import sqlalchemy
from sqlalchemy.engine import create_engine
connect_args={'auth': 'KERBEROS', 'kerberos_service_name': 'impala'}
engine = create_engine('hive://impalad-host:21050', connect_args=connect_args)
conn = engine.connect()
ResultProxy = conn.execute("SELECT * FROM db1.table1 LIMIT 5")
print(ResultProxy.fetchall())
import time
from sqlalchemy import create_engine, MetaData, Table, select, and_

ENGINE = create_engine(
    'impala://{host}:{port}/{database}'.format(
        host=host,  # your host
        port=port,
        database=database,
    )
)
METADATA = MetaData(ENGINE)
TABLES = {
    'table': Table('table_name', METADATA, autoload=True),
}
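A table reflected this way can then be queried through SQLAlchemy's expression language; for example, a sketch fetching five rows (SQLAlchemy 1.x select() style, reusing the ENGINE, TABLES, and select imported above):
# build a SELECT ... LIMIT 5 against the reflected table and print the rows
query = select([TABLES['table']]).limit(5)
with ENGINE.connect() as conn:
    for row in conn.execute(query):
        print(row)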
