Python cx_Oracle hangs when storing results as a DataFrame

I'm trying to store the results of an Oracle SQL query in a DataFrame and the execution hangs indefinitely. But when I just print the query results, they come out instantly. What is causing the hang when saving this as a DataFrame?
import cx_Oracle
import pandas as pd
dsn_tns = cx_Oracle.makedsn('HOST', 'PORT', service_name='SID')
conn = cx_Oracle.connect(user='USER', password='PASSWORD', dsn=dsn_tns)
curr = conn.cursor()
curr.execute('alter session set current_schema= apps')
df = pd.read_sql('select * from TABLE', curr)
#### THE ALTERNATIVE CODE TO PRINT THE RESULTS
# curr.execute('select * from TABLE')
# for line in curr:
#     print(line)
curr.close()
conn.close()

pandas' read_sql requires a connection object for its con argument, not the result of a cursor's execute. Also, consider using SQLAlchemy, the recommended interface between pandas and databases, where you define the schema in the engine connection assignment. This engine also allows to_sql calls.
engine = create_engine("oracle+cx_oracle://user:pwd#host:port/dbname")
df = pd.read_sql('select * from TABLE', con=engine)
engine.dispose()
And as mentioned on this DBA post, in Oracle users and schemas are essentially the same thing (unlike in other RDBMSs). Therefore, try passing apps as the user in the create_engine call with the needed credentials:
engine = create_engine("oracle+cx_oracle://apps:PASSWORD#HOST:PORT/SID")
df = pd.read_sql('select * from TABLE', con=engine)
engine.dispose()
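Alternatively, if you want to keep the plain cx_Oracle connection from the question, a minimal sketch (reusing the question's placeholders) is to run the ALTER SESSION on a cursor and then hand the connection itself, not the cursor, to read_sql. Note that newer pandas versions warn that DBAPI2 connections other than sqlite3 are untested:
import cx_Oracle
import pandas as pd

dsn_tns = cx_Oracle.makedsn('HOST', 'PORT', service_name='SID')
conn = cx_Oracle.connect(user='USER', password='PASSWORD', dsn=dsn_tns)

# Set the default schema for this session
curr = conn.cursor()
curr.execute('alter session set current_schema = apps')
curr.close()

# Pass the connection object (not the cursor) as con
df = pd.read_sql('select * from TABLE', con=conn)
conn.close()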

Related

SQLalchemy query PostgreSQL database view ERROR: function schema_name() does not exist

I am trying to use python sqlalchemy to query our PostgreSQL database view using ODBC but I am getting the error
{ProgrammingError}(pyodbc.ProgrammingError) ('42883', '[42883] ERROR: function schema_name() does not exist;\nError while executing the query (1) (SQLExecDirectW)')
[SQL: SELECT schema_name()]
(Background on this error at: https://sqlalche.me/e/14/f405)
Using the code below, I successfully create the connection engine but executing the query seems to be the problem.
When using pyodbc or psycopg2, establishing the connection and querying data works perfectly, but with a warning:
UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested, please consider using SQLAlchemy.
which is why we are looking into establishing the connection the SQLAlchemy way.
import config
import pandas as pd
import sqlalchemy

if __name__ == '__main__':
    connection_string = config.odbc('database_odbc.txt')
    connection_url = sqlalchemy.engine.url.URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
    conn = sqlalchemy.create_engine(connection_url)
    query_string = """SELECT [column name in view] FROM public.[name of view]"""
    df1 = pd.read_sql(query_string, conn)
    print(df1.to_string())
    conn.close()
    print('Database connection closed.')
As mentioned, the query runs perfectly using the other methods. I already tried different syntaxes for the view name, including
SELECT [column name in view] FROM [database name].public.[name of view]
SELECT [column name in view] FROM [name of view]
and more without success.
Any help is appreciated, thank you!
Thank you @Gord Thompson,
I followed the default postgresql syntax at https://docs.sqlalchemy.org/en/14/core/engines.html
engine = create_engine('postgresql://scott:tiger@localhost/mydatabase')
now the code looks like
import pandas as pd
from sqlalchemy import create_engine

if __name__ == '__main__':
    engine = create_engine('postgresql://[user]:[password]@[host]/[db]')
    conn = engine.connect()
    query_string = """SELECT [column name in view] FROM public.[name of view]"""
    df1 = pd.read_sql(query_string, conn)
    print(df1.to_string())
    conn.close()
    print('Database connection closed.')
and now it works perfectly, thank you!
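For reference, the root cause was the mssql+pyodbc dialect: on connect, SQLAlchemy's SQL Server dialect runs SELECT schema_name(), a function PostgreSQL does not have. If you prefer building the URL from parts, as in the original URL.create call, a minimal sketch with the postgresql+psycopg2 dialect (placeholders kept from the question) would be:
import pandas as pd
import sqlalchemy

# Build the URL from components instead of an ODBC connect string
connection_url = sqlalchemy.engine.URL.create(
    "postgresql+psycopg2",  # PostgreSQL dialect, not mssql+pyodbc
    username="[user]",
    password="[password]",
    host="[host]",
    database="[db]")
engine = sqlalchemy.create_engine(connection_url)

with engine.connect() as conn:
    df1 = pd.read_sql("SELECT [column name in view] FROM public.[name of view]", conn)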

Utility to find join columns

I have been given several tables in SQL Server and am trying to figure out the best way to join them.
What I've done is:
1) open a connection in R to the database
2) pull all the column names from the INFORMATION_SCHEMA.COLUMNS table
3) build loops in R to try every combination of columns and see what the row count is of the inner join of the 2 columns
I'm wondering if there's a better way to do this or if there's a package or utility that helps with this type of problem.
You could do your joins in Python using pandas. pandas has a powerful IO engine, so you could import from SQL Server into a pandas DataFrame, perform your joins in Python, and write back to SQL Server.
Below is a script I use to perform an import from SQL Server and an export to a MySQL table. I use the python package sqlalchemy for my ORM connections. You could follow this example and read up on joins in pandas.
import pyodbc
import pandas as pd
from sqlalchemy import create_engine
# MySQL info
username = 'user'
password = 'pw'
sqlDB = 'mydb'
# Create MSSQL PSS Connector
server = 'server'
database = 'mydb'
connMSSQL = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};' +
    f'SERVER={server};PORT=1433;DATABASE={database};Trusted_Connection=yes;')
# Read Table into pandas dataframe
tsql = '''
    SELECT [Index],
           Tag
    FROM [dbo].[Tags]
'''
df = pd.read_sql(tsql, connMSSQL, index_col='Index')
# Write df to MySQL db
engine = create_engine(
    f'mysql+mysqldb://{username}:{password}@localhost/mydb', pool_recycle=3600)
with engine.connect() as connMySQL:
    df.to_sql('pss_alarms', connMySQL, if_exists='replace')
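To extend this toward the original question, a minimal sketch of the brute-force join-column search in pandas (table names here are placeholders) could pull two tables into DataFrames and count inner-join rows for each column pair, mirroring the R loops:
import itertools
import pandas as pd

# Pull the two candidate tables into DataFrames (table names are assumptions)
df_a = pd.read_sql('SELECT * FROM [dbo].[TableA]', connMSSQL)
df_b = pd.read_sql('SELECT * FROM [dbo].[TableB]', connMSSQL)

# Try every column pair and record the inner-join row count
results = []
for col_a, col_b in itertools.product(df_a.columns, df_b.columns):
    if df_a[col_a].dtype != df_b[col_b].dtype:
        continue  # merging incompatible dtypes would raise
    joined = df_a.merge(df_b, left_on=col_a, right_on=col_b, how='inner')
    results.append((col_a, col_b, len(joined)))

# Column pairs with the largest counts are the likeliest join keys
for col_a, col_b, n in sorted(results, key=lambda r: r[2], reverse=True)[:10]:
    print(col_a, col_b, n)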

Python to SQL Server Insert

I'm trying to follow the method for inserting a Panda data frame into SQL Server that is mentioned here as it appears to be the fastest way to import lots of rows.
However, I am struggling to figure out the connection parameter.
I am not using a DSN; I have a server name, a database name, and am using a trusted connection (i.e. Windows login).
import sqlalchemy
import urllib.parse
server = 'MYServer'
db = 'MyDB'
cxn_str = "DRIVER={SQL Server Native Client 11.0};SERVER=" + server +",1433;DATABASE="+db+";Trusted_Connection='Yes'"
#cxn_str = "Trusted_Connection='Yes',Driver='{ODBC Driver 13 for SQL Server}',Server="+server+",Database="+db
params = urllib.parse.quote_plus(cxn_str)
engine = sqlalchemy.create_engine("mssql+pyodbc:///?odbc_connect=%s" % params)
conn = engine.connect().connection
cursor = conn.cursor()
I'm just not sure what the correct way to specify my connection string is. Any suggestions?
I have been working with pandas and SQL Server for a while, and the fastest way I found to insert a lot of data into a table is this:
You can create a temporary CSV using:
df.to_csv('new_file_name.csv', sep=',', encoding='utf-8')
Then use pyodbc and BULK INSERT Transact-SQL:
import pyodbc
conn = pyodbc.connect(DRIVER='{SQL Server}', Server='server_name', Database='Database_name', trusted_connection='yes')
cur = conn.cursor()
cur.execute("""BULK INSERT table_name
FROM 'C:\\Users\\folders path\\new_file_name.csv'
WITH
(
CODEPAGE = 'ACP',
FIRSTROW = 2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)""")
conn.commit()
cur.close()
conn.close()
Then you can delete the file:
import os
os.remove('new_file_name.csv')
It took only a second to load a lot of data at once into SQL Server. I hope this gives you an idea.
Note: don't forget to include a field for the index; that was my mistake when I started using this, lol.
Connection string parameter values should not be enclosed in quotes, so you should use Trusted_Connection=Yes instead of Trusted_Connection='Yes'.
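Putting that together with the question's code, a minimal sketch of the corrected connection string follows (server and database names are the question's placeholders, table_name is an assumption, and df is assumed to be the DataFrame to insert). The fast_executemany flag, available for mssql+pyodbc since SQLAlchemy 1.3, is a separate speed-up worth trying for large inserts:
import urllib.parse
import sqlalchemy

server = 'MYServer'
db = 'MyDB'

# No quotes around Yes; the driver name stays inside the braces
cxn_str = ("DRIVER={SQL Server Native Client 11.0};"
           f"SERVER={server},1433;DATABASE={db};Trusted_Connection=Yes")
params = urllib.parse.quote_plus(cxn_str)

# fast_executemany batches the INSERTs on the pyodbc side
engine = sqlalchemy.create_engine(
    "mssql+pyodbc:///?odbc_connect=%s" % params,
    fast_executemany=True)

df.to_sql('table_name', engine, if_exists='append', index=False)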

Load table to Oracle through pandas io SQL

I'm executing the following code; the purpose of the execution is to create a lookup table in the Oracle database to speed up my load of data. The table I want to load is simply a vector of ID values, so only one column is loaded.
The code is written per below:
lookup = df.id_variable.drop_duplicates()
conn = my_oracle_connection()
obj = lookup.to_sql(name = 'lookup', con = conn, if_exists = 'replace')
I get the following error when executing this:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': ORA-01036: illegal variable
name/number
I can execute a psql.read_sql() query, but the above fails.
Now, I don't exactly know how to go about fixing it; I'm quite new to the technical aspects of getting this to work, so any pointers on what direction to take would be greatly appreciated.
Thanks for any time and input!
I had the same issue when using a cx_Oracle connection (I was able to use the .read_sql function, but not .to_sql).
Use SQLalchemy connection instead:
import sqlalchemy as sa
oracle_db = sa.create_engine('oracle://username:password@database')
connection = oracle_db.connect()
dataframe.to_sql('table_name', connection, schema='schema_name', if_exists='append', index=False)
I think the problem happens writing to the Oracle DB using a connection object created by cx_Oracle. SqlAlchemy has a work around:
import cx_Oracle
from sqlalchemy import types, create_engine
conn = create_engine('oracle+cx_oracle://Jeremy:SuperSecret@databasehost:1521/?service_name=gdw')
df.to_sql('TEST', conn, if_exists='replace')
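Applied to the question's lookup variable, a minimal sketch (credentials and service name are placeholders) would convert the Series to a one-column DataFrame before writing:
from sqlalchemy import create_engine

engine = create_engine('oracle+cx_oracle://USER:PASSWORD@HOST:1521/?service_name=SERVICE')

# lookup is a Series after drop_duplicates(), so make it a DataFrame first
lookup.to_frame().to_sql('lookup', engine, if_exists='replace', index=False)
engine.dispose()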

Writing to MySQL database with pandas using SQLAlchemy, to_sql

I'm trying to write a pandas DataFrame to a MySQL table using to_sql. I previously used flavor='mysql', but it will be deprecated in the future, so I wanted to start the transition to using a SQLAlchemy engine.
sample code:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
cnx = engine.raw_connection()
data = pd.read_sql('SELECT * FROM sample_table', cnx)
data.to_sql(name='sample_table2', con=cnx, if_exists = 'append', index=False)
The read works fine but the to_sql has an error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master
WHERE type='table' AND name=?;': Wrong number of arguments during
string formatting
Why does it look like it is trying to use sqlite? What is the correct use of a SQLAlchemy connection with MySQL, and specifically mysql.connector?
I also tried passing the engine in as the connection, and that gave me an error referencing no cursor object.
data.to_sql(name='sample_table2', con=engine, if_exists = 'append', index=False)
>>AttributeError: 'Engine' object has no attribute 'cursor'
Using the engine in place of the raw_connection() worked:
import pandas as pd
import mysql.connector
from sqlalchemy import create_engine
engine = create_engine('mysql+mysqlconnector://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data.to_sql(name='sample_table2', con=engine, if_exists = 'append', index=False)
Not clear on why this gave me the earlier error when I tried it yesterday.
Alternatively, use pymysql package...
import pymysql
from sqlalchemy import create_engine
cnx = create_engine('mysql+pymysql://[user]:[pass]@[host]:[port]/[schema]', echo=False)
data = pd.read_sql('SELECT * FROM sample_table', cnx)
data.to_sql(name='sample_table2', con=cnx, if_exists = 'append', index=False)
Using pymysql and sqlalchemy, this works for Pandas v0.22:
import os
import pandas as pd
import pymysql
from sqlalchemy import create_engine
user = 'yourUserName'
passw = 'password'
host = 'hostName' # either localhost or ip e.g. '172.17.0.2' or hostname address
port = 3306
database = 'dataBaseName'
mydb = create_engine('mysql+pymysql://' + user + ':' + passw + '@' + host + ':' + str(port) + '/' + database, echo=False)
directory = r'directoryLocation' # path of csv file
csvFileName = 'something.csv'
df = pd.read_csv(os.path.join(directory, csvFileName ))
df.to_sql(name=csvFileName[:-4], con=mydb, if_exists = 'replace', index=False)
"""
if_exists: {'fail', 'replace', 'append'}, default 'fail'
fail: If table exists, do nothing.
replace: If table exists, drop it, recreate it, and insert data.
append: If table exists, insert data. Create if does not exist.
"""
I know the word SQLAlchemy is included in the question title, but in the questions and answers above I see imports of pymysql or mysql.connector; it is also possible to do the job with pymysql alone, without calling SQLAlchemy.
import pymysql
user = 'root'
passw = 'my-secret-pw-for-mysql-12ud' # In previous posts variable "pass"
host = '172.17.0.2'
port = 3306
database = 'sample_table' # In previous posts similar to "schema"
conn = pymysql.connect(host=host,
                       port=port,
                       user=user,
                       passwd=passw,
                       db=database)

# Note: the flavor argument only exists in older pandas (removed in 0.23)
data.to_sql(name=database, con=conn, if_exists='append', index=False, flavor='mysql')
I think this solution could be good although it does not use SQLAlchemy.
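For newer pandas releases, where the flavor argument was removed and to_sql no longer accepts raw DBAPI connections other than sqlite3, a minimal pymysql-only sketch can do the insert through cursor.executemany (sample_table2 is a placeholder table whose columns are assumed to match the DataFrame; host, port, user, passw, database and the DataFrame data are reused from above):
import pymysql

conn = pymysql.connect(host=host, port=port, user=user,
                       passwd=passw, db=database)
try:
    with conn.cursor() as cur:
        # One parameterized INSERT, fed every row at once
        cols = ', '.join(data.columns)
        placeholders = ', '.join(['%s'] * len(data.columns))
        sql = f'INSERT INTO sample_table2 ({cols}) VALUES ({placeholders})'
        cur.executemany(sql, list(data.itertuples(index=False, name=None)))
    conn.commit()
finally:
    conn.close()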
