Accessing a table in an Oracle Schema with Pandas - python

I have an Oracle schema called "L_US" and a table called "entity". How do I store the contents of the "entity" table in a dataframe? I tried:
import sqlalchemy
import pandas as pd
import cx_Oracle
dsnStr = cx_Oracle.makedsn('URL', '1521', 'SERVICE_NAME')
dsnStr = dsnStr.replace('SID', 'SERVICE_NAME')
connect_str = 'oracle://user:password@' + dsnStr
engine = sqlalchemy.create_engine(connect_str)
df = pd.read_sql_table("L_US.entity", engine)
However, I get an error that the table is not found.
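A likely cause is that read_sql_table looks up "L_US.entity" as a single table name; pandas takes the schema through a separate schema argument. The sketch below assumes that is the problem. It also builds the connection URL with sqlalchemy.engine.URL.create (SQLAlchemy 1.4+) instead of string concatenation. The host, credentials and service name are placeholders, and load_entity is a hypothetical helper, not part of pandas or SQLAlchemy.

```python
import pandas as pd
from sqlalchemy.engine import URL

# Placeholder connection details (assumptions, not taken from the question).
url = URL.create(
    "oracle+cx_oracle",
    username="user",
    password="password",
    host="dbhost.example.com",
    port=1521,
    query={"service_name": "SERVICE_NAME"},
)


def load_entity(engine):
    """Hypothetical helper: read the entity table of schema L_US into a DataFrame."""
    # The schema goes in its own argument. Lowercase names are treated as
    # case-insensitive by SQLAlchemy's Oracle dialect, so "l_us" should match
    # a schema that was created without quotes as L_US.
    return pd.read_sql_table("entity", engine, schema="l_us")
```

With cx_Oracle installed and a reachable database, load_entity(sqlalchemy.create_engine(url)) should return the table as a DataFrame.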

Passing a non-executable SQL object to pandas read_sql() method

I'm facing this new warning within some Python 3.9 code:
/usr/local/lib/python3.9/site-packages/pandas/io/sql.py:761:
UserWarning:
pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connectionother DBAPI2
objects are not tested, please consider using SQLAlchemy
on such snippet:
import pandas as pd
from psycopg2 import sql
fields = ('object', 'category', 'number', 'mode')
query = sql.SQL("SELECT {} FROM categories;").format(
sql.SQL(', ').join(map(sql.Identifier, fields))
)
df = pd.read_sql(
sql=query,
con=connector() # custom function which returns db parameters as a psycopg2 connection object
)
It works like a charm for the moment, but according to the warning message, I'd like to switch to SQLAlchemy.
But by doing so:
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://', creator=connector)
df = pd.read_sql(
sql=query,
con=engine
)
it says:
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object:
Composed([SQL('SELECT '), Composed([Identifier('object'), SQL(', '),
Identifier('category'), SQL(', '), Identifier('number'), SQL(', '),
Identifier('mode')]), SQL(' FROM categories;')])
So I have to tweak it this way to avoid this error:
engine = create_engine('postgresql+psycopg2://', creator=connector)
conn = connector()
curs = conn.cursor()
df = pd.read_sql(
sql=query.as_string(conn), # non-pythonic, isn't it?
con=engine
)
I'm wondering what the benefit of using an SQLAlchemy engine with pandas is if I have to "decode" the query string using a psycopg2 connection context... (in some specific cases where the query string is a binary string I have to "decode" it by applying .decode('UTF-8')...)
How can I rewrite the DataFrame construction in a proper (i.e. the best) way by using an SQLAlchemy engine with pandas?
The pandas doc is not 100% clear for me:
Parameters
sql : str or SQLAlchemy Selectable (select or text object)
    SQL query to be executed or a table name.
Version info:
python: 3.9
pandas: '1.4.3'
sqlalchemy: '1.4.35'
psycopg2: '2.9.3 (dt dec pq3 ext lo64)'
The query can be expressed in SQLAlchemy syntax like this:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
# Adjust engine configuration to match your environment.
engine = sa.create_engine('postgresql+psycopg2:///test')
metadata = sa.MetaData()
# Reflect the table from the database.
tbl = sa.Table('categories', metadata, autoload_with=engine)
# Get column objects for column names.
columns = [tbl.c[name] for name in fields]
query = sa.select(*columns)
df = pd.read_sql(sql=query, con=engine)
print(df)
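If reflecting the table is not wanted, the same SELECT can probably be built from the lightweight sa.table() and sa.column() constructs, which still leave identifier quoting to SQLAlchemy. This is a minimal sketch, assuming SQLAlchemy 1.4+; an in-memory SQLite database with made-up rows stands in for the PostgreSQL categories table.

```python
import pandas as pd
import sqlalchemy as sa

fields = ("object", "category", "number", "mode")

# In-memory SQLite stands in for PostgreSQL here (assumption for the demo).
engine = sa.create_engine("sqlite://")

# Made-up sample rows; "extra" is a column the query should leave out.
pd.DataFrame(
    {
        "object": ["a", "b"],
        "category": ["x", "y"],
        "number": [1, 2],
        "mode": ["r", "w"],
        "extra": [0, 0],
    }
).to_sql("categories", engine, index=False)

# Build the SELECT over the chosen columns without reflecting the table.
columns = [sa.column(name) for name in fields]
query = sa.select(*columns).select_from(sa.table("categories"))

df = pd.read_sql(sql=query, con=engine)
print(df)
```

Because query is a SQLAlchemy Selectable, pandas can execute it directly and no as_string() or decode() step is needed.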

Missing column names when importing data from database (python + postgre sql)

I am trying to import some data from a PostgreSQL database to work with it in Python. I tried the code below, which seems quite similar to examples I've found on the internet.
import psycopg2
import sqlalchemy as db
import pandas as pd
engine = db.create_engine('database specifications')
connection = engine.connect()
metadata = db.MetaData()
data = db.Table(tabela, metadata, schema=shema, autoload=True, autoload_with=engine)
query = db.select([data])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
df = pd.DataFrame(ResultSet)
However, it returns data without column names. What did I forget?
It turned out the only thing needed is adding
columns = data.columns.keys()
df.columns = columns
There is a great debate about that in this thread.
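The column names can also be taken from the result object itself, or the query can be handed to pandas so that it labels the columns. A minimal sketch of both options, assuming SQLAlchemy 1.4+; the items table and its rows are made up, and in-memory SQLite stands in for PostgreSQL.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite with a made-up table (assumption for the demo).
engine = sa.create_engine("sqlite://")
pd.DataFrame({"id": [1, 2], "name": ["a", "b"]}).to_sql("items", engine, index=False)

query = sa.text("SELECT id, name FROM items ORDER BY id")

with engine.connect() as connection:
    result = connection.execute(query)
    # result.keys() holds the column names of the result set.
    column_names = list(result.keys())
    rows = [tuple(row) for row in result.fetchall()]

# Option 1: pass the names explicitly when building the DataFrame.
df = pd.DataFrame(rows, columns=column_names)

# Option 2: let pandas run the query and label the columns itself.
df_direct = pd.read_sql(query, engine)
```

Option 2 is usually the shorter route when the result is only needed as a DataFrame.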

How insert the dataframe output to mysql

import pymysql
import pandas as pd
db = pymysql.connect('localhost', 'testuser', 'test123', 'world')
df1 = pd.read_sql('select * from country limit 5', db)
df1
I need to create a table named country2 and write the contents of df1 to it.
Use Pandas to_sql (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html). This should work for you:
import pymysql  # driver used by the mysql+pymysql:// URL below
from sqlalchemy import create_engine
sql_table_name = 'country2'
engine = create_engine("mysql+pymysql://testuser:test123@localhost/world") # create engine
df1.to_sql(sql_table_name, engine) # add to table
Definitely check out SQLAlchemy. You can use it to write a MySQL interaction class, since it lets Python connect to the database. For an upsert, encode your dataframe into an upsert SQL string and then run it with cursor.execute(query_string). For a plain write, to_sql is enough:
import sqlalchemy
engine = sqlalchemy.create_engine(
'mysql+mysqlconnector://user:pwd@hostname/db_name',
connect_args={'auth_plugin': 'mysql_native_password'})
sample_sql_database = df.to_sql('table_name', con=engine)
There is also an option to "append" the contents of the data frame to an existing table or to "replace" the table:
sample_sql_database = df.to_sql('table_name', engine, if_exists='replace')
sample_sql_database = df.to_sql('table_name', engine, if_exists='append')
Reference :
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
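The effect of if_exists can be checked without a MySQL server. The sketch below uses an in-memory SQLite engine and a made-up two-row df1 as stand-ins (both are assumptions for the demo); the to_sql calls themselves are the same as they would be against MySQL.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the MySQL "world" database.
engine = create_engine("sqlite://")

# Made-up sample of what df1 might hold.
df1 = pd.DataFrame({"Code": ["ABW", "AFG"], "Name": ["Aruba", "Afghanistan"]})

count_query = text("SELECT COUNT(*) AS n FROM country2")

# First call creates the table (the default is if_exists='fail').
df1.to_sql("country2", engine, index=False)

# 'append' adds the rows to the existing table.
df1.to_sql("country2", engine, if_exists="append", index=False)
rows_after_append = int(pd.read_sql(count_query, engine)["n"].iloc[0])

# 'replace' drops the table and recreates it from the DataFrame.
df1.to_sql("country2", engine, if_exists="replace", index=False)
rows_after_replace = int(pd.read_sql(count_query, engine)["n"].iloc[0])

print(rows_after_append, rows_after_replace)
```

Passing index=False keeps the DataFrame index from being written as an extra column.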

Inserting to schema-specific table with python's odo

I'm using python's odo to move data from a pandas dataframe to a postgresql database. The goal is that each "user" sees their own data in their schema, but with an identical data model and table/view naming schema between "users". With normal SQL I can do this:
CREATE SCHEMA my_schema;
CREATE TABLE my_schema.my_table AS select 1;
My DB URI looks like this
db_uri = 'postgresql://localhost/postgres::my_schema.my_table'
This gives me tables in the default schema named "my_schema.my_table", including the '.' in the table name, instead of tables named "my_table" in the schema "my_schema".
I've tried different combinations based on this github issue, such as:
db_uri = 'postgresql://localhost/postgres.schema::tmp'
which gives me this Traceback
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: database "postgres/schema" does not exist
And also this one
db_uri = 'postgresql://localhost/postgres::my_schema/my_table'
which gives me tables named "my_schema/my_table".
Here's a sample code:
import pandas as pd
from odo import odo
db_uri = 'postgresql://localhost/postgres::my_schema.my_table'
odo(pd.DataFrame([{'a': 1}, {'a': 1}]), db_uri)
Hidden deep in a mailing list for blaze is a mention of the schema parameter
d = Data(resource('postgresql://localhost/db::t', schema='myschema'))
which can be used with odo with the following format:
from odo import odo, drop
drop(db_uri, schema='my_schema') # to drop table in specific schema
odo(data, db_uri, schema='my_schema')
Working code:
import pandas as pd
from odo import odo
db_uri = 'postgresql://localhost/postgres::my_table'
odo(pd.DataFrame([{'a': 1}, {'a': 1}]), db_uri, schema='my_schema')

How can I convert Sqlalchemy table object to Pandas DataFrame?

Is it possible to convert retrieved SqlAlchemy table object into Pandas DataFrame or do I need to write a particular function for that aim ?
This might not be the most efficient way, but it has worked for me to reflect a database table using automap_base and then convert it to a Pandas DataFrame.
import pandas as pd
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
connection_string = "your:db:connection:string:here"
engine = create_engine(connection_string, echo=False)
session = Session(engine)
# sqlalchemy: Reflect the tables
Base = automap_base()
Base.prepare(autoload_with=engine)  # SQLAlchemy 1.4+; older versions used Base.prepare(engine, reflect=True)
# Mapped classes are now created with names by default matching that of the table name.
Table_Name = Base.classes.table_name
# Example query with filtering
query = session.query(Table_Name).filter(Table_Name.language != 'english')
# Convert to DataFrame
df = pd.read_sql(query.statement, engine)
df.head()
I think I've tried this before. It's hacky, but for whole-table ORM query results, this should work:
import pandas as pd
cols = [c.name for c in SQLA_Table.__table__.columns]
pk = [c.name for c in SQLA_Table.__table__.primary_key]
# result_list is the list of ORM objects returned by the query on SQLA_Table
tuplefied_list = [tuple(getattr(item, col) for col in cols) for item in result_list]
df = pd.DataFrame.from_records(tuplefied_list, index=pk, columns=cols)
Partial query results (NamedTuples) will also work, but you have to construct the DataFrame columns and index to match your query.
Pandas database functions such as read_sql_query accept SQLAlchemy connection objects (so-called SQLAlchemy connectables, see pandas docs and sqlalchemy docs). Here is an example that uses such an object, called my_connection:
import pandas as pd
import sqlalchemy
# create SQLAlchemy Engine object instance
my_engine = sqlalchemy.create_engine(f"{dialect}+{driver}://{login}:{password}@{host}/{db_name}")
# connect to the database using the newly created Engine instance
my_connection = my_engine.connect()
# run SQL query
my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)
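A variation on the same idea, as a sketch rather than a definitive recipe: opening the connection in a with block closes it automatically, and wrapping the query in sqlalchemy.text() lets values be passed as bound parameters through params. The books table, its rows and the in-memory SQLite engine are assumptions made for the demo.

```python
import pandas as pd
import sqlalchemy

# In-memory SQLite with a made-up table (assumption for the demo).
my_engine = sqlalchemy.create_engine("sqlite://")
pd.DataFrame(
    {"title": ["a", "b", "c"], "language": ["english", "french", "german"]}
).to_sql("books", my_engine, index=False)

# :lang is a bound parameter; its value is supplied separately via params.
my_sql_query = sqlalchemy.text(
    "SELECT title, language FROM books WHERE language != :lang ORDER BY title"
)

# The connection is closed when the with block exits.
with my_engine.connect() as my_connection:
    my_df = pd.read_sql_query(
        sql=my_sql_query, con=my_connection, params={"lang": "english"}
    )

print(my_df)
```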
I have a simpler way:
# Step1: import
import pandas as pd
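from sqlalchemy import create_engine, inspect
# Step2: create_engine
connection_string = "sqlite:////absolute/path/to/database.db"
engine = create_engine(connection_string)
# Step3: select table
# engine.table_names() is deprecated in SQLAlchemy 1.4 and removed in 2.0; use the inspector instead
print(inspect(engine).get_table_names())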
# Step4: read table
table_df = pd.read_sql_table('table_name', engine)
table_df.head()
For other types of connection_string, see the SQLAlchemy 1.4 documentation.
