Is it possible to convert a retrieved SQLAlchemy table object into a Pandas DataFrame, or do I need to write a particular function for that?
This might not be the most efficient way, but it has worked for me to reflect a database table using automap_base and then convert it to a Pandas DataFrame.
import pandas as pd
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
connection_string = "your:db:connection:string:here"
engine = create_engine(connection_string, echo=False)
session = Session(engine)
# sqlalchemy: Reflect the tables
Base = automap_base()
Base.prepare(autoload_with=engine)  # Base.prepare(engine, reflect=True) is deprecated since SQLAlchemy 1.4
# Mapped classes are now created with names by default matching that of the table name.
Table_Name = Base.classes.table_name
# Example query with filtering
query = session.query(Table_Name).filter(Table_Name.language != 'english')
# Convert to DataFrame
df = pd.read_sql(query.statement, engine)
df.head()
I think I've tried this before. It's hacky, but for whole-table ORM query results, this should work:
import pandas as pd
cols = [c.name for c in SQLA_Table.__table__.columns]
pk = [c.name for c in SQLA_Table.__table__.primary_key]
tuplefied_list = [tuple(getattr(item, col) for col in cols) for item in result_list]
df = pd.DataFrame.from_records(tuplefied_list, index=pk, columns=cols)
Partial query results (NamedTuples) will also work, but you have to construct the DataFrame columns and index to match your query.
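For example (a sketch continuing the snippet above; the session object and the id/language columns are assumptions, not from the original answer), a two-column query result can be framed like this:
# Partial query: only two of the mapped columns
result_list = session.query(SQLA_Table.id, SQLA_Table.language).all()
cols = ['id', 'language']  # must match the selected columns
df = pd.DataFrame.from_records([tuple(row) for row in result_list], columns=cols)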
Pandas database functions such as read_sql_query accept SQLAlchemy connection objects (so-called SQLAlchemy connectables; see the pandas docs and the sqlalchemy docs). Here's an example using such an object, called my_connection:
import pandas as pd
import sqlalchemy
# create SQLAlchemy Engine object instance
my_engine = sqlalchemy.create_engine(f"{dialect}+{driver}://{login}:{password}@{host}/{db_name}")
# connect to the database using the newly created Engine instance
my_connection = my_engine.connect()
# run SQL query (my_sql_query is a query string defined elsewhere)
my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)
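Since pandas will not close a connection it did not open, a good habit (a suggestion, not part of the original answer) is to let the engine's context manager return the connection to the pool automatically:
with my_engine.connect() as my_connection:
    my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)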
I have a simpler way:
# Step1: import
import pandas as pd
from sqlalchemy import create_engine, inspect
# Step2: create_engine
connection_string = "sqlite:////absolute/path/to/database.db"
engine = create_engine(connection_string)
# Step3: select table
print(inspect(engine).get_table_names())  # engine.table_names() is deprecated since SQLAlchemy 1.4
# Step4: read table
table_df = pd.read_sql_table('table_name', engine)
table_df.head()
For other types of connection string, see the SQLAlchemy 1.4 documentation.
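For reference, a few common URL formats (all values are placeholders):
# PostgreSQL via psycopg2
# postgresql+psycopg2://user:password@localhost:5432/dbname
# MySQL via pymysql
# mysql+pymysql://user:password@localhost:3306/dbname
# SQLite, relative path (three slashes instead of four)
# sqlite:///relative/path/to/database.db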
I'm facing this new warning within some Python 3.9 code:
/usr/local/lib/python3.9/site-packages/pandas/io/sql.py:761:
UserWarning:
pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connection. Other DBAPI2
objects are not tested, please consider using SQLAlchemy
for this snippet:
import pandas as pd
from psycopg2 import sql
fields = ('object', 'category', 'number', 'mode')
query = sql.SQL("SELECT {} FROM categories;").format(
sql.SQL(', ').join(map(sql.Identifier, fields))
)
df = pd.read_sql(
sql=query,
con=connector() # custom function which returns db parameters as a psycopg2 connection object
)
It works like a charm for the moment but, given the warning message, I'd like to switch to SQLAlchemy.
But by doing so:
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://', creator=connector)
df = pd.read_sql(
sql=query,
con=engine
)
it says:
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object:
Composed([SQL('SELECT '), Composed([Identifier('object'), SQL(', '),
Identifier('category'), SQL(', '), Identifier('number'), SQL(', '),
Identifier('mode')]), SQL(' FROM categories;')])
So I have to tweak it this way to avoid this error:
engine = create_engine('postgresql+psycopg2://', creator=connector)
conn = connector()
curs = conn.cursor()
df = pd.read_sql(
sql=query.as_string(conn), # non-pythonic, isn't it?
con=engine
)
I'm wondering what the benefit of using an SQLAlchemy engine with pandas is if I still have to "decode" the query string using a psycopg2 connection context (and, in some specific cases where the query string is a binary string, additionally apply .decode('UTF-8')).
How can I rewrite the DataFrame construction properly (i.e. in the best way) using an SQLAlchemy engine with pandas?
The pandas doc is not 100% clear to me:
Parameters
sql : str or SQLAlchemy Selectable (select or text object)
    SQL query to be executed or a table name.
Version info:
python: 3.9
pandas: 1.4.3
sqlalchemy: 1.4.35
psycopg2: 2.9.3 (dt dec pq3 ext lo64)
The query can be expressed in SQLAlchemy syntax like this:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
# Adjust engine configuration to match your environment.
engine = sa.create_engine('postgresql+psycopg2:///test')
metadata = sa.MetaData()
# Reflect the table from the database.
tbl = sa.Table('categories', metadata, autoload_with=engine)
# Get column objects for column names.
columns = [tbl.c[name] for name in fields]
query = sa.select(*columns)
df = pd.read_sql(sql=query, con=engine)
print(df)
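If you'd rather skip the reflection round trip, a lightweight alternative (a sketch under the same assumptions about table and column names) is to build the statement from sa.table and sa.column constructs, which render safely quoted identifiers without touching the database:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
engine = sa.create_engine('postgresql+psycopg2:///test')
# Lightweight constructs: no reflection, identifiers are quoted by SQLAlchemy.
tbl = sa.table('categories', *[sa.column(name) for name in fields])
query = sa.select(tbl)
df = pd.read_sql(sql=query, con=engine)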
I cannot figure out how to use pandas.read_sql_query and correctly (i.e. safely against SQL injection) parametrize table names (or other SQL identifiers). I'm using sqlalchemy+psycopg2 to access a PostgreSQL database.
Example of what doesn't work:
import os
import pandas
from sqlalchemy import create_engine
db = create_engine(os.getenv('POSTGRES_CONNSTRING'))
pandas.read_sql_query(sql='select * from %(schema)s.%(table)s',
con = db,
params={'schema': 'public', 'table': 'table_name'})
Yields:
SyntaxError: syntax error at or near "'public'"
LINE 1: select * from 'public'.'table_name'
For psycopg2 the correct solution is described here.
from psycopg2 import sql
query = sql.SQL('select * from {schema}.{table}') \
    .format(schema=sql.Identifier('public'),
            table=sql.Identifier('table_name'))
But the query is now of type psycopg2.sql.Composed, which I can pass to the execute methods in psycopg2 but not to pandas.read_sql_query.
Is there any good solution to this?
You can use the as_string method to turn the Composed query into a string that you can pass to Pandas (docs).
import pandas as pd
import psycopg2.sql
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://user:pw@host:port/db')
cur = engine.raw_connection().cursor()
query = psycopg2.sql.SQL('select * from {schema}.{table}') \
.format(schema = psycopg2.sql.Identifier('public'),
table = psycopg2.sql.Identifier('table_name'))
query_string = query.as_string(cur)
pd.read_sql_query(query_string, engine)
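One caveat (a suggestion, not part of the original answer): engine.raw_connection() checks a connection out of the pool, so it is worth returning it explicitly once the query string has been rendered:
raw_conn = engine.raw_connection()
try:
    query_string = query.as_string(raw_conn.cursor())
finally:
    raw_conn.close()  # hands the connection back to the pool
df = pd.read_sql_query(query_string, engine)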
I am trying to import some data from a database (PostgreSQL) to work with it in Python. I tried the code below, which seems quite similar to the examples I've found on the internet.
import psycopg2
import sqlalchemy as db
import pandas as pd
engine = db.create_engine('database specifications')
connection = engine.connect()
metadata = db.MetaData()
data = db.Table('table_name', metadata, schema='schema_name', autoload=True, autoload_with=engine)
query = db.select([data])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
df = pd.DataFrame(ResultSet)
However, it returns data without column names. What did I forget?
It turned out the only thing needed was to add:
columns = data.columns.keys()
df.columns = columns
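Alternatively (a sketch reusing the query and connection objects from the question), you can let pandas pick the column names up from the SELECT itself and skip the manual fetch entirely:
# pandas reads the column names from the statement's result metadata
df = pd.read_sql(sql=query, con=connection)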
There is a great debate about that in this thread.
import pymysql
import pandas as pd
db = pymysql.connect(host='localhost', user='testuser', password='test123', database='world')
df1 = pd.read_sql('select * from country limit 5', db)
df1
I need to create a table named country2 and write df1 out to it.
Use Pandas to_sql (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html). This should work for you:
import pymysql
from sqlalchemy import create_engine
sql_table_name = 'country2'
engine = create_engine("mysql+pymysql://testuser:test123@localhost:3306/world")  # create engine
df1.to_sql(sql_table_name, engine) # add to table
Definitely check out SQLAlchemy. You can use it to write a MySQL interaction class; SQLAlchemy lets Python connect to the database. Encode your DataFrame into an upsert SQL string, then use cursor.execute(query_string) to perform the upsert.
import sqlalchemy
engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://user:pwd@hostname/db_name',
    connect_args={'auth_plugin': 'mysql_native_password'})
sample_sql_database = df.to_sql('table_name', con=engine)
There is also the option to "append" the contents of the data frame to an existing table, or to "replace" it:
sample_sql_database = df.to_sql('table_name', engine, if_exists='replace')
sample_sql_database = df.to_sql('table_name', engine, if_exists='append')
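One detail worth knowing (not covered in the answer above): by default to_sql also writes the DataFrame index as a column; pass index=False to omit it:
df.to_sql('table_name', engine, if_exists='append', index=False)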
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
I have an Oracle schema called "L_US" and a table called "entity". How do I store the contents of the "entity" table in a DataFrame? I tried:
import sqlalchemy
import pandas as pd
import cx_Oracle
dsnStr = cx_Oracle.makedsn('URL', '1521', 'SERVICE_NAME')
dsnStr = dsnStr.replace('SID', 'SERVICE_NAME')
connect_str = 'oracle://user:password@' + dsnStr
engine = sqlalchemy.create_engine(connect_str)
df = pd.read_sql_table("L_US.entity", engine)
However, I get an error that the table is not found.
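A likely cause (a hedged guess, not a verified answer for this exact database): pandas.read_sql_table does not accept a schema-qualified name in table_name; the schema belongs in the separate schema parameter. Note that SQLAlchemy's Oracle dialect treats lowercase names as case-insensitive, so lowercase may be needed here:
# schema goes in its own parameter, not in the table name
df = pd.read_sql_table('entity', engine, schema='l_us')  # try 'L_US' if lowercase fails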