I'm facing this new warning within some Python 3.9 code:
/usr/local/lib/python3.9/site-packages/pandas/io/sql.py:761:
UserWarning:
pandas only support SQLAlchemy connectable(engine/connection) or
database string URI or sqlite3 DBAPI2 connection other DBAPI2
objects are not tested, please consider using SQLAlchemy
It is raised by this snippet:
import pandas as pd
from psycopg2 import sql
fields = ('object', 'category', 'number', 'mode')
query = sql.SQL("SELECT {} FROM categories;").format(
sql.SQL(', ').join(map(sql.Identifier, fields))
)
df = pd.read_sql(
sql=query,
con=connector()  # custom function returning a psycopg2 connection object
)
It works like a charm for the moment, but following the warning's advice, I'd like to switch to SQLAlchemy.
But by doing so:
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://', creator=connector)
df = pd.read_sql(
sql=query,
con=engine
)
it says:
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object:
Composed([SQL('SELECT '), Composed([Identifier('object'), SQL(', '),
Identifier('category'), SQL(', '), Identifier('number'), SQL(', '),
Identifier('mode')]), SQL(' FROM categories;')])
So I have to tweak it this way to avoid this error:
engine = create_engine('postgresql+psycopg2://', creator=connector)
conn = connector()
df = pd.read_sql(
sql=query.as_string(conn), # non-pythonic, isn't it?
con=engine
)
I'm wondering what the benefit of using an SQLAlchemy engine with pandas is if I still have to "decode" the query string through a psycopg2 connection (and in some specific cases, where the query string is a binary string, I also have to decode it by applying .decode('UTF-8')...).
How can I rewrite the DataFrame construction in a proper (i.e. the best) way by using an SQLAlchemy engine with pandas?
The pandas doc is not 100% clear for me:
Parameters
sql : str or SQLAlchemy Selectable (select or text object)
    SQL query to be executed or a table name.
Version info:
python: 3.9
pandas: '1.4.3'
sqlalchemy: '1.4.35'
psycopg2: '2.9.3 (dt dec pq3 ext lo64)'
The query can be expressed in SQLAlchemy syntax like this:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
# Adjust engine configuration to match your environment.
engine = sa.create_engine('postgresql+psycopg2:///test')
metadata = sa.MetaData()
# Reflect the table from the database.
tbl = sa.Table('categories', metadata, autoload_with=engine)
# Get column objects for column names.
columns = [tbl.c[name] for name in fields]
query = sa.select(*columns)
df = pd.read_sql(sql=query, con=engine)
print(df)
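If the reflection round trip to the database is undesirable, the same query can also be built from SQLAlchemy's lightweight table()/column() constructs. A minimal sketch, assuming the same categories table and engine as above:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
engine = sa.create_engine('postgresql+psycopg2:///test')
# Build a lightweight table construct; no database round trip needed.
tbl = sa.table('categories', *(sa.column(name) for name in fields))
query = sa.select(*tbl.columns)
df = pd.read_sql(sql=query, con=engine)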
Related
When inserting data using psycopg2, I can retrieve the id of the inserted row using PostgreSQL's RETURNING clause:
import psycopg2
conn = my_connection_parameters()
curs = conn.cursor()
sql_insert_data_query = (
"""INSERT INTO public.data
(created_by, comment)
VALUES ( %(user)s, %(comment)s )
RETURNING id; -- the id is automatically managed by the database.
"""
)
curs.execute(
sql_insert_data_query,
{
"user": 'me',
"comment": 'my comment'
}
)
data_id = curs.fetchone()[0]
conn.commit()
and that's great because I need this id to write other data to, e.g. an associative table.
But when having a large dictionary to write to PostgreSQL (whose keys are the column identifiers), it's more convenient to rely on pandas' DataFrame.to_sql() method:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql+psycopg2://', creator=my_connection_parameters)
df = pd.DataFrame(my_dict, index=[0]) # this is a "one-row" DataFrame, each column being created from the dict keys
df.to_sql(
name='user_table',
con=engine,
schema='public',
if_exists='append',
index=False
)
but there is no direct way to retrieve the id PostgreSQL has created when this record was actually inserted.
Is there a nice and reliable workaround to get it?
Or should I stick with psycopg2 to write my large dictionary using a SQL query?
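One possible workaround, sketched under the assumption of SQLAlchemy 1.4 and the public.data table from above (my_dict and my_connection_parameters as in the question): skip to_sql() for this one-row insert and build it with SQLAlchemy Core, whose returning() clause exposes the generated id:
import sqlalchemy as sa
engine = sa.create_engine('postgresql+psycopg2://', creator=my_connection_parameters)
metadata = sa.MetaData()
# Reflect the target table so its columns match the dict keys.
data_tbl = sa.Table('data', metadata, schema='public', autoload_with=engine)
stmt = sa.insert(data_tbl).values(**my_dict).returning(data_tbl.c.id)
with engine.begin() as conn:  # commits on success
    data_id = conn.execute(stmt).scalar_one()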
I am trying to import some data from the database (PostgreSQL) to work with it in Python. I tried the code below, which seems quite similar to the ones I've found on the internet.
import psycopg2
import sqlalchemy as db
import pandas as pd
engine = db.create_engine('database specifications')
connection = engine.connect()
metadata = db.MetaData()
data = db.Table(tabela, metadata, schema=shema, autoload=True, autoload_with=engine)
query = db.select([data])
ResultProxy = connection.execute(query)
ResultSet = ResultProxy.fetchall()
df = pd.DataFrame(ResultSet)
However, it returns data without column names. What did I forget?
It turned out the only thing needed was adding
columns = data.columns.keys()
df.columns = columns
There is a great debate about that in this thread.
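Two equivalent shortcuts, sketched with the same objects as above: pass the column names when constructing the DataFrame, or let pandas execute the select and label the columns itself:
# Name the columns at construction time...
df = pd.DataFrame(ResultSet, columns=ResultProxy.keys())
# ...or let pandas run the query and label the columns automatically.
df = pd.read_sql(query, connection)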
import pymysql
import pandas as pd
db = pymysql.connect(host='localhost', user='testuser', password='test123', database='world')
df1 = pd.read_sql('select * from country limit 5', db)
df1
I need to create a new table named country2 and write df1 out to country2.
Use Pandas to_sql (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html). This should work for you:
import pymysql
from sqlalchemy import create_engine
sql_table_name = 'country2'
engine = create_engine("mysql+pymysql://testuser:test123@localhost:3306/world")  # create engine
df1.to_sql(sql_table_name, engine)  # write df1 to the new table
Definitely check out SQLAlchemy. You can use SQLAlchemy to write a MySQL interaction class: it lets Python connect to the database, encode your DataFrame into an upsert SQL string, and then run cursor.execute(query_string) to perform the upsert.
engine = sqlalchemy.create_engine(
'mysql+mysqlconnector://user:pwd@hostname/db_name',
connect_args={'auth_plugin': 'mysql_native_password'})
df.to_sql('table_name', con=engine)
There is also an option to "append" the contents of the DataFrame to the table, or to "replace" it:
df.to_sql('table_name', engine, if_exists='replace')
df.to_sql('table_name', engine, if_exists='append')
Reference:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html
I just upgraded to Pandas 0.24.0 from 0.23.4 (Python 2.7.12), and many of my pd.read_sql queries are breaking. It looks like something related to MySQL, but it's strange that these errors only occur after updating my pandas version. Any ideas what's going on?
Here's my MySQL table:
CREATE TABLE `xlations_topic_update_status` (
`run_ts` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Here's my query:
import pandas as pd
from sqlalchemy import create_engine
db_engine = create_engine('mysql+mysqldb://<><>/product_analytics', echo=False)
pd.read_sql('select max(run_ts) from product_analytics.xlations_topic_update_status', con = db_engine).values[0][0]
And here's the error:
OperationalError: (_mysql_exceptions.OperationalError) (1059, "Identifier name 'select max(run_ts) from product_analytics.xlations_topic_update_status;' is too long") [SQL: 'DESCRIBE `select max(run_ts) from product_analytics.xlations_topic_update_status;`']
I've also gotten this for other more complex queries, but won't post them here.
According to the documentation, the first argument is either a string (a table name) or an SQLAlchemy Selectable (select or text object). In other words, pd.read_sql() is delegating to pd.read_sql_table() and treating the entire query string as a table identifier.
Wrap your query string in a text() construct first:
from sqlalchemy import text
stmt = text('select max(run_ts) from product_analytics.xlations_topic_update_status')
pd.read_sql(stmt, con = db_engine).values[0][0]
This way pd.read_sql() will delegate to pd.read_sql_query() instead. Another option is to call pd.read_sql_query() directly.
Try using pd.read_sql_query(sql, con), instead of pd.read_sql(...).
So:
pd.read_sql_query('select max(run_ts) from product_analytics.xlations_topic_update_status', con = db_engine).values[0][0]
Is it possible to convert a retrieved SQLAlchemy table object into a Pandas DataFrame, or do I need to write a particular function for that?
This might not be the most efficient way, but it has worked for me to reflect a database table using automap_base and then convert it to a Pandas DataFrame.
import pandas as pd
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
connection_string = "your:db:connection:string:here"
engine = create_engine(connection_string, echo=False)
session = Session(engine)
# sqlalchemy: Reflect the tables
Base = automap_base()
Base.prepare(engine, reflect=True)
# Mapped classes are now created with names by default matching that of the table name.
Table_Name = Base.classes.table_name
# Example query with filtering
query = session.query(Table_Name).filter(Table_Name.language != 'english')
# Convert to DataFrame
df = pd.read_sql(query.statement, engine)
df.head()
I think I've tried this before. It's hacky, but for whole-table ORM query results, this should work:
import pandas as pd
cols = [c.name for c in SQLA_Table.__table__.columns]
pk = [c.name for c in SQLA_Table.__table__.primary_key]
tuplefied_list = [tuple(getattr(item, col) for col in cols) for item in result_list]
df = pd.DataFrame.from_records(tuplefied_list, index=pk, columns=cols)
Partial query results (NamedTuples) will also work, but you have to construct the DataFrame columns and index to match your query.
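For instance, a minimal sketch with a hypothetical partial query (id and name are made-up column attributes here):
# Hypothetical partial query returning NamedTuples of two columns.
result_list = session.query(SQLA_Table.id, SQLA_Table.name).all()
df = pd.DataFrame.from_records(result_list, columns=['id', 'name'])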
Pandas database functions such as read_sql_query accept SQLAlchemy connection objects (so-called SQLAlchemy connectables; see the pandas and SQLAlchemy docs). Here's one example of using such an object, called my_connection:
import pandas as pd
import sqlalchemy
# create SQLAlchemy Engine object instance
my_engine = sqlalchemy.create_engine(f"{dialect}+{driver}://{login}:{password}@{host}/{db_name}")
# connect to the database using the newly created Engine instance
my_connection = my_engine.connect()
# run SQL query
my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)
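Optionally, the connection can be opened in a context manager so it is released once the read finishes; same my_engine and my_sql_query as above:
# The connection is closed automatically when the block exits.
with my_engine.connect() as my_connection:
    my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)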
I have a simpler way:
# Step1: import
import pandas as pd
from sqlalchemy import create_engine, inspect
# Step2: create_engine
connection_string = "sqlite:////absolute/path/to/database.db"
engine = create_engine(connection_string)
# Step3: list the available tables
print(inspect(engine).get_table_names())
# Step4: read table
table_df = pd.read_sql_table('table_name', engine)
table_df.head()
For other types of connection string, see the SQLAlchemy 1.4 documentation.