Parametric table name for pandas read_sql_query on Postgres - python

I can't figure out how to use pandas.read_sql_query and parametrize table names (or other SQL identifiers) correctly, i.e. safely against SQL injection. I'm using sqlalchemy+psycopg2 to access a PostgreSQL database.
Example of what doesn't work:
import os
import pandas
from sqlalchemy import create_engine

db = create_engine(os.getenv('POSTGRES_CONNSTRING'))
pandas.read_sql_query(sql='select * from %(schema)s.%(table)s',
                      con=db,
                      params={'schema': 'public', 'table': 'table_name'})
Yields:
SyntaxError: syntax error at or near "'public'"
LINE 1: select * from 'public'.'table_name'
For psycopg2, the correct solution is described here.
import psycopg2.sql

query = psycopg2.sql.SQL('select * from {schema}.{table}') \
    .format(schema=psycopg2.sql.Identifier('public'),
            table=psycopg2.sql.Identifier('table_name'))
But the query is now of type psycopg2.sql.Composed, which I can pass to the execute methods in psycopg2 but not to pandas.read_sql_query.
Is there any good solution to this?

You can use the as_string method to turn the Composed query into a string that you can pass to Pandas (docs).
import pandas as pd
import psycopg2.sql
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://user:pw@host:port/db')
cur = engine.raw_connection().cursor()
query = psycopg2.sql.SQL('select * from {schema}.{table}') \
    .format(schema=psycopg2.sql.Identifier('public'),
            table=psycopg2.sql.Identifier('table_name'))
query_string = query.as_string(cur)
pd.read_sql_query(query_string, engine)
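If you'd rather stay inside SQLAlchemy entirely, another option is to quote the names with the dialect's identifier preparer before formatting them into the query string. A minimal sketch, assuming the same POSTGRES_CONNSTRING environment variable as above (note that identifier_preparer is only semi-public API, so treat this as a sketch rather than a guaranteed interface):
import os
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(os.getenv('POSTGRES_CONNSTRING'))

# The dialect's preparer applies PostgreSQL's own identifier-quoting rules,
# so the names are safe to interpolate as identifiers.
preparer = engine.dialect.identifier_preparer
schema = preparer.quote('public')
table = preparer.quote('table_name')

df = pd.read_sql_query(f'select * from {schema}.{table}', engine)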


Passing a non-executable SQL object to pandas read_sql() method

I'm facing this new warning within some Python 3.9 code:
/usr/local/lib/python3.9/site-packages/pandas/io/sql.py:761: UserWarning:
pandas only supports SQLAlchemy connectable (engine/connection) or
database string URI or sqlite3 DBAPI2 connection. Other DBAPI2
objects are not tested, please consider using SQLAlchemy
for this snippet:
import pandas as pd
from psycopg2 import sql

fields = ('object', 'category', 'number', 'mode')
query = sql.SQL("SELECT {} FROM categories;").format(
    sql.SQL(', ').join(map(sql.Identifier, fields))
)
df = pd.read_sql(
    sql=query,
    con=connector()  # custom function which returns db parameters as a psycopg2 connection object
)
It works like a charm for the moment, but according to the warning message, I'd like to switch to SQLAlchemy.
But by doing so:
from sqlalchemy import create_engine

engine = create_engine('postgresql+psycopg2://', creator=connector)
df = pd.read_sql(
    sql=query,
    con=engine
)
it says:
sqlalchemy.exc.ObjectNotExecutableError: Not an executable object:
Composed([SQL('SELECT '), Composed([Identifier('object'), SQL(', '),
Identifier('category'), SQL(', '), Identifier('number'), SQL(', '),
Identifier('mode')]), SQL(' FROM categories;')])
So I have to tweak it this way to avoid this error:
engine = create_engine('postgresql+psycopg2://', creator=connector)
conn = connector()
curs = conn.cursor()
df = pd.read_sql(
sql=query.as_string(conn), # non-pythonic, isn't it?
con=engine
)
I'm wondering what the benefit of using an SQLAlchemy engine with pandas is if I still have to "decode" the query string using a psycopg2 connection context... (and in some specific cases, where the query string is a binary string, I have to decode it by applying .decode('UTF-8')...)
How can I rewrite the DataFrame construction in a proper (i.e. the best) way by using an SQLAlchemy engine with pandas?
The pandas doc is not 100% clear for me:
Parameters
sql : str or SQLAlchemy Selectable (select or text object)
    SQL query to be executed or a table name.
Version info:
python: 3.9
pandas: 1.4.3
sqlalchemy: 1.4.35
psycopg2: 2.9.3 (dt dec pq3 ext lo64)
The query can be expressed in SQLAlchemy syntax like this:
import pandas as pd
import sqlalchemy as sa
fields = ('object', 'category', 'number', 'mode')
# Adjust engine configuration to match your environment.
engine = sa.create_engine('postgresql+psycopg2:///test')
metadata = sa.MetaData()
# Reflect the table from the database.
tbl = sa.Table('categories', metadata, autoload_with=engine)
# Get column objects for column names.
columns = [tbl.c[name] for name in fields]
query = sa.select(*columns)
df = pd.read_sql(sql=query, con=engine)
print(df)
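If you'd rather avoid the reflection round trip to the database, a lighter-weight sketch of the same query uses sa.table() and sa.column(), which build the statement from names alone (they don't verify that the table or columns actually exist):
import pandas as pd
import sqlalchemy as sa

fields = ('object', 'category', 'number', 'mode')
# Adjust engine configuration to match your environment.
engine = sa.create_engine('postgresql+psycopg2:///test')

# table()/column() are lightweight constructs: no MetaData, no reflection,
# but the identifiers are still rendered with proper quoting.
query = sa.select(*[sa.column(name) for name in fields]) \
          .select_from(sa.table('categories'))

df = pd.read_sql(sql=query, con=engine)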

Unable to create data frame from MS Access query results

I'm trying to insert information from an MS Access database (MDB file); unfortunately, I don't know how to delimit the columns of the database table with Python.
I'm getting the error
ValueError: Shape of passed values is (109861, 1), indices imply (3,1)
and the code I'm using is:
import os
import shutil
import pyodbc
import pandas as pd
import csv
from datetime import datetime

conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\\Users\\sguerra\\Desktop\\Python\\Measurements-2020-12-15.mdb;')
cursor = conn.cursor()
cursor.execute('select * from Measurements')
new = cursor.fetchall()
columns = ['Prod_Date', 'Prod_Time', 'CCE_SKU']
df = pd.DataFrame(new, columns)
for row in df.itertuples():
    cursor.execute('''
        insert into MITSF_1.dbo.MeasurementsTest ([Prod_Date],[Prod_Time],[CCE_SKU])
        VALUES (?,?,?)
        ''',
        row.Prod_Date,
        row.Prod_Time,
        row.CCE_SKU
    )
conn.commit()
You are using the same cursor to try and execute both the select and the insert, so both of those statements would be operating on the same database. To keep things simple, you should use pandas' read_sql_query() to read the required columns from Access and then use to_sql() to write them to SQL Server:
df = pd.read_sql_query(
    "SELECT [Prod_Date],[Prod_Time],[CCE_SKU] FROM Measurements",
    conn,
)

from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://scott:tiger@192.168.0.199/MITSF_1"
    "?driver=ODBC+Driver+17+for+SQL+Server",
    fast_executemany=True,
)
df.to_sql("MeasurementsTest", engine, schema="dbo",
          index=False, if_exists="append",
)
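If the Access table is large, the same transfer can be streamed in chunks instead of loading everything into memory at once. A sketch of that variant; the chunk size of 10000 is an arbitrary assumption:
for chunk in pd.read_sql_query(
        "SELECT [Prod_Date],[Prod_Time],[CCE_SKU] FROM Measurements",
        conn, chunksize=10000):
    # Each chunk is a DataFrame of up to 10000 rows.
    chunk.to_sql("MeasurementsTest", engine, schema="dbo",
                 index=False, if_exists="append")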

SQL Alchemy NameError: name <table> is not defined

I wasn't sure what to call this, feel free to edit my post title.
Before I begin, I googled and looked here, but this didn't seem to help me.
My code:
import pyodbc
import pandas as pd
import numpy as np
import os
import sqlalchemy as sal
from sqlalchemy import create_engine
from sqlalchemy import MetaData
from sqlalchemy import Table, Column, Integer, Numeric, String, ForeignKey, Boolean
##
from datetime import datetime
from sqlalchemy import DateTime
from sqlalchemy import PrimaryKeyConstraint, UniqueConstraint, CheckConstraint
from sqlalchemy import Index
from sqlalchemy import ForeignKeyConstraint
from sqlalchemy import insert
from sqlalchemy.sql import select
from sqlalchemy.sql import func
from sqlalchemy import cast
from sqlalchemy import and_, or_, not_
from sqlalchemy import update, delete
from sqlalchemy import text
##
import urllib
#############################################################
server = 'fake_server'
database = '_Testing_Only'
driver = 'SQL+SERVER+NATIVE+CLIENT+11.0'
trusted_connection='yes'
database_connection = 'mssql+pyodbc://fake_server/' + database + '?trusted_connection=' + trusted_connection + '&driver=' + driver
engine = sal.create_engine(database_connection)
connection=engine.connect()
metadata = MetaData()
print(engine.table_names())
Here is the result of my print statement:
['cookies', 'line_items', 'orders', 'testing_sym_keys', 'users']
I then tried to run this code:
s = select([cookies])
I got the following error message:
Traceback (most recent call last):
File "<pyshell#167>", line 1, in <module>
s = select([cookies])
NameError: name 'cookies' is not defined
The table clearly exists, why am I getting the error message?
The issue is that the tables the engine can see are not automatically bound to variables in your Python code; cookies exists as a table name in the database, but not as a name in your namespace.
Try running a native SQL query of the form:
engine = create_engine(database_connection)
metadata = MetaData(engine)
metadata.reflect()
with engine.begin() as conn:
    conn.execute(text("select * from cookies"))
Alternatively, if you want to use the select() method, you can try this:
engine = create_engine(database_connection)
meta = MetaData(engine)
meta.reflect()  # reflect() populates meta.tables in place and returns None
table = meta.tables['cookies']

# select * from 'cookies'
select_cookies = select([table])
Just found a fast solution for this: create a Table object first for your desired table.
engine = create_engine(database_connection)
metadata = MetaData()
cookies = Table('cookies', metadata, autoload=True, autoload_with=engine)
Every time you want to query a table, run the above for that table first so that it is initialized in Python.
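Once the Table object is bound, the original select works and can feed pandas directly. A short sketch reusing the engine and imports from the question:
# cookies is now a Table object, so select() accepts it.
s = select([cookies])
df = pd.read_sql(s, engine)
print(df.head())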

Python mysql connector with multiple statements

I'm trying to run a SQL query through mysql.connector that requires a SET command in order to query a specific table:
import mysql.connector
import pandas as pd

cnx = mysql.connector.connect(host=ip,
                              port=port,
                              user=user,
                              passwd=pwd,
                              database="")
sql = """SET variable='Test';
SELECT * FROM table"""
df = pd.read_sql(sql, cnx)
When I run this, I get the error "Use multi=True when executing multiple statements". But where do I put multi=True?
Passing the parameters as a dictionary to the params argument should do the trick; documentation here:
pd.read_sql(sql, cnx, params={'multi': True})
The parameters are passed to the underlying database driver.
After many hours of experimenting, I figured out how to do this. Forgive me if this is not the most succinct way, but it's the best I could come up with:
import mysql.connector
import pandas as pd

cnx = mysql.connector.connect(host=ip,
                              port=port,
                              user=user,
                              passwd=pwd,
                              database="")
sql1 = "SET variable='Test';"
sql2 = """SELECT * FROM table"""
cursor = cnx.cursor()
cursor.execute(sql1)
cursor.close()
df = pd.read_sql(sql2, cnx)
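For completeness, a sketch of where multi=True itself lives: it is an argument to the driver's cursor.execute(), not to read_sql, so using it means bypassing pandas' reader and building the DataFrame by hand:
cursor = cnx.cursor()
# With multi=True, execute() returns an iterator with one entry
# per statement in the multi-statement string.
for result in cursor.execute(sql, multi=True):
    if result.with_rows:  # only the SELECT produces rows
        df = pd.DataFrame(result.fetchall(), columns=result.column_names)
cursor.close()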

How can I convert Sqlalchemy table object to Pandas DataFrame?

Is it possible to convert a retrieved SQLAlchemy table object into a Pandas DataFrame, or do I need to write a particular function for that?
This might not be the most efficient way, but it has worked for me to reflect a database table using automap_base and then convert it to a Pandas DataFrame.
import pandas as pd
from sqlalchemy.ext.automap import automap_base
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
connection_string = "your:db:connection:string:here"
engine = create_engine(connection_string, echo=False)
session = Session(engine)
# sqlalchemy: Reflect the tables
Base = automap_base()
Base.prepare(engine, reflect=True)
# Mapped classes are now created with names by default matching that of the table name.
Table_Name = Base.classes.table_name
# Example query with filtering
query = session.query(Table_Name).filter(Table_Name.language != 'english')
# Convert to DataFrame
df = pd.read_sql(query.statement, engine)
df.head()
I think I've tried this before. It's hacky, but for whole-table ORM query results, this should work:
import pandas as pd
cols = [c.name for c in SQLA_Table.__table__.columns]
pk = [c.name for c in SQLA_Table.__table__.primary_key]
tuplefied_list = [tuple(getattr(item, col) for col in cols) for item in result_list]
df = pd.DataFrame.from_records(tuplefied_list, index=pk, columns=cols)
Partial query results (NamedTuples) will also work, but you have to construct the DataFrame columns and index to match your query.
Pandas database functions such as read_sql_query accept SQLAlchemy connection objects (so-called SQLAlchemy connectables; see the pandas docs and sqlalchemy docs). Here's one example of using such an object, called my_connection:
import pandas as pd
import sqlalchemy
# create SQLAlchemy Engine object instance
my_engine = sqlalchemy.create_engine(f"{dialect}+{driver}://{login}:{password}@{host}/{db_name}")
# connect to the database using the newly created Engine instance
my_connection = my_engine.connect()
# run SQL query
my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)
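A variant of the same idea uses a context manager, so the connection is released back to the pool automatically when the block exits:
# Same engine as above; the connection closes itself on exit.
with my_engine.connect() as my_connection:
    my_df = pd.read_sql_query(sql=my_sql_query, con=my_connection)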
I have a simpler way:
# Step1: import
import pandas as pd
from sqlalchemy import create_engine
# Step2: create_engine
connection_string = "sqlite:////absolute/path/to/database.db"
engine = create_engine(connection_string)
# Step3: list tables
print(engine.table_names())
# Step4: read table
table_df = pd.read_sql_table('table_name', engine)
table_df.head()
For other types of connection string, see the SQLAlchemy 1.4 documentation.
