I need to automatically escape table names for any sqlalchemy engine (I want to expand a library I made to databases other than postgres; see the Details section at the end of my post).
It is possible with columns like this:
from sqlalchemy.sql import column
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
escaped_column = column('%"my_column123?').compile(dialect=engine.dialect)
str(escaped_column)
'"%""my_column123?"'
I (naively) tried the following but it does not work (gives back an empty string):
from sqlalchemy.sql import table
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
escaped_table_name = table("%table?&;").compile(dialect=engine.dialect)
str(escaped_table_name)
''
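One workaround I found goes through the dialect's identifier preparer directly (a minimal sketch; I assume this is the same quoting machinery that compile() uses for columns):
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
# quote an arbitrary table name through the dialect's identifier preparer
preparer = engine.dialect.identifier_preparer
preparer.quote("%table?&;")
'"%table?&;"'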
Thanks in advance!
Details
I made a library to update PostgreSQL tables using pandas DataFrames (see https://github.com/ThibTrip/pangres) and realized that a part of the code is not SQL injection safe (if you are curious, here is the part I am talking about: https://github.com/ThibTrip/pangres/blob/7cfa2d2190cf65a1ede8ef30868883f0da3fa5fc/pangres/helpers.py#L270-L290).
I found a way to add columns in a "sqlalchemy way" (adding columns to any table was the reason why I wanted to escape table names). Unfortunately it is not perfect, as it creates a table "alembic_version" for some reason:
# alembic is a library from the creator of sqlalchemy to migrate databases
from alembic.runtime.migration import MigrationContext # pip install alembic
from alembic.operations import Operations
from sqlalchemy import Column, TEXT, create_engine # pip install sqlalchemy
# create engine
engine = create_engine('sqlite:///test.db')
# add column "some_new_column" of type TEXT in the table 'test'
with engine.connect() as con:
    ctx = MigrationContext.configure(con)
    op = Operations(ctx)
    op.add_column('test', column=Column('some_new_column', TEXT))
EDIT: it seems the table "alembic_version" had been added somehow by previous tests, as I could not reproduce this behavior after dropping it. So this solution seems good :)!
Related
I'm not able to connect to temporary tables created on a SQL Server database using SQLAlchemy.
I connect to the server:
engine = create_engine(URL, poolclass=StaticPool)
I fill a temporary table with data from a pandas dataframe:
df_tmp.to_sql('#table_test', con=engine)
The table exists on the server:
res = engine.execute('SELECT * FROM tempdb..#table_test')
print(res)
which returns a list of tuples of my data. But then when I try to make an SQLAlchemy table it fails with a NoSuchTableError:
from sqlalchemy import create_engine, MetaData, Table
metadata = MetaData(engine)
metadata.create_all()
table = Table('#table_test', metadata, autoload=True, autoload_with=engine)
I also tried this, which gives the same error:
table = Table('tempdb..#table_test', metadata, autoload=True, autoload_with=engine)
And I also tried creating a blank table with an SQL command, which gives the same error when I try to read it with SQLAlchemy:
engine.execute('CREATE TABLE #table_test (id_number INT, name TEXT)')
Does SQLAlchemy support temporary tables? If so what is going wrong here? I'd like to have the temporary table as an sqlalchemy.schema.Table object if possible, as then it fits with all my other code.
(re: comments to the question)
Actually, it is a limitation of the current mechanism by which SQLAlchemy's mssql dialect checks for the existence of a table. It queries INFORMATION_SCHEMA.TABLES for the current catalog (database), and #temp tables do not appear in that view. They do appear — after a fashion, and in a not-particularly-helpful way — if we USE tempdb and then query INFORMATION_SCHEMA.TABLES from there.
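In the meantime, one possible workaround is to check for the temp table yourself rather than relying on SQLAlchemy's table reflection. A minimal sketch using T-SQL's OBJECT_ID, reusing the engine from the question:
# OBJECT_ID resolves tempdb..# names even though INFORMATION_SCHEMA.TABLES
# in the current database does not list them
result = engine.execute("SELECT OBJECT_ID('tempdb..#table_test')").scalar()
temp_table_exists = result is not None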
For now, I have created a GitHub issue here to see if we can improve on this.
Update 2020-09-01
The changes for the above GitHub issue have been merged into SQLAlchemy's master branch and will be included in version 1.4. If you want to take advantage of this feature before 1.4 is officially released you can install SQLAlchemy via
pip install git+https://github.com/sqlalchemy/sqlalchemy.git
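With that installed, reflecting the temp table as in the original question should work. A minimal sketch, assuming #table_test still exists on the connection (note that 1.4 prefers autoload_with over the older autoload=True flag):
from sqlalchemy import MetaData, Table
metadata = MetaData()
# reflect the temp table; passing autoload_with implies autoload
table = Table('#table_test', metadata, autoload_with=engine)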
While programming in Python and working with SQL databases I previously used the mysql.connector library, but I found that it takes a lot of lines of code to get your data into a SQL table, because you need to write the whole SQL query column by column.
On the other hand, when using pandas there are easy methods that work with the SQLAlchemy library:
- pd.to_sql
- pd.read_sql_table
(Well, I got errors while using a mysql.connector db.cursor() and couldn't find any tutorial besides SQLAlchemy + pandas.)
These two methods let you easily get a dataframe from a SQL table and create a SQL table from a dataframe.
I wonder if there is an analogue in mysql.connector to easily convert a dataframe to a SQL table and vice versa, since I find this library's syntax more convenient for everything other than SQL.
P.S. The MySQL initialization code is included just for information and is not used in the provided code, though I need to somehow find an analogue for it.
# ------------------- IMPORT -------------------------
import mysql.connector
import pandas as pd
import sqlalchemy
# ------------------- MYSQL + SQL Alchemy -------------------------
mydb = mysql.connector.connect(
    host='host',
    user='user',
    passwd='pass',
    database='db'
)
mycursor = mydb.cursor()
engine = sqlalchemy.create_engine('mysql+mysqldb://user:pass@host/db', pool_recycle=3600)
# ------------------- FUNCTIONS ----------------------
def get_function():
    df = pd.read_html("https://www.url.com")
    df[3].to_sql(name=table, con=engine, index=False, if_exists='replace')
# -------------------- MAIN --------------------------
table = 'table_name'
get_function()
print(pd.read_sql_table(table, engine, columns=[]))
Your question isn't very clear, so I'm not completely sure what you're trying to do, but I'll try my best to answer.
So SQLAlchemy is a so-called object-relational mapper (like Hibernate in the Java world), which maps between relations (columns, rows, tables) and objects.
Pandas is a data analysis library, that can use SQLAlchemy. SQLAlchemy itself supports a wide range of Databases, including MySQL.
Now I didn't understand whether you'd like to use Pandas + SQLAlchemy + MySQL, or whether you just want a simple way to work with MySQL directly.
In the first case you can simply use Pandas; in the latter case you can use SQLAlchemy directly. Pandas provides documentation, and so does SQLAlchemy.
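If you would rather stay on mysql.connector without SQLAlchemy, you can approximate DataFrame.to_sql with cursor.executemany. A rough sketch, assuming the target table already exists with columns matching the dataframe (df_to_mysql is a hypothetical helper name):
import pandas as pd
import mysql.connector

def df_to_mysql(df, table_name, conn):
    # build one parameterized INSERT and send all rows via a single executemany
    cols = ', '.join('`{}`'.format(c) for c in df.columns)
    placeholders = ', '.join(['%s'] * len(df.columns))
    sql = 'INSERT INTO `{}` ({}) VALUES ({})'.format(table_name, cols, placeholders)
    cursor = conn.cursor()
    cursor.executemany(sql, [tuple(row) for row in df.itertuples(index=False)])
    conn.commit()
There is no built-in equivalent of read_sql_table in mysql.connector, so for reading, pandas plus SQLAlchemy remains the shortest path.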
I'm trying to make a specific insert statement that has an ON CONFLICT clause (I'm uploading to a Postgres database); will df.to_sql(method='callable') allow that? Or is it intended for another purpose? I've read through the documentation, but I wasn't able to grasp the concept. I looked around on this website and others for similar questions, but I haven't found one yet. If possible I would love to see an example of how to use the 'callable' method in practice. Any other ideas on how to effectively load large numbers of rows from pandas using ON CONFLICT logic would be much appreciated as well. Thanks in advance for the help!
Here's an example on how to use postgres's ON CONFLICT DO NOTHING with to_sql
# import postgres specific insert
from sqlalchemy.dialects.postgresql import insert
def to_sql_on_conflict_do_nothing(pd_table, conn, keys, data_iter):
    # This is very similar to the default to_sql function in pandas;
    # only the conn.execute line is changed
    data = [dict(zip(keys, row)) for row in data_iter]
    conn.execute(insert(pd_table.table).on_conflict_do_nothing(), data)
conn = engine.connect()
df.to_sql("some_table", conn, if_exists="append", index=False, method=to_sql_on_conflict_do_nothing)
I have just had a similar problem and, following this answer, I came up with a solution for sending a df to PostgreSQL with ON CONFLICT:
1. Send some initial data to the database to create the table
from sqlalchemy import create_engine
engine = create_engine(connection_string)
df.to_sql(table_name,engine)
2. add primary key
ALTER TABLE table_name ADD COLUMN id SERIAL PRIMARY KEY;
3. prepare index on the column (or columns) you want to check the uniqueness
CREATE UNIQUE INDEX review_id ON test(review_id);
4. map the sql table with sqlalchemy
from sqlalchemy.ext.automap import automap_base
ABase = automap_base()
ABase.prepare(engine, reflect=True)  # reflect the existing tables into classes
Table = ABase.classes.table_name
5. do your insert on conflict with:
from sqlalchemy.dialects.postgresql import insert
insrt_vals = df.to_dict(orient='records')
insrt_stmnt = insert(Table).values(insrt_vals)
do_nothing_stmt = insrt_stmnt.on_conflict_do_nothing(index_elements=['review_id'])
results = engine.execute(do_nothing_stmt)
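If you want an upsert instead of silently skipping duplicates, on_conflict_do_update is the sibling method; a sketch reusing the same insert statement and unique index:
# upsert: replace the conflicting row's values with the incoming ones
do_update_stmt = insrt_stmnt.on_conflict_do_update(
    index_elements=['review_id'],
    set_={c.key: c for c in insrt_stmnt.excluded if c.key != 'review_id'}
)
results = engine.execute(do_update_stmt)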
I am fairly new to the world of programming. I'm using Python, Pandas and SQLite, and recently I've started to build PostgreSQL databases. I am trying to query a postgres database and create a Pandas dataframe with the results. I've found that the following works:
import pandas as pd
from sqlalchemy import create_engine # database connection
engine = create_engine('postgresql://postgres:xxxxx@localhost:xxxx/my_postgres_db')
df = pd.read_sql("SELECT * FROM my_table Where province='Saskatchewan'", engine)
This works perfectly, but my problem is how to pass user input to the SQL query. Specifically, I want to do the following:
province_name = 'Saskatchewan' #user input
df = pd.read_sql("SELECT * FROM my_table Where province=province_name", engine)
However, this returns an error message:
ProgrammingError: (psycopg2.ProgrammingError) column "province_name" does not exist
LINE 1: SELECT * FROM my_table Where province =province_nam...
Can anyone provide guidance on this matter? In addition, can anyone advise me as to how to handle field names in a postgres database that have characters such as '/'. My database has a field (column header) called CD/CSD and when I try to run a query on that field (similar to code above) I just get error messages. Any help would be greatly appreciated.
You should use the functionality provided by the DBAPI module that SQLAlchemy uses to send parameters to the query. Using psycopg2 that could look like this:
province_name = 'Saskatchewan' #user input
df = pd.read_sql("SELECT * FROM my_table Where province=%s", engine, params=(province_name,))
This is safer than using Python's string formatting to insert the parameter into the query.
Passing parameters using psycopg2
pandas.read_sql documentation
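Regarding the second part of the question (field names containing characters such as '/'): PostgreSQL requires such identifiers to be double-quoted, and this combines fine with the parameter placeholder. A sketch using the CD/CSD column mentioned in the question:
# double-quote identifiers with special characters; %s still passes the
# value safely through the DBAPI
df = pd.read_sql('SELECT "CD/CSD" FROM my_table WHERE province=%s',
                 engine, params=(province_name,))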
I want to query a PostgreSQL database and return the output as a Pandas dataframe.
I created a connection to the database with SQLAlchemy:
from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')
I write a Pandas dataframe to a database table:
i=pd.read_csv(path)
i.to_sql('Stat_Table',engine,if_exists='replace')
Based on the docs, looks like pd.read_sql_query() should accept a SQLAlchemy engine:
a=pd.read_sql_query('select * from Stat_Table',con=engine)
But it throws an error:
ProgrammingError: (ProgrammingError) relation "stat_table" does not exist
I'm using Pandas version 0.14.1.
What's the right way to do this?
You are bitten by the case (in)sensitivity issues with PostgreSQL. If you quote the table name in the query, it will work:
df = pd.read_sql_query('select * from "Stat_Table"',con=engine)
But personally, I would advise just always using lower case table names (and column names), also when writing the table to the database, to prevent such issues.
From the PostgreSQL docs (http://www.postgresql.org/docs/8.0/static/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS):
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case
To explain a bit more: you have written a table with the name Stat_Table to the database (and sqlalchemy will quote this name, so it will be written as "Stat_Table" in the postgres database). When doing the query 'select * from Stat_Table' the unquoted table name will be converted to lower case stat_table, and so you get the message that this table is not found.
See eg also Are PostgreSQL column names case-sensitive?
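For completeness, the whole round trip with lower-case names, based on the code from the question, would look like this:
# write and read with a lower-case table name so quoting is never needed
i.columns = i.columns.str.lower()  # optional: lower-case the column names too
i.to_sql('stat_table', engine, if_exists='replace')
df = pd.read_sql_query('select * from stat_table', con=engine)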
Read PostgreSQL data into pandas as shown below; a screenshot is linked after the code:
import psycopg2 as pg
import pandas.io.sql as psql
connection = pg.connect("host=localhost dbname=kinder user=your_username password=your_password")
dataframe = psql.read_sql('SELECT * FROM product_product', connection)
product_category = psql.read_sql_query('select * from product_category', connection)
https://i.stack.imgur.com/1bege.png
Late to the party here, but to give you a full example of this:
import pandas as pd
import psycopg2 as pg
engine = pg.connect("dbname='my_db_name' user='pguser' host='127.0.0.1' port='15432' password='pgpassword'")
df = pd.read_sql('select * from Stat_Table', con=engine)
You need to run the following to install the dependencies on Ubuntu:
pip install pandas psycopg2-binary SQLAlchemy
Pandas docs on the subject here
The error message is telling you that a table named:
stat_table
does not exist (a relation is a table in postgres speak). So, of course, you can't select rows from it. Check your db after executing:
i.to_sql('Stat_Table',engine,if_exists='replace')
and see if a table by that name got created in your db.
When I use your read statement:
df = pd.read_sql_query('select * from Stat_Table',con=engine)
I get the data back from a postgres db, so there's nothing wrong with it.
import sqlalchemy
import psycopg2
engine = sqlalchemy.create_engine('postgresql://user@localhost:5432/mydb')
You must specify both the schema and the table:
df = pd.read_sql_query("""select * from "dvd-rental".film""", con=engine)