mysql.connector analogue to SQLAlchemy's pd.to_sql - python

While programming in Python and working with SQL databases I have been using the mysql.connector library, but I found that getting data into a SQL table takes many lines of code, because you have to write out the whole SQL query column by column.
On the other hand, pandas has easy methods for this, but they work with the SQLAlchemy library only:
- pd.to_sql
- pd.read_sql_table
(Well, I got errors while using a mysql db.cursor() and couldn't find any tutorial other than SQLAlchemy + pandas.)
These two methods let you easily read a dataframe from a SQL table and create a SQL table from a dataframe.
Is there an analogue in mysql.connector for easily converting a dataframe to a SQL table and vice versa? The syntax of that library is still more convenient for me for everything other than SQL.
P.S. The MySQL initialisation code is included just for information and is not used in the provided code, though I need to find an analogue for it somehow.
# ------------------- IMPORT -------------------------
import mysql.connector
import pandas as pd
import sqlalchemy
# ------------------- MYSQL + SQL Alchemy -------------------------
mydb = mysql.connector.connect(
    host='host',
    user='user',
    passwd='pass',
    database='db'
)
mycursor = mydb.cursor()
engine = sqlalchemy.create_engine('mysql+mysqldb://user:pass@host/db', pool_recycle=3600)
# ------------------- FUNCTIONS ----------------------
def get_function():
    df = pd.read_html("https://www.url.com")
    df[3].to_sql(name=table, con=engine, index=False, if_exists='replace')
# -------------------- MAIN --------------------------
table = 'table_name'
get_function()
print(pd.read_sql_table(table, engine, columns=[]))

Your question isn't very clear, so I'm not completely sure what you're trying to do, but I'll try my best to answer.
SQLAlchemy is a so-called object-relational mapper (like Hibernate in the Java world), which maps between relations (columns, rows, tables) and objects.
Pandas is a data analysis library that can use SQLAlchemy. SQLAlchemy itself supports a wide range of databases, including MySQL.
It isn't clear to me whether you'd like to use Pandas + SQLAlchemy + MySQL, or whether you just want a simple way to work with MySQL directly.
In the first case you can simply use Pandas; in the latter case you can use SQLAlchemy directly. Pandas provides documentation, and so does SQLAlchemy.
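For what it's worth, pandas' to_sql and read_sql_table only need a SQLAlchemy engine, and SQLAlchemy can use mysql.connector as its driver through the mysql+mysqlconnector dialect, so you never have to hand-write INSERT statements. A minimal sketch (credentials, host and table name are placeholders, not real values):
import pandas as pd
import sqlalchemy

# SQLAlchemy engine that talks to MySQL through the mysql.connector driver
engine = sqlalchemy.create_engine('mysql+mysqlconnector://user:pass@host/db', pool_recycle=3600)

df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
df.to_sql('table_name', con=engine, index=False, if_exists='replace')  # dataframe -> table
df2 = pd.read_sql_table('table_name', engine)                          # table -> dataframe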

Related

Escape a table name in SQLAlchemy

I need to automatically escape table names for any SQLAlchemy engine (I want to expand a library I made to databases other than Postgres; see the Details section at the end of my post).
It is possible for columns, like this:
from sqlalchemy.sql import column
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
escaped_column = column('%"my_column123?').compile(dialect=engine.dialect)
str(escaped_column)
'"%""my_column123?"'
I (naively) tried the following but it does not work (gives back an empty string):
from sqlalchemy.sql import table
from sqlalchemy import create_engine
engine = create_engine("sqlite:///test.db")
escaped_table_name = table("%table?&;").compile(dialect=engine.dialect)
str(escaped_table_name)
''
Thanks in advance!
Details
I made a library to update PostGres table using pandas DataFrames (see https://github.com/ThibTrip/pangres) and realized that a part of the code is not SQL injection safe (if you are curious here is the part I am talking about: https://github.com/ThibTrip/pangres/blob/7cfa2d2190cf65a1ede8ef30868883f0da3fa5fc/pangres/helpers.py#L270-L290).
I found a way to add columns in a "sqlalchemy way" (adding columns to any table was the reason why I wanted to escape table names). Unfortunately it is not perfect, as it creates a table "alembic_version" for some reason:
# alembic is a library from the creator of sqlalchemy to migrate databases
from alembic.runtime.migration import MigrationContext # pip install alembic
from alembic.operations import Operations
from sqlalchemy import Column, TEXT, create_engine # pip install sqlalchemy
# create engine
engine = create_engine('sqlite:///test.db')
# add column "some_new_column" of type TEXT in the table 'test'
with engine.connect() as con:
    ctx = MigrationContext.configure(con)
    op = Operations(ctx)
    op.add_column('test', column=Column('some_new_column', TEXT))
EDIT: it seems the table "alembic_version" was somehow added by previous tests, as I could not reproduce this behaviour after dropping the table. So this solution seems good :)!
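For reference, another way to get an escaped table name for a given engine is to go through the dialect's identifier preparer; a minimal sketch, assuming the same SQLite engine as above (the table name is just an example):
from sqlalchemy import create_engine

engine = create_engine("sqlite:///test.db")
# the dialect's IdentifierPreparer knows the quoting rules of the target database
print(engine.dialect.identifier_preparer.quote('%table?&;'))  # e.g. '"%table?&;"'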

How to pass a user input variable to a SQLAlchemy statement?

I am fairly new to the world of programming. I'm using Python, pandas and SQLite, and recently I've started to build PostgreSQL databases. I am trying to query a Postgres database and create a pandas dataframe from the results. I've found that the following works:
import pandas as pd
from sqlalchemy import create_engine # database connection
engine = create_engine('postgresql://postgres:xxxxx@localhost:xxxx/my_postgres_db')
df = pd.read_sql("SELECT * FROM my_table Where province='Saskatchewan'", engine)
This works perfectly, but my problem is how to pass user input to the SQL query. Specifically, I want to do the following:
province_name = 'Saskatchewan' #user input
df = pd.read_sql("SELECT * FROM my_table Where province=province_name", engine)
However, this returns an error message:
ProgrammingError: (psycopg2.ProgrammingError) column "province_selected" does not exist
LINE 1: SELECT * FROM my_table Where province =province_selec...
Can anyone provide guidance on this matter? In addition, can anyone advise me on how to handle field names in a Postgres database that contain characters such as '/'? My database has a field (column header) called CD/CSD, and when I try to run a query on that field (similar to the code above) I just get error messages. Any help would be greatly appreciated.
You should use the functionality provided by the DBAPI module that SQLAlchemy uses to send parameters to the query. Using psycopg2 that could look like this:
province_name = 'Saskatchewan' #user input
df = pd.read_sql("SELECT * FROM my_table Where province=%s", engine, params=(province_name,))
This is safer than using Python's string formatting to insert the parameter into the query.
Passing parameters using psycopg2
pandas.read_sql documentation
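As for field names containing characters such as '/': in PostgreSQL you can double-quote the identifier inside the query while still passing the value as a parameter. A small sketch using the column name mentioned in the question:
df = pd.read_sql('SELECT "CD/CSD" FROM my_table WHERE province = %s',
                 engine, params=(province_name,))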

Copy tables from one database to another in SQL Server, using Python

Does anybody know of good Python code that can copy a large number of tables (around 100) from one database to another in SQL Server?
I ask whether there is a way to do it in Python because, due to restrictions at my place of employment, I cannot copy tables across databases inside SQL Server itself.
Here is a simple Python code that copies one table from one database to another. I am wondering if there is a better way to write it if I want to copy 100 tables.
print('Initializing...')
import pandas as pd
import sqlalchemy
import pyodbc
db1 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_one")
db2 = sqlalchemy.create_engine("mssql+pyodbc://user:password@db_two")
print('Writing...')
query = '''SELECT * FROM [dbo].[test_table]'''
df = pd.read_sql(query, db1)
df.to_sql('test_table', db2, schema='dbo', index=False, if_exists='replace')
print('(1) [test_table] copied.')
SQLAlchemy is actually a good tool to use to create identical tables in the second db:
metadata = sqlalchemy.MetaData()
table = sqlalchemy.Table('test_table', metadata, autoload=True, autoload_with=db1)
table.create(bind=db2)
This method will also reproduce the keys, indexes, and foreign keys correctly. Once the needed tables are created, you can move the data either with select/insert, if the tables are relatively small, or with the bcp utility to dump each table to disk and then load it into the second database (much faster, but more work to get it right).
If using select/insert then it is better to insert in batches of 500 records or so.
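A rough sketch of the select/insert route with pandas, reading and writing in chunks so only part of the table is in memory at once (the chunk size is arbitrary, and the target table is assumed to exist already, e.g. created as above):
for chunk in pd.read_sql('SELECT * FROM [dbo].[test_table]', db1, chunksize=500):
    chunk.to_sql('test_table', db2, schema='dbo', index=False, if_exists='append')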
You can do something like this:
tabs = pd.read_sql("SELECT table_name FROM INFORMATION_SCHEMA.TABLES", db1)
for tab in tabs['table_name']:
pd.read_sql("select * from {}".format(tab), db1).to_sql(tab, db2, index=False)
But it might be awfully slow. Consider using SQL Server tools to do this job.
Consider using the sp_addlinkedserver procedure to link one SQL Server to another. After that you can execute:
SELECT * INTO server_name...table_name FROM table_name
for all tables from the db1 database.
PS this might be done in Python + SQLAlchemy as well...
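For example, a sketch of issuing such a statement from Python through SQLAlchemy, assuming the linked server has already been set up and reusing the db2 engine from the question (the four-part name is a placeholder):
from sqlalchemy import text

with db2.begin() as conn:
    # runs on the target server, pulling the data over the linked server
    conn.execute(text("SELECT * INTO dbo.test_table FROM server_name.db_one.dbo.test_table"))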

How to create a table in one DB from the result of a query on a different DB in Python

I'm working with two separate databases (Oracle and PostgreSQL). A substantial amount of reference data that I frequently need while analysing data stored in Postgres lives in Oracle (I have no control over this part). I'd like to be able to transfer the results of a query from Oracle directly into a table in Postgres (and vice versa). The closest I've gotten is the code below, using pandas as a go-between, but it's quite slow.
import pandas as pd
import psycopg2
import cx_Oracle
import sqlalchemy
# CONNECT TO POSTGRES
db__PG = psycopg2.connect(dbname="username", user="password")
eng_PG = sqlalchemy.create_engine('postgresql://username:password@localhost:5432/db_one')
# CONNECT TO ORACLE
db__OR = cx_Oracle.connect('username', 'password', 'SID_Name')
eng_OR = sqlalchemy.create_engine('oracle+cx_oracle://username:password@10.1.2.3:4422/db_two')
#DEFINE QUERIES
query_A = "SELECT * FROM tbl_A"
query_B = "SELECT * FROM tbl_B"
# CREATE PANDAS DATAFRAMES FROM QUERY RESULTS
df_A = pd.read_sql_query(query_A, db__OR)
df_B = pd.read_sql_query(query_B, db__PG)
# CREATE TABLES FROM PANDAS DATAFRAMES
df_A.to_sql(name='tbl_a', con=eng_PG, if_exists='replace', index=False)
df_B.to_sql(name='tbl_b', con=eng_OR, if_exists='replace', index=False)
I think there must be a more efficient, direct way to do this (like database links for moving data across different DBs in Oracle), but I'm fairly new to Python and have previously worked either directly in SQL or in SAS. I've been searching for ways to create a table directly from a Python cursor result or a SQLAlchemy ResultProxy, but haven't had much luck.
Suggestions?
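One thing that can help short of a real database link is to stream the transfer in chunks instead of materialising the whole table in one DataFrame; a rough sketch using the engines defined above (the chunk size is arbitrary):
first = True
for chunk in pd.read_sql_query(query_A, eng_OR, chunksize=50000):
    chunk.to_sql(name='tbl_a', con=eng_PG, index=False,
                 if_exists='replace' if first else 'append')
    first = False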

Return Pandas dataframe from PostgreSQL query with sqlalchemy

I want to query a PostgreSQL database and return the output as a Pandas dataframe.
I created a connection to the database with SQLAlchemy:
from sqlalchemy import create_engine
engine = create_engine('postgresql://user#localhost:5432/mydb')
I write a Pandas dataframe to a database table:
i=pd.read_csv(path)
i.to_sql('Stat_Table',engine,if_exists='replace')
Based on the docs, it looks like pd.read_sql_query() should accept a SQLAlchemy engine:
a=pd.read_sql_query('select * from Stat_Table',con=engine)
But it throws an error:
ProgrammingError: (ProgrammingError) relation "stat_table" does not exist
I'm using Pandas version 0.14.1.
What's the right way to do this?
You are bitten by the case (in)sensitivity issues with PostgreSQL. If you quote the table name in the query, it will work:
df = pd.read_sql_query('select * from "Stat_Table"',con=engine)
But personally, I would advise to just always use lower case table names (and column names), also when writing the table to the database to prevent such issues.
From the PostgreSQL docs (http://www.postgresql.org/docs/8.0/static/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS):
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case
To explain a bit more: you have written a table with the name Stat_Table to the database (and sqlalchemy will quote this name, so it will be written as "Stat_Table" in the postgres database). When doing the query 'select * from Stat_Table' the unquoted table name will be converted to lower case stat_table, and so you get the message that this table is not found.
See e.g. also Are PostgreSQL column names case-sensitive?
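In other words, sticking to lower-case names on both sides avoids the quoting entirely; a small illustration with the same dataframe and engine:
i.to_sql('stat_table', engine, if_exists='replace')
df = pd.read_sql_query('select * from stat_table', con=engine)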
You can read Postgres data into pandas as shown below (see also the image link at the end):
import psycopg2 as pg
import pandas.io.sql as psql
connection = pg.connect("host=localhost dbname=kinder user=your_username password=your_password")
dataframe = psql.read_sql('SELECT * FROM product_product', connection)
product_category = psql.read_sql_query('select * from product_category', connection)
https://i.stack.imgur.com/1bege.png
Late to the party here, but to give you a full example of this:
import pandas as pd
import psycopg2 as pg
engine = pg.connect("dbname='my_db_name' user='pguser' host='127.0.0.1' port='15432' password='pgpassword'")
df = pd.read_sql('select * from Stat_Table', con=engine)
On Ubuntu you need to run the following to install the dependencies:
pip install pandas psycopg2-binary SQLAlchemy
Pandas docs on the subject here
The error message is telling you that a table named:
stat_table
does not exist (a relation is a table in Postgres speak). So of course you can't select rows from it. Check your db after executing:
i.to_sql('Stat_Table',engine,if_exists='replace')
and see if a table by that name got created in your db.
When I use your read statement:
df = pd.read_sql_query('select * from Stat_Table',con=engine)
I get the data back from a postgres db, so there's nothing wrong with it.
import sqlalchemy
import psycopg2
engine = sqlalchemy.create_engine('postgresql://user#localhost:5432/mydb')
You must specify the schema as well as the table:
df = pd.read_sql_query("""select * from "dvd-rental".film""", con=engine)
