I am fairly new to the world of programming. I'm using Python, Pandas, and SQLite, and recently I've started to build PostgreSQL databases. I am trying to query a PostgreSQL database and create a Pandas dataframe from the results. I've found that the following works:
import pandas as pd
from sqlalchemy import create_engine # database connection
engine = create_engine('postgresql://postgres:xxxxx@localhost:xxxx/my_postgres_db')
df = pd.read_sql("SELECT * FROM my_table WHERE province='Saskatchewan'", engine)
This works perfectly, but my problem is how to pass user input to the SQL query. Specifically, I want to do the following:
province_selected = 'Saskatchewan'  # user input
df = pd.read_sql("SELECT * FROM my_table WHERE province=province_selected", engine)
However, this returns an error message:
ProgrammingError: (psycopg2.ProgrammingError) column "province_selected" does not exist
LINE 1: SELECT * FROM my_table Where province =province_selec...
Can anyone provide guidance on this matter? In addition, can anyone advise me on how to handle field names in a PostgreSQL database that contain characters such as '/'? My database has a field (column header) called CD/CSD, and when I try to run a query on that field (similar to the code above) I just get error messages. Any help would be greatly appreciated.
You should use the functionality provided by the DBAPI module that SQLAlchemy uses to send parameters to the query. Using psycopg2 that could look like this:
province_name = 'Saskatchewan'  # user input
df = pd.read_sql("SELECT * FROM my_table WHERE province=%s", engine, params=(province_name,))
This is safer than using Python's string formatting to insert the parameter into the query.
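If you have several filters, named placeholders can be easier to read, and (for the second part of the question) column names containing characters such as '/' must be double-quoted in PostgreSQL. A minimal sketch, assuming the same my_table, engine, and CD/CSD column from the question:
import pandas as pd
province_name = 'Saskatchewan'  # user input
# %(province)s is a named psycopg2 placeholder filled from the params dict;
# the driver does the escaping, so user input is never pasted into the SQL text.
# "CD/CSD" must be double-quoted because of the '/' in the identifier.
df = pd.read_sql(
    'SELECT "CD/CSD", province FROM my_table WHERE province = %(province)s',
    engine,
    params={"province": province_name},
)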
Passing parameters using psycopg2
pandas.read_sql documentation
I need to select some columns from a table with SQL Alchemy. Everything works fine except selecting the one column with '/' in the name. My query looks like:
query = select([func.sum(Table.c.ColumnName),
                func.sum(Table.c.Column/Name),
                ])
Obviously the issue comes from the second line with the column 'Column/Name'. Is there a way in SQL Alchemy to overcome special characters in a column name?
edit:
I have it all inside a class, but a simplified version of the process looks like this. I create an engine (all the necessary db data is inside the create_new_engine() function) and map all the tables in the db into metadata.
def map(self):
    from sqlalchemy.engine.base import Engine
    # check if an engine exists
    if not isinstance(self.engine, Engine):
        self.create_new_engine()
    self.metadata = MetaData(schema='dbo')
    self.metadata.reflect(bind=self.engine)
Then I map a single table with:
def map_table(self, table_name):
    table = "{schema}.{table_name}".format(schema=self.metadata.schema, table_name=table_name)
    table = self.metadata.tables[table]
    return table
In the end I use pandas read_sql_query to run the above query with the connection and engine established earlier.
I'm connecting to SQL Server.
Table.c is a plain Python object, so you can use getattr to access a column whose name isn't a valid Python identifier. In pure Python:
query = select([func.sum(Table.c.ColumnName),
                func.sum(getattr(Table.c, 'Column/Name')),
                ])
So in your case (from the comments above):
func.sum(getattr(Table.c, 'cur/fees'))
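Putting it together with pandas, a minimal sketch (the my_table name and the already-created engine are assumptions; the 'dbo' schema and 'cur/fees' column come from the question and comments):
import pandas as pd
from sqlalchemy import MetaData, select, func
# reflect the table from the 'dbo' schema, as in the question
metadata = MetaData(schema='dbo')
metadata.reflect(bind=engine, only=['my_table'])
table = metadata.tables['dbo.my_table']
# getattr works for any column name, including ones containing '/'
query = select([func.sum(getattr(table.c, 'cur/fees'))])
df = pd.read_sql_query(query, con=engine)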
I have a dataframe in a Jupyter notebook. My objective is to import this df into Snowflake as a new table.
Is there any way to write a new table into Snowflake directly, without defining the table columns' names and types?
I am using:
import snowflake.connector as snow
from snowflake.connector.pandas_tools import write_pandas
from sqlalchemy import create_engine
import pandas as pd
connection = snow.connect(
    user='XXX',
    password='XXX',
    account='XXX',
    warehouse='COMPUTE_WH',
    database='SNOWPLOW',
    schema='DBT_WN'
)
df.to_sql('aaa', connection, index = False)
It ran into an error:
DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': not all arguments converted during string formatting
Can anyone provide the sample code to fix this issue?
Here's one way to do it -- apologies in advance for my code formatting in SO combined with python's spaces vs tabs "model". Check the tabs/spaces if you cut-n-paste ...
Because of the Snowflake security model, in your connection parameters be sure to specify the ROLE you are using as well (often the default role is 'PUBLIC').
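For example, an explicit role in the connection parameters might look like this (credentials and object names are placeholders, mirroring the question):
import snowflake.connector as snow
connection = snow.connect(
    user='XXX',
    password='XXX',
    account='XXX',
    role='PUBLIC',           # name the role explicitly rather than relying on the default
    warehouse='COMPUTE_WH',
    database='SNOWPLOW',
    schema='DBT_WN',
)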
Since you already have SQLAlchemy in the mix, this idea doesn't use Snowflake's write_pandas, so it isn't a good answer for large dataframes. There are some odd behaviors with SQLAlchemy and Snowflake: make sure the dataframe column names are upper case, yet use a lowercase table name in the argument to to_sql().
def df2sf_alch(target_df, target_table):
    # create a sqlAlchemy connection object
    engine = create_engine(f"snowflake://{your_sf_account_url}",
                           creator=lambda: connection)
    # re/create table in Snowflake
    try:
        # sqlAlchemy creates the table based on a lower-case table name
        # and it works to have uppercase df column names
        target_df.to_sql(target_table.lower(), con=engine, if_exists='replace', index=False)
        print(f"Table {target_table.upper()} re/created")
    except Exception as e:
        print(f"Could not replace table {target_table.upper()}: {e}")
    nrows = connection.cursor().execute(f"select count(*) from {target_table}").fetchone()[0]
    print(f"Table {target_table.upper()} rows = {nrows}")
Note this function needs to be changed to reflect the appropriate 'snowflake account url' in order to create the sqlAlchemy connection object. Also, assuming the case naming oddities are taken care of in the df, along with your already defined connection, you'd call this function simply passing the df and the name of the table, like df2sf_alch(my_df, 'MY_TABLE')
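For larger dataframes, the write_pandas helper already imported in the question avoids going through SQLAlchemy. A minimal sketch, assuming the connection object from above and that a target table MY_TABLE already exists with matching upper-case column names (newer connector versions can also create the table for you, but check your version's docs):
from snowflake.connector.pandas_tools import write_pandas
# bulk-load the dataframe through a temporary stage into an existing table;
# returns a tuple reporting whether the load succeeded and how much was loaded
success, nchunks, nrows, _ = write_pandas(connection, df, 'MY_TABLE')
print(f"success={success}, chunks={nchunks}, rows={nrows}")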
While programming in Python and working with SQL databases, I previously used the mysql.connector library, but I found that it takes a lot of rows of code to get your data into an SQL table, because you need to write the whole SQL query column by column.
On the other hand, when using pandas there are easy methods that work with the SQLAlchemy library:
- pd.to_sql
- pd.read_sql_table
(Well, I got errors while using mysql db.cursor(), and couldn't find any tutorial besides SQLAlchemy + pandas.)
These two methods let you easily get a dataframe from a SQL table and create a SQL table from a dataframe.
I wonder if there is an analogue in mysql.connector to easily convert a dataframe to a SQL table and vice versa, since the syntax of this library is still more convenient for me for actions other than SQL.
P.S. The MySQL initialization code below is included just for reference and is not used in the provided code, though I need to find an analogue for it somehow.
# ------------------- IMPORT -------------------------
import mysql.connector
import pandas as pd
import sqlalchemy
# ------------------- MYSQL + SQL Alchemy -------------------------
mydb = mysql.connector.connect(
    host='host',
    user='user',
    passwd='pass',
    database='db'
)
mycursor = mydb.cursor()
engine = sqlalchemy.create_engine('mysql+mysqldb://user:pass@host/db', pool_recycle=3600)
# ------------------- FUNCTIONS ----------------------
def get_function():
    df = pd.read_html("https://www.url.com")
    df[3].to_sql(name=table, con=engine, index=False, if_exists='replace')
# -------------------- MAIN --------------------------
table = 'table_name'
get_function()
print(pd.read_sql_table(table, engine, columns=[]))
Your question isn't very clear, so I'm not completely sure what you're trying to do, but I'll try my best to answer.
So SQLAlchemy is a so-called object-relational mapper (like Hibernate in the Java world), which maps between relations (columns, rows, tables) and objects.
Pandas is a data analysis library that can use SQLAlchemy. SQLAlchemy itself supports a wide range of databases, including MySQL.
Now I didn't understand whether you'd like to use Pandas + SQLAlchemy + MySQL, or whether you just want a simple way to work with MySQL directly.
In the first case you can simply use Pandas, in the latter case you can use SQLAlchemy directly. Pandas provides documentation and so does SQLAlchemy
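To make the contrast concrete, here is a minimal sketch (connection details and the table name are placeholders): the pandas + SQLAlchemy route needs one call in each direction, while with plain mysql.connector you write the INSERT yourself and pass the rows to cursor.executemany.
import pandas as pd
import sqlalchemy
import mysql.connector
df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})
# pandas + SQLAlchemy: one call to write, one to read back
engine = sqlalchemy.create_engine('mysql+mysqldb://user:pass@host/db')
df.to_sql('my_table', con=engine, index=False, if_exists='replace')
df_back = pd.read_sql_table('my_table', engine)
# plain mysql.connector: build the INSERT statement yourself
mydb = mysql.connector.connect(host='host', user='user', passwd='pass', database='db')
cur = mydb.cursor()
cur.executemany(
    "INSERT INTO my_table (a, b) VALUES (%s, %s)",
    list(df.itertuples(index=False, name=None)),  # dataframe rows as plain tuples
)
mydb.commit()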
Pandas to_sql is not working.
I initialize my engine using SQLAlchemy. I also have cx_Oracle imported just in case. For testing purposes, I used the code example provided in the pandas.DataFrame.to_sql documentation. The following is an example:
import pandas as pd
from sqlalchemy import create_engine
import cx_Oracle
# specify oracle connection string
oracle_connection_string = (
    'oracle+cx_oracle://{username}:{password}@' +
    cx_Oracle.makedsn('{hostname}', '{port}', service_name='{service_name}')
)
# create oracle engine
engine = create_engine(
    oracle_connection_string.format(
        username='username',
        password='password',
        hostname='aaa.aaa.com',
        port='1521',
        service_name='aaa.aaa.com',
    )
)
# test to_sql
# Create a table from scratch with 3 rows
df = pd.DataFrame({'name' : ['User 1', 'User 2', 'User 3']})
df.to_sql('schema.table_name', con=engine, if_exists='replace', index_label='id')
engine.execute("SELECT * FROM schema.table_name").fetchall()
If the table was not created in the first place, I will get "ORA-00942: table or view does not exist" error. Otherwise, I will just get a blank list.
FYI, I have tested the read_sql function and it works. The other approach I have tried is to specify the schema in the parameter statement, please refer to the links provided in the comment section.
According to my understanding, sqlalchemy also calls cx_Oracle. However, insertion using cx_Oracle works.
Just want to know if anyone has experienced this before. Thanks.
Pandas has created a table literally named schema.table_name under your default schema (using a quoted identifier), not a table table_name under the schema schema. Use the keyword argument schema to define it:
df.to_sql('table_name', schema='schema', con=engine, if_exists='replace', index_label='id')
My guess is that you passed the schema and table name to read_sql in a similar manner as you did to to_sql, so it worked because it quoted the identifier, whereas your raw query did not.
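If in doubt, you can list what was actually created in each schema; a quick sketch using SQLAlchemy's inspector with the same engine (the schema name 'schema' here mirrors the placeholder in the question):
from sqlalchemy import inspect
insp = inspect(engine)
print(insp.get_table_names())                  # tables in your default schema
print(insp.get_table_names(schema='schema'))   # tables in the explicitly named schema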
I want to query a PostgreSQL database and return the output as a Pandas dataframe.
I created a connection to the database with SQLAlchemy:
from sqlalchemy import create_engine
engine = create_engine('postgresql://user@localhost:5432/mydb')
I write a Pandas dataframe to a database table:
i=pd.read_csv(path)
i.to_sql('Stat_Table',engine,if_exists='replace')
Based on the docs, looks like pd.read_sql_query() should accept a SQLAlchemy engine:
a=pd.read_sql_query('select * from Stat_Table',con=engine)
But it throws an error:
ProgrammingError: (ProgrammingError) relation "stat_table" does not exist
I'm using Pandas version 0.14.1.
What's the right way to do this?
You are bitten by the case (in)sensitivity issues with PostgreSQL. If you quote the table name in the query, it will work:
df = pd.read_sql_query('select * from "Stat_Table"',con=engine)
But personally, I would advise to just always use lower case table names (and column names), also when writing the table to the database to prevent such issues.
From the PostgreSQL docs (http://www.postgresql.org/docs/8.0/static/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS):
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to lower case
To explain a bit more: you have written a table with the name Stat_Table to the database (and sqlalchemy will quote this name, so it will be written as "Stat_Table" in the postgres database). When doing the query 'select * from Stat_Table' the unquoted table name will be converted to lower case stat_table, and so you get the message that this table is not found.
See eg also Are PostgreSQL column names case-sensitive?
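For example, keeping everything lowercase on both the write and the read side avoids the quoting issue entirely; a small sketch reusing the csv and engine from the question:
import pandas as pd
# an unquoted lowercase name is stored as-is, so the later query needs no double quotes
i = pd.read_csv(path)
i.to_sql('stat_table', engine, if_exists='replace')
df = pd.read_sql_query('select * from stat_table', con=engine)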
You can read PostgreSQL data into pandas as shown below (output screenshot in the image link):
import psycopg2 as pg
import pandas.io.sql as psql
connection = pg.connect("host=localhost dbname=kinder user=your_username password=your_password")
dataframe = psql.read_sql('SELECT * FROM product_product', connection)
product_category = psql.read_sql_query('select * from product_category', connection)
https://i.stack.imgur.com/1bege.png
Late to the party here, but to give you a full example of this:
import pandas as pd
import psycopg2 as pg
engine = pg.connect("dbname='my_db_name' user='pguser' host='127.0.0.1' port='15432' password='pgpassword'")
df = pd.read_sql('select * from Stat_Table', con=engine)
You need to run the following to install the dependencies for ubuntu:
pip install pandas psycopg2-binary SQLAlchemy
Pandas docs on the subject here
The error message is telling you that a table named:
stat_table
does not exist (a relation is a table in postgres speak). So, of course, you can't select rows from it. Check your db after executing:
i.to_sql('Stat_Table',engine,if_exists='replace')
and see if a table by that name got created in your db.
When I use your read statement:
df = pd.read_sql_query('select * from Stat_Table',con=engine)
I get the data back from a postgres db, so there's nothing wrong with it.
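If you want to check from pandas which tables the engine actually sees, a small sketch (assuming the default public schema):
import pandas as pd
# list the tables PostgreSQL knows about in the public schema
tables = pd.read_sql_query(
    "select table_name from information_schema.tables where table_schema = 'public'",
    con=engine,
)
print(tables)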
import sqlalchemy
import psycopg2
engine = sqlalchemy.create_engine('postgresql://user@localhost:5432/mydb')
You must specify the schema and the table:
df = pd.read_sql_query("""select * from "dvd-rental".film""", con=engine)
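Alternatively, read_sql_table lets you pass the hyphenated schema separately instead of quoting it inline; a small sketch, assuming the same dvd-rental database and engine:
import pandas as pd
# the schema keyword avoids manual double-quoting of "dvd-rental" in the SQL text
df = pd.read_sql_table('film', engine, schema='dvd-rental')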