I am using pandas.io.sql to execute a SQL script that contains CTEs, and I would like to do something like this:
import pandas.io.sql as psql
param1 = 'park'
param2 = 'zoo'
sqlstr = ("""WITH CTE_A AS (
SELECT *
FROM A
WHERE A.Location = param1),
CTE_B AS (
SELECT *
FROM B
WHERE B.Location = param2)
SELECT A.*, B.*
FROM C
INNER JOIN A
ON C.something = A.something
INNER JOIN B
ON C.something = B.something
WHERE C.combined = param1 || param2
""")
and then pass the parameters like this:
result = psql.frame_query(sqlstr, con = db, params = (param1,param2))
Could anyone help me in passing the two parameters using Pandas?
The only way I know to do this is the following. It does not, however, take advantage of the psql package in pandas.
import pyodbc
import pandas
conn = pyodbc.connect('yourconnectionstring')
curs = conn.cursor()
param1 = 'park'
param2 = 'zoo'
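sqlstr = """WITH CTE_A AS (
SELECT *
FROM A
WHERE A.Location = ?),
CTE_B AS (
SELECT *
FROM B
WHERE B.Location = ?)
SELECT A.*, B.*
FROM C
INNER JOIN A
ON C.something = A.something
INNER JOIN B
ON C.something = B.something
WHERE C.combined = ? || ?;"""
# pyodbc uses positional "?" markers, so each value is listed once per marker, in order
q = curs.execute(sqlstr, [param1, param2, param1, param2]).fetchall()
# convert the pyodbc rows to tuples and take the column names from the cursor
df = pandas.DataFrame.from_records([tuple(r) for r in q], columns=[c[0] for c in curs.description])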
curs.close()
conn.close()
This passes the parameters separately from the SQL text, which avoids SQL injection, and ends with a DataFrame object containing your results.
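For what it's worth, newer pandas versions expose read_sql_query, which accepts a params argument and forwards it to the driver. Below is a minimal sketch, assuming an in-memory SQLite database as a stand-in for the real one; the table contents and the :p1/:p2 parameter names are made up for illustration. The placeholder style depends on the driver (sqlite3 accepts ? and :name, pyodbc uses ?). The sketch also joins to the CTEs rather than to the base tables, which I assume was the intent.

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (something INTEGER, Location TEXT);
    CREATE TABLE B (something INTEGER, Location TEXT);
    CREATE TABLE C (something INTEGER, combined TEXT);
    INSERT INTO A VALUES (1, 'park'), (2, 'beach');
    INSERT INTO B VALUES (1, 'zoo'), (2, 'mall');
    INSERT INTO C VALUES (1, 'parkzoo'), (2, 'beachmall');
""")

# Named placeholders can be reused, so each value is supplied only once.
sqlstr = """
WITH CTE_A AS (SELECT * FROM A WHERE A.Location = :p1),
     CTE_B AS (SELECT * FROM B WHERE B.Location = :p2)
SELECT CTE_A.Location AS a_location, CTE_B.Location AS b_location
FROM C
INNER JOIN CTE_A ON C.something = CTE_A.something
INNER JOIN CTE_B ON C.something = CTE_B.something
WHERE C.combined = :p1 || :p2
"""

# pandas hands params to the driver; the values never become part of the SQL text.
df = pd.read_sql_query(sqlstr, conn, params={"p1": "park", "p2": "zoo"})
conn.close()
```

With a positional-only driver the same idea applies, but every ? needs its own entry in the params sequence.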
When using pandas.io.sql in combination with mysql.connector the syntax is as follows:
import pandas.io.sql as psql
import mysql.connector as mysql
db = mysql.connect(host="localhost", user="user", passwd="password")
hour = 7
result = psql.read_sql("""select * from table where
`hour` > %(hour)s and `name` = %(name)s""", con=db, params={'hour': hour, 'name': 'John'})
So just put %(name)s in the query, replacing 'name' with whatever key you want, and pass a dictionary as params.
Note that this option wraps the value in quotes, so it does not work where an identifier is needed, for example a table name. In that case I use a regex to clean the string, i.e.:
import re
table_name = re.sub(r'[\W]', '', table_name)
(use r'[\W_]' if the table name should not contain underscores either)
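An alternative to cleaning the name is to reject anything that is not a known table. Here is a minimal sketch of that idea, using sqlite3 so it runs anywhere; safe_identifier, the visits table and the allowed set are hypothetical names for illustration, and sqlite3 uses ? where mysql.connector uses %s or %(name)s.

```python
import re
import sqlite3

def safe_identifier(name, allowed):
    """Return name unchanged if it is a plain identifier in the allowed set, else raise."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name) or name not in allowed:
        raise ValueError(f"unexpected table name: {name!r}")
    return name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (hour INTEGER, name TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)",
                 [(6, "John"), (8, "John"), (9, "Ann")])

# The identifier is validated and interpolated; the values are still bound.
table = safe_identifier("visits", allowed={"visits"})
rows = conn.execute(
    f"SELECT hour, name FROM {table} WHERE hour > ? AND name = ?",
    (7, "John"),
).fetchall()
```

The point of the split is that identifiers cannot be bound as parameters, so they get validated, while values always go through the placeholder mechanism.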
I have a script that queries my MySQL database, and I would like to know whether I can pass a parameter to that query from Python.
For example, in the following script I want to compute date_filter in Python and then apply it as a filter in the query.
now = dt.datetime.now()
date_filter = now - timedelta(days=3)
dq_connection = mysql.connector.connect(user='user', password='pass', host='localhost', database='db')
engine = create_engine('localhost/db')
cursor = connection.cursor(buffered=True)
query = ('''
SELECT *
FROM myTable
WHERE date >= ''' + date_filter + '''
''')
I tried it that way, but I got the following error:
builtins.TypeError: can only concatenate str (not "datetime.datetime") to str
Is it possible to apply the filter like that?
Thanks!
Yes, you can do it. To avoid SQL injection, the best way is not to use Python's string formatting facilities but SQL parameters and placeholders (note that you don't need single quotes around the placeholder, because the driver quotes the value and converts its type for you):
now = dt.datetime.now()
date_filter = now - timedelta(days=3)
db_connection = mysql.connector.connect(user='user', password='pass', host='localhost', database='db')
engine = create_engine('localhost/db')
cursor = db_connection.cursor(buffered=True)
query = "SELECT * FROM myTable WHERE date >=%s"
cursor.execute(query,(date_filter,))
Also, there was a mistake in your cursor line: it should be db_connection.cursor. The trailing comma after date_filter is needed because execute expects a tuple of parameters.
In case you need more than one parameter, you can use more than one placeholder:
query = "SELECT * FROM myTable WHERE date >=%s and date<=%s"
cursor.execute(query,(date_filter,other_date))
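To make the placeholder approach concrete, here is a small self-contained sketch. It uses sqlite3 instead of MySQL so it can run without a server, which means the placeholder is ? rather than %s; storing the dates as ISO-formatted text and the fixed value of now are assumptions made only to keep the example reproducible.

```python
import datetime as dt
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER, date TEXT)")

now = dt.datetime(2024, 1, 10, 12, 0, 0)  # fixed "now" so the result is reproducible
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?)",
    [
        (1, (now - dt.timedelta(days=5)).isoformat(sep=" ")),  # older than the filter
        (2, (now - dt.timedelta(days=1)).isoformat(sep=" ")),  # newer than the filter
    ],
)

date_filter = now - dt.timedelta(days=3)

# One placeholder, so the parameters are a one-element tuple (note the trailing comma).
rows = conn.execute(
    "SELECT id FROM myTable WHERE date >= ?",
    (date_filter.isoformat(sep=" "),),
).fetchall()
```

Only the row dated one day before now passes the filter. With mysql.connector the datetime object itself can be passed, as in the answer above.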
You can just do something like:
WHERE date >= ''' + str(date_filter) + '''
to represent the date as a string rather than a datetime object. (The value then also has to be wrapped in quotes inside the SQL text.)
You can try with this:
date_f = str(date_filter)
query = ('''
SELECT *
FROM myTable
WHERE date >= "{}"
'''.format(date_f))
I am writing a SQL query in which I want to pass a WHERE condition with parameters to pandas.read_sql_query.
It works fine for the value, but I run into problems with the column name.
My workaround is a concatenated string that I pass to pandas, but I don't like how that makes my code look.
I have already figured out that the column name ends up written wrongly in the query: it is, e.g., 'colname' instead of colname.
I wrote the sql as string:
command=("SELECT * FROM review r "
"WHERE 1=1 "
"AND "+selected_var+"= "+selected_val
)
And then I passed it to pandas:
self.reviews = pd.read_sql_query(command, con = self.cnxn)
But I would like to do it without the workaround, like this:
import pandas as pd
import mysql.connector
self.reviews = pd.read_sql_query("""
SELECT *
FROM review r
WHERE 1=1
AND %(sel_var)s = %(sel_val)s;
""", con = self.cnxn, params = {'sel_var': selected_var,
'sel_val': selected_val
})
I expect the query to return results without my having to build the whole command as a string.
What about string formatting?
input_params = {'sel_var': selected_var,
'sel_val': selected_val}
self.reviews = pd.read_sql_query(""" SELECT * FROM review r WHERE 1=1
AND {sel_var}={sel_val};""".format(**input_params),
con = self.cnxn)
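A possible middle ground, sketched below under some assumptions: only the column name is interpolated, after checking it against a fixed set, and the value is still bound as a parameter. The sketch uses sqlite3 (placeholder ?) so it is runnable on its own; ALLOWED_COLUMNS and the sample rows are hypothetical. It also shows why the original attempt returns nothing: a bound column name is compared as a string literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review (rating INTEGER, author TEXT)")
conn.executemany("INSERT INTO review VALUES (?, ?)", [(5, "ann"), (3, "bob")])

selected_var, selected_val = "rating", 5

# Binding the column name compares the *string* 'rating' with 5, so no row matches.
bound_as_value = conn.execute(
    "SELECT * FROM review WHERE ? = ?", (selected_var, selected_val)
).fetchall()

# Hypothetical whitelist of columns that callers may filter on.
ALLOWED_COLUMNS = {"rating", "author"}
if selected_var not in ALLOWED_COLUMNS:
    raise ValueError(f"unexpected column: {selected_var!r}")

# The checked identifier goes into the SQL text; the value stays a bound parameter.
query = f"SELECT * FROM review WHERE {selected_var} = ?"
rows = conn.execute(query, (selected_val,)).fetchall()
```

The same query string and a params tuple can be handed to pd.read_sql_query instead of conn.execute.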
I want to read a table stored in HANA directly from python. For that I use the following code:
from hdbcli import dbapi
import pandas as pd
conn = dbapi.connect(
address="address",
port=XYZ,
user="user",
password="password"
)
print (conn.isconnected())
# Fetch table data
stmnt = "select * from '_SYS_NAME'.'part1.part2.part3.part4.part5.part6/table_name'"
cursor = conn.cursor()
cursor.execute(stmnt)
result = cursor.fetchall()
print('Create the dataframe')
The problem is in the stmnt line: I tried different ways of writing the path name so that Python can read it as a string, but none of them works. I know the problem is not the approach itself, because the code works if the path is simple and contains no special characters.
I tried all the following combinations (among others):
stmnt = "select * from '_SYS_NAME'.'part1.part2.part3.part4.part5.part6/table_name'"
stmnt = """select * from '_SYS_NAME'.'part1.part2.part3.part4.part5.part6/table_name'"""
stmnt = "select * from \'_SYS_NAME\'\.\'part1.part2.part3.part4.part5.part6/table_name\'"
stmnt = """select * from \'_SYS_NAME\'\.\'part1.part2.part3.part4.part5.part6/table_name\'"""
The error I get is always the following:
hdbcli.dbapi.Error: (257, 'sql syntax error: incorrect syntax near "_SYS_NAME": line 1 col 1 (at pos 1)')
And the original path as I get it from SQL is:
'_SYS_NAME'.'part1.part2.part3.part4.part5.part6/table_name'
Any ideas what I am missing?
You should reverse your quotes: use single quotes for the Python string and double quotes around the SQL identifiers:
stmnt = 'select * from "_SYS_BIC"."rwev.dev.bw.project.si.churn/SI_CV_CHU_7_DATA_MODEL"'
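The underlying rule is standard SQL: double quotes delimit identifiers, single quotes delimit string literals. As a rough sketch of how the quoting could be wrapped in a helper, the example below uses sqlite3 (which follows the same rule) rather than HANA; quote_identifier is a hypothetical helper, and main is simply SQLite's default schema name standing in for the real schema.

```python
import sqlite3

def quote_identifier(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(":memory:")
table = "part1.part2/table_name"  # contains dots and a slash, so it must be quoted

conn.execute(f"CREATE TABLE {quote_identifier(table)} (x INTEGER)")
conn.execute(f"INSERT INTO {quote_identifier(table)} VALUES (42)")

# Schema and table are quoted separately and joined with an unquoted dot.
stmnt = f"SELECT x FROM {quote_identifier('main')}.{quote_identifier(table)}"
rows = conn.execute(stmnt).fetchall()
```

The helper only handles quoting; it is not a substitute for validating names that come from untrusted input.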
In SQLAlchemy, how do I populate or update a table from a SELECT statement?
SQLAlchemy doesn't build this construct for you. You can execute the query as text instead.
session.execute('INSERT INTO t1 (SELECT * FROM t2)')
EDIT:
More than one year later, but now on sqlalchemy 0.6+ you can create it:
from sqlalchemy.ext import compiler
from sqlalchemy.sql.expression import Executable, ClauseElement
class InsertFromSelect(Executable, ClauseElement):
    def __init__(self, table, select):
        self.table = table
        self.select = select
@compiler.compiles(InsertFromSelect)
def visit_insert_from_select(element, compiler, **kw):
    return "INSERT INTO %s (%s)" % (
        compiler.process(element.table, asfrom=True),
        compiler.process(element.select)
    )
insert = InsertFromSelect(t1, select([t1]).where(t1.c.x>5))
print(insert)
Produces:
"INSERT INTO mytable (SELECT mytable.x, mytable.y, mytable.z FROM mytable WHERE mytable.x > :x_1)"
Another EDIT:
Now, 4 years later, the syntax is incorporated in SQLAlchemy 0.9 and backported to 0.8.3. You can create any select() and then use the new from_select() method of Insert objects:
>>> from sqlalchemy.sql import table, column
>>> t1 = table('t1', column('a'), column('b'))
>>> t2 = table('t2', column('x'), column('y'))
>>> print(t1.insert().from_select(['a', 'b'], t2.select().where(t2.c.y == 5)))
INSERT INTO t1 (a, b) SELECT t2.x, t2.y
FROM t2
WHERE t2.y = :y_1
More information in the docs.
As of 0.8.3, you can now do this directly in sqlalchemy: Insert.from_select:
sel = select([table1.c.a, table1.c.b]).where(table1.c.c > 5)
ins = table2.insert().from_select(['a', 'b'], sel)
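One caveat for readers on recent versions: the list form select([...]) used above was deprecated in SQLAlchemy 1.4 and removed in 2.0, where the columns are passed positionally. A minimal sketch of the same statement in that style, assuming SQLAlchemy 1.4 or later and lightweight table() constructs with made-up column names:

```python
from sqlalchemy import column, select, table

# Lightweight table constructs; no engine or metadata is needed just to render SQL.
t1 = table("t1", column("a"), column("b"))
t2 = table("t2", column("x"), column("y"))

# Columns are passed positionally to select() in the 1.4+/2.0 style.
sel = select(t2.c.x, t2.c.y).where(t2.c.y == 5)
ins = t1.insert().from_select(["a", "b"], sel)

# str() renders the statement with the default dialect and bind-parameter names.
sql_text = str(ins)
print(sql_text)
```

To actually run the statement it would be passed to a connection's execute(), which is left out here because it needs a real engine.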
As Noslko pointed out in a comment, you can now get rid of the raw SQL:
http://www.sqlalchemy.org/docs/core/compiler.html#compiling-sub-elements-of-a-custom-expression-construct
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import Executable, ClauseElement
class InsertFromSelect(Executable, ClauseElement):
    def __init__(self, table, select):
        self.table = table
        self.select = select
@compiles(InsertFromSelect)
def visit_insert_from_select(element, compiler, **kw):
    return "INSERT INTO %s (%s)" % (
        compiler.process(element.table, asfrom=True),
        compiler.process(element.select)
    )
insert = InsertFromSelect(t1, select([t1]).where(t1.c.x>5))
print(insert)
Produces:
INSERT INTO mytable (SELECT mytable.x, mytable.y, mytable.z FROM mytable WHERE mytable.x > :x_1)
I'm working with an Oracle database. I can do this much:
import pandas as pd
import pandas.io.sql as psql
import cx_Oracle as odb
conn = odb.connect(_user +'/'+ _pass +'@'+ _dbenv)
sqlStr = "SELECT * FROM customers"
df = psql.frame_query(sqlStr, conn)
But I don't know how to handle bind variables, like so:
sqlStr = """SELECT * FROM customers
WHERE id BETWEEN :v1 AND :v2
"""
I've tried these variations:
params = (1234, 5678)
params2 = {"v1":1234, "v2":5678}
df = psql.frame_query((sqlStr,params), conn)
df = psql.frame_query((sqlStr,params2), conn)
df = psql.frame_query(sqlStr,params, conn)
df = psql.frame_query(sqlStr,params2, conn)
The following works:
curs = conn.cursor()
curs.execute(sqlStr, params)
df = pd.DataFrame(curs.fetchall())
df.columns = [rec[0] for rec in curs.description]
but this solution is just... inelegant. If possible, I'd like to do this without creating the cursor object. Is there a way to do the whole thing using just pandas?
Try using pandas.io.sql.read_sql_query. I used it with pandas version 0.20.1 and it worked:
import pandas as pd
import pandas.io.sql as psql
import cx_Oracle as odb
conn = odb.connect(_user +'/'+ _pass +'@'+ _dbenv)
sqlStr = """SELECT * FROM customers
WHERE id BETWEEN :v1 AND :v2
"""
pars = {"v1":1234, "v2":5678}
df = psql.read_sql_query(sqlStr, conn, params=pars)
As far as I can tell, pandas expects that the SQL string be completely formed prior to passing it along. With that in mind, I would (and always do) use string interpolation:
params = (1234, 5678)
sqlStr = """
SELECT * FROM customers
WHERE id BETWEEN %d AND %d
""" % params
print(sqlStr)
which gives
SELECT * FROM customers
WHERE id BETWEEN 1234 AND 5678
So that should feed into psql.frame_query just fine (it does in my experience with Postgres, MySQL, and SQL Server).
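A word of caution on interpolation: it is fine for values you control, but with untrusted input the value becomes part of the SQL text. The sketch below contrasts the two approaches, using sqlite3 and a made-up customers table so it runs on its own; user_input is a hypothetical hostile value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1234, "a"), (5000, "b"), (9999, "c")])

# Hypothetical hostile input that would normally be a single id.
user_input = "0 OR 1=1"

# Interpolated: the input is parsed as SQL, so the WHERE clause matches every row.
interpolated = "SELECT id FROM customers WHERE id = %s ORDER BY id" % user_input
leaked = conn.execute(interpolated).fetchall()

# Bound: the input is treated as one opaque value and matches nothing.
bound = conn.execute(
    "SELECT id FROM customers WHERE id = ? ORDER BY id", (user_input,)
).fetchall()

# The original range query, with both limits bound instead of interpolated.
in_range = conn.execute(
    "SELECT id FROM customers WHERE id BETWEEN ? AND ? ORDER BY id", (1234, 5678)
).fetchall()
```

The interpolated query returns all three ids, the bound one returns none, and the bound range query returns the two ids between the limits.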