I'm trying to create a dynamic PostgreSQL query in Python:
import psycopg2
import pandas as pd
V_ID=111
conn = psycopg2.connect(host="xxxx", port = 5432, database="xxxx", user="xxxxx", password="xxx")
df = pd.read_sql_query("""SELECT u.user_name, sum(s.sales_amount)
FROM public.users u left join public.sales s on u.id=s.user_id
WHERE USER_ID = v_ID
Group by u.user_name""", con=conn)
df.head()
When I set a fixed ID, the query runs fine, but if I use the variable V_ID I get an error.
Help me please: how do I properly place a variable within a query?
You can use string formatting to pass the value in the query string. You can read more about string formatting here: https://www.w3schools.com/python/ref_string_format.asp
v_ID = "v_id that you want to pass"
query = """SELECT u.user_name, sum(s.sales_amount)
FROM public.users u left join public.sales s on u.id=s.user_id
WHERE USER_ID = {v_ID}
Group by u.user_name""".format(v_ID=v_ID)
df = pd.read_sql_query(query, con=conn)
As mentioned by @Adrian in the comments, string formatting is not the right way to do this.
More details here: https://www.psycopg.org/docs/usage.html#passing-parameters-to-sql-queries
As per the docs, params can be a list, tuple or dict.
The syntax used to pass parameters is database driver dependent. Check your database driver documentation for which of the five syntax styles, described in PEP 249's paramstyle, is supported. E.g. psycopg2 uses %(name)s, so use params={'name': 'value'}.
Here is how you can do this in the case of psycopg2.
v_ID = "v_id that you want to pass"
query = """SELECT u.user_name, sum(s.sales_amount)
FROM public.users u left join public.sales s on u.id=s.user_id
WHERE USER_ID = %(v_ID)s
Group by u.user_name"""
df = pd.read_sql_query(query, con=conn, params={"v_ID": v_ID})
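For completeness, here is a minimal sketch of the same parameterized query run without pandas, using the psycopg2 cursor directly (it reuses the conn and V_ID defined in the question):
with conn.cursor() as cur:
    cur.execute(
        """SELECT u.user_name, sum(s.sales_amount)
           FROM public.users u left join public.sales s on u.id=s.user_id
           WHERE USER_ID = %(v_ID)s
           Group by u.user_name""",
        {"v_ID": V_ID},  # the dict key matches the %(v_ID)s placeholder
    )
    rows = cur.fetchall()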
Related
I'm trying to run SQL statements through Python over a list, in this case a list of dates, since I want to run multiple SELECT queries and return their results.
I've tested this by passing in integers; however, when trying to pass in a date I get an ORA-01036 error: illegal variable name/number. I'm using an Oracle DB.
cursor = connection.cursor()
date = ["'01-DEC-21'", "'02-DEC-21'"]
sql = "select * from table1 where datestamp = :date"
for item in date:
    cursor.execute(sql, id=item)
    res = cursor.fetchall()
    print(res)
Any suggestions to make this run?
You can't name a bind variable date; it's an illegal name. Also, the keyword argument in cursor.execute should match the bind variable name. Try something like:
sql = "select * from table1 where datestamp = :date_input"
for item in date:
    cursor.execute(sql, date_input=item)
    res = cursor.fetchall()
    print(res)
Some recommendations and warnings about your approach:
You should not depend on your default NLS date setting while binding a string (e.g. "'01-DEC-21'") to a DATE column. (You probably also need to remove one set of quotes.)
You should avoid fetching data in a loop if you can fetch it in one query (using an IN list).
Use a prepared statement.
Example
date = ['01-DEC-21', '02-DEC-21']
This generates the query that uses bind variables for your input list
in_list = ','.join([f" TO_DATE(:d{ind},'DD-MON-RR','NLS_DATE_LANGUAGE = American')" for ind, d in enumerate(date)])
sql_query = "select * from table1 where datestamp in ( " + in_list + " )"
The generated sql_query is:
select * from table1 where datestamp in
( TO_DATE(:d0,'DD-MON-RR','NLS_DATE_LANGUAGE = American'), TO_DATE(:d1,'DD-MON-RR','NLS_DATE_LANGUAGE = American') )
Note that the IN list contains one bind variable for each member of your input list.
Note also the use of TO_DATE with an explicit format mask and a fixed date language, to avoid problems with the interpretation of the month abbreviation (e.g. ORA-01843: not a valid month).
Now you can use the query to fetch the data in one pass
cur.prepare(sql_query)
cur.execute(None, date)
res = cur.fetchall()
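For reference, the same binds can also be supplied as a dict keyed by the generated placeholder names instead of relying on positional binding (a small sketch, assuming the cx_Oracle connection from the question):
cur = connection.cursor()
binds = {f"d{ind}": d for ind, d in enumerate(date)}  # keys d0, d1, ... match the :d0, :d1 placeholders
cur.execute(sql_query, binds)
res = cur.fetchall()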
I am trying to use a registered virtual table as a table in a SQL statement using a connection to another database. I can't just turn the column into a string and use that; I need the table/dataframe itself to work in the statement and join with the other tables in the SQL statement. I'm trying this out on an Access database to start. This is what I have so far:
import os
import pyodbc
import pandas as pd
import duckdb
conn = duckdb.connect()
starterset = pd.read_excel(r'e:\Data Analytics\Python_Projects\Applications\DB_Test.xlsx')
conn.register("test_starter", starterset)
IDS = conn.execute("SELECT * FROM test_starter WHERE ProjectID > 1").fetchdf()
StartDate = '1/1/2015'
EndDate = '12/1/2021'
# establish the connection
connt = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=E:\Databases\Offline.accdb;')
cursor = conn.cursor()
# Run the query
query = ("Select ProjectID, Revenue, ClosedDate from Projects INNER JOIN " + IDS + " Z on Z.ProjectID = Projects.ProjectID "
"where ClosedDate between #" + StartDate + "# and #" + EndDate + "# AND Revenue > 0 order by ClosedDate")
df = pd.read_sql(query, connt)
df.to_excel(r'TEMP.xlsx', index=False)
os.system("start EXCEL.EXE TEMP.xlsx")
# Close the connection
cursor.close()
connt.close()
I have a list of IDs in the Excel sheet that I'm trying to use as a filter in the database query. Ultimately, this will turn into several criteria from the same table: dates, revenue, and IDs among others.
Honestly, I'm surprised I'm having so much trouble doing this. In SAS, with PROC SQL, it's so easy, but I can't get a dataframe to slot into the SQL the way I need it to. Am I making a syntax mistake?
The most common error so far is "UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U55'), dtype('<U55')) -> dtype('<U55')", even though the types are the same.
It looks like you are pushing the contents of a DataFrame into an Access database query. I don't think there is a native way to do this in Pandas. The technique I use is database vendor specific, but I just build up a text string as either a CTE/WITH Clause or a temporary table.
Ex:
"""WITH my_data as (
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
...
)
[Your original query here]
"""
I am working in Python/Django with a MySQL DB. This SQL query works fine in MySQL Workbench:
SELECT * FROM frontend_appliance_sensor_reading
WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading WHERE sensor_id = hex(x'28E86479972003D2') AND
appliance_id = 185)
I am returning the latest record for a sensor. The hex string is the sensor ID that I need to pass in as a variable in Python.
Here is my Python function that should return this object:
def get_last_sensor_reading(appliance_id, sensor_id):
    dw_conn = connection
    dw_cur = dw_conn.cursor()
    appliance_id_lookup = appliance_id
    dw_cur.execute('''
        SELECT * FROM frontend_appliance_sensor_reading as sr
        WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading WHERE sensor_id = hex(x{sensor_id_lookup}) AND
        appliance_id = {appliance_id_lookup})
        '''.format(appliance_id_lookup=appliance_id_lookup, sensor_id_lookup=str(sensor_id)))
    values = dw_cur.fetchall()
    dw_cur.close()
    dw_conn.close()
    print values
However, it seems to concatenate the x with the variable, like this:
(1054, "Unknown column 'x9720032F0100DE86' in 'where clause'")
I have tried various string combinations to get it to execute correctly, with no luck. What am I missing? Does the x actually get interpreted as a str? Or should I be converting it to something else first?
Also, I cannot use Django's ORM for this, as the sensor ID is stored in a BinaryField as BLOB data, and you cannot filter by BLOB data in Django. This is the reason I am using a raw SQL command instead of just doing SensorReading.objects.filter(sensor_id=sensor).latest('id').
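One thing worth ruling out (a sketch, not a confirmed fix): since hex(x'28E8...') evaluates to the plain hex string, the value could be bound as an ordinary query parameter instead of being interpolated into the SQL text, using Django's default %s placeholder style. This assumes sensor_id already arrives as that hex string:
dw_cur.execute(
    '''SELECT * FROM frontend_appliance_sensor_reading
       WHERE id = (SELECT max(id) FROM frontend_appliance_sensor_reading
                   WHERE sensor_id = %s AND appliance_id = %s)''',
    [str(sensor_id), appliance_id],  # bound as parameters, no string formatting
)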
I'm trying to do this query in sqlalchemy
SELECT id, name FROM user WHERE id IN (123, 456)
I would like to bind the list [123, 456] at execution time.
How about
session.query(MyUserClass).filter(MyUserClass.id.in_((123,456))).all()
edit: Without the ORM, it would be
session.execute(
select(
[MyUserTable.c.id, MyUserTable.c.name],
MyUserTable.c.id.in_((123, 456))
)
).fetchall()
select() takes two parameters, the first one is a list of fields to retrieve, the second one is the where condition. You can access all fields on a table object via the c (or columns) property.
Assuming you use the declarative style (i.e. ORM classes), it is pretty easy:
query = db_session.query(User.id, User.name).filter(User.id.in_([123,456]))
results = query.all()
db_session is your database session here, while User is the ORM class with __tablename__ equal to "users".
An alternative way is to use raw SQL mode with SQLAlchemy. I use SQLAlchemy 0.9.8, Python 2.7, MySQL 5.x, and MySQL-Python as the connector; in this case, a tuple is needed. My code is listed below:
from sqlalchemy import text

id_list = [1, 2, 3, 4, 5]  # in most cases we have an integer list or set
s = text('SELECT id, content FROM myTable WHERE id IN :id_list')
conn = engine.connect()  # get a MySQL connection
rs = conn.execute(s, id_list=tuple(id_list)).fetchall()
Hope everything works for you.
Just wanted to share my solution using sqlalchemy and pandas in python 3. Perhaps, one would find it useful.
import sqlalchemy as sa
import pandas as pd
engine = sa.create_engine("postgresql://postgres:my_password@my_host:my_port/my_db")
values = [val1,val2,val3]
query = sa.text("""
SELECT *
FROM my_table
WHERE col1 IN :values;
""")
query = query.bindparams(values=tuple(values))
df = pd.read_sql(query, engine)
With the expression API, which based on the comments is what this question is asking for, you can use the in_ method of the relevant column.
To query
SELECT id, name FROM user WHERE id in (123,456)
use
myList = [123, 456]
select = sqlalchemy.sql.select([user_table.c.id, user_table.c.name], user_table.c.id.in_(myList))
result = conn.execute(select)
for row in result:
process(row)
This assumes that user_table and conn have been defined appropriately.
Or maybe use .in_(list), similar to what @Carl has already suggested:
stmt = select(
    id,
    name
).where(
    id.in_(idlist),
)
Complete code assuming you have the data model in User class:
import pandas as pd
from sqlalchemy import select
from sqlalchemy.exc import SQLAlchemyError
from sqlalchemy.orm import Session

def fetch_name_ids(engine, idlist):
    # create an empty dataframe
    df = pd.DataFrame()
    try:
        # create session with engine
        session = Session(engine, future=True)
        stmt = select(
            User.id,
            User.name
        ).where(
            User.id.in_(idlist),
        )
        data = session.execute(stmt)
        df = pd.DataFrame(data.all())
        if len(df) > 0:
            df.columns = data.keys()
        else:
            columns = data.keys()
            df = pd.DataFrame(columns=columns)
    except SQLAlchemyError as e:
        session.rollback()
        # surface the underlying driver error after rolling back
        raise RuntimeError(str(e.__dict__['orig'])) from e
    else:
        session.commit()
    finally:
        engine.dispose()
        session.close()
    return df
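Usage would then look something like this (the connection string is a placeholder):
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@host:5432/dbname")  # placeholder credentials
df = fetch_name_ids(engine, [123, 456])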
This question posted a solution for the SELECT query; unfortunately, it did not work for the UPDATE query. The solution below helps with SELECT conditions as well.
Update Query Solution:
id_list = [1, 2, 3, 4, 5] # in most cases we have an integer list or set
query = 'update myTable set content = 1 WHERE id IN {id_list}'.format(id_list=tuple(id_list))
conn.execute(query)
Note: Use a tuple instead of a list.
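One caveat with the tuple trick: a one-element list renders as "(1,)" with a trailing comma, which most databases reject. A small workaround sketch (integer ids assumed) is to format the IN list yourself:
id_list = [1]
in_list = "({})".format(", ".join(str(int(i)) for i in id_list))  # "(1)", no trailing comma
query = 'update myTable set content = 1 WHERE id IN ' + in_list
conn.execute(query)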
Just an addition to the answers above.
If you want to execute a SQL with an "IN" statement you could do this:
ids_list = [1,2,3]
query = "SELECT id, name FROM user WHERE id IN %s"
args = [(ids_list,)] # Don't forget the "comma", to force the tuple
conn.execute(query, args)
Two points:
There is no need for parentheses around the IN placeholder (like "... IN (%s)"); just write "... IN %s".
Force the list of your ids to be one element of a tuple. Don't forget the comma: (ids_list,)
EDIT
Watch out: if the length of the list is one or zero, this will raise an error!
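A small sketch of guarding the empty-list case mentioned above, following the conn.execute call style of this answer (with psycopg2 itself you would call cursor.execute; a one-element tuple is adapted to "(42)", so single ids are fine):
ids_list = [42]

if ids_list:
    args = [(tuple(ids_list),)]  # tuple, so the driver renders it as (42)
    result = conn.execute("SELECT id, name FROM user WHERE id IN %s", args)
else:
    result = []  # IN () would be a syntax error, so skip the query for an empty list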
I'm working with a database that has a relationship that looks like:
class Source(Model):
id = Identifier()
class SourceA(Source):
source_id = ForeignKey('source.id', nullable=False, primary_key=True)
name = Text(nullable=False)
class SourceB(Source):
source_id = ForeignKey('source.id', nullable=False, primary_key=True)
name = Text(nullable=False)
class SourceC(Source, ServerOptions):
source_id = ForeignKey('source.id', nullable=False, primary_key=True)
name = Text(nullable=False)
What I want to do is join all tables Source, SourceA, SourceB, SourceC and then order_by name.
Sounds easy to me, but I've been banging my head on this for a while now and my head's starting to hurt. Also, I'm not very familiar with SQL or sqlalchemy, so there's been a lot of browsing the docs, but to no avail. Maybe I'm just not seeing it. This seems to be close, albeit related to a newer version than what I have available (see versions below).
I feel close, not that that means anything. Here's my latest attempt, which seems good up until the order_by call.
Sources = [SourceA, SourceB, SourceC]
# list of join on Source
joins = [session.query(Source).join(source) for source in Sources]
# union the list of joins
query = joins.pop(0).union_all(*joins)
query seems right at this point as far as I can tell, i.e. query.all() works. So now I try to apply order_by, which doesn't throw an error until .all() is called.
Attempt 1: I just use the attribute I want
query.order_by('name').all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) column "name" does not exist
Attempt 2: I just use the defined column attribute I want
query.order_by(SourceA.name).all()
# throws sqlalchemy.exc.ProgrammingError: (ProgrammingError) missing FROM-clause entry for table "SourceA"
Is it obvious? What am I missing? Thanks!
versions:
sqlalchemy.version = '0.8.1'
(PostgreSQL) 9.1.3
EDIT
I'm dealing with a framework that wants a handle to a query object. I have a bare query that appears to accomplish what I want but I would still need to wrap it in a query object. Not sure if that's possible. Googling ...
select = """
select s.*, a.name from Source s inner join SourceA a on s.id = a.Source_id
union
select s.*, b.name from Source s inner join SourceB b on s.id = b.Source_id
union
select s.*, c.name from Source s inner join SourceC c on s.id = c.Source_id
ORDER BY "name";
"""
selectText = text(select)
result = session.execute(selectText)
# how to put result into a query. maybe Query(selectText)? googling...
result.fetchall()
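For the "wrap it in a query object" part, one possibility worth trying (a sketch, not verified against this schema) is Query.from_statement, which attaches a textual statement to an ORM query:
from sqlalchemy import text

stmt = text(select)  # `select` is the raw SQL string defined above
query = session.query(Source).from_statement(stmt)
results = query.all()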
Assuming that the coalesce function is good enough, the examples below should point you in the right direction. One option automatically creates the list of children, while the other is explicit.
This is not the query you specified in your edit, but it lets you sort (your original request):
from sqlalchemy import func
from sqlalchemy.orm import with_polymorphic

def test_explicit():
    # specify all children tables to be queried
    Sources = [SourceA, SourceB, SourceC]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)

def test_implicit():
    # get all children tables in the query
    from sqlalchemy.orm import class_mapper
    _map = class_mapper(Source)
    Sources = [
        _smap.class_
        for _smap in _map.self_and_descendants
        if _smap != _map  # note: exclude the base class, it has no `name`
    ]
    AllSources = with_polymorphic(Source, Sources)
    name_col = func.coalesce(*(_s.name for _s in Sources)).label("name")
    query = session.query(AllSources).order_by(name_col)
    for x in query:
        print(x)
Your first attempt sounds like it isn't working because there is no name in Source, which is the root table of the query. In addition, there will be multiple name columns after your joins, so you will need to be more specific. Try
query.order_by('SourceA.name').all()
As for your second attempt, what is ServerA?
query.order_by(ServerA.name).all()
Probably a typo, but not sure if it's for SO or your code. Try:
query.order_by(SourceA.name).all()