number_tuple = (1,4,6,3)
sensex_quaterly_df = psql.sqldf("SELECT * FROM sensex_df
WHERE 'Num' IN ('number_tuple')")
"HERE number_tuple has the values that I want to retrieve from sensex_df database"
Because pandasql lets you run SQL on data frames, you can build the SQL text by joining the tuple's values into a comma-separated string with str.join().
number_tuple = (1,4,6,3)
in_values = ", ".join(str(i) for i in number_tuple)
sql = f"SELECT * FROM sensex_df WHERE Num IN ({in_values})"
sensex_quaterly_df = psql.sqldf(sql)
However, concatenating values into SQL strings is not recommended if you use an actual relational database as the backend. In that case, use parameterization: write a prepared SQL statement with placeholders such as %s or ?, then bind the values in a subsequent step. The example below demonstrates this with pandas read_sql:
number_tuple = (1,4,6,3)
in_values = ", ".join('?' for i in number_tuple)
sql = f"SELECT * FROM sensex_df WHERE Num IN ({in_values})"
sensex_quaterly_df = pd.read_sql(sql, conn, params=number_tuple)
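The read_sql snippet above leaves conn undefined. Here is a minimal, self-contained sketch of the same placeholder technique, assuming an in-memory SQLite database stands in for the real backend. The table contents and the Close column are made up for illustration and are not from the question.

```python
import sqlite3

# Hypothetical stand-in for the real backend: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensex_df (Num INTEGER, Close REAL)")
conn.executemany(
    "INSERT INTO sensex_df VALUES (?, ?)",
    [(n, 100.0 + n) for n in range(1, 8)],
)

number_tuple = (1, 4, 6, 3)

# One "?" per value; only placeholders are formatted into the SQL text.
in_values = ", ".join("?" for _ in number_tuple)
sql = f"SELECT * FROM sensex_df WHERE Num IN ({in_values}) ORDER BY Num"

# The values travel separately and are bound by the driver.
rows = conn.execute(sql, number_tuple).fetchall()
print(rows)
```

With pandas installed, pd.read_sql(sql, conn, params=number_tuple) should accept the same sql and conn and return the rows as a DataFrame.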
I am trying to use a registered virtual table as a table in a SQL statement that runs over a connection to another database. I can't just turn the column into a string and use that; I need the table/dataframe itself to work in the statement and join with the other tables in the SQL statement. I'm trying this out on an Access database to start. This is what I have so far:
import os
import pyodbc
import pandas as pd
import duckdb
conn = duckdb.connect()
starterset = pd.read_excel (r'e:\Data Analytics\Python_Projects\Applications\DB_Test.xlsx')
conn.register("test_starter", starterset)
IDS = conn.execute("SELECT * FROM test_starter WHERE ProjectID > 1").fetchdf()
StartDate = '1/1/2015'
EndDate = '12/1/2021'
# establish the connection
connt = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=E:\Databases\Offline.accdb;')
cursor = conn.cursor()
# Run the query
query = ("Select ProjectID, Revenue, ClosedDate from Projects INNER JOIN " + IDS + " Z on Z.ProjectID = Projects.ProjectID "
"where ClosedDate between #" + StartDate + "# and #" + EndDate + "# AND Revenue > 0 order by ClosedDate")
df = pd.read_sql(query, connt)
df.to_excel(r'TEMP.xlsx', index=False)
os.system("start EXCEL.EXE TEMP.xlsx")
# Close the connection
cursor.close()
connt.close()
I have a list of IDs in the excel sheet that I'm trying to use as a filter from the database query. Ultimately, this will form into several criteria from the same table: dates, revenue, and IDs among others.
Honestly, I'm surprised I'm having so much trouble doing this. In SAS, with PROC SQL, it's so easy, but I can't get a dataframe to interface within the SQL parameters how I need it to. Am I making a syntax mistake?
Most common error so far is "UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U55'), dtype('<U55')) -> dtype('<U55')", but the types are the same.
It looks like you are pushing the contents of a DataFrame into an Access database query. I don't think there is a native way to do this in Pandas. The technique I use is database vendor specific, but I just build up a text string as either a CTE/WITH Clause or a temporary table.
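As a rough sketch of how such a WITH clause could be generated from tabular data, the helper below builds one SELECT per row and binds the cell values through placeholders instead of pasting raw text. The helper name build_cte, the sample rows, and the use of SQLite are all assumptions made for illustration. Whether a WITH clause is accepted at all depends on the database, and this sketch has only been reasoned through against SQLite.

```python
import sqlite3

def build_cte(name, columns, rows):
    """Return (with_clause, params) for a CTE that holds `rows` inline.

    `name` and `columns` are formatted into the SQL text, so they must come
    from trusted code; the cell values are bound through "?" placeholders.
    """
    select_row = "SELECT " + ", ".join(f"? AS {col}" for col in columns)
    body = "\nUNION ALL\n".join(select_row for _ in rows)
    params = [value for row in rows for value in row]
    return f"WITH {name} AS (\n{body}\n)", params

# Hypothetical stand-in for the Projects table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Projects (ProjectID INTEGER, Revenue REAL)")
conn.executemany(
    "INSERT INTO Projects VALUES (?, ?)",
    [(1, 10.0), (2, 20.0), (3, 30.0)],
)

# Rows that would normally come from the DataFrame, e.g. via
# df.itertuples(index=False).
cte_sql, cte_params = build_cte("my_data", ["ProjectID"], [(2,), (3,)])

query = cte_sql + """
SELECT p.ProjectID, p.Revenue
FROM Projects p
INNER JOIN my_data z ON z.ProjectID = p.ProjectID
ORDER BY p.ProjectID
"""
result = conn.execute(query, cte_params).fetchall()
print(result)
```

The helper expects at least one row; an empty list would produce an empty WITH body, so a caller would need to handle that case separately.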
Ex:
"""WITH my_data as (
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
...
)
[Your original query here]
"""
I have a sqlite database named StudentDB which has 3 columns Roll number, Name, Marks. Now I want to fetch only the columns that user selects in the IDE. User can select one column or two or all the three. How can I alter the query accordingly using Python?
I tried:
import sqlite3
sel={"Roll Number":12}
query = 'select * from StudentDB Where({seq})'.format(seq=','.join(['?']*len(sel))),[i for k,i in sel.items()]
con = sqlite3.connect(database)
cur = con.cursor()
cur.execute(query)
all_data = cur.fetchall()
all_data
I am getting:
operation parameter must be str
You should control the text of the query. The WHERE clause should always be of the form WHERE colname=value [AND colname2=...] or (better) WHERE colname=? [AND ...] if you want to build a parameterized query.
So you want:
query = 'select * from StudentDB Where ' + ' AND '.join('"{}"=?'.format(col)
for col in sel.keys())
...
cur.execute(query, tuple(sel.values()))
In your code, the trailing ", [i for k,i in sel.items()]" after the format call makes query a tuple instead of a str, and that is why you get the error.
I assume you want to execute a query like below -
select * from StudentDB Where "Roll number"=?
Then you can change the SQL query like this (assuming you want AND rather than OR) -
query = "select * from StudentDB Where {seq}".format(seq=" and ".join('"{}"=?'.format(k) for k in sel.keys()))
and execute the query like -
cur.execute(query, tuple(sel.values()))
Please make sure that the database variable is defined in your code and holds the database file name, and that StudentDB is indeed the table name and not the database name.
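Both answers come down to the same idea, which can be sketched end to end as follows. The in-memory database, the sample rows, the ALLOWED_COLUMNS whitelist, and the build_query helper are assumptions added for illustration. The whitelist is there because column names are formatted into the SQL text and cannot be bound as parameters.

```python
import sqlite3

# Hypothetical in-memory copy of the table described in the question.
con = sqlite3.connect(":memory:")
con.execute(
    'CREATE TABLE StudentDB ("Roll Number" INTEGER, "Name" TEXT, "Marks" INTEGER)'
)
con.executemany(
    "INSERT INTO StudentDB VALUES (?, ?, ?)",
    [(12, "Asha", 81), (13, "Ravi", 74), (14, "Asha", 90)],
)

ALLOWED_COLUMNS = {"Roll Number", "Name", "Marks"}

def build_query(sel):
    """Return (sql, params) filtering on every column/value pair in `sel`."""
    unknown = set(sel) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    query = "SELECT * FROM StudentDB"
    if sel:
        # Column names go into the text; values stay as "?" placeholders.
        query += " WHERE " + " AND ".join(f'"{col}" = ?' for col in sel)
    return query, tuple(sel.values())

query, params = build_query({"Name": "Asha", "Marks": 90})
all_data = con.execute(query, params).fetchall()
print(query)
print(all_data)
```

Passing the SQL string and the parameter tuple as two separate arguments to execute is what avoids the "operation parameter must be str" error.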
I have a query that uses EXCEPT. I want to pass the table path in through format when running the select query.
query_2="""select *
from {}.{}
where date(etl_date) = current_date
except select *
from {}_test.{}
where date(etl_date)=current_date""".format(liste[0],liste[1])
But naturally I get an error like this.
IndexError: tuple index out of range
How else can I use the format function here? Thanks...
Do not use plain str.format for SQL queries; use sql.Identifier for table and field names, and pass values (if needed) through the second argument of the execute method.
import psycopg2
from psycopg2.sql import Identifier, SQL
connection = psycopg2.connect("...")
cursor = connection.cursor()
suffix = "_test"
identifiers = [Identifier("some_schema"), Identifier("some_table"), Identifier("other_schema%s" % suffix), Identifier("other_table")]
query_2 = SQL("""select * from {}.{} where date(etl_date) = current_date
except select * from {}.{} where date(etl_date)=current_date""").format(*identifiers)
print(query_2.as_string(cursor)) # if you want to see the final query
cursor.execute(query_2)
Output
select * from "some_schema"."some_table" where date(etl_date) = current_date
except select * from "other_schema_test"."other_table" where date(etl_date)=current_date
This assumes you have multiple schemas in the same database as you can't easily do cross database queries in PostgreSQL.
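To see where the IndexError in the question comes from, here is a small plain-Python sketch with no database involved. The schema and table names are hypothetical. It is meant only to explain the error: str.format offers no quoting or escaping, so for real queries the Identifier approach above remains the safer route. As far as I know, psycopg2's SQL.format accepts the same numbered {0}, {1} placeholders.

```python
liste = ("some_schema", "some_table")  # hypothetical schema and table names

# Four auto-numbered "{}" fields need four arguments; two are not enough.
try:
    "{}.{} / {}_test.{}".format(liste[0], liste[1])
    raised = False
except IndexError:
    raised = True

# Numbered fields can reuse an argument, so two arguments cover four slots.
template = """select *
from {0}.{1}
where date(etl_date) = current_date
except select *
from {0}_test.{1}
where date(etl_date) = current_date"""
query_2 = template.format(liste[0], liste[1])
print(query_2)
```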
I have a list of IDs
L1=['A1','A14','B43']
I am trying to use a SQL script to extract information from a table where the ID is in the above list.
sqlquery= "select * from table where ID in " + L1
cur.execute(sqlquery)
I've connected to vertica using vertica_python and sqlalchemy_vertica. But I'm not sure how to incorporate my variable (the list L1) into the sql query.
Updated Code:
data = ['A1', 'A14', 'B43', ...]
placeholders = ','.join('?' * len(data)) # this gives you e.g. '?,?,?'
sqlquery = 'SELECT * FROM table WHERE id IN (%s)' % placeholders
cur.execute(sqlquery, tuple(data))
The docs at https://github.com/vertica/vertica-python show that the Vertica DBAPI implementation uses ? for positional placeholders, so you can use a parametrized query.
Unfortunately, lists cannot be passed directly and need one parameter per element, so you need to generate this part dynamically:
data = ['A1', 'A14', 'B43', ...]
placeholders = ','.join('?' * len(data)) # this gives you e.g. '?,?,?'
sqlquery = 'SELECT * FROM table WHERE id IN (%s)' % placeholders
cur.execute(sqlquery, data)
But you still keep data and SQL separate that way, so there's no risk of SQL injection!
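The injection claim can be checked with a small experiment. The sketch below uses SQLite as a stand-in because it also accepts ? placeholders; the items table, its rows, and the select_by_ids helper are made up for illustration, and behaviour on Vertica itself is assumed rather than tested here.

```python
import sqlite3

# Hypothetical stand-in table; SQLite is used only because it also takes "?".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("A1", "first"), ("A14", "second"), ("B43", "third"), ("C7", "fourth")],
)

def select_by_ids(conn, ids):
    """Return the rows whose id is in `ids`, binding one "?" per element."""
    ids = list(ids)
    if not ids:
        return []  # "IN ()" is rejected by many databases, so skip the query
    placeholders = ",".join("?" * len(ids))
    sql = "SELECT id, label FROM items WHERE id IN (%s) ORDER BY id" % placeholders
    return conn.execute(sql, ids).fetchall()

matches = select_by_ids(conn, ["A1", "B43"])

# A hostile value is compared as plain data; it is never parsed as SQL.
hostile = select_by_ids(conn, ["A1'); DROP TABLE items; --"])
remaining = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(matches, hostile, remaining)
```

Only the string of question marks is formatted into the SQL text, so its shape depends on the length of the list and never on the list's contents.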
I'm trying to pass the same parameters to an Oracle query in two separate places in the SQL code.
My code works if I hard code the criteria for table2 like this:
# define parameters
years = ['2018','2019']
placeholder= ':d'
placeholders= ', '.join(placeholder for unused in years)
placeholders
# create cursor
cursor = connection.cursor()
# query
qry = """
select * from table1
INNER
JOIN table2
ON table1_id = table2_id
where table1_year in (%s)
and table2_year in ['2018','2019'] --here's where I say I'm hard coding criteria
""" % placeholders
data = cursor.execute(qry, years)
df = pd.DataFrame(data.fetchall(), columns = [column[0] for column in cursor.description])
# close database connection
connection.close()
If I try to use the parameter for table2 like this:
qry = """
select * from table1
INNER
JOIN table2
ON table1_id = table2_id
where table1_year in (%s)
and table2_year in (%s) --part of code I'm having issues with
""" % placeholders
I get the following error:
TypeError: not enough arguments for format string
I can't simply rewrite the SQL because I frequently have to use someone else's code and it wouldn't be feasible to rewrite all of it.
If you want to fill multiple %s placeholders, you have to supply the same number of values, passed as a tuple.
"one meal: %s" % "sandwich" # works
"two meals: %s, %s" % "sandwich" # raises TypeError: not enough arguments for format string
"two meals: %s, %s" % ("sandwich", "sandwich") # works
NOTE: Using string formatting to assemble SQL queries is dangerous (look up "SQL Injection"). In your case it is fine, because only the placeholders are formatted into the query and the values are still bound separately, but in general you should use parameterized queries, especially when dealing with input from untrusted sources such as user input. You don't want a user to input "2018; DROP TABLE table1;".
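Applied to the query in the question, that means formatting the placeholder string in twice and then supplying the bind values twice as well. The sketch below runs against an in-memory SQLite database with ? placeholders purely so that it is self-contained; the sample rows are made up, and with the Oracle driver the ":d"-style placeholders from the question would take the place of "?".

```python
import sqlite3

years = ["2018", "2019"]
placeholders = ", ".join("?" for _ in years)

# Two "%s" slots, so the right-hand side must be a tuple with two entries.
qry = """
select * from table1
INNER
JOIN table2
ON table1_id = table2_id
where table1_year in (%s)
and table2_year in (%s)
""" % (placeholders, placeholders)

# Hypothetical sample data standing in for the real Oracle tables.
connection = sqlite3.connect(":memory:")
connection.executescript("""
create table table1 (table1_id integer, table1_year text);
create table table2 (table2_id integer, table2_year text);
insert into table1 values (1, '2018'), (2, '2017');
insert into table2 values (1, '2019'), (2, '2019');
""")

# Each IN list consumes one full copy of the values, in order.
rows = connection.execute(qry, years + years).fetchall()
print(rows)
connection.close()
```

The query now contains four positional placeholders, which is why the values are passed as years + years rather than years alone.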