I have a query I use except.I want to send the table path in format when running the select query.
query_2="""select *
from {}.{}
where date(etl_date) = current_date
except select *
from {}_test.{}
where date(etl_date)=current_date"""
.format(liste[0],liste[1])
But naturally I get an error like this.
IndexError: tuple index out of range
How else can I use the format function here? Thanks...
Do not use simple format for SQL queries; use sql.Identifier for tables, fields and use the second argument of the execute method to pass variables (if needed).
from psycopg2.sql import Identifier, SQL
connection = psycopg2.connect("...")
cursor = connection.cursor()
suffix = "_test"
identifiers = [Identifier("some_schema"), Identifier("some_table"), Identifier("other_schema%s" % suffix), Identifier("other_table")]
query_2 = SQL("""select * from {}.{} where date(etl_date) = current_date
except select * from {}.{} where date(etl_date)=current_date""").format(*identifiers)
print(query_2.as_string(cursor)) # if you want to see the final query
cursor.execute(query_2)
Output
select * from "some_schema"."some_table" where date(etl_date) = current_date
except select * from "other_schema_test"."other_table" where date(etl_date)=current_date
This assumes you have multiple schemas in the same database as you can't easily do cross database queries in PostgreSQL.
Related
I want to use a variable in the Select Query but it give me a fail. Someone can guide me how to insert this variable in my SQL query ?
project = test
# Our SQL Query
Query_1 = """
SELECT * FROM project.dataplex_dq.dq-results`
"""
# labelling our query job
query_job_1 = client.query(Query_1)
# results as a dataframe
Table = query_job_1.result().to_dataframe()
I'm trying to run SQL statements through Python on a list.
By passing in a list, in this case date. Since i want to run multiple SELECT SQL queries and return them.
I've tested this by passing in integers, however when trying to pass in a date I am getting ORA-01036 error. Illegal variable name/number. I'm using an Oracle DB.
cursor = connection.cursor()
date = ["'01-DEC-21'", "'02-DEC-21'"]
sql = "select * from table1 where datestamp = :date"
for item in date:
cursor.execute(sql,id=item)
res=cursor.fetchall()
print(res)
Any suggestions to make this run?
You can't name a bind variable date, it's an illegal name. Also your named variable in cursor.execute should match the bind variable name. Try something like:
sql = "select * from table1 where datestamp = :date_input"
for item in date:
cursor.execute(sql,date_input=item)
res=cursor.fetchall()
print(res)
Some recommendation and warnings to your approach:
you should not depend on your default NLS date setting, while binding a String (e.g. "'01-DEC-21'") to a DATE column. (You probably need also remone one of the quotes).
You should ommit to fetch data in a loop if you can fetch them in one query (using an IN list)
use prepared statement
Example
date = ['01-DEC-21', '02-DEC-21']
This generates the query that uses bind variables for your input list
in_list = ','.join([f" TO_DATE(:d{ind},'DD-MON-RR','NLS_DATE_LANGUAGE = American')" for ind, d in enumerate(date)])
sql_query = "select * from table1 where datestamp in ( " + in_list + " )"
The sql_query generate is
select * from table1 where datestamp in
( TO_DATE(:d0,'DD-MON-RR','NLS_DATE_LANGUAGE = American'), TO_DATE(:d1,'DD-MON-RR','NLS_DATE_LANGUAGE = American') )
Note that the INlist contains one bind variable for each member of your input list.
Note also the usage of to_date with explicite mask and fixing the language to avoid problems with interpretation of the month abbreviation. (e.g. ORA-01843: not a valid month)
Now you can use the query to fetch the data in one pass
cur.prepare(sql_query)
cur.execute(None, date)
res = cur.fetchall()
I am trying to use a registered virtual table as a table in a SQL statement using a connection to another database. I can't just turn the column into a string and use that, I need the table/dataframe itself to work in the statement and join with the other tables in the SQL statment. I'm trying this out on an Access database to start. This is what I have so far:
import pyodbc
import pandas as pd
import duckdb
conn = duckdb.connect()
starterset = pd.read_excel (r'e:\Data Analytics\Python_Projects\Applications\DB_Test.xlsx')
conn.register("test_starter", starterset)
IDS = conn.execute("SELECT * FROM test_starter WHERE ProjectID > 1").fetchdf()
StartDate = '1/1/2015'
EndDate = '12/1/2021'
# establish the connection
connt = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=E:\Databases\Offline.accdb;')
cursor = conn.cursor()
# Run the query
query = ("Select ProjectID, Revenue, ClosedDate from Projects INNER JOIN " + IDS + " Z on Z.ProjectID = Projects.ProjectID "
"where ClosedDate between #" + StartDate + "# and #" + EndDate + "# AND Revenue > 0 order by ClosedDate")
sfd
df = pd.read_sql(query, connt)
df.to_excel(r'TEMP.xlsx', index=False)
os.system("start EXCEL.EXE TEMP.xlsx")
# Close the connection
cursor.close()
connt.close()
I have a list of IDs in the excel sheet that I'm trying to use as a filter from the database query. Ultimately, this will form into several criteria from the same table: dates, revenue, and IDs among others.
Honestly, I'm surprised I'm having so much trouble doing this. In SAS, with PROC SQL, it's so easy, but I can't get a dataframe to interface within the SQL parameters how I need it to. Am I making a syntax mistake?
Most common error so far is "UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U55'), dtype('<U55')) -> dtype('<U55')", but the types are the same.
It looks like you are pushing the contents of a DataFrame into an Access database query. I don't think there is a native way to do this in Pandas. The technique I use is database vendor specific, but I just build up a text string as either a CTE/WITH Clause or a temporary table.
Ex:
"""WITH my_data as (
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
SELECT 'raw_text_within_df' as df_column1, 'raw_text_within_df' as df_column2
UNION ALL
...
)
[Your original query here]
"""
I have a sqlite database named StudentDB which has 3 columns Roll number, Name, Marks. Now I want to fetch only the columns that user selects in the IDE. User can select one column or two or all the three. How can I alter the query accordingly using Python?
I tried:
import sqlite3
sel={"Roll Number":12}
query = 'select * from StudentDB Where({seq})'.format(seq=','.join(['?']*len(sel))),[i for k,i in sel.items()]
con = sqlite3.connect(database)
cur = con.cursor()
cur.execute(query)
all_data = cur.fetchall()
all_data
I am getting:
operation parameter must be str
You should control the text of the query. The where clause shall allways be in the form WHERE colname=value [AND colname2=...] or (better) WHERE colname=? [AND ...] if you want to build a parameterized query.
So you want:
query = 'select * from StudentDB Where ' + ' AND '.join('"{}"=?'.format(col)
for col in sel.keys())
...
cur.execute(query, tuple(sel.values()))
In your code, the query is now a tuple instead of str and that is why the error.
I assume you want to execute a query like below -
select * from StudentDB Where "Roll number"=?
Then you can change the sql query like this (assuming you want and and not or) -
query = "select * from StudentDB Where {seq}".format(seq=" and ".join('"{}"=?'.format(k) for k in sel.keys()))
and execute the query like -
cur.execute(query, tuple(sel.values()))
Please make sure in your code the provided database is defined and contains the database name and studentDB is indeed the table name and not database name.
number_tuple = (1,4,6,3)
sensex_quaterly_df = psql.sqldf("SELECT * FROM sensex_df
WHERE 'Num' IN ('number_tuple')")
"HERE number_tuple has the values that I want to retrieve from sensex_df database"
Because pandasql allows you to run SQL on data frames, you can build SQL with concatenated values of tuple into comma-separated string using string.join().
number_tuple = (1,4,6,3)
in_values = ", ".join(str(i) for i in number_tuple)
sql = f"SELECT * FROM sensex_df WHERE Num IN ({in_values})"
sensex_quaterly_df = psql.sqldf(sql)
However, concatenated SQL strings is not recommended if you use an actual relational database as backend. If so, use parameterization where you develop a prepared SQL statement with placeholders like %s of ? and in subsequent step binding values. Below demonstrates with pandas read_sql:
number_tuple = (1,4,6,3)
in_values = ", ".join('?' for i in number_tuple)
sql = f"SELECT * FROM sensex_df WHERE Num IN ({in_values})"
sensex_quaterly_df = pd.read_sql(sql, conn, params=number_tuple)