I am trying to join two tables in Python (using Windows and a Jupyter notebook).
Table 1 is an Excel file read in using pandas.
TABLE_1= pd.read_excel('my_file.xlsx')
Table 2 is a large table in an Oracle database that I can connect to using pyodbc. I can read in the entire table successfully like this, but it takes a very long time to run.
sql = "SELECT * FROM ORACLE.table_2"
cnxn = odbc.connect(##########)
TABLE_2 = pd.read_sql(sql, cnxn)
So I would like to do an inner join as part of the pyodbc import, so that it runs faster and I only pull in the needed records. Table 1 and Table 2 share the same unique identifier/primary key.
sql = "SELECT * FROM ORACLE.TABLE_1 INNER JOIN TABLE_2 ON ORACLE.TABLE1.ID=TABLE_2.ID"
cnxn = odbc.connect(##########)
TABLE_1_2_JOINED = pd.read_sql(sql, cnxn)
But this doesn't work. I get this error:
DatabaseError: Execution failed on sql 'SELECT * FROM ORACLE.TABLE_1
INNER JOIN TABLE_2 ON ORACLE.TABLE1.ID=TABLE_2.ID': ('42S02', '[42S02]
[Oracle][ODBC][Ora]ORA-00942: table or view does not exist\n (942)
(SQLExecDirectW)')
Is there another way I can do this? It seems very inefficient to have to import an entire table with millions of records when I only need to join a few hundred. Thank you.
Something like this might work.
First do:
MyIds = set(TABLE_1['ID'])
Then:
SQL1 = "CREATE TEMPORARY TABLE MyIds ( ID int );"
Now insert your ids:
SQL2 = "INSERT INTO MyIds (ID) VALUES (%s)"
for element in MyIds:
    cursor.execute(SQL2, (element,))
And lastly
SQL3 = "SELECT * FROM ORACLE.TABLE_2 WHERE TABLE_2.ID IN (SELECT ID FROM MyIds)"
I have used MySQL rather than Oracle, and a different connector to you, but the principles are probably the same. Of course there's a bit more code for the Python-SQL connection and so on. Hope it works; otherwise, try making a regular table rather than a temporary one.
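Alternatively, since you only need a few hundred records, you can skip the temporary table entirely and bind the IDs from the Excel file straight into an IN list. A minimal sketch, assuming the key column is named ID on both sides (note that Oracle caps an IN list at 1000 entries, so chunk the IDs if you ever have more):
import pandas as pd
import pyodbc

TABLE_1 = pd.read_excel('my_file.xlsx')
cnxn = pyodbc.connect(##########)

# One "?" placeholder per ID, so the values are bound rather than interpolated.
ids = TABLE_1['ID'].unique().tolist()
placeholders = ', '.join('?' for _ in ids)
sql = f"SELECT * FROM ORACLE.TABLE_2 WHERE ID IN ({placeholders})"

# Pull only the matching rows, then finish the join in pandas.
TABLE_2_SUBSET = pd.read_sql(sql, cnxn, params=ids)
TABLE_1_2_JOINED = TABLE_1.merge(TABLE_2_SUBSET, on='ID', how='inner')
This keeps the heavy filtering on the database side and does the final merge on only a few hundred rows in pandas.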
I have an MS Access DB and want to work with it from Python. The aim is to have a table, "units", which includes everything; to achieve that, I would like to insert information into a table "units_temp" and then join these two tables.
The code is not complete yet, but at the moment I am struggling with populating a random ID (its only purpose is to save me from manually changing the ID in the code every time I test it before all the functionality is in place).
import pyodbc
from random import randint
random_id = randint(0,10000)
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:\Users\aaa.bbb\Documents\Python Scripts\DB\Test_db.accdb;')
cursor = conn.cursor()
mySql_insert_query='''INSERT INTO Units (client_id,client_first_name,client_last_name,units_ordered,product_price_per_unit,product_name) VALUES (%s,%s,%s,%s,%s,%s)'''
cursor.execute('''
INSERT INTO Units (client_id,client_first_name,client_last_name,units_ordered,product_price_per_unit,product_name)
VALUES ('124','aa','bb','2','500','phones')
''')
recordTuple = (random_id,'aa','bb','99','900','random')
cursor.execute(mySql_insert_query,recordTuple)
JoinQuery = "SELECT Units.client_id from Units INNER JOIN Units_temp on (Units.client_id=Units_temp.Client_id)"
cursor.execute(JoinQuery)
conn.commit()
I get the following error:
ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Microsoft Access Driver] Syntax error in query expression '%s'. (-3100) (SQLPrepare)")
pyodbc uses ? as the parameter placeholder, not %s, so your query string should be
mySql_insert_query='''INSERT INTO Units (client_id,client_first_name, … ) VALUES (?,?, … )'''
then you execute it with
cursor.execute(mySql_insert_query,recordTuple)
as before.
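Spelled out with the full column list from your question, that is (a sketch of just the fix; only the placeholders change):
mySql_insert_query = '''INSERT INTO Units (client_id, client_first_name, client_last_name, units_ordered, product_price_per_unit, product_name) VALUES (?,?,?,?,?,?)'''
recordTuple = (random_id, 'aa', 'bb', '99', '900', 'random')
cursor.execute(mySql_insert_query, recordTuple)
conn.commit()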
It's the line
mySql_insert_query='''INSERT INTO Units (client_id,client_first_name,client_last_name,units_ordered,product_price_per_unit,product_name) VALUES (%s,%s,%s,%s,%s,%s)'''
You have no values for the %s parameters. Replace these with values, as you do in the next line.
Also, I seriously doubt that these fields should be text rather than numbers:
units_ordered,product_price_per_unit
I have a MySQL database of some measurements taken by a device, and I'm looking for a way to retrieve specific columns from it, where the user chooses which columns he needs from a Python interface/front end. All the solutions I've seen so far either retrieve all columns or have the columns specified in the code itself.
Is there a possible way I could do this?
Thanks!
Your query can look something like this:
select
table_name, table_schema, column_name
from information_schema.columns
where table_schema in ('schema1', 'schema2')
and column_name like '%column_name%'
order by table_name;
You can definitely pass column_name as a parameter (fetched from the Python code) and run the query dynamically.
import MySQLdb
import MySQLdb.cursors

# Get the column name from the user
column = input()

# Open the database connection; DictCursor returns each row as a dict
# keyed by column name, so row[column] below works as intended
db = MySQLdb.connect("host", "username", "password", "DB_name",
                     cursorclass=MySQLdb.cursors.DictCursor)

# Prepare a cursor object using the cursor() method
cursor = db.cursor()

# Execute the SQL query using the execute() method
cursor.execute("SELECT * FROM TABLE")

# Fetch all rows using the fetchall() method
result_set = cursor.fetchall()
for row in result_set:
    print(row[column])

# Disconnect from the server
db.close()
Or you can use .execute() to run a specific query that selects only that column.
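One caveat with that approach: a column name cannot be passed as a bind parameter, so if you interpolate it into the query you should validate it first. A hedged sketch, where my_table is a placeholder table name:
import MySQLdb

db = MySQLdb.connect("host", "username", "password", "DB_name")
cursor = db.cursor()

column = input()

# Column names cannot be bound like values, so confirm the requested
# name really is a column of the table before interpolating it.
cursor.execute(
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s AND column_name = %s",
    ("DB_name", "my_table", column),
)
if cursor.fetchone() is None:
    raise ValueError("Unknown column: " + column)

# Safe to interpolate now that the name is verified.
cursor.execute("SELECT `" + column + "` FROM my_table")
for (value,) in cursor.fetchall():
    print(value)

db.close()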
I currently use the cx_Oracle library in Python to work with my Oracle database.
import cx_Oracle as Cx
# Parameters for the server connection
dsn_tns = Cx.makedsn(_ip, _port, service_name=_service_name)
# Connect to the Oracle database
db = Cx.connect(_user, _password, dsn_tns)
# Obtain a cursor to run SQL queries
cursor = db.cursor()
One of my queries inserts a Python dataframe into my Oracle target table, subject to some conditions.
query = """INSERT INTO ORA_TABLE (ID1, ID2)
           SELECT :1, :2
           FROM DUAL
           WHERE (:1 != 'NF' AND :1 NOT IN (SELECT ID1 FROM ORA_TABLE))
              OR (:1 = 'NF' AND :2 NOT IN (SELECT ID2 FROM ORA_TABLE))"""
The goal of this query is to insert only the rows that satisfy the conditions in the WHERE clause.
This query works well when my Oracle target table has few rows. But if the target table has more than 100,000 rows, it is very slow, because the WHERE condition reads through the whole table for every row inserted.
Is there a way to improve the performance of this query, with a join or something else?
End of code:
# Prepare the SQL statement
cursor.prepare(query)
# Run it once per row of the Python dataset
cursor.executemany(None, _py_table.values.tolist())
# Commit the changes to the Oracle database
db.commit()
# Close the cursor
cursor.close()
# Close the server connection
db.close()
Here is a possible solution that could help. The SQL you have contains an OR condition, and only one side of it can be true for a given value. So I would divide it into two parts by checking the value of :1 in the code and constructing two inserts instead of one; at any point in time, only one of them would execute:
If :1 != 'NF', use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:1 NOT IN (SELECT ID1
FROM ORA_TABLE));
And if :1 = 'NF', use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:2 NOT IN (SELECT ID2
FROM ORA_TABLE));
So you check in code what the value of :1 is and, depending on that, use one of the two simplified inserts (a sketch of the Python-side dispatch follows below). Please check that this is functionally the same as the original query, and verify whether it improves the response time.
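A minimal sketch of that dispatch on the Python side, assuming _py_table's columns are ID1 then ID2 (matching the bind order in the question):
# Partition the rows by the value of ID1, then run one executemany per group.
rows = _py_table.values.tolist()
nf_rows = [r for r in rows if r[0] == 'NF']
other_rows = [r for r in rows if r[0] != 'NF']

insert_other = """INSERT INTO ORA_TABLE (ID1, ID2)
                  SELECT :1, :2 FROM DUAL
                  WHERE :1 NOT IN (SELECT ID1 FROM ORA_TABLE)"""
insert_nf = """INSERT INTO ORA_TABLE (ID1, ID2)
               SELECT :1, :2 FROM DUAL
               WHERE :2 NOT IN (SELECT ID2 FROM ORA_TABLE)"""

if other_rows:
    cursor.executemany(insert_other, other_rows)
if nf_rows:
    cursor.executemany(insert_nf, nf_rows)
db.commit()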
Assuming pandas, consider exporting your data to a staging table for the final migration, so that your subquery runs only once and not for every row of the data set. In pandas, you would need to interface with SQLAlchemy to run the to_sql export operation. Note: this assumes your connected user has DROP TABLE and CREATE TABLE privileges.
Also, consider combining both IN subqueries into a single subquery. The inner query below matches exactly the rows your logic excludes, so the insert wraps it in NOT EXISTS to filter them out.
import sqlalchemy
...
engine = sqlalchemy.create_engine("oracle+cx_oracle://user:password@dsn")

# EXPORT DATA - ALWAYS REPLACING
pandas_df.to_sql('myTempTable', con=engine, if_exists='replace')

# RUN TRANSACTION
with engine.begin() as cn:
    sql = """INSERT INTO ORA_TABLE (ID1, ID2)
             SELECT t.ID1, t.ID2
             FROM myTempTable t
             WHERE NOT EXISTS
             (
                 SELECT 1 FROM ORA_TABLE sub
                 WHERE (t.ID1 != 'NF' AND t.ID1 = sub.ID1)
                    OR (t.ID1 = 'NF' AND t.ID2 = sub.ID2)
             )
          """
    cn.execute(sqlalchemy.text(sql))
I usually use R to do SQL queries by using ODBC to link to a SQL database. The code generally looks like this:
library(RODBC)
ch<-odbcConnect('B1P HANA',uid='****',pwd='****')
myOffice <- c(0)
office_clause = ""
if (myOffice != 0) {
  office_clause = paste(
    'AND "_all"."/BIC/ZSALE_OFF" IN (', paste(myOffice, collapse = ", "), ')'
  )
}
a <- sqlQuery(ch, paste('
  SELECT "_all"."CALDAY" AS "ReturnDate"
  FROM "SAPB1P"."/BIC/AZ_RT_A212" "_all"
  WHERE "_all"."CALDAY" = 20180101
  ', office_clause, '
  GROUP BY "_all"."CALDAY"
'))
The workflow is:
odbcConnect links R to the SQL database via ODBC.
myOffice is an array of values from R that are used as filter conditions in the SQL WHERE clause.
a stores the query result from the SQL database.
So, how do I do all of this in Python, i.e., run SQL queries in Python by using ODBC to link the SQL database and Python? I am new to Python. All I know is:
import pyodbc
conn = pyodbc.connect(r'DSN=B1P HANA;UID=****;PWD=****')
Then I do not know how to continue, and I cannot find an overall example online. Could anyone help by providing a comprehensive example, from connecting to the SQL database in Python through to retrieving the result?
Execute SQL from Python
Instantiate a Cursor and use the execute method of the Cursor class to execute any SQL statement.
cursor = cnxn.cursor()
Select
You can use fetchall, fetchone, and fetchmany to retrieve rows returned from SELECT statements:
import pyodbc
cnxn = pyodbc.connect('DSN=myDSN;UID=***;PWD=***')
cursor = cnxn.cursor()
cursor.execute("SELECT Col1, Col2 FROM MyTable WHERE Col1= 'SomeValue'")
rows = cursor.fetchall()
for row in rows:
    print(row.Col1, row.Col2)
You can provide parameterized queries in a sequence or in the argument list:
cursor.execute("SELECT Col1, Col2, Col3, ... FROM MyTable WHERE Col1 = ?", 'SomeValue',1)
Insert
INSERT commands also use the execute method; however, you must subsequently call the commit method after an insert or you will lose your changes:
cursor.execute("INSERT INTO MyTable (Col1) VALUES ('SomeValue')")
cnxn.commit()
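If you are inserting many rows, executemany binds the same parameterized statement once per row, which is both safer and faster than building each statement by hand (a sketch using the placeholder names above):
rows = [('Value1',), ('Value2',), ('Value3',)]
cursor.executemany("INSERT INTO MyTable (Col1) VALUES (?)", rows)
cnxn.commit()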
Update and Delete
As with an insert, you must also call commit after calling execute for an update or delete:
cursor.execute("UPDATE MyTable SET Col1= 'SomeValue'")
cnxn.commit()
Metadata Discovery
You can use the getinfo method to retrieve data such as information about the data source and the capabilities of the driver. The getinfo method passes its input through to the ODBC SQLGetInfo function.
cnxn.getinfo(pyodbc.SQL_DATA_SOURCE_NAME)
I'm trying to generate & execute SQL statements via pyodbc. I expect multiple SQL statements, all of which start with the same SELECT & FROM but have a different value in the WHERE. The value in my WHERE clause is derived by looping through a table: for each distinct value the script finds in the table, I need Python to generate another SQL statement with that value in the WHERE clause.
I'm almost there with this; I'm just struggling to get pyodbc to produce my query strings in a format that SQL likes. My code so far:
import pyodbc
cn = pyodbc.connect(connection info)
cursor = cn.cursor()
result = cursor.execute('SELECT distinct searchterm_name FROM table1')
for row in result:
    sql = str("SELECT * from table2 WHERE table1.searchterm_name = {c}".format(c=row)),
    #print sql
This code generates an output like this, where "name here" is based on the value found in table1.
('SELECT * from ifb_person WHERE searchterm_name = (u\'name here\', )',)
I just need to remove all the crap surrounding the query & where clause so it looks like this. Then I can pass it into another cursor.execute()
SELECT * from ifb_person WHERE searchterm_name = 'name here'
EDIT
for row in result:
    cursor.execute("insert into test (searchterm_name) SELECT searchterm_name FROM ifb_person WHERE searchterm_name = ?",
                   (row[0],))
This query fails with the error pyodbc.ProgrammingError: No results. Previous SQL was not a query.
Basically, what I am trying to do is get Python to generate a fresh SQL statement for every result it finds in table1. The second query runs searches against the table ifb_person and inserts the results into a table "test". I want to run a separate SQL statement for every result found in table1.
pyodbc allows us to iterate over a Cursor object to return the rows, during which time the Cursor object is still "in use", so we cannot use the same Cursor object to perform other operations. For example, this code will fail:
crsr = cnxn.cursor()
result = crsr.execute("SELECT ...")  # result is just a reference to the crsr object
for row in result:
    # we are actually iterating over the crsr object
    crsr.execute("INSERT ...")  # this clobbers the previous crsr object ...
    # ... so the next iteration of the for loop fails with "Previous SQL was not a query."
We can work around that by using fetchall() to retrieve all the rows into result ...
result = crsr.execute("SELECT ...").fetchall()
# result is now a list of pyodbc.Row objects and the crsr object is no longer "in use"
... or use a different Cursor object in the loop
crsr_select = cnxn.cursor()
crsr_insert = cnxn.cursor()
crsr_select.execute("SELECT ...")
for row in crsr_select:
    crsr_insert.execute("INSERT ...")
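Applied to the tables in the question (a sketch; the connection details are placeholders), the two-cursor version would look like:
import pyodbc

cnxn = pyodbc.connect('DSN=myDSN;UID=***;PWD=***')
crsr_select = cnxn.cursor()
crsr_insert = cnxn.cursor()

# One cursor drives the loop; the other performs the inserts.
crsr_select.execute("SELECT DISTINCT searchterm_name FROM table1")
for row in crsr_select:
    crsr_insert.execute(
        "insert into test (searchterm_name) "
        "SELECT searchterm_name FROM ifb_person WHERE searchterm_name = ?",
        row[0],
    )
cnxn.commit()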