I currently use the cx_Oracle library in Python to work with my Oracle database.
import cx_Oracle as Cx

# Connection parameters for the server
dsn_tns = Cx.makedsn(_ip, _port, service_name=_service_name)
# Connect to the Oracle database
db = Cx.connect(_user, _password, dsn_tns)
# Obtain a cursor for running SQL queries
cursor = db.cursor()
One of my queries inserts the rows of a Python dataframe into my Oracle target table, subject to some conditions.
query = """
    INSERT INTO ORA_TABLE (ID1, ID2)
    SELECT :1, :2
    FROM DUAL
    WHERE (:1 != 'NF' AND :1 NOT IN (SELECT ID1 FROM ORA_TABLE))
       OR (:1 = 'NF' AND :2 NOT IN (SELECT ID2 FROM ORA_TABLE))
"""
The goal of this query is to insert only the rows that satisfy the conditions in the WHERE clause.
This query works well when my Oracle target table has few rows. But when the target table has more than 100,000 rows, it becomes very slow, because the WHERE condition scans the whole table for every row being inserted.
Is there a way to improve the performance of this query, with a join or something else?
The rest of the code:
# Prepare the SQL statement
cursor.prepare(query)
# Run the query once for each row of the Python dataset
cursor.executemany(None, _py_table.values.tolist())
# Commit the changes to the Oracle database
db.commit()
# Close the cursor
cursor.close()
# Close the server connection
db.close()
Here is a possible solution that could help: your SQL has an OR condition, and only one branch of that condition can be true for a given value of :1. So I would divide it in two: check the value of :1 in the code and construct two inserts instead of one, so that at any point in time only one of them executes.
If :1 != 'NF', use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE :1 NOT IN (SELECT ID1 FROM ORA_TABLE);
and if :1 = 'NF', use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE :2 NOT IN (SELECT ID2 FROM ORA_TABLE);
So you check in code what the value of :1 is and, depending on that, use one of the two simplified inserts. Please check that this is functionally the same as the original query, and verify whether it improves the response time.
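For illustration, a minimal sketch of that split in Python, assuming _py_table is the pandas DataFrame from the question with ID1 and ID2 as its first two columns (that column order is an assumption):

# Partition the rows by the value of ID1, then run one executemany per branch.
rows = _py_table.values.tolist()
nf_rows = [r for r in rows if r[0] == 'NF']
other_rows = [r for r in rows if r[0] != 'NF']

query_other = """
    INSERT INTO ORA_TABLE (ID1, ID2)
    SELECT :1, :2 FROM DUAL
    WHERE :1 NOT IN (SELECT ID1 FROM ORA_TABLE)
"""
query_nf = """
    INSERT INTO ORA_TABLE (ID1, ID2)
    SELECT :1, :2 FROM DUAL
    WHERE :2 NOT IN (SELECT ID2 FROM ORA_TABLE)
"""

if other_rows:
    cursor.executemany(query_other, other_rows)
if nf_rows:
    cursor.executemany(query_nf, nf_rows)
db.commit()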
Assuming Pandas, consider exporting your data to a staging table for the final migration, so that the subquery runs only once rather than for every row of the data set. In Pandas, you would interface with SQLAlchemy to run the to_sql export operation. Note: this assumes your connected user has DROP TABLE and CREATE TABLE privileges.
Also, consider a NOT EXISTS subquery to combine both IN subqueries: the inner query runs the opposite of your logic (a positive match on either condition), and NOT EXISTS then excludes those rows.
import sqlalchemy
...
engine = sqlalchemy.create_engine("oracle+cx_oracle://user:password@dsn")

# EXPORT DATA - ALWAYS REPLACING
pandas_df.to_sql('my_temp_table', con=engine, if_exists='replace')

# RUN TRANSACTION
with engine.begin() as cn:
    sql = """INSERT INTO ORA_TABLE (ID1, ID2)
             SELECT t.ID1, t.ID2
             FROM my_temp_table t
             WHERE NOT EXISTS
                   (
                     SELECT 1 FROM ORA_TABLE sub
                     WHERE (t.ID1 != 'NF' AND t.ID1 = sub.ID1)
                        OR (t.ID1 = 'NF' AND t.ID2 = sub.ID2)
                   )
          """
    cn.execute(sqlalchemy.text(sql))
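One caveat worth flagging: Oracle folds unquoted identifiers to upper case, while to_sql creates the table under a case-sensitive name, so an all-lowercase staging table name (as above) avoids ORA-00942 surprises when the INSERT references it unquoted.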
I have a MySQL database of measurements taken by a device, and I'm looking for a way to retrieve specific columns from it, where the user chooses which columns he needs from a Python interface/front end. All the solutions I've seen so far either retrieve all columns or have the columns hard-coded.
Is there a possible way I could do this?
Thanks!
Your query can look something like this:
select
table_name, table_schema, column_name
from information_schema.columns
where table_schema in ('schema1', 'schema2')
and column_name like '%column_name%'
order by table_name;
You can definitely pass the column_name as a parameter (fetched from the Python code) and run the query dynamically.
import MySQLdb
import MySQLdb.cursors

# Get a column name from the user; it must exist in the table
column = input()

# Open the database connection
db = MySQLdb.connect("host", "username", "password", "DB_name")

# Prepare a cursor that returns rows as dicts, so a row can be
# indexed by column name rather than by position
cursor = db.cursor(MySQLdb.cursors.DictCursor)

# Execute the SQL query using the execute() method
# (my_table is a placeholder table name)
cursor.execute("SELECT * FROM my_table")

# Fetch all rows using the fetchall() method
result_set = cursor.fetchall()
for row in result_set:
    print(row[column])

# Disconnect from the server
db.close()
Alternatively, you can use .execute() to run a query that selects only the requested column.
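Since a column name cannot be bound as a query parameter, here is a hedged sketch of that second approach: validate the user's choice against the table's actual columns (from information_schema) before interpolating it into the SQL text. The table name my_table is the same placeholder as above.

# A plain (tuple) cursor for this sketch
cur = db.cursor()

# Fetch the legal column names for the table, then whitelist the
# user's choice; only then is it safe to splice the name into the SQL.
cur.execute(
    "SELECT column_name FROM information_schema.columns "
    "WHERE table_schema = %s AND table_name = %s",
    ("DB_name", "my_table"),
)
allowed = {row[0] for row in cur.fetchall()}
if column not in allowed:
    raise ValueError("unknown column: " + column)

cur.execute("SELECT `%s` FROM my_table" % column)
for row in cur.fetchall():
    print(row[0])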
I usually use R to do SQL queries by using ODBC to link to a SQL database. The code generally looks like this:
library(RODBC)
ch<-odbcConnect('B1P HANA',uid='****',pwd='****')
myOffice <- c(0)

office_clause = ""
if (myOffice != 0) {
  office_clause = paste(
    'AND "_all"."/BIC/ZSALE_OFF" IN (', paste(myOffice, collapse=", "), ')'
  )
}
a <- sqlQuery(ch, paste('
    SELECT "_all"."CALDAY" AS "ReturnDate"
    FROM "SAPB1P"."/BIC/AZ_RT_A212" "_all"
    WHERE "_all"."CALDAY" = 20180101
    ', office_clause, '
    GROUP BY "_all"."CALDAY"
'))
The workflow is:
odbcConnect links R to the SQL database through ODBC.
myOffice is a vector of values coming from R; they are used as filter conditions in the SQL WHERE clause.
a stores the query result returned by the database.
So, how do I do all of this in Python, i.e., run SQL queries in Python over an ODBC link to the database? I am new to Python. All I know is:
import pyodbc
conn = pyodbc.connect(r'DSN=B1P HANA;UID=****;PWD=****')
Then I do not know how to continue, and I cannot find an end-to-end example online. Could anyone help by providing a comprehensive example, from connecting to the SQL database in Python through to retrieving the result?
Execute SQL from Python
Instantiate a Cursor and use the execute method of the Cursor class to execute any SQL statement.
cursor = cnxn.cursor()
Select
You can use fetchall, fetchone, and fetchmany to retrieve rows returned from SELECT statements:
import pyodbc

cnxn = pyodbc.connect('DSN=myDSN;UID=***;PWD=***')
cursor = cnxn.cursor()
cursor.execute("SELECT Col1, Col2 FROM MyTable WHERE Col1= 'SomeValue'")
rows = cursor.fetchall()
for row in rows:
    print(row.Col1, row.Col2)
You can provide parameterized queries with the values in a sequence or in the argument list; note that each ? marker needs a matching parameter:
cursor.execute("SELECT Col1, Col2, Col3, ... FROM MyTable WHERE Col1 = ? AND Col2 = ?", 'SomeValue', 1)
Insert
INSERT statements also use the execute method; however, you must subsequently call the commit method after an insert, or you will lose your changes:
cursor.execute("INSERT INTO MyTable (Col1) VALUES ('SomeValue')")
cnxn.commit()
Update and Delete
As with an insert, you must also call commit after calling execute for an update or delete:
cursor.execute("UPDATE MyTable SET Col1= 'SomeValue'")
cnxn.commit()
Metadata Discovery
You can use the getinfo method to retrieve data such as information about the data source and the capabilities of the driver. The getinfo method passes its input through to the ODBC SQLGetInfo function.
cnxn.getinfo(pyodbc.SQL_DATA_SOURCE_NAME)
I am trying to join 2 tables in Python. (Using Windows, jupyter notebook.)
Table 1 is an excel file read in using pandas.
TABLE_1= pd.read_excel('my_file.xlsx')
Table 2 is a large table in an Oracle database that I can connect to using pyodbc. I can read in the entire table successfully like this, but it takes a very long time to run.
sql = "SELECT * FROM ORACLE.table_2"
cnxn = odbc.connect(##########)
TABLE_2 = pd.read_sql(sql, cnxn)
So I would like to do the inner join as part of the pyodbc import, so that it runs faster and I only pull in the needed records. Table 1 and Table 2 share the same unique identifier/primary key. I tried:
sql = "SELECT * FROM ORACLE.TABLE_1 INNER JOIN TABLE_2 ON ORACLE.TABLE1.ID=TABLE_2.ID"
cnxn = odbc.connect(##########)
TABLE_1_2_JOINED = pd.read_sql(sql, cnxn)
But this doesn't work. I get this error:
DatabaseError: Execution failed on sql 'SELECT * FROM ORACLE.TABLE_1
INNER JOIN TABLE_2 ON ORACLE.TABLE1.ID=TABLE_2.ID': ('42S02', '[42S02]
[Oracle][ODBC][Ora]ORA-00942: table or view does not exist\n (942)
(SQLExecDirectW)')
Is there another way I can do this? It seems very inefficient to pull in an entire table with millions of records when I only need to join a few hundred. Thank you.
Something like this might work.
First do:
MyIds = set(table_1['id'])
Then:
SQL1 = "CREATE TEMPORARY TABLE MyIds ( ID int );"
Now insert your ids:
SQL2 = "INSERT INTO MyIds.ID %d VALUES %s"
for element in list(MyIds):
cursor.execute(SQL2, element)
And lastly:
SQL3 = "SELECT * FROM ORACLE.TABLE_2 WHERE ORACLE.TABLE_2.ID IN (SELECT ID FROM MyIds)"
I have used MySQL, not Oracle, and a different connector to you, but the principles are probably the same. Of course there's a bit more code for the Python-SQL connection and so on. Note that the final SELECT targets TABLE_2, the table that lives in the database; TABLE_1 is your Excel file. Hope it works; otherwise, try a regular table rather than a temporary one (in Oracle, a temporary table would be a GLOBAL TEMPORARY TABLE).
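A hedged end-to-end sketch of the same idea without a staging table: build a parameterized IN list from the Excel IDs and let the database do the filtering. It assumes the shared key column is named ID in both places and uses a placeholder connection string; also note that Oracle caps an IN list at 1,000 items, so chunk the list if yours is longer.

import pandas as pd
import pyodbc

TABLE_1 = pd.read_excel('my_file.xlsx')
cnxn = pyodbc.connect('DSN=...;UID=...;PWD=...')  # placeholder connection string

# One ? marker per ID, bound as parameters rather than spliced into the SQL
ids = TABLE_1['ID'].unique().tolist()
placeholders = ', '.join('?' for _ in ids)
sql = "SELECT * FROM ORACLE.TABLE_2 WHERE ID IN (%s)" % placeholders
TABLE_2 = pd.read_sql(sql, cnxn, params=ids)

# The join itself then happens in pandas, on the shared key
TABLE_1_2_JOINED = TABLE_1.merge(TABLE_2, on='ID')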
I want to use sqlite3 in Python. I have a table with four columns (id INT, other_no INT, position TEXT, classification TEXT, PRIMARY KEY is id). In this table, the classification column is left empty and will be updated with information from table 2; see my code below. I then have a second table with three columns (id INT, class TEXT, type TEXT, PRIMARY KEY (id)). The two tables thus have two columns in common: in both, the primary key is the id column, and the classification and class columns will eventually have to be merged. So the code needs to go through table 2 and, whenever it finds a matching id in table 1, update the classification column of table 1 from the class column of table 2. The information to build the two tables comes from two separate files.
# function to create Table1...
# function to create Table2...
(The tables are created as expected.) The problem occurs when I try to update table1 with information from table2.
def update_table1():
    con = sqlite3.connect('table1.db', 'table2.db') # I know this is wrong, but how do I connect table2 so that I don't get the error that the Table2 global name is not defined?
    cur = con.cursor()
    if id in Table2 == id in Table1:
        new_classification = Table2.class # so now, instead of NULL, it should have the class information from table2
        cur.execute("UPDATE Table1 SET class = ? WHERE id = ?", (new_classification, id))
        con.commit()
But I get an error for line 2: TypeError: a float is required. I know it's because I put two parameters in the connect method, but if I only connect to table1, I get the error that Table2 is not defined.
I read the post Updating a column in one table through a column in another table. I understand the logic behind it, but I can't translate the SQL into Python. I have been working on this for some time and can't seem to get it. Would you please help? Thanks.
After the comments of a user, I got this code, but it still doesn't work:
#connect to the database containing the two tables
cur.execute("SELECT id FROM Table1")
for row in cur.fetchall():
    row_table1 = row[0]
    cur.execute("SELECT (id, class) FROM Table2")
    for row1 in cur.fetchall():
        row_table2 = row[0] #catches the id
        row_table2_class = row[1] #catches the name
        if row_table1 == row_table2:
            print "yes" #as a test for me to see the loop worked
            new_class = row_table_class
            cur.execute("UPDATE Table1 SET classification=? WHERE id=?", (new_class, row_table1))
            con.commit()
From this, however, I get an OperationalError. I know it's my syntax; as I said, I am new to this, so any guidance is greatly appreciated.
You need a lot more code than what you have there. Your code logic should go something like this:
connect to sqlite db
execute a SELECT query on TABLE2 and fetch rows. Call this rows2.
execute a SELECT query on TABLE1 and fetch rows. Call this rows1.
For every id in rows1, if this id exists in rows2, execute an UPDATE on that particular id in TABLE1.
You are missing SELECT queries in your code:
cur = con.cursor()
if id in Table2 == id in Table1:
new_classification = Table2.class
You can't just test like this directly. You need to first fetch the rows from both tables with SELECT queries before you can compare them the way you want.
Find below a modified version of the code you posted above. I have typed it in directly, so I have not had a chance to test it, but it should give you the idea; it may even run as-is.
Also, this is by no means the most efficient way to do this. It is actually very clunky, because for every id in Table1 you re-fetch all the rows of Table2 to find a match. Instead, you would fetch all the rows of Table1 once, then all the rows of Table2 once, and match them up; or push the whole merge into a single SQL statement, as sketched after the code below. I will leave that optimization up to you.
import sqlite3

# connect to the database containing the two tables
conn = sqlite3.connect("<PUT DB FILENAME HERE>")

cur = conn.execute("SELECT id FROM Table1")
for row in cur.fetchall():
    row_table1_id = row[0]

    cur2 = conn.execute("SELECT id, class FROM Table2")
    for row1 in cur2.fetchall():
        row_table2_id = row1[0]     # catches the id
        row_table2_class = row1[1]  # catches the class
        if row_table1_id == row_table2_id:
            print("yes")  # a test to see that the loop worked
            new_class = row_table2_class
            conn.execute("UPDATE Table1 SET classification=? WHERE id=?", (new_class, row_table1_id))
            conn.commit()
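For the single-statement version of that optimization, a correlated subquery lets SQLite do the matching itself; this is a minimal sketch assuming the schema described in the question (Table1.classification, Table2.class, shared id column):

import sqlite3

conn = sqlite3.connect("<PUT DB FILENAME HERE>")
# For each row of Table1 that has a matching id in Table2,
# copy Table2.class into Table1.classification.
conn.execute("""
    UPDATE Table1
    SET classification = (SELECT class FROM Table2 WHERE Table2.id = Table1.id)
    WHERE id IN (SELECT id FROM Table2)
""")
conn.commit()
conn.close()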
I have a table with a column ID varchar(255) and a done bit.
I want to fetch the first ID found where the bit isn't set, and while fetching also set the bit, so that no other instance of the script uses the same ID and no race condition is possible.
import _mssql
con = _mssql.connect(server='server', user='user', password='password', database='default')
#these two should be a single command
con.execute_query('SELECT TOP 1 ID FROM tableA WHERE done=0')
con.execute_query("UPDATE tableA SET done=1 WHERE ID='" + id_from_above + "'")
for row in con:
    #row['ID'] contains nothing, as the connection was last used for the UPDATE, not the SELECT
    start_function(row['ID'])
Edit (including the suggestion of wewesthemenace):
[...]
con.execute_query('UPDATE tableA SET done = 1 WHERE ID = (SELECT TOP 1 ID FROM tableA WHERE done = 0)')
for row in con:
    #row['ID'] contains nothing, as the connection was last used for the UPDATE, not the SELECT
    start_function(row['ID'])
Working on Microsoft SQL Server Enterprise Edition v9.00.3042.00, i.e. SQL Server 2005 Service Pack 2
Edit 2:
The answered question led me to a follow-up question: While mssql query returns an affected ID use it in a while loop
How about this one?
UPDATE tableA SET done = 1 WHERE ID = (SELECT TOP 1 ID FROM tableA WHERE done = 0)
A possible solution, which works in my situation:
con.execute_query('UPDATE tableA SET done=1 OUTPUT INSERTED.ID WHERE ID=(SELECT TOP(1) ID FROM tableA WHERE done=0)')
for row in con:
    #row['ID'] is exactly one ID where the done bit wasn't set, but now is.
    start_function(row['ID'])
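For what it's worth, the OUTPUT clause is what makes this safe: selecting the row, setting its bit, and returning the affected ID all happen in one atomic statement, so two concurrent script instances cannot claim the same ID, whereas the original two-statement version left a window between the SELECT and the UPDATE.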