Why is a Pandas Oracle DB query faster with literals?

When I use the bind variable approach found here: https://cx-oracle.readthedocs.io/en/latest/user_guide/bind.html#bind
and here: Python cx_Oracle bind variables
my query takes around 8 minutes, but when I use hardcoded values (literals), it takes around 20 seconds.
I'm struggling to comprehend what's happening "behind-the-scenes" (variables/memory access/data transfer/query parsing) to see if there's any way for me to adhere to the recommended approach of using bind variables and get the same ~20s performance.
This python script will be automated and the values will be dynamic, so I definitely can't use hardcoded values.
Technical background: Python 3.6; Oracle 11g; cx_Oracle 8
---- python portion of code -----
first version
param_dict = {"startDate": "01-Jul-21", "endDate": "31-Jul-2021"}
conn = (typical database connection code….)
cur = conn.cursor()
###### this query has the bind variables and param_dict keys match bind variable aliases; runtime ~480s (8mins)
cur_df = pandas.DataFrame(cur.execute("inserted_query_here", param_dict))
second version
conn = (typical database connection code….)
cur = conn.cursor()
###### this query has the hardcoded values (literals); runtime ~20s
cur_df = pandas.DataFrame(cur.execute("inserted_query_here"))

@ChristopherJones and Alex, thanks for the referenced articles.
I solved the issue by thoroughly examining the EXPLAIN PLAN for both versions of the query. The faster version (literals) was not using the index, because a full table scan was quicker; the bind variable version was using the index.
I applied a NO_INDEX hint accordingly, and the bind variable version of the query now also runs in ~20s.
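A minimal sketch of what the hinted bind variable query might look like. The original query was not posted, so the table `sales`, its alias `s`, the column `sale_date`, and the helper `fetch_rows` are all hypothetical; only the `:startDate` and `:endDate` bind names come from the question. With no index name given, `NO_INDEX(s)` asks the optimizer to avoid every index on that table.

```python
# Hypothetical query: "sales", alias "s" and "sale_date" are placeholders,
# since the real query was not shown in the question.
HINTED_QUERY = """
    SELECT /*+ NO_INDEX(s) */ s.*
    FROM sales s
    WHERE s.sale_date BETWEEN TO_DATE(:startDate, 'DD-MON-YYYY')
                          AND TO_DATE(:endDate, 'DD-MON-YYYY')
"""


def fetch_rows(connection, start_date, end_date):
    """Run the hinted query with bind variables on a DB-API connection."""
    cur = connection.cursor()
    try:
        # The dict keys must match the bind variable names in the SQL text.
        cur.execute(HINTED_QUERY, {"startDate": start_date, "endDate": end_date})
        return cur.fetchall()
    finally:
        cur.close()
```

Assuming `conn` is an open cx_Oracle connection, the result could then be wrapped as `pandas.DataFrame(fetch_rows(conn, "01-Jul-2021", "31-Jul-2021"))`.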

Related

What is the return of an UPDATE query?

I'm using sqlalchemy in combination with sqlite and the databases library and I'm trying to wrap my head around what that combination returns when doing update queries. I'm running a testcase and I have sqlalchemy set up to roll back upon execution of each testcase via force_rollback=True.
db = databases.Database(DB_URL, force_rollback=True)
query = update(my_table).where(my_table.columns.id == some_id_to_update).values(**values)
res = await db.execute(query)
When working with psql, I'd expect res to be the number of rows that were affected by the UPDATE query, but from reading the documentation, sqlite seems to behave differently in that it doesn't return anything. I tested this manually by connecting to the database via sqlite3 and, as expected, there is no return value when doing UPDATE queries. sqlalchemy, however, does return something, which I assume is the total number of rows in the table, but I'm not sure. Can anybody shed some light on what is actually returned?
What's more, when I tried to get the number of rows affected by the UPDATE query via SELECT changes(), I'm also getting the number of total rows in the table and not the rows affected by the most recent query. Do I have a misunderstanding of what changes() does?

"The changes() function returns the number of database rows that were changed or inserted or deleted by the most recently completed INSERT, DELETE, or UPDATE statement, exclusive of statements in lower-level triggers."
When you use the Python sqlite3 module, you use the .executeXXX interfaces to prepare and evaluate your query. If the query is supposed to modify the database, the modification happens at this stage. You use the same interfaces to prepare a SELECT statement. In either case, the .executeXXX interfaces never return result rows (in sqlite3 they return the cursor object). To get the result of a SELECT query, you have to use a .fetchXXX interface after running .executeXXX.
To get the number of changed rows after INSERT, DELETE, or UPDATE statement via sqlite3, you can also take the difference in con.total_changes before/after running .executeXXX.
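As a rough illustration of the points above, the following sketch uses only the standard sqlite3 module and an in-memory database. The table `my_table` and its contents are made up for the example. It counts the rows changed by one UPDATE in three ways: the difference in `con.total_changes`, the cursor's `rowcount`, and `SELECT changes()`.

```python
import sqlite3


def count_updated_rows():
    """Count rows affected by a single UPDATE in three different ways."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
    con.executemany(
        "INSERT INTO my_table (name) VALUES (?)", [("a",), ("b",), ("c",)]
    )

    before = con.total_changes
    cur = con.execute("UPDATE my_table SET name = ? WHERE id = ?", ("z", 2))

    by_total_changes = con.total_changes - before  # difference before/after
    by_rowcount = cur.rowcount                      # DB-API cursor attribute
    # changes() reports on the most recently completed INSERT/UPDATE/DELETE
    # on this same connection.
    by_changes = con.execute("SELECT changes()").fetchone()[0]

    con.close()
    return by_total_changes, by_rowcount, by_changes
```

Note that `changes()` is tied to the connection, so it has to be queried on the same connection that ran the UPDATE.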

Sequence nextval/currval in two sessions

Setup:
Oracle DB running on a windows machine
Mac connected with the database, both in the same network
Problem:
When I created a sequence in SQL Developer, I can see and use the sequence in this session. If I logoff and login again the sequence is still there. But if I try to use the sequence via Python and cx_Oracle, it doesn't work. It also doesn't work the other way around.
[In SQL Developer: user: uc]
create SEQUENCE seq1;
select seq1.nextval from dual; ---> 1
commit; --> although the create statement is a DDL method, just in case
[login via Python, user: uc]
select seq1.currval from dual;--> ORA-08002 Sequence seq1.currval isn't defined in this session
The python code:
import cx_Oracle
cx_Oracle.init_oracle_client(lib_dir="/Users/benreisinger/Documents/testclients/instantclient_19_8", config_dir=None, error_url=None, driver_name=None)
# Connect as user "uc" with password "uc" to the "orcl" service running on a remote computer.
connection = cx_Oracle.connect("uc", "uc", "10.0.0.22/orcl")
cursor = connection.cursor()
cursor.execute("""
select seq1.currval from dual
""")
print(cursor)
for seq1 in cursor:
    print(seq1)
The error says, that [seq1] wasn't defined in this session, but why does the following work:
select seq1.nextval from dual
--> returns 2
Even after issuing this, I can't use seq1.currval
Btw., select sequence_name from user_sequences returns seq1 in Python
[as SYS user]
select * from v$session
where username = 'uc';
--> returns zero rows
Why is seq1 not in reach for the python program ?
Note: With tables, everything just works fine
EDIT:
also with 'UC' being upper case, no rows returned
first issuing
still doesn't work

Not sure how to explain this. The previous 2 answers are correct, but somehow you seem to miss the point.
First, take everything that is irrelevant out of the equation. Mac client on Windows db: doesn't matter. SQLDeveloper vs python: doesn't matter. The only thing that matters is that you connect twice to the database as the same schema. You connect twice, which means that you have 2 separate sessions, and those sessions don't know about each other. Both sessions have access to the same database objects, so if you execute DDL (e.g. create sequence) in one session, that object will be visible in the other session.
Now to the core of your question. The oracle documentation states
"To use or refer to the current sequence value of your session, reference seq_name.CURRVAL. CURRVAL can only be used if seq_name.NEXTVAL has been referenced in the current user session (in the current or a previous transaction)."
You have 2 different sessions, so according to the documentation, you should not be able to call seq_name.CURRVAL in the other session. That is exactly the behaviour you are seeing.
You ask "Why is seq1 not in reach for the python program ?". The answer is: you're not correct, it is in reach for the python program. You can call seq1.NEXTVAL from any session. But you cannot invoke seq1.NEXTVAL from one session (SQLDeveloper) and then invoke seq1.CURRVAL from another session (python), because that is just how sequences work, as stated in the documentation.
Just to confirm you're not in the same session, execute the following statement for both clients (SQLDeveloper and python):
select sys_context('USERENV','SID') from dual;
You'll notice that the session id is different.
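A small sketch of the rule quoted above: within a single connection (and therefore a single session), reference NEXTVAL first, and only then CURRVAL. The helper name `next_and_current` is hypothetical, and `connection` is assumed to be an already open cx_Oracle connection for the schema that owns `seq1`.

```python
def next_and_current(connection, sequence_name="seq1"):
    """Return (nextval, currval, sid), all fetched over one connection."""
    cur = connection.cursor()
    try:
        # Identifiers cannot be bound, so sequence_name must be trusted input.
        cur.execute(f"select {sequence_name}.nextval from dual")
        (nextval,) = cur.fetchone()

        # CURRVAL is now defined, because NEXTVAL was referenced in this session.
        cur.execute(f"select {sequence_name}.currval from dual")
        (currval,) = cur.fetchone()

        # Session id, to compare against the one reported by the other client.
        cur.execute("select sys_context('USERENV','SID') from dual")
        (sid,) = cur.fetchone()
        return nextval, currval, sid
    finally:
        cur.close()
```

Under these assumptions, `nextval` and `currval` come back equal, while the session id differs from the one shown in SQL Developer.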

CURRVAL returns the last allocated sequence number in the current session. So it only works when we have previously executed a NEXTVAL. So these two statements will return the same value when run in the same session:
select seq1.nextval from dual
/
select seq1.currval from dual
/
It's not entirely clear what you're trying to achieve, but it looks like your python code is executing a single statement for the connection, so it's not tapping into an existing session.
This statement returns zero rows ...
select * from v$session
where username = 'uc';
... because database objects in Oracle are stored in UPPER case (at least by default, and it's wise to stick with that default). So use where username = 'UC' instead.

Python established a new session. In it, sequence hasn't been invoked yet, so its currval doesn't exist. First you have to select nextval (which, as you said, returned 2) - only then currval will make sense.
Saying that
Even after issuing this, I can't use seq1.currval
is hard to believe.
This: select * from v$session where username = 'uc' returned nothing because - by default - all objects are stored in uppercase, so you should have run
.... where username = 'UC'
Finally:
commit; --> although the create statement is a DDL method, just in case
Which case? There's no case. DDL commits. Moreover, it commits twice (before and after the actual DDL statement). And there's nothing to commit either. Therefore, what you did is unnecessary and pretty much useless.

MS SQL Stored Procedures incompletely executed from Python

I have an issue executing a stored procedure from Python. It only gets executed partially. However, when I execute the same procedure from MSSQL Server, I have no issues. I've reviewed my stored procedure several times following inputs from SQL Stored Procedures not finishing when called from Python.
I am unable to figure out why pyodbc would treat the SP below any differently.
Stored Procedure
CREATE PROCEDURE
[dbo].[IVRP_Nodes]
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @Id nchar(16);
    DECLARE @_id nchar(16);
    DECLARE @number int;
    DECLARE Nodes_Cursor CURSOR FOR
        SELECT r._id,r.Number FROM Room r
    OPEN Nodes_Cursor
    FETCH NEXT FROM Nodes_Cursor into @_id,@number;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        DECLARE @RETN as nchar(16);
        Exec SP_GetId Nodes , @RETN OUTPUT;
        set @Id = @RETN;
        INSERT INTO [dbo].[Nodes]([_Id],[RoomNumber],[NodeAddress],
            [NodeType],[NodeState],[_tsUpd])
        VALUES(@Id ,@number,'1','114','0',getdate());
        FETCH NEXT FROM Nodes_Cursor into @_id,@number;
    END;
    CLOSE Nodes_Cursor;
    DEALLOCATE Nodes_Cursor;
END
In Python, I am using following code snippet:
import pyodbc
mydb_lock = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                           'Server=localhost;'
                           'Database=InterelRMS;'
                           'Trusted_Connection=yes;'
                           'MARS_Connection=yes;'
                           'user=sa;'
                           'password=Passw0rd;')
mycursor_lock = mydb_lock.cursor()
sql_nodes = "Exec IVRP_Nodes"
mycursor_lock.execute(sql_nodes)
mydb_lock.commit()
Any assistance or help regarding above matter would be appreciated.
Thanks.

pypyodbc - Invalid cursor state when executing stored procedure in a loop

I have a python program which uses pypyodbc to interact with MSSQL database. A stored procedure is created in MSSQL and is run via python. If I execute the stored procedure only once (via python), there are no problems. However, when it is executed multiple times within a for loop, I get the following error:
pypyodbc.ProgrammingError: ('24000', '[24000] [Microsoft][SQL Server Native Client 11.0]Invalid cursor state')
My python code details are below:
import pypyodbc
# Raw string so that the backslash in the server name is not treated as an escape.
connection_string_prod = r'Driver={SQL Server Native Client 11.0};Server=PSCSQCD1234.TEST.AD.TEST.NET\SQLINS99,2222;Database=Test;Uid=python;Pwd=python;'
connection_prod = pypyodbc.connect(connection_string_prod)
cursor_prod = connection_prod.cursor()
get_files_to_load_query = "Get_Files_To_Load"
files_to_load = cursor_prod.execute(get_files_to_load_query)
for file_name in files_to_load:
    load_query = "Load_Query_Stored_Proc @File_Name = '" + file_name + "'"
    cursor_prod.execute(load_query)
cursor_prod.commit()
cursor_prod.close()
connection_prod.close()
In some posts it was suggested to use "SET NOCOUNT ON" at the top of the SQL stored procedure. I've done that already and it did not help with this issue.
Below is a code of a simple stored procedure that is causing the issue:
CREATE PROCEDURE [dbo].[Test]
AS
SET NOCOUNT ON
INSERT INTO Test(a)
SELECT 1
Why does executing the stored procedure cause an issue only within a for loop?
Please advise.
Thank you!

You are using cursor_prod values to control the for loop and then using that same cursor object inside the loop to run the stored procedure, thus invalidating the previous state of the cursor for controlling the loop. The first iteration of the loop is where the cursor gets overwritten, which is why you don't encounter the error until you try to go through the loop a second time.
You don't need to create a second connection, but you do need to use a second cursor to execute the stored procedure. Or, you could use .fetchall to stuff all of the file names into a files_to_load list and then iterate over that list, freeing up the cursor to run the stored procedure.
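The `.fetchall` variant could be sketched as follows. To keep the example runnable without SQL Server, it uses the standard sqlite3 module as a stand-in for pypyodbc, and the tables `files_to_load` and `loaded_files` are hypothetical replacements for the two stored procedures in the question. The pattern is the point: drain the first result set into a list, then reuse the cursor inside the loop.

```python
import sqlite3


def load_all_files(connection):
    """Fetch every file name first, then reuse the cursor for per-file work."""
    cursor = connection.cursor()
    cursor.execute("SELECT file_name FROM files_to_load ORDER BY file_name")
    # fetchall() drains the result set, so the loop iterates over a plain
    # list and no longer depends on the cursor's state.
    files_to_load = [row[0] for row in cursor.fetchall()]

    for file_name in files_to_load:
        # Stand-in for the stored procedure call; the value is passed as a
        # parameter rather than concatenated into the SQL text.
        cursor.execute(
            "INSERT INTO loaded_files (file_name) VALUES (?)", (file_name,)
        )

    connection.commit()
    cursor.close()
    return files_to_load


def demo():
    """Build a throwaway in-memory database and run the loader once."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE files_to_load (file_name TEXT)")
    connection.execute("CREATE TABLE loaded_files (file_name TEXT)")
    connection.executemany(
        "INSERT INTO files_to_load VALUES (?)",
        [("a.csv",), ("b.csv",), ("c.csv",)],
    )
    processed = load_all_files(connection)
    loaded = connection.execute("SELECT COUNT(*) FROM loaded_files").fetchone()[0]
    connection.close()
    return processed, loaded
```

The alternative from the answer, a second cursor created with `connection.cursor()` and used only inside the loop, achieves the same separation without building the list.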

SqlAlchemy+pymssql. Will raw parametrized queries use same execution plan?

In my application I have parametrized queries like this:
res = db_connection.execute(text("""
SELECT * FROM Luna_gestiune WHERE id_filiala = :id_filiala AND anul=:anul AND luna = :luna
"""),
id_filiala=6, anul=2010, luna=7).fetchone()
Will such query use same query execution plan if I run it in loop with different parameter values?

It seems unlikely. pymssql uses FreeTDS, and FreeTDS performs the parameter substitution before sending the query to the server, unlike some other mechanisms that send the query "template" and the parameters separately (e.g., pyodbc with Microsoft's ODBC drivers, as described in this answer).
That is, for the query you describe in your question, pymssql/FreeTDS will not send a query string like
SELECT * FROM Luna_gestiune WHERE id_filiala = @P1 AND anul = @P2 AND luna = @P3
along with separate values for @P1 = 6, @P2 = 2010, etc. Instead, it will build the literal query first, and then send
SELECT * FROM Luna_gestiune WHERE id_filiala = 6 AND anul = 2010 AND luna = 7
So for each parameterized query you send, the SQL command text will be different, and my understanding is that database engines will only re-use a cached execution plan if the current command text is identical to the cached version.
Edit: Subsequent testing confirms that pymssql apparently does not re-use cached execution plans. Details in this answer.
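A toy illustration of why client-side substitution produces a different command text for every parameter set. This is not the actual pymssql or FreeTDS code; `render_client_side` is a hypothetical helper that only mimics the effect described above for simple numeric values.

```python
import re

TEMPLATE = (
    "SELECT * FROM Luna_gestiune "
    "WHERE id_filiala = :id_filiala AND anul = :anul AND luna = :luna"
)


def render_client_side(template, params):
    """Toy substitution: splice literal values into the SQL text."""
    # Only safe for this numeric demo; real drivers quote and escape values.
    return re.sub(r":(\w+)", lambda m: repr(params[m.group(1)]), template)


def distinct_command_texts(parameter_sets):
    """Return the set of SQL texts the server would see after substitution."""
    return {render_client_side(TEMPLATE, p) for p in parameter_sets}
```

For two different parameter sets, `distinct_command_texts` yields two different strings, whereas a driver that sends the template and the values separately would send one identical command text each time.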
