SQLAlchemy + pymssql: will raw parametrized queries use the same execution plan? - python

In my application I have parametrized queries like this:
res = db_connection.execute(
    text("""
        SELECT * FROM Luna_gestiune
        WHERE id_filiala = :id_filiala AND anul = :anul AND luna = :luna
    """),
    id_filiala=6, anul=2010, luna=7
).fetchone()
Will such a query use the same execution plan if I run it in a loop with different parameter values?

It seems unlikely. pymssql uses FreeTDS, and FreeTDS performs the parameter substitution before sending the query to the server, unlike some other mechanisms that send the query "template" and the parameters separately (e.g., pyodbc with Microsoft's ODBC drivers, as described in this answer).
That is, for the query you describe in your question, pymssql/FreeTDS will not send a query string like
SELECT * FROM Luna_gestiune WHERE id_filiala = @P1 AND anul = @P2 AND luna = @P3
along with separate values for @P1 = 6, @P2 = 2010, etc. Instead it will build the literal query first, and then send
SELECT * FROM Luna_gestiune WHERE id_filiala = 6 AND anul = 2010 AND luna = 7
So for each parameterized query you send, the SQL command text will be different, and my understanding is that database engines will only re-use a cached execution plan if the current command text is identical to the cached version.
Edit: Subsequent testing confirms that pymssql apparently does not re-use cached execution plans. Details in this answer.
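One way to check this yourself (a sketch, assuming the same db_connection as above and the VIEW SERVER STATE permission) is to run the loop a few times and then look at SQL Server's plan cache: with pymssql you would expect one cache entry per distinct literal statement, whereas a genuinely parameterized query would show a single entry with a growing usecounts.
from sqlalchemy import text

plans = db_connection.execute(text("""
    SELECT cp.usecounts, cp.objtype, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE :pattern
"""), pattern='%Luna_gestiune%').fetchall()

# One row per cached plan; many single-use rows indicate no plan re-use.
for usecounts, objtype, sql_text in plans:
    print(usecounts, objtype, sql_text[:80])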

Related

Why is a Pandas Oracle DB query faster with literals?

When I use the bind variable approach found here: https://cx-oracle.readthedocs.io/en/latest/user_guide/bind.html#bind
and here: Python cx_Oracle bind variables
my query takes around 8 minutes, but when I use hardcoded values (literals), it takes around 20 seconds.
I'm struggling to comprehend what's happening "behind-the-scenes" (variables/memory access/data transfer/query parsing) to see if there's any way for me to adhere to the recommended approach of using bind variables and get the same ~20s performance.
This python script will be automated and the values will be dynamic, so I definitely can't use hardcoded values.
Technical background: Python 3.6; Oracle 11g; cx_Oracle 8
---- python portion of code -----
first version
param_dict = {"startDate": "01-Jul-21", "endDate": "31-Jul-2021"}
conn = (typical database connection code….)
cur = conn.cursor()
###### this query has the bind variables and param_dict keys match bind variable aliases; runtime ~480s (8mins)
cur_df = pandas.DataFrame(cur.execute("inserted_query_here", param_dict))
second version
conn = (typical database connection code….)
cur = conn.cursor()
###### this query has the hardcoded values (literals); runtime ~20s
cur_df = pandas.DataFrame(cur.execute("inserted_query_here"))
@ChristopherJones and Alex, thanks for the referenced articles.
I've been able to solve the issue by thoroughly examining the EXPLAIN PLAN. The query that performed faster wasn't using an index (a full table scan was quicker); the bind-variable version of the query was.
I applied the NO_INDEX hint accordingly and now get the ~20s result for the bind-variable version of the query.
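For reference, a minimal sketch of what the fixed version might look like; the table, column and index alias below are placeholders rather than the original query, and the connection details are dummy values:
import cx_Oracle
import pandas

# Bind variables plus a NO_INDEX hint so the optimizer keeps doing the full table scan.
query = """
    SELECT /*+ NO_INDEX(t) */ *
    FROM my_table t
    WHERE t.start_date >= TO_DATE(:startDate, 'DD-Mon-RR')
      AND t.end_date   <= TO_DATE(:endDate, 'DD-Mon-RR')
"""
param_dict = {"startDate": "01-Jul-21", "endDate": "31-Jul-21"}

conn = cx_Oracle.connect("user", "password", "host/service")
cur = conn.cursor()
cur.execute(query, param_dict)
cur_df = pandas.DataFrame(cur.fetchall(), columns=[d[0] for d in cur.description])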

Sequence nextval/currval in two sessions

Setup:
Oracle DB running on a windows machine
Mac connected with the database, both in the same network
Problem:
When I create a sequence in SQL Developer, I can see and use the sequence in this session. If I log off and log in again, the sequence is still there. But if I try to use the sequence via Python and cx_Oracle, it doesn't work. It also doesn't work the other way around.
[In SQL Developer: user: uc]
create SEQUENCE seq1;
select seq1.nextval from dual; ---> 1
commit; --> although the create statement is a DDL method, just in case
[login via Python, user: uc]
select seq1.currval from dual;--> ORA-08002 Sequence seq1.currval isn't defined in this session
The python code:
import cx_Oracle
cx_Oracle.init_oracle_client(lib_dir="/Users/benreisinger/Documents/testclients/instantclient_19_8", config_dir=None, error_url=None, driver_name=None)
# Connect as user "uc" with password "uc" to the "orcl" service running on a remote computer.
connection = cx_Oracle.connect("uc", "uc", "10.0.0.22/orcl")
cursor = connection.cursor()
cursor.execute("""
select seq1.currval from dual
""")
print(cursor)
for seq1 in cursor:
    print(seq1)
The error says, that [seq1] wasn't defined in this session, but why does the following work:
select seq1.nextval from dual
--> returns 2
Even after issuing this, I can't use seq1.currval
Btw., select sequence_name from user_sequences returns seq1 in Python.
[as SYS user]
select * from v$session
where username = 'uc';
--> returns zero rows
Why is seq1 not in reach for the python program ?
Note: With tables, everything just works fine
EDIT:
also with 'UC' being upper case, no rows returned
first issuing select seq1.nextval from dual still doesn't work
Not sure how to explain this. The previous 2 answers are correct, but somehow you seem to miss the point.
First, take everything that is irrelevant out of the equation. Mac client on Windows db: doesn't matter. SQL Developer vs Python: doesn't matter. The only thing that matters is that you connect twice to the database as the same schema. You connect twice, which means you have 2 separate sessions, and those sessions don't know about each other. Both sessions have access to the same database objects, so if you execute DDL (e.g. create sequence), that object will be visible in the other session.
Now to the core of your question. The Oracle documentation states:
"To use or refer to the current sequence value of your session, reference seq_name.CURRVAL. CURRVAL can only be used if seq_name.NEXTVAL has been referenced in the current user session (in the current or a previous transaction)."
You have 2 different sessions, so according to the documentation, you should not be able to call seq_name.CURRVAL in the other session. That is exactly the behaviour you are seeing.
You ask "Why is seq1 not in reach for the python program?". The answer is: you're not correct, it is in reach for the Python program. You can call seq1.NEXTVAL from any session. But you cannot invoke seq1.NEXTVAL from one session (SQL Developer) and then invoke seq1.CURRVAL from another session (Python), because that is just how sequences work, as stated in the documentation.
Just to confirm you're not in the same session, execute the following statement for both clients (SQLDeveloper and python):
select sys_context('USERENV','SID') from dual;
You'll notice that the session id is different.
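From the Python side that check would look something like this (a sketch reusing the cursor from the question); run the same statement in SQL Developer and the two numbers will differ:
cursor.execute("select sys_context('USERENV', 'SID') from dual")
print(cursor.fetchone()[0])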
CURRVAL returns the last allocated sequence number in the current session. So it only works when we have previously executed a NEXTVAL. So these two statements will return the same value when run in the same session:
select seq1.nextval from dual
/
select seq1.currval from dual
/
It's not entirely clear what you're trying to achieve, but it looks like your python code is executing a single statement for the connection, so it's not tapping into an existing session.
This statement returns zero rows ...
select * from v$session
where username = 'uc';
... because database objects in Oracle are stored in UPPER case (at least by default, and it's wise to stick with that default). So use where username = 'UC' instead.
Python established a new session. In it, the sequence hasn't been invoked yet, so its currval doesn't exist. First you have to select nextval (which, as you said, returned 2) - only then will currval make sense.
Saying that
Even after issuing this, I can't use seq1.currval
is hard to believe.
This: select * From v$session where username = 'uc' returned nothing because - by default - all objects are stored in uppercase, so you should have run
.... where username = 'UC'
Finally:
commit; --> although the create statement is a DDL method, just in case
Which case? There's no case. DDL commits. Moreover, it commits twice (before and after the actual DDL statement). And there's nothing to commit either. Therefore, what you did is unnecessary and pretty much useless.
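Putting the pieces together, a small sketch (using the connection details from the question) that works because NEXTVAL and CURRVAL are called in the same session:
import cx_Oracle

connection = cx_Oracle.connect("uc", "uc", "10.0.0.22/orcl")
cursor = connection.cursor()

# NEXTVAL first: this allocates a number and defines CURRVAL for this session.
cursor.execute("select seq1.nextval from dual")
print("nextval:", cursor.fetchone()[0])

# CURRVAL now works and returns the same number.
cursor.execute("select seq1.currval from dual")
print("currval:", cursor.fetchone()[0])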

SQLite: How to enable explain plan with the Python API?

I'm using Python (and Peewee) to connect to a SQLite database. My data access layer (DAL) is a mix of peewee ORM and SQL-based functions. I would like to enable EXPLAIN PLAN for all queries upon connecting to the database and toggle it via configuration or CLI parameter ... how can I do that using the Python API?
from playhouse.db_url import connect
self._logger.info("opening db connection to database, creating cursor and initializing orm model ...")
self.__db = connect(url)
# add support for a REGEXP and POW implementation
# TODO: this should be added only for the SQLite case and doesn't apply to other vendors.
self.__db.connection().create_function("REGEXP", 2, regexp)
self.__db.connection().create_function("POW", 2, pow)
self.__cursor = self.__db.cursor()
self.__cursor.arraysize = 100
# what shall I do here to enable EXPLAIN PLANs?
That is a feature of the SQLite interactive shell. To get the query plans, you will need to request them explicitly. This is not quite straightforward with Peewee because it uses parameterized queries. You can get the SQL executed by Peewee in a couple of ways.
# Print all queries to stderr.
import logging
logger = logging.getLogger('peewee')
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.DEBUG)
Or for an individual query:
query = SomeModel.select()
sql, params = query.sql()
# To get the query plan:
curs = db.execute_sql('EXPLAIN ' + sql, params)
print(curs.fetchall()) # prints query plan
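To make this toggleable, one option is a small helper gated by a flag (the flag and helper names below are made up for the sketch). Note that SQLite's EXPLAIN QUERY PLAN returns the high-level plan, while plain EXPLAIN returns the low-level VDBE opcodes:
EXPLAIN_ENABLED = True  # e.g. set from a config file or CLI parameter

def explain(db, query):
    # Return the query plan for a Peewee query, or None when disabled.
    if not EXPLAIN_ENABLED:
        return None
    sql, params = query.sql()
    curs = db.execute_sql('EXPLAIN QUERY PLAN ' + sql, params)
    return curs.fetchall()

# Usage:
# plan = explain(self.__db, SomeModel.select())
# if plan:
#     self._logger.info("query plan: %s", plan)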

How to support a command timeout in MySQL 5.6? I am looking for a CommandTimeout option similar to that of ADO.NET, but in Python

I am new to Python and would like to do what the ADO.NET CommandTimeout property does (limiting the execution time of a query) in MySQL (5.6).
I am looking for any built-in Python libraries that may support this by default (I have tried pymysql, pyodbc and mysql.connector), or for the best way to do it in case I have to reinvent the wheel from scratch.
I am working with an old MySQL version (5.6), and the reason I stress this is that my research shows this can be accomplished in MySQL 5.7 onwards using the following commands:
SET SESSION MAX_EXECUTION_TIME=2000;
SET GLOBAL MAX_EXECUTION_TIME=2000;
Reference Links I have already gone through which suggests a solution in mysql 5.7 and upwards:
http://mysqlserverteam.com/server-side-select-statement-timeouts/
https://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html#optimizer-hints-execution-time
But I am looking for a solution for MySQL 5.6. Thanks in advance.
So I was finally able to solve the issue by creating a thread that is invoked by a timer, as follows:
query_monitoring_thread = threading.Timer(self.__command_timeout, self.kill_query,
                                          args=[mydb.connection_id, source])
The thread gets all processes created by the current user that have been running for longer than the command timeout and kills any that exist:
cursor.execute(
    "Select Id From Information_Schema.Processlist where Id={0} AND User='{1}' AND Time >= {2}".format(
        conn_id, conn.user, self.__command_timeout))
records = cursor.fetchall()
if len(records) > 0:
    cursor.execute('Kill query ' + str(conn_id))
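A fuller sketch of the same watchdog pattern (connection credentials are placeholders, and the way to obtain the connection id differs by driver: mysql.connector exposes conn.connection_id, pymysql uses conn.thread_id()):
import threading
import mysql.connector

COMMAND_TIMEOUT = 30  # seconds

def kill_query(conn_id, user):
    # A second connection is needed to kill the first connection's query.
    watchdog = mysql.connector.connect(user=user, password="...", host="...")
    cur = watchdog.cursor()
    cur.execute(
        "SELECT Id FROM Information_Schema.Processlist "
        "WHERE Id = %s AND User = %s AND Time >= %s",
        (conn_id, user, COMMAND_TIMEOUT))
    if cur.fetchall():
        cur.execute("KILL QUERY {0}".format(conn_id))
    watchdog.close()

conn = mysql.connector.connect(user="myuser", password="...", host="...")
timer = threading.Timer(COMMAND_TIMEOUT, kill_query, args=[conn.connection_id, "myuser"])
timer.start()
try:
    cur = conn.cursor()
    cur.execute("SELECT SLEEP(600)")  # long-running statement gets cut off
finally:
    timer.cancel()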

MySQL driver issues with INFORMATION_SCHEMA?

I'm trying out the Concurrence framework for Stackless Python. It includes a MySQL driver and when running some code that previously ran fine with MySQLdb it fails.
What I am doing:
Connecting to the MySQL database using dbapi with username/password/port/database.
Executing SELECT * FROM INFORMATION_SCHEMA.COLUMNS
This fails with message:
Table 'mydatabase.columns' doesn't exist
"mydatabase" is the database I specified in step 1.
When doing the same query in the MySQL console after issuing "USE mydatabase", it works perfectly.
Checking the network communication yields something like this:
>>>myusername
>>>scrambled password
>>>mydatabase
>>>CMD 3 SET AUTOCOMMIT = 0
<<<0
>>>CMD 3 SELECT * FROM INFORMATION_SCHEMA.COLUMNS
<<<255
<<<Table 'mydatabase.columns' doesn't exist
Is this a driver issue (since it works in MySQLdb)? Or am I not supposed to be able to query INFORMATION_SCHEMA this way?
If I send a specific "USE INFORMATION_SCHEMA" before trying to query it, I get the expected result. But, I do not want to have to sprinkle my code all over with "USE" queries.
It definitely looks like a driver issue. Maybe the Python driver doesn't support the DB prefix.
Just to be sure, try the other way around: first use INFORMATION_SCHEMA and then SELECT * FROM mydatabase.sometable
I finally found the reason.
The driver just echoed the server capability flags back in the protocol handshake, with the exception of compression:
## concurrence/database/mysql/client.py ##
client_caps = server_caps
#always turn off compression
client_caps &= ~CAPS.COMPRESS
As the server has the capability...
CLIENT_NO_SCHEMA 16 /* Don't allow database.table.column */
...that was echoed back to the server, telling it not to allow that syntax.
Adding client_caps &= ~CAPS.NO_SCHEMA did the trick.
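So the patched section of the handshake ends up looking roughly like this (CAPS.NO_SCHEMA is assumed to be the flag name in Concurrence's CAPS enumeration, mirroring the COMPRESS flag already cleared there):
## concurrence/database/mysql/client.py ##
client_caps = server_caps
# always turn off compression
client_caps &= ~CAPS.COMPRESS
# also clear NO_SCHEMA so database.table.column references are allowed
client_caps &= ~CAPS.NO_SCHEMA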
