Identify the sql statements using python - python

The task is to identify if the SQL statements is a DML / DDL or not.
So I had to use an array and push all the DML/DDL patterns into that and search for them by iterating.
Below is a simple code snippet where
I am sending an SQL query as a parameter
Check if it has update, alter, drop, delete
Print
def check_dml_statementent (self,sql)
actions = ['update', 'alter', 'drop', 'delete']
for action in actions
if ( re.search (action,sql.lower() ) :
print "This is a dml statement "
But there are some edge cases, for which I need to code for
To consider
Update table table_name where ...
alter table table_name
create table table_name
delete * from table_name
drop table_name
Not to consider:
select * from table where action='drop'
So, the task is to identify only SQL statements which modify, drop, create, alter table etc.
One specific idea is to check if an SQL statement starts with above array values using startswith function.

You can use python-sqlparse for that:
import sqlparse
query = """
select * from table where action='delete';
delete from table where name='Alex'
"""
parsed = sqlparse.parse(query)
for p in parsed:
print(p.get_type())
This will output:
SELECT
DELETE

Related

Suggested way to run multiple sql statements in python?

What would be the suggested way to run something like the following in python:
self.cursor.execute('SET FOREIGN_KEY_CHECKS=0; DROP TABLE IF EXISTS %s; SET FOREIGN_KEY_CHECKS=1' % (table_name,))
For example, should this be three separate self.cursor.execute(...) statements? Is there a specific method that should be used other than cursor.execute(...) to do something like this, or what is the suggested practice for doing this? Currently the code I have is as follows:
self.cursor.execute('SET FOREIGN_KEY_CHECKS=0;')
self.cursor.execute('DROP TABLE IF EXISTS %s;' % (table_name,))
self.cursor.execute('SET FOREIGN_KEY_CHECKS=1;')
self.cursor.execute('CREATE TABLE %s select * from mytable;' % (table_name,))
As you can see, everything is run separately...so I'm not sure if this is a good idea or not (or rather -- what the best way to do the above is). Perhaps BEGIN...END ?
I would create a stored procedure:
DROP PROCEDURE IF EXISTS CopyTable;
DELIMITER $$
CREATE PROCEDURE CopyTable(IN _mytable VARCHAR(64), _table_name VARCHAR(64))
BEGIN
SET FOREIGN_KEY_CHECKS=0;
SET #stmt = CONCAT('DROP TABLE IF EXISTS ',_table_name);
PREPARE stmt1 FROM #stmt;
EXECUTE stmt1;
SET FOREIGN_KEY_CHECKS=1;
SET #stmt = CONCAT('CREATE TABLE ',_table_name,' as select * from ', _mytable);
PREPARE stmt1 FROM #stmt;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
END$$
DELIMITER ;
and then just run:
args = ['mytable', 'table_name']
cursor.callproc('CopyTable', args)
keeping it simple and modular. Of course you should do some kind of error checking and you could even have the stored procedure return a code to indicate success or failure.
In the documentation of MySQLCursor.execute(), they suggest to use the multi=True parameter:
operation = 'SELECT 1; INSERT INTO t1 VALUES (); SELECT 2'
for result in cursor.execute(operation, multi=True):
...
You can find another example in the module's source code.
I would not rely on any multi=True parameter of the execute function, which is very driver dependent nor attempt to try to split a string on the ; character, which might be embedded in a string literal. The most straightforward approach would be to create a function, execute_multiple, that takes a list of statements to be executed and a rollback_on_error parameter to determine what action to be performed if any of the statements results in an exception.
My experience with MySQLdb and PyMySQL has been that by default they start off in autocommit=0, in other words as if you are already in a transaction and an explicit commit is required. Anyway, that assumption holds for the code below. If that is not the case, then you should either 1. explicitly set autocommit=0 after connecting or 2. Modify this code to start a transaction following the try statement
def execute_multiple(conn, statements, rollback_on_error=True):
"""
Execute multiple SQL statements and returns the cursor from the last executed statement.
:param conn: The connection to the database
:type conn: Database connection
:param statements: The statements to be executed
:type statements: A list of strings
:param: rollback_on_error: Flag to indicate action to be taken on an exception
:type rollback_on_error: bool
:returns cursor from the last statement executed
:rtype cursor
"""
try:
cursor = conn.cursor()
for statement in statements:
cursor.execute(statement)
if not rollback_on_error:
conn.commit() # commit on each statement
except Exception as e:
if rollback_on_error:
conn.rollback()
raise
else:
if rollback_on_error:
conn.commit() # then commit only after all statements have completed successfully
You can also have a version that handles prepared statements with its parameter list:
def execute_multiple_prepared(conn, statements_and_values, rollback_on_error=True):
"""
Execute multiple SQL statements and returns the cursor from the last executed statement.
:param conn: The connection to the database
:type conn: Database connection
:param statements_and_values: The statements and values to be executed
:type statements_and_values: A list of lists. Each sublist consists of a string, the SQL prepared statement with %s placeholders, and a list or tuple of its parameters
:param: rollback_on_error: Flag to indicate action to be taken on an exception
:type rollback_on_error: bool
:returns cursor from the last statement executed
:rtype cursor
"""
try:
cursor = conn.cursor()
for s_v in statements_and_values:
cursor.execute(s_v[0], s_v[1])
if not rollback_on_error:
conn.commit() # commit on each statement
except Exception as e:
if rollback_on_error:
conn.rollback()
raise
else:
if rollback_on_error:
conn.commit() # then commit only after all statements have completed successfully
return cursor # return the cursor in case there are results to be processed
For example:
cursor = execute_multiple_prepared(conn, [('select * from test_table where count = %s', (2000,))], False)
Although, admittedly, the above call only had one SQL prepared statement with parameters.
I stuck multiple times in these types of problem in project. After the lot of research i found some points and suggestion.
execute() method work well with one query at a time. Because during the execution method take care of state.
I know cursor.execute(operation, params=None, multi=True) take multiple query. But parameters does not work well in this case and sometimes internal error exception spoil all results too. And code become massive and ambiguous. Even docs also mention this.
executemany(operation, seq_of_params) is not a good practice to implement every times. Because operation which produces one or more result sets constitutes undefined behavior, and the implementation is permitted (but not required) to raise an exception when it detects that a result set has been created by an invocation of the operation. [source - docs]
Suggestion 1-:
Make a list of queries like -:
table_name = 'test'
quries = [
'SET FOREIGN_KEY_CHECKS=0;',
'DROP TABLE IF EXISTS {};'.format(table_name),
'SET FOREIGN_KEY_CHECKS=1;',
'CREATE TABLE {} select * from mytable;'.format(table_name),
]
for query in quries:
result = self.cursor.execute(query)
# Do operation with result
Suggestion 2-:
Set with dict. [you can also make this by executemany for recursive parameters for some special cases.]
quries = [
{'DROP TABLE IF EXISTS %(table_name);':{'table_name': 'student'}},
{'CREATE TABLE %(table_name) select * from mytable;':
{'table_name':'teacher'}},
{'SET FOREIGN_KEY_CHECKS=0;': ''}
]
for data in quries:
for query, parameter in data.iteritems():
if parameter == '':
result = self.cursor.execute(query)
# Do something with result
else:
result = self.cursor.execute(query, parameter)
# Do something with result
You can also use split with script. Not recommended
with connection.cursor() as cursor:
for statement in script.split(';'):
if len(statement) > 0:
cursor.execute(statement + ';')
Note -: I use mostly list of query approach but in some complex place use make dictionary approach.
Beauty is in the eye of the beholder, so the best way to do something is subjective unless you explicitly tell us how to measure is. There are three hypothetical options I can see:
Use the multi option of MySQLCursor (not ideal)
Keep the query in multiple rows
Keep the query in a single row
Optionally, you can also change the query around to avoid some unnecessary work.
Regarding the multi option the MySQL documentation is quite clear on this
If multi is set to True, execute() is able to execute multiple statements specified in the operation string. It returns an iterator that enables processing the result of each statement. However, using parameters does not work well in this case, and it is usually a good idea to execute each statement on its own.
Regarding option 2. and 3. it is purely a preference on how you would like to view your code. Recall that a connection object has autocommit=FALSE by default, so the cursor actually batches cursor.execute(...) calls into a single transaction. In other words, both versions below are equivalent.
self.cursor.execute('SET FOREIGN_KEY_CHECKS=0;')
self.cursor.execute('DROP TABLE IF EXISTS %s;' % (table_name,))
self.cursor.execute('SET FOREIGN_KEY_CHECKS=1;')
self.cursor.execute('CREATE TABLE %s select * from mytable;' % (table_name,))
vs
self.cursor.execute(
'SET FOREIGN_KEY_CHECKS=0;'
'DROP TABLE IF EXISTS %s;' % (table_name,)
'SET FOREIGN_KEY_CHECKS=1;'
'CREATE TABLE %s select * from mytable;' % (table_name,)
)
Python 3.6 introduced f-strings that are super elegant and you should use them if you can. :)
self.cursor.execute(
'SET FOREIGN_KEY_CHECKS=0;'
f'DROP TABLE IF EXISTS {table_name};'
'SET FOREIGN_KEY_CHECKS=1;'
f'CREATE TABLE {table_name} select * from mytable;'
)
Note that this no longer holds when you start to manipulate rows; in this case, it becomes query specific and you should profile if relevant. A related SO question is What is faster, one big query or many small queries?
Finally, it may be more elegant to use TRUNCATE instead of DROP TABLE unless you have specific reasons not to.
self.cursor.execute(
f'CREATE TABLE IF NOT EXISTS {table_name};'
'SET FOREIGN_KEY_CHECKS=0;'
f'TRUNCATE TABLE {table_name};'
'SET FOREIGN_KEY_CHECKS=1;'
f'INSERT INTO {table_name} SELECT * FROM mytable;'
)
Look at the documentation for MySQLCursor.execute().
It claims that you can pass in a multi parameter that allows you to run multiple queries in one string.
If multi is set to True, execute() is able to execute multiple statements specified in the operation string.
multi is an optional second parameter to the execute() call:
operation = 'SELECT 1; INSERT INTO t1 VALUES (); SELECT 2'
for result in cursor.execute(operation, multi=True):
With import mysql.connector
you can do following command, just need to replace t1 and episodes, with your own tabaes
tablename= "t1"
mycursor.execute("SET FOREIGN_KEY_CHECKS=0; DROP TABLE IF EXISTS {}; SET FOREIGN_KEY_CHECKS=1;CREATE TABLE {} select * from episodes;".format(tablename, tablename),multi=True)
While this will run, you must be sure that the foreign key restraints that will be in effect after enabling it, will not cause problems.
if tablename is something that a user can enter, you should think about a whitelist of table names.
Prepared statemnts don't work with table and column names , so we have to use string replacement to get the correct tablenames at the right posistion, bit this will make your code vulnerable to sql injection
The multi=True is necessary to run 4 commands in the connector, when i tested it, the debugger demanded it.
executescript()
This is a convenience method for executing multiple SQL statements at once. It executes the SQL script it gets as a parameter.
Syntax:
sqlite3.connect.executescript(script)
Example code:
import sqlite3
# Connection with the DataBase
# 'library.db'
connection = sqlite3.connect("library.db")
cursor = connection.cursor()
# SQL piece of code Executed
# SQL piece of code Executed
cursor.executescript("""
CREATE TABLE people(
firstname,
lastname,
age
);
CREATE TABLE book(
title,
author,
published
);
INSERT INTO
book(title, author, published)
VALUES (
'Dan Clarke''s GFG Detective Agency',
'Sean Simpsons',
1987
);
""")
sql = """
SELECT COUNT(*) FROM book;"""
cursor.execute(sql)
# The output in fetched and returned
# as a List by fetchall()
result = cursor.fetchall()
print(result)
sql = """
SELECT * FROM book;"""
cursor.execute(sql)
result = cursor.fetchall()
print(result)
# Changes saved into database
connection.commit()
# Connection closed(broken)
# with DataBase
connection.close()
Output:
[(1,)]
[("Dan Clarke's GFG Detective Agency", 'Sean Simpsons', 1987)]
executemany()
It is often the case when, large amount of data has to be inserted into database from Data Files(for simpler case take Lists, arrays). It would be simple to iterate the code many a times than write every time, each line into database. But the use of loop would not be suitable in this case, the below example shows why. Syntax and use of executemany() is explained below and how it can be used like a loop:
Source: GeeksForGeeks: SQL Using Python
Check out this source.. this has lots of great stuff for you.
All the answers are completely valid so I'd add my solution with static typing and closing context manager.
from contextlib import closing
from typing import List
import mysql.connector
import logging
logger = logging.getLogger(__name__)
def execute(stmts: List[str]) -> None:
logger.info("Starting daily execution")
with closing(mysql.connector.connect()) as connection:
try:
with closing(connection.cursor()) as cursor:
cursor.execute(' ; '.join(stmts), multi=True)
except Exception:
logger.exception("Rollbacking changes")
connection.rollback()
raise
else:
logger.info("Finished successfully")
If I'm not mistaken connection or cursor might not be a context manager, depending on the version of mysql driver you're having, so that's a pythonic safe solution.

Improve performance SQL query in Python Cx_Oracle

I actually use Cx_Oracle library in Python to work with my database Oracle.
import cx_Oracle as Cx
# Parameters for server connexion
dsn_tns = Cx.makedsn(_ip, _port, service_name=_service_name)
# Connexion with Oracle Database
db = Cx.connect(_user, _password, dsn_tns)
# Obtain a cursor for make SQL query
cursor = db.cursor()
One of my query write in an INSERT of a Python dataframe into my Oracle target table among some conditions.
query = INSERT INTO ORA_TABLE(ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:1 != 'NF' AND :1 NOT IN (SELECT ID1 FROM ORA_TABLE))
OR (:1 = 'NF' AND :2 NOT IN (SELECT ID2 FROM ORA_TABLE))
The goal of this query is to write only rows who respect conditions into the WHERE.
Actually ,this query works well when my Oracle target table have few rows. But, if my target Oracle table have more than 100 000 rows, it's very slow because I read through all the table in WHERE condition.
Is there a way to improve performance of this query with join or something else ?
End of code :
# SQL query incoming
cursor.prepare(query)
# Launch query with Python dataset
cursor.executemany(None, _py_table.values.tolist())
# Commit changes into Oracle database
db.commit()
# Close the cursor
cursor.close()
# Close the server connexion
db.close()
Here is a possible solution that could help: The sql that you have has an OR condition and only one part of this condition will be true for a given value. So I would divide it in two parts by checking the following in the code and constructing two inserts instead of one and at any point of time, only one would execute:
IF :1 != 'NF' then use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:1 NOT IN (SELECT ID1
FROM ORA_TABLE));
and IF :1 = 'NF' then use the following insert:
INSERT INTO ORA_TABLE (ID1, ID2)
SELECT :1, :2
FROM DUAL
WHERE (:2 NOT IN (SELECT ID2
FROM ORA_TABLE));
So you check in code what is the value of :1 and depending on that use the two simplified insert. Please check if this is functionally the same as original query and verify if it improves the response time.
Assuming Pandas, consider exporting your data as a table to be used as staging for final migration where you run your subquery only once and not for every row of data set. In Pandas, you would need to interface with sqlalchemy to run the to_sql export operation. Note: this assumes your connected user has such DROP TABLE and CREATE TABLE privileges.
Also, consider using EXISTS subquery to combine both IN subqueries. Below subquery attempts to run opposite of your logic for exclusion.
import sqlalchemy
...
engine = sqlalchemy.create_engine("oracle+cx_oracle://user:password#dsn")
# EXPORT DATA -ALWAYS REPLACING
pandas_df.to_sql('myTempTable', con=engine, if_exists='replace')
# RUN TRANSACTION
with engine.begin() as cn:
sql = """INSERT INTO ORA_TABLE (ID1, ID2)
SELECT t.ID1, t.ID2
FROM myTempTable t
WHERE EXISTS
(
SELECT 1 FROM ORA_TABLE sub
WHERE (t.ID1 != 'NF' AND t.ID1 = sub.ID1)
OR (t.ID1 = 'NF' AND t.ID2 = sub.ID2)
)
"""
cn.execute(sql)

Python function to select records from a mysql table

I created a table with 100 records and now i want to generate a python function which takes a number and the table name and pulls records from the referenced table. ie. function should selects 'n' records from specified table.
I have already stated querying the database using python scripts. I can do a regular select but every time i want to select i have to edit the query. Is there a way for me to write a function in python that would take two parameters; eg.(n, table) that will allow me to select n records from any table in my database?
Is this possible and if it is where should i start?
You can use below function
def query_db( no_of_rows, table_name ):
cur = db.cursor()
query = """
select top %d rows from %s
""" %( int(no_of_rows), table_name )
cur.execute(query)
for row in cur.fetchall() :
print row[0]
Is this what you want, or am i missing something?

How to show available tables in monetdb using python api?

When using mclient it is possible to list all tables in database by issuing command '\d'. I'm using python-monetdb package and I don't know how the same can be accomplished. I've seen example like "SELECT * FROM TABLES;" but I get an error that "tables" table does not exist.
In your query you need to specify that you are looking for the tables table that belongs to the default sys schema, or sys.tables. The SQL query that returns the names of all non-system tables in MonetDB is:
SELECT t.name FROM sys.tables t WHERE t.system=false
In Python this should look something like:
import monetdb.sql
connection = monetdb.sql.connect(username='<username>', password='<password>', hostname='<hostname>', port=50000, database='<database>')
cursor = connection.cursor()
cursor.execute('SELECT t.name FROM sys.tables t WHERE t.system=false')
If you are looking for tables only in a specific schema, you will need to extend your query, specifying the schema:
SELECT t.name FROM sys.tables t WHERE t.system=false AND t.schema_id IN (SELECT s.id FROM sys.schemas s WHERE name = '<schema-name>')
where the <schema-name> is your schema, surrounded by single quotes.

Getting the table name to select on from the user

My problem is I have a SELECT query but the table to select the data from needs to be specified by the user, from an HTML file. Can anyone suggest a way to do this?
I am querying a postgres database and the SQL queries are in a python file.
Create a variable table_name and verify it contains only characters allowed in a table name. Then put it into the SQL query:
sql = "SELECT ... FROM {} WHERE ...".format(table_name) # replace ... with real sql
If you don't verify it and the user sends something nasty, you run into risk.

Categories

Resources