See cousin post: psycopg - Get formatted sql instead of executing
I need to transition some code from Postgres to MS SQL Server. I have been using psycopg2 in Python to do all database calls, and I have found that pymssql is a close counterpart with a very similar API.
One thing that is missing is the mogrify call. In short, mogrify performs parameter substitution (guarding against SQL injection) without executing anything, which is great for building up a SQL string.
Is there a call similar to mogrify in pymssql? If not, is there another Python library that does have a mogrify-like call? If I cannot find anything, I will transition my code to use the execute/executemany calls, but I'd prefer to avoid that if at all possible.
The function substitute_params is exported in the _mssql module. Example usage:
>>> import pymssql
>>> print pymssql._mssql.substitute_params("SELECT * FROM foo WHERE a = %s", ("quoted ' string",))
SELECT * FROM foo WHERE a = 'quoted '' string'
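For the original goal of building up a SQL string, the same call chains nicely. A minimal sketch (the table foo(a) is made up, and the bytes check is an assumption, since newer pymssql versions return bytes from substitute_params):
import pymssql
rows = [("alpha",), ("it's b",)]
statements = []
for params in rows:
    stmt = pymssql._mssql.substitute_params("INSERT INTO foo (a) VALUES (%s)", params)
    if isinstance(stmt, bytes):  # pymssql 2.x returns bytes here
        stmt = stmt.decode("utf-8")
    statements.append(stmt)
batch_sql = ";\n".join(statements)  # a mogrify-style SQL string, never executed
print(batch_sql)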
Related
I'm using an SQLite3 database in my Python application and query it using parameter substitution.
For example:
cursor.execute('SELECT * FROM table WHERE id > ?', (10,))
Some queries do not return results properly, and I would like to log them and try querying SQLite manually.
How can I log these queries with parameters instead of question marks?
Python 3.3 has sqlite3.Connection.set_trace_callback:
import sqlite3
connection = sqlite3.connect(':memory:')
connection.set_trace_callback(print)
The function you provide as argument gets called for every SQL statement that is executed through that particular Connection object. Instead of print, you may want to use a function from the logging module.
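For instance, a minimal sketch wiring the callback to a logger instead of print:
import logging
import sqlite3
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("sqlite")
connection = sqlite3.connect(':memory:')
connection.set_trace_callback(logger.debug)  # every statement on this connection goes to logger.debug
connection.execute('CREATE TABLE t (id INTEGER)')
connection.execute('INSERT INTO t VALUES (?)', (10,))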
Assuming that you have a log function, you could call it first:
query, param = 'SELECT * FROM table WHERE id > ?', (10,)
log(query.replace('?', '%s') % param)
cursor.execute(query, param)
So you don't modify your query at all.
Moreover, this is not SQLite-specific.
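A tiny helper makes the pattern reusable. This is just a sketch, assuming an open cursor and the log function from above; the substitution is for display only (string parameters are not quoted the way the database would quote them):
def log_execute(cursor, query, params=()):
    # log an approximation of the final query, then run the original untouched
    log(query.replace('?', '%s') % tuple(params))
    return cursor.execute(query, params)
log_execute(cursor, 'SELECT * FROM table WHERE id > ?', (10,))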
Task specs:
Import Excel file(s) into MSSQL database(s) using Python, but in a parametrized manner, and using SQL Server Agent job(s).
With the added requirement to set parameter values and/or run the job steps from SQL (query or SP).
And without using Access Database Engine(s) and/or any code that makes use of such drivers (in any wrapping).
First. Let's get some preparatory stuff out of the way.
We will need to set some PowerShell settings.
Run Windows PowerShell as Administrator and do:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
Second. Some assumptions for reasons of clarity.
And those are:
1a. You have at least one instance of SQL2017 or later (Developer / Enterprise / Standard edition) installed and running on your machine.
1b. You have not bootstrapped the installation of this SQL instance so as to exclude Integration Services (SSIS).
1c. SQL Server Agent is running, bound to this SQL instance.
1d. You have some SSMS installed.
2a. There is at least one database attached to this instance (if not create one – please refrain from using in-memory filegroups for this exercise, I have not tested on those).
2b. There are no database level DML triggers that log all data changes in a designated table.
3. There is no active Server Audit Specification for this database logging everything we do.
4. Replication is not enabled (I mean the proper MSSQL replication feature not like scripts by 3rd party apps).
For 2b and 3, it's just because I have not tested this with those turned on; but for number 4, it definitely won't work with replication enabled.
5. You are Windows-authenticated into the chosen SQL instance, and your instance login and db mappings and privileges are sufficient for at least table creation and basic stuff.
Third.
We are going to need some kind of Python script to do this, right?
OK, let's make one.
import pandas as pd
import sqlalchemy as sa
import urllib.parse
import sys
import warnings
import os
import re
import time

#COMMAND LINE PARAMETERS
server = sys.argv[1]
database = sys.argv[2]
ExcelFileHolder = sys.argv[3]
SQLTableName = sys.argv[4]
#END OF COMMAND LINE PARAMETERS

excel_sheet_number_left_to_right = 0
warnings.filterwarnings('ignore')

driver = "SQL Server Native Client 11.0"
params = "DRIVER={%s};SERVER=%s;DATABASE=%s;Trusted_Connection=yes;QuotedID=Yes;" % (driver, server, database) #added the explicit "QuotedID=Yes;" to ensure no issues with column names
params = urllib.parse.quote_plus(params) #urllib.parse.quote_plus for Python 3
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s&charset=utf8" % params) #charset is cool to have here; note "&" (not a second "?") joins the extra URL parameter
conn = engine.connect()
log = open("ExcelPythonToSQL.log", "a") #assumed file handle and path; execute_sql_trans below writes to a "log" object

def execute_sql_trans(sql_string, log_entry):
    with conn.begin() as trans:
        result = conn.execute(sql_string)
        if len(log_entry) >= 1:
            log.write(log_entry + "\n")
    return result

excelfilesCursor = {}

def process_excel_file(excelfile, excel_sheet_name, tableName, withPyIndexOrSQLIndex, orderByCandidateFields):
    withPyIndexOrSQLIndex = 0
    excelfilesCursor.update({tableName: withPyIndexOrSQLIndex})
    df = pd.read_excel(open(excelfile, 'rb'), sheet_name=excel_sheet_name)
    now = time.time()
    mlsec = repr(now).split('.')[1][:3]
    log_string = "Reading file \"" + excelfile + "\" to memory: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
    print(log_string)
    df.to_sql(tableName, engine, if_exists='replace', index_label='index.py')
    now = time.time()
    mlsec = repr(now).split('.')[1][:3]
    log_string = "Writing file \"" + excelfile + "\", sheet " + str(excel_sheet_name) + " to SQL instance " + server + ", into [" + database + "].[dbo].[" + tableName + "]: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
    print(log_string)

def convert_datetimes_to_dates(tableNameParam):
    sql_string = "exec [convert_datetimes_to_dates] '" + tableNameParam + "';"
    execute_sql_trans(sql_string, "")

process_excel_file(ExcelFileHolder, excel_sheet_number_left_to_right, SQLTableName, 0, None)
sys.exit(0)
OK, you may or may not notice that my script contains some extra defs and a small log file handle; I sometimes use them for convenience, and you may as well ignore them.
Save the Python script somewhere convenient, say C:\PythonWorkspace\ExcelPythonToSQL.py.
Also, needless to say, you will need some Python modules in your venv; pip install whichever ones you don't already have.
Fourth.
Connect to your instance with SSMS and create a new Agent job.
Let's call it "ExcelPythonToSQL".
New step, let's call it "PowerShell parametrized script".
Set the Type to PowerShell.
And place this code inside it:
$pyFile="C:\PythonWorkspace\ExcelPythonToSQL.py"
$SQLInstance="SomeMachineName\SomeNamedSQLInstance"
# or "." or just the computer name or localhost, if your SQL instance is a default instance, i.e. not a named one.
$dbName="SomeDatabase"
$ExcelFileFullPath="C:\Temp\ExampleExcelFile.xlsx"
$targetTableName="somenewtable"
C:\ProgramData\Miniconda3\envs\YOURVENVNAMEHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName
Save the step and the job.
Now let's wrap it in something easier to handle. Remember, this job step is not like an SSIS step, where you could alter the parameter values in its configuration tab; you don't want to open the properties of the job and the step every time just to specify a different Excel file or target table.
So.
Ah, also, do me a solid and do this little trick. Make a small alteration in the step's code (anything at all), and then, instead of clicking OK, choose Script to New Query Window. That way we can capture the GUID of the job without having to query for it.
So now.
Create an SP like so:
use [YourDatabase];
GO
create proc [ExcelPythonToSQL_amend_job_step_params]( @pyFile nvarchar(max),
@SQLInstance nvarchar(max),
@dbName nvarchar(max),
@ExcelFileFullPath nvarchar(max),
@targetTableName nvarchar(max)='somenewtable'
)
as
begin
declare @sql nvarchar(max);
set @sql = '
exec msdb.dbo.sp_update_jobstep @job_id=N''7f6ff378-56cd-4a8d-ba40-e9057439a5bc'', @step_id=1,
@command=N''
$pyFile="'+@pyFile+'"
$SQLInstance="'+@SQLInstance+'"
$dbName="'+@dbName+'"
$ExcelFileFullPath="'+@ExcelFileFullPath+'"
$targetTableName="'+@targetTableName+'"
C:\ProgramData\Miniconda3\envs\YOURVENVGOESHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName''
';
--print @sql;
exec sp_executesql @sql;
end
But inside it you must replace two things. One, the uniqueidentifier (GUID) of the Agent job, which you found via the trick I described earlier, the one with Script to New Query Window. Two, you must fill in the name of your Python venv, replacing the word YOURVENVGOESHERE in the code.
Cool.
Now, with a simple script, we can play-test it.
In a new query window, have something like this:
use [YourDatabase];
GO
--to set parameters
exec [ExcelPythonToSQL_amend_job_step_params] @pyFile='C:\PythonWorkspace\ExcelPythonToSQL.py',
@SQLInstance='.',
@dbName='YourDatabase',
@ExcelFileFullPath='C:\Temp\ExampleExcelFile.xlsx',
@targetTableName='somenewtable';
--to execute the job
exec msdb.dbo.sp_start_job N'ExcelPythonToSQL', @step_name = N'PowerShell parametrized script';
--let's test that the table is there and drop it.
if object_id('YourDatabase..somenewtable') is not null
begin
select 'Table was here!' [test: table exists?];
drop table [somenewtable];
end
else select 'NADA!' [test: table exists?];
You can run the set-parameters part, then the execution; be careful to wait a few seconds before testing, since calling sp_start_job like this is asynchronous. Then run the test script to make sure the data went in and to clean up.
That's it.
Obviously lots of variations are possible.
For example, in the job step we could instead call a batch file, or a PowerShell .ps1 file with the parameters inside it; there are lots and lots of other ways of doing it. I merely described one in this post.
Context
So I am trying to figure out how to properly override the auto-transaction when using SQLite in Python. When I try to run
cursor.execute("BEGIN;")
.....an assortment of insert statements...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
Which I understand is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed up my insertions by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As @CL. said, you have to set the isolation level to None. Code example:
import sqlite3

s = sqlite3.connect("./data.db")
s.isolation_level = None
try:
    c = s.cursor()
    c.execute("begin")
    ...
    c.execute("commit")
except:
    c.execute("rollback")
The documentation says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
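Putting it together for the original goal (one transaction per several thousand records), a minimal sketch with a made-up table t:
import sqlite3
conn = sqlite3.connect("./data.db")
conn.isolation_level = None  # no implicit BEGIN; we issue our own
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS t (a INTEGER, b TEXT)")
records = [(i, "row %d" % i) for i in range(10000)]
BATCH = 2000
for start in range(0, len(records), BATCH):
    cur.execute("begin")
    cur.executemany("INSERT INTO t VALUES (?, ?)", records[start:start + BATCH])
    cur.execute("commit")  # one commit per batch instead of one per statement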
I have a malformed database. When I try to get records from any of two tables, it throws an exception:
DatabaseError: database disk image is malformed
I know that through commandline I can do this:
sqlite3 base.db ".dump" | sqlite3 new.db
Can I do something like this from within Python?
As far as I know you cannot do that (alas, I might be mistaken), because the sqlite3 module for Python is very limited.
The only workaround I can think of involves invoking the OS command shell (e.g. terminal, cmd, ...) from Python via the subprocess module, to do something like this:
This was done on a Windows XP machine; unfortunately I can't test it on a Unix machine right now. Hope it will help you:
from subprocess import check_call

def sqliterepair():
    check_call(["sqlite3", "C:/sqlite-tools/base.db", ".mode insert", ".output C:/sqlite-tools/dump_all.sql", ".dump", ".exit"])
    check_call(["sqlite3", "C:/sqlite-tools/new.db", ".read C:/sqlite-tools/dump_all.sql", ".exit"])
    return
The first argument calls the sqlite3.exe. Because it is in my system PATH variable, I don't need to specify the path or the ".exe" suffix.
The other arguments are chained into the sqlite3 shell.
Note that the ".exit" argument is required so the sqlite shell will exit. Otherwise the check_call() will never complete, because the outer cmd shell or terminal will be suspended.
Of course the dump-file should be removed afterwards...
EDIT: Much shorter solution (credit goes to OP (see comment))
os.system("sqlite3 C:/sqlite-tools/base.db .dump | sqlite3 C:/sqlite-tools/target.db")
Just tested this: it works. Apparently I was wrong in the comments.
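If you prefer subprocess over os.system (Python 3.5+ for subprocess.run), the same pipeline might look like this sketch, reusing the example paths from above:
import subprocess
subprocess.run(
    "sqlite3 C:/sqlite-tools/base.db .dump | sqlite3 C:/sqlite-tools/target.db",
    shell=True,  # the pipe must be interpreted by the shell
    check=True,  # raise CalledProcessError on a non-zero exit code
)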
If I understood properly, what you want is to duplicate an SQLite3 database in Python. Here is how I would do it:
import sqlite3

# oldDB = path to the corrupted db,
# newDB = path to the new db
def duplicateDB(oldDB, newDB):
    con = sqlite3.connect(oldDB)
    script = ''.join(con.iterdump())
    con.close()
    con = sqlite3.connect(newDB)
    con.executescript(script)
    con.close()
    print "duplicated %s into %s" % (oldDB, newDB)
In your example, call duplicateDB('base.db', 'new.db'). The iterdump function is equivalent to .dump.
Note that if you use Python 3, you will need to change the print statement.
I'm looking for a way to debug queries as they are executed and I was wondering if there is a way to have MySQLdb print out the actual query that it runs, after it has finished inserting the parameters and all that? From the documentation, it seems as if there is supposed to be a Cursor.info() call that will give information about the last query run, but this does not exist on my version (1.2.2).
This seems like an obvious question, but for all my searching I haven't been able to find the answer.
We found an attribute on the cursor object called cursor._last_executed that holds the last query string, even when an exception occurs. This was easier and better for us in production than turning on profiling all the time or MySQL query logging, as both of those have a performance impact and involve more code or correlating separate log files.
Hate to answer my own question but this is working better for us.
You can print the last executed query with the cursor attribute _last_executed:
try:
    cursor.execute(sql, (arg1, arg2))
    connection.commit()
except:
    print(cursor._last_executed)
    raise
Currently, there is a discussion about how to get this as a real feature in pymysql (see pymysql issue #330: Add mogrify to Cursor, which returns the exact string to be executed; pymysql should be used instead of MySQLdb).
Edit: I haven't tested it yet, but this commit indicates that the following code might work:
cursor.mogrify(sql, (arg1, arg2))
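On pymysql, a quick check might look like this sketch (the connection details and the table are placeholders):
import pymysql
conn = pymysql.connect(host="localhost", user="user", password="pass", database="test")  # placeholders
cur = conn.cursor()
print(cur.mogrify("SELECT * FROM blah WHERE foo = %s", (11,)))  # the exact string execute() would send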
For me, _last_executed doesn't work anymore. In the current version you want to access
cursor.statement.
see: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
For mysql.connector:
cursor.statement
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
cursor.statement and cursor._last_executed raised an AttributeError exception, but
cursor._executed
worked for me!
One way to do it is to turn on profiling:
cursor.execute('set profiling = 1')
try:
    cursor.execute('SELECT * FROM blah where foo = %s', [11])
except Exception:
    cursor.execute('show profiles')
    for row in cursor:
        print(row)
cursor.execute('set profiling = 0')
yields
(1L, 0.000154, 'SELECT * FROM blah where foo = 11')
Notice the argument(s) were inserted into the query, and that the query was logged even though the query failed.
Another way is to start the server with logging turned on:
sudo invoke-rc.d mysql stop
sudo mysqld --log=/tmp/myquery.log
Then you have to sift through /tmp/myquery.log to find out what the server received.
I've had luck with cursor._last_executed generally speaking, but it doesn't work correctly when used with cursor.executemany(): that drops all but the last statement. Here's basically what I use now in that case instead (based on tweaks of the actual MySQLdb cursor source):
def toSqlResolvedList(cursor, sql, dynamicValues):
    sqlList = []
    try:
        db = cursor._get_db()
        if isinstance(sql, unicode):  # Python 2 string handling, as in the MySQLdb cursor source
            sql = sql.encode(db.character_set_name())
        for values in dynamicValues:
            # resolve the parameters exactly the way execute() would
            sqlList.append(sql % db.literal(values))
    except: pass
    return sqlList
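Usage might look like the following sketch (Python 2, matching the helper above; the cursor is assumed to be an open MySQLdb cursor, and the table and values are made up):
sql = "INSERT INTO blah (foo, bar) VALUES (%s, %s)"
dynamicValues = [(1, "a"), (2, "b'c")]
for resolved in toSqlResolvedList(cursor, sql, dynamicValues):
    print(resolved)  # log each fully-resolved statement
cursor.executemany(sql, dynamicValues)  # then run the real thing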
This read-only property returns the last executed statement as a string. The statement property can be useful for debugging and displaying what was sent to the MySQL server.
The string can contain multiple statements if a multiple-statement string was executed. This occurs for execute() with multi=True. In this case, the statement property contains the entire statement string and the execute() call returns an iterator that can be used to process results from the individual statements. The statement property for this iterator shows statement strings for the individual statements.
str = cursor.statement
source: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
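For the multi-statement case described above, a sketch using mysql.connector (the credentials and statements are placeholders):
import mysql.connector
cnx = mysql.connector.connect(user="user", password="pass", database="test")  # placeholders
cur = cnx.cursor()
for result in cur.execute("SELECT 1; SELECT 2", multi=True):
    print(result.statement)  # the individual statement behind this result set
    if result.with_rows:
        print(result.fetchall())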
I can't say I've ever seen Cursor.info() in the documentation, and I can't find it after a few minutes of searching. Maybe you saw some old documentation?
In the meantime, you can always turn on MySQL query logging and have a look at the server's log files.
Assume that your SQL is like select * from table1 where name = %s:
from _mysql import escape
from MySQLdb.converters import conversions
actual_query = sql % tuple((escape(item, conversions) for item in parameters))
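Printing actual_query then shows what would be sent. For example, a sketch (the escaped output in the comment is approximate):
from _mysql import escape
from MySQLdb.converters import conversions
sql = "select * from table1 where name = %s"
parameters = ("O'Reilly",)
actual_query = sql % tuple(escape(item, conversions) for item in parameters)
print(actual_query)  # something like: select * from table1 where name = 'O\'Reilly'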