Dump data from malformed SQLite in Python - python

I have a malformed database. When I try to get records from any of two tables, it throws an exception:
DatabaseError: database disk image is malformed
I know that through commandline I can do this:
sqlite3 ".dump" base.db | sqlite3 new.db
Can I do something like this from within Python?

As far as i know you cannot do that (alas, i might be mistaken), because the sqlite3 module for python is very limited.
Only workaround i can think of involves calling the os command shell (e.g. terminal, cmd, ...) (more info) via pythons call-command:
Combine it with the info from here to do something like this:
This is done on an windows xp machine:
Unfortunately i can't test it on a unix machine right now - hope it will help you:
from subprocess import check_call
def sqliterepair():
check_call(["sqlite3", "C:/sqlite-tools/base.db", ".mode insert", ".output C:/sqlite-tools/dump_all.sql", ".dump", ".exit"])
check_call(["sqlite3", "C:/sqlite-tools/new.db", ".read C:/sqlite-tools/dump_all.sql", ".exit"])
return
The first argument is calling the sqlite3.exe. Because it is in my system path variable, i don't need to specify the path or the suffix ".exe".
The other arguments are chained into the sqlite3-shell.
Note that the argument ".exit" is required so the sqlite-shell will exit. Otherwise the check_call() will never complete because the outer cmd-shell or terminal will be in suspended.
Of course the dump-file should be removed afterwards...
EDIT: Much shorter solution (credit goes to OP (see comment))
os.system("sqlite3 C:/sqlite-tools/base.db .dump | sqlite3 C:/sqlite-tools/target.db")
Just tested this: it works. Apparently i was wrong in the comments.

If I understood properly, what you want is to duplicate an sqlite3 database in python. Here is how I would do it:
# oldDB = path to the corrupted db,
# newDB = path to the new db
def duplicateDB(oldDB, newDB):
con = sqlite3.connect(oldDB)
script = ''.join(con.iterdump())
con.close()
con = sqlite3.connect(newDB)
con.executescript(script)
con.close()
print "duplicated %s into %s" % (oldDB,newDB)
In your example, call duplicateDB('base.db', 'new.db'). The iterdump function is equivalent to dump.
Note that if you use Python 3, you will need to change the print statement.

Related

Import Excel file into MSSQL via Python, using an SQL Agent job

Task specs:
Import Excel file(s) into MSSQL database(s) using Python, but in a parametrized manner, and using SQL Server Agent job(s).
With the added requirement to set parameter values and/or run the job steps from SQL (query or SP).
And without using Access Database Engine(s) and/or any code that makes use of such drivers (in any wrapping).
First. Let's get some preparatory stuff out of the way.
We will need to set some PowerShell settings.
Run windows PowerShell as Administrator and do:
Set-ExecutionPolicy -ExecutionPolicy Unrestricted
Second. Some assumptions for reasons of clarity.
And those are:
1a. You have at least one instance of SQL2017 or later (Developer / Enterprise / Standard edition) installed and running on your machine.
1b. You have not bootstrapped the installation of this SQL instance so as to exclude Integration Services (SSIS).
1c. There exists SQL Server Agent running, bound to this SQL instance.
1d. You have some SSMS installed.
2a. There is at least one database attached to this instance (if not create one – please refrain from using in-memory filegroups for this exercise, I have not tested on those).
2b. There are no database level DML triggers that log all data changes in a designated table.
3. There is no active Server Audit Specification for this database logging everything we do.
4. Replication is not enabled (I mean the proper MSSQL replication feature not like scripts by 3rd party apps).
For 2b and 3 it's just cause I have not tested this with those on, but for number 4 it defo won't work with that on.
5. You are windows authenticated into the chosen SQL instance and your instance login and db mappings and privileges are sufficient for at least table creation and basic stuff.
Third.
We are going to need some kind of Python script to do this right?
Ok let's make one.
import pandas as pd
import sqlalchemy as sa
import urllib
import sys
import warnings
import os
import re
import time
#COMMAND LINE PARAMETERS
server = sys.argv[1]
database = sys.argv[2]
ExcelFileHolder = sys.argv[3]
SQLTableName = sys.argv[4]
#END OF COMMAND LINE PARAMETERS
excel_sheet_number_left_to_right = 0
warnings.filterwarnings('ignore')
driver = "SQL Server Native Client 11.0"
params = "DRIVER={%s};SERVER=%s;DATABASE=%s;Trusted_Connection=yes;QuotedID=Yes;" % (driver, server, database) #added the explicit "QuotedID=Yes;" to ensure no issues with column names
params = urllib.parse.quote_plus(params) #urllib.parse.quote_plus for Python 3
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect=%s?charset=utf8" % params) #charset is cool to have here
conn = engine.connect()
def execute_sql_trans(sql_string, log_entry):
with conn.begin() as trans:
result = conn.execute(sql_string)
if len(log_entry) >= 1:
log.write(log_entry + "\n")
return result
excelfilesCursor = {}
def process_excel_file(excelfile, excel_sheet_name, tableName, withPyIndexOrSQLIndex, orderByCandidateFields):
withPyIndexOrSQLIndex = 0
excelfilesCursor.update({tableName: withPyIndexOrSQLIndex})
df = pd.read_excel(open(excelfile,'rb'), sheet_name=excel_sheet_name)
now = time.time()
mlsec = repr(now).split('.')[1][:3]
log_string = "Reading file \"" + excelfile + "\" to memory: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
print(log_string)
df.to_sql(tableName, engine, if_exists='replace', index_label='index.py')
now = time.time()
mlsec = repr(now).split('.')[1][:3]
log_string = "Writing file \"" + excelfile + "\", sheet " +str(excel_sheet_name)+ " to SQL instance " +server+ ", into ["+database+"].[dbo].["+tableName+"]: " + str(time.strftime("%Y-%m-%d %H:%M:%S.{} %Z".format(mlsec), time.localtime(now))) + "\n"
print(log_string)
def convert_datetimes_to_dates(tableNameParam):
sql_string = "exec [convert_datetimes_to_dates] '"+tableNameParam+"';"
execute_sql_trans(sql_string, "")
process_excel_file(ExcelFileHolder, excel_sheet_number_left_to_right, SQLTableName, 0, None)
sys.exit(0)
Ok you may or may not notice that my script contains some extra defs, I sometimes use them for convenience you may as well ignore them.
Save the python script somewhere nice say C:\PythonWorkspace\ExcelPythonToSQL.py
Also, needless to mention that you will need some py modules in your venv. The ones you don't already have you need to pip install them obviously.
Fourth.
Connect to your db, SSMS, etc. and create a new Agent job.
Let's call it "ExcelPythonToSQL".
New step, let's call it "PowerShell parametrized script".
Set the Type to PowerShell.
And place this code inside it:
$pyFile="C:\PythonWorkspace\ExcelPythonToSQL.py"
$SQLInstance="SomeMachineName\SomeNamedSQLInstance"
#or . or just the computername or localhost if your SQL instance is a default instance i.e. not a named one.
$dbName="SomeDatabase"
$ExcelFileFullPath="C:\Temp\ExampleExcelFile.xlsx"
$targetTableName="somenewtable"
C:\ProgramData\Miniconda3\envs\YOURVENVNAMEHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName
Save the step and the job.
Now let's wrap it around something easier to handle. Because remember, this job and step is not like an SSIS step where you could potentially alter the parameter values in its configuration tab. You don't want to properties the job and the step each time and specify different excel file or target table.
So.
Ah also, do me a solid and do this little trick. Do a small alteration in the code, anything and then instead of OK do a Script to New Query Window. That way we can capture the guid of the job without having to query for it.
So now.
Create a SP like so:
use [YourDatabase];
GO
create proc [ExcelPythonToSQL_amend_job_step_params]( #pyFile nvarchar(max),
#SQLInstance nvarchar(max),
#dbName nvarchar(max),
#ExcelFileFullPath nvarchar(max),
#targetTableName nvarchar(max)='somenewtable'
)
as
begin
declare #sql nvarchar(max);
set #sql = '
exec msdb.dbo.sp_update_jobstep #job_id=N''7f6ff378-56cd-4a8d-ba40-e9057439a5bc'', #step_id=1,
#command=N''
$pyFile="'+#pyFile+'"
$SQLInstance="'+#SQLInstance+'"
$dbName="'+#dbName+'"
$ExcelFileFullPath="'+#ExcelFileFullPath+'"
$targetTableName="'+#targetTableName+'"
C:\ProgramData\Miniconda3\envs\YOURVENVGOESHERE\python $pyFile $SQLInstance $dbName $ExcelFileFullPath $targetTableName''
';
--print #sql;
exec sp_executesql #sql;
end
But inside it you must replace 2 things. One, the global uniqueidentifier for the Agent job that you found by doing the trick I described earlier, yes the one with the script to new query window. Two, you must fill in the name of your Python venv replacing the word YOURVENVGOESHERE in the code.
Cool.
Now, with a simple script we can play-test it.
Let's have in a new query window something like this:
use [YourDatabase];
GO
--to set parameters
exec [ExcelPythonToSQL_amend_job_step_params] #pyFile='C:\PythonWorkspace\ExcelPythonToSQL.py',
#SQLInstance='.',
#dbName='YourDatabase',
#ExcelFileFullPath='C:\Temp\ExampleExcelFile.xlsx',
#targetTableName='somenewtable';
--to execute the job
exec msdb.dbo.sp_start_job N'ExcelPythonToSQL', #step_name = N'PowerShell parametrized script';
--let's test that the table is there and drop it.
if object_id('YourDatabase..somenewtable') is not null
begin
select 'Table was here!' [test: table exists?];
drop table [somenewtable];
end
else select 'NADA!' [test: table exists?];
You can run the set parameters part, then the execution, carefull to then wait a little bit like a few seconds, calling the sp_start_job like in this script is asynchronous. And then run the test script to clean up and make sure it had gone in.
That's it.
Obviously lots of variations are possible.
Like in the job step, we could instead call a batch file, we could call a powershell .ps1 file and have the parameters in there, lots and lots of other ways of doing it. I merely described one in this post.

Using Jenkins variables/parameters in Python Script with os.path.join

I'm trying to learn how to use variables from Jenkins in Python scripts. I've already learned that I need to call the variables, but I'm not sure how to implement them in the case of using os.path.join().
I'm not a developer; I'm a technical writer. This code was written by somebody else. I'm just trying to adapt the Jenkins scripts so they are parameterized so we don't have to modify the Python scripts for every release.
I'm using inline Jenkins python scripts inside a Jenkins job. The Jenkins string parameters are "BranchID" and "BranchIDShort". I've looked through many questions that talk about how you have to establish the variables in the Python script, but with the case of os.path.join(),I'm not sure what to do.
Here is the original code. I added the part where we establish the variables from the Jenkins parameters, but I don't know how to use them in the os.path.join() function.
# Delete previous builds.
import os
import shutil
BranchID = os.getenv("BranchID")
BranchIDshort = os.getenv("BranchIDshort")
print "Delete any output from a previous build."
if os.path.exists(os.path.join("C:\\Doc192CS", "Output")):
shutil.rmtree(os.path.join("C:\\Doc192CS", "Output"))
I expect output like: c:\Doc192CS\Output
I am afraid that if I do the following code:
if os.path.exists(os.path.join("C:\\Doc",BranchIDshort,"CS", "Output")):
shutil.rmtree(os.path.join("C:\\Doc",BranchIDshort,"CS", "Output"))
I'll get: c:\Doc\192\CS\Output.
Is there a way to use the BranchIDshort variable in this context to get the output c:\Doc192CS\Output?
User #Adonis gave the correct solution as a comment. Here is what he said:
Indeed you're right. What you would want to do is rather:
os.path.exists(os.path.join("C:\\","Doc{}CS".format(BranchIDshort),"Output"))
(in short use a format string for the 2nd argument)
So the complete corrected code is:
import os
import shutil
BranchID = os.getenv("BranchID")
BranchIDshort = os.getenv("BranchIDshort")
print "Delete any output from a previous build."
if os.path.exists(os.path.join("C:\\Doc{}CS".format(BranchIDshort), "Output")):
shutil.rmtree(os.path.join("C:\\Doc{}CS".format(BranchIDshort), "Output"))
Thank you, #Adonis!

pymssql - Get formatted sql instead of executing

See cousin post: psycopg - Get formatted sql instead of executing
I need to transition some code from Postgres to MS SQL Server. I have been using psycopg2 in Python to do all database calls. I have found a simlar library in pymssql that actually has a very similar API.
One thing that is missing is the mogrify call. In short, mogrify prevents SQL injection but does so without executing. Great for building up a SQL string.
Is there a call that is similar to the mogrify call in pymssql? If not, is there anohter Python library that does have a mogrify-like call? If I cannot find anything, I will transition my code to use the execute/executemany calls, but I'd prefer to avoid that if at all possible.
The function substitute_params is exported in the _mssql module. Example usage:
>>> import pymssql
>>> print pymssql._mssql.substitute_params("SELECT * FROM foo WHERE a = %s", ("quoted ' string",))
SELECT * FROM foo WHERE a = 'quoted '' string'

Print the actual query MySQLdb runs?

I'm looking for a way to debug queries as they are executed and I was wondering if there is a way to have MySQLdb print out the actual query that it runs, after it has finished inserting the parameters and all that? From the documentation, it seems as if there is supposed to be a Cursor.info() call that will give information about the last query run, but this does not exist on my version (1.2.2).
This seems like an obvious question, but for all my searching I haven't been able to find the answer.
We found an attribute on the cursor object called cursor._last_executed that holds the last query string to run even when an exception occurs. This was easier and better for us in production than using profiling all the time or MySQL query logging as both of those have a performance impact and involve more code or more correlating separate log files, etc.
Hate to answer my own question but this is working better for us.
You can print the last executed query with the cursor attribute _last_executed:
try:
cursor.execute(sql, (arg1, arg2))
connection.commit()
except:
print(cursor._last_executed)
raise
Currently, there is a discussion how to get this as a real feature in pymysql (see pymysql issue #330: Add mogrify to Cursor, which returns the exact string to be executed; pymysql should be used instead of MySQLdb)
edit: I didn't test it by now, but this commit indicates that the following code might work:
cursor.mogrify(sql, (arg1, arg2))
For me / for now _last_executed doesn't work anymore. In the current version you want to access
cursor.statement.
see: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
For mysql.connector:
cursor.statement
https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
cursor.statement and cursor._last_executed raised AttributeError exception
cursor._executed
worked for me!
One way to do it is to turn on profiling:
cursor.execute('set profiling = 1')
try:
cursor.execute('SELECT * FROM blah where foo = %s',[11])
except Exception:
cursor.execute('show profiles')
for row in cursor:
print(row)
cursor.execute('set profiling = 0')
yields
(1L, 0.000154, 'SELECT * FROM blah where foo = 11')
Notice the argument(s) were inserted into the query, and that the query was logged even though the query failed.
Another way is to start the server with logging turned on:
sudo invoke-rc.d mysql stop
sudo mysqld --log=/tmp/myquery.log
Then you have to sift through /tmp/myquery.log to find out what the server received.
I've had luck with cursor._last_executed generally speaking, but it doesn't work correctly when used with cursor.executemany(). That drops all but the last statement. Here's basically what I use now in that instance instead (based on tweaks from the actual MySQLDb cursor source):
def toSqlResolvedList( cursor, sql, dynamicValues ):
sqlList=[]
try:
db = cursor._get_db()
if isinstance( sql, unicode ):
sql = sql.encode( db.character_set_name() )
for values in dynamicValues :
sqlList.append( sql % db.literal( values ) )
except: pass
return sqlList
This read-only property returns the last executed statement as a string. The statement property can be useful for debugging and displaying what was sent to the MySQL server.
The string can contain multiple statements if a multiple-statement string was executed. This occurs for execute() with multi=True. In this case, the statement property contains the entire statement string and the execute() call returns an iterator that can be used to process results from the individual statements. The statement property for this iterator shows statement strings for the individual statements.
str = cursor.statement
source: https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-statement.html
I can't say I've ever seen
Cursor.info()
In the documentation, and I can't find it after a few minutes searching. Maybe you saw some old documentation?
In the mean time you can always turn on MySQL Query Logging and have a look at the server's log files.
assume that your sql is like select * from table1 where 'name' = %s
from _mysql import escape
from MySQLdb.converters import conversions
actual_query = sql % tuple((escape(item, conversions) for item in parameters))

How can I add a command to the Python interactive shell?

I'm trying to save myself just a few keystrokes for a command I type fairly regularly in Python.
In my python startup script, I define a function called load which is similar to import, but adds some functionality. It takes a single string:
def load(s):
# Do some stuff
return something
In order to call this function I have to type
>>> load('something')
I would rather be able to simply type:
>>> load something
I am running Python with readline support, so I know there exists some programmability there, but I don't know if this sort of thing is possible using it.
I attempted to get around this by using the InteractivConsole and creating an instance of it in my startup file, like so:
import code, re, traceback
class LoadingInteractiveConsole(code.InteractiveConsole):
def raw_input(self, prompt = ""):
s = raw_input(prompt)
match = re.match('^load\s+(.+)', s)
if match:
module = match.group(1)
try:
load(module)
print "Loaded " + module
except ImportError:
traceback.print_exc()
return ''
else:
return s
console = LoadingInteractiveConsole()
console.interact("")
This works with the caveat that I have to hit Ctrl-D twice to exit the python interpreter: once to get out of my custom console, once to get out of the real one.
Is there a way to do this without writing a custom C program and embedding the interpreter into it?
Edit
Out of channel, I had the suggestion of appending this to the end of my startup file:
import sys
sys.exit()
It works well enough, but I'm still interested in alternative solutions.
You could try ipython - which gives a python shell which does allow many things including automatic parentheses which gives you the function call as you requested.
I think you want the cmd module.
See a tutorial here:
http://wiki.python.org/moin/CmdModule
Hate to answer my own question, but there hasn't been an answer that works for all the versions of Python I use. Aside from the solution I posted in my question edit (which is what I'm now using), here's another:
Edit .bashrc to contain the following lines:
alias python3='python3 ~/py/shellreplace.py'
alias python='python ~/py/shellreplace.py'
alias python27='python27 ~/py/shellreplace.py'
Then simply move all of the LoadingInteractiveConsole code into the file ~/py/shellreplace.py Once the script finishes executing, python will cease executing, and the improved interactive session will be seamless.

Categories

Resources