How to get tables from .mdb using meza? - python

I'm using meza to read .mdb (MSAccess database) files.
from meza import io
try:
self.data = list(io.read_mdb(self.db_path, table=self.table))
except TypeError as e:
raise
io.read_mdb returns generator object (if table param is specified it returns all rows from specified database if not from first). However - it also prints all table names to the console when I ran this code snippet.
QUESTION:
Is there a way how to fetch all table names with meza?
or
Is there a way how to catch 'unwanted' tables console output?
i tried this, but without success:
with open(here_path + os.sep + "temp.txt", "w") as f:
with redirect_stdout(f):
try:
x = list(io.read_mdb(path))
except TypeError as e:
raise
Then I would just read tables from file temp.txt
EDIT:
edit based on reubano answer:
def show_tables_linux(path):
tables = subprocess.check_output(["mdb-tables", path])
return tables.decode().split()
above function returns list of tables.

You'd be better off using the mdbtools command mdb-tables.
mdb-tables test.mdb

Related

Python win32com SaveAs2 error when I try to save file

def convert_to_word():
target = pwd + "/open.doc"
source = pwd + "/template.html"
pythoncom.CoInitialize()
app = win32com.client.Dispatch("Word.Application")
pythoncom.CoInitialize()
# try:
app.Documents.Open(source)
app.Documents.SaveAs2(target,FileFormat=0)
app.Documents.Open(source)
app.Selection.WholeStory()
app.Selection.Fields.Unlink()
app.Documents.Save()
# except Exception as e:
# print(e)
# finally:
app.ActiveDocument.Close()
I need to save html file to .doc, but it report a error <unknown>.SaveAs2 which I cant solve.
Can anyone help me ? Thanks
My first answer on stackoverflow ...
I had had same troubles then I have found solution:
Your code seems to be almost OK. You just need to store your opened document to a variable and then use SaveAs2 function on the variable.
my_doc = app.Documents.Open(source)
my_doc.SaveAs2(target, FileFormat=0)

How to skip one part of a single loop iteration in Python

I am creating about 200 variables within a single iteration of a python loop (extracting fields from excel documents and pushing them to a SQL database) and I am trying to figure something out.
Let's say that a single iteration is a single Excel workbook that I am looping through in a directory. I am extracting around 200 fields from each workbook.
If one of these fields I extract (lets say field #56 out of 200) and it isn't in proper format (lets say the date was filled out wrong ie. 9/31/2015 which isnt a real date) and it errors out with the operation I am performing.
I want the loop to skip that variable and proceed to creating variable #57. I don't want the loop to completely go to the next iteration or workbook, I just want it to ignore that error on that variable and continue with the rest of the variables for that single loop iteration.
How would I go about doing something like this?
In this sample code I would like to continue extracting "PolicyState" even if ExpirationDate has an error.
Some sample code:
import datetime as dt
import os as os
import xlrd as rd
files = os.listdir(path)
for file in files: #Loop through all files in path directory
filename = os.fsdecode(file)
if filename.startswith('~'):
continue
elif filename.endswith( ('.xlsx', '.xlsm') ):
try:
book = rd.open_workbook(os.path.join(path,file))
except KeyError:
print ("Error opening file for "+ file)
continue
SoldModelInfo=book.sheet_by_name("SoldModelInfo")
AccountName=str(SoldModelInfo.cell(1,5).value)
ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
PolicyState=str(SoldModelInfo.cell(1,6).value)
print("Insert data of " + file +" was successful")
else:
continue
Use multiple try blocks. Wrap each decode operation that might go wrong in its own try block to catch the exception, do something, and carry on with the next one.
try:
book = rd.open_workbook(os.path.join(path,file))
except KeyError:
print ("Error opening file for "+ file)
continue
errors = []
SoldModelInfo=book.sheet_by_name("SoldModelInfo")
AccountName=str(SoldModelInfo.cell(1,5).value)
try:
ExpirationDate=dt.datetime.strftime(xldate_to_datetime(SoldModelInfo.cell(1,7).value),'%Y-%m-%d')
except WhateverError as e:
# do something, maybe set a default date?
ExpirationDate = default_date
# and/or record that it went wrong?
errors.append( [ "ExpirationDate", e ])
PolicyState=str(SoldModelInfo.cell(1,6).value)
...
# at the end
if not errors:
print("Insert data of " + file +" was successful")
else:
# things went wrong somewhere above.
# the contents of errors will let you work out what
As suggested you could use multiple try blocks on each of your extract variable, or you could streamline it with your own custom function that handles the try for you:
from functools import reduce, partial
def try_funcs(cell, default, funcs):
try:
return reduce(lambda val, func: func(val), funcs, cell)
except Exception as e:
# do something with your Exception if necessary, like logging.
return default
# Usage:
AccountName = try_funcs(SoldModelInfo.cell(1,5).value, "some default str value", str)
ExpirationDate = try_funcs(SoldModelInfo.cell(1,7).value), "some default date", [xldate_to_datetime, partial(dt.datetime.strftime, '%Y-%m-%d')])
PolicyState = try_funcs(SoldModelInfo.cell(1,6).value, "some default str value", str)
Here we use reduce to repeat multiple functions, and pass partial as a frozen function with arguments.
This can help your code look tidy without cluttering up with lots of try blocks. But the better, more explicit way is just handle the fields you anticipate might error out individually.
So, basically you need to wrap your xldate_to_datetime() call into try ... except
import datetime as dt
v = SoldModelInfo.cell(1,7).value
try:
d = dt.datetime.strftime(xldate_to_datetime(v), '%Y-%m-%d')
except TypeError as e:
print('Could not parse "{}": {}'.format(v, e)

parsing .log files then sort in access

I'm writing a parsing program that that searches through 100+ .log files after some keyword, then puts the words in different array´s and separates the words in to columns in excel. Now I want to sort them in Access automatically so that I can process the different .log file combinations. I can "copy paste" from my Excel file to Access, but that so inefficient and gives some errors... I would like it to be "automatic". I'm new to Access and don´t know how to link from python to Access, I have tried doing it as I did to Excel but that didn't work and started looking in to OBDC but had some problems there to...
import glob # includes
import xlwt # includes
from os import listdir # includes
from os.path import isfile, join # includes
def logfile(filename, tester, createdate,completeresponse):
# Arrays for strings
response = []
message = []
test = []
date = []
with open(filename) as filesearch: # open search file
filesearch = filesearch.readlines() # read file
for line in filesearch:
file = filename[39:] # extract filename [file]
for lines in filesearch:
if createdate in lines: # extract "Create Time" {date}
date.append(lines[15:34])
if completeresponse in lines:
response.append(lines[19:])
print('pending...')
i = 1 # set a number on log {i}
d = {}
for name in filename:
if not d.get(name, False):
d[name] = i
i += 1
if tester in line:
start = '-> '
end = ':\ ' # |<----------->|
number = line[line.find(start)+3: line.find(end)] #Tester -> 1631 22 F1 2E :\ BCM_APP_31381140 AJ \ Read Data By Identifier \
test.append(number) # extract tester {test}
# |<--------------------------------------------
text = line[line.find(end)+3:] # Tester -> 1631 22 F1 2E :\ BCM_APP_31381140 AJ \ Read Data By Identifier \
message.append(text)
with open('Excel.txt', 'a') as handler: # create .txt file
for i in range(len(message)):
# A B C D E
handler.write(f"{file}|{date[i]}|{i}|{test[i]}|{response[i]}")
# A = filename B = create time C = number in file D = tester E = Complete response
# open with 'w' to "reset" the file.
with open('Excel.txt', 'w') as file_handler:
pass
# ---------------------------------------------------------------------------------
for filename in glob.glob(r'C:\Users\Desktop\Access\*.log'):
logfile(filename, 'Sending Request: Tester ->', 'Create Time:','Complete Response:','Channel')
def if_number(s): # look if number or float
try:
float(s)
return True
except ValueError:
return False
# ----------------------------------------------
my_path = r"C:\Users\Desktop\Access" # directory
# search directory for .txt files
text_files = [join(my_path, f) for f in listdir(my_path) if isfile(join(my_path, f)) and '.txt' in f]
for text_file in text_files: # loop and open .txt document
with open(text_file, 'r+') as wordlist:
string = [] # array ot the saved string
for word in wordlist:
string.append(word.split('|')) # put word to string array
column_list = zip(*string) # make list of all string
workbook = xlwt.Workbook()
worksheet = workbook.add_sheet('Tab')
worksheet.col(0) # construct cell
first_col = worksheet.col(0)
first_col.width = 256*50
second_col = worksheet.col(1)
second_col.width = 256*25
third_col = worksheet.col(2)
third_col.width = 256*10
fourth_col = worksheet.col(3)
fourth_col.width = 256*50
fifth_col = worksheet.col(4)
fifth_col.width = 256*100
i = 0 # choose column 0 = A, 3 = C etc
for column in column_list:
for item in range(len(column)):
value = column[item].strip()
if if_number(value):
worksheet.write(item, i, float(value)) # text / float
else:
worksheet.write(item, i, value) # number / int
i += 1
print('File:', text_files, 'Done')
workbook.save(text_file.replace('.txt', '.xls'))
Is there a way to automate the "copy paste"-command, if so how would that look like and work? and if that's something that can´t be done, some advice would help a lot!
EDIT
Thanks i have done som googling and thanks for your help! but now i get a a error... i still can´t send the information to the Access file, i get a syntax error. and i know it exist because i would want to uppdate the existing file... is there a command to "uppdate an exising Acces file"?
error
pyodbc.ProgrammingError: ('42S01', "[42S01] [Microsoft][ODBC Microsoft Access Driver] Table 'tblLogfile' already exists. (-1303) (SQLExecDirectW)")
code
import pyodbc
UDC = r'C:\Users\Documents\Access\UDC.accdb'
# DSN Connection
constr = " DSN=MS Access Database; DBQ={0};".format(UDC)
# DRIVER connection
constr = "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};UID=admin;UserCommitSync=Yes;Threads=3;SafeTransactions=0;PageTimeout=5;MaxScanRows=8;MaxBufferSize=2048;FIL={MS Access};DriverId=25;DefaultDir=C:/USERS/DOCUMENTS/ACCESS;DBQ=C:/USERS/DOCUMENTS/ACCESS/UDC.accdb"
# Connect to database UDC and open cursor
db = pyodbc.connect(constr)
cursor = db.cursor()
sql = "SELECT * INTO [tblLogfile]" +\
"FROM [Excel 8.0;HDR=YES;Database=C:/Users/Documents/Access/Excel.xls.[Tab];"
cursor.execute(sql)
db.commit()
cursor.close()
db.close()
First, please note, MS Access, a database management system, is not MS Excel, a spreadsheet application. Access sits on top of a relational engine and maintains strict rules in data and relational integrity whereas in Excel anything can be written across cells or ranges of cells with no rules. Additionally, the Access object library (tabledefs, querydefs, forms, reports, macros, modules) is much different than the Excel object library (workbooks, worksheets, range, cell, etc.), so there is no one-to-one translation in code.
Specifically, for your Python project, consider pyodbc using a make-table query that runs a direct connection to the Excel workbook. Since MS Access' database is the ACE/JET engine (Windows .dll files, available on Windows machines regardless of Access install). One feature of this data store is the ability to connect to workbooks even text files! So really, MSAccess.exe is just a GUI console to view .mdb/.accdb files.
Below creates a new database table that replicates the specific workbook sheet data, assuming the sheet maintains:
tabular format beginning in A1 cell (no merged cells/repeating labels)
headers in top row (no spaces before or after or special characters !#$%^~<>)))
columns of consistent data type format (i.e., data integrity rules)
Python code
import pyodbc
databasename = 'C:\\Path\\To\\Database\\File.accdb'
# DSN Connection
constr = "DSN=MS Access Database;DBQ={0};".format(databasename)
# DRIVER CONNECTION
constr = "DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={0};".format(databasename)
# CONNECT TO DATABASE AND OPEN CURSOR
db = pyodbc.connect(constr)
cur = db.cursor()
# RUN MAKE-TABLE QUERY FROM EXCEL WORKBOOK SOURCE
# OLDER EXCEL FORMAT
sql = "SELECT * INTO [myNewTable]" + \
" FROM [Excel 8.0;HDR=Yes;Database=C:\Path\To\Workbook.xls].[SheetName$];"
# CURRENT EXCEL FORMAT
sql = "SELECT * INTO [myNewTable]" + \
" FROM [Excel 12.0 Xml;HDR=Yes;Database=C:\Path\To\Workbook.xlsx].[SheetName$];"
cursor.execute(sql)
db.commit()
cur.close()
db.close()
Almost certainly the answer from Parfait above is a better way to go, but for fun I'll leave my answer below
If you are willing to put in the time I think you need 3 things to complete the automation of what you want to do:
1) Send a string representation of your data to the windows Clipboard There is windows specific code for this, or you can just save yourself some time and use pyperclip
2) Learn VBA and use VBA to grab the string from the clipboard and process it. Here is some example VBA code that I used in excel the past to grab text from the Clipboard
Function GetTextFromClipBoard() As String
Dim MSForms_DataObject As New MSForms.DataObject
MSForms_DataObject.GetFromClipboard
GetTextFromClipBoard = MSForms_DataObject.GetText()
End Function
3) use pywin32 (I believe available easily with Anaconda) to automate the vba access calls from Python. This is probably going to be the hardest part as the specific call trees are (in my opinion) not well documented and takes a lot of poking and digging to figure out what exactly you need to do. Its painful to say the least, but use IPython to help you with visual cues of what methods your pywin32 objects have available.
As I look at the instructions above, I realize it may also be possible to skip the clipboard and just send the information directly from python to access via pywin32. If you do the clipboard route however, you can break the steps up.
send one dataset to the clipboard
grab and process the data using the VBA editor in Access
after you figure out 1 and 2, use pywin32 to bridge the gap
Good luck, and maybe write a blog post about it if you figure it out to share the details.

lxml + loads of files = random SerialisationError: IO_WRITE

I'm using lxml and python 3 to parse many files and merge files that belong together.
The files are actually stored in pairs of two (that are also merged first) inside zip files but i don't think that matters here.
We're talking about 100k files that are about 900MB in zipped form.
My problems is that my script works fine but at somepoint (for multiple runs it's not always the same point so it shouldn't be a problem with a certain file) i get this error:
File "C:\Users\xxx\workspace\xxx\src\zip2xml.py", line 110, in
_writetonorm
normroot.getroottree().write(norm_file_path) File "lxml.etree.pyx", line 1866, in lxml.etree._ElementTree.write
(src/lxml\lxml.etree.c:46006) File "serializer.pxi", line 481, in
lxml.etree._tofilelike (src/lxml\lxml.etree.c:93719) File
"serializer.pxi", line 187, in lxml.etree._raiseSerialisationError
(src/lxml\lxml.etree.c:90965) lxml.etree.SerialisationError: IO_WRITE
I have no idea what causes this error.
The entire code is a little cumbersome so i hope the relevant areas suffice:
def _writetonorm(self, outputpath):
'''Writes the current XML to a file.
It'll update the file if it already exists and create the file otherwise'''
#Find Name
name = None
try:
name = self._xml.xpath("xxx")[0].text.rstrip().lstrip()
except Exception as e:
try:
name = self._xml.xpath("xxx")[0].text.rstrip().lstrip()
except Exception as e:
name = "damn it!"
if name != None:
#clean name a bit
name = name[:35]
table = str.maketrans(' /#*"$!&<>-:.,;()','_________________')
name = name.translate(table)
name = name.lstrip("_-").rstrip("_-")
#generate filename
norm_file_name = name + ".xml"
norm_file_path = os.path.join(outputpath, norm_file_name)
#Check if we have that completefile already. If we do, update it.
if os.path.isfile(norm_file_path):
norm_file = etree.parse(norm_file_path, self._parser)
try:
normroot = norm_file.getroot()
except:
print(norm_file_path + "is broken !!!!")
time.sleep(10)
else:
normroot = etree.Element("norm")
jurblock = etree.Element("jurblock")
self._add_jurblok_attributes(jurblock)
jurblock.insert(0, self._xml)
normroot.insert(0, jurblock)
try:
normroot.getroottree().write(norm_file_path) #here the Exception occurs
except Exception as e:
print(norm_file_path)
raise e
I know that my exception handling isn't great but this is just a proof of work for now.
Can anyone tell me why the error happens ?
Looking at the file that causes the error it's not wellformed but I suspect that is because the error happened and it was fine before the latest iteration.
It seems to have been a mistake to use maped network drives for this. No such Exception when letting it work with the files locally.
Learned something :)

How to check if a SQLite3 database exists in Python?

I am trying to create a function in Python 2.7.3 to open a SQLite database.
This is my code at the moment:
import sqlite3 as lite
import sys
db = r'someDb.sqlite'
def opendb(db):
try:
conn = lite.connect(db)
except sqlite3.Error:
print "Error open db.\n"
return False
cur = conn.cursor()
return [conn, cur]
I have tried the code above and I have observed that the sqlite3 library opens the database declared if exists, or creates a new database if this one doesn't exist.
Is there a way to check if the database exists with sqlite3 methods or I have to use file operation like os.path.isfile(path)?
In Python 2, you'll have to explicitly test for the existence using os.path.isfile:
if os.path.isfile(db):
There is no way to force the sqlite3.connect function to not create the file for you.
For those that are using Python 3.4 or newer, you can use the newer URI path feature to set a different mode when opening a database. The sqlite3.connect() function by default will open databases in rwc, that is Read, Write & Create mode, so connecting to a non-existing database will cause it to be created.
Using a URI, you can specify a different mode instead; if you set it to rw, so Read & Write mode, an exception is raised when trying to connect to a non-existing database. You can set different modes when you set the uri=True flag when connecting and pass in a file: URI, and add a mode=rw query parameter to the path:
from urllib.request import pathname2url
try:
dburi = 'file:{}?mode=rw'.format(pathname2url(db))
conn = lite.connect(dburi, uri=True)
except sqlite3.OperationalError:
# handle missing database case
See the SQLite URI Recognized Query Parameters documentation for more details on what parameters are accepted.
os.path.isfile() is just telling you if a file exists, not if it exists AND is a SQLite3 database! Knowing http://www.sqlite.org/fileformat.html, you could do this :
def isSQLite3(filename):
from os.path import isfile, getsize
if not isfile(filename):
return False
if getsize(filename) < 100: # SQLite database file header is 100 bytes
return False
with open(filename, 'rb') as fd:
header = fd.read(100)
return header[:16] == 'SQLite format 3\x00'
and subsequently use it like :
for file in files:
if isSQLite3(file):
print "'%s' is a SQLite3 database file" % file
else:
print "'%s' is not a SQLite3 database file" % file
Yes, there is a way to do what you want with Python 3.4+.
Use the sqlite3.connect() function to connect, but pass it a URI instead of a file path, and add mode=rw to its query string.
Here is a complete working code example:
import sqlite3
con = sqlite3.connect('file:aaa.db?mode=rw', uri=True)
This will open an existing database from a file named aaa.db in the current folder, but will raise an error in case that file can not be opened or does not exist:
Traceback (most recent call last):
File "aaa.py", line 2, in <module>
con = sqlite3.connect('file:aaa.db?mode=rw', uri=True)
sqlite3.OperationalError: unable to open database file
Python sqlite.connect() docs state that:
If uri is true, database is interpreted as a URI. This allows you to specify options. For example, to open a database in read-only mode you can use:
db = sqlite3.connect('file:path/to/database?mode=ro', uri=True)
More information about this feature, including a list of recognized options, can be found in the SQLite URI documentation.
Here's an excerpt of all the relevant URI option information collected from http://www.sqlite.org/c3ref/open.html:
mode: The mode parameter may be set to either "ro", "rw", "rwc", or "memory". Attempting to set it to any other value is an error. If "ro" is specified, then the database is opened for read-only access, just as if the SQLITE_OPEN_READONLY flag had been set in the third argument to sqlite3_open_v2(). If the mode option is set to "rw", then the database is opened for read-write (but not create) access, as if SQLITE_OPEN_READWRITE (but not SQLITE_OPEN_CREATE) had been set. Value "rwc" is equivalent to setting both SQLITE_OPEN_READWRITE and SQLITE_OPEN_CREATE. If the mode option is set to "memory" then a pure in-memory database that never reads or writes from disk is used. It is an error to specify a value for the mode parameter that is less restrictive than that specified by the flags passed in the third parameter to sqlite3_open_v2().
The sqlite3_open_v2() interface works like sqlite3_open() except that it accepts two additional parameters for additional control over the new database connection. The flags parameter to sqlite3_open_v2() can take one of the following three values, optionally combined with the SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX, SQLITE_OPEN_SHAREDCACHE, SQLITE_OPEN_PRIVATECACHE, and/or SQLITE_OPEN_URI flags:
SQLITE_OPEN_READONLY
The database is opened in read-only mode. If the database does not already exist, an error is returned.
SQLITE_OPEN_READWRITE
The database is opened for reading and writing if possible, or reading only if the file is write protected by the operating system. In either case the database must already exist, otherwise an error is returned.
SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE
The database is opened for reading and writing, and is created if it does not already exist. This is the behavior that is always used for sqlite3_open() and sqlite3_open16().
For convenience, here's also a Python 3.4+ function for converting a regular path to an URI usable by sqlite.connect():
import pathlib
import urllib.parse
def _path_to_uri(path):
path = pathlib.Path(path)
if path.is_absolute():
return path.as_uri()
return 'file:' + urllib.parse.quote(path.as_posix(), safe=':/')
This is a fork (using Python 3) based on Tom Horen's answer, which presents a solution more complete and reliable that the elected answer.
The elected answer, does not evaluate any content, header, etc., in order to determine whether the file actually contains any data related to a SQLite3 database or not.
I tried to present something more pragmatic here:
#!/usr/bin/python3
import os
import sys
if os.path.isfile('test.sqlite3'):
if os.path.getsize('test.sqlite3') > 100:
with open('test.sqlite3','r', encoding = "ISO-8859-1") as f:
header = f.read(100)
if header.startswith('SQLite format 3'):
print("SQLite3 database has been detected.")
Building on a couple of other answers above. Here is a clean solution that works in Python 3.7.7:
def isSqlite3Db(db):
if not os.path.isfile(db): return False
sz = os.path.getsize(db)
# file is empty, give benefit of the doubt that its sqlite
# New sqlite3 files created in recent libraries are empty!
if sz == 0: return True
# SQLite database file header is 100 bytes
if sz < 100: return False
# Validate file header
with open(db, 'rb') as fd: header = fd.read(100)
return (header[:16] == b'SQLite format 3\x00')
Usage:
if isSqlite3Db('<path_to_db>'):
# ... <path_to_db> is a Sqlite 3 DB
Notes:
The answers checking file size is > 100 does not work as a new sqlite3 db created in recent python creates an file with length of 0.
Other examples reading file header returned bytes in Python 3.7.7 and not string so comparison would fail.
Examples that use sqlite3.connect(dburl, uri=True) did not work for me in Python 3.7.7 as it gave false positives.
I am using a function like the following at the beginning of my script so that I can try and figure out why a sqlite3 db script might not be working. Like the comments say, it uses 3 phases, checks if a path exist, checks if the path is a file, checks if that file's header is a sqlite3 header.
def checkdbFileforErrors():
#check if path exists
try:
with open('/path/to/your.db'): pass
except IOError:
return 1
#check if path if a file
if not isfile('/path/to/your.db'):
return 2
#check if first 100 bytes of path identifies itself as sqlite3 in header
f = open('/path/to/your.db', "rx")
ima = f.read(16).encode('hex')
f.close()
#see http://www.sqlite.org/fileformat.html#database_header magic header string
if ima != "53514c69746520666f726d6174203300":
return 3
return 0

Categories

Resources