parsing .log files then sort in access


I'm writing a parsing program that searches through 100+ .log files for certain keywords, puts the matches in different arrays, and writes them to separate columns in Excel. Now I want to sort them in Access automatically so that I can process the different .log file combinations. I can copy and paste from my Excel file into Access, but that is inefficient and produces some errors, so I would like it to be automatic. I'm new to Access and don't know how to connect to it from Python. I tried doing it the same way I did for Excel, but that didn't work. I then started looking into ODBC but had some problems there too.
import glob # includes
import xlwt # includes
from os import listdir # includes
from os.path import isfile, join # includes

def logfile(filename, tester, createdate, completeresponse):
    # Arrays for strings
    response = []
    message = []
    test = []
    date = []
    with open(filename) as filesearch: # open search file
        filesearch = filesearch.readlines() # read file
        for line in filesearch:
            file = filename[39:] # extract filename [file]
            for lines in filesearch:
                if createdate in lines: # extract "Create Time" {date}
                    date.append(lines[15:34])
                if completeresponse in lines:
                    response.append(lines[19:])
                    print('pending...')
            i = 1 # set a number on log {i}
            d = {}
            for name in filename:
                if not d.get(name, False):
                    d[name] = i
                    i += 1
            if tester in line:
                start = '-> '
                end = ':\\ '
                # example line: Tester -> 1631 22 F1 2E :\ BCM_APP_31381140 AJ \ Read Data By Identifier \
                number = line[line.find(start)+3: line.find(end)] # text between '-> ' and ':\ '
                test.append(number) # extract tester {test}
                text = line[line.find(end)+3:] # text after ':\ '
                message.append(text)
    with open('Excel.txt', 'a') as handler: # create .txt file
        for i in range(len(message)):
            # A B C D E
            handler.write(f"{file}|{date[i]}|{i}|{test[i]}|{response[i]}")
            # A = filename B = create time C = number in file D = tester E = Complete response

# open with 'w' to "reset" the file.
with open('Excel.txt', 'w') as file_handler:
    pass
# ---------------------------------------------------------------------------------
for filename in glob.glob(r'C:\Users\Desktop\Access\*.log'):
    logfile(filename, 'Sending Request: Tester ->', 'Create Time:', 'Complete Response:')
def if_number(s): # check whether s is an int or a float
    try:
        float(s)
        return True
    except ValueError:
        return False
# ----------------------------------------------
my_path = r"C:\Users\Desktop\Access" # directory
# search directory for .txt files
text_files = [join(my_path, f) for f in listdir(my_path) if isfile(join(my_path, f)) and '.txt' in f]
for text_file in text_files: # loop and open .txt document
    with open(text_file, 'r+') as wordlist:
        string = [] # array of the saved strings
        for word in wordlist:
            string.append(word.split('|')) # put word to string array
        column_list = zip(*string) # make list of all string
    workbook = xlwt.Workbook()
    worksheet = workbook.add_sheet('Tab')
    worksheet.col(0) # construct cell
    first_col = worksheet.col(0)
    first_col.width = 256*50
    second_col = worksheet.col(1)
    second_col.width = 256*25
    third_col = worksheet.col(2)
    third_col.width = 256*10
    fourth_col = worksheet.col(3)
    fourth_col.width = 256*50
    fifth_col = worksheet.col(4)
    fifth_col.width = 256*100
    i = 0 # choose column 0 = A, 3 = C etc
    for column in column_list:
        for item in range(len(column)):
            value = column[item].strip()
            if if_number(value):
                worksheet.write(item, i, float(value)) # numeric value
            else:
                worksheet.write(item, i, value) # text value
        i += 1
    print('File:', text_files, 'Done')
    workbook.save(text_file.replace('.txt', '.xls'))
Is there a way to automate the copy-and-paste step? If so, what would that look like and how would it work? If it can't be done, any advice would help a lot!
EDIT
Thanks for your help! I have done some googling, but now I get an error and still can't send the information to the Access file. I know the table exists, because I want to update the existing file. Is there a command to update an existing Access file?
error
pyodbc.ProgrammingError: ('42S01', "[42S01] [Microsoft][ODBC Microsoft Access Driver] Table 'tblLogfile' already exists. (-1303) (SQLExecDirectW)")
code
import pyodbc
UDC = r'C:\Users\Documents\Access\UDC.accdb'
# DSN Connection
constr = " DSN=MS Access Database; DBQ={0};".format(UDC)
# DRIVER connection
constr = "DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};UID=admin;UserCommitSync=Yes;Threads=3;SafeTransactions=0;PageTimeout=5;MaxScanRows=8;MaxBufferSize=2048;FIL={MS Access};DriverId=25;DefaultDir=C:/USERS/DOCUMENTS/ACCESS;DBQ=C:/USERS/DOCUMENTS/ACCESS/UDC.accdb"
# Connect to database UDC and open cursor
db = pyodbc.connect(constr)
cursor = db.cursor()
sql = "SELECT * INTO [tblLogfile]" +\
"FROM [Excel 8.0;HDR=YES;Database=C:/Users/Documents/Access/Excel.xls.[Tab];"
cursor.execute(sql)
db.commit()
cursor.close()
db.close()

First, please note, MS Access, a database management system, is not MS Excel, a spreadsheet application. Access sits on top of a relational engine and maintains strict rules in data and relational integrity whereas in Excel anything can be written across cells or ranges of cells with no rules. Additionally, the Access object library (tabledefs, querydefs, forms, reports, macros, modules) is much different than the Excel object library (workbooks, worksheets, range, cell, etc.), so there is no one-to-one translation in code.
Specifically, for your Python project, consider pyodbc with a make-table query that runs a direct connection to the Excel workbook. The database behind MS Access is the ACE/JET engine (Windows .dll files, available on Windows machines regardless of Access install), and one feature of this data store is the ability to connect to workbooks and even text files. So really, MSAccess.exe is just a GUI console for viewing .mdb/.accdb files.
The code below creates a new database table that replicates the specific workbook sheet data, assuming the sheet maintains:
- a tabular format beginning in cell A1 (no merged cells or repeating labels)
- headers in the top row (no leading or trailing spaces, and no special characters such as !#$%^~<>)
- columns of consistent data type format (i.e., data integrity rules)
Python code
import pyodbc

databasename = 'C:\\Path\\To\\Database\\File.accdb'

# DSN CONNECTION (use either this or the DRIVER connection below)
constr = "DSN=MS Access Database;DBQ={0};".format(databasename)
# DRIVER CONNECTION
constr = "DRIVER={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={0};".format(databasename)

# CONNECT TO DATABASE AND OPEN CURSOR
db = pyodbc.connect(constr)
cur = db.cursor()

# RUN MAKE-TABLE QUERY FROM EXCEL WORKBOOK SOURCE (keep only the sql that matches your workbook)
# OLDER EXCEL FORMAT
sql = "SELECT * INTO [myNewTable]" + \
      r" FROM [Excel 8.0;HDR=Yes;Database=C:\Path\To\Workbook.xls].[SheetName$];"
# CURRENT EXCEL FORMAT
sql = "SELECT * INTO [myNewTable]" + \
      r" FROM [Excel 12.0 Xml;HDR=Yes;Database=C:\Path\To\Workbook.xlsx].[SheetName$];"

cur.execute(sql)
db.commit()
cur.close()
db.close()
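If the target table already exists, as in the asker's follow-up error (Table 'tblLogfile' already exists), the make-table query cannot simply be rerun. A minimal sketch of one way around this, assuming the existing table's columns match the sheet's columns: build either a make-table query or an append query depending on whether the table is already there. The helper name `build_import_sql` and its parameters are hypothetical, and the commented lines only suggest how the result might be used with the `cur` cursor from the code above.

```python
def build_import_sql(table, workbook_path, sheet, table_exists):
    """Return an Access SQL statement that pulls one Excel sheet into a table.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Pick the Excel connector string from the workbook extension.
    if workbook_path.lower().endswith(".xlsx"):
        connector = "Excel 12.0 Xml"
    else:
        connector = "Excel 8.0"
    source = "[{0};HDR=Yes;Database={1}].[{2}$]".format(connector, workbook_path, sheet)
    if table_exists:
        # Append rows to the existing table instead of recreating it.
        return "INSERT INTO [{0}] SELECT * FROM {1};".format(table, source)
    # Make-table query: only valid when the table does not exist yet.
    return "SELECT * INTO [{0}] FROM {1};".format(table, source)


# Possible usage with an open pyodbc cursor (not run here):
# exists = cur.tables(table="tblLogfile", tableType="TABLE").fetchone() is not None
# cur.execute(build_import_sql("tblLogfile", r"C:\Path\To\Workbook.xls", "Tab", exists))
# db.commit()
```

Appending assumes the sheet and the table agree on column names and types; if they do not, listing the columns explicitly in the INSERT is likely the safer choice.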

Almost certainly the answer from Parfait above is a better way to go, but for fun I'll leave my answer below
If you are willing to put in the time I think you need 3 things to complete the automation of what you want to do:
1) Send a string representation of your data to the Windows clipboard. There is Windows-specific code for this, or you can save yourself some time and use pyperclip.
2) Learn VBA and use it to grab the string from the clipboard and process it. Here is some example VBA code that I used in Excel in the past to grab text from the clipboard:
Function GetTextFromClipBoard() As String
    Dim MSForms_DataObject As New MSForms.DataObject
    MSForms_DataObject.GetFromClipboard
    GetTextFromClipBoard = MSForms_DataObject.GetText()
End Function
3) Use pywin32 (I believe it is easily available with Anaconda) to automate the VBA Access calls from Python. This is probably going to be the hardest part, as the specific call trees are (in my opinion) not well documented, and it takes a lot of poking and digging to figure out exactly what you need to do. It's painful, to say the least, but use IPython to help you with visual cues about which methods your pywin32 objects have available.
As I look at the instructions above, I realize it may also be possible to skip the clipboard and send the information directly from Python to Access via pywin32. If you take the clipboard route, however, you can break the steps up:
1) send one dataset to the clipboard
2) grab and process the data using the VBA editor in Access
3) after you figure out 1 and 2, use pywin32 to bridge the gap
Good luck, and maybe write a blog post about it if you figure it out to share the details.
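As a rough illustration of skipping the clipboard entirely, the sketch below swaps in a different technique from the pywin32 route described here: it parses the pipe-delimited Excel.txt layout from the question into tuples that could be handed to a parameterized INSERT through pyodbc. `parse_rows` is a hypothetical helper, and the table and column names in the commented usage are assumptions.

```python
def parse_rows(lines):
    """Split pipe-delimited lines (the Excel.txt layout from the question) into tuples."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # maxsplit=4 keeps any '|' inside the last field (the response) intact.
        rows.append(tuple(part.strip() for part in line.split("|", 4)))
    return rows


sample = ["a.log|2019-01-01 10:00:00|0|1631 22 F1 2E|62 F1 2E\n"]
print(parse_rows(sample))

# Possible usage with an open pyodbc connection `db` (not run here);
# the table and column names are assumptions:
# cur = db.cursor()
# cur.executemany(
#     "INSERT INTO tblLogfile (LogFile, CreateTime, Num, Tester, Response) "
#     "VALUES (?, ?, ?, ?, ?)",
#     parse_rows(open("Excel.txt")),
# )
# db.commit()
```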

Related

Dealing with special characters in file paths - Python script and Athena Query

I have a script that I use to parse out S3 info using Athena and S3 inventory. Currently when it finds a specific file type, the script will use that path to generate a SQL query that is then passed to Athena. Here's an example:
def getS3TagReportData(session, tag, table):
    whereStatement = ""
    for v in tag.values:
        first = True
        for p in v.paths:
            # print(p) #iterate through paths
            if first:
                whereStatement = f"WHERE dt = (SELECT MAX(dt) FROM \"{awsAthenaDatabase}\".\"{table}\") AND key LIKE '{p}/%'"
                first = False
            else:
                whereStatement += f" OR key LIKE '{p}%'"
This works fine until I hit a path that contains an apostrophe (').
For example: bucketname\data\int'l hotel files\data.jpg
Then the query that gets dynamically generated is:
WHERE dt = (SELECT MAX(dt) FROM \"{awsAthenaDatabase}\".\"{table}\") AND key LIKE 'bucketname\data\int'l hotel files\data.jpg'
As you can see, the WHERE clause is going to fail because of the extra ' in there. How can I programmatically work around these errors and report them?
I appreciate the help!
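One way to keep an apostrophe from ending the string literal early is to escape it before it is interpolated: inside a SQL string literal, a single quote is written as two single quotes. A minimal sketch, assuming the paths are plain strings; `sql_string_literal` and `build_where` are hypothetical helpers that are not part of the original script, and the `dt` subquery is left out for brevity.

```python
def sql_string_literal(value):
    """Quote a Python string as a SQL string literal, doubling any single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_where(paths):
    """Build an OR-joined list of LIKE prefix conditions for the given paths."""
    conditions = []
    for p in paths:
        # The trailing /% is appended before quoting so it stays inside the literal.
        conditions.append("key LIKE " + sql_string_literal(p + "/%"))
    return "WHERE " + " OR ".join(conditions)


print(build_where(["bucketname/data/int'l hotel files"]))
```

Note that `%` and `_` inside a path are still treated as LIKE wildcards; this sketch only handles the quote character.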

Changing output of speedtest.py and speedtest-cli to include IP address in output .csv file

I added a line to the Python code “speedtest.py” that I found at pimylifeup.com. I hoped it would let me track the internet provider and IP address along with all the other speed information the code provides. But when I execute it, the code only grabs the next word after the findall call. I would also like it to return the IP address that appears after the provider. I have attached the code below. Can you help me modify it to return what I am looking for?
Here is an example what is returned by speedtest-cli
$ speedtest-cli
Retrieving speedtest.net configuration...
Testing from Biglobe (111.111.111.111)...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by GLBB Japan (Naha) [51.24 km]: 118.566 ms
Testing download speed................................................................................
Download: 4.00 Mbit/s
Testing upload speed......................................................................................................
Upload: 13.19 Mbit/s
$
And this is an example of what is being written by speedtest.py to my .csv file
Date,Time,Ping,Download (Mbit/s),Upload(Mbit/s),myip
05/30/20,12:47,76.391,12.28,19.43,Biglobe
This is what I want it to return.
Date,Time,Ping,Download (Mbit/s),Upload (Mbit/s),myip
05/30/20,12:31,75.158,14.29,19.54,Biglobe 111.111.111.111
Or maybe:
05/30/20,12:31,75.158,14.29,19.54,Biglobe,111.111.111.111
Here is the code that I am using. And thank you for any help you can provide.
import os
import re
import subprocess
import time

response = subprocess.Popen('/usr/local/bin/speedtest-cli', shell=True, stdout=subprocess.PIPE).stdout.read().decode('utf-8')
ping = re.findall(r'km]:\s(.*?)\s', response, re.MULTILINE)
download = re.findall(r'Download:\s(.*?)\s', response, re.MULTILINE)
upload = re.findall(r'Upload:\s(.*?)\s', response, re.MULTILINE)
myip = re.findall(r'from\s(.*?)\s', response, re.MULTILINE)
ping = ping[0].replace(',', '.')
download = download[0].replace(',', '.')
upload = upload[0].replace(',', '.')
myip = myip[0]
try:
    f = open('/home/pi/speedtest/speedtestz.csv', 'a+')
    if os.stat('/home/pi/speedtest/speedtestz.csv').st_size == 0:
        f.write('Date,Time,Ping,Download (Mbit/s),Upload (Mbit/s),myip\r\n')
except:
    pass
f.write('{},{},{},{},{},{}\r\n'.format(time.strftime('%m/%d/%y'), time.strftime('%H:%M'), ping, download, upload, myip))
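The pattern `from\s(.*?)\s` stops at the first whitespace after the provider name, which is why only `Biglobe` is captured. A minimal sketch of a pattern that captures both the provider and the address in parentheses, assuming the "Testing from ... (...)" line keeps the shape shown in the sample output; `parse_provider_and_ip` is a hypothetical helper name.

```python
import re


def parse_provider_and_ip(response):
    """Return (provider, ip) from speedtest-cli's human-readable output, or (None, None)."""
    # Group 1: provider name (may contain spaces); group 2: text inside the parentheses.
    match = re.search(r"Testing from (.+?) \(([^)]+)\)", response)
    if match is None:
        return None, None
    return match.group(1), match.group(2)


sample = "Retrieving speedtest.net configuration...\nTesting from Biglobe (111.111.111.111)...\n"
provider, ip = parse_provider_and_ip(sample)
print(provider, ip)
```

The two values can then be written as one field (`provider + ' ' + ip`) or as two separate CSV columns, matching either of the desired outputs above.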

Let me know if this works for you. It should do everything you're looking for.
#!/usr/bin/env python3
import os
import csv
import time
import subprocess
from decimal import Decimal, ROUND_UP

file_path = '/home/pi/speedtest/speedtestz.csv'

def format_speed(bits_string):
    """ changes string bit/s to megabits/s and rounds to two decimal places """
    return (Decimal(bits_string) / 1000000).quantize(Decimal('.01'), rounding=ROUND_UP)

def write_csv(row):
    """ writes a header row if one does not exist and test result row """
    # straight from the csv docs
    # see: https://docs.python.org/3/library/csv.html
    with open(file_path, 'a+', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quotechar='"')
        if os.stat(file_path).st_size == 0:
            writer.writerow(['Date','Time','Ping','Download (Mbit/s)','Upload (Mbit/s)','myip'])
        writer.writerow(row)

response = subprocess.run(['/usr/local/bin/speedtest-cli', '--csv'], capture_output=True, encoding='utf-8')
# if speedtest-cli exited with no errors / ran successfully
if response.returncode == 0:
    # from the csv docs:
    # "And while the module doesn’t directly support parsing strings, it can easily be done"
    # this will remove quotes and spaces vs doing a string split on ','
    # csv.reader returns an iterator, so we turn that into a list
    cols = list(csv.reader([response.stdout]))[0]
    # turns 13.45 ping to 13
    ping = Decimal(cols[5]).quantize(Decimal('1.'))
    # speedtest-cli --csv returns speed in bits/s, convert to megabits/s
    download = format_speed(cols[6])
    upload = format_speed(cols[7])
    ip = cols[9]
    # separate names so the time module is not shadowed
    date_str = time.strftime('%m/%d/%y')
    time_str = time.strftime('%H:%M')
    write_csv([date_str, time_str, ping, download, upload, ip])
else:
    print('speedtest-cli returned error: %s' % response.stderr)

$ /usr/local/bin/speedtest-cli --csv-header > speedtestz.csv
$ /usr/local/bin/speedtest-cli --csv >> speedtestz.csv
output:
Server ID,Sponsor,Server Name,Timestamp,Distance,Ping,Download,Upload,Share,IP Address
Does that not get you what you're looking for? Run the first command once to create the CSV with a header row. Subsequent runs use the append operator '>>', which adds a test result row each time you run it.
Relying on all of those regexes will bite you if speedtest-cli, or a library it depends on, changes its human-readable output format.
There are plenty of ways to do it, though. Hope this helps.

Reading Excel File with openpyxl while populating form using Selenium takes too long

I'm filling out a web form which has input fields, dropdown menus, autocomplete fields and action buttons.
I'm pulling the data from an excel sheet using openpyxl. Initially it used to take between 3-4 seconds to populate these fields. After adding read_only=True to my readData function, it improved a bit but not as expected.
Does anyone have any suggestions on how I could reduce the time it takes to populate each field? Any help is really appreciated. Below are both the readData function and fill_out_form, which I use to fill out a text field, as an example.
Cheers.
Method to read each cell:
workbook = openpyxl.load_workbook(file, read_only=True)
def readData(file, sheetName, row_num, column_num):
    sheet = workbook.get_sheet_by_name(sheetName)
    return sheet.cell(row=row_num, column=column_num).value
Method to populate input field:
def fill_out_form(driver, path, input_sel, row_num, column_num):
    try:
        wait_for_element(driver, "//input[@id='" + input_sel + "']", 5)
        xls = readData(path, "Callcenter", row_num, column_num)
        input_el = driver.find_element_by_xpath("//input[@id='" + input_sel + "']")
        input_el.click()
        if column_num == 9 or column_num == 40 or column_num == 67 or column_num == 121:
            xls = datetime.strftime(xls,'%d/%m/%Y')
        input_el.send_keys(xls)
        input_el.send_keys(Keys.TAB)
        loading_el = WebDriverWait(driver, 4).until(EC.presence_of_element_located((By.XPATH, "//*[@class='sk-attr js-sk-attr sk-attr--labeled sk-attr--mandatory sk-attr--infonnized sk-attr--error sk-textbox clearfix']")))
        WebDriverWait(driver, 4).until(wait_not_spinning(loading_el))
    except TimeoutException:
        print("Loading took too much time!-Try again")

Unless your spreadsheet is huge I'm fairly certain the wait_for_element and WebDriverWait calls are taking the most time.
As was already suggested, try caching the spreadsheet(s) data using an efficient structure such as:
dict[file][sheet] = list[row][column]
Since it seems you only have one file you can load the data using:
def load_data(filename):
    data = {}
    workbook = openpyxl.load_workbook(filename, data_only=True, read_only=True, keep_vba=False)
    for sheet_name in workbook.sheetnames:
        data[sheet_name] = []
        sheet = workbook[sheet_name]
        for rows in sheet.iter_rows():
            row_elements = []
            for cell in rows:
                try:
                    value = cell.value
                except IndexError:
                    value = cell.internal_value
                row_elements.append(value)
            data[sheet_name].append(row_elements)
    return data
In order to use it, you would call load_data(filename) once (when your application starts) and access the loaded data later on using xls_data instead of readData:
#application start
xls_data = load_data(filename)
....
# sheet_name->str, row_num->int, col_num->int
xls = xls_data[sheet_name][row_num][col_num]
The above will throw KeyError if the sheet name is invalid or IndexError for an invalid row,column combination.
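One detail to watch when switching from `sheet.cell(row=..., column=...)` to the cached lists: openpyxl's cell coordinates start at 1, while Python list indexes start at 0. A minimal sketch of a lookup helper that keeps the 1-based row and column numbers used by `readData`; `read_cached` is a hypothetical name, and the sample dictionary stands in for the result of `load_data`.

```python
def read_cached(xls_data, sheet_name, row_num, column_num):
    """Look up a cell in the cached data using 1-based row/column numbers."""
    if row_num < 1 or column_num < 1:
        # Zero or negative indexes would silently wrap around in a list.
        raise IndexError("row_num and column_num start at 1")
    return xls_data[sheet_name][row_num - 1][column_num - 1]


# Stand-in for load_data(filename): one sheet with two rows.
xls_data = {"Callcenter": [["name", "date"], ["Alice", "01/02/2020"]]}
print(read_cached(xls_data, "Callcenter", 2, 1))
```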

Try implementing the readData method using the 'xlrd' library.
It does not provide rich API like openpyxl but I'm sure it'll run faster.

When you fill in a web form, in the end the data will be sent to a server with a POST request.
What I would recommend is to use e.g. wireshark to capture that POST request.
Analyse that request to see what exactly is sent to the server. Then you can create such a POST request using the requests module.
That means you don't have to deal with selenium at all.
And as the others have mentioned, read the excel file only once.

Looping SQL query in python

I am writing a python script which queries the database for a URL string. Below is my snippet.
db.execute('select sitevideobaseurl,videositestring '
           'from site, video '
           'where siteID =1 and site.SiteID=video.VideoSiteID limit 1')
result = db.fetchall()
filename = '/home/Site_info'
output = open(filename, "w")
for row in result:
    videosite= row[0:2]
    link = videosite[0].format(videosite[1])
    full_link = link.replace("http://","https://")
    print full_link
    output.write("%s\n"%str(full_link))
output.close()
The query basically gives a URL link. It takes the base URL from one table and the video site string from another table.
output: https://www.youtube.com/watch?v=uqcSJR_7fOc
SiteID is the primary key; it is an int and is not sequential.
I want to loop this SQL query so that it picks a new SiteID on every execution, giving me a unique site URL every time, and write all the results to a file.
desired output: https://www.youtube.com/watch?v=uqcSJR_7fOc
https://www.dailymotion.com/video/hdfchsldf0f
There are about 1178 records.
Thanks in advance for your time and help.

I'm not sure if I completely understand what you're trying to do. I think your goal is to get a list of all links to videos. You get a link to a video by joining the sitevideobaseurl from site and videositestring from video.
In my experience it's much easier to let the database do the heavy lifting; it's built for that. It should be more efficient to join the tables, return all the results, and then loop through them, instead of making a separate query to the database for each row.
The code should look something like this (be careful, I didn't test it):
query = """
    select s.sitevideobaseurl,
           v.videositestring
    from video as v
    join site as s
        on s.siteID = v.VideoSiteID
"""
db.execute(query)
result = db.fetchall()
filename = '/home/Site_info'
output = open(filename, "w")
for row in result:
    link = "%s%s" % (row[0],row[1])
    full_link = link.replace("http://","https://")
    print full_link
    output.write("%s\n" % str(full_link))
output.close()
If you have other reasons for wanting to fetch these one by one, an idea might be to fetch all SiteIDs and store them in a list. Afterwards you loop over that list and insert each ID into the query via a parameterized query.
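The one-by-one variant could look roughly like the following. This is a sketch in Python 3 that uses the standard library's `sqlite3` with an in-memory database as a stand-in, since the original database driver is not shown. The table and column names follow the question, the sample rows are made up for illustration, and the `?` placeholder style differs between drivers (MySQL drivers typically use `%s`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table site (SiteID integer primary key, sitevideobaseurl text)")
cur.execute("create table video (VideoSiteID integer, videositestring text)")
# Made-up sample rows for illustration only.
cur.executemany("insert into site values (?, ?)",
                [(1, "http://www.youtube.com/watch?v="),
                 (7, "http://www.dailymotion.com/video/")])
cur.executemany("insert into video values (?, ?)",
                [(1, "uqcSJR_7fOc"), (7, "hdfchsldf0f")])

# Step 1: fetch all site IDs into a list.
cur.execute("select SiteID from site order by SiteID")
site_ids = [row[0] for row in cur.fetchall()]

# Step 2: run one parameterized query per ID.
links = []
for site_id in site_ids:
    cur.execute(
        "select s.sitevideobaseurl, v.videositestring "
        "from site as s join video as v on s.SiteID = v.VideoSiteID "
        "where s.SiteID = ? limit 1",
        (site_id,),
    )
    row = cur.fetchone()
    if row is not None:
        links.append((row[0] + row[1]).replace("http://", "https://"))

conn.close()
print("\n".join(links))
```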

How to check if a SQLite3 database exists in Python?

I am trying to create a function in Python 2.7.3 to open a SQLite database.
This is my code at the moment:
import sqlite3 as lite
import sys

db = r'someDb.sqlite'

def opendb(db):
    try:
        conn = lite.connect(db)
    except lite.Error:
        print "Error open db.\n"
        return False
    cur = conn.cursor()
    return [conn, cur]
I have tried the code above and observed that the sqlite3 library opens the declared database if it exists, or creates a new database if it doesn't.
Is there a way to check whether the database exists using sqlite3 methods, or do I have to use a file operation like os.path.isfile(path)?

In Python 2, you'll have to explicitly test for the existence using os.path.isfile:
if os.path.isfile(db):
There is no way to force the sqlite3.connect function to not create the file for you.
For those that are using Python 3.4 or newer, you can use the newer URI path feature to set a different mode when opening a database. The sqlite3.connect() function by default will open databases in rwc, that is Read, Write & Create mode, so connecting to a non-existing database will cause it to be created.
Using a URI, you can specify a different mode instead; if you set it to rw, so Read & Write mode, an exception is raised when trying to connect to a non-existing database. You can set different modes when you set the uri=True flag when connecting and pass in a file: URI, and add a mode=rw query parameter to the path:
from urllib.request import pathname2url

try:
    dburi = 'file:{}?mode=rw'.format(pathname2url(db))
    conn = lite.connect(dburi, uri=True)
except lite.OperationalError:
    # handle missing database case
    pass
See the SQLite URI Recognized Query Parameters documentation for more details on what parameters are accepted.
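Wrapped into a helper, the `mode=rw` approach might look like this. It is a minimal sketch for Python 3.4+; `open_existing_db` is a hypothetical name, and returning `None` for a missing file is just one possible way to handle the error.

```python
import os
import sqlite3
import tempfile
from urllib.request import pathname2url


def open_existing_db(path):
    """Open an existing SQLite database; return None instead of creating a new file."""
    uri = "file:{}?mode=rw".format(pathname2url(path))
    try:
        return sqlite3.connect(uri, uri=True)
    except sqlite3.OperationalError:
        return None


# Demonstration in a temporary directory.
tmp_dir = tempfile.mkdtemp()
missing = os.path.join(tmp_dir, "missing.sqlite")
print(open_existing_db(missing))   # None, and no file is created
print(os.path.exists(missing))     # False
```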

os.path.isfile() is just telling you if a file exists, not if it exists AND is a SQLite3 database! Knowing http://www.sqlite.org/fileformat.html, you could do this :
def isSQLite3(filename):
    from os.path import isfile, getsize
    if not isfile(filename):
        return False
    if getsize(filename) < 100: # SQLite database file header is 100 bytes
        return False
    with open(filename, 'rb') as fd:
        header = fd.read(100)
    return header[:16] == 'SQLite format 3\x00'
and subsequently use it like :
for file in files:
    if isSQLite3(file):
        print "'%s' is a SQLite3 database file" % file
    else:
        print "'%s' is not a SQLite3 database file" % file

Yes, there is a way to do what you want with Python 3.4+.
Use the sqlite3.connect() function to connect, but pass it a URI instead of a file path, and add mode=rw to its query string.
Here is a complete working code example:
import sqlite3
con = sqlite3.connect('file:aaa.db?mode=rw', uri=True)
This will open an existing database from a file named aaa.db in the current folder, but will raise an error in case that file can not be opened or does not exist:
Traceback (most recent call last):
File "aaa.py", line 2, in <module>
con = sqlite3.connect('file:aaa.db?mode=rw', uri=True)
sqlite3.OperationalError: unable to open database file
The Python sqlite3.connect() docs state that:
If uri is true, database is interpreted as a URI. This allows you to specify options. For example, to open a database in read-only mode you can use:
db = sqlite3.connect('file:path/to/database?mode=ro', uri=True)
More information about this feature, including a list of recognized options, can be found in the SQLite URI documentation.
Here's an excerpt of all the relevant URI option information collected from http://www.sqlite.org/c3ref/open.html:
mode: The mode parameter may be set to either "ro", "rw", "rwc", or "memory". Attempting to set it to any other value is an error. If "ro" is specified, then the database is opened for read-only access, just as if the SQLITE_OPEN_READONLY flag had been set in the third argument to sqlite3_open_v2(). If the mode option is set to "rw", then the database is opened for read-write (but not create) access, as if SQLITE_OPEN_READWRITE (but not SQLITE_OPEN_CREATE) had been set. Value "rwc" is equivalent to setting both SQLITE_OPEN_READWRITE and SQLITE_OPEN_CREATE. If the mode option is set to "memory" then a pure in-memory database that never reads or writes from disk is used. It is an error to specify a value for the mode parameter that is less restrictive than that specified by the flags passed in the third parameter to sqlite3_open_v2().
The sqlite3_open_v2() interface works like sqlite3_open() except that it accepts two additional parameters for additional control over the new database connection. The flags parameter to sqlite3_open_v2() can take one of the following three values, optionally combined with the SQLITE_OPEN_NOMUTEX, SQLITE_OPEN_FULLMUTEX, SQLITE_OPEN_SHAREDCACHE, SQLITE_OPEN_PRIVATECACHE, and/or SQLITE_OPEN_URI flags:
SQLITE_OPEN_READONLY
The database is opened in read-only mode. If the database does not already exist, an error is returned.
SQLITE_OPEN_READWRITE
The database is opened for reading and writing if possible, or reading only if the file is write protected by the operating system. In either case the database must already exist, otherwise an error is returned.
SQLITE_OPEN_READWRITE | SQLITE_OPEN_CREATE
The database is opened for reading and writing, and is created if it does not already exist. This is the behavior that is always used for sqlite3_open() and sqlite3_open16().
For convenience, here's also a Python 3.4+ function for converting a regular path to a URI usable by sqlite3.connect():
import pathlib
import urllib.parse

def _path_to_uri(path):
    path = pathlib.Path(path)
    if path.is_absolute():
        return path.as_uri()
    return 'file:' + urllib.parse.quote(path.as_posix(), safe=':/')

This is a fork (using Python 3) based on Tom Horen's answer, which presents a more complete and reliable solution than the accepted answer.
The accepted answer does not evaluate any content, header, etc., to determine whether the file actually contains data related to a SQLite3 database.
I tried to present something more pragmatic here:
#!/usr/bin/python3
import os
import sys

if os.path.isfile('test.sqlite3'):
    if os.path.getsize('test.sqlite3') > 100:
        with open('test.sqlite3','r', encoding = "ISO-8859-1") as f:
            header = f.read(100)
            if header.startswith('SQLite format 3'):
                print("SQLite3 database has been detected.")

Building on a couple of the other answers above, here is a clean solution that works in Python 3.7.7:
import os

def isSqlite3Db(db):
    if not os.path.isfile(db): return False
    sz = os.path.getsize(db)
    # file is empty, give benefit of the doubt that it's sqlite
    # New sqlite3 files created in recent libraries are empty!
    if sz == 0: return True
    # SQLite database file header is 100 bytes
    if sz < 100: return False
    # Validate file header
    with open(db, 'rb') as fd: header = fd.read(100)
    return (header[:16] == b'SQLite format 3\x00')
Usage:
if isSqlite3Db('<path_to_db>'):
    # ... <path_to_db> is a Sqlite 3 DB
Notes:
Answers that check that the file size is > 100 do not work, because a new sqlite3 db created by recent Python versions is a file of length 0.
Other examples that read the file header got bytes rather than a string in Python 3.7.7, so the comparison would fail.
Examples that use sqlite3.connect(dburl, uri=True) did not work for me in Python 3.7.7, as they gave false positives.

I use a function like the following at the beginning of my script so that I can figure out why a sqlite3 db script might not be working. As the comments say, it works in 3 phases: it checks that the path exists, checks that the path is a file, and checks that the file's header is a sqlite3 header.
from os.path import isfile

def checkdbFileforErrors():
    # check if path exists and can be opened
    try:
        with open('/path/to/your.db'): pass
    except IOError:
        return 1
    # check if path is a file
    if not isfile('/path/to/your.db'):
        return 2
    # check if the first 16 bytes of the file are the sqlite3 magic header string
    with open('/path/to/your.db', 'rb') as f:
        ima = f.read(16).hex()
    # see http://www.sqlite.org/fileformat.html#database_header magic header string
    if ima != "53514c69746520666f726d6174203300":
        return 3
    return 0
