Python File Integrity Monitoring

Python File Integrity Monitoring - python

I have written the following code and can't get it working and I can't see why. The code:
Reads list of target files
Loops through the directories
Runs MD5 hash on the file
Checks the activefile for previous md5 hashes for said file
If the file is new it will log it as new
If the log is existing but changed it will write the change and log the change
If not new and no change then do nothing
Here is the code:
import hashlib
import logging as log
import optparse
import os
import re
import sys
import glob
import shutil
def md5(fileName):
"""Compute md5 hash of the specified file"""
try:
fileHandle = open(fileName, "rb")
except IOError:
return
m5Hash = hashlib.md5()
while True:
data = fileHandle.read(8192)
if not data:
break
m5Hash.update(data)
fileHandle.close()
return m5Hash.hexdigest()
req = open("requested.txt")
for reqline in req:
reqName = reqline[reqline.rfind('/') + 1:len(reqline) - 1]
reqDir = reqline[0:reqline.rfind('/') + 1]
tempFile = open("activetemp.txt", 'w')
for name in glob.glob(reqDir + reqName):
fileHash = md5(name)
actInt = 0
if fileHash != None:
actFile = open("activefile.txt")
for actLine in actFile:
actNameDir = actLine[0:actLine.rfind(' : ')]
actHash = actLine[actLine.rfind(' : ') + 3:len(actLine) -1]
if actNameDir == name and actHash == fileHash:
tempFile.write(name + " : " + fileHash + "\n")
actInt = 1
print fileHash
print actHash
print name
print actNameDir
if actNameDir == name and actHash != fileHash:
fimlog = open("fimlog.txt", 'a')
tempFile.write(name + " : " + actHash + "\n")
actInt = 1
fimlog.write("FIM Log: The file " + name + " was modified: " + actHash + "\n")
if actInt == 0:
fimlog = open("fimlog.txt", 'a')
fimlog.write("FIM Log: The file " + name + " was created: " + fileHash + "\n")
tempFile.write(name + " : " + fileHash + "\n")
shutil.copyfile("activetemp.txt", "activefile.txt")

You never really described the problem, but one possible culprit is that you never close the tempFile (or the other files for that matter), so the file copy at the end may fail.

Related

Python: Recursivly get the length of file path

Trying to recursively scan a given directory and get the length of the file or directory path not the file or directory size
If the length is more than say 35 characters, Just to test, output the path and length to a log file
If Directory Path is > 35 then little point traversing down further
import sys
import os
path = sys.argv[1]
Log = path + "\\PathToLongLog.txt"
fname = []
# Check if path exits
if os.path.exists(path):
print ("Directory exist")
for root,d_names,f_names in os.walk(path):
print (root, d_names, f_names)
for f in f_names:
fname.append(os.path.join(root, f))
#print("fname = %s" %fname)
for fp in fname:
Len = len(fp)
if Len > 35:
print("fname = %s" %fname, " Lenth ", str(Len) )
msg ="fname = " + str(fname) + " Lenth " + str(Len)
with open(Log, "a") as LogFile:
LogFile.write(msg + "\n")
Expected output would be 1 line for each file
D:\Path\To\Very Long File\Or Directory\My File.txt Length 50
What I'm getting is
fname = ['D:\\Path\\To\\Very Long File\\Or Directory\\My File.txt', 'Path\\To\\File1.ext', 'Path\\To\\File2.ext',etc] length 50
Can anyone see what I'm doing wrong?

You want to log the file name but you are actually logging the fname variable which is a list of all files.
You can change the code to log the 'fp' variable instead of the 'fname' and it will work:
for fp in fname:
Len = len(fp)
if Len > 35:
print("fname = %s" %fp, " Lenth ", str(Len))
msg = "fname = " + str(fp) + " Lenth " + str(Len)
with open(Log, "a") as LogFile:
LogFile.write(msg + "\n")

How to search and read a text file in a specific folder faster using python

I've written a simple python script to search for a log file in a folder (which has approx. 4 million files) and read the file.
Currently, the average time taken for the entire operation is 20 seconds. I was wondering if there is a way to get the response faster.
Below is my script
import re
import os
import timeit
from datetime import date
log_path = "D:\\Logs Folder\\"
rx_file_name = r"[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12}"
log_search_script = True
today = str(date.today())
while log_search_script:
try:
log_search = input("Enter image file name: ")
file_name = re.search(rx_file_name, log_search).group()
log_file_name = str(file_name) + ".log"
print(f"\nLooking for log file '{log_file_name}'...\n")
pass
except:
print("\n ***** Invalid input. Try again! ***** \n")
continue
start = timeit.default_timer()
if log_file_name in os.listdir(log_path):
log_file = open(log_path + "\\" + log_file_name, 'r', encoding="utf8")
print('\n' + "--------------------------------------------------------" + '\n')
print(log_file.read())
log_file.close()
print('\n' + "--------------------------------------------------------" + '\n')
print("Time Taken: " + str(timeit.default_timer() - start) + " seconds")
print('\n' + "--------------------------------------------------------" + '\n')
else:
print("Log File Not Found")
search_again = input('\nDo you want to search for another log ("y" / "n") ?').lower()
if search_again[0] == 'y':
print("======================================================\n\n")
continue
else:
log_search_script = False

Your problem is the line:
if log_file_name in os.listdir(log_path):
This has two problems:
os.listdir will create a huge list which can take a lot of time (and space...).
the ... in ... part will now go over that huge list linearly and search for the file.
Instead, let your OS do the hard work and "ask for forgivness, not permission". Just assume the file is there and try to open it. If it is not actually there - an error will be raised, which we will catch:
try:
with open(log_path + "\\" + log_file_name, 'r', encoding="utf8") as file:
print(log_file.read())
except FileNotFoundError:
print("Log File Not Found")

You can use glob.
import glob
print(glob.glob(directory_path))

urllib.urlretrieve downloads empty zip files

I'm trying to download a export of space in a zip file. But somehow python downloads a empty and corrupted zip file. When you download the file manual by the browser everything is ok.
I use Python 2.7.13
#!/usr/bin/python
import xmlrpclib
import time
import urllib
confluencesite = "https://confluence.com"
server = xmlrpclib.ServerProxy(confluencesite + '/rpc/xmlrpc')
username = '*'
password = '*'
token = server.confluence2.login(username, password)
loginString = "?os_username=" + username + "&os_password=" + password
filelist = ""
start = True
spacesummary = server.confluence2.getSpaces(token)
for space in spacesummary:
#if space['name'] == "24-codING":
# start = True
# continue
if start:
if space['type'] == 'global':
print "Exporting space " + space['name']
spaceDownloadUrl = server.confluence2.exportSpace(token, space['key'],
"TYPE_XML",
exportAll['true'])
filename = spaceDownloadUrl.split('/')[-1].split('#')[0].split('?')[0]
time.sleep(0.5)
urllib.urlretrieve(spaceDownloadUrl + loginString, filename)
print filename + " saved."
f = open("exportedspaces.txt", 'a')
f.write(filename + "\n")
f.close()

It's solved by the answer of Coldspeed. Changing the following:
loginString = "?os_username=" + username + "&os_password=" + password
to
loginString = "?os_username=" + username + "&os_password=" + password

Converting multiple gz file within subdirectories into csv

I have many subdirectories in my main directory and would like to write a script to unzip and convert all the files within it. If possible, I would also like to combine all the CSV within a single directory into a single CSV. But more importantly, I need help with my nested loop.
import gzip
import csv
import os
subdirlist = os.listdir('/home/user/Desktop/testloop')
subtotal = len(subdirlist)
subcounter = 0
for dirlist in subdirlist:
print "Working On " + dirlist
total = len(dirlist)
counter = 0
for dir in dirlist:
print "Working On " + dir
f = gzip.open('/' + str(subdirlist) + '/' + dir, 'rb')
file_content = f.read()
f.close()
print "25% Complete"
filename = '/' + str(subdirlist) + '/temp.txt'
target = open(filename, 'w')
target.write(file_content)
target.close()
print "50% Complete!"
csv_file = '/' + str(subdirlist) + '/' + str(dir) + '.csv'
in_txt = csv.reader(open(filename, "rb"), delimiter = '\t')
out_csv = csv.writer(open(csv_file, 'wb'))
out_csv.writerows(in_txt)
os.remove(filename)
os.remove('/' + str(subdirlist) + '/' + dir)
counter+=1
print str(counter) + "/" + str(total) + " " + str(dir) + " Complete!"
print "SubDirectory Converted!"
print str(subcounter) + "/" + str(subtotal) + " " + str(subdirlist) + " Complete!"
subcounter+=1
print "All Files Converted!"
Thanks in advance

To get lists of files and subdirectories, you can use os.walk. Below is an implementation I wrote to get all files (optionally, of certain type(s)) in arbitrarily nested subdirectories:
from os import walk, sep
from functools import reduce # in Python 3.x only
def get_filelist(root, extensions=None):
"""Return a list of files (path and name) within a supplied root directory.
To filter by extension(s), provide a list of strings, e.g.
get_filelist(root, ["zip", "csv"])
"""
return reduce(lambda x, y: x+y,
[[sep.join([item[0], name]) for name in item[2]
if (extensions is None or
name.split(".")[-1] in extensions)]
for item in walk(root)])

Equivalent to grep, but not using open()

I have the following script (see below) which is taken stdin and manipulating into some simple files.
# Import Modules for script
import os, sys, fileinput, platform, subprocess
# Global variables
hostsFile = "hosts.txt"
hostsLookFile = "hosts.csv"
# Determine platform
plat = platform.system()
if plat == "Windows":
# Define Variables based on Windows and process
currentDir = os.getcwd()
hostsFileLoc = currentDir + "\\" + hostsFile
hostsLookFileLoc = currentDir + "\\" + hostsLookFile
ipAddress = sys.argv[1]
hostName = sys.argv[2]
hostPlatform = sys.argv[3]
hostModel = sys.argv[4]
# Add ipAddress to the hosts file for python to process
with open(hostsFileLoc,"a") as hostsFilePython:
for line in open(hostsFilePython, "r"):
if ipAddress in line:
print "ipAddress \(" + ipAddress + "\) already present in hosts file"
else:
print "Adding ipAddress: " + ipAddress + " to file"
hostsFilePython.write(ipAddress + "\n")
# Add all details to the lookup file for displaying on-screen and added value
with open(hostsLookFileLoc,"a") as hostsLookFileCSV:
for line in open(hostsLookFileCSV, "r"):
if ipAddress in line:
print "ipAddress \(" + ipAddress + "\) already present in lookup file"
else:
print "Adding details: " + ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + " to file"
hostsLookFileCSV.write(ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + "\n")
if plat == "Linux":
# Define Variables based on Linux and process
currentDir = os.getcwd()
hostsFileLoc = currentDir + "/" + hostsFile
hostsLookFileLoc = currentDir + "/" + hostsLookFile
ipAddress = sys.argv[1]
hostName = sys.argv[2]
hostPlatform = sys.argv[3]
hostModel = sys.argv[4]
# Add ipAddress to the hosts file for python to process
with open(hostsFileLoc,"a") as hostsFilePython:
print "Adding ipAddress: " + ipAddress + " to file"
hostsFilePython.write(ipAddress + "\n")
# Add all details to the lookup file for displaying on-screen and added value
with open(hostsLookFileLoc,"a") as hostsLookFileCSV:
print "Adding details: " + ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + " to file"
hostsLookFileCSV.write(ipAddress + "," + hostName + "," + hostPlatform + "," + hostModel + "\n")
This code obviously does not work, because the for line in open(hostsFilePython, "r"): syntax is wrong... I can not use a current file object with "open()". However this is want I want to achieve, how can I do this?

You want to open your file using the a+ mode so that you can both read and write, then simply use the existing file object (hostsFilePython).
However, this still won't work as you can only iterate over a file once before it is exhausted.
It's worth noting that this isn't very efficient. The better plan is to read the data into a set, update the set with your new values, then write the set to the file. (As pointed out in the comments, sets don't preserve duplicates (good for your purposes), and order, which may or may not work for you. If not, then you might need to use a list, which will be less efficient).

with open(hostsFileLoc) as hostsFilePython:
lines = hostsFilePython.readlines()
for filename in lines:
with open(hostsFileLoc, 'a') as hostFilePython:
with open(filename) as hostsFile:
for line in hostsFile.readlines():
if ipAddress in line:
print "ipAddress \(" + ipAddress + "\) already present in hosts file"
else:
print "Adding ipAddress: " + ipAddress + " to file"
hostsFilePython.write(ipAddress + "\n")
The default mode is read, so you don't need to pass in r explicitly.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python File Integrity Monitoring - python

You never really described the problem, but one possible culprit is that you never close the tempFile (or the other files for that matter), so the file copy at the end may fail.

Related

Python: Recursivly get the length of file path

How to search and read a text file in a specific folder faster using python

urllib.urlretrieve downloads empty zip files

Converting multiple gz file within subdirectories into csv

Equivalent to grep, but not using open()

Categories

Resources