So, I recently got into learning Python, and at work we wanted a way to make it easier to find specific keywords in our log files, so we can tell which IPs to add to our block list.
I decided to write a Python script that takes in a log file and a file with a list of key terms, looks for those key terms in the log file, and then writes the lines matching the session IDs where each key term was found to a new file.
import sys
import time
import linecache
from datetime import datetime
def timeStamped(fname, fmt='%Y-%m-%d-%H-%M-%S_{fname}'):
    return datetime.now().strftime(fmt).format(fname=fname)
importFile = open('rawLog.txt', 'r') #pulling in log file
importFile2 = open('keyWords.txt', 'r') #pulling in keywords
exportFile = open(timeStamped('ParsedLog.txt'), 'w') #writing the parsed log
FILE = importFile.readlines()
keyFILE = importFile2.readlines()
logLine = 1 #for debugging purposes when testing
parseString = ''
holderString = ''
sessionID = []
keyWords= []
j = 0
for line in keyFILE: #go through each line in the keyFile
    keyWords = line.split(',') #add each word to the array
print(keyWords) #for debugging purposes when testing, this DOES give all the correct results
for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords): #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
        j+=1 #advance the keyWords array
    logLine +=1 #for debugging purposes when testing
importFile2.close()
print(sessionID) #for debugging purposes when testing
importFile.seek(0) #goes back to the start of the file
i = 0
for line in FILE:
    if sessionID[i] in line[29:35]: #checking if the sessionID matches (doing it this way since I ran into issues where some sessionIDs matched parts of the log file that were not sessionIDs)
        holderString = line #pulling the line of log file
        exportFile.write(holderString) #writing the log file line to a new text file
        print(holderString) #for debugging purposes when testing
        if i < len(sessionID):
            i+=1
importFile.close()
exportFile.close()
It is not iterating across my keyWords list. I probably made some stupid rookie mistake, but I am not experienced enough to see what I messed up. When I check the output, it is only searching for the first item in the keyWords list in the rawLog.txt file.
The third loop does return results based on the sessionIDs that the second loop pulls, and it does attempt to iterate (this gives an out-of-bounds exception because i ends up exceeding the length of the sessionID list, since sessionID only ever holds one value).
The program does write to and name the new log file successfully, with a DateTime followed by ParsedLog.txt.
It looks to me like your second loop needs an inner loop instead of an inner if statement. E.g.
for line in FILE:
    for word in keyWords:
        if word in line:
            parseString = line[29:35] #pulling in session ID
            sessionID.append(parseString) #saving session IDs to a list
            break # Assuming there will only be one keyword per line, else remove this
    logLine +=1 #for debugging purposes when testing
importFile2.close()
print(sessionID) #for debugging purposes when testing
Assuming I have understood correctly, that is.
If the elif is never True, you never increase j, so you either need to increment j on every iteration or check that the elif condition ever actually evaluates to True:
for line in FILE:
    if keyWords[j] in line:
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
    elif importFile == '' and j < len(keyWords): #if importFile is at end of file and we are not at the end of the array
        importFile.seek(0) #goes back to the start of the file
    j+=1 # always increase
Looking at the above loop: you create the file object with importFile = open('rawLog.txt', 'r') earlier in your code, so the comparison elif importFile == '' will never be True, because importFile is a file object, not a string.
You assign FILE = importFile.readlines(), which exhausts the iterator while creating the FILE list; you call importFile.seek(0) but never actually use the file object again.
So you effectively loop over FILE once, j never increases, and your code then moves on to the next block.
What you actually need are nested loops, using any to see if any word from keyWords is in each line, and forget about your elif:
for line in FILE:
    if any(word in line for word in keyWords):
        parseString = line[29:35] #pulling in session ID
        sessionID.append(parseString) #saving session IDs to a list
The same logic applies to your next loop:
for line in FILE:
    if any(sess in line[29:35] for sess in sessionID): #checking if the sessionID matches (doing it this way since some sessionIDs matched parts of the log file that were not sessionIDs)
        exportFile.write(line) #writing the log file line to a new text file
holderString = line does nothing but add another reference to the same object line, so you can simply exportFile.write(line) and forget the assignment.
On a side note, use lowercase with underscores for variable names (holderString -> holder_string), and using with to open your files would be best, as it also closes them for you.
with open('rawLog.txt') as import_file:
    log_lines = import_file.readlines()
I also changed FILE to log_lines; using more descriptive names makes your code easier to follow.
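Putting the suggestions together, an untested sketch of the whole script might look like this (keeping your timeStamped helper, and assuming the session ID really does always sit at the [29:35] slice):

with open('rawLog.txt') as import_file:
    log_lines = import_file.readlines()

with open('keyWords.txt') as key_file:
    key_words = key_file.read().strip().split(',')

# First pass: collect the session ID of every line containing a key word
session_ids = []
for line in log_lines:
    if any(word in line for word in key_words):
        session_ids.append(line[29:35])

# Second pass: write out every line whose session ID slice was collected
with open(timeStamped('ParsedLog.txt'), 'w') as export_file:
    for line in log_lines:
        if any(sess in line[29:35] for sess in session_ids):
            export_file.write(line)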
Related
I am trying to implement a nested for loop to search across CSV files, so that a 'name' found in one CSV file is searched for in the other file. Here is a code example:
import csv
import re

# Open the input file
with open("Authentication.csv", "r") as citiAuthen:
    with open("Authorization.csv", "r") as citiAuthor:
        # Set up CSV reader and process the header
        csvAuthen = csv.reader(citiAuthen, quoting=csv.QUOTE_ALL, skipinitialspace=True)
        headerAuthen = next(csvAuthen)
        userIndex = headerAuthen.index("'USERNAME'")
        statusIndex = headerAuthen.index("'STATUS'")

        csvAuthor = csv.reader(citiAuthor)
        headerAuthor = next(csvAuthor)
        userAuthorIndex = headerAuthor.index("'USERNAME'")
        iseAuthorIndex = headerAuthor.index("'ISE_NODE'")

        # Make an empty list
        userList = []
        usrNumber = 0

        # Loop through the authen file and build a list of
        for row in csvAuthen:
            user = row[userIndex]
            #status = row[statusIndex]
            #if status == "'Pass'" :
            for rowAuthor in csvAuthor:
                userAuthor = rowAuthor[userAuthorIndex]
                print userAuthor
What is happening is that "print userAuthor" makes just one pass, while it should make as many passes as there are rows in csvAuthen.
What I am doing wrong? Any help is really appreciated.
You're reading both files line by line from storage. When you search csvAuthor the first time, if the value you are searching for is not found, the file pointer remains at the end of the file after the search. The next search will start at the end of the file and return immediately. You would need to reset the file pointer to the beginning of the file before each search. It is probably better just to read both files into memory before you start searching them.
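For example, a minimal sketch reusing the question's variable names (only the authorization side shown): read the rows into a list once, then loop over that list, which can be traversed any number of times:

# Read all authorization rows into memory once
with open("Authorization.csv", "r") as citiAuthor:
    authorRows = list(csv.reader(citiAuthor))
headerAuthor = authorRows.pop(0)  # remove and keep the header row
userAuthorIndex = headerAuthor.index("'USERNAME'")

for row in csvAuthen:
    user = row[userIndex]
    for rowAuthor in authorRows:  # a list, not a file, so every pass starts fresh
        userAuthor = rowAuthor[userAuthorIndex]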
I am using Python 3 to process a results file. The structure of the file is a combination of string identifiers followed by lists of integer values in this format:
ENERGY_BOUNDS
1.964033E+07 1.733253E+07 1.491825E+07 1.384031E+07 1.161834E+07 1.000000E+07 8.187308E+06 6.703200E+06
6.065307E+06 5.488116E+06 4.493290E+06 3.678794E+06 3.011942E+06 2.465970E+06 2.231302E+06 2.018965E+06
EIGENVALUE
1.219034E+00
There are maybe 50 different sets of data with unique identifiers in this file. What I want to do is write a code that will search for a specific identifier (e.g. ENERGY_BOUNDS), then read the values that follow into a list, stopping at the next identifier (in this case EIGENVALUE). I then need to be able to manipulate the list (finding its length, printing its values, etc.).
I am writing this as a function so I can call it multiple times in my code when I want to search for different identifiers. So far what I have is:
def read_data_from_file(file_name, identifier):
    list_of_results = [] # Create list_of_results to put results in for future manipulation
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, read the next line
                nextValue = next(line)
                list_of_results.append(nextValue.rstrip())
    return list_of_results
It works fine up until it comes to reading the next line after the identifier, and I am stuck on how to continue reading the results after that line and how to make it stop at the next identifier.
Following is a simple and tested answer.
You are making two mistakes:
line is a string, not an iterator, so calling next(line) causes an error.
You are only reading one line after the identifier has been found, while you need to keep reading until another identifier appears.
Following is the code after a little modification of your code. It has also been tested on your data:
def read_data_from_file(file_name, identifier):
    with open(file_name, 'r') as read_obj:
        list_of_results = []
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                # If yes, start reading from the next line
                nextValue = next(read_obj)
                while not nextValue.strip().isalpha(): # keep on reading until the next identifier appears
                    list_of_results.extend(nextValue.split())
                    nextValue = next(read_obj)
    print(list_of_results)
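Called like this (the file name is just a placeholder), it prints the list of values found under the identifier:

read_data_from_file('results.txt', 'ENERGY_BOUNDS')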
I would suggest adding a variable that indicates whether you have found a line containing an identifier.
Afterwards, simply add the values into the array until the next identifier has been reached.
def read_data_from_file(file_name, identifier):
    list_of_results = [] # Create list_of_results to put results in for future manipulation
    identifier_found = False
    # Open the file in read only mode
    with open(file_name, 'r') as read_obj:
        # Read all lines in the file one by one
        for line in read_obj:
            # For each line, check if line contains the string
            if identifier in line:
                identifier_found = True
            elif identifier_found:
                if line.strip().isalpha(): # Next identifier reached, exit loop
                    break
                list_of_results += line.split() # Add values to result
    return list_of_results
Use booleans, continue, and break!
Try to implement logic as follows:
Set a boolean (I'll use in_range) to False
Look through the lines and see if they match the identifier.
If it does, set the boolean to True and continue
If it does not, continue
If the boolean is False AND the line begins with a space: continue
If the boolean is True AND the line begins with a space: Add the line to the list.
If the boolean is True AND the line doesn't begin with a space: break.
This ends the searching process once a new identifier has been started.
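One way to translate those steps into code, assuming (as the steps do) that value lines begin with a space while identifier lines do not:

def read_data_from_file(file_name, identifier):
    list_of_results = []
    in_range = False
    with open(file_name, 'r') as read_obj:
        for line in read_obj:
            if identifier in line:
                in_range = True  # start collecting after this line
                continue
            if not in_range:
                continue  # haven't reached the identifier yet
            if line.startswith(' '):  # a value line: collect its numbers
                list_of_results.extend(line.split())
            else:
                break  # a new identifier has started, stop searching
    return list_of_results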
The other two answers are already helpful. Here is my method in case you need something else, with comments to explain.
If you don't want to use the end_identifier, you can use .isalpha(), which checks whether the string contains only letters.
def read_data_from_file(file_name, start_identifier, end_identifier):
    list_of_results = []
    with open(file_name, 'r') as read_obj:
        start_identifier_reached = False # variable to check if we reached the needed identifier
        for line in read_obj:
            if start_identifier in line:
                start_identifier_reached = True # now we reached the identifier
                continue # We skip ahead so we don't write the identifier itself into the list
            if start_identifier_reached:
                if end_identifier in line: # we reached the end_identifier, so we are done
                    return list_of_results
                list_of_results.append(line.rstrip()) # Put the values into the list until we reach the end_identifier
    return list_of_results # in case the file ends without another identifier
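One caveat worth noting with the .isalpha() route (my observation, not part of the answer above): an underscore does not count as a letter, so an identifier such as ENERGY_BOUNDS would not be recognized by that check:

print("EIGENVALUE".isalpha())     # True
print("1.219034E+00".isalpha())   # False
print("ENERGY_BOUNDS".isalpha())  # False: '_' is not a letter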
I am trying to do a task where the programme goes through a directory, opens each file in turn, and checks a specific line before anything else. If the line meets a specific criterion (namely, that it does not match this line in any other file in the directory), the file closes and the programme moves on to the next file.
aps = []
import os

for filename in os.listdir("C:\..."):
    f = open(filename,"r")
    (f.readline())
    (f.readline())
    ap = (f.readline())
    ap = ap.rstrip("\n")
    aps.append(ap)
    freqs = {}
    for ap in aps:
        freqs[ap] = freqs.get(ap, 0) + 1
    for k, v in freqs.items():
        if v == 2:
            f.close()
        else:
For the 'else:', I originally tried 'f.seek(0)', but got an error about Python being unable to work with a closed file. I then tried 'f = open(filename, "r")' again, but this is doing something odd: when I try to print the first line through this method, it goes into a crazy loop and prints the line multiple times.
Is this the best way to go about this task? And if not, how could I get it to work?
Many thanks.
Don't close the file conditionally. Do what you need to do with the open file, and then close it at the end. With a with construct the file will close automatically:
for filename in os.listdir(path):
    with open(filename) as f:
        # do processing here
        if positive_condition:
            # do more processing
Here is why your code fails. You initialize the aps list outside of your outer for loop, so it will contain the specified line from all files that you loop over. Then your freqs dictionary is reset to empty for each file that you open.
So these lines:
for ap in aps:
    freqs[ap] = freqs.get(ap, 0) + 1
loop over each line that has been read so far, and count the frequency. The problem comes in the inner for loop:
for k, v in freqs.items():
    if v == 2:
        f.close()
What happens here is that freqs has a set of keys potentially as large as the number of files you have looped over so far, and you are looping through each key. So the first time a key has a value of 2, the current file is closed. But then the loop continues, so the next time a key has a value of 2, Python tries to close the file, but it is already closed.
The easiest fix is to add a break after the f.close(), for example:
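for k, v in freqs.items():
    if v == 2:
        f.close()
        break  # stop checking once the file has been closed

But there are better ways to structure this code.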
One is to always open a file using a with command, unless you have a good reason to do otherwise. So:
with open(filename,"r") as f:
    #code
That way the file will close automatically when you are done with it.
I am assuming that the order you are looping through the files isn't important, and that you want the frequency test to include all the files, not just the ones that have been opened so far. In that case it may be easier to loop through twice, once for assembling your frequency dict, and a second time for doing whatever you want to do to the files that meet frequency requirements.
aps = []
freqs = {}

# First loop to read the important line from all files
for filename in os.listdir("C:\..."):
    with open(filename,"r") as f:
        f.readline()
        f.readline()
        ap = f.readline().rstrip("\n")
        aps.append(ap)

# Populate the dictionary
for ap in aps:
    freqs[ap] = freqs.get(ap, 0) + 1

# Second loop to handle the important cases
for filename in os.listdir("C:\..."):
    with open(filename,"r") as f:
        f.readline()
        f.readline()
        ap = f.readline().rstrip("\n")
        if freqs[ap] != 2:
            #do whatever
I strongly suspect there are more efficient and pythonic ways of getting there, but this is my best thought.
I have some CSV files that I have to modify, which I do through a loop. The code loops through the source file, reads each line, makes some modifications, and then saves the output to another CSV file. In order to check my work, I want the first line and the last line saved in another file so I can confirm that nothing was skipped.
What I've done is put all of the lines into a list and then get the last one using the list's length minus 1. This works, but I'm wondering if there is a more elegant way to accomplish this.
Code sample:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    check_count = 0
    check_list = []
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            check_list.append(line)
            check_count += 1
            if check_count == 1:
                check.write(line)
            [CSV modifications become a string called "newline"]
            fb.write(newline)
    final_check = check_list[len(check_list)-1]
    check.write(final_check)
    fb.close()
If you actually need check_list for something, then, as the other answers suggest, using check_list[-1] is equivalent to but better than check_list[len(check_list)-1].
But do you really need the list? If all you want to keep track of is the first and last lines, you don't. If you keep track of the first line specially, and keep track of the current line as you go along, then at the end, the first line and the current line are the ones you want.
In fact, since you appear to be writing the first line into check as soon as you see it, you don't need to keep track of anything but the current line. And the current line, you've already got that, it's line.
So, let's strip all the other stuff out:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv','wb')
    check = open('C:\\HP\\WS\\check-all.csv','wb')
    first_line = True
    with open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            if first_line:
                check.write(line)
                first_line = False
            [CSV modifications become a string called "newline"]
            fb.write(newline)
    check.write(line)
    fb.close()
You can enumerate the CSV rows of the input file and check the index, like this:
def CVS1():
    with open('C:\\HP\\WS\\final-cir.csv','wb') as fb, open('C:\\HP\\WS\\check-all.csv','wb') as check, open('C:\\HP\\WS\\CVS1-source.csv','r') as infile:
        skip_first_line = list(islice(infile, 3, None)) # materialize the islice, since a generator has no len()
        for idx, line in enumerate(skip_first_line):
            if idx == 0 or idx == len(skip_first_line) - 1:
                check.write(line)
            #[CSV modifications become a string called "newline"]
            fb.write(newline)
I've replaced the open statements with a with block, to delegate closing the file handles to the interpreter.
You can access index -1 directly:
final_check = check_list[-1]
which is nicer than what you have now:
final_check = check_list[len(check_list)-1]
If it's not an empty or one-line file, you can:
my_file = open(root_to file, 'r')
my_lines = my_file.readlines()
first_line = my_lines[0]
last_line = my_lines[-1]
I've been working on a program which assists in log analysis. It finds error or fail messages using regex and prints them to a new .txt file. However, it would be much more beneficial if the program included the top and bottom 4 lines around each match. I can't figure out how to do this! Here is a part of the existing program:
def error_finder(filepath):
    source = open(filepath, "r").readlines()
    error_logs = set()
    my_data = []
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            error_logs.add(line)
I'm assuming something needs to be added to the very last line, but I've been working on this for a bit and either am not applying myself fully or just can't figure it out.
Any advice or help on this is appreciated.
Thank you!
Why Python?
grep -C4 '^your_regex$' logfile > outfile.txt
Some comments:
I'm not sure why error_logs is a set instead of a list.
Using readlines() will read the entire file in memory, which will be inefficient for large files. You should be able to just iterate over the file a line at a time.
exp (which you're using for re.search) isn't defined anywhere, but I assume that's elsewhere in your code.
Anyway, here's complete code that should do what you want without reading the whole file in memory. It will also preserve the order of input lines.
import re
from collections import deque

exp = r'\d' # matches numbers, change to what you need

def error_finder(filepath, context_lines=4):
    source = open(filepath, 'r')
    error_logs = []
    buffer = deque(maxlen=context_lines)
    lines_after = 0
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            # add previous lines first
            for prev_line in buffer:
                error_logs.append(prev_line)
            # clear the buffer
            buffer.clear()
            # add current line
            error_logs.append(line)
            # schedule lines that follow to be added too
            lines_after = context_lines
        elif lines_after > 0:
            # a line that matched the regex came not so long ago
            lines_after -= 1
            error_logs.append(line)
        else:
            buffer.append(line)
    # maybe do something with error_logs? I'll just return it
    return error_logs
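A hypothetical call (the log file path is assumed) would then be:

for entry in error_finder('app.log'):
    print(entry)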
I suggest using an index loop instead of a for-each loop; try this:
error_logs = list()
for i in range(len(source)):
    line = source[i].strip()
    if re.search(exp, line):
        error_logs.append((line, i-4, i+4))
In this case your error log will contain tuples of ('line of error', line index - 4, line index + 4), so you can get these lines later from source.
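To pull the surrounding lines back out later, a sketch might clamp the stored indices to the bounds of source so the slice never runs past either end:

for line, start, end in error_logs:
    start = max(0, start)            # don't run off the top of the file
    end = min(len(source) - 1, end)  # ...or off the bottom
    for context_line in source[start:end + 1]:
        print(context_line.rstrip())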