Python text parsing and saving as HTML

I've been playing around with Python, trying to write a script to scan a directory for specific files, find certain keywords, and save the lines where these keywords appear into a new file. I came up with this:
import sys, os, glob

for filename in glob.glob("./*.LOG"):
    with open(filename) as logFile:
        name = os.path.splitext(logFile.name)[0]
        newLOG = open(name + '_ERROR!' + '.LOG', "w")
        allLines = logFile.readlines()
        logFile.close()
        printList = []
        for line in allLines:
            if ('ERROR' in line) or ('error' in line):
                printList.append(line)
        for item in printList:
            # print item
            newLOG.write(item)
This is all good, but I thought I'd instead try saving this new file as HTML, wrapping it all in the right tags (html, head, body...) so that maybe I could change the font colour of the keywords. So far it looks like this:
import sys, os, glob

for filename in glob.glob("./*.LOG"):
    with open(filename) as logFile:
        name = os.path.splitext(logFile.name)[0]
        newLOG = open(name + '_ERROR!' + '.html', "w")
        newLOG.write('<html>')
        newLOG.write('<head>')
        newLOG.write('<body><p>')
        allLines = logFile.readlines()
        logFile.close()
        printList = []
        for line in allLines:
            if ('ERROR' in line) or ('error' in line):
                printList.append(line)
        for item in printList:
            # print item
            newLOG.write('</html>')
            newLOG.write('</head>')
            newLOG.write('</body><p>')
            newLOG.write(item)
Now the problem is that I'm new to this and I'm still trying to figure out how to work with indentation and loops. Because my HTML tags are being appended from within the loop, every line has the <html>, <head> & <body><p> tags around it, and it just looks wrong. I understand the problem and have tried rewriting things so that the tags are applied outside the loop, but I've not had much success.
Could someone show me a better way of getting the file name of the current file and creating a new file and appending to it? I think this is why I'm getting file handling errors when trying to change how it all works.
Thanks

It's a matter of indenting the lines to the right level. The HTML footer must be printed at the indentation level of the header lines, not indented within the loop. Try this:
import sys, os, glob
import cgi  # on Python 3.8+, use html.escape instead (cgi.escape was removed)

for filename in glob.glob("./*.LOG"):
    name = os.path.splitext(filename)[0]
    with open(filename, 'r') as logFile, open('%s_ERROR!.html' % name, 'w') as outfile:
        outfile.write("<html>\n<head>\n</head>\n<body><p>")
        allLines = logFile.readlines()
        printList = []
        for line in allLines:
            if ('ERROR' in line) or ('error' in line):
                printList.append(line)
        for item in printList:
            # HTML-escape the value of item so < and & in the log render literally
            outfile.write(cgi.escape(item) + '<br>')
        outfile.write("</p></body>\n</html>")
Note that you don't need to use printList - you could just emit the HTML code as you go through the log.
Consider breaking this into smaller functions for reusability and readability.
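For example, a minimal Python 3 sketch of that streaming approach, split into a small function as suggested. It is a sketch, not your exact script: html.escape replaces cgi.escape (which was removed in Python 3.8), and matching on line.lower() also catches mixed-case spellings like 'Error', a slight change from the original condition.
import glob
import html  # Python 3 replacement for cgi.escape
import os

def extract_errors(log_path):
    # write an HTML report of every error line in log_path
    out_path = os.path.splitext(log_path)[0] + '_ERROR!.html'
    with open(log_path) as log, open(out_path, 'w') as out:
        out.write("<html>\n<head>\n</head>\n<body><p>\n")
        for line in log:
            # lower() catches ERROR, error, and mixed-case spellings
            if 'error' in line.lower():
                out.write(html.escape(line) + '<br>\n')
        out.write("</p></body>\n</html>")

for filename in glob.glob("./*.LOG"):
    extract_errors(filename)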


Bulk autoreplacing string in the KML file

I have a set of placemarks, each of which includes quite a long description shown in its balloon. Each single description (a former column header) is bounded by a tag. Because of the restriction of shapefile field names to 10 characters only:
https://gis.stackexchange.com/questions/15784/bypassing-10-character-limit-of-field-name-in-shapefiles
I have to retype most of these names manually.
Obviously, I use Notepad++, where I can swiftly press Ctrl+F and toggle Replace mode, as you can see below (in the screenshot, the green-bounded strings were already replaced; the red ones still remain).
Basically, pressing "Replace All" works fine and quickly, but I have to go one string at a time, and I have around 20 separate strings to "Replace All". Is there a way to do it quicker? Because all the .kml files are similar to each other, the replacements are going to be the same everywhere. I need some tool that can auto-replace these headers cut by the 10-character limit. I think that maybe Python tools might be helpful.
https://pythonhosted.org/pykml/
But the tool above has no information about bulk KML editing.
How can I set up something like the "Replace All" tool for all my strings at once, if possible?
UPDATE:
I tried the code below:
files = []
with open("YesNF016.kml") as f:
    for line in f.readlines():
        if line[-1] == '\n':
            files.append(line[:-1])
        else:
            files.append(line)

old_expression = 'ab'
new_expression = 'it worked'

for file in files:
    new_file = ""
    with open(file) as f:
        for line in f.readlines():
            new_file += line.replace(old_expression, new_expression)
    with open(file, 'w') as f:
        f.write(new_file)
The debugger shows:
[Errno 22] Invalid argument: ''
  File "\test.py", line 13, in <module>
    with open(file) as f:
whereas line 13 is:
with open(file) as f:
The solutions here:
https://www.reddit.com/r/learnpython/comments/b9cljd/oserror_while_using_elementtree_to_parse_simple/
and
OSError: [Errno 22] Invalid argument Getting invalid argument while parsing xml in python
weren't helpful enough for me.
So you want to replace all occurrences of X with Y in a bunch of files?
Pretty easy.
Just create a file_list.txt containing the list of files to edit.
Python code:
files = []
with open("file_list.txt") as f:
    for line in f.readlines():
        if line[-1] == '\n':
            files.append(line[:-1])
        else:
            files.append(line)

old_expression = 'ab'
new_expression = 'it worked'

for file in files:
    new_file = ""
    with open(file) as f:
        for line in f.readlines():
            new_file += line.replace(old_expression, new_expression)
    with open(file, 'w') as f:
        f.write(new_file)
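If maintaining a separate file_list.txt is a hassle (and judging by your update, opening the KML itself as the file list is what caused the Errno 22: every line of the KML was treated as a filename), here is a sketch that globs all .kml files in the current directory instead. The replacement table values are placeholders; fill in your own truncated headers.
import glob

# map of truncated shapefile headers to their full names (placeholder values)
replacements = {
    'ab': 'it worked',
}

for path in glob.glob('*.kml'):
    with open(path) as f:
        text = f.read()
    for old, new in replacements.items():
        text = text.replace(old, new)
    with open(path, 'w') as f:
        f.write(text)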

Python reading text files

Please help: I need Python to compare a text file's line(s) to words, like this.
with open('textfile', 'r') as f:
    contents = f.readlines()
    print(f_contents)
    if f_contents == "a":
        print("text")
I would also need it to read a certain line and compare that line. But when I run this program it does not do anything: there are no error messages, nor does it print "text". Also, how do you get Python to write to just line 1? When I try to do it, for some reason it combines both words together. Can someone help? Thank you!
What is f_contents? It's supposed to be just print(contents) after reading in each line and storing it in contents. Hope that helps :)
An example of reading a file's content:
with open("criticaldocuments.txt", "r") as f:
    for line in f:
        print(line)
        # prints all the lines in this file
        # allows the user to iterate over the file line by line
OR, what you want is something like this using readlines():
with open("criticaldocuments.txt", "r") as f:
    contents = f.readlines()
    # readlines() stores each and every line in the list contents
    # note: readlines() never returns None, and contents is a list,
    # so comparing it to a single string will never match; compare
    # individual lines (e.g. contents[0].strip()) instead
    if contents == None:
        print("No lines were stored, file execution failed most likely")
    elif contents == "Password is Password":
        print("We cracked it")
    else:
        print(contents)
        # this returns all the lines if there are no matches
Note:
contents = f.readlines()
can be done like this too:
for line in f.readlines():
    # this eliminates the ambiguity of what 'contents' is doing,
    # and you could work through the rest of the code the same way,
    # except replacing 'contents' with 'line'.
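Since you also asked about comparing one specific line, here is a minimal sketch; the line index and the target word "a" are assumptions taken from your snippet.
with open('textfile', 'r') as f:
    lines = f.readlines()

# compare the first line (index 0); strip() removes the trailing newline,
# without which the comparison would never match
if lines and lines[0].strip() == "a":
    print("text")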

Parse multiple log files for strings

I'm trying to parse a number of log files from a log directory, to search for any number of strings in a list along with a server name. I feel like I've tried a million different options, and I have it working fine with just one log file, but when I try to go through all the log files in the directory I can't seem to get anywhere.
if args.f:
    logs = args.f
else:
    try:
        logs = glob("/var/opt/cray/log/p0-current/*")
    except IndexError:
        print "Something is wrong. p0-current is not available."
        sys.exit(1)

valid_errors = ["error", "nmi", "CATERR"]

logList = []
for log in logs:
    logList.append(log)

#theLog = open("logList")
#logFile = log.readlines()
#logFile.close()
#printList = []
#for line in logFile:
#    if (valid_errors in line):
#        printList.append(line)
#
#for item in printList:
#    print item

#    with open("log", "r") as tmp_log:
#        open_log = tmp_log.readlines()
#        for line in open_log:
#            for down_nodes in open_log:
#                if valid_errors in open_log:
#                    print valid_errors
down_nodes is a pre-filled list further up the script containing a list of servers which are marked as down.
Commented out are some of the various attempts I've been working through.
logList = []
for log in logs:
    logList.append(log)
I thought this might be the way forward: put each individual log file in a list, then loop through this list and use open() followed by readlines(). But I'm missing some kind of logic here; maybe I'm not thinking correctly.
I could really do with some pointers here please.
Thanks.
So your last for loop is redundant because logs is already a list of strings. With that information, we can iterate through logs and do something for each log.
for log in logs:
    with open(log) as f:
        for line in f.readlines():
            if any(error in line for error in valid_errors):
                pass  # do stuff
The line if any(error in line for error in valid_errors): checks whether any of the strings in valid_errors appear in the line. The argument to any() is a generator expression that tests error in line for each error in valid_errors.
To answer your question involving down_nodes, I don't believe you should include this in the same any(). You should try something like
if any(error in line for error in valid_errors) and \
   any(node in line for node in down_nodes):
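Putting it together, a sketch of the whole loop; the down_nodes values here are placeholders for the list your script builds earlier.
valid_errors = ["error", "nmi", "CATERR"]
down_nodes = ["node001", "node002"]  # placeholder; filled earlier in your script

for log in logs:
    with open(log) as f:
        for line in f:
            # keep only lines that mention both an error and a down node
            if any(error in line for error in valid_errors) and \
               any(node in line for node in down_nodes):
                print(line.rstrip())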
First, you need to find all the logs:
import os
import fnmatch

def find_files(pattern, top_level_dir):
    for path, dirlist, filelist in os.walk(top_level_dir):
        for name in fnmatch.filter(filelist, pattern):
            yield os.path.join(path, name)
For example, to find all *.txt files in current dir:
txtfiles = find_files('*.txt', '.')
Then get file objects from the names:
def open_files(filenames):
    for name in filenames:
        yield open(name, 'r', encoding='utf-8')
Finally, get individual lines from the files:
def lines_from_files(files):
    for f in files:
        for line in f:
            yield line
Since you want to find certain errors, the check could look like this:
import re

def find_errors(lines):
    pattern = re.compile('(error|nmi|CATERR)')
    for line in lines:
        if pattern.search(line):
            print(line)
You can now process a stream of lines generated from a given directory:
txt_file_names = find_files('*.txt', '.')
txt_files = open_files(txt_file_names)
txt_lines = lines_from_files(txt_files)
find_errors(txt_lines)
The idea of processing logs as a stream of data originates from a talk by David Beazley.
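Applied to the directory from the question, the driver code might look like this; the '*' pattern is an assumption, so narrow it to match your log file names.
# process every file under the Cray log directory from the question
log_names = find_files('*', '/var/opt/cray/log/p0-current')
log_files = open_files(log_names)
log_lines = lines_from_files(log_files)
find_errors(log_lines)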

Python- need to append characters to the beginning and end of each line in text file

I should preface that I am a complete Python Newbie.
I'm trying to create a script that will loop through a directory and its subdirectories looking for text files. When it encounters a text file, it will parse the file, convert it to NITF XML, and upload it to an FTP directory.
At this point I am still working on reading the text file into variables so that they can be inserted into the XML document in the right places. An example of the text file is as follows.
Headline
Subhead
By A person
Paragraph text.
And here is the code I have so far:
with open("path/to/textFile.txt") as f:
#content = f.readlines()
head,sub,auth = [f.readline().strip() for i in range(3)]
data=f.read()
pth = os.getcwd()
print head,sub,auth,data,pth
My question is: how do I iterate through the body of the text file (data) and wrap each line in HTML P tags? For example:
<P>line of text in file</P> <P>Next line in text file</P>
Something like
output_format = '<p>{}</p>\n'.format

with open('input') as fin, open('output', 'w') as fout:
    fout.writelines(output_format(line.strip()) for line in fin)
This assumes that you want to write the new content back to the original file:
with open('path/to/textFile.txt') as f:
    content = f.readlines()

with open('path/to/textFile.txt', 'w') as f:
    for line in content:
        f.write('<p>' + line.strip() + '</p>\n')
import shutil

with open('infile') as fin, open('outfile', 'w') as fout:
    for line in fin:
        fout.write('<P>{0}</P>\n'.format(line[:-1]))  # slice off the newline; same as line.rstrip('\n')

# Only do this once you're sure the script works :)
shutil.move('outfile', 'infile')  # replace the input file with the output file
In your case, you should probably replace
data=f.read()
with:
data = '\n'.join("<p>%s</p>" % l.strip() for l in f)
Use data = f.readlines() here, and then iterate over data and try something like this:
for line in data:
    line = "<p>" + line.strip() + "</p>"
    # write line + '\n' to a file or do something else
Append the <p> and </p> tags to each line, e.g.:
data_new = []
data = f.readlines()
for lines in data:
    data_new.append("<p>%s</p>\n" % lines.strip())
You could use the fileinput module to modify one or more files in-place, with optional backup file creation if desired (see its documentation for details). Here's it being used to process one file.
import fileinput

for line in fileinput.input('testinput.txt', inplace=1):
    print '<P>' + line[:-1] + '</P>'
The 'testinput.txt' argument could also be a sequence of two or more file names instead of just a single one, which could be useful especially if you're using os.walk() to generate the list of files in the directory and its subdirectories to process (as you probably should be doing).
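For the directory-walking part of your task, here is a sketch using os.walk to collect every .txt file under a root directory; the root path is an assumption.
import os

txt_files = []
for dirpath, dirnames, filenames in os.walk('path/to/root'):
    for name in filenames:
        if name.endswith('.txt'):
            txt_files.append(os.path.join(dirpath, name))

# txt_files can now be passed to fileinput.input() or processed one by one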

With Python and CherryPy, how do I read a txt file and display it on a page?

I have got a txt file in the format of:
line_1
line_2
line_3
I am trying to read it into a list and display it on a web page just as it looks inside the txt file: one line under another. Here is my code.
@cherrypy.expose
def readStatus(self):
    f = open("directory", "r")
    lines = "\n".join(f.readlines())
    f.close()
    page += "<p>%s</p>" % (lines)
However, the output I have been getting is:
line_1 line_2 line_3
It would be great if someone could give me a hint as to what to do so that line_1, line_2 and line_3 are displayed on 3 separate lines inside the web browser.
Thanks in advance.
You're wrapping a single pair of paragraph tags around all of the lines. You probably meant to put paragraph tags around each line individually:
with open("directory", "r") as f:
page = "\n".join("<p>%s</p>" % line for line in f)
Or, more semantically, you could put it all in an unordered list:
with open("directory", "r") as f:
page = '<ul>%s</ul>' % "\n".join("<li>%s</li>" % line for line in f)
Alternatively, you could put it all inside of a pre (preformatted text) tag:
with open('directory', 'r') as f:
    page = '<pre>%s</pre>' % f.read()
Additionally, you might want to consider escaping each line with cgi.escape (html.escape on Python 3.8+, where cgi.escape was removed) so browsers don't interpret any special characters in the file's contents.
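A minimal self-contained sketch of the handler, assuming Python 3 and html.escape; the class name StatusPage and the file name status.txt are assumptions, so substitute your own.
import html

import cherrypy

class StatusPage(object):
    @cherrypy.expose
    def readStatus(self):
        # wrap each line in its own <p> so the browser renders separate lines
        with open("status.txt", "r") as f:
            body = "\n".join("<p>%s</p>" % html.escape(line.strip()) for line in f)
        return "<html><body>%s</body></html>" % body

if __name__ == '__main__':
    cherrypy.quickstart(StatusPage())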
