Python: re-formatting multiple lines in text file

I apologize if this post is long; I am trying to be as detailed as possible. I have researched the topic considerably and would consider myself an intermediate programmer.
My problem: I have a text file with multiple lines of data. I would like to remove certain parts of each line in an effort to get rid of some irrelevant information, and then save the file with the newly formatted lines.
Here is an example of what I am trying to accomplish. The original line is something like:
access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594
I am trying to have the code read the text file, and output:
permit tcp any 209.143.156.200 www
The following code works, but only if there is a single line in the text file:
input_file = open("ConfigInput.txt", "r")
output_file = open("ConfigOutput.txt", "w")
for line in input_file:
    line = line.split("extended ", 1)[1]
    line = line.split("(", 1)[0]
    line = line.replace(" host", "")
    line = line.replace(" eq", "")
    output_file.write(line)
output_file.close()
input_file.close()
However, when I attempt to run this with a full file of multiple lines of data, I receive an error:
  File "C:\Python27\asaReader", line 5, in <module>
    line = line.split("extended ", 1)[1]
IndexError: list index out of range
I suspect that it is not moving onto the next line of data in the text file, and therefore there isn't anything in [1] of the previous string. I would appreciate any help I can get on this.

Some possible causes:
You have blank lines in your file (blank lines obviously won't contain the word extended)
You have lines that aren't blank, but don't contain the word extended
You could try printing your lines individually to see where the problem occurs:
for line in input_file:
    print("Got line: %s" % (line))
    line = line.split("extended ", 1)[1]
Oh, and it's possible that the last line is blank and it's failing on that. It would be easy enough to miss.
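If blank or unrelated lines are indeed the cause, one option is to test for the marker before splitting. The following is a minimal sketch under that assumption; reformat_acl_line and reformat_acl_lines are hypothetical helper names, and the third sample line is made up to stand in for a line without "extended".

```python
def reformat_acl_line(line):
    """Return the trimmed rule, or None when the line has no "extended " marker."""
    if "extended " not in line:
        return None  # blank line, header line, or anything else without the marker
    rule = line.split("extended ", 1)[1]   # drop everything up to "extended "
    rule = rule.split("(", 1)[0]           # drop the "(hitcnt=...)" tail
    rule = rule.replace(" host", "").replace(" eq", "")
    return rule.strip()


def reformat_acl_lines(lines):
    """Yield one reformatted rule per input line that contains the marker."""
    for line in lines:
        rule = reformat_acl_line(line)
        if rule is not None:
            yield rule


sample = [
    "access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n",
    "\n",                                           # blank line
    "access-list inbound_outside1; 12 elements\n",  # made-up line without "extended"
]
for rule in reformat_acl_lines(sample):
    print(rule)  # permit tcp any 209.143.156.200 www
```

When writing the results to a file, each rule would need its own "\n" appended, since strip() removes the original line ending.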

Print something out when you hit a line that can't be processed:
for line in input_file:
    try:
        line = line.split("extended ", 1)[1]
        line = line.split("(", 1)[0]
        line = line.replace(" host", "")
        line = line.replace(" eq", "")
        output_file.write(line)
    except Exception as e:
        print("Choked on this line: %r" % line)
        print(e)

An alternate approach would be to cache all the lines (assuming the file is not humongous).
>>> import re
>>> with open('/tmp/ConfigInput.txt', 'r') as f:
...     lines = f.readlines()
...
>>> lines
['access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n']
>>> lines = [re.sub(r'(^.*extended |\(.*$)', '', line) for line in lines]
>>> lines
['permit tcp any host 209.143.156.200 eq www \n']
>>> with open('/tmp/ConfigOutput.txt', 'w') as f:
...     f.writelines(lines)
...

Related

I need to print the specific part of a line in a txt file

I have a text file whose lines read "Janitors, 3", "Programers, 4", and "Secretaries, 1", each on its own line. I need to print "Janitors" separately from the number 3, and this has to work for basically any word and number combination. This is the code I came up with and, of course, it doesn't work. It fails with "substring not found":
File = open("Jobs.txt", "r")
Beg_line = 1
for lines in File:
    Line = str(File.readline(Beg_line))
    Line = Line.strip('\n')
    print(Line[0: Line.index(',')])
    Beg_line = Beg_line + 1
File.close()

Try running the following code, which splits each line at the comma and skips blank lines:
with open("Jobs.txt", "r") as file:
    for line in file:
        line = line.strip()
        if line:
            print(line.split(',')[0])
This will give the following output:
Janitors
Programers
Secretaries
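To get the number as well as the word, each line can be split once at the comma. This is a small sketch that assumes every non-blank line has the form "word, number"; parse_job_line is a hypothetical helper name, and the sample list stands in for the contents of Jobs.txt.

```python
def parse_job_line(line):
    """Split a line such as 'Janitors, 3' into ('Janitors', 3)."""
    name, _, count = line.strip().partition(',')
    return name.strip(), int(count)  # int() tolerates the leading space in ' 3'


sample = ["Janitors, 3\n", "Programers, 4\n", "Secretaries, 1\n"]
for line in sample:
    name, count = parse_job_line(line)
    print(name)   # the word on its own
    print(count)  # the number on its own
```

A line without a comma or without a number would make int() raise ValueError, so real input may need a check first.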

How can we write a text file from variable using python?

I am working on an NLP project and have extracted the text from a PDF using PyPDF2. I then removed the blank lines. The output is shown on the console, but I want to write the same data, which is stored in my variable (file), to the text file.
Below is the code that removes the blank lines from the text file.
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file=line
        print(file)
Output on Console:
Eclipse,
Visual Studio 2012,
Arduino IDE,
Java
,
HTML,
CSS
2013
Excel
.
Now I want the same data in my text file (resume1.txt). I have tried three methods, but each of them writes only a single dot to resume1.txt. That dot is the last line of the text.
Method 1:
with open("resume1.txt", "w") as out_file:
    out_file.write(file)
Method 2:
print(file, file=open("resume1.txt", 'w'))
Method 3:
pathlib.Path('resume1.txt').write_text(file)
Could you please help me populate the text file? Thank you for your cooperation.

First of all, note that you are writing to the same file you read from, which loses the old data; I don't know whether you want that. Beyond that, each of those methods opens the file in 'w' mode, so every write overwrites whatever was previously written to the output file, and by the time you write, the variable file holds only the last non-blank line (the dot). So, if you want to use these methods, you must write only once, with all the data.
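The effect can be reproduced without touching any file. In this sketch the list lines is a made-up stand-in for the contents of resume1.txt, and kept is a hypothetical name for the list that collects every non-blank line.

```python
lines = ["Eclipse,\n", "\n", "Java\n", "\n", ".\n"]  # stand-in for the file contents

file = None
kept = []
for line in lines:
    line = line.rstrip()
    if line != '':
        file = line        # rebinds the name; the previous value is discarded
        kept.append(line)  # a list keeps every non-blank line

print(repr(file))  # '.'
print(kept)        # ['Eclipse,', 'Java', '.']
```

Writing '\n'.join(kept) once, after the loop, puts all the lines in the output instead of only the last one.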
SOLUTIONS
Using method 1:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
with open("resume1.txt", "w") as out_file:
    out_file.write(to_save)
Using method 2:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
print(to_save, file=open("resume1.txt", 'w'))
Using method 3:
import pathlib
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
pathlib.Path('resume1.txt').write_text(to_save)
In these three methods I used to_save = '\n'.join(to_file) because I assume you want each line separated from the next by an EOL. If I'm wrong, use ''.join(to_file) for no separator at all, or ' '.join(to_file) to put everything on a single line.
Other method
You can also do this by writing to a different file, say 'output.txt'.
out_file = open('output.txt', 'w')
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        out_file.write(file)
        out_file.write('\n') # EOL
out_file.close()
Also, you can do this (I prefer this):
with open('output.txt', 'w') as out_file:
    for line in open('resume1.txt'):
        line = line.rstrip()
        if line != '':
            file = line
            print(file)
            out_file.write(file)
            out_file.write('\n') # EOL

First post on stack, so excuse the format
new_line = ""
for line in open('resume1.txt', "r"):
    for char in line:
        if char != " ":
            new_line += char
print(new_line)
with open('resume1.txt', "w") as f:
    f.write(new_line)

Python match a regex with specified exceptions

I'm using Python Fabric, trying to comment out all lines in a file that begin with "#", unless that "#" is followed by one of 2 specific IP addresses. So if the file contains
#hi
#IP1
some stuff here
#IP2
then the resulting file should be
##hi
#IP1
some stuff here
#IP2
This is what I have so far:
def verify():
    output = sudo("/sbin/service syslog status")
    #if syslog is running
    if 'is running...' in output:
        #then set output to the value of the conf file
        output = sudo("cat /etc/syslog.conf")
        #If pattern is matched
        if "#" in output and not "#IP1" and not "#IP2":
            #read all the lines in the conf file
            sys.stdout = open('/etc/syslog.conf', 'r+').readlines()
            #and for every line, comment if it matches pattern
            for line in sys.stdout:
                if "#" in line and not "#1P1" and not "#IP2":
                    line = "#" + line
        else:
            print GOOD
    else:
        print RSYSLOG
I get that when I say
if "#" in output and not "#IP1" and not "#IP2"
Python thinks I am saying "do something if there is a # in the file, but ONLY if the file does not also have #IP1 and #IP2." What I'm trying to say is "do something to any line starting with a #, except the lines #IP1 and #IP2." Also, I know there are other errors in my code, but I'm working on just this for now.
Thanks.

Regex solution:
You can use the following regex to match:
^(?=#(?!(IP1|IP2)))
And replace with #
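Code (re.MULTILINE makes ^ match at the start of every line when myStr holds the whole file):
import re
re.sub(r'^(?=#(?!(IP1|IP2)))', r'#', myStr, flags=re.MULTILINE)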
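A quick way to check the lookahead pattern is to run it against the sample from the question. This sketch assumes the whole file is held in one string, so it compiles the pattern with re.MULTILINE; KEEP and comment_out are hypothetical names, and "IP1"/"IP2" are the placeholders from the question rather than real addresses.

```python
import re

KEEP = ("IP1", "IP2")  # placeholders from the question

# Zero-width match at the start of any line that begins with "#",
# unless the "#" is immediately followed by one of the KEEP values.
pattern = re.compile(
    r'^(?=#(?!(?:%s)))' % "|".join(re.escape(k) for k in KEEP),
    re.MULTILINE,
)


def comment_out(text):
    """Prepend an extra '#' to every matching line."""
    return pattern.sub('#', text)


text = "#hi\n#IP1\nsome stuff here\n#IP2\n"
print(comment_out(text))
```

Because the lookahead only checks a prefix, a line such as "#IP10" would also be left alone; a stricter pattern would need to anchor the end of the address.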

I would do something like this for each line:
if "#IP1" not in line and "#IP2" not in line:
    if line.startswith("#"):
        pass  # stuff here

Check whether the criterion matches anywhere in the file contents and, if so, open the file, read it line by line, and use re.sub() to prepend a # to the lines that require it.
import re
ip1 = '1.1.1.1'
ip2 = '2.2.2.2'
with open('in.txt', 'r') as fh:
    f = fh.read()
# build the pattern once; re.escape keeps the dots in the IPs literal
exempt = r'(#(?!({0}|{1})))'.format(re.escape(ip1), re.escape(ip2))
if re.search(exempt, f):
    with open('in.txt', 'r') as fh:
        for line in fh:
            line = re.sub(r'^' + exempt, r'#\1', line)
            print(line.rstrip('\n'))
Input file:
#1.1.1.1
#this
#2.2.2.2
#3.3.3.3
#
no #
blah
Output:
#1.1.1.1
##this
#2.2.2.2
##3.3.3.3
##
no #
blah

startswith() can't find '//' at the front of my string

I've got a pretty simple python script that reads in a file, and parses it line by line.
It doesn't seem to recognize the '//' at the start of my lines. If I change it to look for '#' at the start of my lines, it doesn't find those lines either. Am I just misunderstanding this?
line = fIn.readline()
while line:
    print "line is", line
    line = line.strip()
    if line.startswith('//'):
        print "winner"
        line = fIn.readline()
The file I'm reading in looks like this:
// Feedback
"Feedback" = "Feedback";
// New strings
"File URL not reachable." = "File URL not reachable.";
And the debug line looks appropriate when it prints out:
line is // Feedback
line is "Feedback" = "Feedback";
line is
line is // New strings
line is "File URL not reachable." = "File URL not reachable.";
line is

Better version:
with open("abc") as f:
    for line in f:
        line = line.strip()
        if line and line.startswith("//"):
            print("line is %s" % line)
            print("winner")
            print(next(f))
output:
line is // Feedback
winner
"Feedback" = "Feedback";
line is // New strings
winner
"File URL not reachable." = "File URL not reachable.";

You are only reading one line of your text file. Other than the wrong indent on the last line, it seems to work. Try running your program after making sure line = fIn.readline() gets executed on each iteration (move it one block to the left).
Here is what I get after fixing that one line. Is this the desired output?
line is // Feedback
winner
line is "Feedback" = "Feedback";
line is
line is // New strings
winner
line is "File URL not reachable." = "File URL not reachable.";
Edit: does this work for you?
for line in open("yourfile.txt").readlines():
    print("line is %s" % line)
    line = line.strip()
    if line.startswith('//'):
        print("winner")

Try this (iterating over the file already advances to the next line, so no extra readline() call is needed):
for line in fIn:
    print("line is %s" % line)
    line = line.strip()
    if line[0:2] == '//':
        print("winner")
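The loop-over-the-file pattern from these answers can be tried on in-memory data first. This is only a sketch: comment_lines is a hypothetical helper, and io.StringIO stands in for the opened file using the sample content from the question.

```python
import io


def comment_lines(handle):
    """Return the stripped lines that start with '//'."""
    found = []
    for line in handle:  # the for loop advances one line per iteration
        line = line.strip()
        if line.startswith('//'):
            found.append(line)
    return found


sample = io.StringIO(
    '// Feedback\n'
    '"Feedback" = "Feedback";\n'
    '\n'
    '// New strings\n'
    '"File URL not reachable." = "File URL not reachable.";\n'
)
print(comment_lines(sample))  # ['// Feedback', '// New strings']
```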

how to replace (update) text in a file line by line

I am trying to replace text in a text file by reading each line, testing it, then writing if it needs to be updated. I DO NOT want to save as a new file, as my script already backs up the files first and operates on the backups.
Here is what I have so far... I get fpath from os.walk() and I guarantee that the pathmatch var returns correctly:
fpath = os.path.join(thisdir, filename)
with open(fpath, 'r+') as f:
    for line in f.readlines():
        if '<a href="' in line:
            for test in filelist:
                pathmatch = file_match(line, test)
                if pathmatch is not None:
                    repstring = filelist[test] + pathmatch
                    print 'old line:', line
                    line = line.replace(test, repstring)
                    print 'new line:', line
                    f.write(line)
But what ends up happening is that only a few lines get corrected (updated correctly, mind you, but repeated from earlier in the file). I think this is a scoping issue, afaict.
Also: I would like to know how to replace the text only at the first instance of the match. For example, I don't want to match the display text, only the underlying href.

First, you want to write the line whether it matches the pattern or not. Otherwise, you're writing out only the matched lines.
Second, between reading the lines and writing the results, you'll need to either truncate the file (call f.seek(0), then f.truncate()) or close the original and reopen it. Picking the former, I'd end up with something like:
fpath = os.path.join(thisdir, filename)
with open(fpath, 'r+') as f:
    lines = f.readlines()
    f.seek(0)
    f.truncate()
    for line in lines:
        if '<a href="' in line:
            for test in filelist:
                pathmatch = file_match(line, test)
                if pathmatch is not None:
                    repstring = filelist[test] + pathmatch
                    line = line.replace(test, repstring)
        f.write(line)
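For the follow-up about replacing only the first instance, str.replace accepts an optional count argument. The sketch below combines that with the read, seek, truncate, write sequence; rewrite_in_place and replace_first are hypothetical helpers, and the temporary file with its 'old/' and 'new/' paths is made-up demo data, not the asker's filelist or file_match logic.

```python
import os
import tempfile


def rewrite_in_place(path, transform):
    """Read all lines, empty the file, then write every line back transformed."""
    with open(path, 'r+') as f:
        lines = f.readlines()
        f.seek(0)
        f.truncate()
        for line in lines:
            f.write(transform(line))  # written whether or not it changed


def replace_first(line, old, new):
    """Replace only the first occurrence of old in line."""
    return line.replace(old, new, 1)


# Demo on a throwaway file
fd, demo_path = tempfile.mkstemp(suffix='.html', text=True)
with os.fdopen(fd, 'w') as f:
    f.write('<a href="old/page.html">old/page.html</a>\nplain text\n')

rewrite_in_place(demo_path, lambda line: replace_first(line, 'old/', 'new/'))

with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

In the demo the href becomes new/page.html while the display text keeps old/page.html, because the href is the first occurrence on the line.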

Open the file for read and copy all of the lines into memory. Close the file.
Apply your transformations on the lines in memory.
Open the file for write and write out all the lines of text in memory.
with open(filename, "r") as f:
    lines = (line.rstrip() for line in f)
    altered_lines = [some_func(line) if regex.match(line) else line for line in lines]
with open(filename, "w") as f:
    f.write('\n'.join(altered_lines) + '\n')

A (relatively) safe way to replace a line in a file.
#!/usr/bin/python
# defensive programming style
# function to replace a line in a file
# and not destroy data in case of error
import os
from tempfile import NamedTemporaryFile

def replace_line(filepath, oldline, newline):
    """
    replace a line in a temporary file,
    then copy it over into the
    original file if everything goes well
    """
    # quick parameter checks
    assert os.path.exists(filepath)
    assert oldline and isinstance(oldline, str)  # is not empty and is a string
    assert newline and isinstance(newline, str)
    replaced = False
    written = False
    try:
        with open(filepath, 'r+') as f:  # open for read/write -- alias to f
            lines = f.readlines()  # get all lines in file
            if oldline not in lines:
                pass  # line not found in file, do nothing
            else:
                # text-mode temp file, deleted automatically when closed
                tmpfile = NamedTemporaryFile(mode='w+', delete=True)
                for line in lines:  # process each line
                    if line == oldline:  # find the line we want
                        tmpfile.write(newline)  # replace it
                        replaced = True
                    else:
                        tmpfile.write(line)  # write other lines unchanged
                if replaced:  # overwrite the original file
                    tmpfile.seek(0)  # rewind the temp file before reading it back
                    f.seek(0)  # beginning of file
                    f.truncate()  # empties out original file
                    for tmplines in tmpfile:
                        f.write(tmplines)  # writes each line to original file
                    written = True
                tmpfile.close()  # tmpfile auto deleted
    except IOError as ioe:  # if something bad happened.
        print("ERROR", ioe)
        return False
    return replaced and written  # replacement happened with no errors = True
(note: this replaces entire lines only, and all of the lines that match in the file; because readlines() keeps line endings, oldline and newline must include their trailing newline)
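A different way to get a similar safety property is to write the new contents to a temporary file in the same directory and then swap it in with os.replace, so the original is never truncated. This is a hedged sketch of that swapped-in technique, not the answer's method; replace_line_atomic is a hypothetical name, and it assumes Python 3.3+ and that the temporary file can be created next to the original.

```python
import os
import tempfile


def replace_line_atomic(filepath, oldline, newline):
    """Replace every line equal to oldline; return True if anything changed."""
    directory = os.path.dirname(os.path.abspath(filepath))
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmppath = tempfile.mkstemp(dir=directory, text=True)
    replaced = False
    try:
        with os.fdopen(fd, 'w') as tmp, open(filepath, 'r') as src:
            for line in src:
                if line == oldline:  # both must include the trailing newline
                    tmp.write(newline)
                    replaced = True
                else:
                    tmp.write(line)
        if replaced:
            os.replace(tmppath, filepath)  # swap the finished copy into place
            return True
        os.remove(tmppath)  # nothing changed; discard the copy
        return False
    except Exception:
        if os.path.exists(tmppath):
            os.remove(tmppath)  # clean up, leave the original untouched
        raise
```

A call such as replace_line_atomic("notes.txt", "old\n", "new\n") (file name made up) would return True only when a matching line was found. One trade-off: the swapped-in file carries the permissions of the temporary file rather than those of the original.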
