Python 2.7 - Read text replace and write to same file - python

I am reading a text file (20+ lines) and doing a find and replace at multiple places in the text with the code below.
with open(r"c:\TestFolder\test_file_with_State.txt","r+") as fp:
    finds = 'MI'
    pattern = re.compile(r'[,\s]+' + re.escape(finds) + r'[\s]+')
    textdata = fp.read()
    line = re.sub(pattern,'MICHIGAN',textdata)
    fp.write(line)
When trying to write it back to the same file, I get the below error.
IOError Traceback (most recent call last)
<ipython-input> in <module>()
6 line = re.sub(pattern,'MICHIGAN',textdata)
7 print line
----> 8 fp.write(line)
9
What am I doing wrong?

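After fp.read() the file position is at the end of the file, so the write does not start where you expect. On Python 2 under Windows, switching from reading to writing on an "r+" file without an intervening seek can also raise this IOError.
You can get around this by going back to the beginning of the file with fp.seek(0) before writing.
The regex also consumes the surrounding comma and whitespace, so add spaces back in the replacement. If the new text could ever be shorter than the old text, call fp.truncate() after writing so that no stale bytes are left at the end.
So your code would be:
import re

with open(r"c:\TestFolder\test_file_with_State.txt","r+") as fp:
    finds = 'MI'
    pattern = re.compile(r'[,\s]+' + re.escape(finds) + r'[\s]+')
    textdata = fp.read()
    line = re.sub(pattern,' MICHIGAN ',textdata)
    fp.seek(0)      # rewind to the start before writing
    fp.write(line)
    fp.truncate()   # drop anything left over from the old contents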
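The read, rewind, write, truncate sequence described above can be tried out safely on a throwaway file. The following is a minimal sketch, not the asker's actual setup: the helper name replace_in_place and the sample address lines are made up for illustration, and the data is written to a temporary file rather than to c:\TestFolder.

```python
import os
import re
import tempfile


def replace_in_place(path, find, replacement):
    """Read a text file, substitute, and overwrite it through one "r+" handle."""
    pattern = re.compile(r"[,\s]+" + re.escape(find) + r"[\s]+")
    with open(path, "r+") as fp:
        text = fp.read()                 # file position is now at end of file
        new_text = pattern.sub(replacement, text)
        fp.seek(0)                       # rewind before writing
        fp.write(new_text)
        fp.truncate()                    # cut off leftovers if new_text is shorter
    return new_text


# Hypothetical sample data, written to a temporary file for the demo.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as fh:
    fh.write("Detroit, MI 48201\nLansing, MI 48933\n")

replace_in_place(demo_path, "MI", " MICHIGAN ")
with open(demo_path) as fh:
    result = fh.read()
os.remove(demo_path)
print(result)
```

The truncate() call does nothing when the new text is longer than the old text, as it is here, but it keeps the helper correct for replacements that shrink the file.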

Related

I am unable to create multiple files with this code, what is wrong?

I'm trying to write a program that takes names from a list and adds each one to a letter. A text file should be created for each name, but the code stops working at that point.
letter = []
names = []
file = open("Input/Letters/starting_letter.txt", "r")
letter = file.readlines()
file.close()
name1 = open("Input/Names/invited_names.txt", "r")
names = name1.readlines()
name1.close()
for name in names:
    create_letter = open(f"{name}.txt", "w")
    for line in letter:
        line = line.replace("[name],", f"{name},")
        create_letter.write(line)
    create_letter.close()
I get the error message
Traceback (most recent call last):
File "C:\Users\Default User\PycharmProjects\Mail Merge Project Start\main.py", line 10, in <module>
create_letter = open(f"{name}.txt", "w")
OSError: [Errno 22] Invalid argument: 'Aang\n.txt'
Is there a problem with the way I am creating the files?
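You can't have newlines in your file name; they are invalid on your OS/filesystem. readlines() keeps the trailing "\n" on each name, which is where 'Aang\n.txt' comes from.
Remove them with:
open(f"{name.strip()}.txt", "w")
Or (building the name outside the f-string, because a backslash inside an f-string expression is a syntax error before Python 3.12):
open(name.replace("\n", "") + ".txt", "w")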
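Putting the fix into the whole loop might look like the sketch below. It assumes the same "[name]" placeholder as the question; the function name write_letters, the sample template, and the temporary output directory are illustrative choices, not part of the original project.

```python
import os
import tempfile


def write_letters(template, names, out_dir):
    """Write one letter per name and return the paths that were created."""
    created = []
    for raw_name in names:
        name = raw_name.strip()          # drop the "\n" that readlines() keeps
        if not name:                     # ignore blank lines in the names file
            continue
        path = os.path.join(out_dir, name + ".txt")
        with open(path, "w") as letter_file:
            letter_file.write(template.replace("[name]", name))
        created.append(path)
    return created


# Hypothetical inputs standing in for starting_letter.txt and invited_names.txt.
out_dir = tempfile.mkdtemp()
paths = write_letters("Dear [name],\nYou are invited.\n",
                      ["Aang\n", "Zuko\n", "\n"], out_dir)
print([os.path.basename(p) for p in paths])
```

Stripping the name once and reusing it also keeps the stray newline out of the letter text itself, not just out of the file name.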

I need to print the specific part of a line in a txt file

I have a text file that reads "Janitors, 3", "Programers, 4", and "Secretaries, 1", each on its own line. I need to print "Janitors" separately from the number 3, and this has to work for basically any word and number combination. This is the code I came up with and, of course, it doesn't work. It says "substring not found".
File = open("Jobs.txt", "r")
Beg_line = 1
for lines in File:
    Line = str(File.readline(Beg_line))
    Line = Line.strip('\n')
    print(Line[0: Line.index(',')])
    Beg_line = Beg_line + 1
File.close()
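File.readline(Beg_line) reads at most Beg_line characters, not line number Beg_line, so most of the strings it returns contain no comma and index(',') fails. Try running the following code instead:
with open("Jobs.txt", "r") as file:
    for line in file:
        if line.strip():  # skip blank lines
            print(line.split(',')[0].strip())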
This will give the following output:
Janitors
Programers
Secretaries
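To get the number as well as the name, each line can be split once at the comma. This is a small sketch that assumes every non-blank line has the form "word, number"; parse_job is a hypothetical helper name, and the sample lines stand in for the contents of Jobs.txt.

```python
def parse_job(line):
    """Split a line such as "Janitors, 3" into ("Janitors", 3)."""
    name, _, count = line.partition(",")   # split at the first comma only
    return name.strip(), int(count)        # int() ignores surrounding whitespace


# Sample lines standing in for the file contents described in the question.
sample_lines = ["Janitors, 3\n", "Programers, 4\n", "Secretaries, 1\n"]
for raw in sample_lines:
    job, number = parse_job(raw)
    print(job)
    print(number)
```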

Searching a document for specific strings, then print out a part of that string

In my program I'm attempting to search a document called console.log that has lines like this:
65536:KONSOLL:1622118174:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34
65536:KONSOLL:1622177574:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34
65536:KONSOLL:1622190642:NSN:From AutroSafe: 28 5 2021, 08:30:42; 05.046; Service: "Self Verify" mislykket; ; ; ProcessMsg; F:2177 L:655; 53298;1;13056;;
65536:KONSOLL:1622204573:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34
In my input I always specify "Self Verify", as that is what I'm looking for. I want the detector number (05.046) in the output, but I get an error.
This is my code:
import os
import re
pattern = input("What are you searching for? -->")
detectorPattern = re.compile(r'\d\d.\d\d\d')
directory = os.listdir()
for x in range(len(directory)):
    with open(directory[x], 'r') as reader:
        print("Opening " + directory[1])
        # Read and print the entire file line by line
        for line in reader:
            findLine = re.search(pattern, line)
            if findLine is not None:
                mo = detectorPattern.search(findLine)
                print(mo.group())
What I'm trying to do is go through the file one line at a time and, if I find "Self Verify", search that line for the detector number specified in detectorPattern and print it.
This is the error I get:
Traceback (most recent call last):
File "C:\Users\haral\Desktop\PP\SVFinder.py", line 14, in <module>
mo = detectorPattern.search(findLine)
TypeError: expected string or bytes-like object
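Change:
mo = detectorPattern.search(findLine)
To:
mo = detectorPattern.search(findLine.string)
re.search returns a match object, not a string; findLine.string is the line that was searched. With the pattern as written, this will print:
162219
when executing the line:
print(mo.group())
That is part of the timestamp, because the unescaped . matches any character. Escape the dot (r'\d\d\.\d\d\d') to get 05.046.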
I suggest you pass line directly to detectorPattern.search.
Also please note that if x is not None: can be replaced by if x: here, since a match object is always truthy.
This example should work as is (the dot in the pattern is escaped so that it only matches a literal "."):
import re
pattern = "Self Verify"
reader = ['65536:KONSOLL:1622118174:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34',
          '65536:KONSOLL:1622177574:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34',
          '65536:KONSOLL:1622190642:NSN:From AutroSafe: 28 5 2021, 08:30:42; 05.046; Service: "Self Verify" mislykket; ; ; ProcessMsg; F:2177 L:655; 53298;1;13056;; ',
          '65536:KONSOLL:1622204573:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34']
detectorPattern = re.compile(r'\d\d\.\d\d\d')
for line in reader:
    findLine = re.search(pattern, line)
    if findLine:
        detector = detectorPattern.search(line).group()
        print(detector)
Output:
05.046
I want the detectornumber (05.046) on the output
This seems to do the job. I just passed line instead of findLine to detectorPattern.search and escaped the dot in the pattern.
import os
import re
pattern = input("What are you searching for? -->")
detectorPattern = re.compile(r'\d\d\.\d\d\d')
directory = os.listdir()
for x in range(len(directory)):
    with open(directory[x], 'r') as reader:
        print("Opening " + directory[x])
        # Read the file line by line
        for line in reader:
            findLine = re.search(pattern, line)
            if findLine is not None:
                mo = detectorPattern.search(line)
                print(mo.group())
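The same idea can be wrapped in a small function that is easy to test without touching the file system. This is a sketch under a few assumptions: the search term is treated as plain text rather than as a regex, the detector number always has the form two digits, a dot, three digits, and find_detectors is a made-up name.

```python
import re

# Two digits, a literal dot, three digits, not embedded in a longer number.
DETECTOR = re.compile(r"\b\d\d\.\d\d\d\b")


def find_detectors(lines, needle):
    """Return the detector number from every line that contains needle."""
    found = []
    for line in lines:
        if needle in line:               # plain substring test, no regex needed
            match = DETECTOR.search(line)
            if match:                    # skip lines without a detector number
                found.append(match.group())
    return found


log_lines = [
    "65536:KONSOLL:1622177574:NSN:ActivationUnits::HandleActivationUnitsMessages. There is no handler for FNo: 34",
    '65536:KONSOLL:1622190642:NSN:From AutroSafe: 28 5 2021, 08:30:42; 05.046; Service: "Self Verify" mislykket; ; ; ProcessMsg; F:2177 L:655; 53298;1;13056;;',
]
print(find_detectors(log_lines, "Self Verify"))
```

Checking the match before calling group() avoids an AttributeError on lines that mention the search term but carry no detector number.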

How to parse XML document without closing tags (python)?

I am trying to read an xml document that does not appear to have closing tags. I did not make this XML document, but I'm downloading it from the following location:
import ftplib
import xml.etree.cElementTree as et
filename = 'FBOFeed20170509'
ftp = ftplib.FTP('ftp.fbo.gov')
ftp.login(user = '', passwd = '')
localfile = open(filename, 'wb')
ftp.retrbinary('RETR ' + filename, localfile.write, 1024)
ftp.quit()
localfile.close()
tree = et.parse(filename)
for node in tree.iter():
    print (node.tag, node.attrib)
And here is my error:
Traceback (most recent call last):
File "", line 18, in <module>
tree = et.parse(filename)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 1184, in parse
tree.parse(source, parser)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/xml/etree/ElementTree.py", line 596, in parse
self._root = parser._parse_whole(source)
File "<string>", line None
xml.etree.ElementTree.ParseError: mismatched tag: line 24, column 2
So I opened the file with a text editor to take a look, and see there are no closing tags. Here are the first 24 lines:
<PRESOL>
<DATE>0509
<YEAR>17
<AGENCY>Department of the Air Force
<OFFICE>Air Education and Training Command
<LOCATION>Luke AFB Contracting Squadron
<ZIP>85309
<CLASSCOD>Z
<NAICS>238320
<OFFADD>14100 W. Eagle Street Luke AFB AZ 85309
<SUBJECT>Painting IDIQ Luke AFB
<SOLNBR>FA488717R0005
<CONTACT>Justin A Cheeks, Phone 8566232747, Email justin.cheeks#us.af.mil
<DESC>The 56th ...
<LINK>
<URL>https://www.fbo.gov/spg/USAF/AETC/LukeAFBCS/FA488717R0005/listing.html
<DESC>Link To Document
<SETASIDE>Service-Disabled Veteran-Owned Small Business
<POPCOUNTRY>US
<POPZIP>85309
<POPADDRESS>14100 W Eagle Street (B-26)
Luke AFB, AZ
</PRESOL>
I guess the error is related to the fact that PRESOL is closed with /PRESOL, but none of the other tags are closed. This is a straightforward entry; some of the others include various HTML tags in the DESC or CONTACT portions, so I'm not sure how I could write something to close the tags before I parse. For example, here is another portion of the file:
<CONTACT>Tammy Davis
Tammy.Davis6#va.gov
Tammy.Davis6#va.gov
<DESC>The purpose...
I'm not sure if all of the tags are in the same order or even the same for each entry. Is this even an XML format? Is there a different python library I should be using here?
https://github.com/presidential-innovation-fellows/fbo-parser parses the daily FBO file into JSON and adds the closing tags to the fields within the notice types. I use it and then convert the result to an XML file to import the data into my database.
I went through the same issue recently. Below is the piece of code I used to save all "PRESOL" records (i.e., presolicitation notices) into a .csv file.
import csv
import glob

tags = ["DATE","YEAR","AGENCY","OFFICE","LOCATION","ZIP",
        "CLASSCOD","OFFADD","SUBJECT","SOLNBR","RESPDATE","ARCHDATE",
        "CONTACT","CONTACTDESC","LINK","URL","URLDESC"]
for y in range(2005,2019):
    outfile = 'my_dir/FBO' + str(y) + '.csv' # output file
    yearterm = 'FBOFeed' + str(y) + '*'
    with open(outfile, 'w+') as g:
        writer = csv.writer(g)
        writer.writerow(tags)
        for csvfile in glob.glob(yearterm):
            inpresol = 0
            oldtag = '' # initiate the definition of old tag
            with open(csvfile, 'r+', encoding="latin_1") as f:
                for line in f:
                    tag = line[line.find("<")+1:line.find(">")] # find the line tag
                    if tag == "DESC": # there are multiple "DESC" tags, take care of them
                        dicttag = oldtag + tag
                    else:
                        dicttag = tag
                    if "<PRESOL>" in line: # start of the record: initiate the dictionary
                        d = {x : [] for x in tags}
                        inpresol = 1
                        continue
                    elif "</PRESOL>" in line: # end of the record
                        writer.writerow([d[t] for t in tags]) # one column per tag, in order
                        inpresol = 0
                        continue
                    if inpresol == 1: # store the results
                        tagged_tag = "<" + tag + ">"
                        newline = line.replace(tagged_tag, "")
                        d[dicttag] = newline
                        oldtag = tag
I know it is not very "pythonic", but it works pretty well and stores yearly files with the records in .csv format.
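You could use regular expressions to create end-tags and do something like this (\s* rather than \s+ after the tag, so that a value following the tag directly, as in <DATE>0509, is matched too):
text = re.sub(r"<(\w+)>\s*([^<]+|)", r"<\1>\2</\1>", text)
text = re.sub(r"<PRESOL>\s*</PRESOL>", "<PRESOL>", text)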
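As a rough check of the regex idea, the sketch below closes the tags in a shortened copy of the sample record and hands the result to ElementTree. It uses \s* after the tag so that values following the tag directly are matched. The helper name close_tags is made up, and the approach assumes one record at a time with no "<", "&", or embedded HTML in the values, which the question says does occur in real entries.

```python
import re
import xml.etree.ElementTree as ET


def close_tags(text):
    """Add a closing tag after each "<TAG>value" pair of one PRESOL record."""
    # Every opening tag gets a matching end tag after its (possibly empty) value.
    text = re.sub(r"<(\w+)>\s*([^<]+|)", r"<\1>\2</\1>", text)
    # The record's own opening tag was closed too early; undo that.
    return re.sub(r"<PRESOL>\s*</PRESOL>", "<PRESOL>", text)


# Shortened copy of the record shown in the question.
sample = """<PRESOL>
<DATE>0509
<YEAR>17
<AGENCY>Department of the Air Force
</PRESOL>
"""

root = ET.fromstring(close_tags(sample))
for child in root:
    print(child.tag, child.text.strip())
```

A full feed contains many records, so each one would need to be processed separately or wrapped in a single root element before parsing.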

I am trying to decompress a file that I created

My code has a file "filefile.txt" which has a compressed sentence in it. The file is laid out like this:
1
2
3
4
5
1
2
6
9
10
11
2
12
12
9
This
is
a
sentence
.
too
!
Yo
yo
bling
The original text that I want to decompress says "This is a sentence. This is too! Yo yo bling bling!"
My code says:
fo = open("filefile.txt","r")
script = fo.readline()
script2 = fo.readline()
fo.close()
script2 = script2.split()
script = [s.strip("\n") for s in script]
sentencewords = []
while len(script) > 0:
    for p in script:
        sentencewords.append(enumerate(script2.index(p)))
        script.remove(0)
print(sentencewords)
This is the error:
Traceback (most recent call last):
File "F:\code attempts\AT13.py", line 46, in <module>
sentencewords.append(enumerate(script2.index(p)))
ValueError: '1' is not in list
I need sentencewords to contain "This is a sentence. This is too! Yo yo bling bling!"
I have changed it now but it still doesn't work.
sentencewords.append(enumerate(script2.enumerate(p)))
Traceback (most recent call last):
File "F:\code attempts\AT13.py", line 46, in <module>
sentencewords.append(enumerate(script2.enumerate(p)))
AttributeError: 'list' object has no attribute 'enumerate'
Does anyone know if there is another way round this problem or how to fix my current code?
fo = open("filefile.txt","r")
script = fo.readline()
script2 = fo.readline()
fo.close()
script2 = script2.split()
script = [s.strip("\n") for s in script]
sentencewords = []
indexes = []
for line in fo:
    if line.strip().isdigit():
        indexes.append(line)
    else:
        break
words = [line.strip() for line in fo if line.strip()]
while len(script) > 0:
    for p in script:
        sentencewords.append(words[index-1])
print(sentencewords)
Updated code but I don't know what the I/O thing means in the latest output from python.
Traceback (most recent call last):
File "F:/code attempts/attempt14.py", line 45, in <module>
for line in fo:
ValueError: I/O operation on closed file.
fo.close() has been moved further down the code and now it says
Traceback (most recent call last):
File "F:\code attempts\attempt14.py", line 55, in <module>
sentencewords.append(words[index-1])
MemoryError
I'd be grateful for any suggestions on how to fix my code. Thanks.
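The "I/O operation on closed file" error means the code tried to read from fo after fo.close() had been called. The MemoryError most likely comes from the while loop: script never shrinks, so the loop keeps appending to sentencewords until memory runs out.
Format the text file in a better way and it will be easier to deal with.
1 2 3 4 5 1 2 6 9 10 11 2 12 12 9
This is a sentence. too ! Yo yo bling
Then do this...
script = []
sentencewords = []
with open("filefile.txt", "r") as fo:
    for line in fo:
        script.append(line.strip("\n").split(" "))
for i in script[0]:
    sentencewords.append(script[1][int(i)-1])
print(sentencewords)
Indices larger than the number of words on the second line will raise an IndexError though, because you don't have that many words.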
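If the file has to stay in its original one-item-per-line layout, the index lines and the word lines can be separated by checking which lines are numeric. This is a sketch under two assumptions: no word in the list consists only of digits, and indices that point past the end of the word list should be skipped rather than raise an error. The function name decompress is made up for illustration.

```python
def decompress(lines):
    """Rebuild the word sequence from 1-based index lines followed by word lines."""
    stripped = [line.strip() for line in lines if line.strip()]
    indexes = [int(s) for s in stripped if s.isdigit()]       # the position lines
    words = [s for s in stripped if not s.isdigit()]          # the word lines
    # Keep only positions that actually exist in the word list.
    return [words[i - 1] for i in indexes if 1 <= i <= len(words)]


# Small hypothetical file contents; with a real file, pass the open file object,
# e.g. decompress(open("filefile.txt")).
sample = ["1\n", "2\n", "1\n", "5\n", "Yo\n", "yo\n"]
print(decompress(sample))
```

Because the function only iterates over its argument once and never loops on a list that does not shrink, it avoids both the closed-file error and the runaway loop from the question.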
