The problem is: I have a data set to clean. I am currently using Python 3.6 as the interpreter in PyCharm (Community Edition) to work on this.
I need to:
Find each line where the word "Code" appears, and
join all the following lines together into a single line until
the next "Code" line appears.
This would essentially split the data into two fields, namely the code and the company details.
The final output needs to be a table in a text or CSV file written from PyCharm itself, and this format is critical.
The following is the input (an extract from the actual text file):
345- Code # 98882 +
"Ms, ABDUL RAFAY & COMPANY, +"
"907, 2nd Floor, tradeway Centre,33, Block-6, PECHS, Karach +"
Ph:345598 1334 558106 +
Mr. Abdul rafay Siddiqui +
347 Code # 96663 +
"Ms. BILAL & BROTHERS Plot No.F-8, Estate #2, Lalazar, Karachi Ph:322575.84 +"
Mr. Mubarak Shahid +
A23 - Code : BO229 +
"Ms. RAHMAN & SONS 303, 3rd Floor, Square One, Dundas street, Karachi P:36268947 +"
"Mr, Saleem Mughal +"
"349- Code # 93369 Ms, ALIAPPAREL +"
"Office No. 491/307, 1st Floor, Blessings Tower near Tipu Burger , P?:34990456 +"
"Mr, Nasir Wali +"
The output should look like this:
Code - Company details
345- Code # 98882 + -"Ms, ABDUL RAFAY & COMPANY, +""907, 2nd Floor, tradeway Centre,33, Block-6, PECHS, Karach +"Ph:345598 1334 558106 +Mr. Abdul rafay Siddiqui +
347 Code # 96663 + - "Ms. BILAL & BROTHERS Plot No.F-8, Estate #2, Lalazar, Karachi Ph:322575.84 +"Mr. Mubarak Shahid +
The catch is that the company details sometimes span one, two, or three lines, so I need a way to iterate over them until the next 'Code' appears. I tried this before in R but couldn't come up with anything concrete except adding the + markers, which could be stripped off here.
All you need to do is iterate through the file, looking for a line that signals the start of a new data block.
This does more or less what you want:
def emit(lines, dest):
    if lines:
        print("".join(lines), file=dest)

company_data = []
with open('details.txt') as data_in, open('fixed_details', 'w') as data_out:
    for line in data_in:
        if "Code " in line:  # start of a new company: output the previous one
            emit(company_data, data_out)
            company_data = []
        company_data.append(line.strip())
    emit(company_data, data_out)  # output the last company
It doesn't do exactly what you want because your sample output sometimes specifies a hyphen between the company code and the rest of the data, and sometimes a hyphen and a space.
345- Code # 98882 + -"Ms, ABDUL RAFAY ...(etc)
347 Code # 96663 + - "Ms. BILAL & BROTHERS ...(etc)
^ this is the space
In record 345 there is no space, but in record 347 there is. There is no corresponding space in your sample input data, so it isn't clear what you want the program to do. I just left the hyphen out. I'll leave sorting that out (and supplying the headings) up to you. You will probably want to change the print() call to distinguish between the first line of data and the rest:
print(lines[0], "-", "".join(lines[1:]), file=dest)
This is the output:
345- Code # 98882 +"Ms, ABDUL RAFAY & COMPANY, +""907, 2nd Floor, tradeway Centre,33, Block-6, PECHS, Karach +"Ph:345598 1334 558106 +Mr. Abdul rafay Siddiqui +
347 Code # 96663 +"Ms. BILAL & BROTHERS Plot No.F-8, Estate #2, Lalazar, Karachi Ph:322575.84 +"Mr. Mubarak Shahid +
A23 - Code : BO229 +"Ms. RAHMAN & SONS 303, 3rd Floor, Square One, Dundas street, Karachi P:36268947 +""Mr, Saleem Mughal +"
"349- Code # 93369 Ms, ALIAPPAREL +""Office No. 491/307, 1st Floor, Blessings Tower near Tipu Burger , P?:34990456 +""Mr, Nasir Wali +"
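If you want the two-field table written as a real CSV file, the same grouping idea can feed Python's csv module. This is a minimal sketch, not the answerer's code: group_blocks and write_csv are hypothetical helper names, and it assumes every record starts with a line containing "Code " (as in the sample) and that the first line of each block is the code field.

```python
import csv


def group_blocks(lines, marker="Code "):
    """Yield lists of stripped lines; a new list starts at every line containing marker."""
    block = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if marker in line and block:
            yield block  # a new record starts: emit the previous one
            block = []
        block.append(line)
    if block:
        yield block  # the last record has no following "Code" line


def write_csv(lines, dest):
    """Write a 'Code' / 'Company details' table with one row per record."""
    writer = csv.writer(dest)
    writer.writerow(["Code", "Company details"])
    for block in group_blocks(lines):
        writer.writerow([block[0], "".join(block[1:])])
```

To use it on the real file, something like `with open('details.txt') as src, open('details.csv', 'w', newline='') as dst: write_csv(src, dst)` should work; the csv module documentation recommends `newline=''` when opening the output file. The csv writer quotes fields that contain commas or quotes, so the embedded `"` characters survive a round trip.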
I have the following text file as input:
Patient Name: XXX,A
Date of Service: 12/12/2018
Speaker ID: 10531
Visit Start: 06/07/2018
Visit End: 06/18/2018
Recipient:
REQUESTING PHYSICIAN:
Mr.XXX
REASON FOR CONSULTATION:
Acute asthma.
HISTORY OF PRESENT ILLNESS:
The patient is a 64-year-old female who is well known to our practice. She has not been feeling well over the last 3 weeks and has been complaining of increasing shortness of breath, cough, wheezing, and chest tightness. She was prescribed systemic steroids and Zithromax. Her respiratory symptoms persisted; and subsequently, she went to Capital Health Emergency Room. She presented to the office again yesterday with increasing shortness of breath, chest tightness, wheezing, and cough productive of thick sputum. She also noted some low-grade temperature.
PAST MEDICAL HISTORY:
Remarkable for bronchial asthma, peptic ulcer disease, hyperlipidemia, coronary artery disease with anomalous coronary artery, status post tonsillectomy, appendectomy, sinus surgery, and status post rotator cuff surgery.
HOME MEDICATIONS:
Include;
1. Armodafinil.
2. Atorvastatin.
3. Bisoprolol.
4. Symbicort.
5. Prolia.
6. Nexium.
7. Gabapentin.
8. Synthroid.
9. Linzess_____.
10. Montelukast.
11. Domperidone.
12. Tramadol.
ALLERGIES:
1. CEPHALOSPORIN.
2. PENICILLIN.
3. SULFA.
SOCIAL HISTORY:
She is a lifelong nonsmoker.
PHYSICAL EXAMINATION:
GENERAL: Shows a pleasant 64-year-old female.
VITAL SIGNS: Blood pressure 108/56, pulse of 70, respiratory rate is 26, and pulse oximetry is 94% on room air. She is afebrile.
HEENT: Conjunctivae are pink. Oral cavity is clear.
CHEST: Shows increased AP diameter and decreased breath sounds with diffuse inspiratory and expiratory wheeze and prolonged expiratory phase.
CARDIOVASCULAR: Regular rate and rhythm.
ABDOMEN: Soft.
EXTREMITIES: Does not show any edema.
LABORATORY DATA:
Her INR is 1.1. Chemistry; sodium 139, potassium 3.3, chloride 106, CO2 of 25, BUN is 10, creatinine 0.74, and glucose is 110. BNP is 40. White count on admission 16,800; hemoglobin 12.5; and neutrophils 88%. Two sets of blood cultures are negative. CT scan of the chest is obtained, which is consistent with tree-in-bud opacities of the lung involving bilateral lower lobes with patchy infiltrate involving the right upper lobe. There is mild bilateral bronchial wall thickening.
IMPRESSION:
1. Acute asthma.
2. Community acquired pneumonia.
3. Probable allergic bronchopulmonary aspergillosis.
I want the text file to be converted into an Excel file like this:
Patient Name Date of Service Speaker ID Visit Start Visit End Recipient ..... IMPRESSION:
XYZ 2/27/2018 10101 06-07-2018 06/18/2018 NA ....... 1. Acute asthma.
2. Community
acquired
pneumonia.
3. Probable
allergic
I wrote the following code:
from collections import OrderedDict
from csv import DictWriter

with open('1.txt') as infile:
    registrations = []
    fields = OrderedDict()
    d = {}
    for line in infile:
        line = line.strip()
        if line:
            key, value = [s.strip() for s in line.split(':', 1)]
            d[key] = value
            fields[key] = None
        else:
            if d:
                registrations.append(d)
                d = {}
    else:
        if d:  # handle EOF
            registrations.append(d)

with open('registrations.csv', 'w') as outfile:
    writer = DictWriter(outfile, fieldnames=fields)
    writer.writeheader()
    writer.writerows(registrations)
I'm getting this error:
ValueError: not enough values to unpack (expected 2, got 1)
I'm not sure what the error means, and searching online didn't turn up a solution. When I edited the file to remove the space, the code above worked. But in the real scenario there will be hundreds of thousands of files, so manually editing every file to remove all the spaces is not feasible.
Your particular error is likely from this line:
key, value = [s.strip() for s in line.split(':', 1)]
Some of your lines (for example "Mr.XXX" or the numbered list items) don't contain a colon, so split() returns a list with only one element, and one value can't be unpacked into the pair key, value.
For example:
line = 'this is some text with a : colon'
key, value = [s.strip() for s in line.split(':', 1)]
print(key)
print(value)
prints:
this is some text with a
colon
But you'll get your error with
line = 'this is some text without a colon'
key, value = [s.strip() for s in line.split(':', 1)]
print(key)
print(value)
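One way around the error, sketched here rather than taken from the answer above, is to treat only lines containing a colon as new fields and append every colon-less line to the most recent field. parse_record and write_records are hypothetical names. The sketch assumes that a colon always marks a heading (a body sentence that happens to contain a colon would start a bogus field) and that CSV output, which Excel can open, is acceptable.

```python
import csv


def parse_record(lines):
    """Build a dict from 'Key: value' lines. A line without a colon is treated
    as a continuation and appended to the value of the most recent key."""
    record = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines carry no data
        if ':' in line:
            key, value = [s.strip() for s in line.split(':', 1)]
            record[key] = value
            current = key
        elif current is not None:
            # join continuation lines with a space; use '\n' to keep line breaks
            record[current] = (record[current] + ' ' + line).strip()
        # lines that appear before the first 'Key:' line are ignored
    return record


def write_records(records, dest):
    """Write one CSV row per record; columns are the keys in first-seen order."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(dest, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
```

For one file per patient, a call like `write_records([parse_record(open('1.txt'))], out)` produces a single row. Note that sub-headings such as "GENERAL:" inside PHYSICAL EXAMINATION become their own columns under this rule.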
I have a requirement where I have to create one file with multiple lines and pass the values from each line as variables to a Python script. An output file should also be created to display the results.
For example:
input.text : /opt/app_name/file.txt
Andy City State India
Ram City State India
Sandy City State India
Leo City State India
output.text
Andy : success
Ram : Fail
Sandy : success
Leo : Fail
When the script is executed, it should first ask for the file name:
Enter the file name: /opt/app_name/file.txt
I am unsure exactly what you are asking for, but here is an attempt:
file_name = input('Enter the file name: ')
with open(file_name) as f:
    lines = f.readlines()  # Then you don't have to split over '\n'.
lines = [line.strip().split() for line in lines if line.strip()]  # split() defaults to whitespace, but you can use ',' if you need to instead.
processed = []
for name, city, state, country in lines:
    success = "don't know how you are determining success"
    processed.append((name, success))  # append() takes a single item, so pass a tuple
with open('output_file.txt', 'w') as out:
    out.write('\n'.join(' : '.join(item) for item in processed))
Thanks in advance for your help.
I have a large text document made up of many books. All the books have "running headers", and I have noticed that each one appears on the line just before the page number. Page numbers have 1 to 4 digits and are always on their own line.
I want to iterate through the file and have Python delete the previous line whenever it reaches a line that starts with a page number.
Thanks
Bennett
My sample code is:
import re
f = open("corpus.txt", "r+", "a")
for line in f:
    line = line.rstrip()
    if re.search('^[0-9]*?', line):
        # delete previous line
Code:
file = open(r"C:\Users\Asus\Desktop\sample.txt").read().splitlines()
output = open(r"C:\Users\Asus\Desktop\output.txt", 'w')
for index, line in enumerate(file):
    try:
        # keep the line unless it, or the line after it, is a page number
        if not file[index+1].strip().isdigit() and not file[index].strip().isdigit():
            output.write(file[index])
            output.write('\n')
    except IndexError:
        output.write(file[index])  # printing last line
        output.write('\n')
output.close()
Input:
wearing sandals, a cock that crows, a cloak
to dissect, a sponge, some vinegar and one
man to hammer the nails home.
56
Or you can take a length of steel,
castle to hold your banquet in.
77
wearing sandals, a cock that crows, a cloak
to dissect, a sponge, some vinegar and one
Output:
wearing sandals, a cock that crows, a cloak
to dissect, a sponge, some vinegar and one
Or you can take a length of steel,
wearing sandals, a cock that crows, a cloak
to dissect, a sponge, some vinegar and one
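For comparison, here is a streaming sketch that keeps a one-line lookback instead of indexing ahead, so it needs no try/except at all. It assumes, as the question states, that a page number is 1 to 4 digits alone on its own line and that the line directly above it is always the running header; a genuine text line consisting only of a number such as 1984 would be removed too. strip_running_headers is a hypothetical name.

```python
import re

# A page number: 1 to 4 digits alone on a line (surrounding whitespace allowed).
PAGE_NUMBER = re.compile(r'^\s*\d{1,4}\s*$')


def strip_running_headers(lines):
    """Yield the lines of an iterable, dropping every page-number line and the
    line immediately before it (assumed to be the running header)."""
    previous = None  # line held back until we know the next one is not a page number
    for line in lines:
        if PAGE_NUMBER.match(line):
            previous = None  # discard both the header and the page number
            continue
        if previous is not None:
            yield previous
        previous = line
    if previous is not None:
        yield previous  # the final line has no successor to check
```

Because file lines keep their trailing newline, `with open('corpus.txt') as src, open('output.txt', 'w') as dst: dst.writelines(strip_running_headers(src))` should write the result without any extra newline handling.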
I am writing a Python program that reads a file and then writes its contents to another one, with added margins. The margins are entered by the user, and the line length must be at most 80 characters.
I wrote a recursive function to handle this. For the most part, it works. However, the 2 lines before any new paragraph display the indentation that was input for the right side, instead of keeping the left indentation.
Any clues as to why this happens?
Here's the code:
left_Margin = 4
right_Margin = 5

# create variable to hold the number of characters to withhold from line_Size
avoid = right_Margin
num_chars = left_Margin

def insertNewlines(i, line_Size):
    string_length = len(i) + avoid + right_Margin
    if len(i) <= 80 + avoid + left_Margin:
        return i.rjust(string_length)
    else:
        i = i.rjust(len(i) + left_Margin)
        return i[:line_Size] + '\n' + ' ' * left_Margin + insertNewlines(i[line_Size:], line_Size)

with open("inputfile.txt", "r") as inputfile:
    with open("outputfile.txt", "w") as outputfile:
        for line in inputfile:
            num_chars += len(line)
            string_length = len(line) + left_Margin
            line = line.rjust(string_length)
            words = line.split()
            # check if num of characters is enough
            outputfile.write(insertNewlines(line, 80 - avoid - left_Margin))
For left_Margin = 4 and right_Margin = 5, I expect this:
____Poetry is a form of literature that uses aesthetic and rhythmic
____qualities of language—such as phonaesthetics, sound symbolism, and
____metre—to evoke meanings in addition to, or in place of, the prosai
____c ostensible meaning.
____Poetry has a very long history, dating back to prehistorical ti
____mes with the creation of hunting poetry in Africa, and panegyric an
____d elegiac court poetry was developed extensively throughout the his
____tory of the empires of the Nile, Niger and Volta river valleys.
But the result is:
____Poetry is a form of literature that uses aesthetic and rhythmic
______qualities of language—such as phonaesthetics, sound symbolism, and
______metre—to evoke meanings in addition to, or in place of, the prosai
________c ostensible meaning.
_____Poetry has a very long history, dating back to prehistorical ti
_____mes with the creation of hunting poetry in Africa, and panegyric an
_____d elegiac court poetry was developed extensively throughout the his
_____tory of the empires of the Nile, Niger and Volta river valleys.
This isn't really a good fit for a recursive solution in Python. Below is an imperative/iterative solution to the formatting part of your question (I'm assuming you can take this and write it to a file instead). The code assumes that paragraphs are separated by two consecutive newlines ('\n\n'), i.e. a blank line.
txt = """
Poetry is a form of literature that uses aesthetic and rhythmic qualities of language—such as phonaesthetics, sound symbolism, and metre—to evoke meanings in addition to, or in place of, the prosaic ostensible meaning.

Poetry has a very long history, dating back to prehistorical times with the creation of hunting poetry in Africa, and panegyric and elegiac court poetry was developed extensively throughout the history of the empires of the Nile, Niger and Volta river valleys.
"""

def format_paragraph(paragraph, length, left, right):
    """Format ``paragraph`` so the line length is at most ``length``
    with ``left`` as the number of characters for the left margin,
    and similarly for ``right``.
    """
    words = paragraph.split()
    lines = []
    curline = ' ' * (left - 1)  # we add a space before the first word
    while words:
        word = words.pop(0)  # process the next word
        # +1 in the next line is for the space.
        if len(curline) + 1 + len(word) > length - right:
            # line would have been too long, start a new line
            lines.append(curline)
            curline = ' ' * (left - 1)
        curline += " " + word
    lines.append(curline)
    return '\n'.join(lines)

# we need to work on one paragraph at a time
paragraphs = txt.split('\n\n')
print('0123456789' * 8)  # print a ruler..
for paragraph in paragraphs:
    print(format_paragraph(paragraph, 80, left=4, right=5))
    print()  # next paragraph
The output of the above is:
01234567890123456789012345678901234567890123456789012345678901234567890123456789
    Poetry is a form of literature that uses aesthetic and rhythmic
    qualities of language—such as phonaesthetics, sound symbolism, and
    metre—to evoke meanings in addition to, or in place of, the prosaic
    ostensible meaning.

    Poetry has a very long history, dating back to prehistorical times with
    the creation of hunting poetry in Africa, and panegyric and elegiac
    court poetry was developed extensively throughout the history of the
    empires of the Nile, Niger and Volta river valleys.
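The standard library's textwrap module implements the same greedy word wrapping, so a shorter variant is possible. This is an alternative sketch, not the answerer's code; format_paragraph_tw and format_text are hypothetical names, and paragraphs are again assumed to be separated by a blank line. Because the width passed to textwrap.fill() includes the indent, width=length - right gives the same 75-character limit as format_paragraph.

```python
import textwrap


def format_paragraph_tw(paragraph, length=80, left=4, right=5):
    """Wrap one paragraph so every line, left margin included, is at most
    length - right characters long."""
    margin = ' ' * left
    return textwrap.fill(paragraph, width=length - right,
                         initial_indent=margin, subsequent_indent=margin)


def format_text(text, **kwargs):
    """Wrap each blank-line-separated paragraph and rejoin them with a blank line."""
    paragraphs = [p for p in text.split('\n\n') if p.strip()]
    return '\n\n'.join(format_paragraph_tw(p, **kwargs) for p in paragraphs)
```

One difference to keep in mind: by default textwrap may also break after hyphens and split words longer than the width, so on hyphenated text its output can differ from format_paragraph.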
I have a text file whose lines all currently end with the same character (N), which is used to track the progress the system makes. As each line is processed I want to change its end character to "Y", so that if the program ends via an error or another interruption, on restart it will search for the first line whose end character is "N" and resume working from there. Below is my code as well as a sample from the text file.
UPDATED CODE:
from geopy.geocoders import Nominatim

def GeoCode():
    f = open("geocodeLongLat.txt", "a")
    with open("CstoGC.txt", 'r') as file:
        print("Geocoding...")
        new_lines = []
        for line in file.readlines():
            check = line.split('~')
            print(check)
            if 'N' in check[-1]:
                geolocator = Nominatim()
                dot_number, entry_name, PHY_STREET, PHY_CITY, PHY_STATE, PHY_ZIP = check[0], check[1], check[2], check[3], check[4], check[5]
                address = PHY_STREET + " " + PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                f.write(dot_number + '\n')
                try:
                    location = geolocator.geocode(address)
                    f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                except AttributeError:
                    try:
                        address = PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
                        location = geolocator.geocode(address)
                        f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
                    except AttributeError:
                        print("Cannot Geocode")
                check[-1] = check[-1].replace('N', 'Y')
            new_lines.append('~'.join(check))
    with open('CstoGC.txt', 'r+') as file:  # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
        for line in new_lines:
            file.writelines(line)
    f.close()
Output:
2967377~DARIN COLE~22112 TWP RD 209~ALVADA~OH~44802~Y
WAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
143608~LARRY A PETERSON & DONNA M PETERSON~W6359 450TH AVE~ELLSWORTH~WI~54011~N
635528~JAMES E WEBB~3926 GREEN ROAD~SPRINGFIELD~TN~37172~N
805496~WAYNE MLADY~22272 135TH ST~CRESCO~IA~52136~N
704996~SAVINA C MUNIZ~814 W LA QUINTA DR~PHARR~TX~78577~N
893169~BINDEWALD MAINTENANCE INC~213 CAMDEN DR~SLIDELL~LA~70459~N
948130~LOGISTICIZE LTD~861 E PERRY ST~PAULDING~OH~45879~N
438760~SMOOTH OPERATORS INC~W8861 CREEK ROAD~DARIEN~WI~53114~N
518872~A B C RELOCATION SERVICES INC~12 BOCKES ROAD~HUDSON~NH~03051~N
576143~E B D ENTERPRISES INC~29 ROY ROCHE DRIVE~WINNIPEG~MB~R3C 2E6~N
968264~BRIAN REDDEMANN~706 WESTGOR STREET~STORDEN~MN~56174-0220~N
721468~QUALITY LOGISTICS INC~645 LEONARD RD~DUNCAN~SC~29334~N
As you can see I am already keeping track of which line I am at just by using x. Should I use something like file.readlines()?
Sample of text document:
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
Thank you!
Edit: updated code thanks to #idlehands
There are a few ways to do this.
Option #1
My original thought was to use the tell() and seek() methods to go back a few steps, but it quickly became clear that you can't do this conveniently unless you open the file in binary mode, and definitely not inside a for loop over readlines(). You can see the reference threads here:
Is it possible to modify lines in a file in-place?
How to solve "OSError: telling position disabled by next() call"
The investigation led to this piece of code:
with open('file.txt', 'rb+') as file:
    line = file.readline()  # initiate the loop
    while line:  # readline() returns b'' at end of file, which ends the loop
        print(line)
        check = line.split(b'~')[-1]
        if check.startswith(b'N'):  # the last field still carries the line ending, so use startswith()
            # ... do stuff ... #
            file.seek(-len(check), 1)  # place the buffer at the check point
            file.write(check.replace(b'N', b'Y'))  # replace "N" with "Y"
        line = file.readline()  # read next line
In the first referenced thread, one of the answers mentioned that this could lead to problems, and directly modifying the bytes in the buffer while reading it is probably considered a bad idea™. A lot of pros will probably scold me for even suggesting it.
Option #2a
(if file size is not horrendously huge)
with open('file.txt', 'r') as file:
    new_lines = []
    for line in file.readlines():
        check = line.split('~')
        if 'N' in check[-1]:
            # ... do stuff ... #
            check[-1] = check[-1].replace('N', 'Y')
        new_lines.append('~'.join(check))

with open('file.txt', 'r+') as file:  # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
    for line in new_lines:
        file.writelines(line)
This approach loads all the lines into memory first, so you do the modification in memory but leave the file buffer alone. Then you reopen the file and write all the lines back. The caveat is that technically you are rewriting the entire file line by line, not just the N that was the only thing changed.
Option #2b
Technically, you could open the file in r+ mode from the outset and then, after the iterations have completed, do this (still within the with block but outside of the loop):
    # ... new_lines.append('~'.join(check)) #
    file.seek(0)
    for line in new_lines:
        file.writelines(line)
I'm not sure what distinguishes this from Option #1, since you're still reading and modifying the file in one go. If someone more proficient in I/O, buffer, or memory management wants to chime in, please do.
The disadvantage of Options 2a/b is that you always end up storing and rewriting every line in the file, even if only a few lines are left to update from 'N' to 'Y'.
Results (for all solutions):
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~Y
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~Y
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~Y
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~Y
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~Y
And if, say, you encountered a break at the line starting with 220940, the file would become:
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
There are pros and cons to these approaches. Try and see which one fits your use case the best.
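One caveat with Options 2a/b as written: the lines are only written back after the loop finishes, so if an exception escapes mid-loop nothing is saved unless you catch it. A sketch that guarantees the write-back with try/finally, and writes to a temporary file that os.replace() then swaps in so the original is never left half-written, might look like this. mark_processed and the handle callback (where the geocoding would go) are hypothetical names; the file is assumed to be '~'-separated with 'N' or 'Y' as the last field. The replacement file is created by tempfile.mkstemp(), so it gets that function's restrictive default permissions rather than the original file's.

```python
import os
import tempfile


def mark_processed(path, handle):
    """Call handle(fields) for each '~'-separated line whose last field is 'N'
    and flip that field to 'Y' once handle() returns without raising."""
    with open(path) as f:
        lines = f.read().splitlines()
    try:
        for i, line in enumerate(lines):
            fields = line.split('~')
            if fields[-1] == 'N':
                handle(fields)  # e.g. the geocoding step; may raise
                fields[-1] = 'Y'
                lines[i] = '~'.join(fields)
    finally:
        # Runs on success, on an exception and on Ctrl+C, so progress is kept.
        # Write a temporary file next to the original, then swap it in.
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, 'w') as out:
            out.write(''.join(line + '\n' for line in lines))
        os.replace(tmp_path, path)
```

On restart, calling mark_processed() again skips every line already marked 'Y', which gives the resume behaviour the question asks for without tracking a line counter.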
I would read the entire input file into a list and .pop() the lines off one at a time. In case of an error, append the popped item back to the list and overwrite the input file with the list. This way the file will always be up to date and you won't need any other logic.
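A minimal sketch of that pop()-based idea follows. The details are this sketch's own assumptions rather than part of the answer: process_pending and handle are hypothetical names, the list is reversed first so that pop() hands out lines in file order, and the input file is also emptied after a fully successful run (otherwise a restart would redo everything). Processed lines are removed from the input file rather than marked 'Y'.

```python
def process_pending(path, handle):
    """Run handle(line) on each line of path; if anything goes wrong, write the
    unprocessed lines (including the failing one) back to path and re-raise."""
    with open(path) as f:
        pending = f.read().splitlines()
    pending.reverse()  # pop() takes from the end, so reverse to keep file order
    while pending:
        line = pending.pop()
        try:
            handle(line)
        except BaseException:  # BaseException also covers Ctrl+C (KeyboardInterrupt)
            pending.append(line)  # put the failed line back
            with open(path, 'w') as f:
                f.write(''.join(item + '\n' for item in reversed(pending)))
            raise
    # Every line succeeded: nothing is left to resume, so empty the file.
    with open(path, 'w'):
        pass
```

Re-raising after saving keeps the original error visible, so the caller still sees why the run stopped.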