Remove string from text file after print (Python)

I have the following code that pulls a list from a txt file and then prints one line at a time. The problem I am having is that I want the code to remove that line from the txt file after it has printed it.
I have tried a few different methods I found online but had no success in making any of them work.
Would anyone have any ideas on how to achieve this?
import time
from time import sleep
import random

my_file = open('testlist.txt', 'r')
file_lines = my_file.readlines()
my_file.close()

for line in file_lines:
    try:
        sep = line.split(":")
        select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
        print(random.choice(select_list))
        sleep(1)
    except Exception as e:
        print(e)
Basically after the "print(random.choice(select_list))", we would want to delete "line" from "testlist.txt".

Let's go through some logic and see how to achieve the results you are expecting.
Intuitively, the actions are:
1. Read the file line-by-line into memory
my_file=open('testlist.txt','r')
file_lines=my_file.readlines()
my_file.close()
It would be better practice to use a with context manager (it automatically closes the file once you leave the with block's indentation), i.e.
with open('testlist.txt', 'r') as my_file:
    file_lines = my_file.readlines()
2. For each line that is read, (a) split it by the : character, (b) perform a few string operations and (c) randomly select one of the outputs from (2b), i.e.
for line in file_lines:
    sep = line.split(":")
    select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
    print(random.choice(select_list))
2b. Now, let's take a look at (2b) and see what we are trying to achieve, i.e.
select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
We produce 3 items in select_list, where "test " + sep[0] + "? " + sep[1] occurs twice and "test " + sep[0] + "! " + sep[1] occurs once:
"test " + sep[0] + "? " + sep[1]
"test " + sep[0] + "! " + sep[1]
"test " + sep[0] + "? " + sep[1]
In any case, the select_list = [ ... ] assignment is a valid line of code.
2c. Regarding the print(random.choice(select_list)) line: it doesn't affect any variables; it just randomly chooses an item from select_list and prints it.
Going back to the original question,
I want the code to remove that line from the txt file after it has printed it.
Q: Would this mean removing the line from the original file_lines read from open('testlist.txt','r')?
A: If so, then it would mean removing every line from the original testlist.txt, because every line that checks out in steps 2b and 2c goes through the try part of the code.
But if step 2b or 2c throws an error that gets caught in the except, then that is a line you would not want to throw out (as per your original question).
In that case, it looks like what you eventually want is a list of the lines that fall into the except scope of the code.
If so, then you would be looking at something like this:
import random

# Reading the original file.
with open('testlist.txt', 'r') as my_file:
    # Opening a file to write the lines that fall into the exception.
    with open('testlist-exceptions.txt', 'w') as fout:
        # Iterate through the original file line by line.
        for line in my_file:
            # Step 2a.
            sep = line.split(":")
            # Supposedly step 2b, but since this is the only
            # point of the code that can throw an exception,
            # most probably because there's no sep[1],
            # you should instead check the length of the sep variable.
            if len(sep) < 2:  # i.e. does not have sep[1].
                # Write to file.
                fout.write(line)
            else:  # Otherwise, perform step 2b.
                select_list = ["test " + sep[0] + "? " + sep[1], "test " + sep[0] + "! " + sep[1], "test " + sep[0] + "? " + sep[1]]
                print(random.choice(select_list))
Now the new logic is a lot simpler than the intuition-based logic in your original code, but achieves the same output you are expecting.
The new logic is as such:
Open the original file for reading, and open another file to write the unprocessable lines to
Read the file line by line
Split each line by :
Check whether the string operation joining sep[0] and sep[1] is possible
If yes, perform the string operation to create select_list and choose one item from select_list to print to the console
If no, write that line to the output file.
If for some reason, you really want to work with the file in place with Python, then take a look at Is it possible to modify lines in a file in-place?
And if you really need to reduce memory footprint and want something that can edit lines of the file in-place, then you would have to dig a little into file seek function https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects
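To sketch that seek-based route (a minimal, hypothetical example: the sample file contents are made up, and the keep-condition mirrors the len(sep) < 2 check from the answer's code):

```python
# Demo setup: a small testlist.txt with made-up contents.
with open('testlist.txt', 'w') as f:
    f.write("alice:hello\nmalformed line\nbob:hi\n")

# Rewrite the file in place: read everything, rewind, write back
# only the lines we want to keep, then cut off the leftover bytes.
with open('testlist.txt', 'r+') as f:
    lines = f.readlines()
    f.seek(0)
    for line in lines:
        if len(line.split(':')) < 2:  # no sep[1]: keep the line
            f.write(line)
    f.truncate()

with open('testlist.txt') as f:
    print(f.read())  # only "malformed line" remains
```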
Some unsolicited Python suggestions (hope this helps)
When you need to delete something from a file, if disk space allows, don't delete it; create another file with the expected output, minus the deleted lines.
Whenever possible, treat the files you read as immutable, rather than as Perl-style in-place files that allow edits.
It's tempting to reach for try-except when you just want something to work, but catch-all excepts are hard to debug and are usually a sign that the logic of the steps could be better. https://docs.python.org/3/tutorial/errors.html#handling-exceptions
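The first suggestion in code form: write the lines you want to keep to a new file, then swap it over the original. This is a sketch with made-up file contents; os.replace does the swap in one step on both POSIX and Windows.

```python
import os

# Demo setup: a small testlist.txt with made-up contents.
with open('testlist.txt', 'w') as f:
    f.write("alice:hello\nmalformed line\nbob:hi\n")

# Write the keepers to a temporary file instead of deleting in place.
with open('testlist.txt') as src, open('testlist.txt.tmp', 'w') as dst:
    for line in src:
        if len(line.split(':')) < 2:  # the unprocessable lines are kept
            dst.write(line)

os.replace('testlist.txt.tmp', 'testlist.txt')  # swap the new file in

with open('testlist.txt') as f:
    print(f.read())  # only "malformed line" remains
```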

I would give you an example, but it is better to follow this guide, as there are various ways to do it. The first issue is that you open the file in 'r' (read-only) mode:
Deleting lines


Add the actual characters '\n' to a string in python?

I'm writing a short program to go through a directory and write CREATE TABLE and load-from-CSV statements for a bunch of CSVs, to get them all into MySQL. I'm sure there's an easier way to do this, but I thought it would be fun to make it myself.
This is one of the lines I have in python to build the load csv statement, where l_d is a variable I'm storing it in, f is the file path, and n is the table name:
l_d = "LOAD DATA INFILE " + "'" + f + "'" + "\nINTO TABLE " + n + "\nFIELDS TERMINATED BY ','\nENCLOSED BY '" + '"' +"'" + "\nLINES TERMINATED BY" +"\'\n\'" + "\nIGNORE 1 ROWS;"
The statement I want in SQL is:
LOAD DATA INFILE 'file.csv'
INTO TABLE table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY'\n'
IGNORE 1 ROWS;
but what I get is always
LOAD DATA INFILE 'file.csv'
INTO TABLE table
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY'
'
IGNORE 1 ROWS;
because it thinks my \n is supposed to be a line break and not the actual characters.
How can I get the actual characters to show up here?
Also, I know my whole string concatenation in the original statement is kinda gross (I'm pretty new to this), so any general tips on how to improve that would also be much appreciated :)
To escape the backslash, add another one before it:
\\n
gets \n
so your code will be:
l_d = "LOAD DATA INFILE " + "'" + f + "'" + "\nINTO TABLE " + n + "\nFIELDS TERMINATED BY ','\nENCLOSED BY '" + '"' + "'" + "\nLINES TERMINATED BY" + "\'\\n\'" + "\nIGNORE 1 ROWS;"
print("hello\\n")
# prints the literal characters: hello\n
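As for tidying the concatenation: one approach is a single template with str.format, which keeps the quoting readable. Here f and n stand for the file path and table name as in the question, and the doubled \\n keeps the literal backslash-n in the output.

```python
def load_statement(f, n):
    # Build the LOAD DATA statement for file path f and table name n.
    template = (
        "LOAD DATA INFILE '{path}'\n"
        "INTO TABLE {table}\n"
        "FIELDS TERMINATED BY ','\n"
        "ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 ROWS;"
    )
    return template.format(path=f, table=n)

print(load_statement("file.csv", "table"))
```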

Try/Except - If Exception occurs, execution stops. while True creates endless loop

I'm brand new to Python, my apologies if this is a trivial question. Have been googling for hours unsuccessfully.
I have a code that takes latitudes/longitudes from an Excel file and returns addresses using an API.
If the Excel cells contain wrong Lat/Longs, it raises an IndexError. In that case it just stops execution, even if the next row contains correct (geocodable) Lat/Longs. I tried using while True, but then it just keeps writing the results while skipping everything after the except.
E.g. Excel has the following columns/values:
Lat Long
38.872476 -77.062334
1 23.456789
38.873411 -77.060907
The 1st line has correct Lat/Long, the 2nd incorrect, the 3rd correct. In the output file it shows the address of the 1st row and says "N/A" for the 2nd row, but ignores the 3rd row and stops execution.
try:
    for row in range(rows):
        row += 1
        latitude = float(sheet.row_values(row)[0])
        longitude = float(sheet.row_values(row)[1])
        reverse_geocode_result = gmaps.reverse_geocode((latitude, longitude))
        out_file.write("(" + str(latitude) + ", " + str(longitude) + ") location: " + str(reverse_geocode_result[1]['formatted_address']) + "\n")
except IndexError:
    out_file.write("N/A")
else:
    out_file.write("(" + str(latitude) + ", " + str(longitude) + ") location: " + str(reverse_geocode_result[1]['formatted_address']) + "\n")
print "Done."
I think you want to stick your try inside your for loop.
for row in range(rows):
    try:
        # Get the values
    except IndexError:
        out_file.write("N/A")
    else:
        out_file.write(...)
print "Done."
That way if there is an error, you'll write "N/A", but then be able to continue to the next element in the range.
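A runnable sketch of that shape (Python 3 syntax). The Excel and Google Maps calls are replaced here by a made-up lookup function and sample coordinates, since sheet.row_values and gmaps.reverse_geocode need external resources; the try-inside-the-loop structure is what matters.

```python
def lookup(lat, lon):
    # Stand-in for gmaps.reverse_geocode: only known points resolve.
    known = {
        (38.872476, -77.062334): "Arlington, VA",   # made-up addresses
        (38.873411, -77.060907): "Arlington, VA",
    }
    return [known[(lat, lon)]]

rows = [(38.872476, -77.062334), (1.0, 23.456789), (38.873411, -77.060907)]
out_lines = []
for lat, lon in rows:
    try:
        address = lookup(lat, lon)[0]
    except (IndexError, KeyError):
        out_lines.append("N/A")  # record the bad row and keep going
    else:
        out_lines.append("(%s, %s) location: %s" % (lat, lon, address))

print(out_lines)  # the 3rd row is still processed after the bad 2nd row
```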

Python, read and write file is cutting final output file to limited number of lines?

So I wrote a small script to convert my G-code files by replacing "G01" commands with "G1". It all works, but these files are very big; they can end up with more than 10 or 20k lines of code!
My problem is that the converted file ends up with 4715 lines while the original file has 4817 lines. The funny thing is that the for loop goes through all the lines, but only the first 4715 are written (I checked that with a simple a = a + 1 every time something is written to the file)!
Here is the code it is very simple!
import string

a = 0
b = 0
s = open("test.gcode", "r+")
replaced = open("test_replaced.gcode", "a")

for line in s.readlines():
    if "G01" in line:
        replaced.write(line.replace("G01", "G1"))
        print("G01 ==> G1")
        a = a + 1
    elif "G00" in line:
        replaced.write(line.replace("G00", "G0"))
        print("G00 ==> G0")
        a = a + 1
    else:
        replaced.write(line.replace("******", "**"))
        print("***")
        a = a + 1
    b = b + 1

#replaced.write(line.replace("G01", "G1" ))
#replaced.write(line.replace("G00", "G0" ))
print("Done! - " + str(a) + " number of operations done!")
print("Loopcount: " + str(b))
s.close()
As pointed out in a comment on your question, you should replace your open() calls with with statements, so your code would become:
...
with open("test.gcode", "r+") as s:
    with open("test_replaced.gcode", "a") as replaced:
        ...

print("Done! - " + str(a) + " number of operations done!")
print("Loopcount: " + str(b))
Please note that there is no longer a close() at the end of the script because the context manager (with) closes the file already.
All your code dealing with the files needs to be within the with blocks.
You can find more information about context managers here.
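Putting it together, a sketch of the whole script under with (the missing tail of the output was almost certainly buffered data that was never flushed, because replaced was never closed). The sample file contents are made up, and 'w' mode is used for the output instead of 'a' so repeated runs don't accumulate:

```python
# Demo setup: a tiny made-up G-code file.
with open("test.gcode", "w") as f:
    f.write("G01 X1 Y1\nG00 X0 Y0\nM3 S1000\n")

a = 0  # number of lines written
with open("test.gcode", "r") as s, open("test_replaced.gcode", "w") as replaced:
    for line in s:
        # Replace long-form commands; unrelated lines pass through unchanged.
        replaced.write(line.replace("G01", "G1").replace("G00", "G0"))
        a += 1
# Both files are closed (and flushed) here, at the end of the with block.

print("Done! - " + str(a) + " lines written")
```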

Python - splitting lines in txt file by semicolon in order to extract a text title...except sometimes the title has semicolons in it

So, I have an extremely inefficient way to do this that works, which I'll show, as it will help illustrate the problem more clearly. I'm an absolute beginner in python and this is definitely not "the python way" nor "remotely sane."
I have a .txt file where each line contains information about a large number of .csv files, following format:
File; Title; Units; Frequency; Seasonal Adjustment; Last Updated
(first entry:)
0\00XALCATM086NEST.csv;Harmonized Index of Consumer Prices: Overall Index Excluding Alcohol and Tobacco for Austria©; Index 2005=100; M; NSA; 2015-08-24
and so on, repeats like this for a while. For anyone interested, this is the St.Louis Fed (FRED) data.
I want to rename each file (currently named by the alphanumeric code at the start, 00XA etc.) to the text title. So, just split by semicolon, right? Except that sometimes the text title has semicolons within it (and I want all of the text).
So I did:
data_file_data_directory = 'C:\*****\Downloads\FRED2_csv_3\FRED2_csv_2'
rename_data_file_name = 'README_SERIES_ID_SORT.txt'
rename_data_file = open(data_file_data_directory + '\\' + rename_data_file_name)
for line in rename_data_file.readlines():
    data = line.split(';')
    if len(data) > 2 and data[0].rstrip().lstrip() != 'File':
        original_file_name = data[0]
These last 2 lines deal with the fact that there is some introductory text we want to skip, and we don't want to rename based on the legend at the top (!= 'File'). It saves the 00XAL__.csv as the old name. It may be possible to make this more elegant (I would appreciate tips), but it's the next part (building the new text name) that gets really ugly.
if len(data) == 6:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1][:-2].replace(':',' -').replace('"','').replace('/',' or ')
elif len(data) == 7:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2][:-2].replace(':',' -').replace('"','').replace('/',' or ')
elif len(data) == 8:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3][:-2].replace(':',' -').replace('"','').replace('/',' or ')
elif len(data) == 9:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4][:-2].replace(':',' -').replace('"','').replace('/',' or ')
elif len(data) == 10:
    new_file_name = data[0][:-4].split("\\")[-1] + '-' + data[1].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[2].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[3].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[4].replace(':',' -').replace('"','').replace('/',' or ') + '-' + data[5][:-2].replace(':',' -').replace('"','').replace('/',' or ')
else:
    # (etc)
What I'm doing here reflects the fact that there is no way to know, for each line, how many items the semicolon split will produce. Ideally the list would have length 6, matching the key at the top of my data example. However, for every semicolon in the text title, the length increases by 1, and we want everything after the .csv code but before the last four items in the list (counting backwards from the right: date, seasonal adjustment, frequency, units/index). That is just another way of saying: I want the text "title" of each line.
Really what I want is a way to save the entirety of the text title as new_name for each line, even after splitting by semicolon, when I have no idea how many semicolons are in the title or in the line as a whole. The above code achieves this, but OMG, this can't be the right way to do it.
Please let me know if it's unclear or if I can provide more info.
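For what it's worth, the whole length-counting chain can be avoided by splitting from the right: the last four fields (units, frequency, seasonal adjustment, date) are reliable, so everything between the first semicolon and those four is the title, embedded semicolons and all. A sketch using a made-up line shaped like the question's sample, with semicolons deliberately planted inside the title:

```python
line = ("0\\00XALCATM086NEST.csv;Harmonized Index of Consumer Prices: "
        "Overall Index; Excluding Alcohol and Tobacco; for Austria; "
        "Index 2005=100; M; NSA; 2015-08-24")

# Split off the four trailing fields, then the leading file name;
# whatever is left in the middle is the full title, semicolons included.
head, units, freq, seasonal, updated = line.rsplit(';', 4)
file_name, title = head.split(';', 1)

print(file_name)       # 0\00XALCATM086NEST.csv
print(title.strip())   # the whole title, embedded semicolons intact
```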

Read only one line every time, with or without an "\n" at the end

I have a file which is filled like this:
Samsung CLP 680/ CLX6260 + CLT-C506S/ELS + CLT-M506S/ELS + CLT-Y506S/ELS + 39.50
Xerox Phaser 6000/6010/6015 + 106R01627 + 106R01628 + 106R01629 + 8.43
Xerox DocuPrint 6110/6110mfp + 106R01206 + 106R01204 + 106R01205 + 7.60
Xerox Phaser 6121/6121D + 106R01466 + 106R01467 + 106R01468 + 18.20
When I read it with:
for line in excelRead:
    title = line.split("+")
    title = [lines.strip() for lines in title]
Sometimes there is a "\n" at the end of the line and sometimes there is not; if the line ends with "\n", splitting gives me 5 elements, if not, 9 and so on, until it finds a "\n", as I guess.
So, the question is: how do I read only one line of the file each time and obtain 5 elements every time, with or without a "\n" at the end? I can't check the whole file for whether each line ends with a "\n".
Thanks
You might consider using the csv module to parse this, and placing into a dict by model:
import csv

data = {}
with open('/tmp/excel.csv') as f:
    for line in csv.reader(f, delimiter='+', skipinitialspace=True):
        data[line[0].strip()] = [e.strip() for e in line[1:]]

print(data)
# {'Samsung CLP 680/ CLX6260': ['CLT-C506S/ELS', 'CLT-M506S/ELS', 'CLT-Y506S/ELS', '39.50'],
#  'Xerox Phaser 6121/6121D': ['106R01466', '106R01467', '106R01468', '18.20'],
#  'Xerox DocuPrint 6110/6110mfp': ['106R01206', '106R01204', '106R01205', '7.60'],
#  'Xerox Phaser 6000/6010/6015': ['106R01627', '106R01628', '106R01629', '8.43']}
for line in excelRead:
    title = [x.strip() for x in line.rstrip('\n').split('+')]
It's better to avoid making one variable (title) mean two different things. Rather than give it a different name in your second line, I just removed the line entirely and put the split inside the list comprehension.
Instead of feeding line into split, first I rstrip the \n (removes that character from the end)
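A quick check of that one-liner on the question's Xerox line, once with and once without a trailing newline:

```python
lines = [
    "Xerox Phaser 6121/6121D + 106R01466 + 106R01467 + 106R01468 + 18.20\n",
    "Xerox Phaser 6121/6121D + 106R01466 + 106R01467 + 106R01468 + 18.20",
]
for line in lines:
    title = [x.strip() for x in line.rstrip('\n').split('+')]
    print(len(title), title[-1])  # 5 elements and '18.20' both times
```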
When a \n is missing and two records run together, this will split the overloaded title[4] back into two titles:
import re

data = []
with open('aa.txt') as excelRead:
    for line in excelRead:
        title = line.split("+")
        title = [lines.strip() for lines in title]
        while len(title) > 5:
            # Strip the trailing price out of the overloaded 5th field...
            one = re.sub(r'(\d+\.\d+)', '', title[4])
            five = title[4].replace(one, '')
            # ...then rebuild the first record and carry the rest forward.
            title1 = title[:4] + [five]
            title = [one] + title[5:]
            data.append(title1)
        data.append(title)

for item in data:
    print(item)
You could easily make data a dictionary instead of a list.
