I have two simple files that I want to open in python and based on a keyword print information in the file
file a.txt contains:
'Final
This is ready'
file b.txt contains:
'Draft
This is not ready'
I want to read these two files in and if the file reads 'Final' anywhere in the txt file to print out the rest of the text (excluding the word 'Final'). My for loop is not outputting correctly:
fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    f = open(filepath, 'r')  # open files
    for line in f:
        if line == firstLineCheck:
            print line
        else:
            break
I feel like this is something simple - appreciate the help
fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    with open(filepath, 'r') as f:
        line = f.readline()
        while line:
            if line == firstLineCheck:
                print f.read()
            line = f.readline()
There are three faults in your code. First, you only print the lines that match; second, you trigger only on lines that contain exactly "Final"; third, you do not exclude the line containing "Final" as specified. The fix is to use a flag recording whether "Final" has been found:
fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final'
for filepath in fileList:
    f = open(filepath, 'r')  # open files
    firstLineFound = False   # reset for each file
    for line in f:
        if firstLineFound:
            print line
        elif firstLineCheck in line:
            # print line  # uncomment if you want to include the final-line
            firstLineFound = True
        else:
            break
If you wanted to trigger only on lines containing exactly "Final", you should instead use firstLineCheck = "Final\n" and elif line == firstLineCheck.
Assuming you want to print all lines following a line that contains only your firstLineCheck, and staying close to your code:
fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    f = open(filepath, 'r')  # open files
    do_print = False
    for line in f:
        if line == firstLineCheck:
            do_print = True
            continue
        if do_print:
            print line
Note that break takes you out of the loop, and continue will move to the next iteration.
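A tiny standalone illustration of that difference (the lists and function names here are made up for the example):

```python
def take_until_stop(items):
    """break leaves the loop entirely; nothing after 'stop' is seen."""
    out = []
    for x in items:
        if x == 'stop':
            break
        out.append(x)
    return out

def take_skipping(items):
    """continue only skips the current item and keeps looping."""
    out = []
    for x in items:
        if x == 'skip':
            continue
        out.append(x)
    return out

print(take_until_stop(['a', 'stop', 'b']))  # ['a']
print(take_skipping(['a', 'skip', 'b']))    # ['a', 'b']
```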
Assuming your keyword is the first line of the file, you can do this. Checking only the first line makes more sense, since the word "Final" could also appear somewhere in the body of a draft.
fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    with open(filepath, 'r') as f:
        first_line = f.readline()  # read the first line
        if first_line == firstLineCheck:
            print f.read()
Since you wanted to check whether "Final" is present in the first line, you can read the file into a list and test whether the first element contains it; if so, print the entire file except the first line:
fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final'
for filepath in fileList:
    f = open(filepath, 'r').readlines()  # open files
    if firstLineCheck in f[0]:
        print "".join(f[1:])
output:
This is ready'
I am trying to loop through directories. My goal is to open the file at path ff for modification.
When I try open(ff, 'r') it does not work.
Further, the d.txt files in the directories contain the characters x, 1, and " in every line. I am trying to remove these characters from each line.
import os

filenames = os.listdir(".")
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
    f = open(str(ff), 'r')  # this line does not open the file
    a = ['x', '1', '"']
    lst = []
    for line in f:
        for word in a:
            if word in line:
                line = line.replace(word, '')
        lst.append(line)
        f.close()
The error that I am getting:
    for line in f:
ValueError: I/O operation on closed file.
First of all, I think this part is wrong in your code:
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
As this will only leave the last filename's path assigned to ff. So I have moved the following code under this for loop; now it runs for every file.
I believe this code should work:
import os

filenames = os.listdir('.')
a = ['x', '1', '"']
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
    lst = []  # reset for each file
    with open(ff, 'r') as file:
        for line in file:
            for word in a:
                if word in line:
                    line = line.replace(word, '')
            lst.append(line)
    with open(ff, 'w') as file:
        for line in lst:
            file.write(line)
Edit: if the open(ff, 'r') line doesn't work, then maybe the path you are giving is wrong. What are the contents of filenames? And why are you adding d.txt at the end? Please edit your post and add these details.
Move f.close() outside of the inner loop. You're closing the file every time the loop runs.
import os

filenames = os.listdir(".")
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
    f = open(str(ff), 'r')
    a = ['x', '1', '"']
    lst = []
    for line in f:
        for word in a:
            if word in line:
                line = line.replace(word, '')
        lst.append(line)
    f.close()  # close once, after the whole file has been read
I'm reading in text files from the command line and I'm trying to produce output as follows...
(Screenshot: desired output given these command line arguments)
Essentially, I want to read in files from the command line, take the first line from each file and print them on one line separated by a tab, then take the second line from each file and print them on the next line separated by a tab, and so on.
This is the best code I've come up with (I'm a beginner and I've tried looking at other responses for far too long; glob and os haven't helped me understand how to do this; I'd just like to use basic loops and opening of files):
import sys

l = []
list_files = sys.argv[:1]
for fname in list_files:
    open(fname) as infile:
        for line in infile:
            line = line.strip()
            if line == '':
                l.append("''")
            else:
                l.append(line)
print(l)  # List of all appended animals. Not in the right order
          # (takes all names from one file, then all the names from the
          # next instead of taking one line from every file on each iteration)
This is a minimally changed version that should work.
import sys
from itertools import zip_longest

files = []
list_files = sys.argv[1:]  # note: argv[1:] - argv[0] is the script name
for fname in list_files:
    with open(fname) as infile:  # Don't forget the `with`!
        l = []
        for line in infile:
            line = line.strip()
            if line == '':
                l.append("''")
            else:
                l.append(line)
        files.append(l)  # list of lists

for lines in zip_longest(*files, fillvalue=''):  # transpose list of lists
    print(*lines, sep='\t')  # separate with tabs
The best way to open files in Python is with with. More information can be found at https://www.pythonforbeginners.com/files/with-statement-in-python. Anyway:
import sys

if len(sys.argv) != 3:
    sys.exit(1)

filename1 = sys.argv[1]
filename2 = sys.argv[2]

with open(filename1, 'r') as file1, open(filename2, 'r') as file2:
    for line1, line2 in zip(file1, file2):
        print(line1.strip(), line2.strip(), sep='\t')
This can be changed to allow for more than two files:
import sys

if len(sys.argv) < 3:  # need at least two files
    sys.exit(1)

filenames = sys.argv[1:]
all_lines = []
for filename in filenames:
    with open(filename, 'r') as file:
        all_lines.append([l.strip() for l in file.readlines()])

for line in zip(*all_lines):
    print(*line, sep='\t')
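One caveat with zip here: it stops at the shortest file, silently dropping trailing lines from longer ones. If you want the padding behaviour instead, itertools.zip_longest provides it (Python 3; the Python 2 name is itertools.izip_longest). A small demonstration with made-up lists:

```python
from itertools import zip_longest

rows_a = ['cat', 'dog', 'bird']
rows_b = ['fish', 'hamster']

# zip stops at the shorter list, so 'bird' is dropped:
print(list(zip(rows_a, rows_b)))

# zip_longest pads the shorter list instead, keeping 'bird':
for pair in zip_longest(rows_a, rows_b, fillvalue=''):
    print('\t'.join(pair))
```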
My objective is to check each file, see if the last line is a newline (empty), and if so delete it. I've tried heaps of methods like this one:
for filename in os.listdir(directory):
    if filename.endswith(".ADC"):
        with open(os.path.join(directory, filename)) as infile, open(os.path.join(directory, filename)) as outfile:
            lines = infile.readlines()
            if lines:
                lines[-1] = lines[-1].rstrip('\r\n')
                infile.writelines(lines)
I also tried the readlines method, with no success.
Try this:
for filename in os.listdir(directory):
    if filename.endswith(".ADC"):
        with open(os.path.join(directory, filename), "r") as f:
            lines = f.readlines()
        for line in lines[::-1]:
            if line == '\n' or line == '\r':
                lines = lines[:-1]
            else:
                break  # stop at the first non-blank line from the end
        print(lines)
Check if the last line is an empty string. If it is, calculate the total size of the preceding lines, and then truncate the file to that size.
for filename in os.listdir(directory):
    if filename.endswith(".ADC"):
        with open(os.path.join(directory, filename), "r+") as file:  # "r+" so we can truncate
            lines = file.readlines()
            if lines and lines[-1].rstrip('\r\n') == "":
                lines.pop()
                size = sum(len(l) for l in lines)
                file.truncate(size)
Truncating is more efficient than rewriting the whole file without the last line.
I have many text files, and each of them has an empty line at the end. My script does not seem to remove them. Can anyone help, please?
# python 2.7
import os
import sys
import re

filedir = 'F:/WF/'
dir = os.listdir(filedir)
for filename in dir:
    if 'ABC' in filename:
        filepath = os.path.join(filedir, filename)
        all_file = open(filepath, 'r')
        lines = all_file.readlines()
        output = 'F:/WF/new/' + filename
        # Read in each row and parse out components
        for line in lines:
            # Weed out blank lines
            line = filter(lambda x: not x.isspace(), lines)
            # Write to the new directory
            f = open(output, 'w')
            f.writelines(line)
            f.close()
You can use Python's rstrip() function to do this as follows:
filename = "test.txt"

with open(filename) as f_input:
    data = f_input.read().rstrip('\n')

with open(filename, 'w') as f_output:
    f_output.write(data)
This will remove all empty lines from the end of the file. Note that rstrip('\n') also strips the newline that terminates the last line; write data + '\n' instead if you want to keep it.
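A quick demonstration of what rstrip('\n') does to the tail of the data (made-up contents):

```python
data = 'Final\nThis is ready\n\n\n'
stripped = data.rstrip('\n')
# All trailing newlines are removed, including the one that ended
# the last line of text:
print(repr(stripped))  # 'Final\nThis is ready'
```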
You can remove the last empty line by using:
with open(filepath, 'r') as f:
    data = f.read()

with open(output, 'w') as w:
    w.write(data[:-1])
You can try this without using the re module:
import os

filedir = 'F:/WF/'
dir = os.listdir(filedir)
for filename in dir:
    if 'ABC' in filename:
        filepath = os.path.join(filedir, filename)
        f = open(filepath).readlines()
        new_file = open(filepath, 'w')
        new_file.write('')
        for i in f[:-1]:
            new_file.write(i)
        new_file.close()
For each filepath, the code opens the file and reads its contents into a list of lines, then reopens the file for writing (truncating it) and writes every element of f back except the last one, which is the empty line.
You can remove the last blank line with the following code, which worked for me:
file = open(file_path_src, 'r')
lines = file.read().splitlines()  # split into lines, without the trailing newlines

with open(file_path_dst, 'w') as f:
    for indx, line in enumerate(lines):
        f.write(line)
        if indx != len(lines) - 1:
            f.write('\n')  # newline between lines, but none after the last
I think this should work fine:
new_file.writelines(f[:-1])
I am writing a script that reads files from different directories; then I am using the file ID to search in the csv file. Here is the piece of code.
import os
import glob

searchfile = open("file.csv", "r")
train_file = open('train.csv', 'w')
listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0]  # ID
        for line in searchfile:
            if id[0] in line:  # search in csv file
                value = line.split(",")
                value = value[1] + " " + value[2] + "\n"
                train_file.write(id[0] + "," + value)  # write description
                break

searchfile.close()
train_file.close()
However, I am only able to find a couple of the IDs in the csv file. Can someone point out my mistake? (Please see the comments for a description.)
EDITED
A sample line of the csv file:
192397335,carrello porta utensili 18x27 eh l 411 x p 572 x h 872 6 cassetti,,691.74,192397335.jpg
Your issue is that when you do for line in searchfile: you're looping over a file iterator that is consumed as you go. The file doesn't reset for every id; for example, if the first id matches on line 50, the search for the next id starts at line 51.
Instead, you can read your file into a list and loop over the list:
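To see why the list matters, here is a small demonstration using io.StringIO as a stand-in for the open file (Python 3 shown):

```python
import io

f = io.StringIO('alpha\nbeta\n')  # behaves like an open text file
first = [line for line in f]      # consumes the stream
second = [line for line in f]     # already exhausted: nothing left
print(first)   # ['alpha\n', 'beta\n']
print(second)  # []

f.seek(0)                         # rewinding works too...
lines = f.readlines()             # ...but reading into a list once is simpler
print(lines)   # ['alpha\n', 'beta\n']
```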
import os
import glob

with open("file.csv", "r") as s:
    search_file = s.readlines()

train_file = open('train.csv', 'w')
list_of_files = os.listdir("train")
for l in list_of_files:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        fname = os.path.splitext(os.path.basename(d))
        print fname[0]  # ID
        for line in search_file:
            if fname[0] in line:  # search in csv file
                value = line.split(",")
                value = value[1] + " " + value[2] + "\n"
                train_file.write(fname[0] + "," + value)  # write description
                break

train_file.close()
I made a couple of other changes too. Firstly, you shouldn't use the name id, as it shadows a built-in function in Python; I picked fname instead to indicate the file name. Secondly, I changed your camelCase names to lowercase, as is the convention. Finally, getting the file name and extension is neat and fairly consistent through a combination of os.path.splitext and os.path.basename.
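For reference, this is what that combination returns for a path shaped like the ones glob produces (the path itself is made up):

```python
import os.path

d = '/train/cats/192397335.jpg'
# basename drops the directories, splitext separates name and extension:
fname = os.path.splitext(os.path.basename(d))
print(fname)     # ('192397335', '.jpg')
print(fname[0])  # the bare ID: '192397335'
```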
You need to scan the lines of searchfile once for each id found, but as you open the file outside of the loop, each line of the file is read only once over the whole run.
You should either load the whole file into a list and iterate over that list of lines inside the loop, or, if searchfile is really large and would hardly fit in memory, reopen the file inside the loop:
List version:
with open("file.csv", "r") as searchfile:
    searchlines = searchfile.readlines()

train_file = open('train.csv', 'w')
listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0]  # ID
        for line in searchlines:  # now a list, so start at the beginning on each pass
            if id[0] in line:  # search in csv file
                value = line.split(",")
                value = value[1] + " " + value[2] + "\n"
                train_file.write(id[0] + "," + value)  # write description
                break

train_file.close()
Re-open version
train_file = open('train.csv', 'w')
listOfFiles = os.listdir("train")
for l in listOfFiles:
    dirList = glob.glob(('/train/%s/*.jpg') % (l))
    for d in dirList:
        id = d.split("/")
        id = id[-1].split(".")
        print id[0]  # ID
        searchfile = open("file.csv", "r")
        for line in searchfile:
            if id[0] in line:  # search in csv file
                value = line.split(",")
                value = value[1] + " " + value[2] + "\n"
                train_file.write(id[0] + "," + value)  # write description
                break
        searchfile.close()

train_file.close()