How to loop through directories and clean files?

I am trying to loop through directories. My goal is to open the file d.txt in each directory (its path is stored in ff) and modify its contents.
When I try open(ff, 'r'), it does not work.
Further, every line of d.txt contains the characters x, 1 and ". I want to remove these characters from each line.
import os
filenames= os.listdir (".")
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
f = open(str(ff),'r') #this line does not open the file
a = ['x','1','"']
lst = []
for line in f:
    for word in a:
        if word in line:
            line = line.replace(word,'')
    lst.append(line)
    f.close()
The error I am getting:
    for line in f:
ValueError: I/O operation on closed file.

First of all, I think this part of your code is wrong:
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
The loop body only assigns ff, so after the loop ff holds the path built from the last filename. I have moved the rest of the code inside this for loop so that it runs for every file.
I believe this code should work:
import os
filenames = os.listdir('.')
a = ['x','1','"']
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
    lst = []  # reset for each file so lines from earlier files are not written again
    with open(ff,'r') as file:
        for line in file:
            for word in a:
                if word in line:
                    line = line.replace(word,'')
            lst.append(line)
    with open(ff,'w') as file:
        for line in lst:
            file.write(line)
Edit: if the open(ff, 'r') line doesn't work, then the path you are building is probably wrong. What are the contents of filenames? And why are you adding d.txt at the end? Please edit your post and add these details.
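As a rough sketch of the whole task (one d.txt per subdirectory, with the characters x, 1 and " removed), the loop could also be written with pathlib. The names clean_line and clean_tree are made up for this illustration, and the layout base_dir/<subdir>/d.txt is an assumption based on how the question builds its path:

```python
from pathlib import Path

# Characters to delete from every line (taken from the question).
UNWANTED = 'x1"'


def clean_line(text, unwanted=UNWANTED):
    """Return text with every character listed in `unwanted` removed."""
    return text.translate(str.maketrans('', '', unwanted))


def clean_tree(base_dir, target_name='d.txt'):
    """Clean base_dir/<subdir>/target_name in place; return how many files were cleaned.

    Assumes (hypothetically) that each subdirectory may hold one file called
    target_name. Entries without such a file are skipped.
    """
    cleaned = 0
    for entry in sorted(Path(base_dir).iterdir()):
        target = entry / target_name
        if not target.is_file():
            continue  # plain file, or a subdirectory without d.txt
        # Newlines are not in UNWANTED, so the line structure is preserved.
        target.write_text(clean_line(target.read_text()))
        cleaned += 1
    return cleaned
```

Calling clean_tree(r'C:\Users\V\Documents\f\e\e\data') would then process every subdirectory of the data folder, independent of the current working directory.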

Move f.close() outside the loop. You're closing the file every time the loop runs.
import os
filenames= os.listdir (".")
for filename in filenames:
    ff = os.path.join(r'C:\Users\V\Documents\f\e\e\data', filename, 'd.txt')
f = open(str(ff),'r')
a = ['x','1','"']
lst = []
for line in f:
    for word in a:
        if word in line:
            line = line.replace(word,'')
    lst.append(line)
f.close()

Related

Python - open all txt files, remove empty lines and find a specific character

What I want to do:
a) open all files in a directory (in this case: chapters of long stories)
b) remove all empty lines
c) find sentences that start with "- " (in this case: dialogue)
I was able to write code that works well, but only for one file:
file = open('.\\stories\\test\\01.txt', 'r', encoding="utf-16 LE")
string_with_empty_lines = file.read()
lines = string_with_empty_lines.split("\n")
non_empty_lines = [line for line in lines if line.strip() != ""]
string_without_empty_lines = ""
for line in non_empty_lines:
    if line.startswith('- '):
        string_without_empty_lines += line + "\n"
print(string_without_empty_lines)
I got stuck because I have a lot of files and I want to open them all and print the results from all of them (and probably save all results to one file, but that's not necessary right now). The first part of the new code successfully opens the files (checked with the commented-out print line), but when I add the editing part, nothing happens at all (I don't even get errors in the console).
import os
import glob
folder_path = os.path.join('G:' '.\\stories\\test')
for filename in glob.glob(os.path.join(folder_path, '**', '*.txt'), recursive=True):
    with open(filename, 'r', encoding="utf-16 LE") as f:
        string_with_empty_lines = f.read()
        # print(string_with_empty_lines)
        lines = string_with_empty_lines.split("\n")
        non_empty_lines = [line for line in lines if line.strip() != ""]
        string_without_empty_lines = ""
        for line in non_empty_lines:
            if line.startswith("- "):
                string_without_empty_lines += line + "\n"
        print(string_without_empty_lines)

If your source files are in source_dir and you want to write the output files to target_dir, you can do it like this:
import os
from os import path
source_dir = "source_dir"
target_dir = "target_dir"
# make sure the target directory exists before writing into it
os.makedirs(target_dir, exist_ok=True)
# os.listdir returns the names of the entries in the given directory
# (it works the same way on Linux, macOS and Windows)
filenames = os.listdir(source_dir)
for filename in filenames:
    # get full path of source and target file
    filepath_source = path.join(source_dir, filename)
    filepath_target = path.join(target_dir, filename)
    # open source file and target file
    with open(filepath_source) as f_source, open(filepath_target, 'w') as f_target:
        for line in f_source:
            # skip empty lines
            if len(line.strip()) == 0:
                continue
            if line[0] == '-':
                # do something, here: copy the dialogue line to the target file
                f_target.write(line)

Here is an example for a single file. If there are more files, you can wrap this in something like for file in dir: with open(file) ..., but remember that you would also have to change the target file each time.
with open('source.txt') as source:
    with open('target.txt','w') as target:
        for line in source:
            l = line.strip('\n')
            # skip empty lines (check this first, because l[0] fails on an empty string)
            if len(l) == 0:
                continue
            # identify if the 1st char is '-'
            if l[0] == '-':
                # do something, e.g. add 'dialog' at the beginning...
                pass
            # rewrite to target file
            target.write(l + '\n')
# both files are closed automatically when the with blocks end
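Putting the pieces of this thread together, a minimal sketch for collecting the dialogue lines of every .txt file below a folder might look like the following. The function names and the utf-8 default are assumptions made for the example; the question's files would need encoding="utf-16 LE" passed explicitly:

```python
import glob
import os


def dialogue_lines(path, encoding='utf-8'):
    """Return the lines of one file that start with '- ', without the newline."""
    with open(path, encoding=encoding) as f:
        # A line that starts with '- ' is never empty, so empty lines drop out too.
        return [line.rstrip('\n') for line in f if line.startswith('- ')]


def collect_dialogues(folder, encoding='utf-8'):
    """Map each .txt file below `folder` (searched recursively) to its dialogue lines."""
    pattern = os.path.join(folder, '**', '*.txt')
    result = {}
    for filename in sorted(glob.glob(pattern, recursive=True)):
        result[filename] = dialogue_lines(filename, encoding)
    return result
```

Returning a dictionary instead of printing inside the loop keeps the file handling separate from the output, so the results can be printed per file or written to a single output file afterwards.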

How to edit specific line for all text files in a folder by python?

Below is my code for editing text files.
Since Python can't edit a line and save it at the same time,
I first store the text file's content in a list and then write it back out.
For example, suppose there are two text files called Sample1.txt and Sample2.txt in the same folder.
Sample1.txt
A for apple.
Second line.
Third line.
Sample2.txt
First line.
An apple a day.
Third line.
Run the Python script:
import glob
import os
#search all text files which are in the same folder as the python script
path = os.path.dirname(os.path.abspath(__file__))
txtlist = glob.glob(os.path.join(path, '*.txt'))
for file in txtlist:
    fp1 = open(file, 'r+')
    strings = [] #create a list to store the content
    for line in fp1:
        if 'apple' in line:
            strings.append('banana\n') #change the content and store into list
        else:
            strings.append(line) #store the lines that were not changed
    fp2 = open (file, 'w+') # rewrite the original text files
    for line in strings:
        fp2.write(line)
    fp1.close()
    fp2.close()
Sample1.txt
banana
Second line.
Third line.
Sample2.txt
First line.
banana
Third line.
That's how I edit a specific line in a text file.
My question is: is there another way to do the same thing,
for example using other functions or a data type other than a list?
Thank you, everyone.

Simplify it to this:
with open(fname) as f:
    content = f.readlines()
content = ['banana\n' if 'apple' in line else line for line in content]
and then write the value of content back to the file.
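The write-back step is not shown in this answer. One possible way to complete it is sketched here; the function name replace_matching_lines and its parameters are hypothetical. The replacement gets its own newline so that the following line is not glued onto it:

```python
def replace_matching_lines(fname, needle='apple', replacement='banana'):
    """Replace every line containing `needle` with `replacement`, in place."""
    with open(fname) as f:
        content = f.readlines()
    # Lines from readlines() keep their '\n', so the replacement needs one too.
    content = [replacement + '\n' if needle in line else line for line in content]
    with open(fname, 'w') as f:
        f.writelines(content)
```

The file is read completely before it is reopened in 'w' mode, which matters because opening for writing truncates it.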

Instead of putting all the lines in a list and writing them out, you can read the whole file into memory, replace the text, and write it back to the same file. Note that this replaces only the matching word, not the whole line.
def replace_word(filename):
    with open(filename, 'r') as file:
        data = file.read()
    data = data.replace('word1', 'word2')
    with open(filename, 'w') as file:
        file.write(data)
Then you can loop through all of your files and apply this function.

The built-in fileinput module makes this quite simple:
import fileinput
import glob
with fileinput.input(files=glob.glob('*.txt'), inplace=True) as files:
    for line in files:
        if 'apple' in line:
            print('banana')
        else:
            print(line, end='')
With inplace=True, fileinput redirects print into the file that is currently being processed.
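Because in-place editing overwrites the originals, it can be worth keeping backups while testing. The sketch below wraps the same idea in a function (the name replace_lines_inplace is made up for this example) and uses the backup parameter of fileinput.input, which keeps each original next to the edited file with the given extension. It also guards against an empty file list, since fileinput falls back to reading standard input in that case:

```python
import fileinput


def replace_lines_inplace(paths, needle, replacement, backup='.bak'):
    """Replace every line containing `needle` in the given files, in place.

    The untouched originals are kept as <name><backup>.
    """
    paths = list(paths)
    if not paths:
        return  # an empty list would make fileinput read from stdin
    with fileinput.input(files=paths, inplace=True, backup=backup) as lines:
        for line in lines:
            if needle in line:
                print(replacement)   # print adds the newline
            else:
                print(line, end='')  # line already ends with its newline
```

A call such as replace_lines_inplace(glob.glob('*.txt'), 'apple', 'banana') would mirror the answer above while leaving Sample1.txt.bak and Sample2.txt.bak behind.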

import glob
import os
def replace_line(file_path, replace_table: dict) -> None:
    list_lines = []
    need_rewrite = False
    with open(file_path, 'r') as f:
        for line in f:
            flag_rewrite = False
            for key, new_val in replace_table.items():
                if key in line:
                    list_lines.append(new_val+'\n')
                    flag_rewrite = True
                    need_rewrite = True
                    break  # only the first matching key replaces the line
            if not flag_rewrite:
                list_lines.append(line)
    if not need_rewrite:
        return
    with open(file_path, 'w') as f:
        f.writelines(list_lines)
if __name__ == '__main__':
    work_dir = os.path.dirname(os.path.abspath(__file__))
    txt_list = glob.glob(work_dir + '/*.txt')
    replace_dict = dict(apple='banana', orange='grape')
    for txt_path in txt_list:
        replace_line(txt_path, replace_dict)

Python; trying to add these files line by line with a tab inbetween

I'm reading in text files from the command line and I'm trying to produce output as follows...
Desired output given these command line arguments
Essentially, I want to read in files from the command line; take the first line from each file & print them on one line separated by a tab. Take the second line from each file & print them on the next line separated by a tab & so on.
This is the best code I've come up with (I'm a beginner and I've spent far too long looking at other answers; glob and os haven't helped me understand how to do this, so I'd just like to use basic loops and file opening):
import sys
l = []
list_files = sys.argv[:1]
for fname in list_files:
    open(fname) as infile:
        for line in infile:
            line = line.strip()
            if line == '':
                l.append("''")
            else:
                l.append(line)
print(l) # List of all appended animals. Not in the right order
#(takes all names from one file, then all the names from the
#next instead of taking one line from every file on each iteration)

This is a minimally changed version that should work. Note that sys.argv[:1] contains only the script name; the file names are in sys.argv[1:].
import sys
from itertools import zip_longest
files = []
list_files = sys.argv[1:]
for fname in list_files:
    with open(fname) as infile: # Don't forget the `with`!
        l = []
        for line in infile:
            line = line.strip()
            if line == '':
                l.append("''")
            else:
                l.append(line)
        files.append(l) # list of lists
for lines in zip_longest(*files, fillvalue=''): # transpose list of lists
    print(*lines, sep='\t') # separate with tabs.

The best way to open files in Python is with a with statement. More information can be found at https://www.pythonforbeginners.com/files/with-statement-in-python. Anyway:
import sys
if len(sys.argv) != 3:
    sys.exit(1)
filename1 = sys.argv[1]
filename2 = sys.argv[2]
with open(filename1, 'r') as file1, open(filename2, 'r') as file2:
    for line1, line2 in zip(file1, file2):
        print(line1.strip(), line2.strip(), sep='\t')
This can be changed to allow for more than two files:
import sys
if len(sys.argv) < 2:
    sys.exit(1)
filenames = sys.argv[1:]
all_lines = []
for filename in filenames:
    with open(filename, 'r') as file:
        all_lines.append([l.strip() for l in file.readlines()])
for line in zip(*all_lines):
    print(*line, sep='\t')
Note that zip stops as soon as the shortest file runs out of lines.
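The two answers differ in how they treat files of unequal length: zip truncates to the shortest input, while itertools.zip_longest pads the shorter ones. A small sketch with in-memory lists instead of files (merge_columns and the animal names are made up for illustration) shows the difference:

```python
from itertools import zip_longest


def merge_columns(columns, fill=''):
    """Join the i-th entries of every column with tabs, padding short columns."""
    return ['\t'.join(row) for row in zip_longest(*columns, fillvalue=fill)]


# Hypothetical file contents, one list per file.
file_a = ['cat', 'dog', 'emu']
file_b = ['ant', 'bee']

padded = merge_columns([file_a, file_b])                        # 3 rows, last one padded
truncated = ['\t'.join(row) for row in zip(file_a, file_b)]     # only 2 rows
```

With zip, the line 'emu' from the longer file is silently dropped; with zip_longest it is kept and the missing entry is filled with an empty string.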

Remove the last empty line from each text file

I have many text files, and each of them has an empty line at the end. My script does not seem to remove them. Can anyone help, please?
# python 2.7
import os
import sys
import re
filedir = 'F:/WF/'
dir = os.listdir(filedir)
for filename in dir:
    if 'ABC' in filename:
        filepath = os.path.join(filedir,filename)
        all_file = open(filepath,'r')
        lines = all_file.readlines()
        output = 'F:/WF/new/' + filename
        # Read in each row and parse out components
        for line in lines:
            # Weed out blank lines
            line = filter(lambda x: not x.isspace(), lines)
            # Write to the new directory
            f = open(output,'w')
            f.writelines(line)
            f.close()

You can use Python's str.rstrip() method to do this as follows:
filename = "test.txt"
with open(filename) as f_input:
    data = f_input.read().rstrip('\n')
with open(filename, 'w') as f_output:
    f_output.write(data)
This removes all trailing newline characters, so every empty line at the end of the file is removed (along with the newline that terminates the last line of text). If the file does not end with a newline, its content is left unchanged.
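To see exactly what rstrip('\n') does to the end of a file, it helps to try it on plain strings first. The helper names below are invented for this sketch; the second one is a variant that keeps a single final newline, which some tools expect:

```python
def strip_trailing_newlines(text):
    """Remove every newline character at the end of text."""
    return text.rstrip('\n')


def end_with_single_newline(text):
    """Remove trailing empty lines but leave exactly one final newline.

    Text that is empty after stripping stays empty.
    """
    stripped = text.rstrip('\n')
    return stripped + '\n' if stripped else ''


sample = 'a\nb\n\n\n'                      # two empty lines after "b"
no_newline = strip_trailing_newlines(sample)
one_newline = end_with_single_newline(sample)
```

Only newline characters are stripped here; trailing lines that contain spaces or tabs would need rstrip() without an argument, which also removes other trailing whitespace.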

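You can remove the last empty line by using:
with open(filepath, 'r') as f:
    data = f.read()
with open(output, 'w') as w:
    w.write(data[:-1])
Note that data[:-1] drops the last character unconditionally, so this assumes the file ends with a newline.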

You can try this without using the re module:
import os
filedir = 'F:/WF/'
dir = os.listdir(filedir)
for filename in dir:
    if 'ABC' in filename:
        filepath = os.path.join(filedir,filename)
        f = open(filepath).readlines()
        new_file = open(filepath, 'w')
        new_file.write('')
        for i in f[:-1]:
            new_file.write(i)
        new_file.close()
For each filepath, the code opens the file and reads its contents line by line, then reopens the file for writing (which truncates it) and writes back every element of f except the last one, which is the empty line.

You can remove the last blank line with the following code. This worked for me:
with open(file_path_src,'r') as file:
    lines = file.read().splitlines()  # split into lines; iterating the raw string would yield characters
with open(file_path_dst,'w') as f:
    for indx, line in enumerate(lines):
        f.write(line)
        if indx != len(lines) - 1:
            f.write('\n')

I think this should work fine (f is the list returned by readlines(), so it has to be written with writelines):
new_file.writelines(f[:-1])

Python importing files - for loops

I have two simple files that I want to open in Python and, based on a keyword, print information from the file.
file a.txt contains:
'Final
This is ready'
file b.txt contains:
'Draft
This is not ready'
I want to read these two files and, if the file contains 'Final' anywhere, print out the rest of the text (excluding the word 'Final'). My for loop is not producing the right output:
fileList = ['a.txt','b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    f = open(filepath, 'r') #openfiles
    for line in f:
        if line == firstLineCheck:
            print(line)
        else:
            break
I feel like this is something simple. I appreciate the help.

fileList = ['a.txt', 'b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    with open(filepath, 'r') as f:
        line = f.readline()
        while line:
            if line == firstLineCheck:
                print(f.read())
            line = f.readline()

There are three faults in your code. First, you only print lines that match. Second, you trigger only on lines that contain nothing but "Final". Third, it does not exclude the line containing "Final" as specified. The fix is to use a flag that records whether "Final" has been found:
fileList = ['a.txt','b.txt']
firstLineCheck = 'Final'
for filepath in fileList:
    firstLineFound = False  # reset the flag for each file
    f = open(filepath, 'r') #openfiles
    for line in f:
        if firstLineFound:
            print(line)
        elif firstLineCheck in line:
            # print(line) # uncomment if you want to include the final-line
            firstLineFound = True
        else:
            break
    f.close()
If you wanted to trigger only on lines containing only "Final", then you should instead use firstLineCheck = "Final\n" and elif line == firstLineCheck.

Assuming you want to print all lines after a line that contains only your firstLineCheck, and using your code:
fileList = ['a.txt','b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    f = open(filepath, 'r') #openfiles
    do_print = False
    for line in f:
        if line == firstLineCheck:
            do_print = True
            continue
        if do_print:
            print(line)
Note that break takes you out of the loop, while continue moves on to the next iteration.
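The difference between break and continue is easier to see side by side. The following sketch uses lists of lines instead of open files, and both function names are made up for the illustration:

```python
def lines_until_marker(lines, marker):
    """Collect lines up to the marker; break ends the loop at that point."""
    collected = []
    for line in lines:
        text = line.rstrip('\n')
        if text == marker:
            break            # leave the loop: later lines are never visited
        collected.append(text)
    return collected


def lines_after_marker(lines, marker):
    """Collect lines after the marker; continue only skips the marker line."""
    collected = []
    found = False
    for line in lines:
        text = line.rstrip('\n')
        if text == marker:
            found = True
            continue         # skip to the next iteration, keep looping
        if found:
            collected.append(text)
    return collected
```

For the contents of a.txt, lines_after_marker(['Final\n', 'This is ready\n'], 'Final') gives ['This is ready'], while the contents of b.txt give an empty list because the marker never appears.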

Assuming your keyword is on the first line of the file, you can do this. This makes more sense, as the word "Final" could also appear somewhere in the content of a draft.
fileList = ['a.txt','b.txt']
firstLineCheck = 'Final\n'
for filepath in fileList:
    with open(filepath, 'r') as f:
        first_line = f.readline() # read the first line
        if first_line == firstLineCheck:
            print(f.read())

Since you wanted to check whether Final is present in the first line, you could read the file as a list of lines and check whether the first element contains Final. If so, print the entire file except the first line:
fileList = ['a.txt','b.txt']
firstLineCheck = 'Final'
for filepath in fileList:
    f = open(filepath, 'r').readlines() #openfiles
    if firstLineCheck in f[0]:
        print("".join(f[1:]))
Output:
This is ready'
