I have this function that is supposed to open all the text files in a folder and remove every "\n" from them.
def FormatTXT():
    conhecimentos = os.listdir('U:/AutoCTE/Conhecimentos')
    for x in conhecimentos:
        with open(x, "r+") as f:
            old = f.read()
            text = old.replace("\n", "")
            f.seek(0)
            f.truncate(0)
            f.write(text)
            f.close()
But this function raises the following error:
FileNotFoundError: [Errno 2] No such file or directory: '20200119-170415-Conhecimento de Transporte.txt'
The file does exist in the directory, and I can't figure out what I'm missing.
The file names you open via x are missing the prefix U:/AutoCTE/Conhecimentos. Since your script runs from a different working directory, those relative paths do not resolve to the files.
import os

def FormatTXT():
    conhecimentos = os.listdir('U:/AutoCTE/Conhecimentos')
    for x in conhecimentos:
        # os.listdir() returns bare names, so prepend the folder
        with open('U:/AutoCTE/Conhecimentos/' + x, "r+") as f:
            old = f.read()
            text = old.replace("\n", "")
            f.seek(0)
            f.truncate(0)
            f.write(text)
            # no f.close() needed: the with block closes the file
There are better ways to do this, for example with the os.path module.
I think the main problem is that you overlooked that os.listdir() returns the names of the files in a directory, not their paths. You have to join each file name onto the directory path, for example with os.path.join().
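To make the difference between names and paths concrete, here is a minimal sketch. It builds a throwaway folder with tempfile so it can run anywhere; the file name example.txt is only an illustration and is not taken from the question.

```python
import os
import tempfile

# Build a throwaway directory with one file so the example is self-contained.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "example.txt"), "w") as f:
        f.write("line one\nline two\n")

    names = os.listdir(folder)                         # bare names, no directory part
    paths = [os.path.join(folder, n) for n in names]   # names joined onto the folder

    print(names)
    print(os.path.exists(paths[0]))  # the joined path can be opened from any working dir
```

Opening one of the bare names only works when the working directory happens to be that folder, which is why the joined path is the safer thing to pass to open().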
There are several ways to do this; I will show the three I use.
First, let's move the code that rewrites a file into its own function, because you got that part right. I would only recommend caution with read() on very large files, since it loads the whole file into memory.
def remove_end_lines(file_):
    """
    remove "\n" from file
    """
    with open(file_, "r+") as f:
        old = f.read()
        text = old.replace("\n", "")
        f.seek(0)
        f.truncate(0)
        f.write(text)
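Regarding the caution about read() on very large files, one possible alternative is to stream the file line by line into a temporary file and then swap it into place. This is only a sketch: the name remove_end_lines_streaming is made up for this example, and it assumes the folder is writable and that no single line is itself too large for memory.

```python
import os
import tempfile

def remove_end_lines_streaming(file_path):
    """Strip newline characters from file_path without loading it all at once."""
    folder = os.path.dirname(os.path.abspath(file_path))
    # Write to a temporary file in the same folder, then swap it into place.
    fd, tmp_path = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, "w") as dst, open(file_path, "r") as src:
            for line in src:                  # reads one line at a time
                dst.write(line.rstrip("\n"))  # drop the line ending, keep the text
        os.replace(tmp_path, file_path)       # replace the original with the result
    except BaseException:
        os.remove(tmp_path)                   # do not leave the temp file behind
        raise
```

It can be called exactly like remove_end_lines(), with a full path to the file.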
Now we have to tackle your main problem: the file path.
A first choice is to change the working directory (save the original working directory first so you can go back to it):
def FormatTXT(my_dir):
    original_dir = os.getcwd()  # remember the original working dir
    conhecimentos = os.listdir(my_dir)  # list the files in the dir
    os.chdir(my_dir)  # change dir
    for file_ in conhecimentos:
        remove_end_lines(file_)
    os.chdir(original_dir)  # go back to the original dir
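One caveat with the chdir approach is that an exception inside the loop would leave the process in the new directory. A hedged variant wraps the loop in try/finally so the original directory is always restored. The function name format_txt_with_chdir and the process callback are hypothetical names for this sketch; you would pass remove_end_lines as process.

```python
import os

def format_txt_with_chdir(my_dir, process):
    """Call process(name) for every file in my_dir while my_dir is the working directory."""
    original_dir = os.getcwd()
    os.chdir(my_dir)
    try:
        for name in os.listdir("."):
            if os.path.isfile(name):   # skip subdirectories
                process(name)
    finally:
        os.chdir(original_dir)         # runs even if process() raises
```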
A second choice is to use os.path.join():
def FormatTXT(my_dir):
    conhecimentos = os.listdir(my_dir)  # list all files in the dir
    for file_ in conhecimentos:
        # build the file path by appending the file name to the directory path
        file_path = os.path.join(my_dir, file_)
        remove_end_lines(file_path)
If you have subdirectories and want to perform the same operation in them, you should use os.walk():
def FormatTXT(my_dir):
    for dir_path, dir_names, file_names in os.walk(my_dir):
        # file_names lists the files directly inside dir_path
        for file_ in file_names:
            # join with dir_path (not my_dir) so files in subdirectories resolve
            file_path = os.path.join(dir_path, file_)
            remove_end_lines(file_path)
I hope this helps.
If you have more questions, don't hesitate to ask.
Related
I would like to read the contents of all the text files in a directory. I have 4 text files in the "path" directory, and my code is:
for filename in os.listdir(path):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        content = f.read()
        thelist = content.splitlines()
        f.close()
print(filepath)
print(content)
print()
When I run the code, I can read the contents of only one text file.
I would be thankful for any advice or suggestions, or for pointers to related questions on Stack Overflow.
If you need to filter the file names by suffix, i.e. file extension, you can use either the string method endswith or the glob module of the standard library: https://docs.python.org/3/library/glob.html
Here is an example that saves each file's content as a string in a list.
import os

path = '.'  # or your path
files_content = []
for filename in filter(lambda p: p.endswith(".txt"), os.listdir(path)):
    filepath = os.path.join(path, filename)
    with open(filepath, mode='r') as f:
        files_content.append(f.read())
With glob, here is an example:
import glob

for filename in glob.glob('*.txt'):
    print(filename)
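The glob pattern above is relative to the current working directory. To search another folder, the directory can be made part of the pattern; glob then returns paths that already include that directory, so they can be passed straight to open(). A small sketch, where list_txt_files is a hypothetical helper name:

```python
import glob
import os

def list_txt_files(path):
    """Return the paths of the .txt files directly inside path, sorted by name."""
    # glob.escape() protects characters such as [ or * that may occur in path itself.
    pattern = os.path.join(glob.escape(path), "*.txt")
    return sorted(glob.glob(pattern))
```

For example, list_txt_files('.') lists the text files of the current directory.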
This should list your files so you can read them one by one. The lines of every file are stored in the all_lines list. If you wish to store the raw content too, you can append it to a list as well.
from pathlib import Path
from os import listdir
from os.path import isfile, join

path = "path_to_dir"
only_files = [f for f in listdir(path) if isfile(join(path, f))]
all_lines = []
for file_name in only_files:
    file_path = Path(path) / file_name
    with open(file_path, 'r') as f:
        file_content = f.read()
        all_lines.append(file_content.splitlines())
        print(file_content)
# use all_lines
Note: when using with you do not need to call close() explicitly
Reference: How do I list all files of a directory?
Basically, if you want to read all the files, you need to save their contents somewhere. In your example, you overwrite thelist with content.splitlines() on every iteration, which discards everything already in it.
Instead, define thelist before the loop and call thelist.append(content.splitlines()) each time, which adds the current file's lines to the list on each iteration.
Then you can iterate over thelist later and get the data out.
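The advice above could be sketched as follows. The function name read_all_lines is invented for this example, and sorting the names is an extra assumption made only to get a predictable order.

```python
import os

def read_all_lines(path):
    """Return one list of lines per regular file found directly in path."""
    thelist = []                      # created once, before the loop
    for filename in sorted(os.listdir(path)):
        filepath = os.path.join(path, filename)
        if not os.path.isfile(filepath):
            continue                  # skip subdirectories
        with open(filepath, mode="r") as f:
            thelist.append(f.read().splitlines())   # add, do not overwrite
    return thelist
```

Each element of the returned list holds the lines of one file, so a nested loop over it reaches every line of every file.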
I intend to extract some data stored in a .txt file using Python 3. However, when I try to print the file content, the program does not display anything in the console. This is the code snippet I use to read the file:
def get_data(directory):
    entries = os.listdir(directory)
    #print(entries)
    count = 0;
    for file in entries:
        #print(file)
        if file.endswith('.txt'):
            with open(file) as curr_file:
                #print(curr_file)
                #read data and write it to an
                #excel worksheet
                print(curr_file.readline())
                curr_file.close()
What changes should I make so that the program displays the contents of the file?
Update: I tried printing all the files saved in entries and the result looks fine. The following is the code snippet I use to unzip files in the directory; I am not sure whether there is anything wrong with it.
def read_zip(path):
    file_list = os.listdir(path)
    #print(file_list)
    #create a new directory and store
    #the extracted file there
    directory = 'C:/Users/chent/Desktop/Test'
    try:
        if not os.path.exists(directory):
            os.makedirs(directory, exist_ok=True)
            print('Folder created')
    except FileExistsError:
        print ('Directory not created')
    for file in file_list:
        if file.endswith('.zip'):
            filePath=path+'/'+file
            zip_file = zipfile.ZipFile(filePath)
            for names in zip_file.namelist():
                zip_file.extract(names, directory)
            get_data(directory)
            zip_file.close()
Solution: It turns out that I didn't specify the file path in the with open() statement, so the program could not locate the files. To fix it, pass the full path, i.e. the directory joined with the file name, to open(), for example with open(path + '/' + file, "r") as curr_file. See the details in my updated code:
def get_data(path):
    files = os.listdir(path)
    for file in files:
        #print(file)
        try:
            if file.endswith('.txt'):
                print(file)
                with open('C:/Users/chent/Desktop/Test/' + file, "r") as curr_file:
                    # print(curr_file.readlines())
                    print(curr_file)
                    line = curr_file.readline()
                    print(line)
        except FileNotFoundError:
            print ('File not found')

path = 'C:/Users/chent/Desktop/Test'
get_data(path)
The problem is that you use curr_file.readline() which only returns the first line.
Use curr_file.read() to get the whole file contents.
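The difference between readline(), read() and iterating over the file object can be seen in this small self-contained sketch. The file sample.txt and its three lines are made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.txt")
    with open(sample, "w") as f:
        f.write("first\nsecond\nthird\n")

    with open(sample) as curr_file:
        first_line = curr_file.readline()   # one line only, including its "\n"

    with open(sample) as curr_file:
        whole_text = curr_file.read()       # the entire file as one string

    with open(sample) as curr_file:
        lines = [line.rstrip("\n") for line in curr_file]  # one entry per line

print(repr(first_line))
print(repr(whole_text))
print(lines)
```

Iterating over the file object is usually the gentlest option when each line is going to be written to a worksheet row anyway.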
To open all the files in a specific directory (path), I use the following code:
for filename in os.listdir(path): # For each file inside path
    with open(path + filename, 'r') as xml_file:
        pass  # Do some stuff
However, I want to read the files in the directory starting from a specific position. For instance, if the directory contains the files f1.xml, f2.xml, f3.xml, ... ,f10.xml in this order, how can I read all the files starting from f3.xml (and ignore f1.xml and f2.xml) ?
Straightforward way
import os

keep = False
first = 'f3.xml'
for filename in os.listdir(path):  # For each file inside path
    # keep becomes True at the first match and stays True afterwards
    keep = keep or filename == first
    if keep:
        with open(os.path.join(path, filename), 'r') as xml_file:
            pass  # Do some stuff
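Note that os.listdir() returns the names in arbitrary order, so "starting from f3.xml" is only well defined after sorting. A plain sorted() would place f10.xml before f2.xml, so the sketch below uses a natural sort key. The helper names natural_key and files_from are hypothetical, and the approach assumes the numbering is embedded in the file names as in the question.

```python
import os
import re

def natural_key(name):
    """Split digit runs from text so that 'f10.xml' sorts after 'f2.xml'."""
    parts = re.split(r"(\d+)", name)
    # Odd positions hold the digit runs captured by the pattern.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

def files_from(path, first):
    """Return file paths in path, in natural order, starting at the file named first."""
    names = sorted(os.listdir(path), key=natural_key)
    if first not in names:
        return []
    start = names.index(first)
    return [os.path.join(path, name) for name in names[start:]]
```

Looping over files_from(path, 'f3.xml') then visits f3.xml through f10.xml and skips f1.xml and f2.xml.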
Currently I am trying to write a function that will walk through the requested directory and print the text of all the files.
Right now, the function works in displaying the file_names as a list, so the files surely exist (and there is text in the files).
def PopularWordWalk (starting_dir, word_dict):
    print ("In", os.path.abspath(starting_dir))
    os.chdir(os.path.abspath(starting_dir))
    for (this_dir,dir_names,file_names) in os.walk(starting_dir):
        for file_name in file_names:
            fpath = os.path.join(os.path.abspath(starting_dir), file_name)
            fileobj = open(fpath, 'r')
            text = fileobj.read()
            print(text)
Here is my output with some checking of the directory contents:
>>> PopularWordWalk ('text_dir', word_dict)
In /Users/normanwei/Documents/Python for Programmers/Homework 4/text_dir
>>> os.listdir()
['.DS_Store', 'cats.txt', 'zen_story.txt']
The problem is that whenever I try to print the text, I get nothing. Eventually I want to push the text through some other functions, but as of now that seems moot without any text. Can anyone explain why no text is appearing? (Opening, reading, storing and printing the text manually in IDLE works, i.e. when I type 'cats.txt' instead of file_name.) I am currently running Python 3.
EDIT: The question has been answered. I just had to remove the os.chdir line; see jojo's answer for the explanation.
This line won't work
file = open(file_name, 'r')
Because it would require that these files exist in the same folder you are running the script from. You would have to provide the path to those files, as well as the file names
with open(os.path.join(starting_dir,file_name), 'r') as file:
    pass  # do stuff
This way it will build the full path from the directory and the file name.
If you do os.chdir(os.path.abspath(starting_dir)), you move into starting_dir. Then for (this_dir,dir_names,file_names) in os.walk(starting_dir): loops over nothing, because the relative path starting_dir is now resolved inside starting_dir, where no folder of that name exists.
Long story short, comment out the line os.chdir(os.path.abspath(starting_dir)) and you should be good.
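The reason this fails silently rather than with an error is that os.walk() ignores errors from listing a directory by default. The sketch below shows this on a path that does not exist (the nested text_dir/text_dir mirrors the situation described above) and uses the onerror parameter to surface the problem.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    missing = os.path.join(base, "text_dir", "text_dir")   # this folder does not exist

    walked = list(os.walk(missing))        # no exception, simply no iterations

    errors = []
    list(os.walk(missing, onerror=errors.append))   # collect the OSError instead

print(walked)
print(type(errors[0]).__name__)
```

Passing an onerror callback while debugging is one way to notice that the walk never found its starting directory.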
Alternatively if you want to stick to the os.chdir, this should do the job:
def PopularWordWalk (starting_dir, word_dict):
    print ("In", os.path.abspath(starting_dir))
    os.chdir(os.path.abspath(starting_dir))
    for (this_dir,dir_names,file_names) in os.walk('.'):
        for file_name in file_names:
            # join with this_dir: after chdir, a relative starting_dir no longer resolves
            fpath = os.path.join(this_dir, file_name)
            with open(fpath, 'r') as fileobj:
                text = fileobj.read()
                print(text)
You'll want to join the root path with the file path. I'd change:
file = open(file_name, 'r')
to
fpath = os.path.join(this_dir, file_name)
file = open(fpath, 'r')
You may also want to name the variable something other than file: it was a built-in type in Python 2, and although Python 3 removed it, the name is still easy to confuse. I'd recommend fileobj.
Just to add on to the previous answer, you will have to join the absolute path and the relative path of the walk.
Try this:
fpath = os.path.abspath(os.path.join(this_dir, file_name))
f = open(fpath, 'r')
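Putting these suggestions together, a walker that never changes the working directory could look like the sketch below. The generator name iter_file_texts is hypothetical. The errors="replace" argument is an assumption added here because the listing includes a .DS_Store file, which is binary and might otherwise raise a decoding error when read as text.

```python
import os

def iter_file_texts(starting_dir):
    """Yield (path, text) for every file under starting_dir, including subfolders."""
    for this_dir, dir_names, file_names in os.walk(starting_dir):
        for file_name in file_names:
            fpath = os.path.join(this_dir, file_name)   # this_dir, not starting_dir
            with open(fpath, "r", errors="replace") as fileobj:
                yield fpath, fileobj.read()
```

A caller can then loop with for fpath, text in iter_file_texts('text_dir'): and pass text on to other functions.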
I would like to run a function over all the files in one folder and create new files from them. I have put the code for one file below. I would appreciate your help.
def newfield2(infile,outfile):
    output = ["%s\t%s" %(item.strip(),2) for item in infile]
    outfile.write("\n".join(output))
    outfile.close()
    return outfile
infile = open("E:/SAGA/data/2006last/325125401.all","r")
outfile = open("E:/SAGA/data/2006last/325125401_edit.all","w")
I would like to change all the files in the 'E:/SAGA/data/2006last/' folder and create new files with edit extension.
Use os.listdir() to list all files in a directory. The function returns just the filenames, not the full path. The os.path module gives you the tools to construct filenames as needed:
import os

folder = 'E:/SAGA/data/2006last'
for filename in os.listdir(folder):
    infilename = os.path.join(folder, filename)
    if not os.path.isfile(infilename):
        continue
    # splitext keeps the leading dot in extension, e.g. ('325125401', '.all')
    base, extension = os.path.splitext(filename)
    outfilename = os.path.join(folder, '{}_edit{}'.format(base, extension))
    with open(infilename, 'r') as infile:
        outfile = open(outfilename, 'w')
        newfield2(infile, outfile)  # newfield2 closes outfile itself
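One detail worth knowing when building the output name is that os.path.splitext() returns the extension with its leading dot. A minimal sketch, where edited_name is a hypothetical helper and the file name comes from the question:

```python
import os

def edited_name(filename):
    """Return filename with '_edit' inserted before its extension."""
    base, extension = os.path.splitext(filename)   # extension keeps its leading dot
    return "{}_edit{}".format(base, extension)

print(os.path.splitext("325125401.all"))
print(edited_name("325125401.all"))
```

Because the dot is already part of extension, the format string must not add a second one.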
import os

def apply_to_all_files(path):
    for sub_path in os.listdir(path):
        next_path = os.path.join(path, sub_path)
        if os.path.isfile(next_path):
            infile = open(next_path, "r")
            outfile = open(next_path + '.out', "w")
            newfield2(infile, outfile)
            infile.close()  # newfield2 closes outfile only