How to open one folder at a time to access files - Python

I have multiple folders inside a common parent folder, say 'work'. Inside that, I have multiple sub-folders named 'sub01', 'sub02', etc. All the sub-folders contain the same files, e.g. mean.txt and sd.txt.
I have to append the contents of every 'mean.txt' into a single file. I am stuck on how to open the sub-folders one by one. Thanks.
Getting all the files as a list:
g = open("new_file", "a+")
for name in file_list:  # 'list' is a built-in, so the list needs another name
    f = open(name, 'r')
    g.write(f.read())
    f.close()
g.close()
What I am not getting is how to build the list of all the files in the sub-folders to make this work.
************EDIT*********************
Found a solution. os.walk() helped, but there was a problem: it visits folders in arbitrary order rather than alphabetically, so I had to sort the resulting list to keep the output in order.
import os

p = r"/Users/xxxxx/desktop/bianca_test/"  # main folder
list1 = []
for root, dirs, files in os.walk(p):
    if root[-12:] == 'native_space':  # the sub-folder common to all parent folders
        for file in files:
            if file == "perfusion_calib_gm_mean.txt":
                list1.append(os.path.join(root, file))
list1.sort()  # os.walk() visits folders in arbitrary order; sorting restores it
f = open("gm_mean.txt", 'a+')
for item in list1:
    g = open(item, 'r')
    f.write(g.read())
    print("writing", item)
    g.close()
f.close()
Thanks to all who helped.
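For reference, glob can express the same search more compactly. A sketch only; the pattern is an assumption based on the paths above (native_space may be nested at any depth, hence the recursive '**'):
import glob
import os

# Assumed layout: perfusion_calib_gm_mean.txt lives inside a
# 'native_space' folder somewhere under the main folder.
p = r"/Users/xxxxx/desktop/bianca_test"
pattern = os.path.join(p, "**", "native_space", "perfusion_calib_gm_mean.txt")
with open("gm_mean.txt", "a+") as out:
    for path in sorted(glob.glob(pattern, recursive=True)):  # sorted -> alphabetical
        with open(path) as src:
            out.write(src.read())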

As I understand it, you want to collate all 'mean.txt' files into one file. This should do the job, but beware that there is no ordering to which file goes where. Note also that I'm using StringIO() to buffer all the data, since strings are immutable in Python.
import os
from io import StringIO

def main():
    buffer = StringIO()
    for dirpath, dirnames, filenames in os.walk('.'):
        if 'mean.txt' in filenames:
            fp = os.path.join(dirpath, 'mean.txt')
            with open(fp) as f:
                buffer.write(f.read())
    all_file_contents = buffer.getvalue()
    print(all_file_contents)

if __name__ == '__main__':
    main()
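If ordering does matter, os.walk() lets you steer it: with the default top-down traversal, sorting the dirnames list in place makes the walk descend into directories alphabetically (this is documented behaviour). A minimal sketch of that variation:
import os
from io import StringIO

def main():
    buffer = StringIO()
    for dirpath, dirnames, filenames in os.walk('.'):
        dirnames.sort()  # in-place sort steers the traversal (topdown=True default)
        if 'mean.txt' in filenames:
            with open(os.path.join(dirpath, 'mean.txt')) as f:
                buffer.write(f.read())
    print(buffer.getvalue())

if __name__ == '__main__':
    main()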

Here's some pseudocode to help you get started. Try to google, read, and understand the solutions to get better as a programmer (a Python sketch of this pseudocode follows the hints below):
open mean_combined.txt to write mean.txt contents
open sd_combined.txt to write sd.txt contents
for every subdir inside my_dir:
    for every file inside subdir:
        if file.name is 'mean.txt':
            content = read mean.txt
            write content into mean_combined.txt
        if file.name is 'sd.txt':
            content = read sd.txt
            write content into sd_combined.txt
close mean_combined.txt
close sd_combined.txt
You need to look up how to:
open a file to read its contents (hint: use open)
iterate over the files inside a directory (hint: use pathlib)
write a string into a file (hint: read Input and Output)
use context managers for releasing resources (hint: read the with statement)
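And here is one way the pseudocode could translate into Python once you have read up on those; 'my_dir' is an assumed name for the parent folder:
from pathlib import Path

# A sketch of the pseudocode above; 'my_dir' is an assumption.
my_dir = Path("my_dir")
with open("mean_combined.txt", "w") as mean_out, \
     open("sd_combined.txt", "w") as sd_out:
    for subdir in my_dir.iterdir():
        if not subdir.is_dir():
            continue
        for file in subdir.iterdir():
            if file.name == "mean.txt":
                mean_out.write(file.read_text())
            elif file.name == "sd.txt":
                sd_out.write(file.read_text())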


Edit multiple text files, and save as new files

My first post on StackOverflow, so please be nice. In other words, I'm a super beginner to Python.
I want to read multiple files from a folder, divide the text, and save the output as a new file. I have figured out this part of the code, but it only works on one file at a time. I have tried googling but can't figure out a way to run this code on multiple text files in a folder and save the result as "output" plus a number for each file. Is this something that's doable?
with open("file_path") as fReader:
corpus = fReader.read()
loc = corpus.find("\n\n")
print(corpus[:loc], file=open("output.txt","a"))
You could possibly work with a list, like:
from pathlib import Path

source_dir = Path("./")  # path to the directory
files = [x for x in source_dir.iterdir() if x.is_file()]  # was 'filePath', which is undefined
for i, file in enumerate(files):
    outfile = "output_" + str(i) + file.suffix
    with open(file) as fReader, open(outfile, "w") as fOut:
        corpus = fReader.read()
        loc = corpus.find("\n\n")
        fOut.write(corpus[:loc])
Welcome to the site. Yes, what you are asking is completely doable, and you are on the right track. You will need to do a little research/practice with the os module, which is highly useful when working with files. The two commands that you will want to research a bit are:
os.path.join()
os.listdir()
I would suggest you put two folders alongside your Python file, one called data and the other called output to catch the results. Start by seeing if you can just make the code list all the files in your data directory, and keep building up that loop. Something like this should list all the files:
# folder file lister/test writer
import os

source_folder_name = 'data'    # the folder to be read; in the SAME directory as this file
output_folder_name = 'output'  # will be used later...

files = os.listdir(source_folder_name)

# get this working first
for f in files:
    print(f)

# make output file names and just write a one-liner into each file...
for f in files:
    output_filename = f.split('.')[0]  # the part before the period
    output_filename += '_output.csv'
    output_path = os.path.join(output_folder_name, output_filename)
    with open(output_path, 'w') as writer:
        writer.write('some data')
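From there it's a short step to drop in the processing from the question. A sketch, reusing the names above and assuming each file's text should be cut at the first blank line:
# Building on the loop above: cut each file at the first blank line
# and save that part, as in the question.
for f in files:
    with open(os.path.join(source_folder_name, f)) as reader:
        corpus = reader.read()
    loc = corpus.find("\n\n")  # note: find() returns -1 if there is no blank line
    output_path = os.path.join(output_folder_name, f.split('.')[0] + '_output.txt')
    with open(output_path, 'w') as writer:
        writer.write(corpus[:loc])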

IOError: [Errno 2] No such file or directory: when the name was made by looping over existing files

I'm trying to have the bottom part of the code iterate over some files. The files correspond in pairs and are differentiated by a number, so the counter is there to change the number part of the file name.
The file names are generated by looking through the given files, selecting those containing certain things in the title, and ordering them using the count.
This code works independently, in its own (lonely) folder, and prints the correct files in the correct order. However, when I use it in my main code, where file_1 and file_2 are referenced (the decoder and encoder parts of the code), I get the error in the title. There is no way there is any typo, or that the files don't exist, because Python made these names itself based on existing file names.
import os

count = 201
while 205 > count:
    indir = 'absolute_path/models'
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if 'test-decoder' in f:
                if f.endswith(".model"):
                    if str(count) in f:
                        file_1 = f
                        print(file_1)
    indir = 'absolute_path/models'
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if 'test-encoder' in f:
                if f.endswith(".model"):
                    if str(count) in f:
                        file_2 = f
                        print(file_2)
    decoder1.load_state_dict(torch.load(open(file_1, 'rb')))
    encoder1.load_state_dict(torch.load(open(file_2, 'rb')))
    print(getBlueScore(encoder1, decoder1, pairs, src, tgt))
    print_every = 10
    print(file_1 + file_2)
    count = count + 1
I then need to use these files two by two.
It's very possible that you are running into issues with variable scoping, but without seeing your entire code it's hard to know for sure. One likely cause of the IOError itself: os.walk() yields bare file names, so unless you os.path.join(root, f) them, open(file_1) looks in the current working directory instead of in indir.
If you know what the model files should be called, might I suggest this code:
for i in range(201, 205):
    e = 'absolute_path/models/test_encoder_%d.model' % i
    d = 'absolute_path/models/test_decoder_%d.model' % i
    if os.path.exists(e) and os.path.exists(d):
        decoder1.load_state_dict(torch.load(open(d, 'rb')))
        encoder1.load_state_dict(torch.load(open(e, 'rb')))
Instead of relying on substring matches in a path name, which could lead to errors, this forces only the files you want to be opened. It also gets rid of any possible scoping issues.
We could clean it up a bit more, but you get the idea.
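If you would rather keep the os.walk() scan from the question, a sketch of that fix is below; joining root onto each bare name is what makes the later open() succeed (the 'test-decoder'/'test-encoder' names are carried over from the question):
import os

indir = 'absolute_path/models'
for count in range(201, 205):
    file_1 = file_2 = None
    for root, dirs, filenames in os.walk(indir):
        for f in filenames:
            if f.endswith(".model") and str(count) in f:
                if 'test-decoder' in f:
                    file_1 = os.path.join(root, f)  # full path, not just the name
                elif 'test-encoder' in f:
                    file_2 = os.path.join(root, f)
    if file_1 and file_2:
        print(file_1, file_2)  # these are now safe to open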

Making a program recursively call itself in Python

I have written a simple script that runs on a folder and cycles through all of the files in it to do some processing (the actual processing is unimportant).
Now I have a folder that contains multiple different folders, each holding a variable number of files on which I want to run my script. I'm struggling to adapt my code to do this.
So previously, the file structure was:

Folder
    Html1
    Html2
    Html3
    ...

Now it is:

Folder
    Folder1
        Html1
    Folder2
        Html2
        Html3

I still want to run the code on all of the HTMLs though.
Here is my attempt at doing this, which results in:

error on line 25, in CleanUpFolder
    orig_f.write(soup.prettify().encode(soup.original_encoding))
TypeError: encode() argument 1 must be string, not None
def CleanUpFolder(dir):
    do = dir
    dir_with_original_files = dir
    for root, dirs, files in os.walk(do):
        for d in dirs:
            for f in files:
                print f.title()
                if f.endswith('~'):  # you don't want to process backups
                    continue
                original_file = os.path.join(root, f)
                with open(original_file, 'w') as orig_f, \
                        open(original_file, 'r') as orig_f2:
                    soup = BeautifulSoup(orig_f2.read())
                    for t in soup.find_all('td', class_='TEXT'):
                        t.string.wrap(soup.new_tag('h2'))
                    # This is where you create your new modified file.
                    orig_f.write(soup.prettify().encode(soup.original_encoding))

CleanUpFolder('C:\Users\FOLDER')
What have I missed here? The main thing I am unsure about is how the line

for root, dirs, files in os.walk(do):

is supposed to be used in this context.
Here I have split your function up into two separate functions and cleared out redundant code:
def clean_up_folder(dir):
    """Run the clean up process on dir, recursively."""
    for root, dirs, files in os.walk(dir):
        for f in files:
            print f.title()
            if not f.endswith('~'):  # you don't want to process backups
                clean_up_file(os.path.join(root, f))
This has fixed the indentation problem, and will make it easier to test the functions and isolate any future errors. I have also removed the loop over dirs, as this will happen within walk anyway (and means you'd skip all files in any dir that doesn't contain any sub-dirs).
def clean_up_file(original_file):
    """Clean up the original_file."""
    with open(original_file) as orig_f2:
        soup = BeautifulSoup(orig_f2.read())
    for t in soup.find_all('td', class_='TEXT'):
        t.string.wrap(soup.new_tag('h2'))
    with open(original_file, 'w') as orig_f:
        # This is where you create your new modified file.
        orig_f.write(soup.prettify().encode(soup.original_encoding))
Note that I have separated the two opens of original_file so you don't accidentally overwrite it before reading from it; there is no need to have it open for read and write simultaneously. In fact, that is where your TypeError came from: opening the file with 'w' truncates it on the spot, so BeautifulSoup parsed an empty document and its original_encoding was None.
I don't have BeautifulSoup installed here, so can't test further, but this should allow you to narrow the issue down to a specific file.
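If original_encoding can still come back None for some inputs (it is None whenever BeautifulSoup is handed an already-decoded string), a fallback avoids the TypeError; the 'utf-8' default in this variant of clean_up_file is an assumption:
def clean_up_file(original_file):
    """Clean up the original_file, with an encoding fallback."""
    with open(original_file) as orig_f2:
        soup = BeautifulSoup(orig_f2.read())
    for t in soup.find_all('td', class_='TEXT'):
        t.string.wrap(soup.new_tag('h2'))
    encoding = soup.original_encoding or 'utf-8'  # the fallback is an assumption
    with open(original_file, 'w') as orig_f:
        orig_f.write(soup.prettify().encode(encoding))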

Reading all files in all directories [duplicate]

This question already has answers here:
How do I list all files of a directory?
(21 answers)
Closed 9 years ago.
I have the code working to read in the values of a single text file but am having difficulties reading all files from all directories and putting all of the contents together.
Here is what I have:
filename = '*'
filesuffix = '*'
location = os.path.join('Test', filename + "." + filesuffix)
Document = filename
thedictionary = {}
with open(location) as f:
    file_contents = f.read().lower().split(' ')  # split on spaces to make a list
    for position, item in enumerate(file_contents):
        if item in thedictionary:
            thedictionary[item].append(position)
        else:
            thedictionary[item] = [position]
wordlist = (thedictionary, Document)
#print wordlist
#print thedictionary
Note that I am trying to stick the wildcard * in for the filename as well as for the file suffix. I get the following error:
"IOError: [Errno 2] No such file or directory: 'Test/.'"
I am not sure if this is even the right way to do it, but it seems that if I could somehow get the wildcards working, it should work.
I have gotten this example to work: Python - reading files from directory file not found in subdirectory (which is there).
That is a little different, but I don't know how to update it to read all files. I am thinking that in this initial set of code:
previous_dir = os.getcwd()
os.chdir('testfilefolder')
#add something here?
for filename in os.listdir('.'):
I would need to add an outer for loop, but I don't quite know what to put in it.
Any thoughts?
Python doesn't support wildcards directly in the filename given to the open() call. You'll need to use the glob module to load files from a single level of subdirectories, or use os.walk() to walk an arbitrary directory structure.
Opening all text files in all subdirectories, one level deep:
import glob
import os

for filename in glob.iglob(os.path.join('Test', '*', '*.txt')):
    with open(filename) as f:
        # one file open; handle it, the next loop iteration will present
        # you with a new file.
        pass
Opening all text files in an arbitrary nesting of directories:
import fnmatch
import os

for dirpath, dirs, files in os.walk('Test'):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            # one file open; handle it, the next loop iteration will present
            # you with a new file.
            pass
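To tie this back to the question's word-position dictionary, here is a sketch that concatenates the contents of every matched file before indexing; the 'Test' layout is assumed from the question:
import glob
import os

# Collect the words from every file under Test, then index positions
# across the combined contents, as in the question's single-file version.
all_words = []
for filename in sorted(glob.iglob(os.path.join('Test', '*', '*.txt'))):
    with open(filename) as f:
        all_words.extend(f.read().lower().split(' '))

thedictionary = {}
for position, item in enumerate(all_words):
    thedictionary.setdefault(item, []).append(position)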

Zipping only files in Python

So I create several files in a temp directory dictated by the NamedTemporaryFile function.
zf = zipfile.ZipFile(zipPath, mode='w')
for file in files:
    with NamedTemporaryFile(mode='w+b', bufsize=-1, prefix='tmp') as tempFile:
        tempPath = tempFile.name
        with open(tempPath, 'w') as f:
            f.write(file)  # write stuff to tempPath with the contents of 'file'
        zf.write(tempPath)
zf.close()
When I use the paths of these files to add them to a zip file, the temp directories themselves get zipped up.
When I try to unzip, I get a series of temp folders which eventually contain the files I want
(i.e. I get the folder Users, which contains my user_id folder, which contains AppData...).
Is there a way to add the files directly, without the folders, so that when I unzip, I get the files directly? Thank you so much!
Try giving the arcname:
from os import path

zf = zipfile.ZipFile(zipPath, mode='w')
for file in files:
    with NamedTemporaryFile(mode='w+b', bufsize=-1, prefix='tmp') as tempFile:
        tempPath = tempFile.name
        with open(tempPath, 'w') as f:
            f.write(file)  # write the contents of 'file' to tempPath, as above
        zf.write(tempPath, arcname=path.basename(tempPath))
zf.close()
Using os.path.basename you can get the file's name from a path. According to the zipfile documentation, the default value for arcname is the filename without a drive letter and with leading path separators removed.
Try using the arcname parameter to zf.write:
zf.write(tempPath, arcname='Users/mrb0/Documents/things.txt')
Without knowing more about your program, you may find it easier to get your arcname from the file variable in your outermost loop rather than deriving a new name from tempPath.
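Since the data already lives in a variable, another option is zipfile's writestr(), which skips the temporary file entirely. A sketch, assuming each entry in files holds the text to store and that a name can be made up for it:
import zipfile

# Write in-memory contents straight into the archive; the
# 'file_%d.txt' names are made up for this sketch.
with zipfile.ZipFile(zipPath, mode='w') as zf:
    for i, contents in enumerate(files):
        zf.writestr('file_%d.txt' % i, contents)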
