Making a program recursively call itself in Python - python

I have written a simple script that runs on a folder and cycles through all of the files in it to do some processing (the actual processing is unimportant).
Now I have a folder that contains several subfolders. Each subfolder holds a variable number of files, and I want to run the script on all of them. I'm struggling to adapt my code to do this.
So previously, the file structure was:
Folder
    Html1
    Html2
    Html3
    ...
Now it is:
Folder
    Folder1
        Html1
    Folder2
        Html2
        Html3
I still want to run the code on all of the HTMLs though.
Here is my attempt at doing this, which results in:
error on line 25, in CleanUpFolder
    orig_f.write(soup.prettify().encode(soup.original_encoding))
TypeError: encode() argument 1 must be string, not None
The code is:
def CleanUpFolder(dir):
    do = dir
    dir_with_original_files = dir
    for root, dirs, files in os.walk(do):
        for d in dirs:
            for f in files:
                print f.title()
                if f.endswith('~'): #you don't want to process backups
                    continue
                original_file = os.path.join(root, f)
                with open(original_file, 'w') as orig_f, \
                     open(original_file, 'r') as orig_f2:
                    soup = BeautifulSoup(orig_f2.read())
                    for t in soup.find_all('td', class_='TEXT'):
                        t.string.wrap(soup.new_tag('h2'))
                    # This is where you create your new modified file.
                    orig_f.write(soup.prettify().encode(soup.original_encoding))
CleanUpFolder('C:\Users\FOLDER')
What have I missed here? The main thing I am unsure about is how the line
for root, dirs, files in os.walk(do):
is used/made sense of in this context?

Here I have split your function up into two separate functions and cleared out redundant code:
def clean_up_folder(dir):
    """Run the clean up process on dir, recursively."""
    for root, dirs, files in os.walk(dir):
        for f in files:
            print f.title()
            if not f.endswith('~'): #you don't want to process backups
                clean_up_file(os.path.join(root, f))
This has fixed the indentation problem, and will make it easier to test the functions and isolate any future errors. I have also removed the loop over dirs, as this will happen within walk anyway (and means you'd skip all files in any dir that doesn't contain any sub-dirs).
def clean_up_file(original_file):
    """Clean up the original_file."""
    with open(original_file) as orig_f2:
        soup = BeautifulSoup(orig_f2.read())
    for t in soup.find_all('td', class_='TEXT'):
        t.string.wrap(soup.new_tag('h2'))
    with open(original_file, 'w') as orig_f:
        # This is where you create your new modified file.
        orig_f.write(soup.prettify().encode(soup.original_encoding))
Note that I have separated the two opens of original_file so you don't accidentally overwrite it before reading from it - there is no need to have it open for read and write simultaneously.
I don't have BeautifulSoup installed here, so can't test further, but this should allow you to narrow the issue down to a specific file.
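As for how the for root, dirs, files in os.walk(...) line is made sense of: the following is a minimal Python 3 sketch, not the code from the question. It builds a throwaway tree shaped like the one described above and collects the path of every file. The helper name list_files and the demo file contents are assumptions made for illustration, and the BeautifulSoup step is left out.

```python
import os
import tempfile


def list_files(top):
    """Return the full path of every file under top, skipping '~' backups."""
    found = []
    # os.walk yields one (root, dirs, files) tuple per directory it visits:
    #   root  - path of the directory currently being visited
    #   dirs  - names of its immediate subdirectories
    #   files - names of the files directly inside root
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith('~'):
                found.append(os.path.join(root, name))
    return sorted(found)


# Build a throwaway tree shaped like the one in the question (hypothetical data).
demo_top = tempfile.mkdtemp()
for folder, names in [('Folder1', ['Html1']), ('Folder2', ['Html2', 'Html3', 'Html3~'])]:
    os.makedirs(os.path.join(demo_top, folder))
    for name in names:
        with open(os.path.join(demo_top, folder, name), 'w') as handle:
            handle.write('<td class="TEXT">x</td>')

demo_files = list_files(demo_top)
for path in demo_files:
    print(os.path.relpath(path, demo_top))
```

Because files already holds the files of whichever directory root points at, there is no need for a loop over dirs; joining root and the file name is enough to reach files at any depth.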

Related

Python to go through multiple folders and process files inside them

I have multiple folders that contain about 5-10 files each. When I finish processing the files in one folder, I want to move on to the next folder and start working on its files. I have this code:
for root, dirs, files in os.walk("Training Sets"): #Path that contains folders
    for i in dirs: #if I don't have this, an error is shown in line 4 that path needs to be str and not list
        for file in i: #indexing files inside the folders
            path = os.path.join(i, files) #join path of the files
            dataset = pd.read_csv(path, sep='\t', header = None) #reading the files
            trainSet = dataset.values.tolist() #some more code
            editedSet = dataset.values.tolist() #some more code
            #rest of the code...
The problem is that it doesn't do anything. Not even printing if I add prints for debugging.
First off, be sure that you are in the correct top-level directory (i.e. the one containing "Training Sets"). You can check this with os.path.abspath(os.curdir). Otherwise, the code does nothing, since os.walk does not find the directory to walk.
os.walk does the directory walking for you. The key is understanding root (the path to the current directory), dirs (a list of subdirectories) and files (a list of files in the current directory). You don't actually need dirs.
So your code is two loops:
>>> for root, dirs, files in os.walk("New Folder1"): #Path that contains folders
...     for file in files: #indexing files inside the folders
...         path = os.path.join(root, file) #join path of the files
...         print(path) # Your code here
...
New Folder1\New folder1a\New Text Document.txt
New Folder1\New folder1b\New Text Document2.txt
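To show the same two-loop pattern with the file actually being read, here is a small sketch under stated assumptions. It uses the standard library csv module as a stand-in for pd.read_csv(path, sep='\t', header=None), so it runs without pandas. The function name load_training_sets, the folder names and the file contents are all made up for the demo.

```python
import csv
import os
import tempfile


def load_training_sets(top):
    """Map each file path under top to its rows (a list of lists of strings)."""
    tables = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)  # join with root, not a bare folder name
            with open(path, newline='') as handle:
                # csv.reader stands in for pd.read_csv(path, sep='\t', header=None)
                tables[path] = list(csv.reader(handle, delimiter='\t'))
    return tables


# Hypothetical demo data: two folders with one tab-separated file each.
demo_top = tempfile.mkdtemp()
for folder, text in [('set_a', '1\t2\n3\t4\n'), ('set_b', '5\t6\n')]:
    os.makedirs(os.path.join(demo_top, folder))
    with open(os.path.join(demo_top, folder, 'data.tsv'), 'w', newline='') as handle:
        handle.write(text)

demo_tables = load_training_sets(demo_top)
for path in sorted(demo_tables):
    print(os.path.relpath(path, demo_top), demo_tables[path])
```

With pandas installed, the csv.reader line could be swapped back for the read_csv call from the question; the directory walking stays the same.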

Is there a way to change your cwd in Python using a file as an input?

I have a Python program where I am calculating the number of files within different directories, but I wanted to know if it was possible to use a text file containing a list of different directory locations to change the cwd within my program?
Input: a text file listing different folder locations, each of which contains various files.
I have my program set up to count the files in a given folder location and write that count to a text file located in each folder the program is called on.
You can use the os module in Python.
import os
# dirs will store the list of directories, populated from your text file
dirs = []
with open(your_text_file, "r") as text_file:
    for line in text_file:
        # Strip the trailing newline, otherwise os.chdir() cannot find the directory
        if line.strip():
            dirs.append(line.strip())
# Now simply loop over the dirs list
for directory in dirs:
    # Change directory
    os.chdir(directory)
    # Print cwd
    print(os.getcwd())
    # Print number of files in cwd (we are already inside it, so list ".")
    print(len([name for name in os.listdir(".")
               if os.path.isfile(name)]))
Yes.
import os
start_dir = os.getcwd()
with open(dir_index_file, "r") as indexfile:
    for targetdir in indexfile:
        os.chdir(targetdir.strip())  # strip the trailing newline first
        # Do your stuff here
os.chdir(start_dir)  # return to where we started
Do bear in mind that if your program dies half way through it'll leave you in a different working directory to the one you started in, which is confusing for users and can occasionally be dangerous (especially if they don't notice it's happened and start trying to delete files that they expect to be there - they might get the wrong file). You might want to consider if there's a way to achieve what you want without changing the working directory.
EDIT:
And to suggest the latter, rather than changing directory use os.listdir() to get the files in the directory of interest:
import os
with open(dir_index_file, "r") as indexfile:
    for targetdir in indexfile:
        targetdir = targetdir.strip()  # drop the trailing newline
        contents = os.listdir(targetdir)
        numfiles = len(contents)
        with open(os.path.join(targetdir, "count.txt"), "w") as countfile:
            countfile.write(str(numfiles))
Note that this will count files and directories, not just files. If you only want files, then you'll have to go through the list returned by os.listdir and check whether each item is a file using os.path.isfile().
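The files-only variant described in that last note could look roughly like the sketch below. The function names count_files and count_from_index and the demo directory layout are assumptions for illustration. Unlike the snippet above, it returns the counts instead of writing count.txt, so the count file itself does not inflate the next run's total.

```python
import os
import tempfile


def count_files(directory):
    """Count regular files directly inside directory, ignoring subdirectories."""
    return sum(
        1 for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def count_from_index(index_path):
    """Read one directory per line from index_path and return {directory: file count}."""
    counts = {}
    with open(index_path) as index_file:
        for line in index_file:
            directory = line.strip()
            if directory:  # skip blank lines
                counts[directory] = count_files(directory)
    return counts


# Hypothetical demo: one directory holding two files and one subdirectory.
demo_dir = tempfile.mkdtemp()
for name in ('a.txt', 'b.txt'):
    with open(os.path.join(demo_dir, name), 'w') as handle:
        handle.write('x')
os.mkdir(os.path.join(demo_dir, 'sub'))

# The index file lives elsewhere so that it is not counted itself.
index_dir = tempfile.mkdtemp()
demo_index = os.path.join(index_dir, 'dirs.txt')
with open(demo_index, 'w') as handle:
    handle.write(demo_dir + '\n\n')

demo_counts = count_from_index(demo_index)
print(demo_counts[demo_dir])
```

Nothing here calls os.chdir, so the working directory is the same before and after the run.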

How to open one folder at a time to access files

I have multiple folders in a common parent folder, say 'work'. Inside it, I have multiple sub-folders named 'sub01', 'sub02', etc. All the folders contain the same files, e.g. mean.txt and sd.txt.
I have to combine the contents of all the 'mean.txt' files into a single file. I am stuck on how to open the subfolders one by one. Thanks.
getting all files as a list
g = open("new_file", "a+")
for files in list:
    f = open(files, 'r')
    g.write(f.read())
    f.close()
g.close()
I don't understand how to get a list of all the files in the subfolders to make this work.
EDIT
I found a solution. os.walk() helped, but it did not iterate over the folders in alphabetical order, so I had to sort the results to get them in order:
import os
p = r"/Users/xxxxx/desktop/bianca_test/" # main_folder
list1 = []
for root, dirs, files in os.walk(p):
    if root[-12:] == 'native_space': #this was the sub_folder common in all parent folders
        for file in files:
            if file == "perfusion_calib_gm_mean.txt":
                list1.append(os.path.join(root, file))
list1.sort() # os.walk() iterated folders randomly; this is to overcome that
f = open("gm_mean.txt", 'a+')
for item in list1:
    g = open(item, 'r')
    f.write(g.read())
    print("writing", item)
    g.close()
f.close()
Thanks to all who helped.
As I understand it, you want to collate all 'mean.txt' files into one file. This should do the job, but beware that there is no guaranteed ordering of the files. Note also that I'm using StringIO() to buffer all the data, since strings are immutable in Python.
import os
from io import StringIO
def main():
    buffer = StringIO()
    for dirpath, dirnames, filenames in os.walk('.'):
        if 'mean.txt' in filenames:
            fp = os.path.join(dirpath, 'mean.txt')
            with open(fp) as f:
                buffer.write(f.read())
    all_file_contents = buffer.getvalue()
    print(all_file_contents)
if __name__ == '__main__':
    main()
Here's a pseudocode to help you get started. Try to google, read and understand the solutions to get better as a programmer:
open mean_combined.txt to write mean.txt contents
open sd_combined.txt to write sd.txt contents
for every subdir inside my_dir:
    for every file inside subdir:
        if file.name is 'mean.txt':
            content = read mean.txt
            write content into mean_combined.txt
        if file.name is 'sd.txt':
            content = read sd.txt
            write content into sd_combined.txt
close mean_combined.txt
close sd_combined.txt
You need to look up how to:
open a file to read its contents (hint: use open)
iterate files inside directory (hint: use pathlib)
write a string into a file (hint: read Input and Output)
use context managers for releasing resources (hint: read with statement)
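One possible translation of the pseudocode and hints above into runnable Python is sketched here. It is only an illustration: the function name combine, the demo folder layout and the sample values are assumptions, and it sorts the subfolders by name so that sub01 comes before sub02.

```python
import tempfile
from pathlib import Path


def combine(parent, wanted_name, combined_path):
    """Write every parent/<subdir>/wanted_name file into combined_path, in subfolder name order."""
    with open(combined_path, 'w') as out:
        # sorted() gives a stable alphabetical order (sub01, sub02, ...)
        for subdir in sorted(p for p in Path(parent).iterdir() if p.is_dir()):
            candidate = subdir / wanted_name
            if candidate.is_file():
                out.write(candidate.read_text())


# Hypothetical demo tree: work/sub01 and work/sub02, each with mean.txt and sd.txt.
demo_work = Path(tempfile.mkdtemp())
for sub, mean, sd in [('sub02', '2.0\n', '0.2\n'), ('sub01', '1.0\n', '0.1\n')]:
    (demo_work / sub).mkdir()
    (demo_work / sub / 'mean.txt').write_text(mean)
    (demo_work / sub / 'sd.txt').write_text(sd)

demo_out = Path(tempfile.mkdtemp())
combine(demo_work, 'mean.txt', demo_out / 'mean_combined.txt')
combine(demo_work, 'sd.txt', demo_out / 'sd_combined.txt')
print((demo_out / 'mean_combined.txt').read_text())
```

The with statement closes the combined file even if reading one of the inputs fails, which covers the last hint in the list.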

How to move in and out of folders in python

I'm a rookie at programming and I'm trying to make a program in Python that opens a text file with a bunch of columns and writes the data to 3 different text files based on a string in each row. As my program stands right now, it changes the directory to a specific output folder using os.chdir so it can open my text file, but what I want is for it to do something like this:
Imagine a folder set up like this :
Source Folder contains N number of folders. Each of those folders contains N number of output folders. Each output folder contains 1 Results.txt.
The idea is to have the program start at the source folder, look into Folder 1, look for output 1, open the .txt file, then do its thing. Once it's done, it should go back to Folder 1, open output 2 and do its thing again. Then it should go back to Folder 1 and, if it can't find any more output folders, it needs to go to Folder A and then enter Folder 2 and repeat the process until there are no more folders. Honestly, I'm not sure where to start with this. The best I could do is a small program that prints all my .txt files, but I'm not sure how to open them. I hope my question makes sense, and thanks for the help.
If all you need is to process each file in a directory recursively:
import os
def process_dir(dir):
    for subdir, dirs, files in os.walk(dir):
        for file in files:
            file_path = os.path.join(subdir, file)
            print(file_path)
            # process file here
This will process each file in the root dir recursively. If you're looking for conditional iteration you might need to make the loop a little smarter.
Store the base folder path in a variable, move into a sub-folder and process its text file, then use chdir with the base path to go back and read the next sub-folder. Listing the sub-folders looks like this:
import os
dirlist = os.listdir(os.getcwd())
dirlist = filter(lambda x: os.path.isdir(x), dirlist)
for dirname in dirlist:
    print(os.path.join(os.getcwd(), dirname, 'Results.txt'))
First, I think you could format your question for better readability.
Concerning your question, here's a naïve implementation example:
import os
where = "J:/tmp/"
what = "Results.txt"
def processpath(where, name):
    for elem in os.listdir(where):
        elempath = os.path.join(where, elem)
        if elem == name:
            # Do something with your file
            with open(elempath, "w") as f: # example
                f.write("modified2") # example
        elif os.path.isdir(elempath):
            processpath(elempath, name)
processpath(where, what)
I would do this without chdir. The most straightforward solution to me is to use os.listdir and filter the results, then use os.path.join to construct complete relative paths instead of calling chdir. I suspect this would be less prone to bugs such as winding up in an unexpected current working directory where all your relative paths are then wrong.
import os
import re
nfolders = [d for d in os.listdir(".") if re.match("^Folder [0-9]+$", d)]
for f1 in nfolders:
    noutputs = [d for d in os.listdir(f1) if re.match("^Output [0-9]+$", d)]
    for f2 in noutputs:
        resultsFilename = os.path.join(f1, f2, "results.txt")
        #do whatever with resultsFilename
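To make the "no chdir" idea concrete, here is a short sketch that finds every Results.txt under a source folder and opens each one by its full path. The function name find_results, the folder names and the file contents are assumptions for the demo; the real processing would go where the first line is read.

```python
import os
import tempfile


def find_results(source, target_name='Results.txt'):
    """Return the sorted paths of every file named target_name under source."""
    matches = []
    for root, dirs, files in os.walk(source):
        if target_name in files:
            matches.append(os.path.join(root, target_name))
    return sorted(matches)


# Hypothetical demo tree: Source/Folder N/Output N/Results.txt
demo_source = tempfile.mkdtemp()
for folder, output in [('Folder 1', 'Output 1'), ('Folder 1', 'Output 2'), ('Folder 2', 'Output 1')]:
    out_dir = os.path.join(demo_source, folder, output)
    os.makedirs(out_dir)
    with open(os.path.join(out_dir, 'Results.txt'), 'w') as handle:
        handle.write('col1\tcol2\n')

demo_results = find_results(demo_source)
for path in demo_results:
    with open(path) as handle:  # open by full path; the working directory never changes
        first_line = handle.readline().rstrip('\n')
    print(os.path.relpath(path, demo_source), first_line)
```

Since every path is built with os.path.join, there is no "moving in and out" of folders to keep track of.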

Python - empty dirs & subdirs after a shutil.copytree function

This is part of a program I'm writing. The goal is to extract all the GPX files, say at G:\ (specified with -e G:\ at the command line). It would create an 'Exports' folder and dump all files with matching extensions there, recursively that is. Works great, a friend helped me write it!! Problem: empty directories and subdirectories for dirs that did not contain GPX files.
import argparse, shutil, os
def ignore_list(path, files): # This ignore list is specified in the function below.
    ret = []
    for fname in files:
        fullFileName = os.path.normpath(path) + os.sep + fname
        if not os.path.isdir(fullFileName) \
           and not fname.endswith('gpx'):
            ret.append(fname)
        # This isn't doing what it's supposed to.
        elif os.path.isdir(fullFileName) \
             and len(os.listdir(fullFileName)) == 0:
            ret.append(fname)
    return ret
def gpxextract(src,dest):
    shutil.copytree(src,dest,ignore=ignore_list)
Later in the program we have the call for extractpath():
if args.extractpath:
    path = args.extractpath
    gpxextract(extractpath, 'Exports')
So the above extraction does work. But the len function call above is designed to prevent the creation of empty dirs and does not. I know the best way is to os.rmdir somehow after the export, and while there's no error, the folders remain.
So how can I successfully prune this Exports folder so that only dirs with GPXs will be in there? :)
If I understand you correctly, you want to delete empty folders? If that is the case, you can do a bottom-up delete folder operation, in which os.rmdir will fail for any folders that are not empty. Something like:
import os
for root, dirs, files in os.walk('G:/', topdown=False):
    for dn in dirs:
        pth = os.path.join(root, dn)
        try:
            os.rmdir(pth)
        except OSError:
            pass
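A self-contained sketch of that bottom-up prune, run against a temporary tree standing in for the 'Exports' folder, might look like this. The function name prune_empty_dirs and the demo folder names are assumptions made for illustration.

```python
import os
import tempfile


def prune_empty_dirs(top):
    """Remove every empty directory below top, deepest first; return the removed paths."""
    removed = []
    # topdown=False visits children before parents, so a folder that only
    # contained empty folders is itself empty by the time it is reached.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs:
            path = os.path.join(root, name)
            try:
                os.rmdir(path)  # raises OSError if the directory is not empty
                removed.append(path)
            except OSError:
                pass
    return removed


# Hypothetical demo tree standing in for the 'Exports' folder.
demo_exports = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_exports, 'trips', '2020'))
os.makedirs(os.path.join(demo_exports, 'empty', 'nested'))
with open(os.path.join(demo_exports, 'trips', '2020', 'ride.gpx'), 'w') as handle:
    handle.write('<gpx/>')

demo_removed = prune_empty_dirs(demo_exports)
print(sorted(os.path.relpath(p, demo_exports) for p in demo_removed))
```

Running it on the copied folder rather than on the source drive keeps the original directory layout untouched.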
