Traversing a file-system directory structure - python

I am trying to do some work within some files in a directory. The basic structure of what I'm trying to work with is folder -> sub-folders -> files I need to access. data holds hundreds of subfolders, I am trying to access each one, find the file within them that ends in 'params', and for now just read the contents. My code is below:
import os
for sub_folder in os.scandir('data'):
    os.chdir(sub_folder)
    for file in os.scandir(sub_folder):
        print(file.name)
        if(file.name.endswith('params')):
            with open(file.name, 'r') as f:
                data = f.read()
I'm getting a FileNotFoundError, where it's telling me that the path 'data\\\run.0' doesn't exist. I have confirmed that 'run.0' is the first sub folder within data, so where I'm confused is how the path doesn't actually exist.
I know the error is happening when I attempt to change directories, so I'm suspecting the way that I am traversing the data folder is not a correct way of doing so. I understand that os.scandir gives a DirEntry object, which is what the variable sub_folder will be but is this not a valid input for the change directory function?

You can use os.walk, but I prefer to use glob. See "How to use Glob() function to find files recursively in Python?"
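A likely cause of the error in the question, assuming the layout is data/<sub-folder>/<file>: a DirEntry is accepted by os.chdir, but after that call the working directory is already data\run.0, so the relative path data\run.0 passed to the inner os.scandir is resolved inside that folder and does not exist. The sketch below skips os.chdir and opens each file through its full path instead. The function name read_params_files is hypothetical.

```python
import os

def read_params_files(root):
    """Return {path: contents} for files ending in 'params' one level below root."""
    contents = {}
    for sub_folder in os.scandir(root):
        if not sub_folder.is_dir():
            continue  # skip stray files sitting directly in root
        for entry in os.scandir(sub_folder.path):
            if entry.is_file() and entry.name.endswith('params'):
                # entry.path already contains root and the sub-folder name,
                # so the working directory never has to change
                with open(entry.path, 'r') as f:
                    contents[entry.path] = f.read()
    return contents
```

Calling read_params_files('data') from the folder that contains data should then read every matching file. With glob, the same set of files could be listed with glob.glob(os.path.join('data', '*', '*params')).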

Related

Modify all files in a specified directory (including subfolders) and save them in a new directory while preserving the folder structure (Python)

(I'm new to Python, so please excuse the probably trivial question. I tried my best to look for similar issues but surprisingly couldn't find anyone with the same question.)
I'm trying to build a simple static site generator in Python. The script should take all .txt files in a specific directory (including subfolders), paste the content of each into a template .html file, and then save all the newly generated .html files into a new directory while recreating the folder structure of the original directory.
So far I have code that does the conversion for a single file, but I'm unsure how to do it for multiple files in a directory.
with open('template/page.html', 'r') as template:
    templatedata = template.read()
with open('content/content.txt', 'r') as content:
    contentdata = content.read()
pagedata = templatedata.replace('!PlaceholderContent!', contentdata)
with open('www/content.html', 'w') as output:
    output.write(pagedata)
To manipulate files and directories, you will need some of the system functionality in the built-in os module.
import os
The functionality in the os module includes:
Listing the contents of a directory:
path_to_template_dir = 'template/'
template_files = os.listdir(path_to_template_dir)
print(template_files)
# Outputs : ['page.html']
Creating a directory (if it does not already exist):
path_to_output_dir = 'www/'
try:
    os.mkdir(path_to_output_dir)
except FileExistsError as e:
    print('Directory exists:', path_to_output_dir)
You already know the names of the directories you want to use, and these two functions give you the names of the files you want to read and generate. You can therefore concatenate each file name with the name of its directory to build the final file path as a string, which you can then pass to open() for reading and/or writing.
It's hard to give a perfect code example for your question, since the logic of how you want to process each template and content file is missing, but here is an example of writing a file inside the newly created directory:
path_to_output_file = path_to_output_dir + 'content.html'
with open(path_to_output_file, 'w') as output:
    output.write('Content')
And an example for reading all the template files inside the template/ directory and then printing them to the screen.
for template_file in template_files:
    path_to_template_file = path_to_template_dir + template_file
    with open(path_to_template_file, 'r') as template:
        print(template.read())
In the end, manipulating files is all about creating the path string you want to read from or write to, and then accessing it.
Any other functionality you might need (for example, checking whether a path is a file with os.path.isfile() or a directory with os.path.isdir()) can be found in the os module.
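Putting these pieces together for the question as asked, here is a minimal sketch that walks the content directory with os.walk, fills the template for every .txt file, and recreates the sub-folders under the output directory. The function name build_site and its parameters are hypothetical; the placeholder string is the one from the question, and the template is assumed to be a single file.

```python
import os

def build_site(content_dir, template_path, output_dir,
               placeholder='!PlaceholderContent!'):
    """Render every .txt file under content_dir into output_dir as .html."""
    with open(template_path, 'r') as template:
        template_data = template.read()

    written = []
    for dirpath, _dirnames, filenames in os.walk(content_dir):
        # Folder of the current files relative to the content root
        rel_dir = os.path.relpath(dirpath, content_dir)
        target_dir = os.path.normpath(os.path.join(output_dir, rel_dir))
        for filename in filenames:
            if not filename.endswith('.txt'):
                continue
            with open(os.path.join(dirpath, filename), 'r') as content:
                page = template_data.replace(placeholder, content.read())
            os.makedirs(target_dir, exist_ok=True)  # recreate the sub-folder
            out_name = filename[:-len('.txt')] + '.html'
            out_path = os.path.join(target_dir, out_name)
            with open(out_path, 'w') as output:
                output.write(page)
            written.append(out_path)
    return written
```

With the paths from the question, the call would be build_site('content', 'template/page.html', 'www'). os.makedirs with exist_ok=True is used instead of os.mkdir so that nested sub-folders are created in one step and existing ones are not an error.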

Different File Paths in Python ZipFile Depending on .write() vs .writestr()

I just wanted to ask quickly if the behavior I'm seeing in Python's zipfile module is expected... I wanted to put together a zip archive. For reasons I don't think I need to get into, I was adding some files using zipfile.writestr() and others using .write(). I was writing some files to zip subdirectory called /scripts and others to a zip subdirectory called /data.
For /data, I originally did this:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        zipFile.write(name, f'/data/{root_name}')
This worked fine and produced a working archive that I could extract. So far, so good. To write text files to the /script subdirectory, I used:
zipFile.writestr(f'/script/{scriptname}', fileBytes)
Again, so far so good.
Now it gets odd... I wanted to extract files in /data/. So I looked for paths in zipFile.namelist() starting with /data. My code kept missing the files in /data/, however. Doing some more digging, I noticed that the files written using .writestr had a slash at the start of the zipfile path like this: "/scripts/myscript.py". The files written using .write did not have a slash at the start of the path, so the data file paths looked like this: "data/mydata.pickle".
I changed my code to use .writestr() for the data files:
for root, _, filenames in os.walk(tmpdirname):
    for root_name in filenames:
        print(f"Handle zip of {root_name}")
        name = os.path.join(root, root_name)
        name = os.path.normpath(name)
        with open(name, mode='rb') as extracted_file:
            zipFile.writestr(f'/data/{root_name}', extracted_file.read())
Voila, the data files now have slashes at the start of the path. I'm not sure why, however, as I'm providing the same file path either way, and I wouldn't expect using one method versus another would change the paths.
Is this supposed to work this way? Am I missing something obvious here?
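The difference can be reproduced in a few lines. In CPython's zipfile implementation, as far as I can tell, .write() normalizes the arcname and strips leading path separators, while .writestr() stores a string name as given. The sketch below shows this with an in-memory archive; the function name archive_names and the file names are made up for illustration.

```python
import io
import os
import tempfile
import zipfile

def archive_names():
    """Add one member with write() and one with writestr(), then list the names."""
    buffer = io.BytesIO()
    with tempfile.TemporaryDirectory() as tmpdirname:
        source = os.path.join(tmpdirname, 'mydata.txt')
        with open(source, 'w') as f:
            f.write('data')
        with zipfile.ZipFile(buffer, 'w') as zip_file:
            # write(): the arcname is normalized, leading slash removed
            zip_file.write(source, '/data/mydata.txt')
            # writestr(): the name is stored as passed in
            zip_file.writestr('/script/myscript.py', 'pass')
            return zip_file.namelist()

print(archive_names())
```

If consistent names are wanted, one option is to drop the leading slash yourself before calling either method, for example with f'data/{root_name}' or name.lstrip('/'), so both methods produce relative member names.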

Check if there are .format files in a directory

I have been trying to figure out for a while how to check if there are .pkl files in a given directory. I checked the website and I could find ways to find if there are files in the directory and list them, but I just want to check if they are there.
In my directory are a total of 7 .pkl files, as soon as I create one, the others are created so to check if the seven of them exist, it will be enough to check if one exists. Therefore, I would like to check if there is any .pkl file.
This is working if I do:
os.path.exists('folder1/folder2/filename.pkl')
But I had to write one of my file names. I would like to do so without searching for a specific file. I also tried
os.path.exists('folder1/folder2/*.pkl'),
but that does not work either, as I don't have any file literally named *.pkl.
You can use the python module glob (https://docs.python.org/3/library/glob.html)
Specifically, glob.glob('folder1/folder2/*.pkl') will return a list of all .pkl files in folder2.
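Since the question only needs a yes/no answer, the list itself can be skipped. A minimal sketch, assuming a hypothetical helper name has_pkl_files:

```python
import glob
import os

def has_pkl_files(directory):
    """Return True if directory contains at least one .pkl file."""
    # glob.escape keeps characters such as '[' in the directory name literal
    pattern = os.path.join(glob.escape(directory), '*.pkl')
    # iglob yields matches lazily, so any() can stop at the first hit
    return any(True for _ in glob.iglob(pattern))
```

For the folder in the question the call would be has_pkl_files('folder1/folder2'). The shorter bool(glob.glob('folder1/folder2/*.pkl')) gives the same answer but builds the full list first.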
You can use:
import os
for dir_path, dir_names, file_names in os.walk(search_dir):
    # Go over all files in the current directory
    for file_name in file_names:
        if file_name.endswith(".pkl"):
            # Do something here, for example report the match
            print(os.path.join(dir_path, file_name))
Note: this approach also searches all subdirectories of search_dir.
If you want to search only one directory, loop over os.listdir(path) instead.

Getting the Folder Path of the last location I right clicked in Python

I'm using glob.glob to search a folder, and the sub-folders therein, for all the invoices I have. To simplify that, I'm going to add the program to the context menu and have it take the folder path as the first part of the pattern:
import glob
for filename in glob.glob(path + "/**/*.pdf", recursive=True):
print(filename)
I'll have it keep the list and send those files to a Printer, in a later version, but for now just writing the name is a good enough test.
So my question is twofold:
Is there anything fundamentally wrong with the way I'm writing this?
Can anyone point me in the direction of how to actually capture folder path and provide it as path-variable?
You should have a look at this question: Python script on selected file. It shows how to set up a "Send To" command in the context menu. This command calls a Python script and provides the selected file name via sys.argv[1]. I assume that also works for a directory.
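On the script side, picking up that path could look like the sketch below. It assumes the context-menu entry passes the selected folder as the first command-line argument; the function name find_pdfs is hypothetical.

```python
import glob
import os
import sys

def find_pdfs(path):
    """Return all .pdf files in path and its sub-folders."""
    pattern = os.path.join(glob.escape(path), '**', '*.pdf')
    return sorted(glob.glob(pattern, recursive=True))

if __name__ == '__main__':
    # Fall back to the current directory when no argument was passed
    start = sys.argv[1] if len(sys.argv) > 1 else '.'
    for filename in find_pdfs(start):
        print(filename)
```

os.path.join is used instead of path + "/**/*.pdf" so that a trailing separator in the selected folder does not matter.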
I do not have Python 3.5, which is required for the recursive=True flag, so I prefer to give you a solution that runs on any Python version.
The solution uses os.walk() to explore the directories, together with the built-in set type.
A set is better than a list here because with a list you would need extra code to check whether the directory you want to add is already listed.
So basically you can keep two sets: one for the names of the files you want to print and one for the directories and their sub-folders.
You can adapt this solution to your class or method:
import os
path = '.' # Any path you want
exten = '.pdf'
directories_list = set()
files_list = set()
# Loop over directories
for dirpath, dirnames, files in os.walk(path):
    for name in files:
        # Check if extension matches
        if name.lower().endswith(exten):
            files_list.add(name)
            directories_list.add(dirpath)
You can then loop over directories_list and files_list to print them out.

How to copy all files with a particular prefix to new directories with python?

Suppose I have the following directory structure:
C:\Test
C:\Test\2009
C:\Test\2009\files\Artists
C:\Test\2009\files\Artists\SnoopDog
C:\Test\2009\files\Artists\SnoopDog\albums.txt
C:\Test\2009\files\Artists\SnoopDog\albums.jpg
C:\Test\2009\files\Artists\SnoopDog\hobbies.doc
C:\Test\2009\files\Artists\SmashMouth\albums.txt
C:\Test\2009\files\Artists\SmashMouth\hobbies.doc
C:\Test\2010\files\Artists\SnoopDog\albums.txt
C:\Test\2010\files\Artists\SnoopDog\albums.jpg
C:\Test\2010\files\Artists\SnoopDog\hobbies.doc
The following is the directory structure I want as a goal:
C:\ToDirectory\
C:\ToDirectory\2009\albums\SnoopDog_albums.txt
C:\ToDirectory\2009\albums\SnoopDog_albums.jpg
C:\ToDirectory\2009\albums\SmashMouth_albums.txt
C:\ToDirectory\2009\albums\SmashMouth_albums.jpg
C:\ToDirectory\2009\hobbies\SmashMouth_hobbies.doc
C:\ToDirectory\2009\hobbies\SnoopDog_hobbies.doc
C:\ToDirectory\2010\albums\SnoopDog_albums.txt
C:\ToDirectory\2010\albums\SnoopDog_albums.jpg
Assuming C:\Test contains all the files and C:\ToDirectory starts off as being an empty directory.
What is the most efficient way to write a Python function that takes a source directory (C:\Test) and a target directory (C:\ToDirectory) and does the following? It should descend to the lowest level of C:\Test, go through each file, and check whether the filename (ignoring the extension) already exists as a directory in the ToDirectory structure. If not, it should create that directory, then copy the file into it with the name of the file's parent directory prepended to the filename.
I am using os.listdir and os.path.isdir in a series of nested loops. It does the job, but the code is very lengthy and seems inefficient.
Try Python's high-level file operations library (import shutil).
Also consider using Python's directory-walking function (from os import walk).
You shouldn't need to copy the files; you can just rename the directories and filenames.
You can get shorter code by using recursion instead of explicitly nested loops.
Recursion is not always more efficient, but your code will look a lot better!
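As one possible reading of the layout in the question, here is a sketch that combines os.walk and shutil. It assumes the first folder below the source is the year, the folder that directly contains a file is the artist, and the target directory lies outside the source tree. The function name regroup_files is hypothetical, and it copies rather than renames so the source stays untouched.

```python
import os
import shutil

def regroup_files(source_dir, target_dir):
    """Copy files to target_dir/<year>/<file stem>/<artist>_<filename>."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        rel_dir = os.path.relpath(dirpath, source_dir)
        if rel_dir == os.curdir or not filenames:
            continue  # nothing to copy at the top level or in empty folders
        year = rel_dir.split(os.sep)[0]      # e.g. '2009'
        artist = os.path.basename(dirpath)   # e.g. 'SnoopDog'
        for filename in filenames:
            stem = os.path.splitext(filename)[0]  # e.g. 'albums'
            dest_dir = os.path.join(target_dir, year, stem)
            os.makedirs(dest_dir, exist_ok=True)  # create only if missing
            dest = os.path.join(dest_dir, f"{artist}_{filename}")
            shutil.copy2(os.path.join(dirpath, filename), dest)
            copied.append(dest)
    return copied
```

For the example tree, regroup_files(r'C:\Test', r'C:\ToDirectory') would place C:\Test\2009\files\Artists\SnoopDog\albums.txt at C:\ToDirectory\2009\albums\SnoopDog_albums.txt.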
