Skip a certain folder when using os.walk - python

Here is my code:
import os

rootdir_path_without_slash = '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app'
rootdir_path_with_slash = '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app/'
dir_src = rootdir_path_with_slash
for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    for file in files:
        file_name = os.path.join(subdir, file)
        if file_name.endswith('.html'):
            print file_name
This code walks all subdirectories of the given source directory looking for .html files. I need to skip the node_modules folder whenever it is found. Please help me.

You'll need to put an if condition on the root directory, to avoid traversing node_modules or any of its descendants. You'll want:
for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    if 'node_modules' in subdir:
        continue
    ...  # rest of your code
Also, subdir here is a misnomer; the first value os.walk yields is the root path of the directory currently being visited.
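If you would rather stop os.walk from descending into node_modules at all (instead of just skipping its files as above), a minimal sketch under the same directory layout is to prune dirs in place:

import os

rootdir_path_without_slash = '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app'

for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    # Removing entries from dirs in place keeps os.walk from ever entering them.
    dirs[:] = [d for d in dirs if d != 'node_modules']
    for file in files:
        file_name = os.path.join(subdir, file)
        if file_name.endswith('.html'):
            print(file_name)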

Related

Python to go through multiple folders and process files inside them

I have multiple folders that contain about 5-10 files each. What I am trying to do is go to the next folder after finishing processing the files from the previous folder and start working on the new files. I have this code:
for root, dirs, files in os.walk("Training Sets"): #Path that contains folders
    for i in dirs: #if I don't have this, an error is shown in line 4 that path needs to be str and not list
        for file in i: #indexing files inside the folders
            path = os.path.join(i, files) #join path of the files
            dataset = pd.read_csv(path, sep='\t', header = None) #reading the files
            trainSet = dataset.values.tolist() #some more code
            editedSet = dataset.values.tolist() #some more code
            #rest of the code...
The problem is that it doesn't do anything. Not even printing if I add prints for debugging.
First off, be sure that you are in the correct top-level directory (i.e. the one containing "Training Sets"). You can check this with os.path.abspath(os.curdir). Otherwise, the code does nothing because it never finds the directory to walk.
os.walk does the directory walking for you. The key is understanding root (the path to the current directory), dirs (a list of subdirectories) and files (a list of files in the current directory). You don't actually need dirs.
So your code is two loops:
>>> for root, dirs, files in os.walk("New Folder1"): #Path that contains folders
...     for file in files: #indexing files inside the folders
...         path = os.path.join(root, file) #join path of the files
...         print(path) # Your code here
...
New Folder1\New folder1a\New Text Document.txt
New Folder1\New folder1b\New Text Document2.txt
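Adapting that shape to the question's "Training Sets" folder might look like the sketch below; the pd.read_csv arguments are taken from the question, and the per-file processing is left as a placeholder:

import os
import pandas as pd

for root, dirs, files in os.walk("Training Sets"):  # path that contains the folders
    for file in files:
        path = os.path.join(root, file)  # full path of each file
        dataset = pd.read_csv(path, sep='\t', header=None)  # read the file
        trainSet = dataset.values.tolist()
        editedSet = dataset.values.tolist()
        # rest of the processing goes here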

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os

# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))

# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?
Try this:
import os

files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just your current directory. os.listdir returns a list of all filenames in a directory, defaulting to your current directory, which is what you are looking for.
os.walk documentation
os.listdir documentation
By default, os.walk does a root-first traversal of the tree, so you know the first emitted data is the good stuff. So, just ask for the first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name:
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob

dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk that there is a directory that ends in ".log", so you could also add a check using os.path.isfile(file).
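A minimal sketch of that extra check, under the same assumptions as the snippet above:

import os
from glob import glob

dir_path = os.path.dirname(os.path.realpath(__file__))
for path in glob(os.path.join(dir_path, "*.log")):
    # Skip anything that matched the pattern but is not a regular file,
    # e.g. a directory whose name happens to end in ".log".
    if os.path.isfile(path):
        print(path)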

Select files from specific directories

I am trying to loop through a list of subdirectories, and perform two related operations:
Only select subdirectories that match a certain pattern, and save part of that name
Read a file in that subdirectory
I have tried adapting the answers in this question but am having trouble opening only certain subdirectories. I know I can do this recursively, where I loop through every file, and pull its parent directory using Path.parent, but this would also go into the directories I am not interested in.
My file structure looks like:
002normal
|- names.txt
|- test.txt
002custom
|- names.txt
|- test.txt
I would like only the directories ending in "normal". I'll then read the file named "names.txt" in that directory. I have tried something like the below, without luck.
import os

root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    for f in files:
        print(subdir)
You can modify the dirs list in-place to filter out any subdirectories with names not ending with 'normal' so that os.walk won't traverse into them:
for subdir, dirs, files in os.walk(root_dir):
    dirs[:] = (name for name in dirs if name.endswith('normal'))
    if 'names.txt' in files:
        with open(os.path.join(subdir, 'names.txt')) as file:
            print(os.path.basename(subdir), file.read())
Excerpt from the documentation of os.walk:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even
to inform walk() about directories the caller creates or renames
before it resumes walk() again.
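The slice assignment is the important part: rebinding the name with dirs = ... would create a new list that os.walk never sees, so nothing would be pruned. A small sketch of the difference, using the same root_dir as above:

for subdir, dirs, files in os.walk(root_dir):
    # This prunes: os.walk keeps using the same list object, so removing
    # entries here stops it from descending into those directories.
    dirs[:] = [name for name in dirs if name.endswith('normal')]

    # This would NOT prune: it rebinds the local name to a new list while
    # os.walk continues to iterate from the original, unmodified one.
    # dirs = [name for name in dirs if name.endswith('normal')]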
import os

root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    if subdir.endswith("normal"):
        for file in files:
            if file.startswith("names"):
                print(os.path.basename(subdir), file)
                # subdir already holds the full path, so join it with the file name
                with open(os.path.join(subdir, file), "r") as f:
                    print(f.read())
That's how you can do it with your file structure. First you check whether a subdirectory ends with "normal", and if it does you look at the files it contains. You also have to build the path to the file with os.path.join so that you can open and read it.
If you have subdirectories nested to unknown depth, os.walk still visits all of them, and as long as the directory that contains names.txt ends with "normal" it will be found.
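For instance, with a hypothetical deeper layout such as IM-logs/2023/002normal/names.txt, the same kind of loop still finds the file, because os.walk visits every level:

import os

root_dir = "/Users/adamg/IM-logs"  # path from the question
for subdir, dirs, files in os.walk(root_dir):
    # The name check works at any depth, e.g. IM-logs/002normal
    # as well as a hypothetical IM-logs/2023/002normal.
    if subdir.endswith("normal") and "names.txt" in files:
        with open(os.path.join(subdir, "names.txt")) as f:
            print(os.path.basename(subdir), f.read())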

Subfolder traversing using os.walk in python

I am trying to write Python code that traverses all subfolders in a parent folder and moves the subfolder contents to the parent folder. I used the os.walk function, but it keeps selecting the files in the parent folder as well. Is there a way to solve this?
for current_folder, subfolders_in_cwd, files_in_cwd in os.walk(some_dir_path):
    print "Folders in %s = %s" % (current_folder, subfolders_in_cwd)
import os

PATH = '/home/pi/Public/'
for root, folder, files in os.walk(PATH):
    files.sort(key=str.lower)  # Sorts files alphabetically
    folder.sort()              # Sorts subfolders alphabetically
    for item in files:
        fileNamePath = os.path.join(root, item)
        print(fileNamePath)
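The snippet above lists every file, including the ones directly under PATH. If the goal is to move only the files that live in subfolders up into the parent, one possible sketch (not the answer's code; it uses shutil.move and skips the iteration where root is the parent itself) is:

import os
import shutil

PATH = '/home/pi/Public/'
for root, folders, files in os.walk(PATH):
    # Skip the parent folder itself so its own files are left alone.
    if os.path.abspath(root) == os.path.abspath(PATH):
        continue
    for item in files:
        src = os.path.join(root, item)
        dst = os.path.join(PATH, item)
        # Note: files with the same name in different subfolders will overwrite each other.
        shutil.move(src, dst)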

find files if directory name exists within directory in python

I am trying to read some files which are inside specific directories.
I am using Python 2.7+. Here are my requirements:
Find a directory starting with B1234 (here 1234 is a number) inside the output folder
if that directory exists, go only into directories starting with TEST_
read only files ending with YYYY.txt
which can reside inside a subdirectory (its name is not important here)
I am trying to make the following code work, but in vain:
for root, dirs, files in os.walk('output'):
    for file in files:
        if file.startswith('test') and file.endswith('_YYYY.txt'):
            try:
                f = open(os.path.abspath(os.path.join(root, file)), 'r')
            except:
                print 'oops'
The problem is that I can find all the desired files, but also files from unwanted directories.
I would like to do something like this:
for dir in dirList:
    if dir.startswith('B1234'):
        for testdir in os.listdir(dir):
            if testdir.startswith('TEST_'):
                # go to that dir
                # search for the file
Any help would be appreciated. Please ask if you need more info.
I think you need to make the change here:
for root, dirs, files in os.walk('output'):
    for dir in dirs:
        if not dir.startswith('B1234'):
            continue
        # pass the full path, not just the directory name
        for r, drs, fls in os.walk(os.path.join(root, dir)):
            # now write your code here
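Putting the requirements together, a minimal sketch (not the answer's code) might look like the following; it assumes the layout described in the question and borrows the '_YYYY.txt' suffix from the question's own snippet:

import os

output_dir = 'output'

for name in os.listdir(output_dir):
    b_dir = os.path.join(output_dir, name)
    # 1) only directories whose names start with B1234
    if not (os.path.isdir(b_dir) and name.startswith('B1234')):
        continue
    for sub in os.listdir(b_dir):
        test_dir = os.path.join(b_dir, sub)
        # 2) below them, only directories whose names start with TEST_
        if not (os.path.isdir(test_dir) and sub.startswith('TEST_')):
            continue
        # 3) the file can sit anywhere below the TEST_ directory
        for r, drs, fls in os.walk(test_dir):
            for f in fls:
                if f.endswith('_YYYY.txt'):
                    with open(os.path.join(r, f), 'r') as fh:
                        print(fh.read())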
