Select files from specific directories - python

I am trying to loop through a list of subdirectories, and perform two related operations:
Only select subdirectories that match a certain pattern, and save part of that name
Read a file in that subdirectory
I have tried adapting the answers in this question but am having trouble opening only certain subdirectories. I know I can do this recursively, where I loop through every file, and pull its parent directory using Path.parent, but this would also go into the directories I am not interested in.
My file structure looks like:
002normal
|- names.txt
|- test.txt
002custom
|- names.txt
|- test.txt
I would like only the directories ending in "normal". I'll then read the file named "names.txt" in that directory. I have tried something like the below, without luck.
import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    for f in files:
        print(subdir)

You can modify the dirs list in-place to filter out any subdirectories with names not ending with 'normal' so that os.walk won't traverse into them:
for subdir, dirs, files in os.walk(root_dir):
    dirs[:] = (name for name in dirs if name.endswith('normal'))
    if 'names.txt' in files:
        with open(os.path.join(subdir, 'names.txt')) as file:
            print(os.path.basename(subdir), file.read())
Excerpt from the documentation of os.walk:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even
to inform walk() about directories the caller creates or renames
before it resumes walk() again.
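The question also asks to save part of the directory name. A minimal sketch of that on top of the pruning idea above, assuming the part to keep is whatever precedes the "normal" suffix (for example "002"); the function name read_names_by_prefix and its parameters are hypothetical, not part of the original answer:

```python
import os


def read_names_by_prefix(root_dir, suffix="normal", filename="names.txt"):
    """Map each matching directory's prefix (its name minus the suffix) to the file's text."""
    results = {}
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Prune in place so os.walk never descends into non-matching directories.
        dirnames[:] = [d for d in dirnames if d.endswith(suffix)]
        base = os.path.basename(dirpath)
        if base.endswith(suffix) and filename in filenames:
            with open(os.path.join(dirpath, filename)) as fh:
                # Key by the part of the name before the suffix, e.g. "002".
                results[base[:-len(suffix)]] = fh.read()
    return results
```

Calling read_names_by_prefix("/Users/adamg/IM-logs") on the structure from the question would give a dict with one entry per "...normal" directory.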

import os
root_dir = "/Users/adamg/IM-logs"
for subdir, dirs, files in os.walk(root_dir):
    # Only look at directories whose path ends with "normal"
    if subdir.endswith("normal"):
        for file in files:
            if file.startswith("names"):
                print(os.path.basename(subdir), file)
                # subdir already includes root_dir, so join it with the file name only
                with open(os.path.join(subdir, file), "r") as f:
                    print(f.read())
That's how you can do it with your file structure. First check whether the directory path ends with "normal"; if it does, look for the file and read its contents. You also have to build the path to the file with os.path.join so that it can be opened.
Because os.walk already descends through subdirectories of any depth, this also works for nested directories, as long as the directory that contains names.txt ends with "normal".
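If the "...normal" directories all sit directly under the root, as in the question's layout, a glob pattern avoids walking the tree at all. This is a sketch of an alternative technique (pathlib's Path.glob), not what either answer uses; the function name normal_name_files is hypothetical:

```python
from pathlib import Path


def normal_name_files(root_dir):
    """Return names.txt paths that sit directly inside directories ending in 'normal'."""
    # "*normal" matches only immediate children of root_dir, so other
    # directories are never opened.
    return sorted(Path(root_dir).glob("*normal/names.txt"))
```

Each returned path's parent.name is the directory name, so the leading part can be sliced off from there.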

Related

Python: Finding files in directory but ignoring folders and their contents

So my program search_file.py is trying to look for .log files in the directory it is currently placed in. I used the following code to do so:
import os
# This is to get the directory that the program is currently running in
dir_path = os.path.dirname(os.path.realpath(__file__))
# for loop is meant to scan through the current directory the program is in
for root, dirs, files in os.walk(dir_path):
    for file in files:
        # Check if file ends with .log, if so print file name
        if file.endswith('.log'):
            print(file)
My current directory is as follows:
search_file.py
sample_1.log
sample_2.log
extra_file (this is a folder)
And within the extra_file folder we have:
extra_sample_1.log
extra_sample_2.log
Now, when the program runs and prints the files out it also takes into account the .log files in the extra_file folder. But I do not want this. I only want it to print out sample_1.log and sample_2.log. How would I approach this?
Try this:
import os
files = os.listdir()
for file in files:
    if file.endswith('.log'):
        print(file)
The problem in your code is that os.walk traverses the whole directory tree, not just your current directory. os.listdir returns a list of the names of all entries in a single directory (the current directory by default), which is what you are looking for.
os.walk documentation
os.listdir documentation
By default, os.walk does a root-first traversal of the tree, so you know the first emitted data is the good stuff. So, just ask for the first one. And since you don't really care about root or dirs, use _ as the "don't care" variable name
# get root files list.
_, _, files = next(os.walk(dir_path))
for file in files:
    # Check if file ends with .log, if so print file name
    if file.endswith('.log'):
        print(file)
It's also common to use glob:
import os
from glob import glob
dir_path = os.path.dirname(os.path.realpath(__file__))
for file in glob(os.path.join(dir_path, "*.log")):
    print(file)
This runs the risk of matching a directory whose name ends in ".log", so you could also add a check using os.path.isfile(file).
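The glob approach combined with the os.path.isfile check could look like the sketch below. The function name log_files_in is hypothetical; glob.escape is used on the assumption that the directory path itself may contain glob metacharacters such as "[":

```python
import glob
import os


def log_files_in(dir_path):
    """Return the .log files directly inside dir_path, ignoring subfolders."""
    # Escape the directory part so only "*.log" is treated as a pattern.
    pattern = os.path.join(glob.escape(dir_path), "*.log")
    # isfile() drops any directory that happens to be named like "something.log".
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
```

Unlike os.listdir, this returns full paths; os.path.basename recovers the bare file name when only that is wanted.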

Run the for loop for each file in directory using Python

I want to run a for loop in Python over each file in a directory. The directory names will be passed in through a separate file (folderlist.txt).
Inside my main folder (/user/), new folders get added daily. I want to run the for loop over each file in a given folder, but not over folders whose files have already been processed. I'm thinking of maintaining folderlist.txt so that it holds only the names of the folders newly added each day, and then passing those names to the for loop.
For example under my main path (/user/) we see below folders :
(file present inside each folder are listed below folder name just to give the idea)
(day 1)
folder1
file1, file2, file3
folder2
file4, file5
folder3
file6
(day 2)
folder4
file7, file8, file9, file10
folder5
file11, file12
import os
with open('/user/folderlist.txt') as f:
    for line in f:
        line=line.strip("\n")
        dir='/user/'+line
        for files in os.walk (dir):
            for file in files:
                print(file)
        # for filename in glob.glob(os.path.join (dir, '*.json')):
        #     print(filename)
I tried using the os.walk and glob modules in the above code, but it looks like the loop runs more times than there are files in the folder. Any suggestions would be appreciated.
Try replacing os.walk(dir) with os.listdir(dir). This will give you a list of all the entries in the directory.
import os
with open('/user/folderlist.txt') as f:
    for line in f:
        line = line.strip("\n")
        dir = '/user/' + line
        for file in os.listdir(dir):
            if file.endswith("fileExtension"):
                print(file)
Hope it helps
Help on function walk in module os:
walk(top, topdown=True, onerror=None, followlinks=False)
    Directory tree generator.
    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), yields a 3-tuple
        dirpath, dirnames, filenames
    dirpath is a string, the path to the directory. dirnames is a list of
    the names of the subdirectories in dirpath (excluding '.' and '..').
    filenames is a list of the names of the non-directory files in dirpath.
    Note that the names in the lists are just names, with no path components.
    To get a full path (which begins with top) to a file or directory in
    dirpath, do os.path.join(dirpath, name).
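Therefore, in your code files is bound to each 3-tuple in turn, and the second loop iterates over that tuple's three elements: dirpath (a string), dirnames (a list) and filenames (a list), rather than over individual files.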
Using os.listdir(dir) gives a list of all the files and folders in the dir as list.
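Since os.listdir returns folders as well as files, a variant that yields only regular files may be closer to what the question needs. The sketch below uses os.scandir for that; the function name files_in_listed_folders and both parameters are hypothetical, and it assumes folderlist.txt holds one folder name per line:

```python
import os


def files_in_listed_folders(base_dir, list_path):
    """Yield the full path of every regular file directly inside each listed folder."""
    with open(list_path) as fh:
        for line in fh:
            name = line.strip()
            if not name:
                continue  # skip blank lines in the list file
            folder = os.path.join(base_dir, name)
            with os.scandir(folder) as entries:
                for entry in entries:
                    # is_file() filters out subfolders, which os.listdir would include.
                    if entry.is_file():
                        yield entry.path
```

With the paths from the question this would be used as: for path in files_in_listed_folders('/user/', '/user/folderlist.txt'): print(path)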

Finding duplicate folders and renaming them by prefixing parent folder name in python

I have a folder structure as shown below
There are several subfolders with duplicate names. When a duplicate subfolder name is encountered, it should be prefixed with the parent folder's name.
e.g.
DIR2>SUBDIR1 should be renamed to DIR2>DIR2_SUBDIR1. When the folder is renamed to DIR2_SUBDIR1, the files inside it should also get the same parent-folder prefix.
e.g. DIR2>SUBDIR1>subdirtst2.txt should now become DIR2>DIR2_SUBDIR1>DIR2_subdirtst2.txt
What have I done so far?
I have simply added all the folder names to a list; beyond that I can't figure out an elegant way to do this task.
import os
list_dir=[]
for root, dirs, files in os.walk(os.getcwd()):
    for file in files:
        if file.endswith(".txt"):
            path_file = os.path.join(root)
            print(path_file)
            list_dir.append(path_file)
The following snippet should be able to achieve what you desire. I've written it in a way that clearly shows what is being done, so I'm sure there might be tweaks to make it more efficient or elegant.
import os
cwd = os.getcwd()
to_be_renamed = set()
for rootdir in next(os.walk(cwd))[1]:
    if to_be_renamed == set():
        to_be_renamed = set(next(os.walk(os.path.join(cwd, rootdir)))[1])
    else:
        to_be_renamed &= set(next(os.walk(os.path.join(cwd, rootdir)))[1])
for rootdir in next(os.walk(cwd))[1]:
    subdirs = next(os.walk(os.path.join(cwd, rootdir)))[1]
    for s in subdirs:
        if s in to_be_renamed:
            srcpath = os.path.join(cwd, rootdir, s)
            dstpath = os.path.join(cwd, rootdir, rootdir+'_'+s)
            # First rename files
            for f in next(os.walk(srcpath))[2]:
                os.rename(os.path.join(srcpath, f), os.path.join(srcpath, rootdir+'_'+f))
            # Now rename dir
            os.rename(srcpath, dstpath)
            print('Renamed', s, 'and files')
Here, cwd stores the path to the dir that contains DIR1, DIR2 and DIR3. The first loop checks all immediate subdirectories of these 'root directories' and creates a set of duplicated subdirectory names by repeatedly taking their intersection (&).
Then it runs another loop, checks if the subdirectory is to be renamed and finally uses the os.rename function to rename it and all the files it contains.
os.walk() yields a 3-tuple (the path to the directory, the directories in it, and the files in it) at each step. It 'walks' the tree in either a top-down or bottom-up manner, and doesn't stop after one iteration.
So, the built-in next() function is used to fetch only the first result (that of the top directory), after which [1] or [2] selects the directories or the files respectively.
If you want to rename not just files, but all items in the subdirectories being renamed, then replace next(os.walk(srcpath))[2] with os.listdir(srcpath). This list contains both files and directories.
NOTE: The reason I'm computing the list of duplicated names first in a separate loop is so that the first occurrence is not left unchanged. Renaming in the same loop will miss that first one.
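The intersection in the first loop keeps only names that occur under every top-level folder. If "duplicate" should instead mean "occurs under at least two folders", counting is one option. The sketch below is that swapped-in technique, not the answer's method; the function name duplicated_subdir_names is hypothetical and it only computes the set of names, leaving the renaming loop as above:

```python
import os
from collections import Counter


def duplicated_subdir_names(top):
    """Return subfolder names that occur under more than one top-level folder."""
    counts = Counter()
    for parent in next(os.walk(top))[1]:
        # Count each immediate subfolder name once per parent folder.
        counts.update(next(os.walk(os.path.join(top, parent)))[1])
    return {name for name, n in counts.items() if n > 1}
```

The result could be used in place of to_be_renamed in the second loop of the snippet above.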

Skip a certain folder when using os.walk

Here is my code:
import os
rootdir_path_without_slash = '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app'
rootdir_path_with_slash = '/home/winpc/Downloads/Prageeth/backups/Final/node-wen-app/'
dir_src = rootdir_path_with_slash
for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    for file in files:
        file_name = os.path.join(subdir, file)
        if file_name.endswith('.html'):
            print(file_name)
This code walks through all the subdirectories of the given source directory, searching for .html files. I need it to skip the node_modules folder whenever it is found. Please help me.
You'll need to put an if condition on the current directory path, so that node_modules and all of its descendants are skipped. You'll want:
for subdir, dirs, files in os.walk(rootdir_path_without_slash):
    if 'node_modules' in subdir:
        continue
    ... # rest of your code
Also, subdir here is a misnomer: the first element of each tuple that os.walk yields is the path of the directory currently being visited, i.e. the root of that step.
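A variation on this check, offered as a sketch rather than the answerer's code: removing node_modules from the dirs list in place stops os.walk from descending into it at all, whereas a continue on the path still visits every folder underneath and merely skips its files. The function name html_files and its skip parameter are hypothetical:

```python
import os


def html_files(root, skip="node_modules"):
    """Collect .html files under root without ever entering folders named skip."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place removal prunes the walk (requires the default topdown=True).
        if skip in dirnames:
            dirnames.remove(skip)
        for name in filenames:
            if name.endswith(".html"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Because the comparison is on the exact folder name, a directory such as "my_node_modules_notes" would still be searched, which a substring test on the full path would not do.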

os.walk() returning '/' instead of actual folder name

I know there are a lot of questions related to this, but I can't seem to find an answer that helps me solve the problem.
I'm using os.walk() to loop through subfolders in my main folder, which contains both folders and files.
Main Folder
Pass Folder
files.txt
Fail Folder
files.txt
file.txt
file2.txt
So I'm using this code to create a new text file based on the subfolder names. However this returns folder/.txt, which means that dirs is returning '/' and files is returning ['file.txt', 'file2.txt'].
for root, dirs, files in os.walk(path):
    for dirs in root:
        new_txt = 'folder%s.txt' % (dirs)
How do I fix it so that dirs returns ['Main Folder/Pass Folder', 'Main Folder/Fail Folder'] and files returns the files in each folder?
I used something similar to this in my code recently (which, if I recall correctly, I also found on SO). Mine went something like this:
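for (dirpath, subdirs, filelist) in os.walk(folder):
    # join directories in here, e.g. to get each subfolder's full path
    for subdir in subdirs:
        print(os.path.join(dirpath, subdir))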
From the documentation:
dirpath is a string, the path to the directory. dirnames is a list of the names of the subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
I'm not sure os.walk() does quite what you expect. I would suggest joining the directories together using os.path.join() to get what you want.
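To make the point concrete: in the question's loop, for dirs in root iterates over the characters of the root string, which is where the '/' comes from. A small sketch of collecting the files per folder instead, keyed by the folder's path relative to the starting directory; the function name files_by_folder is hypothetical:

```python
import os


def files_by_folder(path):
    """Map each folder (relative to path) to the sorted list of file names in it."""
    listing = {}
    for dirpath, dirnames, filenames in os.walk(path):
        # dirpath is the folder being visited; relpath gives "." for path itself.
        rel = os.path.relpath(dirpath, path)
        listing[rel] = sorted(filenames)
    return listing
```

The keys can then be used to build per-folder output names, and os.path.join(path, key) gives back the full folder path.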
