I am learning Python and just tried out the os.walk() function. I am using Python 3.4.4 (64-bit) on Windows. As I understand it, Python should execute my statements line by line.
In this code I have iterated over a directory. The directory structure is
I need to print all the files first, followed by the directory names. The code I wrote is:
import os
dir_path = r"D:\python_os_walk_check"
for root, dirs, files in os.walk(dir_path):
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)
The output printed is:
first_folder
second_folder
test1.txt
test2.txt
I expected the output to be:
test1.txt
test2.txt
first_folder
second_folder
Where am I going wrong?
You first get the contents of dir_path, which is only the two directories. The files are inside the directories, so you get to them later, in a second and third iteration of your loop. Add a print(root) as the first thing inside the loop and you'll see more clearly what's happening.
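To make that visible without the asker's exact disk layout, here is a self-contained sketch that rebuilds the structure in a temporary directory (assuming test1.txt lives in first_folder and test2.txt in second_folder, which matches the output shown) and records the value of root at each step:

```python
import os
import tempfile

# Rebuild the question's layout in a temporary directory
# (assumption: one text file inside each subfolder).
dir_path = tempfile.mkdtemp()
for folder, fname in (("first_folder", "test1.txt"), ("second_folder", "test2.txt")):
    os.mkdir(os.path.join(dir_path, folder))
    open(os.path.join(dir_path, folder, fname), "w").close()

visited = []  # the value of root on each iteration of os.walk
for root, dirs, files in os.walk(dir_path):
    visited.append(os.path.basename(root))
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)

# The first iteration visits dir_path itself, where files is empty;
# the later iterations visit the subfolders where the files live.
print(visited)
```

Running this shows the folder names printed before the file names, exactly as in the question, because the files only appear once os.walk descends into the subfolders.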
You could store the names of files/directories and print them once your for loop completes.
import os
dir_path = "D:\\python_os_walk_check"
file_list = []
dir_list = []
for root, dirs, files in os.walk(dir_path):
    for file_name in files:
        file_list.append(file_name)
    for dir_name in dirs:
        dir_list.append(dir_name)
for x in file_list:
    print(x)
for i in dir_list:
    print(i)
You might get a bit more like what you seem to want if you use the topdown parameter in os.walk like so:
import os
dir_path = r"C:\Python34\Tools\pynche"
for root, dirs, files in os.walk(dir_path, topdown=False):
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)
Unfortunately I think this is only good for two-level folders.
As a more up-to-date solution you could consider the abilities offered in the pathlib module (section 11.1 of the Python 3 library documentation).
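As a brief sketch of that pathlib approach (using a throwaway temporary tree in place of the question's folder, since the exact layout is only an assumption here), Path.rglob() can separate files from directories at any depth:

```python
from pathlib import Path
import tempfile

# Hypothetical stand-in for the question's folder: two subfolders, one file each.
base = Path(tempfile.mkdtemp())
(base / "first_folder").mkdir()
(base / "second_folder").mkdir()
(base / "first_folder" / "test1.txt").touch()
(base / "second_folder" / "test2.txt").touch()

# Collect files first, then directories, regardless of nesting depth.
files = sorted(p.name for p in base.rglob("*") if p.is_file())
dirs = sorted(p.name for p in base.rglob("*") if p.is_dir())
for name in files + dirs:
    print(name)
```

This prints all the file names before any directory names, which is the ordering the question asked for.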
Related
Using Python 3: How do I get the path of my files?
Please check the code below:
import os

path = r'\\Desktop'
os.chdir(path)
for root, dirs, files in os.walk(path):
    for txt_file in files:
        if txt_file.endswith(".txt"):
            txt_fileSh = txt_file.rstrip(".txt")
            path_txt_file = os.path.abspath(txt_file)
            print(path_txt_file)
What I get is
\\Desktop\a.txt
\\Desktop\b.txt
...
What I should get:
\\Desktop\Fig1\a.txt
\\Desktop\Fig2\b.txt
Help is very much appreciated.
You don't need to change directory. You can build the full path based on the root of the walk as follows:
import os

path = r'\Desktop'
for root, _, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'):
            print(os.path.join(root, file))
The documentation for abspath states that:
os.path.abspath(path)
Return a normalized absolutized version of the pathname path. On most
platforms, this is equivalent to calling the function normpath() as follows:
normpath(join(os.getcwd(), path)).
So, you are actually just joining your path with the filename.
You want to use the path to the current directory instead, which is the first of the three items returned by os.walk, the one you named root.
So, just replace
path_txt_file=os.path.abspath(txt_file)
with
path_txt_file = os.path.join(root, txt_file)
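Putting it together, here is a sketch of the corrected loop; a temporary tree stands in for the \\Desktop path, since that path only exists on the asker's machine:

```python
import os
import tempfile

# Stand-in for the asker's tree: Fig1\a.txt and Fig2\b.txt under the root.
path = tempfile.mkdtemp()
for sub, name in (("Fig1", "a.txt"), ("Fig2", "b.txt")):
    os.mkdir(os.path.join(path, sub))
    open(os.path.join(path, sub, name), "w").close()

found = []
for root, dirs, files in os.walk(path):
    for txt_file in files:
        if txt_file.endswith(".txt"):
            # Join with root, the directory os.walk found the file in.
            path_txt_file = os.path.join(root, txt_file)
            found.append(path_txt_file)
            print(path_txt_file)
```

Each printed path now includes the Fig1 or Fig2 component, which is what the question expected.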
I need to create a program in which I am given a directory path; inside that directory there can be an arbitrarily deep tree of subdirectories, and any of them can contain any number of .py files. Some of these files have already been executed and some have not. I need to write a Python script that runs only the files which have not been executed yet. Can someone please suggest a way to do this?
Please take a look at os.walk().
import os
directory = '/tmp'
for (dirpath, dirnames, filenames) in os.walk(directory):
    # Do something with dirpath, dirnames, and filenames.
    pass
The usual approach is to use os.walk and to compose complete paths using os.path.join:
import os
import os.path
def find_all_files(directory):
    for root, _, filenames in os.walk(directory):
        for filename in filenames:
            fullpath = os.path.join(root, filename)
            yield fullpath

if __name__ == '__main__':
    for fullpath in find_all_files('/tmp'):
        print(fullpath)
In my experience, the dirnames return value of os.walk is rarely used, so I omitted it with _.
As for your question about files being executed or not -- I don't get it. Please explain.
I need to iterate through the subdirectories of a given directory and search for files. If I find a file, I have to open it, change its content, and replace it with my own lines.
I tried this:
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f = open(file, 'r')
        lines = f.readlines()
        f.close()
        f = open(file, 'w')
        for line in lines:
            newline = "No you are not"
            f.write(newline)
        f.close()
but I am getting an error. What am I doing wrong?
The actual walk through the directories works as you have coded it. If you replace the contents of the inner loop with a simple print statement you can see that each file is found:
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
for file in files:
print(os.path.join(subdir, file))
If you still get errors when running the above, please provide the error message.
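If the error turns out to be a FileNotFoundError from open(), the likely cause is that file is only the bare filename; below is a sketch of the rewrite loop with the path joined in (using a throwaway temporary tree and a hypothetical note.txt in place of the real files):

```python
import os
import tempfile

# Throwaway stand-in for C:/Users/sid/Desktop/test with one nested file.
rootdir = tempfile.mkdtemp()
os.mkdir(os.path.join(rootdir, "sub"))
target = os.path.join(rootdir, "sub", "note.txt")  # hypothetical file name
with open(target, "w") as f:
    f.write("old contents")

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        full = os.path.join(subdir, file)  # full path, not just the bare name
        with open(full, "w") as f:
            f.write("No you are not")

with open(target) as f:
    print(f.read())
```

The with statements also make sure each file handle is closed even if a write fails partway through.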
Another way of returning all files in subdirectories is to use the pathlib module, introduced in Python 3.4, which provides an object-oriented approach to handling filesystem paths (pathlib is also available on Python 2.7 via the pathlib2 module on PyPI):
from pathlib import Path
rootdir = Path('C:/Users/sid/Desktop/test')
# Return a list of regular files only, not directories
file_list = [f for f in rootdir.glob('**/*') if f.is_file()]
# For absolute paths instead of relative the current dir
file_list = [f for f in rootdir.resolve().glob('**/*') if f.is_file()]
Since Python 3.5, the glob module also supports recursive file finding:
import os
from glob import iglob
rootdir_glob = 'C:/Users/sid/Desktop/test/**/*' # Note the added asterisks
# This will return absolute paths
file_list = [f for f in iglob(rootdir_glob, recursive=True) if os.path.isfile(f)]
The file_list from either of the above approaches can be iterated over without the need for a nested loop:
for f in file_list:
print(f) # Replace with desired operations
From Python 3.5 onward you can use ** with glob.iglob(path + '/**', recursive=True), which seems like the most Pythonic solution, i.e.:
import glob, os
for filename in glob.iglob('/pardadox-music/**', recursive=True):
    if os.path.isfile(filename):  # filter dirs
        print(filename)
Output:
/pardadox-music/modules/her1.mod
/pardadox-music/modules/her2.mod
...
Notes:
glob.iglob
glob.iglob(pathname, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
If recursive is True, the pattern '**' will match any files and
zero or more directories and subdirectories.
If the directory contains files starting with . they won’t be matched by default. For example, consider a directory containing card.gif and .card.gif:
>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
You can also use rglob(pattern),
which is the same as calling glob() with **/ added in front of the given relative pattern.
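A minimal sketch of that rglob/glob equivalence, using a small throwaway tree:

```python
from pathlib import Path
import tempfile

# Throwaway tree: one .mod file at the top level and one nested in modules/.
base = Path(tempfile.mkdtemp())
(base / "modules").mkdir()
(base / "modules" / "her1.mod").touch()
(base / "top.mod").touch()

# rglob("*.mod") is the same as glob("**/*.mod"): both match at every depth.
via_rglob = sorted(p.name for p in base.rglob("*.mod"))
via_glob = sorted(p.name for p in base.glob("**/*.mod"))
print(via_rglob)
```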
My program does not recognize folders as directories; it treats them as files. Because of this, the recursion prints the folders as files, and then, since no folders are left waiting to be traversed, the program finishes.
import os
import sys
class DRT:
    def dirTrav(self, dir, buff):
        newdir = []
        for file in os.listdir(dir):
            print(file)
            if(os.path.isdir(file)):
                newdir.append(os.path.join(dir, file))
        for f in newdir:
            print("dir: " + f)
            self.dirTrav(f, "")

dr = DRT()
dr.dirTrav(".", "")
See os.walk from there:
This example displays the number of bytes taken by non-directory files in each directory under the starting directory, except that it doesn’t look under any CVS subdirectory:
import os
from os.path import join, getsize
for root, dirs, files in os.walk('python/Lib/email'):
    print(root, "consumes", end=" ")
    print(sum(getsize(join(root, name)) for name in files), end=" ")
    print("bytes in", len(files), "non-directory files")
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
The problem is that you're not checking the right thing. file is just the filename, not the pathname. That's why you need os.path.join(dir, file), on the next line, right? So you need it in the isdir call, too. But you're just passing file.
So, instead of asking "is ./foo/bar/baz a directory?" you're just asking "is baz a directory?" It interprets a bare baz as ./baz, as you'd expect. And, since there (probably) is no "./baz", you get back False.
So, change this:
if(os.path.isdir(file)):
    newdir.append(os.path.join(dir, file))
to:
path = os.path.join(dir, file)
if os.path.isdir(path):
    newdir.append(path)
All that being said, using os.walk as sotapme suggested is simpler than trying to build it yourself.
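For comparison, here is a sketch of the same traversal done with os.walk, which resolves full paths internally so the isdir pitfall never comes up (the tree here is a throwaway example):

```python
import os
import tempfile

# Throwaway tree: top/inner/deeper, with one file inside inner.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "inner", "deeper"))
open(os.path.join(top, "inner", "hello.txt"), "w").close()

seen_dirs, seen_files = [], []
for root, dirs, files in os.walk(top):
    for d in dirs:
        seen_dirs.append(d)
        print("dir: " + os.path.join(root, d))
    for f in files:
        seen_files.append(f)
        print(os.path.join(root, f))
```

Every entry in dirs really is a directory and every entry in files really is a file, so no isdir check is needed at all.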
I have some code that looks at a single folder and pulls out files.
But now the folder structure has changed, and I need to trawl through the folders looking for files that match.
what the old code looks like
GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'
def get_module_data():
    module_folder = os.path.join(GSB_FOLDER, 'data', 'modules')
    filenames = [os.path.join(module_folder, f)
                 for f in os.listdir(module_folder)]
    data = [parse_file(f) for f in filenames]
    return data
But now the folder structure has changed to be like this
GSB_FOLDER\data\modules
\folder1\data\modules
\folder2\data\modules
\folder3\data\modules
where folder1, folder2, or folder3 could be any text string.
How do I rewrite the code above to handle this?
I have been told about os.walk, but I'm just learning Python... so any help is appreciated.
Nothing much changes: you just call os.walk and it will recursively go through the directory and return the files, e.g.:
for root, dirs, files in os.walk('/tmp'):
    if os.path.basename(root) != 'modules':
        continue
    data = [parse_file(os.path.join(root, f)) for f in files]
Here I am checking files only in folders named 'modules'; you can change that check to do something else, e.g. match paths which have 'modules' somewhere in them with root.find('/modules') >= 0.
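Applied to the question's get_module_data, the check could look like the sketch below; parse_file is stubbed out because its definition isn't shown, and a temporary tree stands in for GSB_FOLDER:

```python
import os
import tempfile

def parse_file(path):
    # Stub for the question's parse_file, whose definition isn't shown.
    return os.path.basename(path)

# Temporary stand-in for GSB_FOLDER with 'modules' folders at two depths.
GSB_FOLDER = tempfile.mkdtemp()
os.makedirs(os.path.join(GSB_FOLDER, "data", "modules"))
os.makedirs(os.path.join(GSB_FOLDER, "folder1", "data", "modules"))
open(os.path.join(GSB_FOLDER, "data", "modules", "a.txt"), "w").close()
open(os.path.join(GSB_FOLDER, "folder1", "data", "modules", "b.txt"), "w").close()

def get_module_data():
    data = []
    for root, dirs, files in os.walk(GSB_FOLDER):
        if os.path.basename(root) != 'modules':
            continue  # only parse files inside folders named 'modules'
        data.extend(parse_file(os.path.join(root, f)) for f in files)
    return data

print(sorted(get_module_data()))
```

Because os.walk descends to every depth, files under both data\modules and folder1\data\modules are picked up by the same loop.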
os.walk is a nice, easy way to get the directory structure of everything inside a directory you pass it; in your example, you could do something like this:
for dirpath, dirnames, filenames in os.walk("...GSB_FOLDER"):
    # whatever you want to do with these folders
    if "/data/modules/" in dirpath:
        print(dirpath, dirnames, filenames)
Try that out; it should be fairly self-explanatory how it works.
I created a function that serves the general purpose of crawling through a directory structure and returning the files and/or paths that match a pattern.
import os
import re

def directory_spider(input_dir, path_pattern="", file_pattern="", maxResults=500):
    file_paths = []
    if not os.path.exists(input_dir):
        raise FileNotFoundError("Could not find path: %s" % (input_dir))
    for dirpath, dirnames, filenames in os.walk(input_dir):
        if re.search(path_pattern, dirpath):
            file_list = [item for item in filenames if re.search(file_pattern, item)]
            file_path_list = [os.path.join(dirpath, item) for item in file_list]
            file_paths += file_path_list
        if len(file_paths) > maxResults:
            break
    return file_paths[0:maxResults]
Example usages:
directory_spider('/path/to/find')  # finds up to 500 files under the path if it exists
directory_spider('/path/to/find', path_pattern="", file_pattern=".py$", maxResults=10)
You can use os.walk like @Anurag has detailed, or you can try my small pathfinder library:
data = [parse_file(f) for f in pathfinder.find(GSB_FOLDER, just_files=True)]