I have been given a directory path. Inside it there can be any number of nested subdirectories, and any of those directories can contain any number of .py files. Some of the files have already been executed and some have not. I need to write a Python script that runs only the files which have not been executed yet. Can someone please suggest a way to do this?
Please take a look at os.walk().
import os

directory = '/tmp'
for (dirpath, dirnames, filenames) in os.walk(directory):
    # Do something with dirpath, dirnames, and filenames.
    pass
The usual approach is to use os.walk and to compose complete paths using os.path.join:
import os
import os.path

def find_all_files(directory):
    for root, _, filenames in os.walk(directory):
        for filename in filenames:
            fullpath = os.path.join(root, filename)
            yield fullpath

if __name__ == '__main__':
    for fullpath in find_all_files('/tmp'):
        print(fullpath)
In my experience, the dirnames value returned by os.walk is rarely used, so I discarded it by binding it to _.
As for your question about files being executed or not -- I don't get it. Please explain.
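If "executed" just means a script has already been run once by this tool, one minimal sketch is to keep a plain text record of the paths already run and skip them on later runs. (The record-file name and this interpretation of "executed" are assumptions, not something the question states.)

import os
import subprocess
import sys

RECORD = 'executed_files.txt'  # hypothetical record of already-run scripts

def run_new_scripts(directory):
    # Load the set of paths that a previous invocation already ran.
    executed = set()
    if os.path.exists(RECORD):
        with open(RECORD) as fh:
            executed = {line.strip() for line in fh}

    # Walk the tree, run every .py file not seen before, and record it
    # so that it is skipped next time.
    with open(RECORD, 'a') as fh:
        for root, _, filenames in os.walk(directory):
            for filename in filenames:
                fullpath = os.path.join(root, filename)
                if filename.endswith('.py') and fullpath not in executed:
                    subprocess.run([sys.executable, fullpath])
                    fh.write(fullpath + '\n')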
Using Python 3: How do I get the path of my files?
Please check the code below:
import os

path = r'\\Desktop'
os.chdir(path)
for root, dirs, files in os.walk(path):
    for txt_file in files:
        if txt_file.endswith(".txt"):
            txt_fileSh = txt_file.rstrip(".txt")
            path_txt_file = os.path.abspath(txt_file)
            print(path_txt_file)
What I get is
\\Desktop\a.txt
\\Desktop\b.txt
...
What I should get:
\\Desktop\Fig1\a.txt
\\Desktop\Fig2\b.txt
Help is very much appreciated.
You don't need to change directory. You can build the full path based on the root of the walk as follows:
import os

path = r'\Desktop'
for root, _, files in os.walk(path):
    for file in files:
        if file.endswith('.txt'):
            print(os.path.join(root, file))
The documentation for abspath states that:
os.path.abspath(path)
Return a normalized absolutized version of the pathname path. On most
platforms, this is equivalent to calling the function normpath() as follows:
normpath(join(os.getcwd(), path)).
So, you are actually just joining your path with the filename.
You want to use the path to the current directory instead, which is the first of the three items returned by os.walk, the one you named root.
So, just replace
path_txt_file=os.path.abspath(txt_file)
with
path_txt_file = os.path.join(root, txt_file)
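Putting that replacement into the loop from the question (keeping the question's \\Desktop placeholder path as-is), it would look roughly like this:

import os

path = r'\\Desktop'  # placeholder path from the question
for root, dirs, files in os.walk(path):
    for txt_file in files:
        if txt_file.endswith(".txt"):
            # Join the directory currently being walked with the filename,
            # instead of resolving the bare filename against the cwd.
            path_txt_file = os.path.join(root, txt_file)
            print(path_txt_file)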
I'm trying to write a Python function that copies all .bmp files from a directory and its sub-directories into a specified destination directory.
I've tried using os.walk but it only reaches into the first sub-directory and then stops. Here's what I have so far:
import os
import shutil

def copy(src, dest):
    for root, dirs, files in os.walk(src):
        for file in files:
            if file[-4:].lower() == '.bmp':
                shutil.copy(os.path.join(root, file), os.path.join(dest, file))
What do I need to change so it copies every .bmp file from every sub-directory?
EDIT: This code actually does work, there were just fewer bitmap files in the source directory than anticipated. However, for the program I am writing, I prefer the method using glob shown below.
If I understand correctly, you want glob with recursive=True, which, with the ** specifier, will recursively traverse directories and find all files satisfying a format specifier:
import glob
import os
import shutil

def copy(src, dest):
    for file_path in glob.glob(os.path.join(src, '**', '*.bmp'), recursive=True):
        new_path = os.path.join(dest, os.path.basename(file_path))
        shutil.copy(file_path, new_path)
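A quick usage example with hypothetical source and destination folders (both directories are assumed to exist already):

copy('/path/to/bitmaps', '/path/to/backup')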
I am learning Python and have just tried out the os.walk() function. I am using Python 3.4.4 (64-bit) on Windows. According to my understanding, Python should execute my statements line by line.
In this code I have iterated over a directory. The directory structure is: the top-level directory contains two subfolders, first_folder and second_folder, and the files test1.txt and test2.txt are inside those subfolders.
I need to print all the file names first, followed by the directory names. The code written is:
import os

dir_path = r"D:\python_os_walk_check"
for root, dirs, files in os.walk(dir_path):
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)
The output printed is:
first_folder
second_folder
test1.txt
test2.txt
I expected the output to be:
test1.txt
test2.txt
first_folder
second_folder
Where am I going wrong?
You first get the contents of dir_path, which is only the two directories. The files are inside the directories, so you get to them later, in a second and third iteration of your loop. Add a print(root) as the first thing inside the loop and you'll see more clearly what's happening.
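For example, with print(root) added, the traversal order becomes visible:

import os

dir_path = r"D:\python_os_walk_check"  # path from the question
for root, dirs, files in os.walk(dir_path):
    print(root)  # the directory this iteration is visiting
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)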
You could store the names of files/directories and print them once your for loop completes.
import os

dir_path = "D:\\python_os_walk_check"
file_list = []
dir_list = []
for root, dirs, files in os.walk(dir_path):
    for file_name in files:
        file_list.append(file_name)
    for dir_name in dirs:
        dir_list.append(dir_name)

for x in file_list:
    print(x)
for i in dir_list:
    print(i)
You might get something closer to what you want if you use the topdown parameter of os.walk, like so:
import os

dir_path = r"C:\Python34\Tools\pynche"
for root, dirs, files in os.walk(dir_path, topdown=False):
    for file_name in files:
        print(file_name)
    for dir_name in dirs:
        print(dir_name)
Unfortunately I think this is only good for two-level folders.
As a more up-to-date solution you could consider the abilities offered in the pathlib module, Python 3 doc 11.1.
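For instance, a minimal pathlib sketch of the same "files first, then directories" listing, assuming the directory layout from the question:

from pathlib import Path

dir_path = Path(r"D:\python_os_walk_check")  # path from the question

# Collect everything below dir_path once, then print files before directories.
entries = list(dir_path.rglob('*'))
for entry in entries:
    if entry.is_file():
        print(entry.name)
for entry in entries:
    if entry.is_dir():
        print(entry.name)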
My program does not recognize folders as directories and treats them as files. Because of this, the recursion prints the folders as if they were files, and then, since there are no folders left to traverse, the program finishes.
import os
import sys

class DRT:
    def dirTrav(self, dir, buff):
        newdir = []
        for file in os.listdir(dir):
            print(file)
            if(os.path.isdir(file)):
                newdir.append(os.path.join(dir, file))
        for f in newdir:
            print("dir: " + f)
            self.dirTrav(f, "")

dr = DRT()
dr.dirTrav(".", "")
See os.walk; from the documentation:
This example displays the number of bytes taken by non-directory files in each directory under the starting directory, except that it doesn’t look under any CVS subdirectory:
import os
from os.path import join, getsize

for root, dirs, files in os.walk('python/Lib/email'):
    print(root, "consumes", end=" ")
    print(sum(getsize(join(root, name)) for name in files), end=" ")
    print("bytes in", len(files), "non-directory files")
    if 'CVS' in dirs:
        dirs.remove('CVS')  # don't visit CVS directories
The problem is that you're not checking the right thing. file is just the filename, not the pathname. That's why you need os.path.join(dir, file), on the next line, right? So you need it in the isdir call, too. But you're just passing file.
So, instead of asking "is ./foo/bar/baz a directory?" you're just asking "is baz a directory?" It interprets a bare baz as ./baz, as you'd expect. And, since there (probably) is no ./baz, you get back False.
So, change this:
if(os.path.isdir(file)):
    newdir.append(os.path.join(dir, file))
to:
path = os.path.join(dir, file)
if os.path.isdir(path):
    newdir.append(path)
All that being said, using os.walk as sotapme suggested is simpler than trying to build it yourself.
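For comparison, a rough os.walk version of the same traversal (printing file names and the directories it descends into) might look like this:

import os

def dir_trav(top):
    # os.walk handles the recursion: each iteration yields one directory.
    for root, dirs, files in os.walk(top):
        for name in files:
            print(name)
        for name in dirs:
            print("dir: " + os.path.join(root, name))

dir_trav(".")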
I have some code that looks at a single folder and pulls out files.
But now the folder structure has changed, and I need to trawl through the folders looking for files that match.
What the old code looks like:
GSB_FOLDER = r'D:\Games\Gratuitous Space Battles Beta'

def get_module_data():
    module_folder = os.path.join(GSB_FOLDER, 'data', 'modules')
    filenames = [os.path.join(module_folder, f) for f in
                 os.listdir(module_folder)]
    data = [parse_file(f) for f in filenames]
    return data
But now the folder structure has changed to be like this:
GSB_FOLDER\data\modules
          \folder1\data\modules
          \folder2\data\modules
          \folder3\data\modules
where folder1, folder2, or folder3 could be any text string.
How do I rewrite the code above to do this?
I have been told about os.walk, but I'm just learning Python, so any help is appreciated.
Nothing much changes: you just call os.walk and it will recursively go through the directory and return the files, e.g.
for root, dirs, files in os.walk('/tmp'):
    if os.path.basename(root) != 'modules':
        continue
    data = [parse_file(os.path.join(root, f)) for f in files]
Here I am checking files only in folders named 'modules'. You can change that check to do something else, e.g. accept paths which have 'modules' somewhere in them, along the lines of root.find('/modules') >= 0; see the sketch below.
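A small sketch of that alternative check, written as a portable path-component test rather than the literal find call (parse_file and GSB_FOLDER are the names from the question, not defined here):

import os

data = []
for root, dirs, files in os.walk(GSB_FOLDER):
    # Accept any directory that has a 'modules' component somewhere in its path.
    if 'modules' in root.split(os.sep):
        data.extend(parse_file(os.path.join(root, f)) for f in files)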
os.walk is a nice easy way to get the directory structure of everything inside a dir you pass it;
in your example, you could do something like this:
for dirpath, dirnames, filenames in os.walk("...GSB_FOLDER"):
    # whatever you want to do with these folders
    if "/data/modules/" in dirpath:
        print(dirpath, dirnames, filenames)
Try that out; it should be fairly self-explanatory how it works.
I created a function that serves the general purpose of crawling through a directory structure and returning files and/or paths that match a pattern.
import os
import re
import sys

def directory_spider(input_dir, path_pattern="", file_pattern="", maxResults=500):
    file_paths = []
    if not os.path.exists(input_dir):
        raise FileNotFoundError("Could not find path: %s" % (input_dir))
    for dirpath, dirnames, filenames in os.walk(input_dir):
        if re.search(path_pattern, dirpath):
            file_list = [item for item in filenames if re.search(file_pattern, item)]
            file_path_list = [os.path.join(dirpath, item) for item in file_list]
            file_paths += file_path_list
            if len(file_paths) > maxResults:
                break
    return file_paths[0:maxResults]
Example usages:
directory_spider('/path/to/find') --> Finds the top 500 files in the path if it exists
directory_spider('/path/to/find',path_pattern="",file_pattern=".py$", maxResults=10)
You can use os.walk as @Anurag has detailed, or you can try my small pathfinder library:
data = [parse_file(f) for f in pathfinder.find(GSB_FOLDER, just_files=True)]