I am currently using the directory walker from here:
import os
class DirectoryWalker:
# a forward iterator that traverses a directory tree
def __init__(self, directory):
self.stack = [directory]
self.files = []
self.index = 0
def __getitem__(self, index):
while 1:
try:
file = self.files[self.index]
self.index = self.index + 1
except IndexError:
# pop next directory from stack
self.directory = self.stack.pop()
self.files = os.listdir(self.directory)
self.index = 0
else:
# got a filename
fullname = os.path.join(self.directory, file)
if os.path.isdir(fullname) and not os.path.islink(fullname):
self.stack.append(fullname)
return fullname
for file in DirectoryWalker(os.path.abspath('.')):
print file
This minor change gives me the full path of each file.
Can anyone help me find just the filename as well using this? I need both the full path and just the filename.
Why do you want to do such a boring thing yourself?
for path, directories, files in os.walk('.'):
print 'ls %r' % path
for directory in directories:
print ' d%r' % directory
for filename in files:
print ' -%r' % filename
Output:
'.'
d'finction'
d'.hg'
-'setup.py'
-'.hgignore'
'./finction'
-'finction'
-'cdg.pyc'
-'util.pyc'
-'cdg.py'
-'util.py'
-'__init__.pyc'
-'__init__.py'
'./.hg'
d'store'
-'hgrc'
-'requires'
-'00changelog.i'
-'undo.branch'
-'dirstate'
-'undo.dirstate'
-'branch'
'./.hg/store'
d'data'
-'undo'
-'00changelog.i'
-'00manifest.i'
'./.hg/store/data'
d'finction'
-'.hgignore.i'
-'setup.py.i'
'./.hg/store/data/finction'
-'util.py.i'
-'cdg.py.i'
-'finction.i'
-'____init____.py.i'
But if you insist, there are path-related tools in os.path; os.path.basename is what you are looking for.
>>> import os.path
>>> os.path.basename('/hello/world.h')
'world.h'
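For example, a small sketch combining the walker from the question with os.path (my own illustration, not part of the original code): each full path can be split into its directory part and the bare filename.
import os

for fullname in DirectoryWalker(os.path.abspath('.')):
    filename = os.path.basename(fullname)  # just the filename
    print(fullname)  # the full path
    print(filename)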
Rather than using '.' as your directory, refer to its absolute path:
for file in DirectoryWalker(os.path.abspath('.')):
print file
Also, I'd recommend using a name other than 'file', because file is a built-in name in Python. It's not a keyword, though, so the code still runs.
As an aside, when dealing with filenames, I find the os.path module to be incredibly useful - I'd recommend having a look through that, especially
os.path.normpath
Normalises paths (gets rid of redundant '.'s and 'theFolderYouWereJustIn/../'s)
os.path.join
Joins two paths
os.path.dirname()? os.path.normpath()? os.path.abspath()?
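For a quick feel of those calls, here is an illustrative interactive session (POSIX-style output; the abspath result depends on your current working directory, so that line is only an example):
>>> import os.path
>>> os.path.normpath('folder/./sub/../file.txt')
'folder/file.txt'
>>> os.path.join('/hello', 'world.h')
'/hello/world.h'
>>> os.path.dirname('/hello/world.h')
'/hello'
>>> os.path.abspath('world.h')  # depends on the current working directory
'/some/current/dir/world.h'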
This would also be a lovely place to think about recursion.
Just prepend the current directory path to the "./foo" path returned:
print os.path.join(os.getcwd(), file)
Similar to the system path, I want to offer some convenience in my code allowing a user to specify a file name that could be in one of a handful of paths.
Say I had two or more config paths
['~/.foo-config/', '/usr/local/myapp/foo-config/']
And my user wants to open bar, (AKA bar.baz)
Is there a convenient built-in way to let open('bar') or open('bar.baz') automatically search these paths for that file in left-to-right order of precedence? E.g., will temporarily adjusting my sys.path to only these directories do this for me?
Else, how would you suggest implementing a PATH-like searching open-wrapper?
As other people already mentioned: sys.path only affects the module search path, i.e. it's relevant for importing Python modules, but not at all for open().
I would suggest separating the logic for searching the paths in order of precedence and opening the file, because that way it's easier to test and read.
I would do something like this:
import os
PATHS = ['~/.foo-config/', '/usr/local/myapp/foo-config/']
def find_first(filename, paths):
for directory in paths:
full_path = os.path.join(directory, filename)
if os.path.isfile(full_path):
return full_path
def main():
filename = 'file.txt'
path = find_first(filename, PATHS)
if path:
with open(path) as f:
print f
else:
print "File {} not found in any of the directories".format(filename)
if __name__ == '__main__':
main()
open doesn't get into that kind of logic. If you want, write a wrapper function that uses os.path.join to join each member of sys.path to the parameter filename, and tries to open them in order, handling the error that occurs when no such file is found.
I'll add that, as another user stated, this is kind of a misuse of sys.path, but this function would work for any list of paths. Indeed, maybe the nicest option is to use the environment variables suggested by another user to specify a colon-delimited list of config directories, which you then parse and use within your search function.
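A minimal sketch of such a wrapper (the directory list and the function name are illustrative, not an established API):
import os

CONFIG_DIRS = ['~/.foo-config/', '/usr/local/myapp/foo-config/']

def open_from(filename, directories=CONFIG_DIRS, mode='r'):
    # Try each directory in order of precedence; return the first file that opens.
    for directory in directories:
        candidate = os.path.join(os.path.expanduser(directory), filename)
        try:
            return open(candidate, mode)
        except IOError:  # FileNotFoundError is a subclass of OSError/IOError on Python 3
            continue
    raise IOError('%s not found in any of %r' % (filename, directories))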
Environment variables
Say your app is named foo ... in the readme, tell the user to use the FOO_PATH environment variable to specify the extra paths.
Then, inside your app, do something like:
for path in os.environ.get("FOO_PATH",".").split(";"):
lookfor(os.path.join(path,"somefile.txt"))
you could wrap it into a generic function
import os

def open_foo(fname):
    for path in os.environ.get("FOO_PATH", ".").split(";"):
        path_to_test = os.path.join(path, fname)
        if os.path.exists(path_to_test):
            return open(path_to_test)
    raise Exception("No file found on FOO_PATH")
then you could use it just like normal open
with open_foo("my_config.txt") as f:
print f.read()
Extract from Python Standard Library documentation for open built-in function:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
...file is either a string or bytes object giving the pathname (absolute or relative to the current working directory) of the file to be opened ...
Explicitly, open does nothing to automagically find a file: if the path is not absolute, it is only searched for in the current directory.
So you will have to use a custom function or a custom class for that. For example:
import os

class path_opener(object):
    def __init__(self, path=['.']):
        self.path = path
    def set(self, path):
        self.path = path
    def append(self, path):
        self.path.append(path)
    def extend(self, path):
        self.path.extend(path)
    def find(self, file):
        for folder in self.path:
            path = os.path.join(folder, file)
            if os.path.isfile(path):
                return path
        raise FileNotFoundError()
    def open(self, file, *args, **kwargs):
        return open(self.find(file), *args, **kwargs)
That means that a file opener will keep its own path, will be initialized by default with the current directory, will have methods to set, append to or extend its path, and will normally raise a FileNotFoundError if a file is not found in any of the directories listed in its path.
Usage :
o = path_opener(['~/.foo-config/', '/usr/local/myapp/foo-config/'])
with o.open('foo') as fd:
...
I have a program to zip all the contents of a folder. I did not write this code; I found it somewhere online and I am using it. I intend to zip a folder, for example C:/folder1/folder2/folder3/. I want to zip folder3 and all its contents into a file, say folder3.zip. With the code below, once I zip it, the contents of folder3.zip will be folder1/folder2/folder3/ and the files. I do not want the entire path to be zipped; I only want the subfolder I'm interested in (folder3 in this case). I tried some os.chdir etc., but no luck.
def makeArchive(fileList, archive):
"""
'fileList' is a list of file names - full path each name
'archive' is the file name for the archive with a full path
"""
try:
a = zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED)
for f in fileList:
print "archiving file %s" % (f)
a.write(f)
a.close()
return True
except: return False
def dirEntries(dir_name, subdir, *args):
# Creates a list of all files in the folder
'''Return a list of file names found in directory 'dir_name'
If 'subdir' is True, recursively access subdirectories under 'dir_name'.
Additional arguments, if any, are file extensions to match filenames. Matched
file names are added to the list.
If there are no additional arguments, all files found in the directory are
added to the list.
Example usage: fileList = dirEntries(r'H:\TEMP', False, 'txt', 'py')
Only files with 'txt' and 'py' extensions will be added to the list.
Example usage: fileList = dirEntries(r'H:\TEMP', True)
All files and all the files in subdirectories under H:\TEMP will be added
to the list. '''
fileList = []
for file in os.listdir(dir_name):
dirfile = os.path.join(dir_name, file)
if os.path.isfile(dirfile):
if not args:
fileList.append(dirfile)
else:
if os.path.splitext(dirfile)[1][1:] in args:
fileList.append(dirfile)
# recursively access file names in subdirectories
elif os.path.isdir(dirfile) and subdir:
print "Accessing directory:", dirfile
fileList.extend(dirEntries(dirfile, subdir, *args))
return fileList
You can call this by makeArchive(dirEntries(folder, True), zipname).
Any ideas as to how to solve this problem? I am using Windows and Python 2.5. I know that in Python 2.7 there is shutil.make_archive, which would help, but since I am working on 2.5 I need another solution :-/
You'll have to give an arcname argument to ZipFile.write() that uses a relative path. Do this by giving the root path to remove to makeArchive():
def makeArchive(fileList, archive, root):
"""
'fileList' is a list of file names - full path each name
'archive' is the file name for the archive with a full path
"""
a = zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED)
for f in fileList:
print "archiving file %s" % (f)
a.write(f, os.path.relpath(f, root))
a.close()
and call this with:
makeArchive(dirEntries(folder, True), zipname, folder)
I've removed the blanket try:, except:; there is no use for it here, and it only serves to hide problems you want to know about.
The os.path.relpath() function returns a path relative to root, effectively removing that root path from the archive entry.
On python 2.5, the relpath function is not available; for this specific usecase the following replacement would work:
def relpath(filename, root):
return filename[len(root):].lstrip(os.path.sep).lstrip(os.path.altsep)
and use:
a.write(f, relpath(f, root))
Note that the above relpath() function only works for your specific case where filepath is guaranteed to start with root; on Windows the general case for relpath() is a lot more complex. You really want to upgrade to Python 2.6 or newer if at all possible.
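For comparison, on Python 2.6 and newer the standard function handles the simple case from the question directly:
>>> import os.path
>>> os.path.relpath('C:/folder1/folder2/folder3/file.txt', 'C:/folder1/folder2/folder3')
'file.txt'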
ZipFile.write has an optional argument arcname. Use this to remove parts of the path.
You could change your method to be:
def makeArchive(fileList, archive, path_prefix=None):
"""
'fileList' is a list of file names - full path each name
'archive' is the file name for the archive with a full path
"""
try:
a = zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED)
for f in fileList:
print "archiving file %s" % (f)
if path_prefix is None:
a.write(f)
else:
a.write(f, f[len(path_prefix):] if f.startswith(path_prefix) else f)
a.close()
return True
except: return False
Martijn's approach using os.path is much more elegant, though.
I'm using the current code to extract the files from a zip file while keeping the directory structure:
zip_file = zipfile.ZipFile('archive.zip', 'r')
zip_file.extractall('/dir/to/extract/files/')
zip_file.close()
Here is a structure for an example zip file:
/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
At the end I want this:
/dir/to/extract/file.jpg
/dir/to/extract/file1.jpg
/dir/to/extract/file2.jpg
But it should ignore only if the zip file has a top-level folder with all files inside it, so when I extract a zip with this structure:
/dir1/file.jpg
/dir1/file1.jpg
/dir1/file2.jpg
/dir2/file.txt
/file.mp3
It should stay like this:
/dir/to/extract/dir1/file.jpg
/dir/to/extract/dir1/file1.jpg
/dir/to/extract/dir1/file2.jpg
/dir/to/extract/dir2/file.txt
/dir/to/extract/file.mp3
Any ideas?
If I understand your question correctly, you want to strip any common prefix directories from the items in the zip before extracting them.
If so, then the following script should do what you want:
import sys, os
from zipfile import ZipFile
def get_members(zip):
parts = []
# get all the path prefixes
for name in zip.namelist():
# only check files (not directories)
if not name.endswith('/'):
# keep list of path elements (minus filename)
parts.append(name.split('/')[:-1])
# now find the common path prefix (if any)
prefix = os.path.commonprefix(parts)
if prefix:
# re-join the path elements
prefix = '/'.join(prefix) + '/'
# get the length of the common prefix
offset = len(prefix)
# now re-set the filenames
for zipinfo in zip.infolist():
name = zipinfo.filename
# only check files (not directories)
if len(name) > offset:
# remove the common prefix
zipinfo.filename = name[offset:]
yield zipinfo
args = sys.argv[1:]
if len(args):
zip = ZipFile(args[0])
path = args[1] if len(args) > 1 else '.'
zip.extractall(path, get_members(zip))
Read the entries returned by ZipFile.namelist() to see if they're in the same directory, and then open/read each entry and write it to a file opened with open().
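A rough Python 3 sketch of that idea (the archive name, target directory, and the single-top-folder check are my assumptions, not from the answer above):
import os
import zipfile

with zipfile.ZipFile('archive.zip') as zf:
    names = [n for n in zf.namelist() if not n.endswith('/')]
    # Only strip the leading folder if every entry lives under the same one.
    tops = set(n.split('/')[0] for n in names)
    strip = len(tops) == 1 and all('/' in n for n in names)
    for name in names:
        relative = name.split('/', 1)[1] if strip else name
        target = os.path.join('/dir/to/extract', *relative.split('/'))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with zf.open(name) as src, open(target, 'wb') as dst:
            dst.write(src.read())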
This might be a problem with the zip archive itself. In a Python prompt, try this to see if the files are in the correct directories within the zip file itself.
import zipfile
zf = zipfile.ZipFile("my_file.zip",'r')
first_file = zf.filelist[0]
print first_file.filename
This should say something like "dir1"
Repeat the steps above, substituting an index of 1 into filelist, like so: first_file = zf.filelist[1]. This time the output should look like 'dir1/file1.jpg'. If this is not the case, then the zip file does not contain directories and everything will be unzipped into a single directory.
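If you'd rather not index filelist one entry at a time, a short loop over the same zipfile API prints every stored path (Python 3 print shown):
import zipfile

with zipfile.ZipFile("my_file.zip") as zf:
    for info in zf.infolist():
        print(info.filename)  # e.g. 'dir1/file1.jpg' if directories were preserved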
Based on @ekhumoro's answer, I came up with a simpler function to extract everything at the same level; it is not exactly what you are asking, but I think it can help someone.
import os
from zipfile import ZipFile

def _basename_members(zip_file: ZipFile):
    for zipinfo in zip_file.infolist():
        zipinfo.filename = os.path.basename(zipinfo.filename)
        yield zipinfo

from_zip = "some.zip"
to_folder = "some_destination/"

with ZipFile(file=from_zip, mode="r") as zip_file:
    os.makedirs(to_folder, exist_ok=True)
    zip_infos = _basename_members(zip_file)
    zip_file.extractall(path=to_folder, members=zip_infos)
Basically you need to do two things:
Identify the root directory in the zip.
Remove the root directory from the paths of other items in the zip.
The following should retain the overall structure of the zip while removing the root directory:
import typing, zipfile
def _is_root(info: zipfile.ZipInfo) -> bool:
if info.is_dir():
parts = info.filename.split("/")
# Handle directory names with and without trailing slashes.
if len(parts) == 1 or (len(parts) == 2 and parts[1] == ""):
return True
return False
def _members_without_root(archive: zipfile.ZipFile, root_filename: str) -> typing.Generator:
for info in archive.infolist():
parts = info.filename.split(root_filename)
if len(parts) > 1 and parts[1]:
# We join using the root filename, because there might be a subdirectory with the same name.
info.filename = root_filename.join(parts[1:])
yield info
with zipfile.ZipFile("archive.zip", mode="r") as archive:
# We will use the first directory with no more than one path segment as the root.
    root = next((info for info in archive.infolist() if _is_root(info)), None)
if root:
archive.extractall(path="/dir/to/extract/", members=_members_without_root(archive, root.filename))
else:
print("No root directory found in zip.")
uppercase letters - what's the point of them? all they give you is rsi.
i'd like to remove as much capitalisation as possible from my directory structure. how would i write a script to do this in python?
it should recursively parse a specified directory, identify the file/folder names with capital letters and rename them in lowercase.
os.walk is great for doing recursive stuff with the filesystem.
import os
def lowercase_rename( dir ):
    # renames all subfolders of dir, not including dir itself
def rename_all( root, items):
for name in items:
try:
os.rename( os.path.join(root, name),
os.path.join(root, name.lower()))
except OSError:
pass # can't rename it, so what
# starts from the bottom so paths further up remain valid after renaming
for root, dirs, files in os.walk( dir, topdown=False ):
rename_all( root, dirs )
rename_all( root, files)
The point of walking the tree upwards is that when you have a directory structure like '/A/B' you will have path '/A' during the recursion too. Now, if you start from the top, you'd rename /A to /a first, thus invalidating the /A/B path. On the other hand, when you start from the bottom and rename /A/B to /A/b first, it doesn't affect any other paths.
Actually you could use os.walk for top-down too, but that's (slightly) more complicated.
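For the curious, a rough top-down sketch might look like this (untested; the key point is patching the dirs list in place so os.walk descends into the renamed directories):
import os

def lowercase_rename_topdown(top):
    for root, dirs, files in os.walk(top, topdown=True):
        for i, name in enumerate(dirs):
            lower = name.lower()
            if lower != name:
                os.rename(os.path.join(root, name), os.path.join(root, lower))
                dirs[i] = lower  # keep the walk following the renamed folder
        for name in files:
            lower = name.lower()
            if lower != name:
                os.rename(os.path.join(root, name), os.path.join(root, lower))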
try the following script:
#!/usr/bin/python
'''
renames files or folders, changing all uppercase characters to lowercase.
directories will be parsed recursively.
usage: ./changecase.py file|directory
'''
import sys, os
def rename_recursive(srcpath):
srcpath = os.path.normpath(srcpath)
if os.path.isdir(srcpath):
# lower the case of this directory
newpath = name_to_lowercase(srcpath)
# recurse to the contents
for entry in os.listdir(newpath): #FIXME newpath
nextpath = os.path.join(newpath, entry)
rename_recursive(nextpath)
elif os.path.isfile(srcpath): # base case
name_to_lowercase(srcpath)
else: # error
print "bad arg: " + srcpath
sys.exit()
def name_to_lowercase(srcpath):
srcdir, srcname = os.path.split(srcpath)
newname = srcname.lower()
if newname == srcname:
return srcpath
newpath = os.path.join(srcdir, newname)
print "rename " + srcpath + " to " + newpath
os.rename(srcpath, newpath)
return newpath
arg = sys.argv[1]
arg = os.path.expanduser(arg)
rename_recursive(arg)
I have a file that may be in a different place on each user's machine. Is there a way to implement a search for the file? A way that I can pass the file's name and the directory tree to search in?
os.walk is the answer, this will find the first match:
import os
def find(name, path):
for root, dirs, files in os.walk(path):
if name in files:
return os.path.join(root, name)
And this will find all matches:
def find_all(name, path):
result = []
for root, dirs, files in os.walk(path):
if name in files:
result.append(os.path.join(root, name))
return result
And this will match a pattern:
import os, fnmatch
def find(pattern, path):
result = []
for root, dirs, files in os.walk(path):
for name in files:
if fnmatch.fnmatch(name, pattern):
result.append(os.path.join(root, name))
return result
find('*.txt', '/path/to/dir')
In Python 3.4 or newer you can use pathlib to do recursive globbing:
>>> import pathlib
>>> sorted(pathlib.Path('.').glob('**/*.py'))
[PosixPath('build/lib/pathlib.py'),
PosixPath('docs/conf.py'),
PosixPath('pathlib.py'),
PosixPath('setup.py'),
PosixPath('test_pathlib.py')]
Reference: https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob
In Python 3.5 or newer you can also do recursive globbing like this:
>>> import glob
>>> glob.glob('**/*.txt', recursive=True)
['2.txt', 'sub/3.txt']
Reference: https://docs.python.org/3/library/glob.html#glob.glob
I used a version of os.walk, and on a larger directory it took around 3.5 seconds. I tried two random solutions with no great improvement, then just did:
paths = [line[2:] for line in subprocess.check_output("find . -iname '*.txt'", shell=True).splitlines()]
While it's POSIX-only, I got 0.25 sec.
From this, I believe it's entirely possible to optimise the whole search a lot in a platform-independent way, but this is where I stopped researching.
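One portable direction (my own sketch, Python 3.6+, not benchmarked here) is an os.scandir-based walk, which avoids the extra stat() calls that tend to make pure-Python tree walks slow:
import os

def find_by_suffix(suffix, top):
    hits = []
    stack = [top]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as it:
                for entry in it:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
                    elif entry.name.endswith(suffix):
                        hits.append(entry.path)
        except OSError:
            pass  # unreadable directory; skip it
    return hits

# e.g. find_by_suffix('.txt', '.')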
If you are using Python on Ubuntu and you only want it to work on Ubuntu, a substantially faster way is to use the terminal's locate program, like this:
import subprocess
def find_files(file_name):
command = ['locate', file_name]
output = subprocess.Popen(command, stdout=subprocess.PIPE).communicate()[0]
output = output.decode()
search_results = output.split('\n')
return search_results
search_results is a list of the absolute file paths. This is tens of thousands of times faster than the methods above; for one search I did, it was ~72,000 times faster.
If you are working with Python 2, you have a problem with infinite recursion on Windows caused by self-referring symlinks.
This script will avoid following those. Note that this is Windows-specific!
import os
from scandir import scandir
import ctypes
def is_sym_link(path):
# http://stackoverflow.com/a/35915819
FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(unicode(path)) & FILE_ATTRIBUTE_REPARSE_POINT)
def find(base, filenames):
hits = []
def find_in_dir_subdir(direc):
content = scandir(direc)
for entry in content:
if entry.name in filenames:
hits.append(os.path.join(direc, entry.name))
elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
try:
find_in_dir_subdir(os.path.join(direc, entry.name))
except UnicodeDecodeError:
print "Could not resolve " + os.path.join(direc, entry.name)
continue
if not os.path.exists(base):
return
else:
find_in_dir_subdir(base)
return hits
It returns a list with all paths that point to files in the filenames list.
Usage:
find("C:\\", ["file1.abc", "file2.abc", "file3.abc", "file4.abc", "file5.abc"])
Below we use a boolean "first" argument to switch between the first match and all matches (the default, all matches, is equivalent to "find . -name file"):
import os
def find(root, file, first=False):
for d, subD, f in os.walk(root):
if file in f:
print("{0} : {1}".format(file, d))
if first == True:
break
The answer is very similar to existing ones, but slightly optimized.
So you can find any files or folders by pattern:
def iter_all(pattern, path):
return (
os.path.join(root, entry)
for root, dirs, files in os.walk(path)
for entry in dirs + files
if pattern.match(entry)
)
either by substring:
def iter_all(substring, path):
return (
os.path.join(root, entry)
for root, dirs, files in os.walk(path)
for entry in dirs + files
if substring in entry
)
or using a predicate:
def iter_all(predicate, path):
return (
os.path.join(root, entry)
for root, dirs, files in os.walk(path)
for entry in dirs + files
if predicate(entry)
)
To search only files or only folders, replace "dirs + files" with just "dirs" or just "files", depending on what you need.
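Illustrative usage of the pattern-based variant (the path is a placeholder; it expects a compiled regular expression because it calls pattern.match):
import re

txt_like = re.compile(r'.*\.txt$')
for path in iter_all(txt_like, '/path/to/dir'):
    print(path)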
Regards.
SARose's answer worked for me until I updated from Ubuntu 20.04 LTS. The slight change I made to his code makes it work on the latest Ubuntu release.
import subprocess
def find_files(file_name):
    command = ['locate ' + file_name]
output = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).communicate()[0]
output = output.decode()
search_results = output.split('\n')
return search_results
@F.M.F's answer has a few problems in this version, so I made a few adjustments to make it work.
import os
from os import scandir
import ctypes
def is_sym_link(path):
# http://stackoverflow.com/a/35915819
FILE_ATTRIBUTE_REPARSE_POINT = 0x0400
return os.path.isdir(path) and (ctypes.windll.kernel32.GetFileAttributesW(str(path)) & FILE_ATTRIBUTE_REPARSE_POINT)
def find(base, filenames):
hits = []
def find_in_dir_subdir(direc):
content = scandir(direc)
for entry in content:
if entry.name in filenames:
hits.append(os.path.join(direc, entry.name))
elif entry.is_dir() and not is_sym_link(os.path.join(direc, entry.name)):
try:
find_in_dir_subdir(os.path.join(direc, entry.name))
except UnicodeDecodeError:
print("Could not resolve " + os.path.join(direc, entry.name))
continue
except PermissionError:
print("Skipped " + os.path.join(direc, entry.name) + ". I lacked permission to navigate")
continue
if not os.path.exists(base):
return
else:
find_in_dir_subdir(base)
return hits
unicode() was changed to str() in Python 3, so I made that adjustment (line 8).
I also added (in line 25) an exception handler for PermissionError. This way, the program won't stop if it finds a folder it can't access.
Finally, I would like to give a little warning. When running the program, even if you are looking for a single file/directory, make sure you pass it as a list. Otherwise, you will get a lot of answers that don't necessarily match your search.
example of use:
find("C:\", ["Python", "Homework"])
or
find("C:\\", ["Homework"])
but, for example, find("C:\\", "Homework") will give unwanted answers.
I would be lying if I said I know why this happens. Again, this is not my code; I just made the adjustments I needed to make it work. All credit should go to @F.M.F.