Essentially, I'm wondering if the top answer given to this question can be implemented in Python. I am reviewing the modules os, os.path, and shutil and I haven't yet been able to find an easy equivalent, though I assume I'm just missing something simple.
More specifically, say I have a directory A, and I want to know whether there is any other directory inside A. I can call os.walk('path/to/A') and check whether dirnames is empty, but I don't want to make the program go through the entire tree rooted at A; i.e. what I'm looking for should stop and return True as soon as it finds a subdirectory.
For clarity, on a directory containing files but no directories an acceptable solution will return False.
Maybe you want:
import os

def folders_in(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent, fname)):
            yield os.path.join(path_to_parent, fname)

print(list(folders_in("/path/to/parent")))
This will return a list of all subdirectories ... if it's empty, then there are no subdirectories.
Or, in one line:
set([os.path.dirname(p) for p in glob.glob("/path/to/parent/*/*")])
although for a subdirectory to be counted with this method, it must have at least one file in it.
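If that limitation matters, here is a sketch of a variant (using the same placeholder path) that also matches empty subdirectories, since a trailing slash in the pattern makes glob return directories only:
import glob

# a trailing slash in the pattern makes glob match directories only,
# so empty subdirectories are counted as well
print(bool(glob.glob("/path/to/parent/*/")))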
Or, manipulating os.walk:
def subfolders(path_to_parent):
    try:
        return next(os.walk(path_to_parent))[1]
    except StopIteration:
        return []
I would just do as follows:
#for example
dir_of_interest = "/tmp/a/b/c"
print(dir_of_interest in (v[0] for v in os.walk("/tmp/")))
This prints True or False, depending on whether dir_of_interest is among the directories the generator yields. Since a generator is used here, the directories to check are produced one by one.
You can break out of the walk any time you want. For example, this breaks if the current folder being walked has no subdirectories:
for root, dirs, files in os.walk("/tmp/"):
    print(root, len(dirs))
    if not len(dirs):
        break
Maybe this is in line with what you are after.
Try this:
#!/usr/local/cpython-3.4/bin/python
import glob
import os
top_of_hierarchy = '/tmp/'
#top_of_hierarchy = '/tmp/orbit-dstromberg'
pattern = os.path.join(top_of_hierarchy, '*')
for candidate in glob.glob(pattern):
    if os.path.isdir(candidate):
        print("{0} is a directory".format(candidate))
        break
else:
    print('No directories found')
# Tested on 2.6, 2.7 and 3.4
I apparently can't comment yet; however, I wanted to update part of the answer https://stackoverflow.com/users/541038/joran-beasley gave, or at least what worked for me.
Using Python 3 (3.7.3), I had to modify his first code snippet as follows:
import os

def has_folders(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent, fname)):
            yield os.path.join(path_to_parent, fname)

print(list(has_folders("/repo/output")))
Narrowing this down to "does a given directory contain any directory" results in code like:
import os

def folders_in(path_to_parent):
    for fname in os.listdir(path_to_parent):
        if os.path.isdir(os.path.join(path_to_parent, fname)):
            yield os.path.join(path_to_parent, fname)

def has_folders(path_to_parent):
    folders = list(folders_in(path_to_parent))
    return len(folders) != 0

print(has_folders("the/path/to/parent"))
The result of this code should be True or False.
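If you want the early exit the question asks for, a sketch along the same lines that short-circuits at the first subdirectory instead of building the whole list could look like:
import os

def has_folders(path_to_parent):
    # any() short-circuits, so the scan stops at the first subdirectory found
    return any(
        os.path.isdir(os.path.join(path_to_parent, fname))
        for fname in os.listdir(path_to_parent)
    )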
Related
I need to 1) find a zipfile at a particular directory location, 2) if it exists, unzip it, and 3) out of its contents find a specific file and move it to another directory.
import os
import fnmatch
from zipfile import ZipFile

def searchfile():
    for file in os.listdir('/user/adam/datafiles'):
        if fnmatch.fnmatch(file, 'abc.zip'):
            return True
    return False

if searchfile():
    print('File exists')
else:
    print('File not found')

def file_extract():
    os.chdir('/user/adam/datafiles')
    file_name = 'abc.zip'
    destn = '/user/adam/extracted_files'
    zip_archive = ZipFile(file_name)
    zip_archive.extract('class.xlsx', destn)
    print("Extracted the file")
    zip_archive.close()

file_extract()
When I execute the above script, it shows no compile-time or runtime issues, but it only works for the first function. When I check the extracted_files folder, I don't see the files.
So for the sake of completeness and as my comment solved your issue, I guess I should make it an Answer:
In Python, if a function foo is defined (def foo(<...>):),
foo refers to the function itself and can be copied around (effectively copying a pointer), passed to other functions, ... like just about any object;
foo() is a call to that function with no arguments passed.
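A tiny illustration of the difference:
def foo():
    print("foo was called")

f = foo   # copies a reference to the function object; nothing runs yet
f()       # calls the function through the new name: prints "foo was called"
foo()     # calls it directly, same effect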
As this question does not seem to be an assignment, I will add the following:
To improve your code, you might want to look into:
Parameters to functions (your functions currently each do only one fixed thing. For example, you could pass the file and directory names to searchfile);
os.path and all its content;
The in test for checking if an object is in a container;
The with statement for clearer and safer handling of objects such as ZipFile instances;
The x if b else y construct;
Notice that even if the archive does not exist, your code still attempts to extract a file from it.
Here is a more robust way to implement what you want:
import os
import zipfile
arch_name, file_name = 'abc.zip', 'class.xlsx'
home_dir = os.path.join(os.path.abspath(os.sep), 'user', 'adam')
# or even better: home_dir = os.path.expanduser('~')
arch_dir = os.path.join(home_dir, 'datafiles')
dest_dir = os.path.join(home_dir, 'extracted_files')
arch_path = os.path.join(arch_dir, arch_name)
if os.path.isfile(arch_path):
    print('File {} exists'.format(arch_path))
    with zipfile.ZipFile(arch_path) as archive:
        archive.extract(file_name, dest_dir)
        print('Extracted {} from {}'.format(file_name, arch_name))
else:
    print('File {} not found'.format(arch_path))
Disclaimer: This code is untested and could contain minor mistakes!
Notice how the second half of the code works with generic variables that can easily be modified in a single place in the first half. Also, notice the improved readability of if os.path.isfile(arch_path): as opposed to if searchfile(): (requiring us to then read the implementation of searchfile).
There is a good chapter that will help with this in Automate the Boring Stuff
https://automatetheboringstuff.com/chapter9/
Let's say on my filesystem the following directory exists:
/foo/bar/
In my python code I have the following path:
/foo/bar/baz/quix/
How can I tell that only the /foo/bar/ part of the path exists?
I can walk the path recursively and check it step by step, but is there an easier way?
There is no ready-made function for this in the standard library, but it's not really a difficult one to make yourself.
Here's a function that takes a path and returns only the path that does exist.
In [129]: def exists(path):
     ...:     if os.path.exists(path): return path
     ...:     return exists(os.path.split(path)[0])
     ...:
In [130]: exists("/home/sevanteri/src/mutta/oisko/siellä/jotain/mitä/ei ole/")
Out[130]: '/home/sevanteri/src'
I think a simple while loop with os.path.dirname() will satisfy the requirement:
path_string = '/home/moin/Desktop/my/dummy/path'
while path_string:
    if not os.path.exists(path_string):
        path_string = os.path.dirname(path_string)
    else:
        break
# path_string = '/home/moin/Desktop' # which is valid path in my system
I don't fully get your requirements, i.e. whether you want every path checked or only up to some specific level. But for a simple sanity check you can just iterate through the full path, building it up piece by piece and checking each prefix:
# sample_path is the full path being checked; _path accumulates it piece by piece
_path = os.sep
for i in filter(lambda s: s, sample_path.split('/')):
    _path = os.path.join(_path, i)
    if os.path.exists(_path):
        print "correct path"
Well, I think the only way is to work recursively... Though, I would work up the directory tree. The code isn't too hard to implement:
import os

def doesItExist(directory):
    if not os.path.exists(directory):
        return doesItExist(os.path.dirname(directory))
    else:
        print "Found: " + directory
        return directory
I have a simple directory structure:
rootdir\
    subdir1\
        file1.tif
    subdir2\
        file2.tif
    ...
    subdir13\
        file13.tif
    subdir14\
        file14.tif
If I call:
import os
print os.listdir('absolute\path\to\rootdir')
...then I get what you'd expect:
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
Same thing happens if I call os.listdir() on those sub-directories. For each one it returns the name of the file in that directory. No problems there.
And if I call:
import os
for dirpath, dirnames, filenames in os.walk('absolute\path\to\rootdir'):
    print filenames
    print dirnames
...then I get what you'd expect:
[]
['subdir1', 'subdir2', ... 'subdir13', 'subdir14']
['file1.tif']
[]
['file2.tif']
[]
...
But here's the strangeness. When I call:
import os
for dirpath, dirnames, filenames in os.walk('absolute\path\to\rootdir'):
    print filenames
    print dirnames
    print dirpath
...it never returns, ever. Even if I try:
print [each[0] for each in os.walk('absolute\path\to\rootdir')]
...or anything of the sort. I can always print the second and third parts of the tuple returned by os.walk(), but the moment that I try to touch the first part the whole thing just stops.
Even stranger, this behavior only appears in scripts launched using the shell. The command line interpreter acts normally. I'm curious, what's going on here?
-----EDIT-----
Actual code:
ALLOWED_IMGFORMATS = [".jpg", ".tif"]

def getCategorizedFiles(pathname):
    cats = [each[0] for each in os.walk(pathname) if not each[0] == pathname]
    ncats = len(cats)

    tree = [[] for i in range(ncats + 1)]
    for cat in cats:
        catnum = int(os.path.basename(cat))
        for item in os.listdir(cat):
            if not item.endswith('.sift') and os.path.splitext(item)[-1].lower() in ALLOWED_IMGFORMATS:
                tree[catnum].append(cat + '\\' + item)

    fileDict = {cat: tree[cat] for cat in range(1, ncats + 1)}
    return fileDict
----EDIT 2----
Another development. As stated above, this problem exists when the code is in scripts launched from the shell. But not any shell. The problem exists with Console 2, but not the Windows command prompt. It also exists when the script is launched from java (how I originally came across the problem) like so: http://www.programmersheaven.com/mb/python/415726/415726/invoking-python-script-from-java/?S=B20000
I've never really trusted os.walk(). Just write your own recursive stuff. It's not hard:
def contents(folder, l):  # Recursive; returns a list of all files with full paths
    directContents = os.listdir(folder)
    for item in directContents:
        if os.path.isfile(os.path.join(folder, item)):
            l.append(os.path.join(folder, item))
        else:
            contents(os.path.join(folder, item), l)
    return l

contents = contents(folder, [])
contents will then be a list of all the files with full paths included. You can use os.path.split() if you like to make it a little easier to read.
Knowing how this works eliminates the uncertainty of using os.walk() in your code, which means you'll be able to identify if the problem in your code is really involved with os.walk().
If you need to put them in a dictionary (because dictionaries have aliasing benefits, too), you can also sort your files that way.
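For instance, a sketch of that idea (the group_by_folder helper name is made up here) that maps each directory to the file names it contains:
import os
from collections import defaultdict

def group_by_folder(paths):
    # paths: a list of full file paths, e.g. the list built by contents() above
    grouped = defaultdict(list)
    for path in paths:
        parent, name = os.path.split(path)
        grouped[parent].append(name)
    return dict(grouped)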
I'm attempting to loop through a directory and any nested directories within. It seemed like recursion would be a good way to go about it.
I ended up with this code:
def get_file_list(directory=os.getcwd()):
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        print i
This prints everything beautifully -- exactly the output I expected. However, I wanted to take this list of files and pass it to another function for further processing. So I tried compiling everything into a list.
def get_file_list(directory=os.getcwd()):
    files = []
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(i)
            continue
        files.append(i)
    return files
So now, the problem is that it only returns the files from the current working directory. After some thinking, I guess this is a scoping issue. A new files variable is being created in a unique piece of memory each time get_file_list() is called, right? So how do you get around something like this? How do you assemble the results from nested calls?
all_files = []
for current_dir, directories, files in os.walk("C:\\"):
    current_files = [os.path.join(current_dir, file) for file in files]
    all_files.extend(current_files)
print all_files
I would think this would work better.
Use extend:
def get_file_list(directory='.'):
    files = []
    for i in os.listdir(directory):
        if os.path.isdir(i):
            files.extend(get_file_list(i))
        else:
            files.append(i)
    return files
Also, I changed your os.getcwd() call to just . since you probably want it to default to the current working directory, not the working directory at the point at which the function was defined.
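A tiny sketch of why that matters (default argument values are evaluated once, when the def statement runs, not at each call):
import os

def f(directory=os.getcwd()):
    # the default value is computed once, when this def statement executes
    return directory

print(f())          # the directory that was current when f was defined
os.chdir(os.sep)    # change the working directory (to the filesystem root here)
print(f())          # still the old directory: the default did not update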
Use generators! They're very powerful and make things easy to read. Here are some references.
Basically, you use "yield" to return values instead of "return". When the function encounters a "yield" statement, it returns the value and pauses the execution of the function, meaning when the function is called again later, it picks up where it left off!
And to top it off, you can tell python to iterate over generator functions using "for x in my_generator_function()". Very handy.
import os

# this is a "generator function"
def get_files(directory='.'):
    for item in os.listdir(directory):
        item = os.path.join(directory, item)
        if os.path.isdir(item):
            for subitem in get_files(item):
                yield subitem
            # The fact that there's a "yield" statement here
            # tells python that this is a generator function
        else:
            yield item

for item in get_files():
    print item  # Do something besides printing here, obviously ;)
A common way to do this recursively in the spirit of your original question is to pass in the list you are appending to as a parameter. Pass the empty list to the very first call to the function. A recursive "helper" (often implemented as a nested function) can accumulate the files.
EDIT:
Here is a complete script (fixed from a previous version):
import os

def get_file_list(directory=os.getcwd()):
    def file_list(directory, files):
        for i in os.listdir(directory):
            if os.path.isdir(i):
                file_list(i, files)
                continue
            files.append(i)
        return files
    return file_list(directory, [])

print get_file_list()
import os

def get_file_list(files, directory=os.getcwd()):
    for i in os.listdir(directory):
        if os.path.isdir(i):
            get_file_list(files, i)  # note: needed to amend this call to pass the reference down the calls
            continue
        files.append(i)  # insert the file name into our referenced list

myfiles = []  # the list we want to insert all the file names into
get_file_list(myfiles)  # call the function and pass a reference to myfiles in
print('\n'.join(myfiles))
I am trying to do something to all the files under a given path. I don't want to collect all the file names beforehand then do something with them, so I tried this:
import os
import stat
def explore(p):
    s = ''
    list = os.listdir(p)
    for a in list:
        path = p + '/' + a
        stat_info = os.lstat(path)
        if stat.S_ISDIR(stat_info.st_mode):
            explore(path)
        else:
            yield path

if __name__ == "__main__":
    for x in explore('.'):
        print '-->', x
But this code skips over directories when it hits them, instead of yielding their contents. What am I doing wrong?
Iterators do not work recursively like that. You have to re-yield each result, by replacing
explore(path)
with something like
for value in explore(path):
    yield value
Python 3.3 added the syntax yield from X, as proposed in PEP 380, to serve this purpose. With it you can do this instead:
yield from explore(path)
If you're using generators as coroutines, this syntax also supports the use of generator.send() to pass values back into the recursively-invoked generators. The simple for loop above would not.
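A minimal sketch of that send() forwarding (assuming Python 3.3+ for yield from):
def inner():
    while True:
        received = yield
        print("inner got: " + received)

def outer():
    yield from inner()   # values sent to outer are forwarded to inner

gen = outer()
next(gen)           # prime the generators up to the first yield
gen.send("hello")   # prints "inner got: hello"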
The problem is this line of code:
explore(path)
What does it do?
calls explore with the new path
explore runs, creating a generator
the generator is returned to the spot where explore(path) was executed . . .
and is discarded
Why is it discarded? It wasn't assigned to anything, it wasn't iterated over -- it was completely ignored.
If you want to do something with the results, well, you have to do something with them! ;)
The easiest way to fix your code is:
for name in explore(path):
    yield name
When you are confident you understand what's going on, you'll probably want to use os.walk() instead.
Once you have migrated to Python 3.3 (assuming all works out as planned) you will be able to use the new yield from syntax and the easiest way to fix your code at that point will be:
yield from explore(path)
Use os.walk instead of reinventing the wheel.
In particular, following the examples in the library documentation, here is an untested attempt:
import os
from os.path import join

def hellothere(somepath):
    for root, dirs, files in os.walk(somepath):
        for curfile in files:
            yield join(root, curfile)

# call and get full list of results:
allfiles = [x for x in hellothere("...")]

# iterate over results lazily:
for x in hellothere("..."):
    print x
Change this:
explore(path)
To this:
for subpath in explore(path):
    yield subpath
Or use os.walk, as phooji suggested (which is the better option).
That calls explore like a function. What you should do is iterate it like a generator:
if stat.S_ISDIR(stat_info.st_mode):
    for p in explore(path):
        yield p
else:
    yield path
EDIT: Instead of the stat module, you could use os.path.isdir(path).
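With that substitution, the same branch would read (a sketch based on the snippet above):
if os.path.isdir(path):
    for p in explore(path):
        yield p
else:
    yield path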
Try this:
if stat.S_ISDIR(stat_info.st_mode):
    for p in explore(path):
        yield p
os.walk is great if you need to traverse all the folders and subfolders. If you don't need that, it's like using an elephant gun to kill a fly.
However, for this specific case, os.walk could be a better approach.
You can also implement the recursion using a stack.
There is not really any advantage in doing this though, other than the fact that it is possible. If you are using python in the first place, the performance gains are probably not worthwhile.
import os
import stat
def explore(p):
    '''
    Perform a depth-first search and yield the path elements in DFS order.
    The recursion is implemented with a stack, because Python can't yield
    from within a nested function call.
    '''
    list_t = type(list())
    st = [[p, 0]]
    while len(st) > 0:
        x = st[-1][0]
        print x
        i = st[-1][1]
        if type(x) == list_t:
            if i >= len(x):
                st.pop(-1)
            else:
                st[-1][1] += 1
                st.append([x[i], 0])
        else:
            st.pop(-1)
            stat_info = os.lstat(x)
            if stat.S_ISDIR(stat_info.st_mode):
                st.append([['%s/%s' % (x, a) for a in os.listdir(x)], 0])
            else:
                yield x

print list(explore('.'))
To answer the original question as asked, the key is that the yield statement needs to be propagated back out of the recursion (just like, say, return). Here is a working reimplementation of os.walk(). I'm using this in a pseudo-VFS implementation, where I additionally replace os.listdir() and similar calls.
import os, os.path
def walk(top, topdown=False):
    # items[False] collects file names, items[True] collects directory names
    items = ([], [])
    for name in os.listdir(top):
        isdir = os.path.isdir(os.path.join(top, name))
        items[isdir].append(name)
    result = (top, items[True], items[False])
    if topdown:
        yield result
    for folder in items[True]:
        for item in walk(os.path.join(top, folder), topdown=topdown):
            yield item
    if not topdown:
        yield result